* Re: [PATCH] staging: media: imx: fix style issues
From: Frank Li @ 2026-03-27 14:47 UTC
To: vivek yadav
Cc: slongerbeam, p.zabel, mchehab, gregkh, shawnguo, s.hauer, kernel,
festevam, linux-media, linux-staging, imx, linux-arm-kernel,
linux-kernel
In-Reply-To: <20251202161413.92230-1-y9.vivek@gmail.com>
On Tue, Dec 02, 2025 at 09:44:13PM +0530, vivek yadav wrote:
> Applied checkpatch.pl recommendations:
> - corrected whitespace
> - fixed line length
> - adjusted indentation
>
> Signed-off-by: vivek yadav <y9.vivek@gmail.com>
> ---
Applied, thanks! It should be in media-committers/next.
Frank
> --
> 2.43.0
>
* Re: [PATCH v2 21/30] KVM: arm64: Kill topup_memcache from kvm_s2_fault
From: Marc Zyngier @ 2026-03-27 14:49 UTC
To: kvmarm, linux-arm-kernel, kvm
Cc: Joey Gouly, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
Fuad Tabba, Will Deacon, Quentin Perret
In-Reply-To: <20260327113618.4051534-22-maz@kernel.org>
On Fri, 27 Mar 2026 11:36:09 +0000,
Marc Zyngier <maz@kernel.org> wrote:
>
> The topup_memcache field can be easily replaced by the equivalent
> conditions, and the resulting code is not much worse.
>
> Tested-by: Fuad Tabba <tabba@google.com>
> Reviewed-by: Fuad Tabba <tabba@google.com>
> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
> arch/arm64/kvm/mmu.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index e8bda71e862b2..5b05caecdbd92 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1712,7 +1712,6 @@ static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
>
> struct kvm_s2_fault {
> bool writable;
> - bool topup_memcache;
> bool mte_allowed;
> bool is_vma_cacheable;
> bool s2_force_noncacheable;
> @@ -1983,9 +1982,8 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
> .logging_active = logging_active,
> .force_pte = logging_active,
> .prot = KVM_PGTABLE_PROT_R,
> - .topup_memcache = !perm_fault || (logging_active && kvm_is_write_fault(s2fd->vcpu)),
> };
> - void *memcache;
> + void *memcache = NULL;
> int ret;
>
> /*
> @@ -1994,9 +1992,11 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
> * only exception to this is when dirty logging is enabled at runtime
> * and a write fault needs to collapse a block entry into a table.
> */
> - ret = prepare_mmu_memcache(s2fd->vcpu, fault.topup_memcache, &memcache);
> - if (ret)
> - return ret;
> + if (!perm_fault || (logging_active && kvm_is_write_fault(s2fd->vcpu))) {
> + ret = prepare_mmu_memcache(s2fd->vcpu, true, &memcache);
> + if (ret)
> + return ret;
> + }
>
> /*
> * Let's check if we will get back a huge page backed by hugetlbfs, or
Sashiko has spotted [1] an interesting corner case here, which is that the
original code always initialises memcache to its correct value, while
we now only do it in a limited number of cases.
I'm proposing to restore the original behaviour by folding the
following change into this patch, splitting the retrieval of the
memcache pointer from the top-up and avoiding the ugly pointer
indirection:
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 1fe7182be45ac..03e1f389339c7 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1513,25 +1513,22 @@ static bool kvm_vma_is_cacheable(struct vm_area_struct *vma)
}
}
-static int prepare_mmu_memcache(struct kvm_vcpu *vcpu, bool topup_memcache,
- void **memcache)
+static void *get_mmu_memcache(struct kvm_vcpu *vcpu)
{
- int min_pages;
-
if (!is_protected_kvm_enabled())
- *memcache = &vcpu->arch.mmu_page_cache;
+ return &vcpu->arch.mmu_page_cache;
else
- *memcache = &vcpu->arch.pkvm_memcache;
-
- if (!topup_memcache)
- return 0;
+ return &vcpu->arch.pkvm_memcache;
+}
- min_pages = kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu);
+static int topup_mmu_memcache(struct kvm_vcpu *vcpu, void *memcache)
+{
+ int min_pages = kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu);
if (!is_protected_kvm_enabled())
- return kvm_mmu_topup_memory_cache(*memcache, min_pages);
+ return kvm_mmu_topup_memory_cache(memcache, min_pages);
- return topup_hyp_memcache(*memcache, min_pages);
+ return topup_hyp_memcache(memcache, min_pages);
}
/*
@@ -1589,7 +1586,8 @@ static int gmem_abort(const struct kvm_s2_fault_desc *s2fd)
gfn_t gfn;
int ret;
- ret = prepare_mmu_memcache(s2fd->vcpu, true, &memcache);
+ memcache = get_mmu_memcache(s2fd->vcpu);
+ ret = topup_mmu_memcache(s2fd->vcpu, memcache);
if (ret)
return ret;
@@ -1993,7 +1991,7 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
bool perm_fault = kvm_vcpu_trap_is_permission_fault(s2fd->vcpu);
struct kvm_s2_fault_vma_info s2vi = {};
enum kvm_pgtable_prot prot;
- void *memcache = NULL;
+ void *memcache;
int ret;
/*
@@ -2002,9 +2000,10 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
* only exception to this is when dirty logging is enabled at runtime
* and a write fault needs to collapse a block entry into a table.
*/
+ memcache = get_mmu_memcache(s2fd->vcpu);
if (!perm_fault || (memslot_is_logging(s2fd->memslot) &&
kvm_is_write_fault(s2fd->vcpu))) {
- ret = prepare_mmu_memcache(s2fd->vcpu, true, &memcache);
+ ret = topup_mmu_memcache(s2fd->vcpu, memcache);
if (ret)
return ret;
}
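For readers following along, here is a minimal userspace sketch of the split described above: the memcache pointer is retrieved unconditionally, and the top-up runs only when the fault needs it. All demo_* names, the struct layouts and the fixed top-up count of 4 are hypothetical stand-ins for illustration, not kernel APIs or the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-vCPU page cache; not the kernel's type. */
struct demo_cache {
	int nobjs;
};

/* Hypothetical stand-in for the vCPU state used by the fault path. */
struct demo_vcpu {
	bool protected_mode;          /* models is_protected_kvm_enabled() */
	struct demo_cache page_cache; /* models arch.mmu_page_cache */
	struct demo_cache pkvm_cache; /* models arch.pkvm_memcache */
};

/* Retrieval cannot fail, so callers can always run it. */
static struct demo_cache *demo_get_memcache(struct demo_vcpu *vcpu)
{
	return vcpu->protected_mode ? &vcpu->pkvm_cache : &vcpu->page_cache;
}

/* Top-up may fail; a negative min_pages models an allocation failure. */
static int demo_topup_memcache(struct demo_cache *cache, int min_pages)
{
	if (min_pages < 0)
		return -1;
	if (cache->nobjs < min_pages)
		cache->nobjs = min_pages;
	return 0;
}

/*
 * Fault path: the pointer is valid on every path, while the top-up
 * only happens for non-permission faults or logging write faults.
 */
int demo_fault(struct demo_vcpu *vcpu, bool perm_fault,
	       bool logging_write, struct demo_cache **out)
{
	struct demo_cache *cache = demo_get_memcache(vcpu);

	if (!perm_fault || logging_write) {
		int ret = demo_topup_memcache(cache, 4);

		if (ret)
			return ret;
	}

	*out = cache;
	return 0;
}
```

The point of the shape is that a caller on the permission-fault path still gets a usable pointer, even though nothing was topped up.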
The bot has also pointed out a couple of cases where memcache and
permission faults interact badly. I'll look into them separately, as
they predate this rework.
Thanks,
M.
[1] https://sashiko.dev/#/patchset/20260327113618.4051534-1-maz%40kernel.org?patch=12134
--
Without deviation from the norm, progress is not possible.
* Re: [PATCH] staging: media: Remove unnecessary braces from if statement
From: Frank Li @ 2026-03-27 14:49 UTC
To: Ayush Kumar
Cc: slongerbeam, p.zabel, mchehab, gregkh, shawnguo, s.hauer, kernel,
festevam, linux-media, linux-staging, imx, linux-arm-kernel,
linux-kernel, kernel-newbies
In-Reply-To: <aSYcZYDUsJ3jy8cR@lizhi-Precision-Tower-5810>
On Tue, Nov 25, 2025 at 04:15:17PM -0500, Frank Li wrote:
> On Tue, Nov 25, 2025 at 08:23:31PM +0000, Ayush Kumar wrote:
> > Adhering to Linux kernel coding style guidelines (Chapter 3: Indentation).
> >
> > Signed-off-by: Ayush Kumar <ayushkr0s@gmail.com>
> > ---
>
> Reviewed-by: Frank Li <Frank.Li@nxp.com>
Applied, thanks! It should be in the media-committers/next branch.
Frank
>
> > drivers/staging/media/imx/imx-media-of.c | 3 +--
> > 1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/drivers/staging/media/imx/imx-media-of.c b/drivers/staging/media/imx/imx-media-of.c
> > index bb28daa4d713..7413551052ae 100644
> > --- a/drivers/staging/media/imx/imx-media-of.c
> > +++ b/drivers/staging/media/imx/imx-media-of.c
> > @@ -57,9 +57,8 @@ int imx_media_add_of_subdevs(struct imx_media_dev *imxmd,
> > of_node_put(csi_np);
> > if (ret) {
> > /* unavailable or already added is not an error */
> > - if (ret == -ENODEV || ret == -EEXIST) {
> > + if (ret == -ENODEV || ret == -EEXIST)
> > continue;
> > - }
> >
> > /* other error, can't continue */
> > return ret;
> > --
> > 2.43.0
> >
* Re: [RFC PATCH] dmaengine: xilinx_dma: Fix per-channel direction reporting via device_caps
From: Marek Vasut @ 2026-03-27 14:51 UTC
To: Rahul Navale, Folker Schwesinger
Cc: Rahul Navale, dmaengine, linux-arm-kernel, linux-kernel, vkoul,
Frank.Li, michal.simek, suraj.gupta2, thomas.gessler,
radhey.shyam.pandey, tomi.valkeinen, Michal Simek
In-Reply-To: <20260318123524.4959-1-rahulnavale04@gmail.com>
On 3/18/26 1:35 PM, Rahul Navale wrote:
Hello Rahul,
>> If yes, make sure you only test these three
> I have confirmed that no other patches are applied on the xilinx dma driver.
> I have applied only the three patches you provided
> and tested audio, but I am facing the same issue.
Can you please add [1] to the patch stack and let me know whether that
improves the behavior?
Thank you
[1]
https://lore.kernel.org/linux-sound/20260327143014.54867-1-marex@nabladev.com/
* Re: [PATCH v3 4/5] KVM: arm64: Enable HDBSS support and handle HDBSSF events
From: Leonardo Bras @ 2026-03-27 15:00 UTC
To: Tian Zheng
Cc: Leonardo Bras, maz, oupton, catalin.marinas, corbet, pbonzini,
will, yuzenghui, wangzhou1, liuyonglong, Jonathan.Cameron,
yezhenyu2, linuxarm, joey.gouly, kvmarm, kvm, linux-arm-kernel,
linux-doc, linux-kernel, skhan, suzuki.poulose
In-Reply-To: <e3253959-0340-4c13-a980-a599e090a6de@huawei.com>
On Fri, Mar 27, 2026 at 03:35:29PM +0800, Tian Zheng wrote:
>
> On 3/26/2026 2:05 AM, Leonardo Bras wrote:
> > Hello Tian,
> >
> > I am currently working on HACDBS enablement (which will be rebased on top of
> > this patchset), and since HACDBS and HDBSS are somewhat complementary, I
> > will sometimes come with questions about issues I have faced myself on
> > that part. :)
> >
> > (see below)
>
>
> Of course! Happy to exchange ideas and learn together.
:)
>
>
> >
> > On Wed, Feb 25, 2026 at 12:04:20PM +0800, Tian Zheng wrote:
> > > From: eillon <yezhenyu2@huawei.com>
> > >
> > > HDBSS is enabled via an ioctl from userspace (e.g. QEMU) at the start of
> > > migration. This feature is only supported in VHE mode.
> > >
> > > Initially, S2 PTEs don't contain the DBM attribute. During migration,
> > > write faults are handled by user_mem_abort, which relaxes permissions
> > > and adds the DBM bit when HDBSS is active. Once DBM is set, subsequent
> > > writes no longer trap, as the hardware automatically transitions the page
> > > from writable-clean to writable-dirty.
> > >
> > > KVM does not scan S2 page tables to consume DBM. Instead, when HDBSS is
> > > enabled, the hardware observes the clean->dirty transition and records
> > > the corresponding page into the HDBSS buffer.
> > >
> > > During sync_dirty_log, KVM kicks all vCPUs to force VM-Exit, ensuring
> > > that check_vcpu_requests flushes the HDBSS buffer and propagates the
> > > accumulated dirty information into the userspace-visible dirty bitmap.
> > >
> > > Add fault handling for HDBSS including buffer full, external abort, and
> > > general protection fault (GPF).
> > >
> > > Signed-off-by: eillon <yezhenyu2@huawei.com>
> > > Signed-off-by: Tian Zheng <zhengtian10@huawei.com>
> > > ---
> > > arch/arm64/include/asm/esr.h | 5 ++
> > > arch/arm64/include/asm/kvm_host.h | 17 +++++
> > > arch/arm64/include/asm/kvm_mmu.h | 1 +
> > > arch/arm64/include/asm/sysreg.h | 11 ++++
> > > arch/arm64/kvm/arm.c | 102 ++++++++++++++++++++++++++++++
> > > arch/arm64/kvm/hyp/vhe/switch.c | 19 ++++++
> > > arch/arm64/kvm/mmu.c | 70 ++++++++++++++++++++
> > > arch/arm64/kvm/reset.c | 3 +
> > > 8 files changed, 228 insertions(+)
> > >
> > > diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
> > > index 81c17320a588..2e6b679b5908 100644
> > > --- a/arch/arm64/include/asm/esr.h
> > > +++ b/arch/arm64/include/asm/esr.h
> > > @@ -437,6 +437,11 @@
> > > #ifndef __ASSEMBLER__
> > > #include <asm/types.h>
> > >
> > > +static inline bool esr_iss2_is_hdbssf(unsigned long esr)
> > > +{
> > > + return ESR_ELx_ISS2(esr) & ESR_ELx_HDBSSF;
> > > +}
> > > +
> > > static inline unsigned long esr_brk_comment(unsigned long esr)
> > > {
> > > return esr & ESR_ELx_BRK64_ISS_COMMENT_MASK;
> > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > > index 5d5a3bbdb95e..57ee6b53e061 100644
> > > --- a/arch/arm64/include/asm/kvm_host.h
> > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > @@ -55,12 +55,17 @@
> > > #define KVM_REQ_GUEST_HYP_IRQ_PENDING KVM_ARCH_REQ(9)
> > > #define KVM_REQ_MAP_L1_VNCR_EL2 KVM_ARCH_REQ(10)
> > > #define KVM_REQ_VGIC_PROCESS_UPDATE KVM_ARCH_REQ(11)
> > > +#define KVM_REQ_FLUSH_HDBSS KVM_ARCH_REQ(12)
> > >
> > > #define KVM_DIRTY_LOG_MANUAL_CAPS (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \
> > > KVM_DIRTY_LOG_INITIALLY_SET)
> > >
> > > #define KVM_HAVE_MMU_RWLOCK
> > >
> > > +/* HDBSS entry field definitions */
> > > +#define HDBSS_ENTRY_VALID BIT(0)
> > > +#define HDBSS_ENTRY_IPA GENMASK_ULL(55, 12)
> > > +
> > > /*
> > > * Mode of operation configurable with kvm-arm.mode early param.
> > > * See Documentation/admin-guide/kernel-parameters.txt for more information.
> > > @@ -84,6 +89,7 @@ int __init kvm_arm_init_sve(void);
> > > u32 __attribute_const__ kvm_target_cpu(void);
> > > void kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> > > void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu);
> > > +void kvm_arm_vcpu_free_hdbss(struct kvm_vcpu *vcpu);
> > >
> > > struct kvm_hyp_memcache {
> > > phys_addr_t head;
> > > @@ -405,6 +411,8 @@ struct kvm_arch {
> > > * the associated pKVM instance in the hypervisor.
> > > */
> > > struct kvm_protected_vm pkvm;
> > > +
> > > + bool enable_hdbss;
> > > };
> > >
> > > struct kvm_vcpu_fault_info {
> > > @@ -816,6 +824,12 @@ struct vcpu_reset_state {
> > > bool reset;
> > > };
> > >
> > > +struct vcpu_hdbss_state {
> > > + phys_addr_t base_phys;
> > > + u32 size;
> > > + u32 next_index;
> > > +};
> > > +
> > > struct vncr_tlb;
> > >
> > > struct kvm_vcpu_arch {
> > > @@ -920,6 +934,9 @@ struct kvm_vcpu_arch {
> > >
> > > /* Per-vcpu TLB for VNCR_EL2 -- NULL when !NV */
> > > struct vncr_tlb *vncr_tlb;
> > > +
> > > + /* HDBSS registers info */
> > > + struct vcpu_hdbss_state hdbss;
> > > };
> > >
> > > /*
> > > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> > > index d968aca0461a..3fea8cfe8869 100644
> > > --- a/arch/arm64/include/asm/kvm_mmu.h
> > > +++ b/arch/arm64/include/asm/kvm_mmu.h
> > > @@ -183,6 +183,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
> > >
> > > int kvm_handle_guest_sea(struct kvm_vcpu *vcpu);
> > > int kvm_handle_guest_abort(struct kvm_vcpu *vcpu);
> > > +void kvm_flush_hdbss_buffer(struct kvm_vcpu *vcpu);
> > >
> > > phys_addr_t kvm_mmu_get_httbr(void);
> > > phys_addr_t kvm_get_idmap_vector(void);
> > > diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> > > index f4436ecc630c..d11f4d0dd4e7 100644
> > > --- a/arch/arm64/include/asm/sysreg.h
> > > +++ b/arch/arm64/include/asm/sysreg.h
> > > @@ -1039,6 +1039,17 @@
> > >
> > > #define GCS_CAP(x) ((((unsigned long)x) & GCS_CAP_ADDR_MASK) | \
> > > GCS_CAP_VALID_TOKEN)
> > > +
> > > +/*
> > > + * Definitions for the HDBSS feature
> > > + */
> > > +#define HDBSS_MAX_SIZE HDBSSBR_EL2_SZ_2MB
> > > +
> > > +#define HDBSSBR_EL2(baddr, sz) (((baddr) & GENMASK(55, 12 + sz)) | \
> > > + FIELD_PREP(HDBSSBR_EL2_SZ_MASK, sz))
> > > +
> > > +#define HDBSSPROD_IDX(prod) FIELD_GET(HDBSSPROD_EL2_INDEX_MASK, prod)
> > > +
> > > /*
> > > * Definitions for GICv5 instructions]
> > > */
> > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > > index 29f0326f7e00..d64da05e25c4 100644
> > > --- a/arch/arm64/kvm/arm.c
> > > +++ b/arch/arm64/kvm/arm.c
> > > @@ -125,6 +125,87 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
> > > return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
> > > }
> > >
> > > +void kvm_arm_vcpu_free_hdbss(struct kvm_vcpu *vcpu)
> > > +{
> > > + struct page *hdbss_pg;
> > > +
> > > + hdbss_pg = phys_to_page(vcpu->arch.hdbss.base_phys);
> > > + if (hdbss_pg)
> > > + __free_pages(hdbss_pg, vcpu->arch.hdbss.size);
> > > +
> > > + vcpu->arch.hdbss.size = 0;
> > > +}
> > > +
> > > +static int kvm_cap_arm_enable_hdbss(struct kvm *kvm,
> > > + struct kvm_enable_cap *cap)
> > > +{
> > > + unsigned long i;
> > > + struct kvm_vcpu *vcpu;
> > > + struct page *hdbss_pg = NULL;
> > > + __u64 size = cap->args[0];
> > > + bool enable = cap->args[1] ? true : false;
> > > +
> > > + if (!system_supports_hdbss())
> > > + return -EINVAL;
> > > +
> > > + if (size > HDBSS_MAX_SIZE)
> > > + return -EINVAL;
> > > +
> > > + if (!enable && !kvm->arch.enable_hdbss) /* Already Off */
> > > + return 0;
> > > +
> > > + if (enable && kvm->arch.enable_hdbss) /* Already On, can't set size */
> > > + return -EINVAL;
> > > +
> > > + if (!enable) { /* Turn it off */
> > > + kvm->arch.mmu.vtcr &= ~(VTCR_EL2_HD | VTCR_EL2_HDBSS | VTCR_EL2_HA);
> > > +
> > > + kvm_for_each_vcpu(i, vcpu, kvm) {
> > > + /* Kick vcpus to flush hdbss buffer. */
> > > + kvm_vcpu_kick(vcpu);
> > > +
> > > + kvm_arm_vcpu_free_hdbss(vcpu);
> > > + }
> > > +
> > > + kvm->arch.enable_hdbss = false;
> > > +
> > > + return 0;
> > > + }
> > > +
> > > + /* Turn it on */
> > > + kvm_for_each_vcpu(i, vcpu, kvm) {
> > > + hdbss_pg = alloc_pages(GFP_KERNEL_ACCOUNT, size);
> > > + if (!hdbss_pg)
> > > + goto error_alloc;
> > > +
> > > + vcpu->arch.hdbss = (struct vcpu_hdbss_state) {
> > > + .base_phys = page_to_phys(hdbss_pg),
> > > + .size = size,
> > > + .next_index = 0,
> > > + };
> > > + }
> > > +
> > > + kvm->arch.enable_hdbss = true;
> > > + kvm->arch.mmu.vtcr |= VTCR_EL2_HD | VTCR_EL2_HDBSS | VTCR_EL2_HA;
> > > +
> > > + /*
> > > + * We should kick vcpus out of guest mode here to load new
> > > + * vtcr value to vtcr_el2 register when re-enter guest mode.
> > > + */
> > > + kvm_for_each_vcpu(i, vcpu, kvm)
> > > + kvm_vcpu_kick(vcpu);
> > > +
> > > + return 0;
> > > +
> > > +error_alloc:
> > > + kvm_for_each_vcpu(i, vcpu, kvm) {
> > > + if (vcpu->arch.hdbss.base_phys)
> > > + kvm_arm_vcpu_free_hdbss(vcpu);
> > > + }
> > > +
> > > + return -ENOMEM;
> > > +}
> > > +
> > > int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> > > struct kvm_enable_cap *cap)
> > > {
> > > @@ -182,6 +263,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> > > r = 0;
> > > set_bit(KVM_ARCH_FLAG_EXIT_SEA, &kvm->arch.flags);
> > > break;
> > > + case KVM_CAP_ARM_HW_DIRTY_STATE_TRACK:
> > > + mutex_lock(&kvm->lock);
> > > + r = kvm_cap_arm_enable_hdbss(kvm, cap);
> > > + mutex_unlock(&kvm->lock);
> > > + break;
> > > default:
> > > break;
> > > }
> > > @@ -471,6 +557,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> > > r = kvm_supports_cacheable_pfnmap();
> > > break;
> > >
> > > + case KVM_CAP_ARM_HW_DIRTY_STATE_TRACK:
> > > + r = system_supports_hdbss();
> > > + break;
> > > default:
> > > r = 0;
> > > }
> > > @@ -1120,6 +1209,9 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
> > > if (kvm_dirty_ring_check_request(vcpu))
> > > return 0;
> > >
> > > + if (kvm_check_request(KVM_REQ_FLUSH_HDBSS, vcpu))
> > > + kvm_flush_hdbss_buffer(vcpu);
> > > +
> > > check_nested_vcpu_requests(vcpu);
> > > }
> > >
> > > @@ -1898,7 +1990,17 @@ long kvm_arch_vcpu_unlocked_ioctl(struct file *filp, unsigned int ioctl,
> > >
> > > void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
> > > {
> > > + /*
> > > + * Flush all CPUs' dirty log buffers to the dirty_bitmap. Called
> > > + * before reporting dirty_bitmap to userspace. Send a request with
> > > + * KVM_REQUEST_WAIT to flush buffer synchronously.
> > > + */
> > > + struct kvm_vcpu *vcpu;
> > > +
> > > + if (!kvm->arch.enable_hdbss)
> > > + return;
> > >
> > > + kvm_make_all_cpus_request(kvm, KVM_REQ_FLUSH_HDBSS);
> > > }
> > >
> > > static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
> > > diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
> > > index 9db3f11a4754..600cbc4f8ae9 100644
> > > --- a/arch/arm64/kvm/hyp/vhe/switch.c
> > > +++ b/arch/arm64/kvm/hyp/vhe/switch.c
> > > @@ -213,6 +213,23 @@ static void __vcpu_put_deactivate_traps(struct kvm_vcpu *vcpu)
> > > local_irq_restore(flags);
> > > }
> > >
> > > +static void __load_hdbss(struct kvm_vcpu *vcpu)
> > > +{
> > > + struct kvm *kvm = vcpu->kvm;
> > > + u64 br_el2, prod_el2;
> > > +
> > > + if (!kvm->arch.enable_hdbss)
> > > + return;
> > > +
> > > + br_el2 = HDBSSBR_EL2(vcpu->arch.hdbss.base_phys, vcpu->arch.hdbss.size);
> > > + prod_el2 = vcpu->arch.hdbss.next_index;
> > > +
> > > + write_sysreg_s(br_el2, SYS_HDBSSBR_EL2);
> > > + write_sysreg_s(prod_el2, SYS_HDBSSPROD_EL2);
> > > +
> > > + isb();
> > > +}
> > > +
> > I see in the code below you trust that the tracking will happen with
> > PAGE_SIZE granularity (you track with PAGE_SHIFT).
> >
> > That may be a problem when we have guest memory backed by hugepages or
> > transparent huge pages.
> >
> > When we are using HDBSS, there is no fault happening, so we have no way of
> > doing on-demand block splitting, so we need to make use of eager block
> > splitting, _before_ we start to track anything, or else we may have
> > different-sized pages in the HDBSS buffer, which is harder to deal with.
> >
> > Suggestion: do the eager splitting before we enable HDBSS.
> >
> > For this to happen, we have to enable the EAGER_SPLIT_CHUNK_SIZE
> > capability, which can only be enabled when all memslots are empty.
> >
> > I suggest doing that at kvm_init_stage2_mmu(), and checking if HDBSS is
> > in which case we set mmu->split_page_chunk_size to PAGESIZE.
> >
> > I will send a patch you can put before this one to make sure it works :)
> >
> > Thanks!
> > Leo
>
> Hi Leo,
>
> Thanks for the helpful suggestion. I had previously traced the
> hugepage-splitting path during live migration and found that when
> migration starts, enabling dirty logging triggers the splitting path.
> I also tested HDBSS with traditional hugepages and haven't observed
> any issues yet.
>
> However, your concern is valid: there may be cases not covered,
> especially when the VMM uses transparent hugepages. I'll integrate
> your patch into the next version and run some tests.
>
>
> For reference, here's the path I traced:
>
> ```
>
> - userspace, e.g., QEMU
>
> kvm_log_start
> +-> kvm_section_update_flags
> +-> kvm_slot_update_flags
> |
> | // For each memory region, QEMU issues a KVM_SET_USER_MEMORY_REGION ioctl.
> | // Before issuing it, flags are updated to include KVM_MEM_LOG_DIRTY_PAGES.
> +-> kvm_mem_flags
> +-> kvm_set_user_memory_region // ioctl that enables dirty logging on the memslot
>
> - KVM
>
> KVM_SET_USER_MEMORY_REGION
> +-> kvm_vm_ioctl_set_memory_region
> +-> kvm_set_memory_region / __kvm_set_memory_region
> +-> kvm_set_memslot
> +-> kvm_commit_memory_region
> +-> kvm_arch_commit_memory_region
> +-> kvm_mmu_split_memory_region
> // Splits Stage-2 hugepages/contiguous mappings into 4KB PTEs.
Right, except in the case where we have dirty_log_manual_protect and
init_set, where it returns before splitting pages:
```
if (kvm_dirty_log_manual_protect_and_init_set(kvm))
return;
```
IIUC, that's desired to avoid holding the lock for a long time while it
cleans every page in the beginning, and instead do it on a per-dirty-page
basis. I guess it may benefit guests with very few dirty pages, as it
does not have to split/dirty everything at the start.
(It's a pain for my HACDBS routines, though)
> +-> kvm_mmu_split_huge_pages
Another important point here:
You can see that this function skips splitting if chunk_size == 0.
This value is set by a capability that configures EAGER_SPLIT, meaning
splitting happens before the guest takes write faults, which is nice
because it makes the write fault faster.
Two points about this capability:
- It's optional. If it's not set, only on-demand splitting (on fault) will
happen, and since HDBSS removes the write fault, we get no splitting.
- It can be set to any valid block size, not only 4K or PAGE_SIZE. It can
be set to PMD_SIZE, PUD_SIZE, and so on, depending on the PAGE_SIZE the
kernel was compiled with.
Those are just some points to keep in mind :)
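To make the two early exits above concrete, here is a small userspace sketch of when eager splitting would actually run, going only by what is described in this thread. The struct and the demo_* names are hypothetical; the real checks live in the KVM memslot commit and split paths, not in a helper like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the two settings discussed above. */
struct demo_split_cfg {
	/* models kvm_dirty_log_manual_protect_and_init_set() */
	bool manual_protect_and_init_set;
	/* 0 models "eager split capability was never configured" */
	uint64_t split_chunk_size;
};

/*
 * Returns true only when neither early exit applies:
 *  - manual protect + initially-set defers the work to later, and
 *  - a zero chunk size makes the split path skip the range.
 */
bool demo_will_eager_split(const struct demo_split_cfg *cfg)
{
	if (cfg->manual_protect_and_init_set)
		return false;

	if (cfg->split_chunk_size == 0)
		return false;

	return true;
}
```

With HDBSS active there is no write fault left to trigger on-demand splitting, so every configuration for which this returns false is one where block mappings may survive into dirty tracking.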
> +-> kvm_pgtable_stage2_split
>
> ```
>
> Thanks again for the detailed explanation and for sending the patch.
>
Thank you for the collaboration on this!
Leo
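As an illustration of the granularity concern in this thread, the sketch below decodes one HDBSS entry the way the quoted patch does (valid bit 0, IPA in bits 55:12, shifted by PAGE_SHIFT) and counts how many pages would go unmarked if that single entry actually stood for a larger block mapping. The 4K page size, the DEMO_* macros and the demo_* helpers are assumptions made for the example, as is the premise that one entry represents a whole block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field layout taken from the quoted patch; names are local stand-ins. */
#define DEMO_ENTRY_VALID (UINT64_C(1) << 0)
#define DEMO_ENTRY_IPA   (((UINT64_C(1) << 56) - 1) & ~((UINT64_C(1) << 12) - 1))

/* Assumption for this example: 4K pages. */
#define DEMO_PAGE_SHIFT  12

/* Decode one buffer entry into a guest frame number; false if not valid. */
bool demo_decode_entry(uint64_t entry, uint64_t *gfn)
{
	if (!(entry & DEMO_ENTRY_VALID))
		return false;

	*gfn = (entry & DEMO_ENTRY_IPA) >> DEMO_PAGE_SHIFT;
	return true;
}

/*
 * If a single entry stands for a mapping of mapping_size bytes but only
 * one gfn is marked dirty, this many pages of the mapping are missed.
 */
uint64_t demo_pages_missed(uint64_t mapping_size)
{
	return (mapping_size >> DEMO_PAGE_SHIFT) - 1;
}
```

For a page-sized mapping nothing is missed, which is why splitting everything down to pages before enabling tracking keeps the flush loop simple.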
> > > void kvm_vcpu_load_vhe(struct kvm_vcpu *vcpu)
> > > {
> > > host_data_ptr(host_ctxt)->__hyp_running_vcpu = vcpu;
> > > @@ -220,10 +237,12 @@ void kvm_vcpu_load_vhe(struct kvm_vcpu *vcpu)
> > > __vcpu_load_switch_sysregs(vcpu);
> > > __vcpu_load_activate_traps(vcpu);
> > > __load_stage2(vcpu->arch.hw_mmu, vcpu->arch.hw_mmu->arch);
> > > + __load_hdbss(vcpu);
> > > }
> > >
> > > void kvm_vcpu_put_vhe(struct kvm_vcpu *vcpu)
> > > {
> > > + kvm_flush_hdbss_buffer(vcpu);
> > > __vcpu_put_deactivate_traps(vcpu);
> > > __vcpu_put_switch_sysregs(vcpu);
> > >
> > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > index 070a01e53fcb..42b0710a16ce 100644
> > > --- a/arch/arm64/kvm/mmu.c
> > > +++ b/arch/arm64/kvm/mmu.c
> > > @@ -1896,6 +1896,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > > if (writable)
> > > prot |= KVM_PGTABLE_PROT_W;
> > >
> > > + if (writable && kvm->arch.enable_hdbss && logging_active)
> > > + prot |= KVM_PGTABLE_PROT_DBM;
> > > +
> > > if (exec_fault)
> > > prot |= KVM_PGTABLE_PROT_X;
> > >
> > > @@ -2033,6 +2036,70 @@ int kvm_handle_guest_sea(struct kvm_vcpu *vcpu)
> > > return 0;
> > > }
> > >
> > > +void kvm_flush_hdbss_buffer(struct kvm_vcpu *vcpu)
> > > +{
> > > + int idx, curr_idx;
> > > + u64 br_el2;
> > > + u64 *hdbss_buf;
> > > + struct kvm *kvm = vcpu->kvm;
> > > +
> > > + if (!kvm->arch.enable_hdbss)
> > > + return;
> > > +
> > > + curr_idx = HDBSSPROD_IDX(read_sysreg_s(SYS_HDBSSPROD_EL2));
> > > + br_el2 = HDBSSBR_EL2(vcpu->arch.hdbss.base_phys, vcpu->arch.hdbss.size);
> > > +
> > > + /* Do nothing if HDBSS buffer is empty or br_el2 is NULL */
> > > + if (curr_idx == 0 || br_el2 == 0)
> > > + return;
> > > +
> > > + hdbss_buf = page_address(phys_to_page(vcpu->arch.hdbss.base_phys));
> > > + if (!hdbss_buf)
> > > + return;
> > > +
> > > + guard(write_lock_irqsave)(&vcpu->kvm->mmu_lock);
> > > + for (idx = 0; idx < curr_idx; idx++) {
> > > + u64 gpa;
> > > +
> > > + gpa = hdbss_buf[idx];
> > > + if (!(gpa & HDBSS_ENTRY_VALID))
> > > + continue;
> > > +
> > > + gpa &= HDBSS_ENTRY_IPA;
> > > + kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT);
> > > + }
> > Here ^
>
> Thanks!
>
> Tian
>
>
> >
> > > +
> > > + /* reset HDBSS index */
> > > + write_sysreg_s(0, SYS_HDBSSPROD_EL2);
> > > + vcpu->arch.hdbss.next_index = 0;
> > > + isb();
> > > +}
> > > +
> > > +static int kvm_handle_hdbss_fault(struct kvm_vcpu *vcpu)
> > > +{
> > > + u64 prod;
> > > + u64 fsc;
> > > +
> > > + prod = read_sysreg_s(SYS_HDBSSPROD_EL2);
> > > + fsc = FIELD_GET(HDBSSPROD_EL2_FSC_MASK, prod);
> > > +
> > > + switch (fsc) {
> > > + case HDBSSPROD_EL2_FSC_OK:
> > > + /* Buffer full, which is reported as permission fault. */
> > > + kvm_flush_hdbss_buffer(vcpu);
> > > + return 1;
> > > + case HDBSSPROD_EL2_FSC_ExternalAbort:
> > > + case HDBSSPROD_EL2_FSC_GPF:
> > > + return -EFAULT;
> > > + default:
> > > + /* Unknown fault. */
> > > + WARN_ONCE(1,
> > > + "Unexpected HDBSS fault type, FSC: 0x%llx (prod=0x%llx, vcpu=%d)\n",
> > > + fsc, prod, vcpu->vcpu_id);
> > > + return -EFAULT;
> > > + }
> > > +}
> > > +
> > > /**
> > > * kvm_handle_guest_abort - handles all 2nd stage aborts
> > > * @vcpu: the VCPU pointer
> > > @@ -2071,6 +2138,9 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> > >
> > > is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
> > >
> > > + if (esr_iss2_is_hdbssf(esr))
> > > + return kvm_handle_hdbss_fault(vcpu);
> > > +
> > > if (esr_fsc_is_translation_fault(esr)) {
> > > /* Beyond sanitised PARange (which is the IPA limit) */
> > > if (fault_ipa >= BIT_ULL(get_kvm_ipa_limit())) {
> > > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> > > index 959532422d3a..c03a4b310b53 100644
> > > --- a/arch/arm64/kvm/reset.c
> > > +++ b/arch/arm64/kvm/reset.c
> > > @@ -161,6 +161,9 @@ void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu)
> > > free_page((unsigned long)vcpu->arch.ctxt.vncr_array);
> > > kfree(vcpu->arch.vncr_tlb);
> > > kfree(vcpu->arch.ccsidr);
> > > +
> > > + if (vcpu->kvm->arch.enable_hdbss)
> > > + kvm_arm_vcpu_free_hdbss(vcpu);
> > > }
> > >
> > > static void kvm_vcpu_reset_sve(struct kvm_vcpu *vcpu)
> > > --
> > > 2.33.0
> > >
* [PATCH v3] mailbox: remove superfluous internal header
From: Wolfram Sang @ 2026-03-27 15:10 UTC
To: linux-renesas-soc
Cc: Wolfram Sang, Sudeep Holla, Daniel Baluta, Peter Chen,
Fugang Duan, CIX Linux Kernel Upstream Group, Jassi Brar,
Frank Li, Sascha Hauer, Pengutronix Kernel Team, Fabio Estevam,
Thierry Reding, Jonathan Hunter, linux-kernel, linux-arm-kernel,
imx, linux-acpi, linux-tegra
Quite a few controller drivers already use the defines from the internal
header. Keeping them there prevents controller drivers from living outside
the mailbox directory. Move the defines to the public controller header to
allow this again, as the defines are not strictly internal anyhow.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Sudeep Holla <sudeep.holla@kernel.org>
Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com>
---
Changes since v2:
* rebased to 7.0-rc5
* add tag (Thanks, Daniel!)
drivers/mailbox/cix-mailbox.c | 2 --
drivers/mailbox/hi3660-mailbox.c | 2 --
drivers/mailbox/imx-mailbox.c | 2 --
drivers/mailbox/mailbox-sti.c | 2 --
drivers/mailbox/mailbox.c | 2 --
drivers/mailbox/mailbox.h | 12 ------------
drivers/mailbox/omap-mailbox.c | 2 --
drivers/mailbox/pcc.c | 2 --
drivers/mailbox/tegra-hsp.c | 2 --
include/linux/mailbox_controller.h | 5 +++++
10 files changed, 5 insertions(+), 28 deletions(-)
delete mode 100644 drivers/mailbox/mailbox.h
diff --git a/drivers/mailbox/cix-mailbox.c b/drivers/mailbox/cix-mailbox.c
index 443620e8ae37..864f98f21fc3 100644
--- a/drivers/mailbox/cix-mailbox.c
+++ b/drivers/mailbox/cix-mailbox.c
@@ -12,8 +12,6 @@
#include <linux/module.h>
#include <linux/platform_device.h>
-#include "mailbox.h"
-
/*
* The maximum transmission size is 32 words or 128 bytes.
*/
diff --git a/drivers/mailbox/hi3660-mailbox.c b/drivers/mailbox/hi3660-mailbox.c
index 17c29e960fbf..9b727a2b54a5 100644
--- a/drivers/mailbox/hi3660-mailbox.c
+++ b/drivers/mailbox/hi3660-mailbox.c
@@ -15,8 +15,6 @@
#include <linux/platform_device.h>
#include <linux/slab.h>
-#include "mailbox.h"
-
#define MBOX_CHAN_MAX 32
#define MBOX_RX 0x0
diff --git a/drivers/mailbox/imx-mailbox.c b/drivers/mailbox/imx-mailbox.c
index 003f9236c35e..22331b579489 100644
--- a/drivers/mailbox/imx-mailbox.c
+++ b/drivers/mailbox/imx-mailbox.c
@@ -23,8 +23,6 @@
#include <linux/slab.h>
#include <linux/workqueue.h>
-#include "mailbox.h"
-
#define IMX_MU_CHANS 24
/* TX0/RX0/RXDB[0-3] */
#define IMX_MU_SCU_CHANS 6
diff --git a/drivers/mailbox/mailbox-sti.c b/drivers/mailbox/mailbox-sti.c
index b4b5bdd503cf..b6c9ecbbc8ec 100644
--- a/drivers/mailbox/mailbox-sti.c
+++ b/drivers/mailbox/mailbox-sti.c
@@ -21,8 +21,6 @@
#include <linux/property.h>
#include <linux/slab.h>
-#include "mailbox.h"
-
#define STI_MBOX_INST_MAX 4 /* RAM saving: Max supported instances */
#define STI_MBOX_CHAN_MAX 20 /* RAM saving: Max supported channels */
diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
index e63b2292ee7a..9d41a1ab9018 100644
--- a/drivers/mailbox/mailbox.c
+++ b/drivers/mailbox/mailbox.c
@@ -18,8 +18,6 @@
#include <linux/property.h>
#include <linux/spinlock.h>
-#include "mailbox.h"
-
static LIST_HEAD(mbox_cons);
static DEFINE_MUTEX(con_mutex);
diff --git a/drivers/mailbox/mailbox.h b/drivers/mailbox/mailbox.h
deleted file mode 100644
index e1ec4efab693..000000000000
--- a/drivers/mailbox/mailbox.h
+++ /dev/null
@@ -1,12 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-
-#ifndef __MAILBOX_H
-#define __MAILBOX_H
-
-#include <linux/bits.h>
-
-#define TXDONE_BY_IRQ BIT(0) /* controller has remote RTR irq */
-#define TXDONE_BY_POLL BIT(1) /* controller can read status of last TX */
-#define TXDONE_BY_ACK BIT(2) /* S/W ACK received by Client ticks the TX */
-
-#endif /* __MAILBOX_H */
diff --git a/drivers/mailbox/omap-mailbox.c b/drivers/mailbox/omap-mailbox.c
index d9f100c18895..5772c6b9886a 100644
--- a/drivers/mailbox/omap-mailbox.c
+++ b/drivers/mailbox/omap-mailbox.c
@@ -22,8 +22,6 @@
#include <linux/pm_runtime.h>
#include <linux/mailbox_controller.h>
-#include "mailbox.h"
-
#define MAILBOX_REVISION 0x000
#define MAILBOX_MESSAGE(m) (0x040 + 4 * (m))
#define MAILBOX_FIFOSTATUS(m) (0x080 + 4 * (m))
diff --git a/drivers/mailbox/pcc.c b/drivers/mailbox/pcc.c
index 22e70af1ae5d..636879ae1db7 100644
--- a/drivers/mailbox/pcc.c
+++ b/drivers/mailbox/pcc.c
@@ -59,8 +59,6 @@
#include <linux/io-64-nonatomic-lo-hi.h>
#include <acpi/pcc.h>
-#include "mailbox.h"
-
#define MBOX_IRQ_NAME "pcc-mbox"
/**
diff --git a/drivers/mailbox/tegra-hsp.c b/drivers/mailbox/tegra-hsp.c
index ed9a0bb2bcd8..2231050bb5a9 100644
--- a/drivers/mailbox/tegra-hsp.c
+++ b/drivers/mailbox/tegra-hsp.c
@@ -16,8 +16,6 @@
#include <dt-bindings/mailbox/tegra186-hsp.h>
-#include "mailbox.h"
-
#define HSP_INT_IE(x) (0x100 + ((x) * 4))
#define HSP_INT_IV 0x300
#define HSP_INT_IR 0x304
diff --git a/include/linux/mailbox_controller.h b/include/linux/mailbox_controller.h
index 80a427c7ca29..16fef421c30c 100644
--- a/include/linux/mailbox_controller.h
+++ b/include/linux/mailbox_controller.h
@@ -3,6 +3,7 @@
#ifndef __MAILBOX_CONTROLLER_H
#define __MAILBOX_CONTROLLER_H
+#include <linux/bits.h>
#include <linux/completion.h>
#include <linux/device.h>
#include <linux/hrtimer.h>
@@ -11,6 +12,10 @@
struct mbox_chan;
+#define TXDONE_BY_IRQ BIT(0) /* controller has remote RTR irq */
+#define TXDONE_BY_POLL BIT(1) /* controller can read status of last TX */
+#define TXDONE_BY_ACK BIT(2) /* S/W ACK received by Client ticks the TX */
+
/**
* struct mbox_chan_ops - methods to control mailbox channels
* @send_data: The API asks the MBOX controller driver, in atomic
--
2.51.0
* [PATCH v2] mailbox: exynos: drop superfluous mbox setting per channel
From: Wolfram Sang @ 2026-03-27 15:12 UTC (permalink / raw)
To: linux-renesas-soc
Cc: Wolfram Sang, Tudor Ambarus, Jassi Brar, Krzysztof Kozlowski,
Alim Akhtar, linux-kernel, linux-samsung-soc, linux-arm-kernel
The core initializes the 'mbox' field exactly like this, so don't
duplicate it in the driver.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Tested-by: Tudor Ambarus <tudor.ambarus@linaro.org>
---
Changes since v1:
* rebased to 7.0-rc5
* add tags (Thanks, Tudor!) and dropped RFT
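For readers who want to see why the driver loop is redundant, here is a minimal user-space model. It assumes, as the commit message states, that the core's registration path (mbox_controller_register() in drivers/mailbox/mailbox.c) sets the back-pointer on every channel. The model_* types and function are hypothetical stand-ins, not the kernel's structures, and the -1 return is a simplification of the kernel's error codes.

```c
#include <assert.h>
#include <stddef.h>

struct model_controller;

/* Hypothetical stand-in for struct mbox_chan: only the back-pointer. */
struct model_chan {
	struct model_controller *mbox;
};

/* Hypothetical stand-in for struct mbox_controller. */
struct model_controller {
	struct model_chan *chans;
	int num_chans;
};

/*
 * Simplified model of the per-channel setup done at registration time:
 * every channel gets a pointer back to its controller, so a driver does
 * not need to assign chans[i].mbox itself before registering.
 */
static int model_controller_register(struct model_controller *mbox)
{
	int i;

	if (!mbox || !mbox->chans || mbox->num_chans <= 0)
		return -1;

	for (i = 0; i < mbox->num_chans; i++)
		mbox->chans[i].mbox = mbox;

	return 0;
}
```

In this model a probe function only has to fill in chans and num_chans; the back-pointers are valid once registration returns, which is the property the patch relies on.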
drivers/mailbox/exynos-mailbox.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/mailbox/exynos-mailbox.c b/drivers/mailbox/exynos-mailbox.c
index 5f2d3b81c1db..d2355b128ba4 100644
--- a/drivers/mailbox/exynos-mailbox.c
+++ b/drivers/mailbox/exynos-mailbox.c
@@ -99,7 +99,6 @@ static int exynos_mbox_probe(struct platform_device *pdev)
struct mbox_controller *mbox;
struct mbox_chan *chans;
struct clk *pclk;
- int i;
exynos_mbox = devm_kzalloc(dev, sizeof(*exynos_mbox), GFP_KERNEL);
if (!exynos_mbox)
@@ -129,9 +128,6 @@ static int exynos_mbox_probe(struct platform_device *pdev)
mbox->ops = &exynos_mbox_chan_ops;
mbox->of_xlate = exynos_mbox_of_xlate;
- for (i = 0; i < EXYNOS_MBOX_CHAN_COUNT; i++)
- chans[i].mbox = mbox;
-
exynos_mbox->mbox = mbox;
platform_set_drvdata(pdev, exynos_mbox);
--
2.51.0
* [PATCH 0/4] media: rkvdec: Switch to using a bitwriter
From: Detlev Casanova @ 2026-03-27 15:15 UTC (permalink / raw)
To: Ezequiel Garcia, Mauro Carvalho Chehab, Heiko Stuebner,
Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt,
Jonas Karlman, Nicolas Dufresne
Cc: linux-kernel, linux-media, linux-rockchip, linux-arm-kernel, llvm,
kernel, Detlev Casanova
Using bitfields in large structures where fields are mostly unaligned can
be hard on the compiler.
Issues have been reported with clang ([1], [2]) and, even though those
issues are addressed by clang devs, some setups can't or won't update clang
just to compile a driver.
Even when fixed, the compiler still might have to allocate a bigger stack
frame to manage misalignment. Coupled with other features like KASAN, the
stack becomes larger than the kernel's maximum [3].
To avoid this, let's drop the bitfield implementation and switch to a
bitwriter. There is already one for the older variants, so make it global
and use it in other variants.
Note that only buffer structures are switched to the bitwriter. The
register representation structures are kept with bitfields, as they are
properly aligned every 32 bits and don't require heavy stack overhead.
Also note that the VDPU381 SPS and PPS structs are kept with bitfields,
for the same reason that they are small and aligned enough not to require
heavy stack overhead.
[1]: https://lore.kernel.org/oe-kbuild-all/202601211924.rqKS2Ihm-lkp@intel.com/
[2]: https://github.com/llvm/llvm-project/issues/178535
[3]: https://yhbt.net/lore/llvm/20260121230406.GA2625738@ax162/T/#mad878ec24a8224e1387ef5e73cb77b9ada55e3f2
Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
---
Detlev Casanova (4):
media: rkvdec: Introduce a global bitwriter helper
media: rkvdec: Use the global bitwriter instead of local one
media: rkvdec: common: Drop bitfields for the bitwriter
media: rkvdec: vdpu383: Drop bitfields for the bitwriter
drivers/media/platform/rockchip/rkvdec/Makefile | 1 +
.../platform/rockchip/rkvdec/rkvdec-bitwriter.c | 30 ++
.../platform/rockchip/rkvdec/rkvdec-bitwriter.h | 25 +
.../platform/rockchip/rkvdec/rkvdec-h264-common.c | 51 +--
.../platform/rockchip/rkvdec/rkvdec-h264-common.h | 40 +-
.../media/platform/rockchip/rkvdec/rkvdec-h264.c | 109 ++---
.../platform/rockchip/rkvdec/rkvdec-hevc-common.c | 92 +---
.../platform/rockchip/rkvdec/rkvdec-hevc-common.h | 57 +--
.../media/platform/rockchip/rkvdec/rkvdec-hevc.c | 171 +++----
.../platform/rockchip/rkvdec/rkvdec-vdpu383-h264.c | 351 ++++++--------
.../platform/rockchip/rkvdec/rkvdec-vdpu383-hevc.c | 502 +++++++++------------
11 files changed, 578 insertions(+), 851 deletions(-)
---
base-commit: bbeb83d3182abe0d245318e274e8531e5dd7a948
change-id: 20260327-rkvdec-use-bitwriter-f1d149b3cf7c
Best regards,
--
Detlev Casanova <detlev.casanova@collabora.com>
* [PATCH 1/4] media: rkvdec: Introduce a global bitwriter helper
From: Detlev Casanova @ 2026-03-27 15:16 UTC (permalink / raw)
To: Ezequiel Garcia, Mauro Carvalho Chehab, Heiko Stuebner,
Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt,
Jonas Karlman, Nicolas Dufresne
Cc: linux-kernel, linux-media, linux-rockchip, linux-arm-kernel, llvm,
kernel, Detlev Casanova
In-Reply-To: <20260327-rkvdec-use-bitwriter-v1-0-982cf872b590@collabora.com>
The use of structures with bitfields is good when the values are
somewhat aligned.
More misalignment means that compilers need to do more gymnastics
to edit the field values.
Some cases have been reported with Clang on specific architectures
like armhf and hexagon, where the compiler would allocate a bigger
local stack than needed or even completely freeze during compilation.
Some fixes have been provided to ease the issues, but the real fix
here is to use a bitwriter instead of heavily unaligned bitfields.
This is a preparation commit to provide a global bitwriter interface
for the whole driver.
Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
---
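Not part of the commit: a user-space sketch of the helper, to show what happens when a field straddles two 32-bit words. It mirrors the logic of rkvdec_set_bw_field() from this patch, but the stdint types, the genmask_ull() stand-in for GENMASK_ULL() and the straddle_demo() helper are assumptions made so that it builds outside the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-in for struct rkvdec_bw_field. */
struct bw_field {
	uint16_t offset;	/* bit offset from the start of the buffer */
	uint8_t len;		/* field width in bits, 1..32 */
};

#define BW_FIELD(_offset, _len) ((struct bw_field){ _offset, _len })

/* Stand-in for GENMASK_ULL(h, l): bits l..h set, for 0 <= l <= h <= 63. */
static uint64_t genmask_ull(unsigned int h, unsigned int l)
{
	return (~0ULL << l) & (~0ULL >> (63 - h));
}

/* Same steps as the kernel helper: build a 64-bit mask, then split it. */
static void set_bw_field(uint32_t *buf, struct bw_field field, uint32_t value)
{
	unsigned int bit = field.offset % 32;
	unsigned int word = field.offset / 32;
	uint64_t mask = genmask_ull(bit + field.len - 1, bit);
	uint64_t val = ((uint64_t)value << bit) & mask;

	/* Low half of the field goes into the first word. */
	buf[word] &= ~(uint32_t)mask;
	buf[word] |= (uint32_t)val;

	/* Only touch the next word when the field crosses the boundary. */
	if (bit + field.len > 32) {
		buf[word + 1] &= ~(uint32_t)(mask >> 32);
		buf[word + 1] |= (uint32_t)(val >> 32);
	}
}

/* Write 3 into a 2-bit field at bit 31, like PIC_ORDER_CNT_TYPE's slot. */
static void straddle_demo(uint32_t buf[2])
{
	buf[0] = 0;
	buf[1] = 0;
	set_bw_field(buf, BW_FIELD(31, 2), 3);
}
```

With offset 31 and length 2, the low bit of the value lands in bit 31 of the first word and the high bit in bit 0 of the second word, which is why the mask and value are computed as 64-bit quantities before being split.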
drivers/media/platform/rockchip/rkvdec/Makefile | 1 +
.../platform/rockchip/rkvdec/rkvdec-bitwriter.c | 30 ++++++++++++++++++++++
.../platform/rockchip/rkvdec/rkvdec-bitwriter.h | 25 ++++++++++++++++++
3 files changed, 56 insertions(+)
diff --git a/drivers/media/platform/rockchip/rkvdec/Makefile b/drivers/media/platform/rockchip/rkvdec/Makefile
index e629d571e4d8..11e2122bcbbf 100644
--- a/drivers/media/platform/rockchip/rkvdec/Makefile
+++ b/drivers/media/platform/rockchip/rkvdec/Makefile
@@ -2,6 +2,7 @@ obj-$(CONFIG_VIDEO_ROCKCHIP_VDEC) += rockchip-vdec.o
rockchip-vdec-y += \
rkvdec.o \
+ rkvdec-bitwriter.o \
rkvdec-cabac.o \
rkvdec-h264.o \
rkvdec-h264-common.o \
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-bitwriter.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-bitwriter.c
new file mode 100644
index 000000000000..673ebb89002b
--- /dev/null
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-bitwriter.c
@@ -0,0 +1,30 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Rockchip Video Decoder bit writer
+ *
+ * Copyright (C) 2026 Collabora, Ltd.
+ * Detlev Casanova <detlev.casanova@collabora.com>
+ * Copyright (C) 2019 Collabora, Ltd.
+ * Boris Brezillon <boris.brezillon@collabora.com>
+ */
+
+#include <linux/types.h>
+#include <linux/bits.h>
+
+#include "rkvdec-bitwriter.h"
+
+void rkvdec_set_bw_field(u32 *buf, struct rkvdec_bw_field field, u32 value)
+{
+ u8 bit = field.offset % 32;
+ u16 word = field.offset / 32;
+ u64 mask = GENMASK_ULL(bit + field.len - 1, bit);
+ u64 val = ((u64)value << bit) & mask;
+
+ buf[word] &= ~mask;
+ buf[word] |= val;
+ if (bit + field.len > 32) {
+ buf[word + 1] &= ~(mask >> 32);
+ buf[word + 1] |= val >> 32;
+ }
+}
+
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-bitwriter.h b/drivers/media/platform/rockchip/rkvdec/rkvdec-bitwriter.h
new file mode 100644
index 000000000000..44154f1ebc65
--- /dev/null
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-bitwriter.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Rockchip Video Decoder bit writer
+ *
+ * Copyright (C) 2026 Collabora, Ltd.
+ * Detlev Casanova <detlev.casanova@collabora.com>
+ * Copyright (C) 2019 Collabora, Ltd.
+ * Boris Brezillon <boris.brezillon@collabora.com>
+ */
+
+#ifndef RKVDEC_BIT_WRITER_H_
+#define RKVDEC_BIT_WRITER_H_
+
+#include <linux/types.h>
+
+struct rkvdec_bw_field {
+ u16 offset;
+ u8 len;
+};
+
+#define BW_FIELD(_offset, _len) ((struct rkvdec_bw_field){ _offset, _len })
+
+void rkvdec_set_bw_field(u32 *buf, struct rkvdec_bw_field field, u32 value);
+
+#endif /* RKVDEC_BIT_WRITER_H_ */
--
2.53.0
* [PATCH 2/4] media: rkvdec: Use the global bitwriter instead of local one
From: Detlev Casanova @ 2026-03-27 15:16 UTC (permalink / raw)
To: Ezequiel Garcia, Mauro Carvalho Chehab, Heiko Stuebner,
Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt,
Jonas Karlman, Nicolas Dufresne
Cc: linux-kernel, linux-media, linux-rockchip, linux-arm-kernel, llvm,
kernel, Detlev Casanova
In-Reply-To: <20260327-rkvdec-use-bitwriter-v1-0-982cf872b590@collabora.com>
Both rkvdec-h264.c and rkvdec-hevc.c use their own bitwriter
function and macros.
Move to using the global one introduced before.
Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
---
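Not part of the commit: a small user-space sketch of the descriptor-plus-wrapper pattern that both decoders keep after this change. Three field descriptors are copied from the H.264 list below; the stdint types, the hypothetical get_bw_field() read-back helper, fill_demo_packet() and the values written are illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct bw_field {
	uint16_t offset;	/* bit offset into the packet */
	uint8_t len;		/* width in bits, 1..32 */
};

#define BW_FIELD(_offset, _len) ((struct bw_field){ _offset, _len })

/* Three of the H.264 descriptors from the patch. */
#define SEQ_PARAMETER_SET_ID	BW_FIELD(0, 4)
#define PROFILE_IDC		BW_FIELD(4, 8)
#define PIC_ORDER_CNT_TYPE	BW_FIELD(31, 2)

struct sps_pps_packet {
	uint32_t info[8];
};

static void set_bw_field(uint32_t *buf, struct bw_field f, uint32_t value)
{
	unsigned int bit = f.offset % 32, word = f.offset / 32;
	uint64_t mask = ((1ULL << f.len) - 1) << bit;
	uint64_t val = ((uint64_t)value << bit) & mask;

	buf[word] = (buf[word] & ~(uint32_t)mask) | (uint32_t)val;
	if (bit + f.len > 32)
		buf[word + 1] = (buf[word + 1] & ~(uint32_t)(mask >> 32)) |
				(uint32_t)(val >> 32);
}

/* Hypothetical read-back helper, used here only to check the writes. */
static uint32_t get_bw_field(const uint32_t *buf, struct bw_field f)
{
	unsigned int bit = f.offset % 32, word = f.offset / 32;
	uint64_t raw = buf[word];

	if (bit + f.len > 32)
		raw |= (uint64_t)buf[word + 1] << 32;
	return (uint32_t)((raw >> bit) & ((1ULL << f.len) - 1));
}

/* Same shape as assemble_hw_pps(): zero the packet, then write fields. */
static void fill_demo_packet(struct sps_pps_packet *hw_ps)
{
	memset(hw_ps, 0, sizeof(*hw_ps));

#define WRITE_PPS(value, field) set_bw_field(hw_ps->info, field, value)
	WRITE_PPS(1, SEQ_PARAMETER_SET_ID);	/* made-up values */
	WRITE_PPS(100, PROFILE_IDC);
	WRITE_PPS(2, PIC_ORDER_CNT_TYPE);
#undef WRITE_PPS
}
```

The call sites do not change in this patch; only the descriptor type and the function behind WRITE_PPS()/WRITE_RPS() do, so the hardware layout stays exactly as the offsets describe it.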
.../media/platform/rockchip/rkvdec/rkvdec-h264.c | 109 ++++++-------
.../media/platform/rockchip/rkvdec/rkvdec-hevc.c | 171 +++++++++------------
2 files changed, 119 insertions(+), 161 deletions(-)
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-h264.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-h264.c
index d3202cecb988..ffa606038192 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-h264.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-h264.c
@@ -16,6 +16,7 @@
#include "rkvdec-regs.h"
#include "rkvdec-cabac.h"
#include "rkvdec-h264-common.h"
+#include "rkvdec-bitwriter.h"
/* Size with u32 units. */
#define RKV_CABAC_INIT_BUFFER_SIZE (3680 + 128)
@@ -25,56 +26,48 @@ struct rkvdec_sps_pps_packet {
u32 info[8];
};
-struct rkvdec_ps_field {
- u16 offset;
- u8 len;
-};
-
-#define PS_FIELD(_offset, _len) \
- ((struct rkvdec_ps_field){ _offset, _len })
-
-#define SEQ_PARAMETER_SET_ID PS_FIELD(0, 4)
-#define PROFILE_IDC PS_FIELD(4, 8)
-#define CONSTRAINT_SET3_FLAG PS_FIELD(12, 1)
-#define CHROMA_FORMAT_IDC PS_FIELD(13, 2)
-#define BIT_DEPTH_LUMA PS_FIELD(15, 3)
-#define BIT_DEPTH_CHROMA PS_FIELD(18, 3)
-#define QPPRIME_Y_ZERO_TRANSFORM_BYPASS_FLAG PS_FIELD(21, 1)
-#define LOG2_MAX_FRAME_NUM_MINUS4 PS_FIELD(22, 4)
-#define MAX_NUM_REF_FRAMES PS_FIELD(26, 5)
-#define PIC_ORDER_CNT_TYPE PS_FIELD(31, 2)
-#define LOG2_MAX_PIC_ORDER_CNT_LSB_MINUS4 PS_FIELD(33, 4)
-#define DELTA_PIC_ORDER_ALWAYS_ZERO_FLAG PS_FIELD(37, 1)
-#define PIC_WIDTH_IN_MBS PS_FIELD(38, 9)
-#define PIC_HEIGHT_IN_MBS PS_FIELD(47, 9)
-#define FRAME_MBS_ONLY_FLAG PS_FIELD(56, 1)
-#define MB_ADAPTIVE_FRAME_FIELD_FLAG PS_FIELD(57, 1)
-#define DIRECT_8X8_INFERENCE_FLAG PS_FIELD(58, 1)
-#define MVC_EXTENSION_ENABLE PS_FIELD(59, 1)
-#define NUM_VIEWS PS_FIELD(60, 2)
-#define VIEW_ID(i) PS_FIELD(62 + ((i) * 10), 10)
-#define NUM_ANCHOR_REFS_L(i) PS_FIELD(82 + ((i) * 11), 1)
-#define ANCHOR_REF_L(i) PS_FIELD(83 + ((i) * 11), 10)
-#define NUM_NON_ANCHOR_REFS_L(i) PS_FIELD(104 + ((i) * 11), 1)
-#define NON_ANCHOR_REFS_L(i) PS_FIELD(105 + ((i) * 11), 10)
-#define PIC_PARAMETER_SET_ID PS_FIELD(128, 8)
-#define PPS_SEQ_PARAMETER_SET_ID PS_FIELD(136, 5)
-#define ENTROPY_CODING_MODE_FLAG PS_FIELD(141, 1)
-#define BOTTOM_FIELD_PIC_ORDER_IN_FRAME_PRESENT_FLAG PS_FIELD(142, 1)
-#define NUM_REF_IDX_L_DEFAULT_ACTIVE_MINUS1(i) PS_FIELD(143 + ((i) * 5), 5)
-#define WEIGHTED_PRED_FLAG PS_FIELD(153, 1)
-#define WEIGHTED_BIPRED_IDC PS_FIELD(154, 2)
-#define PIC_INIT_QP_MINUS26 PS_FIELD(156, 7)
-#define PIC_INIT_QS_MINUS26 PS_FIELD(163, 6)
-#define CHROMA_QP_INDEX_OFFSET PS_FIELD(169, 5)
-#define DEBLOCKING_FILTER_CONTROL_PRESENT_FLAG PS_FIELD(174, 1)
-#define CONSTRAINED_INTRA_PRED_FLAG PS_FIELD(175, 1)
-#define REDUNDANT_PIC_CNT_PRESENT PS_FIELD(176, 1)
-#define TRANSFORM_8X8_MODE_FLAG PS_FIELD(177, 1)
-#define SECOND_CHROMA_QP_INDEX_OFFSET PS_FIELD(178, 5)
-#define SCALING_LIST_ENABLE_FLAG PS_FIELD(183, 1)
-#define SCALING_LIST_ADDRESS PS_FIELD(184, 32)
-#define IS_LONG_TERM(i) PS_FIELD(216 + (i), 1)
+#define SEQ_PARAMETER_SET_ID BW_FIELD(0, 4)
+#define PROFILE_IDC BW_FIELD(4, 8)
+#define CONSTRAINT_SET3_FLAG BW_FIELD(12, 1)
+#define CHROMA_FORMAT_IDC BW_FIELD(13, 2)
+#define BIT_DEPTH_LUMA BW_FIELD(15, 3)
+#define BIT_DEPTH_CHROMA BW_FIELD(18, 3)
+#define QPPRIME_Y_ZERO_TRANSFORM_BYPASS_FLAG BW_FIELD(21, 1)
+#define LOG2_MAX_FRAME_NUM_MINUS4 BW_FIELD(22, 4)
+#define MAX_NUM_REF_FRAMES BW_FIELD(26, 5)
+#define PIC_ORDER_CNT_TYPE BW_FIELD(31, 2)
+#define LOG2_MAX_PIC_ORDER_CNT_LSB_MINUS4 BW_FIELD(33, 4)
+#define DELTA_PIC_ORDER_ALWAYS_ZERO_FLAG BW_FIELD(37, 1)
+#define PIC_WIDTH_IN_MBS BW_FIELD(38, 9)
+#define PIC_HEIGHT_IN_MBS BW_FIELD(47, 9)
+#define FRAME_MBS_ONLY_FLAG BW_FIELD(56, 1)
+#define MB_ADAPTIVE_FRAME_FIELD_FLAG BW_FIELD(57, 1)
+#define DIRECT_8X8_INFERENCE_FLAG BW_FIELD(58, 1)
+#define MVC_EXTENSION_ENABLE BW_FIELD(59, 1)
+#define NUM_VIEWS BW_FIELD(60, 2)
+#define VIEW_ID(i) BW_FIELD(62 + ((i) * 10), 10)
+#define NUM_ANCHOR_REFS_L(i) BW_FIELD(82 + ((i) * 11), 1)
+#define ANCHOR_REF_L(i) BW_FIELD(83 + ((i) * 11), 10)
+#define NUM_NON_ANCHOR_REFS_L(i) BW_FIELD(104 + ((i) * 11), 1)
+#define NON_ANCHOR_REFS_L(i) BW_FIELD(105 + ((i) * 11), 10)
+#define PIC_PARAMETER_SET_ID BW_FIELD(128, 8)
+#define PPS_SEQ_PARAMETER_SET_ID BW_FIELD(136, 5)
+#define ENTROPY_CODING_MODE_FLAG BW_FIELD(141, 1)
+#define BOTTOM_FIELD_PIC_ORDER_IN_FRAME_PRESENT_FLAG BW_FIELD(142, 1)
+#define NUM_REF_IDX_L_DEFAULT_ACTIVE_MINUS1(i) BW_FIELD(143 + ((i) * 5), 5)
+#define WEIGHTED_PRED_FLAG BW_FIELD(153, 1)
+#define WEIGHTED_BIPRED_IDC BW_FIELD(154, 2)
+#define PIC_INIT_QP_MINUS26 BW_FIELD(156, 7)
+#define PIC_INIT_QS_MINUS26 BW_FIELD(163, 6)
+#define CHROMA_QP_INDEX_OFFSET BW_FIELD(169, 5)
+#define DEBLOCKING_FILTER_CONTROL_PRESENT_FLAG BW_FIELD(174, 1)
+#define CONSTRAINED_INTRA_PRED_FLAG BW_FIELD(175, 1)
+#define REDUNDANT_PIC_CNT_PRESENT BW_FIELD(176, 1)
+#define TRANSFORM_8X8_MODE_FLAG BW_FIELD(177, 1)
+#define SECOND_CHROMA_QP_INDEX_OFFSET BW_FIELD(178, 5)
+#define SCALING_LIST_ENABLE_FLAG BW_FIELD(183, 1)
+#define SCALING_LIST_ADDRESS BW_FIELD(184, 32)
+#define IS_LONG_TERM(i) BW_FIELD(216 + (i), 1)
/* Data structure describing auxiliary buffer format. */
struct rkvdec_h264_priv_tbl {
@@ -91,20 +84,6 @@ struct rkvdec_h264_ctx {
struct rkvdec_regs regs;
};
-static void set_ps_field(u32 *buf, struct rkvdec_ps_field field, u32 value)
-{
- u8 bit = field.offset % 32, word = field.offset / 32;
- u64 mask = GENMASK_ULL(bit + field.len - 1, bit);
- u64 val = ((u64)value << bit) & mask;
-
- buf[word] &= ~mask;
- buf[word] |= val;
- if (bit + field.len > 32) {
- buf[word + 1] &= ~(mask >> 32);
- buf[word + 1] |= val >> 32;
- }
-}
-
static void assemble_hw_pps(struct rkvdec_ctx *ctx,
struct rkvdec_h264_run *run)
{
@@ -128,7 +107,7 @@ static void assemble_hw_pps(struct rkvdec_ctx *ctx,
hw_ps = &priv_tbl->param_set[pps->pic_parameter_set_id];
memset(hw_ps, 0, sizeof(*hw_ps));
-#define WRITE_PPS(value, field) set_ps_field(hw_ps->info, field, value)
+#define WRITE_PPS(value, field) rkvdec_set_bw_field(hw_ps->info, field, value)
/* write sps */
WRITE_PPS(sps->seq_parameter_set_id, SEQ_PARAMETER_SET_ID);
WRITE_PPS(sps->profile_idc, PROFILE_IDC);
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-hevc.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-hevc.c
index ac8b825d080a..6d367bfcdd13 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-hevc.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-hevc.c
@@ -18,6 +18,7 @@
#include "rkvdec-regs.h"
#include "rkvdec-cabac.h"
#include "rkvdec-hevc-common.h"
+#include "rkvdec-bitwriter.h"
/* Size in u8/u32 units. */
#define RKV_SCALING_LIST_SIZE 1360
@@ -34,80 +35,72 @@ struct rkvdec_rps_packet {
u32 info[RKV_RPS_SIZE];
};
-struct rkvdec_ps_field {
- u16 offset;
- u8 len;
-};
-
-#define PS_FIELD(_offset, _len) \
- ((struct rkvdec_ps_field){ _offset, _len })
-
/* SPS */
-#define VIDEO_PARAMETER_SET_ID PS_FIELD(0, 4)
-#define SEQ_PARAMETER_SET_ID PS_FIELD(4, 4)
-#define CHROMA_FORMAT_IDC PS_FIELD(8, 2)
-#define PIC_WIDTH_IN_LUMA_SAMPLES PS_FIELD(10, 13)
-#define PIC_HEIGHT_IN_LUMA_SAMPLES PS_FIELD(23, 13)
-#define BIT_DEPTH_LUMA PS_FIELD(36, 4)
-#define BIT_DEPTH_CHROMA PS_FIELD(40, 4)
-#define LOG2_MAX_PIC_ORDER_CNT_LSB PS_FIELD(44, 5)
-#define LOG2_DIFF_MAX_MIN_LUMA_CODING_BLOCK_SIZE PS_FIELD(49, 2)
-#define LOG2_MIN_LUMA_CODING_BLOCK_SIZE PS_FIELD(51, 3)
-#define LOG2_MIN_TRANSFORM_BLOCK_SIZE PS_FIELD(54, 3)
-#define LOG2_DIFF_MAX_MIN_LUMA_TRANSFORM_BLOCK_SIZE PS_FIELD(57, 2)
-#define MAX_TRANSFORM_HIERARCHY_DEPTH_INTER PS_FIELD(59, 3)
-#define MAX_TRANSFORM_HIERARCHY_DEPTH_INTRA PS_FIELD(62, 3)
-#define SCALING_LIST_ENABLED_FLAG PS_FIELD(65, 1)
-#define AMP_ENABLED_FLAG PS_FIELD(66, 1)
-#define SAMPLE_ADAPTIVE_OFFSET_ENABLED_FLAG PS_FIELD(67, 1)
-#define PCM_ENABLED_FLAG PS_FIELD(68, 1)
-#define PCM_SAMPLE_BIT_DEPTH_LUMA PS_FIELD(69, 4)
-#define PCM_SAMPLE_BIT_DEPTH_CHROMA PS_FIELD(73, 4)
-#define PCM_LOOP_FILTER_DISABLED_FLAG PS_FIELD(77, 1)
-#define LOG2_DIFF_MAX_MIN_PCM_LUMA_CODING_BLOCK_SIZE PS_FIELD(78, 3)
-#define LOG2_MIN_PCM_LUMA_CODING_BLOCK_SIZE PS_FIELD(81, 3)
-#define NUM_SHORT_TERM_REF_PIC_SETS PS_FIELD(84, 7)
-#define LONG_TERM_REF_PICS_PRESENT_FLAG PS_FIELD(91, 1)
-#define NUM_LONG_TERM_REF_PICS_SPS PS_FIELD(92, 6)
-#define SPS_TEMPORAL_MVP_ENABLED_FLAG PS_FIELD(98, 1)
-#define STRONG_INTRA_SMOOTHING_ENABLED_FLAG PS_FIELD(99, 1)
+#define VIDEO_PARAMETER_SET_ID BW_FIELD(0, 4)
+#define SEQ_PARAMETER_SET_ID BW_FIELD(4, 4)
+#define CHROMA_FORMAT_IDC BW_FIELD(8, 2)
+#define PIC_WIDTH_IN_LUMA_SAMPLES BW_FIELD(10, 13)
+#define PIC_HEIGHT_IN_LUMA_SAMPLES BW_FIELD(23, 13)
+#define BIT_DEPTH_LUMA BW_FIELD(36, 4)
+#define BIT_DEPTH_CHROMA BW_FIELD(40, 4)
+#define LOG2_MAX_PIC_ORDER_CNT_LSB BW_FIELD(44, 5)
+#define LOG2_DIFF_MAX_MIN_LUMA_CODING_BLOCK_SIZE BW_FIELD(49, 2)
+#define LOG2_MIN_LUMA_CODING_BLOCK_SIZE BW_FIELD(51, 3)
+#define LOG2_MIN_TRANSFORM_BLOCK_SIZE BW_FIELD(54, 3)
+#define LOG2_DIFF_MAX_MIN_LUMA_TRANSFORM_BLOCK_SIZE BW_FIELD(57, 2)
+#define MAX_TRANSFORM_HIERARCHY_DEPTH_INTER BW_FIELD(59, 3)
+#define MAX_TRANSFORM_HIERARCHY_DEPTH_INTRA BW_FIELD(62, 3)
+#define SCALING_LIST_ENABLED_FLAG BW_FIELD(65, 1)
+#define AMP_ENABLED_FLAG BW_FIELD(66, 1)
+#define SAMPLE_ADAPTIVE_OFFSET_ENABLED_FLAG BW_FIELD(67, 1)
+#define PCM_ENABLED_FLAG BW_FIELD(68, 1)
+#define PCM_SAMPLE_BIT_DEPTH_LUMA BW_FIELD(69, 4)
+#define PCM_SAMPLE_BIT_DEPTH_CHROMA BW_FIELD(73, 4)
+#define PCM_LOOP_FILTER_DISABLED_FLAG BW_FIELD(77, 1)
+#define LOG2_DIFF_MAX_MIN_PCM_LUMA_CODING_BLOCK_SIZE BW_FIELD(78, 3)
+#define LOG2_MIN_PCM_LUMA_CODING_BLOCK_SIZE BW_FIELD(81, 3)
+#define NUM_SHORT_TERM_REF_PIC_SETS BW_FIELD(84, 7)
+#define LONG_TERM_REF_PICS_PRESENT_FLAG BW_FIELD(91, 1)
+#define NUM_LONG_TERM_REF_PICS_SPS BW_FIELD(92, 6)
+#define SPS_TEMPORAL_MVP_ENABLED_FLAG BW_FIELD(98, 1)
+#define STRONG_INTRA_SMOOTHING_ENABLED_FLAG BW_FIELD(99, 1)
/* PPS */
-#define PIC_PARAMETER_SET_ID PS_FIELD(128, 6)
-#define PPS_SEQ_PARAMETER_SET_ID PS_FIELD(134, 4)
-#define DEPENDENT_SLICE_SEGMENTS_ENABLED_FLAG PS_FIELD(138, 1)
-#define OUTPUT_FLAG_PRESENT_FLAG PS_FIELD(139, 1)
-#define NUM_EXTRA_SLICE_HEADER_BITS PS_FIELD(140, 13)
-#define SIGN_DATA_HIDING_ENABLED_FLAG PS_FIELD(153, 1)
-#define CABAC_INIT_PRESENT_FLAG PS_FIELD(154, 1)
-#define NUM_REF_IDX_L0_DEFAULT_ACTIVE PS_FIELD(155, 4)
-#define NUM_REF_IDX_L1_DEFAULT_ACTIVE PS_FIELD(159, 4)
-#define INIT_QP_MINUS26 PS_FIELD(163, 7)
-#define CONSTRAINED_INTRA_PRED_FLAG PS_FIELD(170, 1)
-#define TRANSFORM_SKIP_ENABLED_FLAG PS_FIELD(171, 1)
-#define CU_QP_DELTA_ENABLED_FLAG PS_FIELD(172, 1)
-#define LOG2_MIN_CU_QP_DELTA_SIZE PS_FIELD(173, 3)
-#define PPS_CB_QP_OFFSET PS_FIELD(176, 5)
-#define PPS_CR_QP_OFFSET PS_FIELD(181, 5)
-#define PPS_SLICE_CHROMA_QP_OFFSETS_PRESENT_FLAG PS_FIELD(186, 1)
-#define WEIGHTED_PRED_FLAG PS_FIELD(187, 1)
-#define WEIGHTED_BIPRED_FLAG PS_FIELD(188, 1)
-#define TRANSQUANT_BYPASS_ENABLED_FLAG PS_FIELD(189, 1)
-#define TILES_ENABLED_FLAG PS_FIELD(190, 1)
-#define ENTROPY_CODING_SYNC_ENABLED_FLAG PS_FIELD(191, 1)
-#define PPS_LOOP_FILTER_ACROSS_SLICES_ENABLED_FLAG PS_FIELD(192, 1)
-#define LOOP_FILTER_ACROSS_TILES_ENABLED_FLAG PS_FIELD(193, 1)
-#define DEBLOCKING_FILTER_OVERRIDE_ENABLED_FLAG PS_FIELD(194, 1)
-#define PPS_DEBLOCKING_FILTER_DISABLED_FLAG PS_FIELD(195, 1)
-#define PPS_BETA_OFFSET_DIV2 PS_FIELD(196, 4)
-#define PPS_TC_OFFSET_DIV2 PS_FIELD(200, 4)
-#define LISTS_MODIFICATION_PRESENT_FLAG PS_FIELD(204, 1)
-#define LOG2_PARALLEL_MERGE_LEVEL PS_FIELD(205, 3)
-#define SLICE_SEGMENT_HEADER_EXTENSION_PRESENT_FLAG PS_FIELD(208, 1)
-#define NUM_TILE_COLUMNS PS_FIELD(212, 5)
-#define NUM_TILE_ROWS PS_FIELD(217, 5)
-#define COLUMN_WIDTH(i) PS_FIELD(256 + ((i) * 8), 8)
-#define ROW_HEIGHT(i) PS_FIELD(416 + ((i) * 8), 8)
-#define SCALING_LIST_ADDRESS PS_FIELD(592, 32)
+#define PIC_PARAMETER_SET_ID BW_FIELD(128, 6)
+#define PPS_SEQ_PARAMETER_SET_ID BW_FIELD(134, 4)
+#define DEPENDENT_SLICE_SEGMENTS_ENABLED_FLAG BW_FIELD(138, 1)
+#define OUTPUT_FLAG_PRESENT_FLAG BW_FIELD(139, 1)
+#define NUM_EXTRA_SLICE_HEADER_BITS BW_FIELD(140, 13)
+#define SIGN_DATA_HIDING_ENABLED_FLAG BW_FIELD(153, 1)
+#define CABAC_INIT_PRESENT_FLAG BW_FIELD(154, 1)
+#define NUM_REF_IDX_L0_DEFAULT_ACTIVE BW_FIELD(155, 4)
+#define NUM_REF_IDX_L1_DEFAULT_ACTIVE BW_FIELD(159, 4)
+#define INIT_QP_MINUS26 BW_FIELD(163, 7)
+#define CONSTRAINED_INTRA_PRED_FLAG BW_FIELD(170, 1)
+#define TRANSFORM_SKIP_ENABLED_FLAG BW_FIELD(171, 1)
+#define CU_QP_DELTA_ENABLED_FLAG BW_FIELD(172, 1)
+#define LOG2_MIN_CU_QP_DELTA_SIZE BW_FIELD(173, 3)
+#define PPS_CB_QP_OFFSET BW_FIELD(176, 5)
+#define PPS_CR_QP_OFFSET BW_FIELD(181, 5)
+#define PPS_SLICE_CHROMA_QP_OFFSETS_PRESENT_FLAG BW_FIELD(186, 1)
+#define WEIGHTED_PRED_FLAG BW_FIELD(187, 1)
+#define WEIGHTED_BIPRED_FLAG BW_FIELD(188, 1)
+#define TRANSQUANT_BYPASS_ENABLED_FLAG BW_FIELD(189, 1)
+#define TILES_ENABLED_FLAG BW_FIELD(190, 1)
+#define ENTROPY_CODING_SYNC_ENABLED_FLAG BW_FIELD(191, 1)
+#define PPS_LOOP_FILTER_ACROSS_SLICES_ENABLED_FLAG BW_FIELD(192, 1)
+#define LOOP_FILTER_ACROSS_TILES_ENABLED_FLAG BW_FIELD(193, 1)
+#define DEBLOCKING_FILTER_OVERRIDE_ENABLED_FLAG BW_FIELD(194, 1)
+#define PPS_DEBLOCKING_FILTER_DISABLED_FLAG BW_FIELD(195, 1)
+#define PPS_BETA_OFFSET_DIV2 BW_FIELD(196, 4)
+#define PPS_TC_OFFSET_DIV2 BW_FIELD(200, 4)
+#define LISTS_MODIFICATION_PRESENT_FLAG BW_FIELD(204, 1)
+#define LOG2_PARALLEL_MERGE_LEVEL BW_FIELD(205, 3)
+#define SLICE_SEGMENT_HEADER_EXTENSION_PRESENT_FLAG BW_FIELD(208, 1)
+#define NUM_TILE_COLUMNS BW_FIELD(212, 5)
+#define NUM_TILE_ROWS BW_FIELD(217, 5)
+#define COLUMN_WIDTH(i) BW_FIELD(256 + ((i) * 8), 8)
+#define ROW_HEIGHT(i) BW_FIELD(416 + ((i) * 8), 8)
+#define SCALING_LIST_ADDRESS BW_FIELD(592, 32)
/* Data structure describing auxiliary buffer format. */
struct rkvdec_hevc_priv_tbl {
@@ -123,20 +116,6 @@ struct rkvdec_hevc_ctx {
struct rkvdec_regs regs;
};
-static void set_ps_field(u32 *buf, struct rkvdec_ps_field field, u32 value)
-{
- u8 bit = field.offset % 32, word = field.offset / 32;
- u64 mask = GENMASK_ULL(bit + field.len - 1, bit);
- u64 val = ((u64)value << bit) & mask;
-
- buf[word] &= ~mask;
- buf[word] |= val;
- if (bit + field.len > 32) {
- buf[word + 1] &= ~(mask >> 32);
- buf[word + 1] |= val >> 32;
- }
-}
-
static void assemble_hw_pps(struct rkvdec_ctx *ctx,
struct rkvdec_hevc_run *run)
{
@@ -159,7 +138,7 @@ static void assemble_hw_pps(struct rkvdec_ctx *ctx,
hw_ps = &priv_tbl->param_set[pps->pic_parameter_set_id];
memset(hw_ps, 0, sizeof(*hw_ps));
-#define WRITE_PPS(value, field) set_ps_field(hw_ps->info, field, value)
+#define WRITE_PPS(value, field) rkvdec_set_bw_field(hw_ps->info, field, value)
/* write sps */
WRITE_PPS(sps->video_parameter_set_id, VIDEO_PARAMETER_SET_ID);
WRITE_PPS(sps->seq_parameter_set_id, SEQ_PARAMETER_SET_ID);
@@ -321,17 +300,17 @@ static void assemble_sw_rps(struct rkvdec_ctx *ctx,
int i, j;
unsigned int lowdelay;
-#define WRITE_RPS(value, field) set_ps_field(hw_ps->info, field, value)
+#define WRITE_RPS(value, field) rkvdec_set_bw_field(hw_ps->info, field, value)
-#define REF_PIC_LONG_TERM_L0(i) PS_FIELD((i) * 5, 1)
-#define REF_PIC_IDX_L0(i) PS_FIELD(1 + ((i) * 5), 4)
-#define REF_PIC_LONG_TERM_L1(i) PS_FIELD(((i) < 5 ? 75 : 132) + ((i) * 5), 1)
-#define REF_PIC_IDX_L1(i) PS_FIELD(((i) < 4 ? 76 : 128) + ((i) * 5), 4)
+#define REF_PIC_LONG_TERM_L0(i) BW_FIELD((i) * 5, 1)
+#define REF_PIC_IDX_L0(i) BW_FIELD(1 + ((i) * 5), 4)
+#define REF_PIC_LONG_TERM_L1(i) BW_FIELD(((i) < 5 ? 75 : 132) + ((i) * 5), 1)
+#define REF_PIC_IDX_L1(i) BW_FIELD(((i) < 4 ? 76 : 128) + ((i) * 5), 4)
-#define LOWDELAY PS_FIELD(182, 1)
-#define LONG_TERM_RPS_BIT_OFFSET PS_FIELD(183, 10)
-#define SHORT_TERM_RPS_BIT_OFFSET PS_FIELD(193, 9)
-#define NUM_RPS_POC PS_FIELD(202, 4)
+#define LOWDELAY BW_FIELD(182, 1)
+#define LONG_TERM_RPS_BIT_OFFSET BW_FIELD(183, 10)
+#define SHORT_TERM_RPS_BIT_OFFSET BW_FIELD(193, 9)
+#define NUM_RPS_POC BW_FIELD(202, 4)
for (j = 0; j < run->num_slices; j++) {
uint st_bit_offset = 0;
--
2.53.0
* [PATCH 3/4] media: rkvdec: common: Drop bitfields for the bitwriter
From: Detlev Casanova @ 2026-03-27 15:16 UTC (permalink / raw)
To: Ezequiel Garcia, Mauro Carvalho Chehab, Heiko Stuebner,
Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt,
Jonas Karlman, Nicolas Dufresne
Cc: linux-kernel, linux-media, linux-rockchip, linux-arm-kernel, llvm,
kernel, Detlev Casanova
In-Reply-To: <20260327-rkvdec-use-bitwriter-v1-0-982cf872b590@collabora.com>
Currently, the common code files for hevc and h264 use structs with
bitfields to represent the HW RPS buffer.
Because the bitfields are numerous and mostly unaligned, they cause compiler
issues, especially with clang.
To prevent that, switch to using the global bitwriter previously
introduced instead.
Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
---
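Not part of the commit: a quick user-space check that the new H.264 RPS offsets address the same bits as the removed set_dpb_info() switch. It assumes the old packed bitfields were allocated LSB-first with no padding (the usual little-endian GCC/Clang layout), so that each struct rkvdec_rps_entry covers 8 * 7 = 56 bits after the 288 bits of frame_num[16] and reserved0. The function names are hypothetical.

```c
#include <assert.h>

#define RPS_BASE	288	/* u16 frame_num[16] (256) + u32 reserved0 (32) */
#define RPS_SLOT_BITS	7	/* dpb_info:5 + bottom_flag:1 + view_index_off:1 */

/* Bit offset of dpb_info as given by the new RPS_ENTRY_DPB_INFO(l, e). */
static unsigned int new_dpb_info_offset(unsigned int l, unsigned int e)
{
	return RPS_BASE + l * 7 * 32 + e * 7;
}

/* Bit offset implied by the old entries[] array and the switch on idx. */
static unsigned int old_dpb_info_offset(unsigned int reflist,
					unsigned int refnum)
{
	unsigned int entry = reflist * 4 + refnum / 8;	/* index in entries[12] */
	unsigned int idx = refnum % 8;			/* slot within the entry */

	return RPS_BASE + entry * 8 * RPS_SLOT_BITS + idx * RPS_SLOT_BITS;
}

/* Returns 1 when both layouts agree for every list (0-2) and entry (0-31). */
static int layouts_match(void)
{
	unsigned int l, e;

	for (l = 0; l < 3; l++)
		for (e = 0; e < 32; e++)
			if (new_dpb_info_offset(l, e) != old_dpb_info_offset(l, e))
				return 0;
	return 1;
}
```

The two agree because (refnum / 8) * 56 + (refnum % 8) * 7 is 7 * refnum, and four 56-bit entries per list give the 7 * 32 stride used in the macro. bottom_flag and view_index_off follow at +5 and +6 in both layouts.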
.../platform/rockchip/rkvdec/rkvdec-h264-common.c | 51 +-----------
.../platform/rockchip/rkvdec/rkvdec-h264-common.h | 40 +++-------
.../platform/rockchip/rkvdec/rkvdec-hevc-common.c | 92 ++++------------------
.../platform/rockchip/rkvdec/rkvdec-hevc-common.h | 57 ++++----------
4 files changed, 43 insertions(+), 197 deletions(-)
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-h264-common.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-h264-common.c
index e28f06394470..54639512e456 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-h264-common.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-h264-common.c
@@ -21,51 +21,6 @@
#define RKVDEC_NUM_REFLIST 3
-static void set_dpb_info(struct rkvdec_rps_entry *entries,
- u8 reflist,
- u8 refnum,
- u8 info,
- bool bottom)
-{
- struct rkvdec_rps_entry *entry = &entries[(reflist * 4) + refnum / 8];
- u8 idx = refnum % 8;
-
- switch (idx) {
- case 0:
- entry->dpb_info0 = info;
- entry->bottom_flag0 = bottom;
- break;
- case 1:
- entry->dpb_info1 = info;
- entry->bottom_flag1 = bottom;
- break;
- case 2:
- entry->dpb_info2 = info;
- entry->bottom_flag2 = bottom;
- break;
- case 3:
- entry->dpb_info3 = info;
- entry->bottom_flag3 = bottom;
- break;
- case 4:
- entry->dpb_info4 = info;
- entry->bottom_flag4 = bottom;
- break;
- case 5:
- entry->dpb_info5 = info;
- entry->bottom_flag5 = bottom;
- break;
- case 6:
- entry->dpb_info6 = info;
- entry->bottom_flag6 = bottom;
- break;
- case 7:
- entry->dpb_info7 = info;
- entry->bottom_flag7 = bottom;
- break;
- }
-}
-
void lookup_ref_buf_idx(struct rkvdec_ctx *ctx,
struct rkvdec_h264_run *run)
{
@@ -111,7 +66,7 @@ void assemble_hw_rps(struct v4l2_h264_reflist_builder *builder,
if (!(dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
continue;
- hw_rps->frame_num[i] = builder->refs[i].frame_num;
+ rkvdec_set_bw_field(hw_rps->info, RPS_FRAME_NUM(i), builder->refs[i].frame_num);
}
for (j = 0; j < RKVDEC_NUM_REFLIST; j++) {
@@ -138,7 +93,9 @@ void assemble_hw_rps(struct v4l2_h264_reflist_builder *builder,
dpb_valid = !!(run->ref_buf[ref->index]);
bottom = ref->fields == V4L2_H264_BOTTOM_FIELD_REF;
- set_dpb_info(hw_rps->entries, j, i, ref->index | (dpb_valid << 4), bottom);
+ rkvdec_set_bw_field(hw_rps->info, RPS_ENTRY_DPB_INFO(j, i),
+ ref->index | (dpb_valid << 4));
+ rkvdec_set_bw_field(hw_rps->info, RPS_ENTRY_BOTTOM_FLAG(j, i), bottom);
}
}
}
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-h264-common.h b/drivers/media/platform/rockchip/rkvdec/rkvdec-h264-common.h
index 5336370507d6..8d3255289135 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-h264-common.h
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-h264-common.h
@@ -16,6 +16,7 @@
#include <media/v4l2-mem2mem.h>
#include "rkvdec.h"
+#include "rkvdec-bitwriter.h"
struct rkvdec_h264_scaling_list {
u8 scaling_list_4x4[6][16];
@@ -38,39 +39,16 @@ struct rkvdec_h264_run {
struct vb2_buffer *ref_buf[V4L2_H264_NUM_DPB_ENTRIES];
};
-struct rkvdec_rps_entry {
- u32 dpb_info0: 5;
- u32 bottom_flag0: 1;
- u32 view_index_off0: 1;
- u32 dpb_info1: 5;
- u32 bottom_flag1: 1;
- u32 view_index_off1: 1;
- u32 dpb_info2: 5;
- u32 bottom_flag2: 1;
- u32 view_index_off2: 1;
- u32 dpb_info3: 5;
- u32 bottom_flag3: 1;
- u32 view_index_off3: 1;
- u32 dpb_info4: 5;
- u32 bottom_flag4: 1;
- u32 view_index_off4: 1;
- u32 dpb_info5: 5;
- u32 bottom_flag5: 1;
- u32 view_index_off5: 1;
- u32 dpb_info6: 5;
- u32 bottom_flag6: 1;
- u32 view_index_off6: 1;
- u32 dpb_info7: 5;
- u32 bottom_flag7: 1;
- u32 view_index_off7: 1;
-} __packed;
+#define RPS_FRAME_NUM(i) BW_FIELD((i) * 16, 16)
+#define RPS_ENTRY_DPB_INFO(l, e) BW_FIELD(288 + (l) * 7 * 32 + (e) * 7, 5) //l: 0-2, e: 0-31
+#define RPS_ENTRY_BOTTOM_FLAG(l, e) BW_FIELD(293 + (l) * 7 * 32 + (e) * 7, 1) //l: 0-2, e: 0-31
+#define RPS_ENTRY_VIEW_INDEX_OFF(l, e) BW_FIELD(294 + (l) * 7 * 32 + (e) * 7, 1) //l: 0-2, e: 0-31
+
+#define RKVDEC_H264_RPS_SIZE ALIGN(RPS_ENTRY_VIEW_INDEX_OFF(3, 32).offset, 128)
struct rkvdec_rps {
- u16 frame_num[16];
- u32 reserved0;
- struct rkvdec_rps_entry entries[12];
- u32 reserved1[66];
-} __packed;
+ u32 info[RKVDEC_H264_RPS_SIZE / 8 / 4];
+};
void lookup_ref_buf_idx(struct rkvdec_ctx *ctx, struct rkvdec_h264_run *run);
void assemble_hw_rps(struct v4l2_h264_reflist_builder *builder,
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-hevc-common.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-hevc-common.c
index 3119f3bc9f98..be7e86dd976b 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-hevc-common.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-hevc-common.c
@@ -74,72 +74,6 @@ void compute_tiles_non_uniform(struct rkvdec_hevc_run *run, u16 log2_min_cb_size
row_height[i] = pic_in_cts_height - sum;
}
-static void set_ref_poc(struct rkvdec_rps_short_term_ref_set *set, int poc, int value, int flag)
-{
- switch (poc) {
- case 0:
- set->delta_poc0 = value;
- set->used_flag0 = flag;
- break;
- case 1:
- set->delta_poc1 = value;
- set->used_flag1 = flag;
- break;
- case 2:
- set->delta_poc2 = value;
- set->used_flag2 = flag;
- break;
- case 3:
- set->delta_poc3 = value;
- set->used_flag3 = flag;
- break;
- case 4:
- set->delta_poc4 = value;
- set->used_flag4 = flag;
- break;
- case 5:
- set->delta_poc5 = value;
- set->used_flag5 = flag;
- break;
- case 6:
- set->delta_poc6 = value;
- set->used_flag6 = flag;
- break;
- case 7:
- set->delta_poc7 = value;
- set->used_flag7 = flag;
- break;
- case 8:
- set->delta_poc8 = value;
- set->used_flag8 = flag;
- break;
- case 9:
- set->delta_poc9 = value;
- set->used_flag9 = flag;
- break;
- case 10:
- set->delta_poc10 = value;
- set->used_flag10 = flag;
- break;
- case 11:
- set->delta_poc11 = value;
- set->used_flag11 = flag;
- break;
- case 12:
- set->delta_poc12 = value;
- set->used_flag12 = flag;
- break;
- case 13:
- set->delta_poc13 = value;
- set->used_flag13 = flag;
- break;
- case 14:
- set->delta_poc14 = value;
- set->used_flag14 = flag;
- break;
- }
-}
-
static void assemble_scalingfactor0(struct rkvdec_ctx *ctx, u8 *output,
const struct v4l2_ctrl_hevc_scaling_matrix *input)
{
@@ -218,10 +152,10 @@ static void rkvdec_hevc_assemble_hw_lt_rps(struct rkvdec_hevc_run *run, struct r
return;
for (int i = 0; i < sps->num_long_term_ref_pics_sps; i++) {
- rps->refs[i].lt_ref_pic_poc_lsb =
- run->ext_sps_lt_rps[i].lt_ref_pic_poc_lsb_sps;
- rps->refs[i].used_by_curr_pic_lt_flag =
- !!(run->ext_sps_lt_rps[i].flags & V4L2_HEVC_EXT_SPS_LT_RPS_FLAG_USED_LT);
+ rkvdec_set_bw_field(rps->info, RPS_LT_REF_PIC_POC_LSB(i),
+ run->ext_sps_lt_rps[i].lt_ref_pic_poc_lsb_sps);
+ rkvdec_set_bw_field(rps->info, RPS_LT_REF_USED_BY_CURR_PIC(i),
+ !!(run->ext_sps_lt_rps[i].flags & V4L2_HEVC_EXT_SPS_LT_RPS_FLAG_USED_LT));
}
}
@@ -235,18 +169,24 @@ static void rkvdec_hevc_assemble_hw_st_rps(struct rkvdec_hevc_run *run, struct r
int j = 0;
const struct calculated_rps_st_set *set = &calculated_rps_st_sets[i];
- rps->short_term_ref_sets[i].num_negative = set->num_negative_pics;
- rps->short_term_ref_sets[i].num_positive = set->num_positive_pics;
+ rkvdec_set_bw_field(rps->info, RPS_ST_REF_SET_NUM_NEGATIVE(i),
+ set->num_negative_pics);
+ rkvdec_set_bw_field(rps->info, RPS_ST_REF_SET_NUM_POSITIVE(i),
+ set->num_positive_pics);
for (; j < set->num_negative_pics; j++) {
- set_ref_poc(&rps->short_term_ref_sets[i], j,
- set->delta_poc_s0[j], set->used_by_curr_pic_s0[j]);
+ rkvdec_set_bw_field(rps->info, RPS_ST_REF_SET_DELTA_POC(i, j),
+ set->delta_poc_s0[j]);
+ rkvdec_set_bw_field(rps->info, RPS_ST_REF_SET_USED(i, j),
+ set->used_by_curr_pic_s0[j]);
}
poc = j;
for (j = 0; j < set->num_positive_pics; j++) {
- set_ref_poc(&rps->short_term_ref_sets[i], poc + j,
- set->delta_poc_s1[j], set->used_by_curr_pic_s1[j]);
+ rkvdec_set_bw_field(rps->info, RPS_ST_REF_SET_DELTA_POC(i, poc + j),
+ set->delta_poc_s1[j]);
+ rkvdec_set_bw_field(rps->info, RPS_ST_REF_SET_USED(i, poc + j),
+ set->used_by_curr_pic_s1[j]);
}
}
}
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-hevc-common.h b/drivers/media/platform/rockchip/rkvdec/rkvdec-hevc-common.h
index 6f4faca4c091..cd3a2eb36b58 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-hevc-common.h
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-hevc-common.h
@@ -19,53 +19,24 @@
#include <linux/types.h>
#include "rkvdec.h"
+#include "rkvdec-bitwriter.h"
-struct rkvdec_rps_refs {
- u16 lt_ref_pic_poc_lsb;
- u16 used_by_curr_pic_lt_flag : 1;
- u16 reserved : 15;
-} __packed;
+#define RPS_LT_REF_PIC_POC_LSB(i) BW_FIELD(0 + (i) * 32, 16) // i: 0-31
+#define RPS_LT_REF_USED_BY_CURR_PIC(i) BW_FIELD(16 + (i) * 32, 1) // i: 0-31
-struct rkvdec_rps_short_term_ref_set {
- u32 num_negative : 4;
- u32 num_positive : 4;
- u32 delta_poc0 : 16;
- u32 used_flag0 : 1;
- u32 delta_poc1 : 16;
- u32 used_flag1 : 1;
- u32 delta_poc2 : 16;
- u32 used_flag2 : 1;
- u32 delta_poc3 : 16;
- u32 used_flag3 : 1;
- u32 delta_poc4 : 16;
- u32 used_flag4 : 1;
- u32 delta_poc5 : 16;
- u32 used_flag5 : 1;
- u32 delta_poc6 : 16;
- u32 used_flag6 : 1;
- u32 delta_poc7 : 16;
- u32 used_flag7 : 1;
- u32 delta_poc8 : 16;
- u32 used_flag8 : 1;
- u32 delta_poc9 : 16;
- u32 used_flag9 : 1;
- u32 delta_poc10 : 16;
- u32 used_flag10 : 1;
- u32 delta_poc11 : 16;
- u32 used_flag11 : 1;
- u32 delta_poc12 : 16;
- u32 used_flag12 : 1;
- u32 delta_poc13 : 16;
- u32 used_flag13 : 1;
- u32 delta_poc14 : 16;
- u32 used_flag14 : 1;
- u32 reserved_bits : 25;
- u32 reserved[3];
-} __packed;
+#define RPS_ST_REF_SET_NUM_NEGATIVE(i) BW_FIELD(1024 + ((i) * 384), 4) // i: 0-63
+#define RPS_ST_REF_SET_NUM_POSITIVE(i) BW_FIELD(1028 + ((i) * 384), 4) // i: 0-63
+
+// i: 0-63, j: 0-14
+#define RPS_ST_REF_SET_DELTA_POC(i, j) BW_FIELD(1032 + ((i) * 384) + ((j) * 17), 16)
+
+// i: 0-63, j: 0-14
+#define RPS_ST_REF_SET_USED(i, j) BW_FIELD(1048 + ((i) * 384) + ((j) * 17), 1)
+
+#define RKVDEC_RPS_HEVC_SIZE ALIGN(RPS_ST_REF_SET_USED(64, 15).offset, 128)
struct rkvdec_rps {
- struct rkvdec_rps_refs refs[32];
- struct rkvdec_rps_short_term_ref_set short_term_ref_sets[64];
+ u32 info[RKVDEC_RPS_HEVC_SIZE / 8 / 4];
} __packed;
struct rkvdec_hevc_run {
--
2.53.0
* [PATCH 4/4] media: rkvdec: vdpu383: Drop bitfields for the bitwriter
From: Detlev Casanova @ 2026-03-27 15:16 UTC (permalink / raw)
To: Ezequiel Garcia, Mauro Carvalho Chehab, Heiko Stuebner,
Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt,
Jonas Karlman, Nicolas Dufresne
Cc: linux-kernel, linux-media, linux-rockchip, linux-arm-kernel, llvm,
kernel, Detlev Casanova
In-Reply-To: <20260327-rkvdec-use-bitwriter-v1-0-982cf872b590@collabora.com>
The VDPU383 support for HEVC and H.264 uses structs with bitfields to
represent the SPS and PPS.
Because the fields are numerous and mostly unaligned, this causes
compiler issues, especially with clang.
To avoid that, switch to the global bitwriter introduced previously.
Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
---
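For readers without the earlier bitwriter patch at hand, the idea described
above can be sketched roughly as follows. This is only an illustration, not
the driver's code: the names bw_field_sketch, BW_FIELD_SKETCH and
set_bw_field_sketch are hypothetical, and the layout (bits packed LSB-first
into consecutive 32-bit words) is an assumption. The real BW_FIELD() and
rkvdec_set_bw_field() may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field descriptor: bit offset and width inside the buffer. */
struct bw_field_sketch {
	unsigned int offset;	/* position of the field's first bit */
	unsigned int len;	/* width in bits, 1..32 */
};

#define BW_FIELD_SKETCH(o, l) \
	((struct bw_field_sketch){ .offset = (o), .len = (l) })

/*
 * Store the low f.len bits of value at bit f.offset of buf.
 * Assumption: bits are packed LSB-first into 32-bit words, so a field
 * that crosses a word boundary continues in the low bits of the next word.
 * The caller must make sure buf is large enough for the field.
 */
static void set_bw_field_sketch(uint32_t *buf, struct bw_field_sketch f,
				uint32_t value)
{
	size_t word = f.offset / 32;
	unsigned int shift = f.offset % 32;
	uint64_t mask = (f.len >= 32) ? 0xffffffffULL : ((1ULL << f.len) - 1);
	uint64_t m = mask << shift;			/* field mask, up to 63 bits */
	uint64_t v = ((uint64_t)value & mask) << shift;	/* value moved into place */

	/* Low part: clear the field's bits in the first word, then set them. */
	buf[word] = (buf[word] & ~(uint32_t)m) | (uint32_t)v;

	/* High part: only touched when the field spills into the next word. */
	if (m >> 32)
		buf[word + 1] = (buf[word + 1] & ~(uint32_t)(m >> 32)) |
				(uint32_t)(v >> 32);
}
```

Under these assumptions, a 2-bit field at offset 31 (the position this patch
gives PIC_ORDER_CNT_TYPE in the H.264 SPS) is split between bit 31 of word 0
and bit 0 of word 1. Every store is an explicit shift and mask on a u32, so
the compiler never has to lay out an unaligned packed bitfield.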
.../platform/rockchip/rkvdec/rkvdec-vdpu383-h264.c | 351 ++++++--------
.../platform/rockchip/rkvdec/rkvdec-vdpu383-hevc.c | 502 +++++++++------------
2 files changed, 360 insertions(+), 493 deletions(-)
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-h264.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-h264.c
index fb4f849d7366..a08038fbc6d5 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-h264.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-h264.c
@@ -15,105 +15,64 @@
#include "rkvdec-cabac.h"
#include "rkvdec-vdpu383-regs.h"
#include "rkvdec-h264-common.h"
-
-struct rkvdec_sps {
- u16 seq_parameter_set_id: 4;
- u16 profile_idc: 8;
- u16 constraint_set3_flag: 1;
- u16 chroma_format_idc: 2;
- u16 bit_depth_luma: 3;
- u16 bit_depth_chroma: 3;
- u16 qpprime_y_zero_transform_bypass_flag: 1;
- u16 log2_max_frame_num_minus4: 4;
- u16 max_num_ref_frames: 5;
- u16 pic_order_cnt_type: 2;
- u16 log2_max_pic_order_cnt_lsb_minus4: 4;
- u16 delta_pic_order_always_zero_flag: 1;
-
- u16 pic_width_in_mbs: 16;
- u16 pic_height_in_mbs: 16;
-
- u16 frame_mbs_only_flag: 1;
- u16 mb_adaptive_frame_field_flag: 1;
- u16 direct_8x8_inference_flag: 1;
- u16 mvc_extension_enable: 1;
- u16 num_views: 2;
- u16 view_id0: 10;
- u16 view_id1: 10;
-} __packed;
-
-struct rkvdec_pps {
- u32 pic_parameter_set_id: 8;
- u32 pps_seq_parameter_set_id: 5;
- u32 entropy_coding_mode_flag: 1;
- u32 bottom_field_pic_order_in_frame_present_flag: 1;
- u32 num_ref_idx_l0_default_active_minus1: 5;
- u32 num_ref_idx_l1_default_active_minus1: 5;
- u32 weighted_pred_flag: 1;
- u32 weighted_bipred_idc: 2;
- u32 pic_init_qp_minus26: 7;
- u32 pic_init_qs_minus26: 6;
- u32 chroma_qp_index_offset: 5;
- u32 deblocking_filter_control_present_flag: 1;
- u32 constrained_intra_pred_flag: 1;
- u32 redundant_pic_cnt_present: 1;
- u32 transform_8x8_mode_flag: 1;
- u32 second_chroma_qp_index_offset: 5;
- u32 scaling_list_enable_flag: 1;
- u32 is_longterm: 16;
- u32 voidx: 16;
-
- // dpb
- u32 pic_field_flag: 1;
- u32 pic_associated_flag: 1;
- u32 cur_top_field: 32;
- u32 cur_bot_field: 32;
-
- u32 top_field_order_cnt0: 32;
- u32 bot_field_order_cnt0: 32;
- u32 top_field_order_cnt1: 32;
- u32 bot_field_order_cnt1: 32;
- u32 top_field_order_cnt2: 32;
- u32 bot_field_order_cnt2: 32;
- u32 top_field_order_cnt3: 32;
- u32 bot_field_order_cnt3: 32;
- u32 top_field_order_cnt4: 32;
- u32 bot_field_order_cnt4: 32;
- u32 top_field_order_cnt5: 32;
- u32 bot_field_order_cnt5: 32;
- u32 top_field_order_cnt6: 32;
- u32 bot_field_order_cnt6: 32;
- u32 top_field_order_cnt7: 32;
- u32 bot_field_order_cnt7: 32;
- u32 top_field_order_cnt8: 32;
- u32 bot_field_order_cnt8: 32;
- u32 top_field_order_cnt9: 32;
- u32 bot_field_order_cnt9: 32;
- u32 top_field_order_cnt10: 32;
- u32 bot_field_order_cnt10: 32;
- u32 top_field_order_cnt11: 32;
- u32 bot_field_order_cnt11: 32;
- u32 top_field_order_cnt12: 32;
- u32 bot_field_order_cnt12: 32;
- u32 top_field_order_cnt13: 32;
- u32 bot_field_order_cnt13: 32;
- u32 top_field_order_cnt14: 32;
- u32 bot_field_order_cnt14: 32;
- u32 top_field_order_cnt15: 32;
- u32 bot_field_order_cnt15: 32;
-
- u32 ref_field_flags: 16;
- u32 ref_topfield_used: 16;
- u32 ref_botfield_used: 16;
- u32 ref_colmv_use_flag: 16;
-
- u32 reserved0: 30;
- u32 reserved[3];
-} __packed;
+#include "rkvdec-bitwriter.h"
+
+#define SEQ_PARAMETER_SET_ID BW_FIELD(0, 4)
+#define PROFILE_IDC BW_FIELD(4, 8)
+#define CONSTRAINT_SET3_FLAG BW_FIELD(12, 1)
+#define CHROMA_FORMAT_IDC BW_FIELD(13, 2)
+#define BIT_DEPTH_LUMA BW_FIELD(15, 3)
+#define BIT_DEPTH_CHROMA BW_FIELD(18, 3)
+#define QPPRIME_Y_ZERO_TRANSFORM_BYPASS_FLAG BW_FIELD(21, 1)
+#define LOG2_MAX_FRAME_NUM_MINUS4 BW_FIELD(22, 4)
+#define MAX_NUM_REF_FRAMES BW_FIELD(26, 5)
+#define PIC_ORDER_CNT_TYPE BW_FIELD(31, 2)
+#define LOG2_MAX_PIC_ORDER_CNT_LSB_MINUS4 BW_FIELD(33, 4)
+#define DELTA_PIC_ORDER_ALWAYS_ZERO_FLAG BW_FIELD(37, 1)
+#define PIC_WIDTH_IN_MBS BW_FIELD(38, 16)
+#define PIC_HEIGHT_IN_MBS BW_FIELD(54, 16)
+#define FRAME_MBS_ONLY_FLAG BW_FIELD(70, 1)
+#define MB_ADAPTIVE_FRAME_FIELD_FLAG BW_FIELD(71, 1)
+#define DIRECT_8X8_INFERENCE_FLAG BW_FIELD(72, 1)
+#define MVC_EXTENSION_ENABLE BW_FIELD(73, 1)
+#define NUM_VIEWS BW_FIELD(74, 2)
+#define VIEW_ID(i) BW_FIELD(76 + ((i) * 10), 10) // i: 0-1
+
+#define PIC_PARAMETER_SET_ID BW_FIELD(96, 8)
+#define PPS_SEQ_PARAMETER_SET_ID BW_FIELD(104, 5)
+#define ENTROPY_CODING_MODE_FLAG BW_FIELD(109, 1)
+#define BOTTOM_FIELD_PIC_ORDER_IN_FRAME_PRESENT_FLAG BW_FIELD(110, 1)
+#define NUM_REF_IDX_L_DEFAULT_ACTIVE_MINUS1(i) BW_FIELD(111 + ((i) * 5), 5) // i: 0-1
+#define WEIGHTED_PRED_FLAG BW_FIELD(121, 1)
+#define WEIGHTED_BIPRED_IDC BW_FIELD(122, 2)
+#define PIC_INIT_QP_MINUS26 BW_FIELD(124, 7)
+#define PIC_INIT_QS_MINUS26 BW_FIELD(131, 6)
+#define CHROMA_QP_INDEX_OFFSET BW_FIELD(137, 5)
+#define DEBLOCKING_FILTER_CONTROL_PRESENT_FLAG BW_FIELD(142, 1)
+#define CONSTRAINED_INTRA_PRED_FLAG BW_FIELD(143, 1)
+#define REDUNDANT_PIC_CNT_PRESENT BW_FIELD(144, 1)
+#define TRANSFORM_8X8_MODE_FLAG BW_FIELD(145, 1)
+#define SECOND_CHROMA_QP_INDEX_OFFSET BW_FIELD(146, 5)
+#define SCALING_LIST_ENABLE_FLAG BW_FIELD(151, 1)
+#define IS_LONG_TERM(i) BW_FIELD(152 + (i), 1) // i: 0-15
+
+#define PIC_FIELD_FLAG BW_FIELD(184, 1)
+#define PIC_ASSOCIATED_FLAG BW_FIELD(185, 1)
+#define CUR_TOP_FIELD BW_FIELD(186, 32)
+#define CUR_BOT_FIELD BW_FIELD(218, 32)
+
+#define TOP_FIELD_ORDER_CNT(i) BW_FIELD(250 + (i) * 64, 32) // i: 0-15
+#define BOT_FIELD_ORDER_CNT(i) BW_FIELD(282 + (i) * 64, 32) // i: 0-15
+
+#define REF_FIELD_FLAGS(i) BW_FIELD(1274 + (i), 1) // i: 0-15
+#define REF_TOPFIELD_USED(i) BW_FIELD(1290 + (i), 1) // i: 0-15
+#define REF_BOTFIELD_USED(i) BW_FIELD(1306 + (i), 1) // i: 0-15
+#define REF_COLMV_USE_FLAG(i) BW_FIELD(1322 + (i), 1) // i: 0-15
+
+#define SPS_SIZE ALIGN(REF_COLMV_USE_FLAG(16).offset, 128)
struct rkvdec_sps_pps {
- struct rkvdec_sps sps;
- struct rkvdec_pps pps;
+ u32 info[SPS_SIZE / 8 / 4];
} __packed;
/* Data structure describing auxiliary buffer format. */
@@ -130,67 +89,6 @@ struct rkvdec_h264_ctx {
struct vdpu383_regs_h26x regs;
};
-static noinline_for_stack void set_field_order_cnt(struct rkvdec_pps *pps, const struct v4l2_h264_dpb_entry *dpb)
-{
- pps->top_field_order_cnt0 = dpb[0].top_field_order_cnt;
- pps->bot_field_order_cnt0 = dpb[0].bottom_field_order_cnt;
- pps->top_field_order_cnt1 = dpb[1].top_field_order_cnt;
- pps->bot_field_order_cnt1 = dpb[1].bottom_field_order_cnt;
- pps->top_field_order_cnt2 = dpb[2].top_field_order_cnt;
- pps->bot_field_order_cnt2 = dpb[2].bottom_field_order_cnt;
- pps->top_field_order_cnt3 = dpb[3].top_field_order_cnt;
- pps->bot_field_order_cnt3 = dpb[3].bottom_field_order_cnt;
- pps->top_field_order_cnt4 = dpb[4].top_field_order_cnt;
- pps->bot_field_order_cnt4 = dpb[4].bottom_field_order_cnt;
- pps->top_field_order_cnt5 = dpb[5].top_field_order_cnt;
- pps->bot_field_order_cnt5 = dpb[5].bottom_field_order_cnt;
- pps->top_field_order_cnt6 = dpb[6].top_field_order_cnt;
- pps->bot_field_order_cnt6 = dpb[6].bottom_field_order_cnt;
- pps->top_field_order_cnt7 = dpb[7].top_field_order_cnt;
- pps->bot_field_order_cnt7 = dpb[7].bottom_field_order_cnt;
- pps->top_field_order_cnt8 = dpb[8].top_field_order_cnt;
- pps->bot_field_order_cnt8 = dpb[8].bottom_field_order_cnt;
- pps->top_field_order_cnt9 = dpb[9].top_field_order_cnt;
- pps->bot_field_order_cnt9 = dpb[9].bottom_field_order_cnt;
- pps->top_field_order_cnt10 = dpb[10].top_field_order_cnt;
- pps->bot_field_order_cnt10 = dpb[10].bottom_field_order_cnt;
- pps->top_field_order_cnt11 = dpb[11].top_field_order_cnt;
- pps->bot_field_order_cnt11 = dpb[11].bottom_field_order_cnt;
- pps->top_field_order_cnt12 = dpb[12].top_field_order_cnt;
- pps->bot_field_order_cnt12 = dpb[12].bottom_field_order_cnt;
- pps->top_field_order_cnt13 = dpb[13].top_field_order_cnt;
- pps->bot_field_order_cnt13 = dpb[13].bottom_field_order_cnt;
- pps->top_field_order_cnt14 = dpb[14].top_field_order_cnt;
- pps->bot_field_order_cnt14 = dpb[14].bottom_field_order_cnt;
- pps->top_field_order_cnt15 = dpb[15].top_field_order_cnt;
- pps->bot_field_order_cnt15 = dpb[15].bottom_field_order_cnt;
-}
-
-static noinline_for_stack void set_dec_params(struct rkvdec_pps *pps, const struct v4l2_ctrl_h264_decode_params *dec_params)
-{
- const struct v4l2_h264_dpb_entry *dpb = dec_params->dpb;
-
- for (int i = 0; i < ARRAY_SIZE(dec_params->dpb); i++) {
- if (dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM)
- pps->is_longterm |= (1 << i);
- pps->ref_field_flags |=
- (!!(dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_FIELD)) << i;
- pps->ref_colmv_use_flag |=
- (!!(dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE)) << i;
- pps->ref_topfield_used |=
- (!!(dpb[i].fields & V4L2_H264_TOP_FIELD_REF)) << i;
- pps->ref_botfield_used |=
- (!!(dpb[i].fields & V4L2_H264_BOTTOM_FIELD_REF)) << i;
- }
- pps->pic_field_flag =
- !!(dec_params->flags & V4L2_H264_DECODE_PARAM_FLAG_FIELD_PIC);
- pps->pic_associated_flag =
- !!(dec_params->flags & V4L2_H264_DECODE_PARAM_FLAG_BOTTOM_FIELD);
-
- pps->cur_top_field = dec_params->top_field_order_cnt;
- pps->cur_bot_field = dec_params->bottom_field_order_cnt;
-}
-
static void assemble_hw_pps(struct rkvdec_ctx *ctx,
struct rkvdec_h264_run *run)
{
@@ -202,6 +100,7 @@ static void assemble_hw_pps(struct rkvdec_ctx *ctx,
struct rkvdec_h264_priv_tbl *priv_tbl = h264_ctx->priv_tbl.cpu;
struct rkvdec_sps_pps *hw_ps;
u32 pic_width, pic_height;
+ int i;
/*
* HW read the SPS/PPS information from PPS packet index by PPS id.
@@ -213,23 +112,25 @@ static void assemble_hw_pps(struct rkvdec_ctx *ctx,
memset(hw_ps, 0, sizeof(*hw_ps));
/* write sps */
- hw_ps->sps.seq_parameter_set_id = sps->seq_parameter_set_id;
- hw_ps->sps.profile_idc = sps->profile_idc;
- hw_ps->sps.constraint_set3_flag = !!(sps->constraint_set_flags & (1 << 3));
- hw_ps->sps.chroma_format_idc = sps->chroma_format_idc;
- hw_ps->sps.bit_depth_luma = sps->bit_depth_luma_minus8;
- hw_ps->sps.bit_depth_chroma = sps->bit_depth_chroma_minus8;
- hw_ps->sps.qpprime_y_zero_transform_bypass_flag =
- !!(sps->flags & V4L2_H264_SPS_FLAG_QPPRIME_Y_ZERO_TRANSFORM_BYPASS);
- hw_ps->sps.log2_max_frame_num_minus4 = sps->log2_max_frame_num_minus4;
- hw_ps->sps.max_num_ref_frames = sps->max_num_ref_frames;
- hw_ps->sps.pic_order_cnt_type = sps->pic_order_cnt_type;
- hw_ps->sps.log2_max_pic_order_cnt_lsb_minus4 =
- sps->log2_max_pic_order_cnt_lsb_minus4;
- hw_ps->sps.delta_pic_order_always_zero_flag =
- !!(sps->flags & V4L2_H264_SPS_FLAG_DELTA_PIC_ORDER_ALWAYS_ZERO);
- hw_ps->sps.mvc_extension_enable = 0;
- hw_ps->sps.num_views = 0;
+ rkvdec_set_bw_field(hw_ps->info, SEQ_PARAMETER_SET_ID, sps->seq_parameter_set_id);
+ rkvdec_set_bw_field(hw_ps->info, PROFILE_IDC, sps->profile_idc);
+ rkvdec_set_bw_field(hw_ps->info, CONSTRAINT_SET3_FLAG,
+ !!(sps->constraint_set_flags & (1 << 3)));
+ rkvdec_set_bw_field(hw_ps->info, CHROMA_FORMAT_IDC, sps->chroma_format_idc);
+ rkvdec_set_bw_field(hw_ps->info, BIT_DEPTH_LUMA, sps->bit_depth_luma_minus8);
+ rkvdec_set_bw_field(hw_ps->info, BIT_DEPTH_CHROMA, sps->bit_depth_chroma_minus8);
+ rkvdec_set_bw_field(hw_ps->info, QPPRIME_Y_ZERO_TRANSFORM_BYPASS_FLAG,
+ !!(sps->flags & V4L2_H264_SPS_FLAG_QPPRIME_Y_ZERO_TRANSFORM_BYPASS));
+ rkvdec_set_bw_field(hw_ps->info, LOG2_MAX_FRAME_NUM_MINUS4,
+ sps->log2_max_frame_num_minus4);
+ rkvdec_set_bw_field(hw_ps->info, MAX_NUM_REF_FRAMES, sps->max_num_ref_frames);
+ rkvdec_set_bw_field(hw_ps->info, PIC_ORDER_CNT_TYPE, sps->pic_order_cnt_type);
+ rkvdec_set_bw_field(hw_ps->info, LOG2_MAX_PIC_ORDER_CNT_LSB_MINUS4,
+ sps->log2_max_pic_order_cnt_lsb_minus4);
+ rkvdec_set_bw_field(hw_ps->info, DELTA_PIC_ORDER_ALWAYS_ZERO_FLAG,
+ !!(sps->flags & V4L2_H264_SPS_FLAG_DELTA_PIC_ORDER_ALWAYS_ZERO));
+ rkvdec_set_bw_field(hw_ps->info, MVC_EXTENSION_ENABLE, 0);
+ rkvdec_set_bw_field(hw_ps->info, NUM_VIEWS, 0);
/*
* Use the SPS values since they are already in macroblocks
@@ -245,48 +146,72 @@ static void assemble_hw_pps(struct rkvdec_ctx *ctx,
if (!!(dec_params->flags & V4L2_H264_DECODE_PARAM_FLAG_FIELD_PIC))
pic_height /= 2;
- hw_ps->sps.pic_width_in_mbs = pic_width;
- hw_ps->sps.pic_height_in_mbs = pic_height;
+ rkvdec_set_bw_field(hw_ps->info, PIC_WIDTH_IN_MBS, pic_width);
+ rkvdec_set_bw_field(hw_ps->info, PIC_HEIGHT_IN_MBS, pic_height);
- hw_ps->sps.frame_mbs_only_flag =
- !!(sps->flags & V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY);
- hw_ps->sps.mb_adaptive_frame_field_flag =
- !!(sps->flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD);
- hw_ps->sps.direct_8x8_inference_flag =
- !!(sps->flags & V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE);
+ rkvdec_set_bw_field(hw_ps->info, FRAME_MBS_ONLY_FLAG,
+ !!(sps->flags & V4L2_H264_SPS_FLAG_FRAME_MBS_ONLY));
+ rkvdec_set_bw_field(hw_ps->info, MB_ADAPTIVE_FRAME_FIELD_FLAG,
+ !!(sps->flags & V4L2_H264_SPS_FLAG_MB_ADAPTIVE_FRAME_FIELD));
+ rkvdec_set_bw_field(hw_ps->info, DIRECT_8X8_INFERENCE_FLAG,
+ !!(sps->flags & V4L2_H264_SPS_FLAG_DIRECT_8X8_INFERENCE));
/* write pps */
- hw_ps->pps.pic_parameter_set_id = pps->pic_parameter_set_id;
- hw_ps->pps.pps_seq_parameter_set_id = pps->seq_parameter_set_id;
- hw_ps->pps.entropy_coding_mode_flag =
- !!(pps->flags & V4L2_H264_PPS_FLAG_ENTROPY_CODING_MODE);
- hw_ps->pps.bottom_field_pic_order_in_frame_present_flag =
- !!(pps->flags & V4L2_H264_PPS_FLAG_BOTTOM_FIELD_PIC_ORDER_IN_FRAME_PRESENT);
- hw_ps->pps.num_ref_idx_l0_default_active_minus1 =
- pps->num_ref_idx_l0_default_active_minus1;
- hw_ps->pps.num_ref_idx_l1_default_active_minus1 =
- pps->num_ref_idx_l1_default_active_minus1;
- hw_ps->pps.weighted_pred_flag =
- !!(pps->flags & V4L2_H264_PPS_FLAG_WEIGHTED_PRED);
- hw_ps->pps.weighted_bipred_idc = pps->weighted_bipred_idc;
- hw_ps->pps.pic_init_qp_minus26 = pps->pic_init_qp_minus26;
- hw_ps->pps.pic_init_qs_minus26 = pps->pic_init_qs_minus26;
- hw_ps->pps.chroma_qp_index_offset = pps->chroma_qp_index_offset;
- hw_ps->pps.deblocking_filter_control_present_flag =
- !!(pps->flags & V4L2_H264_PPS_FLAG_DEBLOCKING_FILTER_CONTROL_PRESENT);
- hw_ps->pps.constrained_intra_pred_flag =
- !!(pps->flags & V4L2_H264_PPS_FLAG_CONSTRAINED_INTRA_PRED);
- hw_ps->pps.redundant_pic_cnt_present =
- !!(pps->flags & V4L2_H264_PPS_FLAG_REDUNDANT_PIC_CNT_PRESENT);
- hw_ps->pps.transform_8x8_mode_flag =
- !!(pps->flags & V4L2_H264_PPS_FLAG_TRANSFORM_8X8_MODE);
- hw_ps->pps.second_chroma_qp_index_offset = pps->second_chroma_qp_index_offset;
- hw_ps->pps.scaling_list_enable_flag =
- !!(pps->flags & V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT);
-
- set_field_order_cnt(&hw_ps->pps, dpb);
- set_dec_params(&hw_ps->pps, dec_params);
+ rkvdec_set_bw_field(hw_ps->info, PIC_PARAMETER_SET_ID, pps->pic_parameter_set_id);
+ rkvdec_set_bw_field(hw_ps->info, PPS_SEQ_PARAMETER_SET_ID, pps->seq_parameter_set_id);
+ rkvdec_set_bw_field(hw_ps->info, ENTROPY_CODING_MODE_FLAG,
+ !!(pps->flags & V4L2_H264_PPS_FLAG_ENTROPY_CODING_MODE));
+ rkvdec_set_bw_field(hw_ps->info, BOTTOM_FIELD_PIC_ORDER_IN_FRAME_PRESENT_FLAG,
+ !!(pps->flags &
+ V4L2_H264_PPS_FLAG_BOTTOM_FIELD_PIC_ORDER_IN_FRAME_PRESENT));
+ rkvdec_set_bw_field(hw_ps->info, NUM_REF_IDX_L_DEFAULT_ACTIVE_MINUS1(0),
+ pps->num_ref_idx_l0_default_active_minus1);
+ rkvdec_set_bw_field(hw_ps->info, NUM_REF_IDX_L_DEFAULT_ACTIVE_MINUS1(1),
+ pps->num_ref_idx_l1_default_active_minus1);
+ rkvdec_set_bw_field(hw_ps->info, WEIGHTED_PRED_FLAG,
+ !!(pps->flags & V4L2_H264_PPS_FLAG_WEIGHTED_PRED));
+ rkvdec_set_bw_field(hw_ps->info, WEIGHTED_BIPRED_IDC, pps->weighted_bipred_idc);
+ rkvdec_set_bw_field(hw_ps->info, PIC_INIT_QP_MINUS26, pps->pic_init_qp_minus26);
+ rkvdec_set_bw_field(hw_ps->info, PIC_INIT_QS_MINUS26, pps->pic_init_qs_minus26);
+ rkvdec_set_bw_field(hw_ps->info, CHROMA_QP_INDEX_OFFSET, pps->chroma_qp_index_offset);
+ rkvdec_set_bw_field(hw_ps->info, DEBLOCKING_FILTER_CONTROL_PRESENT_FLAG,
+ !!(pps->flags & V4L2_H264_PPS_FLAG_DEBLOCKING_FILTER_CONTROL_PRESENT));
+ rkvdec_set_bw_field(hw_ps->info, CONSTRAINED_INTRA_PRED_FLAG,
+ !!(pps->flags & V4L2_H264_PPS_FLAG_CONSTRAINED_INTRA_PRED));
+ rkvdec_set_bw_field(hw_ps->info, REDUNDANT_PIC_CNT_PRESENT,
+ !!(pps->flags & V4L2_H264_PPS_FLAG_REDUNDANT_PIC_CNT_PRESENT));
+ rkvdec_set_bw_field(hw_ps->info, TRANSFORM_8X8_MODE_FLAG,
+ !!(pps->flags & V4L2_H264_PPS_FLAG_TRANSFORM_8X8_MODE));
+ rkvdec_set_bw_field(hw_ps->info, SECOND_CHROMA_QP_INDEX_OFFSET,
+ pps->second_chroma_qp_index_offset);
+ rkvdec_set_bw_field(hw_ps->info, SCALING_LIST_ENABLE_FLAG,
+ !!(pps->flags & V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT));
+
+ for (i = 0; i < ARRAY_SIZE(dec_params->dpb); i++) {
+ rkvdec_set_bw_field(hw_ps->info, TOP_FIELD_ORDER_CNT(i),
+ dpb[i].top_field_order_cnt);
+ rkvdec_set_bw_field(hw_ps->info, BOT_FIELD_ORDER_CNT(i),
+ dpb[i].bottom_field_order_cnt);
+
+ rkvdec_set_bw_field(hw_ps->info, IS_LONG_TERM(i),
+ !!(dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_LONG_TERM));
+ rkvdec_set_bw_field(hw_ps->info, REF_FIELD_FLAGS(i),
+ !!(dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_FIELD));
+ rkvdec_set_bw_field(hw_ps->info, REF_COLMV_USE_FLAG(i),
+ !!(dpb[i].flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE));
+ rkvdec_set_bw_field(hw_ps->info, REF_TOPFIELD_USED(i),
+ !!(dpb[i].fields & V4L2_H264_TOP_FIELD_REF));
+ rkvdec_set_bw_field(hw_ps->info, REF_BOTFIELD_USED(i),
+ !!(dpb[i].fields & V4L2_H264_BOTTOM_FIELD_REF));
+ }
+
+ rkvdec_set_bw_field(hw_ps->info, PIC_FIELD_FLAG,
+ !!(dec_params->flags & V4L2_H264_DECODE_PARAM_FLAG_FIELD_PIC));
+ rkvdec_set_bw_field(hw_ps->info, PIC_ASSOCIATED_FLAG,
+ !!(dec_params->flags & V4L2_H264_DECODE_PARAM_FLAG_BOTTOM_FIELD));
+ rkvdec_set_bw_field(hw_ps->info, CUR_TOP_FIELD, dec_params->top_field_order_cnt);
+ rkvdec_set_bw_field(hw_ps->info, CUR_BOT_FIELD, dec_params->bottom_field_order_cnt);
}
static void rkvdec_write_regs(struct rkvdec_ctx *ctx)
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-hevc.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-hevc.c
index 96d938ee70b0..c818a92f1e63 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-hevc.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-hevc.c
@@ -13,149 +13,106 @@
#include "rkvdec-rcb.h"
#include "rkvdec-hevc-common.h"
#include "rkvdec-vdpu383-regs.h"
+#include "rkvdec-bitwriter.h"
+
+#define VIDEO_PARAMETER_SET_ID BW_FIELD(0, 4)
+#define SEQ_PARAMETER_SET_ID BW_FIELD(4, 4)
+#define CHROMA_FORMAT_IDC BW_FIELD(8, 2)
+#define PIC_WIDTH_IN_LUMA_SAMPLES BW_FIELD(10, 16)
+#define PIC_HEIGHT_IN_LUMA_SAMPLES BW_FIELD(26, 16)
+#define BIT_DEPTH_LUMA BW_FIELD(42, 3)
+#define BIT_DEPTH_CHROMA BW_FIELD(45, 3)
+#define LOG2_MAX_PIC_ORDER_CNT_LSB BW_FIELD(48, 5)
+#define LOG2_DIFF_MAX_MIN_LUMA_CODING_BLOCK_SIZE BW_FIELD(53, 2)
+#define LOG2_MIN_LUMA_CODING_BLOCK_SIZE BW_FIELD(55, 3)
+#define LOG2_MIN_TRANSFORM_BLOCK_SIZE BW_FIELD(58, 3)
+#define LOG2_DIFF_MAX_MIN_LUMA_TRANSFORM_BLOCK_SIZE BW_FIELD(61, 2)
+#define MAX_TRANSFORM_HIERARCHY_DEPTH_INTER BW_FIELD(63, 3)
+#define MAX_TRANSFORM_HIERARCHY_DEPTH_INTRA BW_FIELD(66, 3)
+#define SCALING_LIST_ENABLED_FLAG BW_FIELD(69, 1)
+#define AMP_ENABLED_FLAG BW_FIELD(70, 1)
+#define SAMPLE_ADAPTIVE_OFFSET_ENABLED_FLAG BW_FIELD(71, 1)
+#define PCM_ENABLED_FLAG BW_FIELD(72, 1)
+#define PCM_SAMPLE_BIT_DEPTH_LUMA BW_FIELD(73, 4)
+#define PCM_SAMPLE_BIT_DEPTH_CHROMA BW_FIELD(77, 4)
+#define PCM_LOOP_FILTER_DISABLED_FLAG BW_FIELD(81, 1)
+#define LOG2_DIFF_MAX_MIN_PCM_LUMA_CODING_BLOCK_SIZE BW_FIELD(82, 3)
+#define LOG2_MIN_PCM_LUMA_CODING_BLOCK_SIZE BW_FIELD(85, 3)
+#define NUM_SHORT_TERM_REF_PIC_SETS BW_FIELD(88, 7)
+#define LONG_TERM_REF_PICS_PRESENT_FLAG BW_FIELD(95, 1)
+#define NUM_LONG_TERM_REF_PICS_SPS BW_FIELD(96, 6)
+#define SPS_TEMPORAL_MVP_ENABLED_FLAG BW_FIELD(102, 1)
+#define STRONG_INTRA_SMOOTHING_ENABLED_FLAG BW_FIELD(103, 1)
+#define SPS_MAX_DEC_PIC_BUFFERING_MINUS1 BW_FIELD(111, 4)
+#define SEPARATE_COLOUR_PLANE_FLAG BW_FIELD(115, 1)
+#define HIGH_PRECISION_OFFSETS_ENABLED_FLAG BW_FIELD(116, 1)
+#define PERSISTENT_RICE_ADAPTATION_ENABLED_FLAG BW_FIELD(117, 1)
+
+/* PPS */
+#define PIC_PARAMETER_SET_ID BW_FIELD(118, 6)
+#define PPS_SEQ_PARAMETER_SET_ID BW_FIELD(124, 4)
+#define DEPENDENT_SLICE_SEGMENTS_ENABLED_FLAG BW_FIELD(128, 1)
+#define OUTPUT_FLAG_PRESENT_FLAG BW_FIELD(129, 1)
+#define NUM_EXTRA_SLICE_HEADER_BITS BW_FIELD(130, 13)
+#define SIGN_DATA_HIDING_ENABLED_FLAG BW_FIELD(143, 1)
+#define CABAC_INIT_PRESENT_FLAG BW_FIELD(144, 1)
+#define NUM_REF_IDX_L0_DEFAULT_ACTIVE BW_FIELD(145, 4)
+#define NUM_REF_IDX_L1_DEFAULT_ACTIVE BW_FIELD(149, 4)
+#define INIT_QP_MINUS26 BW_FIELD(153, 7)
+#define CONSTRAINED_INTRA_PRED_FLAG BW_FIELD(160, 1)
+#define TRANSFORM_SKIP_ENABLED_FLAG BW_FIELD(161, 1)
+#define CU_QP_DELTA_ENABLED_FLAG BW_FIELD(162, 1)
+#define LOG2_MIN_CU_QP_DELTA_SIZE BW_FIELD(163, 3)
+#define PPS_CB_QP_OFFSET BW_FIELD(166, 5)
+#define PPS_CR_QP_OFFSET BW_FIELD(171, 5)
+#define PPS_SLICE_CHROMA_QP_OFFSETS_PRESENT_FLAG BW_FIELD(176, 1)
+#define WEIGHTED_PRED_FLAG BW_FIELD(177, 1)
+#define WEIGHTED_BIPRED_FLAG BW_FIELD(178, 1)
+#define TRANSQUANT_BYPASS_ENABLED_FLAG BW_FIELD(179, 1)
+#define TILES_ENABLED_FLAG BW_FIELD(180, 1)
+#define ENTROPY_CODING_SYNC_ENABLED_FLAG BW_FIELD(181, 1)
+#define PPS_LOOP_FILTER_ACROSS_SLICES_ENABLED_FLAG BW_FIELD(182, 1)
+#define LOOP_FILTER_ACROSS_TILES_ENABLED_FLAG BW_FIELD(183, 1)
+#define DEBLOCKING_FILTER_OVERRIDE_ENABLED_FLAG BW_FIELD(184, 1)
+#define PPS_DEBLOCKING_FILTER_DISABLED_FLAG BW_FIELD(185, 1)
+#define PPS_BETA_OFFSET_DIV2 BW_FIELD(186, 4)
+#define PPS_TC_OFFSET_DIV2 BW_FIELD(190, 4)
+#define LISTS_MODIFICATION_PRESENT_FLAG BW_FIELD(194, 1)
+#define LOG2_PARALLEL_MERGE_LEVEL BW_FIELD(195, 3)
+#define SLICE_SEGMENT_HEADER_EXTENSION_PRESENT_FLAG BW_FIELD(198, 1)
+
+/* pps extensions */
+#define LOG2_MAX_TRANSFORM_SKIP_BLOCK_SIZE BW_FIELD(202, 2)
+#define CROSS_COMPONENT_PREDICTION_ENABLED_FLAG BW_FIELD(204, 1)
+#define CHROMA_QP_OFFSET_LIST_ENABLED_FLAG BW_FIELD(205, 1)
+#define LOG2_MIN_CU_CHROMA_QP_DELTA_SIZE BW_FIELD(206, 3)
+#define CB_QP_OFFSET_LIST(i) BW_FIELD(209 + (i) * 5, 5) // i: 0-5
+#define CB_CR_OFFSET_LIST(i) BW_FIELD(239 + (i) * 5, 5) // i: 0-5
+#define CHROMA_QP_OFFSET_LIST_LEN_MINUS1 BW_FIELD(269, 3)
+
+/* mvc0 && mvc1 */
+#define MVC_FF BW_FIELD(272, 16)
+#define MVC_00 BW_FIELD(288, 9)
+
+/* poc info */
+#define RESERVED2 BW_FIELD(297, 3)
+#define CURRENT_POC BW_FIELD(300, 32)
+#define REF_PIC_POC(i) BW_FIELD(332 + (i) * 32, 32) // i: 0-14
+#define RESERVED3 BW_FIELD(812, 32)
+#define REF_IS_VALID(i) BW_FIELD(844 + (i), 1) // i: 0-14
+#define RESERVED4 BW_FIELD(859, 1)
+
+/* tile info*/
+#define NUM_TILE_COLUMNS BW_FIELD(860, 5)
+#define NUM_TILE_ROWS BW_FIELD(865, 5)
+#define COLUMN_WIDTH(i) BW_FIELD(870 + (i) * 12, 12) // i: 0-19
+#define ROW_HEIGHT(i) BW_FIELD(1110 + (i) * 12, 12) // i: 0-21
+
+#define HEVC_SPS_SIZE ALIGN(ROW_HEIGHT(22).offset, 256)
struct rkvdec_hevc_sps_pps {
- // SPS
- u16 video_parameters_set_id : 4;
- u16 seq_parameters_set_id_sps : 4;
- u16 chroma_format_idc : 2;
- u16 width : 16;
- u16 height : 16;
- u16 bit_depth_luma : 3;
- u16 bit_depth_chroma : 3;
- u16 max_pic_order_count_lsb : 5;
- u16 diff_max_min_luma_coding_block_size : 2;
- u16 min_luma_coding_block_size : 3;
- u16 min_transform_block_size : 3;
- u16 diff_max_min_transform_block_size : 2;
- u16 max_transform_hierarchy_depth_inter : 3;
- u16 max_transform_hierarchy_depth_intra : 3;
- u16 scaling_list_enabled_flag : 1;
- u16 amp_enabled_flag : 1;
- u16 sample_adaptive_offset_enabled_flag : 1;
- u16 pcm_enabled_flag : 1;
- u16 pcm_sample_bit_depth_luma : 4;
- u16 pcm_sample_bit_depth_chroma : 4;
- u16 pcm_loop_filter_disabled_flag : 1;
- u16 diff_max_min_pcm_luma_coding_block_size : 3;
- u16 min_pcm_luma_coding_block_size : 3;
- u16 num_short_term_ref_pic_sets : 7;
- u16 long_term_ref_pics_present_flag : 1;
- u16 num_long_term_ref_pics_sps : 6;
- u16 sps_temporal_mvp_enabled_flag : 1;
- u16 strong_intra_smoothing_enabled_flag : 1;
- u16 reserved0 : 7;
- u16 sps_max_dec_pic_buffering_minus1 : 4;
- u16 separate_colour_plane_flag : 1;
- u16 high_precision_offsets_enabled_flag : 1;
- u16 persistent_rice_adaptation_enabled_flag : 1;
-
- // PPS
- u16 picture_parameters_set_id : 6;
- u16 seq_parameters_set_id_pps : 4;
- u16 dependent_slice_segments_enabled_flag : 1;
- u16 output_flag_present_flag : 1;
- u16 num_extra_slice_header_bits : 13;
- u16 sign_data_hiding_enabled_flag : 1;
- u16 cabac_init_present_flag : 1;
- u16 num_ref_idx_l0_default_active : 4;
- u16 num_ref_idx_l1_default_active : 4;
- u16 init_qp_minus26 : 7;
- u16 constrained_intra_pred_flag : 1;
- u16 transform_skip_enabled_flag : 1;
- u16 cu_qp_delta_enabled_flag : 1;
- u16 log2_min_cb_size : 3;
- u16 pps_cb_qp_offset : 5;
- u16 pps_cr_qp_offset : 5;
- u16 pps_slice_chroma_qp_offsets_present_flag : 1;
- u16 weighted_pred_flag : 1;
- u16 weighted_bipred_flag : 1;
- u16 transquant_bypass_enabled_flag : 1;
- u16 tiles_enabled_flag : 1;
- u16 entropy_coding_sync_enabled_flag : 1;
- u16 pps_loop_filter_across_slices_enabled_flag : 1;
- u16 loop_filter_across_tiles_enabled_flag : 1;
- u16 deblocking_filter_override_enabled_flag : 1;
- u16 pps_deblocking_filter_disabled_flag : 1;
- u16 pps_beta_offset_div2 : 4;
- u16 pps_tc_offset_div2 : 4;
- u16 lists_modification_present_flag : 1;
- u16 log2_parallel_merge_level : 3;
- u16 slice_segment_header_extension_present_flag : 1;
- u16 reserved1 : 3;
-
- // pps extensions
- u16 log2_max_transform_skip_block_size : 2;
- u16 cross_component_prediction_enabled_flag : 1;
- u16 chroma_qp_offset_list_enabled_flag : 1;
- u16 log2_min_cu_chroma_qp_delta_size : 3;
- u16 cb_qp_offset_list0 : 5;
- u16 cb_qp_offset_list1 : 5;
- u16 cb_qp_offset_list2 : 5;
- u16 cb_qp_offset_list3 : 5;
- u16 cb_qp_offset_list4 : 5;
- u16 cb_qp_offset_list5 : 5;
- u16 cb_cr_offset_list0 : 5;
- u16 cb_cr_offset_list1 : 5;
- u16 cb_cr_offset_list2 : 5;
- u16 cb_cr_offset_list3 : 5;
- u16 cb_cr_offset_list4 : 5;
- u16 cb_cr_offset_list5 : 5;
- u16 chroma_qp_offset_list_len_minus1 : 3;
-
- /* mvc0 && mvc1 */
- u16 mvc_ff : 16;
- u16 mvc_00 : 9;
-
- /* poc info */
- u16 reserved2 : 3;
- u32 current_poc : 32;
- u32 ref_pic_poc0 : 32;
- u32 ref_pic_poc1 : 32;
- u32 ref_pic_poc2 : 32;
- u32 ref_pic_poc3 : 32;
- u32 ref_pic_poc4 : 32;
- u32 ref_pic_poc5 : 32;
- u32 ref_pic_poc6 : 32;
- u32 ref_pic_poc7 : 32;
- u32 ref_pic_poc8 : 32;
- u32 ref_pic_poc9 : 32;
- u32 ref_pic_poc10 : 32;
- u32 ref_pic_poc11 : 32;
- u32 ref_pic_poc12 : 32;
- u32 ref_pic_poc13 : 32;
- u32 ref_pic_poc14 : 32;
- u32 reserved3 : 32;
- u32 ref_is_valid : 15;
- u32 reserved4 : 1;
-
- /* tile info*/
- u16 num_tile_columns : 5;
- u16 num_tile_rows : 5;
- u32 column_width0 : 24;
- u32 column_width1 : 24;
- u32 column_width2 : 24;
- u32 column_width3 : 24;
- u32 column_width4 : 24;
- u32 column_width5 : 24;
- u32 column_width6 : 24;
- u32 column_width7 : 24;
- u32 column_width8 : 24;
- u32 column_width9 : 24;
- u32 row_height0 : 24;
- u32 row_height1 : 24;
- u32 row_height2 : 24;
- u32 row_height3 : 24;
- u32 row_height4 : 24;
- u32 row_height5 : 24;
- u32 row_height6 : 24;
- u32 row_height7 : 24;
- u32 row_height8 : 24;
- u32 row_height9 : 24;
- u32 row_height10 : 24;
- u32 reserved5 : 2;
- u32 padding;
-} __packed;
+ u32 info[HEVC_SPS_SIZE / 8 / 4];
+};
struct rkvdec_hevc_priv_tbl {
struct rkvdec_hevc_sps_pps param_set;
@@ -171,51 +128,6 @@ struct rkvdec_hevc_ctx {
struct vdpu383_regs_h26x regs;
};
-static void set_column_row(struct rkvdec_hevc_sps_pps *hw_ps, u16 *column, u16 *row)
-{
- hw_ps->column_width0 = column[0] | (column[1] << 12);
- hw_ps->row_height0 = row[0] | (row[1] << 12);
- hw_ps->column_width1 = column[2] | (column[3] << 12);
- hw_ps->row_height1 = row[2] | (row[3] << 12);
- hw_ps->column_width2 = column[4] | (column[5] << 12);
- hw_ps->row_height2 = row[4] | (row[5] << 12);
- hw_ps->column_width3 = column[6] | (column[7] << 12);
- hw_ps->row_height3 = row[6] | (row[7] << 12);
- hw_ps->column_width4 = column[8] | (column[9] << 12);
- hw_ps->row_height4 = row[8] | (row[9] << 12);
- hw_ps->column_width5 = column[10] | (column[11] << 12);
- hw_ps->row_height5 = row[10] | (row[11] << 12);
- hw_ps->column_width6 = column[12] | (column[13] << 12);
- hw_ps->row_height6 = row[12] | (row[13] << 12);
- hw_ps->column_width7 = column[14] | (column[15] << 12);
- hw_ps->row_height7 = row[14] | (row[15] << 12);
- hw_ps->column_width8 = column[16] | (column[17] << 12);
- hw_ps->row_height8 = row[16] | (row[17] << 12);
- hw_ps->column_width9 = column[18] | (column[19] << 12);
- hw_ps->row_height9 = row[18] | (row[19] << 12);
-
- hw_ps->row_height10 = row[20] | (row[21] << 12);
-}
-
-static void set_pps_ref_pic_poc(struct rkvdec_hevc_sps_pps *hw_ps, const struct v4l2_hevc_dpb_entry *dpb)
-{
- hw_ps->ref_pic_poc0 = dpb[0].pic_order_cnt_val;
- hw_ps->ref_pic_poc1 = dpb[1].pic_order_cnt_val;
- hw_ps->ref_pic_poc2 = dpb[2].pic_order_cnt_val;
- hw_ps->ref_pic_poc3 = dpb[3].pic_order_cnt_val;
- hw_ps->ref_pic_poc4 = dpb[4].pic_order_cnt_val;
- hw_ps->ref_pic_poc5 = dpb[5].pic_order_cnt_val;
- hw_ps->ref_pic_poc6 = dpb[6].pic_order_cnt_val;
- hw_ps->ref_pic_poc7 = dpb[7].pic_order_cnt_val;
- hw_ps->ref_pic_poc8 = dpb[8].pic_order_cnt_val;
- hw_ps->ref_pic_poc9 = dpb[9].pic_order_cnt_val;
- hw_ps->ref_pic_poc10 = dpb[10].pic_order_cnt_val;
- hw_ps->ref_pic_poc11 = dpb[11].pic_order_cnt_val;
- hw_ps->ref_pic_poc12 = dpb[12].pic_order_cnt_val;
- hw_ps->ref_pic_poc13 = dpb[13].pic_order_cnt_val;
- hw_ps->ref_pic_poc14 = dpb[14].pic_order_cnt_val;
-}
-
static void assemble_hw_pps(struct rkvdec_ctx *ctx,
struct rkvdec_hevc_run *run)
{
@@ -245,104 +157,130 @@ static void assemble_hw_pps(struct rkvdec_ctx *ctx,
memset(hw_ps, 0, sizeof(*hw_ps));
/* write sps */
- hw_ps->video_parameters_set_id = sps->video_parameter_set_id;
- hw_ps->seq_parameters_set_id_sps = sps->seq_parameter_set_id;
- hw_ps->chroma_format_idc = sps->chroma_format_idc;
+ rkvdec_set_bw_field(hw_ps->info, VIDEO_PARAMETER_SET_ID, sps->video_parameter_set_id);
+ rkvdec_set_bw_field(hw_ps->info, SEQ_PARAMETER_SET_ID, sps->seq_parameter_set_id);
+ rkvdec_set_bw_field(hw_ps->info, CHROMA_FORMAT_IDC, sps->chroma_format_idc);
log2_min_cb_size = sps->log2_min_luma_coding_block_size_minus3 + 3;
width = sps->pic_width_in_luma_samples;
height = sps->pic_height_in_luma_samples;
- hw_ps->width = width;
- hw_ps->height = height;
- hw_ps->bit_depth_luma = sps->bit_depth_luma_minus8 + 8;
- hw_ps->bit_depth_chroma = sps->bit_depth_chroma_minus8 + 8;
- hw_ps->max_pic_order_count_lsb = sps->log2_max_pic_order_cnt_lsb_minus4 + 4;
- hw_ps->diff_max_min_luma_coding_block_size = sps->log2_diff_max_min_luma_coding_block_size;
- hw_ps->min_luma_coding_block_size = sps->log2_min_luma_coding_block_size_minus3 + 3;
- hw_ps->min_transform_block_size = sps->log2_min_luma_transform_block_size_minus2 + 2;
- hw_ps->diff_max_min_transform_block_size =
- sps->log2_diff_max_min_luma_transform_block_size;
- hw_ps->max_transform_hierarchy_depth_inter = sps->max_transform_hierarchy_depth_inter;
- hw_ps->max_transform_hierarchy_depth_intra = sps->max_transform_hierarchy_depth_intra;
- hw_ps->scaling_list_enabled_flag =
- !!(sps->flags & V4L2_HEVC_SPS_FLAG_SCALING_LIST_ENABLED);
- hw_ps->amp_enabled_flag = !!(sps->flags & V4L2_HEVC_SPS_FLAG_AMP_ENABLED);
- hw_ps->sample_adaptive_offset_enabled_flag =
- !!(sps->flags & V4L2_HEVC_SPS_FLAG_SAMPLE_ADAPTIVE_OFFSET);
+
+ rkvdec_set_bw_field(hw_ps->info, PIC_WIDTH_IN_LUMA_SAMPLES, width);
+ rkvdec_set_bw_field(hw_ps->info, PIC_HEIGHT_IN_LUMA_SAMPLES, height);
+ rkvdec_set_bw_field(hw_ps->info, BIT_DEPTH_LUMA, sps->bit_depth_luma_minus8 + 8);
+ rkvdec_set_bw_field(hw_ps->info, BIT_DEPTH_CHROMA, sps->bit_depth_chroma_minus8 + 8);
+ rkvdec_set_bw_field(hw_ps->info, LOG2_MAX_PIC_ORDER_CNT_LSB,
+ sps->log2_max_pic_order_cnt_lsb_minus4 + 4);
+ rkvdec_set_bw_field(hw_ps->info, LOG2_DIFF_MAX_MIN_LUMA_CODING_BLOCK_SIZE,
+ sps->log2_diff_max_min_luma_coding_block_size);
+ rkvdec_set_bw_field(hw_ps->info, LOG2_MIN_LUMA_CODING_BLOCK_SIZE,
+ sps->log2_min_luma_coding_block_size_minus3 + 3);
+ rkvdec_set_bw_field(hw_ps->info, LOG2_MIN_TRANSFORM_BLOCK_SIZE,
+ sps->log2_min_luma_transform_block_size_minus2 + 2);
+ rkvdec_set_bw_field(hw_ps->info, LOG2_DIFF_MAX_MIN_LUMA_TRANSFORM_BLOCK_SIZE,
+ sps->log2_diff_max_min_luma_transform_block_size);
+ rkvdec_set_bw_field(hw_ps->info, MAX_TRANSFORM_HIERARCHY_DEPTH_INTER,
+ sps->max_transform_hierarchy_depth_inter);
+ rkvdec_set_bw_field(hw_ps->info, MAX_TRANSFORM_HIERARCHY_DEPTH_INTRA,
+ sps->max_transform_hierarchy_depth_intra);
+ rkvdec_set_bw_field(hw_ps->info, SCALING_LIST_ENABLED_FLAG,
+ !!(sps->flags & V4L2_HEVC_SPS_FLAG_SCALING_LIST_ENABLED));
+ rkvdec_set_bw_field(hw_ps->info, AMP_ENABLED_FLAG,
+ !!(sps->flags & V4L2_HEVC_SPS_FLAG_AMP_ENABLED));
+ rkvdec_set_bw_field(hw_ps->info, SAMPLE_ADAPTIVE_OFFSET_ENABLED_FLAG,
+ !!(sps->flags & V4L2_HEVC_SPS_FLAG_SAMPLE_ADAPTIVE_OFFSET));
pcm_enabled = !!(sps->flags & V4L2_HEVC_SPS_FLAG_PCM_ENABLED);
- hw_ps->pcm_enabled_flag = pcm_enabled;
- hw_ps->pcm_sample_bit_depth_luma =
- pcm_enabled ? sps->pcm_sample_bit_depth_luma_minus1 + 1 : 0;
- hw_ps->pcm_sample_bit_depth_chroma =
- pcm_enabled ? sps->pcm_sample_bit_depth_chroma_minus1 + 1 : 0;
- hw_ps->pcm_loop_filter_disabled_flag =
- !!(sps->flags & V4L2_HEVC_SPS_FLAG_PCM_LOOP_FILTER_DISABLED);
- hw_ps->diff_max_min_pcm_luma_coding_block_size =
- sps->log2_diff_max_min_pcm_luma_coding_block_size;
- hw_ps->min_pcm_luma_coding_block_size =
- pcm_enabled ? sps->log2_min_pcm_luma_coding_block_size_minus3 + 3 : 0;
- hw_ps->num_short_term_ref_pic_sets = sps->num_short_term_ref_pic_sets;
- hw_ps->long_term_ref_pics_present_flag =
- !!(sps->flags & V4L2_HEVC_SPS_FLAG_LONG_TERM_REF_PICS_PRESENT);
- hw_ps->num_long_term_ref_pics_sps = sps->num_long_term_ref_pics_sps;
- hw_ps->sps_temporal_mvp_enabled_flag =
- !!(sps->flags & V4L2_HEVC_SPS_FLAG_SPS_TEMPORAL_MVP_ENABLED);
- hw_ps->strong_intra_smoothing_enabled_flag =
- !!(sps->flags & V4L2_HEVC_SPS_FLAG_STRONG_INTRA_SMOOTHING_ENABLED);
- hw_ps->sps_max_dec_pic_buffering_minus1 = sps->sps_max_dec_pic_buffering_minus1;
+ rkvdec_set_bw_field(hw_ps->info, PCM_ENABLED_FLAG, pcm_enabled);
+ rkvdec_set_bw_field(hw_ps->info, PCM_SAMPLE_BIT_DEPTH_LUMA,
+ pcm_enabled ? sps->pcm_sample_bit_depth_luma_minus1 + 1 : 0);
+ rkvdec_set_bw_field(hw_ps->info, PCM_SAMPLE_BIT_DEPTH_CHROMA,
+ pcm_enabled ? sps->pcm_sample_bit_depth_chroma_minus1 + 1 : 0);
+ rkvdec_set_bw_field(hw_ps->info, PCM_LOOP_FILTER_DISABLED_FLAG,
+ !!(sps->flags & V4L2_HEVC_SPS_FLAG_PCM_LOOP_FILTER_DISABLED));
+ rkvdec_set_bw_field(hw_ps->info, LOG2_DIFF_MAX_MIN_PCM_LUMA_CODING_BLOCK_SIZE,
+ sps->log2_diff_max_min_pcm_luma_coding_block_size);
+ rkvdec_set_bw_field(hw_ps->info, LOG2_MIN_PCM_LUMA_CODING_BLOCK_SIZE,
+ pcm_enabled ? sps->log2_min_pcm_luma_coding_block_size_minus3 + 3 : 0);
+ rkvdec_set_bw_field(hw_ps->info, NUM_SHORT_TERM_REF_PIC_SETS,
+ sps->num_short_term_ref_pic_sets);
+ rkvdec_set_bw_field(hw_ps->info, LONG_TERM_REF_PICS_PRESENT_FLAG,
+ !!(sps->flags & V4L2_HEVC_SPS_FLAG_LONG_TERM_REF_PICS_PRESENT));
+ rkvdec_set_bw_field(hw_ps->info, NUM_LONG_TERM_REF_PICS_SPS,
+ sps->num_long_term_ref_pics_sps);
+ rkvdec_set_bw_field(hw_ps->info, SPS_TEMPORAL_MVP_ENABLED_FLAG,
+ !!(sps->flags & V4L2_HEVC_SPS_FLAG_SPS_TEMPORAL_MVP_ENABLED));
+ rkvdec_set_bw_field(hw_ps->info, STRONG_INTRA_SMOOTHING_ENABLED_FLAG,
+ !!(sps->flags & V4L2_HEVC_SPS_FLAG_STRONG_INTRA_SMOOTHING_ENABLED));
+ rkvdec_set_bw_field(hw_ps->info, SPS_MAX_DEC_PIC_BUFFERING_MINUS1,
+ sps->sps_max_dec_pic_buffering_minus1);
/* write pps */
- hw_ps->picture_parameters_set_id = pps->pic_parameter_set_id;
- hw_ps->seq_parameters_set_id_pps = sps->seq_parameter_set_id;
- hw_ps->dependent_slice_segments_enabled_flag =
- !!(pps->flags & V4L2_HEVC_PPS_FLAG_DEPENDENT_SLICE_SEGMENT_ENABLED);
- hw_ps->output_flag_present_flag = !!(pps->flags & V4L2_HEVC_PPS_FLAG_OUTPUT_FLAG_PRESENT);
- hw_ps->num_extra_slice_header_bits = pps->num_extra_slice_header_bits;
- hw_ps->sign_data_hiding_enabled_flag =
- !!(pps->flags & V4L2_HEVC_PPS_FLAG_SIGN_DATA_HIDING_ENABLED);
- hw_ps->cabac_init_present_flag = !!(pps->flags & V4L2_HEVC_PPS_FLAG_CABAC_INIT_PRESENT);
- hw_ps->num_ref_idx_l0_default_active = pps->num_ref_idx_l0_default_active_minus1 + 1;
- hw_ps->num_ref_idx_l1_default_active = pps->num_ref_idx_l1_default_active_minus1 + 1;
- hw_ps->init_qp_minus26 = pps->init_qp_minus26;
- hw_ps->constrained_intra_pred_flag =
- !!(pps->flags & V4L2_HEVC_PPS_FLAG_CONSTRAINED_INTRA_PRED);
- hw_ps->transform_skip_enabled_flag =
- !!(pps->flags & V4L2_HEVC_PPS_FLAG_TRANSFORM_SKIP_ENABLED);
- hw_ps->cu_qp_delta_enabled_flag = !!(pps->flags & V4L2_HEVC_PPS_FLAG_CU_QP_DELTA_ENABLED);
- hw_ps->log2_min_cb_size = log2_min_cb_size +
- sps->log2_diff_max_min_luma_coding_block_size -
- pps->diff_cu_qp_delta_depth;
- hw_ps->pps_cb_qp_offset = pps->pps_cb_qp_offset;
- hw_ps->pps_cr_qp_offset = pps->pps_cr_qp_offset;
- hw_ps->pps_slice_chroma_qp_offsets_present_flag =
- !!(pps->flags & V4L2_HEVC_PPS_FLAG_PPS_SLICE_CHROMA_QP_OFFSETS_PRESENT);
- hw_ps->weighted_pred_flag = !!(pps->flags & V4L2_HEVC_PPS_FLAG_WEIGHTED_PRED);
- hw_ps->weighted_bipred_flag = !!(pps->flags & V4L2_HEVC_PPS_FLAG_WEIGHTED_BIPRED);
- hw_ps->transquant_bypass_enabled_flag =
- !!(pps->flags & V4L2_HEVC_PPS_FLAG_TRANSQUANT_BYPASS_ENABLED);
+ rkvdec_set_bw_field(hw_ps->info, PIC_PARAMETER_SET_ID, pps->pic_parameter_set_id);
+ rkvdec_set_bw_field(hw_ps->info, SEQ_PARAMETER_SET_ID, sps->seq_parameter_set_id);
+ rkvdec_set_bw_field(hw_ps->info, DEPENDENT_SLICE_SEGMENTS_ENABLED_FLAG,
+ !!(pps->flags & V4L2_HEVC_PPS_FLAG_DEPENDENT_SLICE_SEGMENT_ENABLED));
+ rkvdec_set_bw_field(hw_ps->info, OUTPUT_FLAG_PRESENT_FLAG,
+ !!(pps->flags & V4L2_HEVC_PPS_FLAG_OUTPUT_FLAG_PRESENT));
+ rkvdec_set_bw_field(hw_ps->info, NUM_EXTRA_SLICE_HEADER_BITS,
+ pps->num_extra_slice_header_bits);
+ rkvdec_set_bw_field(hw_ps->info, SIGN_DATA_HIDING_ENABLED_FLAG,
+ !!(pps->flags & V4L2_HEVC_PPS_FLAG_SIGN_DATA_HIDING_ENABLED));
+ rkvdec_set_bw_field(hw_ps->info, CABAC_INIT_PRESENT_FLAG,
+ !!(pps->flags & V4L2_HEVC_PPS_FLAG_CABAC_INIT_PRESENT));
+ rkvdec_set_bw_field(hw_ps->info, NUM_REF_IDX_L0_DEFAULT_ACTIVE,
+ pps->num_ref_idx_l0_default_active_minus1 + 1);
+ rkvdec_set_bw_field(hw_ps->info, NUM_REF_IDX_L1_DEFAULT_ACTIVE,
+ pps->num_ref_idx_l1_default_active_minus1 + 1);
+ rkvdec_set_bw_field(hw_ps->info, INIT_QP_MINUS26, pps->init_qp_minus26);
+ rkvdec_set_bw_field(hw_ps->info, CONSTRAINED_INTRA_PRED_FLAG,
+ !!(pps->flags & V4L2_HEVC_PPS_FLAG_CONSTRAINED_INTRA_PRED));
+ rkvdec_set_bw_field(hw_ps->info, TRANSFORM_SKIP_ENABLED_FLAG,
+ !!(pps->flags & V4L2_HEVC_PPS_FLAG_TRANSFORM_SKIP_ENABLED));
+ rkvdec_set_bw_field(hw_ps->info, CU_QP_DELTA_ENABLED_FLAG,
+ !!(pps->flags & V4L2_HEVC_PPS_FLAG_CU_QP_DELTA_ENABLED));
+ rkvdec_set_bw_field(hw_ps->info, LOG2_MIN_CU_QP_DELTA_SIZE, log2_min_cb_size +
+ sps->log2_diff_max_min_luma_coding_block_size -
+ pps->diff_cu_qp_delta_depth);
+ rkvdec_set_bw_field(hw_ps->info, PPS_CB_QP_OFFSET, pps->pps_cb_qp_offset);
+ rkvdec_set_bw_field(hw_ps->info, PPS_CR_QP_OFFSET, pps->pps_cr_qp_offset);
+ rkvdec_set_bw_field(hw_ps->info, PPS_SLICE_CHROMA_QP_OFFSETS_PRESENT_FLAG,
+ !!(pps->flags &
+ V4L2_HEVC_PPS_FLAG_PPS_SLICE_CHROMA_QP_OFFSETS_PRESENT));
+ rkvdec_set_bw_field(hw_ps->info, WEIGHTED_PRED_FLAG,
+ !!(pps->flags & V4L2_HEVC_PPS_FLAG_WEIGHTED_PRED));
+ rkvdec_set_bw_field(hw_ps->info, WEIGHTED_BIPRED_FLAG,
+ !!(pps->flags & V4L2_HEVC_PPS_FLAG_WEIGHTED_BIPRED));
+ rkvdec_set_bw_field(hw_ps->info, TRANSQUANT_BYPASS_ENABLED_FLAG,
+ !!(pps->flags & V4L2_HEVC_PPS_FLAG_TRANSQUANT_BYPASS_ENABLED));
tiles_enabled = !!(pps->flags & V4L2_HEVC_PPS_FLAG_TILES_ENABLED);
- hw_ps->tiles_enabled_flag = tiles_enabled;
- hw_ps->entropy_coding_sync_enabled_flag =
- !!(pps->flags & V4L2_HEVC_PPS_FLAG_ENTROPY_CODING_SYNC_ENABLED);
- hw_ps->pps_loop_filter_across_slices_enabled_flag =
- !!(pps->flags & V4L2_HEVC_PPS_FLAG_PPS_LOOP_FILTER_ACROSS_SLICES_ENABLED);
- hw_ps->loop_filter_across_tiles_enabled_flag =
- !!(pps->flags & V4L2_HEVC_PPS_FLAG_LOOP_FILTER_ACROSS_TILES_ENABLED);
- hw_ps->deblocking_filter_override_enabled_flag =
- !!(pps->flags & V4L2_HEVC_PPS_FLAG_DEBLOCKING_FILTER_OVERRIDE_ENABLED);
- hw_ps->pps_deblocking_filter_disabled_flag =
- !!(pps->flags & V4L2_HEVC_PPS_FLAG_PPS_DISABLE_DEBLOCKING_FILTER);
- hw_ps->pps_beta_offset_div2 = pps->pps_beta_offset_div2;
- hw_ps->pps_tc_offset_div2 = pps->pps_tc_offset_div2;
- hw_ps->lists_modification_present_flag =
- !!(pps->flags & V4L2_HEVC_PPS_FLAG_LISTS_MODIFICATION_PRESENT);
- hw_ps->log2_parallel_merge_level = pps->log2_parallel_merge_level_minus2 + 2;
- hw_ps->slice_segment_header_extension_present_flag =
- !!(pps->flags & V4L2_HEVC_PPS_FLAG_SLICE_SEGMENT_HEADER_EXTENSION_PRESENT);
- hw_ps->num_tile_columns = tiles_enabled ? pps->num_tile_columns_minus1 + 1 : 1;
- hw_ps->num_tile_rows = tiles_enabled ? pps->num_tile_rows_minus1 + 1 : 1;
- hw_ps->mvc_ff = 0xffff;
+ rkvdec_set_bw_field(hw_ps->info, TILES_ENABLED_FLAG, tiles_enabled);
+ rkvdec_set_bw_field(hw_ps->info, ENTROPY_CODING_SYNC_ENABLED_FLAG,
+ !!(pps->flags & V4L2_HEVC_PPS_FLAG_ENTROPY_CODING_SYNC_ENABLED));
+ rkvdec_set_bw_field(hw_ps->info, PPS_LOOP_FILTER_ACROSS_SLICES_ENABLED_FLAG,
+ !!(pps->flags &
+ V4L2_HEVC_PPS_FLAG_PPS_LOOP_FILTER_ACROSS_SLICES_ENABLED));
+ rkvdec_set_bw_field(hw_ps->info, LOOP_FILTER_ACROSS_TILES_ENABLED_FLAG,
+ !!(pps->flags & V4L2_HEVC_PPS_FLAG_LOOP_FILTER_ACROSS_TILES_ENABLED));
+ rkvdec_set_bw_field(hw_ps->info, DEBLOCKING_FILTER_OVERRIDE_ENABLED_FLAG,
+ !!(pps->flags &
+ V4L2_HEVC_PPS_FLAG_DEBLOCKING_FILTER_OVERRIDE_ENABLED));
+ rkvdec_set_bw_field(hw_ps->info, PPS_DEBLOCKING_FILTER_DISABLED_FLAG,
+ !!(pps->flags & V4L2_HEVC_PPS_FLAG_PPS_DISABLE_DEBLOCKING_FILTER));
+ rkvdec_set_bw_field(hw_ps->info, PPS_BETA_OFFSET_DIV2, pps->pps_beta_offset_div2);
+ rkvdec_set_bw_field(hw_ps->info, PPS_TC_OFFSET_DIV2, pps->pps_tc_offset_div2);
+ rkvdec_set_bw_field(hw_ps->info, LISTS_MODIFICATION_PRESENT_FLAG,
+ !!(pps->flags & V4L2_HEVC_PPS_FLAG_LISTS_MODIFICATION_PRESENT));
+ rkvdec_set_bw_field(hw_ps->info, LOG2_PARALLEL_MERGE_LEVEL,
+ pps->log2_parallel_merge_level_minus2 + 2);
+ rkvdec_set_bw_field(hw_ps->info, SLICE_SEGMENT_HEADER_EXTENSION_PRESENT_FLAG,
+ !!(pps->flags &
+ V4L2_HEVC_PPS_FLAG_SLICE_SEGMENT_HEADER_EXTENSION_PRESENT));
+ rkvdec_set_bw_field(hw_ps->info, NUM_TILE_COLUMNS,
+ tiles_enabled ? pps->num_tile_columns_minus1 + 1 : 1);
+ rkvdec_set_bw_field(hw_ps->info, NUM_TILE_ROWS,
+ tiles_enabled ? pps->num_tile_rows_minus1 + 1 : 1);
+ rkvdec_set_bw_field(hw_ps->info, MVC_FF, 0xffff);
// Setup tiles information
memset(column_width, 0, sizeof(column_width));
@@ -367,15 +305,19 @@ static void assemble_hw_pps(struct rkvdec_ctx *ctx,
row_height[0] = (height + max_cu_width - 1) / max_cu_width;
}
- set_column_row(hw_ps, column_width, row_height);
+ for (i = 0; i < 20; i++)
+ rkvdec_set_bw_field(hw_ps->info, COLUMN_WIDTH(i), column_width[i]);
+ for (i = 0; i < 22; i++)
+ rkvdec_set_bw_field(hw_ps->info, ROW_HEIGHT(i), row_height[i]);
// Setup POC information
- hw_ps->current_poc = dec_params->pic_order_cnt_val;
+ rkvdec_set_bw_field(hw_ps->info, CURRENT_POC, dec_params->pic_order_cnt_val);
- set_pps_ref_pic_poc(hw_ps, dec_params->dpb);
for (i = 0; i < ARRAY_SIZE(dec_params->dpb); i++) {
- u32 valid = !!(dec_params->num_active_dpb_entries > i);
- hw_ps->ref_is_valid |= valid << i;
+ rkvdec_set_bw_field(hw_ps->info, REF_IS_VALID(i),
+ !!(dec_params->num_active_dpb_entries > i));
+ rkvdec_set_bw_field(hw_ps->info, REF_PIC_POC(i),
+ dec_params->dpb[i].pic_order_cnt_val);
}
}
--
2.53.0
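
For readers following the conversion above: the patch drops the packed C
bitfield struct in favour of a flat u32 info[] array that is filled through
rkvdec_set_bw_field(). The helper's definition is not part of this hunk, so
the following is only a minimal sketch of how such a bit writer could work,
assuming each field is described by a bit offset and a width and is stored
LSB first. The names bw_field, set_bw_field and get_bw_field are
hypothetical and are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field descriptor: position and width in bits. */
struct bw_field {
	unsigned int offset;	/* bit offset from the start of the buffer */
	unsigned int len;	/* field width in bits, 1..32 */
};

/* Write the low 'len' bits of 'value' at 'offset', LSB first. */
static void set_bw_field(uint32_t *buf, struct bw_field f, uint32_t value)
{
	unsigned int word = f.offset / 32;
	unsigned int shift = f.offset % 32;
	uint64_t mask, v;

	assert(f.len >= 1 && f.len <= 32);

	mask = (f.len == 32) ? 0xffffffffULL : ((1ULL << f.len) - 1);
	v = ((uint64_t)value & mask) << shift;
	mask <<= shift;

	buf[word] = (buf[word] & ~(uint32_t)mask) | (uint32_t)v;
	/* The field may straddle a 32-bit word boundary. */
	if (shift + f.len > 32)
		buf[word + 1] = (buf[word + 1] & ~(uint32_t)(mask >> 32)) |
				(uint32_t)(v >> 32);
}

/* Read a field back; handy for checking the layout in a test. */
static uint32_t get_bw_field(const uint32_t *buf, struct bw_field f)
{
	unsigned int word = f.offset / 32;
	unsigned int shift = f.offset % 32;
	uint64_t mask = (f.len == 32) ? 0xffffffffULL : ((1ULL << f.len) - 1);
	uint64_t raw = buf[word];

	if (shift + f.len > 32)
		raw |= (uint64_t)buf[word + 1] << 32;
	return (uint32_t)((raw >> shift) & mask);
}
```

Under the same LSB-first assumption, writing two adjacent 12-bit fields
gives the same 24 bits as the removed set_column_row() expression
column[0] | (column[1] << 12), which is presumably why the new loops run
over 20 column widths and 22 row heights individually.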
* Re: [PATCH net-next v2 00/15] net: stmmac: qcom-ethqos: more cleanups
From: Mohd Ayaan Anwar @ 2026-03-27 15:20 UTC
To: Russell King (Oracle)
Cc: Andrew Lunn, Alexandre Torgue, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, linux-arm-kernel, linux-arm-msm,
linux-stm32, netdev, Paolo Abeni
In-Reply-To: <acZDEg9wdjhBTHlL@shell.armlinux.org.uk>
Hi Russell,
On Fri, Mar 27, 2026 at 08:42:58AM +0000, Russell King (Oracle) wrote:
> Further cleanups to qcom-ethqos, mainly concentrating on the RGMII
> code, making it clearer what the differences are for each speed, thus
> making the code more readable.
>
> I'm still not really happy with this. The speed specific configuration
> remains split between ethqos_fix_mac_speed_rgmii() and
> ethqos_rgmii_macro_init(), where the latter is only ever called from
> the former. So, I think further work is needed here - maybe it needs
> restructuring into the various component parts of the RGMII block?
>
> v2:
> - patch 2: fix typo in commit message
> - patch 3: fix ethqos_fix_mac_speed() comment
>
> .../ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c | 220 ++++++++-------------
> 1 file changed, 87 insertions(+), 133 deletions(-)
>
No issues found at 100M and 1G on the QCS615 Ride board with the KSZ9031
RGMII PHY. As noted earlier, Ethernet support for this board is not yet
upstream, but I have some local changes to make it work.
10M could not be tested due to limitations of the link partner. But with
100M working fine, I am fairly certain that this series will not
introduce any new issues at 10M.
Please feel free to add my:
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Ayaan
* Re: [PATCH v1 1/3] arm64: dts: amlogic: meson-s4: add VRTC node
From: neil.armstrong @ 2026-03-27 15:34 UTC
To: Nick Xie, khilman, martin.blumenstingl, jbrunet
Cc: krzk+dt, robh, conor+dt, linux-amlogic, linux-arm-kernel,
devicetree, linux-kernel
In-Reply-To: <20260327093016.722095-2-nick@khadas.com>
On 3/27/26 10:30, Nick Xie wrote:
> Add the Virtual RTC (VRTC) controller node to the Meson S4 SoC dtsi.
>
> Signed-off-by: Nick Xie <nick@khadas.com>
> ---
> arch/arm64/boot/dts/amlogic/meson-s4.dtsi | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/amlogic/meson-s4.dtsi b/arch/arm64/boot/dts/amlogic/meson-s4.dtsi
> index 936a5c1353d15..2a6fbd5308362 100644
> --- a/arch/arm64/boot/dts/amlogic/meson-s4.dtsi
> +++ b/arch/arm64/boot/dts/amlogic/meson-s4.dtsi
> @@ -59,6 +59,11 @@ psci {
> method = "smc";
> };
>
> + vrtc: rtc@fe010288 {
> + compatible = "amlogic,meson-vrtc";
> + reg = <0x0 0xfe010288 0x0 0x4>;
> + };
> +
> xtal: xtal-clk {
> compatible = "fixed-clock";
> clock-frequency = <24000000>;
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Thanks,
Neil
* Re: [PATCH v1 2/3] arm64: dts: amlogic: meson-s4-s905y4-khadas-vim1s: enable HYM8563 RTC
From: neil.armstrong @ 2026-03-27 15:34 UTC
To: Nick Xie, khilman, martin.blumenstingl, jbrunet
Cc: krzk+dt, robh, conor+dt, linux-amlogic, linux-arm-kernel,
devicetree, linux-kernel
In-Reply-To: <20260327093016.722095-3-nick@khadas.com>
On 3/27/26 10:30, Nick Xie wrote:
> The Khadas VIM1S board has an on-board Haoyu Micro HYM8563 Real Time
> Clock (RTC) connected to the I2C1 bus.
>
> Enable the I2C1 controller and add the RTC child node to support
> hardware clock persistence.
>
> Signed-off-by: Nick Xie <nick@khadas.com>
> ---
> .../dts/amlogic/meson-s4-s905y4-khadas-vim1s.dts | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/amlogic/meson-s4-s905y4-khadas-vim1s.dts b/arch/arm64/boot/dts/amlogic/meson-s4-s905y4-khadas-vim1s.dts
> index 792ab45c4c944..7314e0ab81da3 100644
> --- a/arch/arm64/boot/dts/amlogic/meson-s4-s905y4-khadas-vim1s.dts
> +++ b/arch/arm64/boot/dts/amlogic/meson-s4-s905y4-khadas-vim1s.dts
> @@ -20,6 +20,8 @@ aliases {
> mmc0 = &emmc; /* eMMC */
> mmc1 = &sd; /* SD card */
> mmc2 = &sdio; /* SDIO */
> + rtc0 = &rtc;
> + rtc1 = &vrtc;
> serial0 = &uart_b;
> };
>
> @@ -223,6 +225,19 @@ &ethmac {
> phy-mode = "rmii";
> };
>
> +&i2c1 {
> + status = "okay";
> + pinctrl-names = "default";
> + pinctrl-0 = <&i2c1_pins2>;
> + clock-frequency = <100000>;
> +
> + rtc: rtc@51 {
> + compatible = "haoyu,hym8563";
> + reg = <0x51>;
> + #clock-cells = <0>;
> + };
> +};
> +
> &ir {
> status = "okay";
> pinctrl-0 = <&remote_pins>;
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Thanks,
Neil
* Re: [PATCH v1 3/3] arm64: dts: amlogic: meson-s4-s905y4-khadas-vim1s: use rc-khadas keymap
From: neil.armstrong @ 2026-03-27 15:34 UTC
To: Nick Xie, khilman, martin.blumenstingl, jbrunet
Cc: krzk+dt, robh, conor+dt, linux-amlogic, linux-arm-kernel,
devicetree, linux-kernel
In-Reply-To: <20260327093016.722095-4-nick@khadas.com>
On 3/27/26 10:30, Nick Xie wrote:
> The Khadas VIM1S board has an onboard IR receiver.
> Configure the default keymap to "rc-khadas" to support the official
> Khadas IR remote control.
>
> Signed-off-by: Nick Xie <nick@khadas.com>
> ---
> arch/arm64/boot/dts/amlogic/meson-s4-s905y4-khadas-vim1s.dts | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm64/boot/dts/amlogic/meson-s4-s905y4-khadas-vim1s.dts b/arch/arm64/boot/dts/amlogic/meson-s4-s905y4-khadas-vim1s.dts
> index 7314e0ab81da3..99d5df71b9cd4 100644
> --- a/arch/arm64/boot/dts/amlogic/meson-s4-s905y4-khadas-vim1s.dts
> +++ b/arch/arm64/boot/dts/amlogic/meson-s4-s905y4-khadas-vim1s.dts
> @@ -242,6 +242,7 @@ &ir {
> status = "okay";
> pinctrl-0 = <&remote_pins>;
> pinctrl-names = "default";
> + linux,rc-map-name = "rc-khadas";
> };
>
> &pwm_ef {
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Thanks,
Neil
* Re: [PATCH v6 01/40] arm_mpam: Ensure in_reset_state is false after applying configuration
From: James Morse @ 2026-03-27 15:42 UTC
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, jonathan.cameron, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm, zengheng4, linux-doc
In-Reply-To: <20260313144617.3420416-2-ben.horgan@arm.com>
Hi Ben, Zeng,
On 13/03/2026 14:45, Ben Horgan wrote:
> From: Zeng Heng <zengheng4@huawei.com>
>
> The per-RIS flag, in_reset_state, indicates whether or not the MSC
> registers are in reset state, and allows avoiding resetting when they are
> already in reset state. However, when mpam_apply_config() updates the
> configuration it doesn't update the in_reset_state flag and so even after
> the configuration update in_reset_state can be true and mpam_reset_ris()
> will skip the actual register restoration on subsequent resets.
>
> Once resctrl has a MPAM backend it will use resctrl_arch_reset_all_ctrls()
> to reset the MSC configuration on unmount and, if the in_reset_state flag
> is bogusly true, fail to reset the MSC configuration. The resulting
> non-reset MSC configuration can lead to persistent performance restrictions
> even after resctrl is unmounted.
>
> Fix by clearing in_reset_state to false immediately after successful
> configuration application, ensuring that the next reset operation
> properly restores MSC register defaults.
Reviewed-by: James Morse <james.morse@arm.com>
Thanks!
James
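
As an illustration of the bug described in the quoted commit message, here
is a minimal, self-contained sketch of the flag handling. It is not the
driver code: struct ris_state, ris_reset() and ris_apply_config() are
hypothetical stand-ins for the per-RIS state, mpam_reset_ris() and
mpam_apply_config(), and a single integer stands in for the MSC registers.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-RIS state. */
struct ris_state {
	bool in_reset_state;	/* true while registers hold reset defaults */
	unsigned int cfg;	/* stand-in for the MSC configuration registers */
};

#define RIS_CFG_RESET_DEFAULT	0u

static void ris_reset(struct ris_state *ris)
{
	/* Skip the register writes when nothing changed since the last reset. */
	if (ris->in_reset_state)
		return;

	ris->cfg = RIS_CFG_RESET_DEFAULT;
	ris->in_reset_state = true;
}

static void ris_apply_config(struct ris_state *ris, unsigned int cfg)
{
	ris->cfg = cfg;
	/*
	 * The fix: the registers no longer hold reset values, so the next
	 * ris_reset() must not take the early return above.
	 */
	ris->in_reset_state = false;
}
```

Without the last assignment in ris_apply_config(), a reset that follows a
configuration update would return early and leave the non-default
configuration in place, which is the failure mode the commit message
describes for unmount.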
* Re: [PATCH v6 08/40] arm64: mpam: Drop the CONFIG_EXPERT restriction
From: James Morse @ 2026-03-27 15:43 UTC
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, jonathan.cameron, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm, zengheng4, linux-doc,
Shaopeng Tan
In-Reply-To: <20260313144617.3420416-9-ben.horgan@arm.com>
Hi Ben,
On 13/03/2026 14:45, Ben Horgan wrote:
> In anticipation of MPAM being useful remove the CONFIG_EXPERT restriction.
Useful - ha! I've added a second paragraph describing why this was done, just
so it doesn't look odd in 5 years time.
| This was done to prevent the driver being enabled before the user-space
| interface was wired up.
Reviewed-by: James Morse <james.morse@arm.com>
Thanks,
James
* Re: [PATCH v6 11/40] arm64: mpam: Initialise and context switch the MPAMSM_EL1 register
From: James Morse @ 2026-03-27 15:44 UTC
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, jonathan.cameron, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm, zengheng4, linux-doc,
Shaopeng Tan
In-Reply-To: <20260313144617.3420416-12-ben.horgan@arm.com>
Hi Ben,
On 13/03/2026 14:45, Ben Horgan wrote:
> The MPAMSM_EL1 sets the MPAM labels, PMG and PARTID, for loads and stores
> generated by a shared SMCU. Disable the traps so the kernel can use it and
> set it to the same configuration as the per-EL cpu MPAM configuration.
>
> If an SMCU is not shared with other cpus then it is implementation
> defined whether the configuration from MPAMSM_EL1 is used or that from
> the appropriate MPAMy_ELx. As we set the same, PMG_D and PARTID_D,
> configuration for MPAM0_EL1, MPAM1_EL1 and MPAMSM_EL1 the resulting
> configuration is the same regardless.
>
> The range of valid configurations for the PARTID and PMG in MPAMSM_EL1 is
> not currently specified in the Arm Architecture Reference Manual but the
> architect has confirmed that it is intended to be the same as that for the
> cpu configuration in the MPAMy_ELx registers.
Reviewed-by: James Morse <james.morse@arm.com>
> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
> index 0747e0526927..6bccbfdccb87 100644
> --- a/arch/arm64/include/asm/mpam.h
> +++ b/arch/arm64/include/asm/mpam.h
> @@ -53,6 +53,8 @@ static inline void mpam_thread_switch(struct task_struct *tsk)
> return;
>
> write_sysreg_s(regval | MPAM1_EL1_MPAMEN, SYS_MPAM1_EL1);
> + if (system_supports_sme())
> + write_sysreg_s(regval & (MPAMSM_EL1_PARTID_D | MPAMSM_EL1_PMG_D), SYS_MPAMSM_EL1);
Doing it here saves a surprise later.
> isb();
> /* Synchronising the EL0 write is left until the ERET to EL0 */
(down here would have been the alternative)
Thanks,
James
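
To make the masking in the hunk above concrete, here is a small user-space
sketch of how the two register values are derived from one regval. The
field positions are an assumption made for this example (PARTID_D in bits
31:16, PMG_D in bits 47:40, MPAMEN in bit 63, with the same data-side layout
in both registers); the macro and function names are hypothetical and the
real definitions live in the kernel's sysreg headers.

```c
#include <stdint.h>

/* Assumed field layout for this sketch only. */
#define SK_PARTID_I	(0xffffULL << 0)
#define SK_PARTID_D	(0xffffULL << 16)
#define SK_PMG_I	(0xffULL << 32)
#define SK_PMG_D	(0xffULL << 40)
#define SK_MPAMEN	(1ULL << 63)

/* Value written to the per-EL register: all labels plus the enable bit. */
static uint64_t sk_mpam1_value(uint64_t regval)
{
	return regval | SK_MPAMEN;
}

/* Value written to the SME register: only the data-side labels are kept. */
static uint64_t sk_mpamsm_value(uint64_t regval)
{
	return regval & (SK_PARTID_D | SK_PMG_D);
}
```

Because both values come from the same regval, the data PARTID and PMG seen
by a shared SMCU match the per-EL configuration, which is the property the
commit message relies on.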
* Re: [PATCH v6 21/40] arm_mpam: resctrl: Hide CDP emulation behind CONFIG_EXPERT
From: James Morse @ 2026-03-27 15:44 UTC
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, jonathan.cameron, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm, zengheng4, linux-doc
In-Reply-To: <20260313144617.3420416-22-ben.horgan@arm.com>
Hi Ben,
On 13/03/2026 14:45, Ben Horgan wrote:
> When CDP is not enabled, the 'rmid_entry's in the limbo list,
> rmid_busy_llc, map directly to a (PARTID,PMG) pair and when CDP is enabled
> the mapping is to two different pairs.
> As the limbo list is reused between
> mounts and CDP disabled on unmount this can lead to stale mapping and the
> limbo handler will then make monitor reads with potentially out of range
> PARTID.
Bother - I missed that!
> This may then cause an MPAM error interrupt and the driver will
> disable MPAM.
... and that's why it's not a problem on x86 because the RMID range is unaffected by CDP,
whereas MPAM works on a combined value.
> No problems are expected if you just mount the resctrl file system
> once with CDP enabled and never unmount it.
(guess how it was tested!)
> Hide CDP emulation behind CONFIG_EXPERT to protect the unwary.
>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Adding this ugliness in the hope of avoiding patch churn and extra
> reviewer work. I am looking into the resctrl changes needed to fix this.
Makes sense - people can still use this if they're aware of the limitation, and it sounds
like you've got a plan to fix it properly. We just don't want it enabled in distros until
then.
> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
> index 903d1a0f564f..cab3e9ccb5c7 100644
> --- a/drivers/resctrl/mpam_resctrl.c
> +++ b/drivers/resctrl/mpam_resctrl.c
> @@ -82,6 +82,18 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level rid, bool enable)
> u32 partid_i = RESCTRL_RESERVED_CLOSID, partid_d = RESCTRL_RESERVED_CLOSID;
> int cpu;
>
> + if (!IS_ENABLED(CONFIG_EXPERT) && enable) {
> + /*
> + * If the resctrl fs is mounted more than once, sequentially,
> + * then CDP can lead to the use of out of range PARTIDs.
> + */
> + pr_warn("CDP not supported\n");
> + return -EOPNOTSUPP;
> + }
> +
> + if (enable)
> + pr_warn("CDP is an expert feature and may cause MPAM to malfunction.\n");
> +
> /*
> * resctrl_arch_set_cdp_enabled() is only called with enable set to
> * false on error and unmount.
Reviewed-by: James Morse <james.morse@arm.com>
Thanks,
James
* Re: [PATCH 1/4] exec: inherit HWCAPs from the parent process
From: Andrei Vagin @ 2026-03-27 15:46 UTC
To: Will Deacon, Mark Rutland
Cc: Kees Cook, Andrew Morton, Marek Szyprowski, Cyrill Gorcunov,
Mike Rapoport, Alexander Mikhalitsyn, linux-kernel, linux-fsdevel,
linux-mm, criu, Catalin Marinas, linux-arm-kernel, Chen Ridong,
Christian Brauner, David Hildenbrand, Eric Biederman,
Lorenzo Stoakes, Michal Koutny, Alexander Mikhalitsyn
In-Reply-To: <CAEWA0a7iR8YHooqXJfhersV6YhAXGMZDUhib3QQH5XGn=KNowA@mail.gmail.com>
On Tue, Mar 24, 2026 at 3:19 PM Andrei Vagin <avagin@google.com> wrote:
>
> Hi Mark and Will,
>
> Thanks for the feedback. Please read the inline comments.
Mark, Will, just checking in to see if my explanation makes sense to you.
Let me know if you have any further feedback or questions.
Thanks,
Andrei
>
> On Tue, Mar 24, 2026 at 3:28 AM Will Deacon <will@kernel.org> wrote:
> >
> > On Mon, Mar 23, 2026 at 06:21:22PM +0000, Mark Rutland wrote:
> > > On Mon, Mar 23, 2026 at 05:53:37PM +0000, Andrei Vagin wrote:
> > > > Introduces a mechanism to inherit hardware capabilities (AT_HWCAP,
> > > > AT_HWCAP2, etc.) from a parent process when they have been modified via
> > > > prctl.
> > > >
> > > > To support C/R operations (snapshots, live migration) in heterogeneous
> > > > clusters, we must ensure that processes utilize CPU features available
> > > > on all potential target nodes. To solve this, we need to advertise a
> > > > common feature set across the cluster.
> > > >
> > > > This patch adds a new mm flag MMF_USER_HWCAP, which is set when the
> > > > auxiliary vector is modified via prctl(PR_SET_MM, PR_SET_MM_AUXV). When
> > > > execve() is called, if the current process has MMF_USER_HWCAP set, the
> > > > HWCAP values are extracted from the current auxiliary vector and stored
> > > > in the linux_binprm structure. These values are then used to populate
> > > > the auxiliary vector of the new process, effectively inheriting the
> > > > hardware capabilities.
> > > >
> > > > The inherited HWCAPs are masked with the hardware capabilities supported
> > > > by the current kernel to ensure that we don't report more features than
> > > > actually supported. This is important to avoid unexpected behavior,
> > > > especially for processes with additional privileges.
> > >
> > > At a high level, I don't think that's going to be sufficient:
> > >
> > > * On an architecture with other userspace accessible feature
> > > identification mechanism registers (e.g. ID registers), userspace
> > > might read those. So you might need to hide stuff there too, and
> > > that's going to require architecture-specific interfaces to manage.
> > >
> > > It's possible that some code checks HWCAPs and others check ID
> > > registers, and mismatch between the two could be problematic.
> > >
> > > * If the HWCAPs can be inherited by a more privileged task, then a
> > > malicious user could use this to hide security features (e.g. shadow
> > > stack or pointer authentication on arm64), and make it easier to
> > > attack that task. While not a direct attack, it would undermine those
> > > features.
>
> I agree with Mark that only a privileged process should be able to mask
> certain hardware features. Currently, PR_SET_MM_AUXV is guarded by
> CAP_SYS_RESOURCE, but PR_SET_MM_MAP allows changing the auxiliary vector
> without specific capabilities. This is definitely an issue. To address
> this, I think we can consider introducing a new prctl command to enable
> HWCAP inheritance explicitly.
>
> >
> > Yeah, this looks like a non-starter to me on arm64. Even if it was
> > extended to apply the same treatment to the idregs, many of the hwcap
> > features can't actually be disabled by the kernel and so you still run
> > the risk of a task that probes for the presence of a feature using
> > something like a SIGILL handler or, perhaps more likely, assumes that
> > the presence of one hwcap implies the presence of another. And then
> > there are the applications that just base everything off the MIDR...
>
> The goal of this mechanism is not to provide strict architectural
> enforcement or to trap the use of hardware features; rather, it is to
> provide a consistent discovery interface for applications. I chose the
> HWCAP vector because it mirrors the existing behavior of running an
> older kernel on newer hardware: while ID registers might report a
> feature as physically present, the HWCAPs will omit it if the kernel
> lacks support. Applications are generally expected to treat HWCAPs as
> the source of truth for which features are safe to use, even if the
> underlying hardware is technically capable of more.
>
> Another significant advantage of using HWCAPs is that many
> applications already rely on them for feature detection. This interface
> allows these applications to work correctly "out-of-the-box" in a
> migrated environment without requiring any userspace modifications. I
> understand that some apps may use other detection methods; however, there
> is no guarantee that these applications will work correctly after
> migration to another machine.
>
> >
> > There's also kvm, which provides a roundabout way to query some features
> > of the underlying hardware.
> >
> > You're probably better off using/extending the idreg overrides we have
> > in arch/arm64/kernel/pi/idreg-override.c so that you can make your
> > cluster of heterogeneous machines look alike.
>
> IIRC, idreg-override/cpuid-masking usually works for an entire machine.
> We actually need to have a mechanism that will work on a per-container
> basis. Workloads inside one cluster can have different
> migration/snapshot requirements. Some are pinned to a specific node,
> others are never migrated, while others need to be migratable across a
> cluster or even between clusters. We need a mechanism that can be
> tunable on a per-container/per-process basis.
>
> >
> > On the other hand, if munging the hwcaps happens to be sufficient for
> > this particular use-case, can't it be handled entirely in userspace (e.g.
> > by hacking libc?)
>
> CRIU often handles workloads with a mix of runtimes: some linked against
> glibc, some against musl, and others like Go that bypass libc entirely.
> CRIU is mostly used to handle containers that can run multiple processes,
> possibly based on different runtimes. This means the available CPU features
> cannot be specified for just one runtime; they have to be passed
> across different runtimes. I think a pure userspace solution is nearly
> infeasible in this case.
>
> Thanks,
> Andrei
^ permalink raw reply
* Re: [PATCH v6 22/40] arm_mpam: resctrl: Convert to/from MPAMs fixed-point formats
From: James Morse @ 2026-03-27 15:47 UTC (permalink / raw)
To: Gavin Shan, Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, jonathan.cameron, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm, zengheng4, linux-doc,
Shaopeng Tan
In-Reply-To: <bd450f1f-05a3-44d6-9bbc-1c48d967baa4@redhat.com>
Hi Gavin,
On 23/03/2026 22:49, Gavin Shan wrote:
> On 3/14/26 12:45 AM, Ben Horgan wrote:
>> From: Dave Martin <Dave.Martin@arm.com>
>>
>> MPAM uses fixed-point formats for some hardware controls. Resctrl
>> provides the bandwidth controls as a percentage. Add helpers to convert
>> between these.
>>
>> Ensure bwa_wd is at most 16 to make it clear higher values have no meaning.
> One nitpick below, but this looks good to me either way.
>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
Thanks!
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index 0e5e24ef60fe..0c97f7708722 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -713,6 +713,13 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
>> mpam_set_feature(mpam_feat_mbw_part, props);
>> props->bwa_wd = FIELD_GET(MPAMF_MBW_IDR_BWA_WD, mbw_features);
>> +
>> + /*
>> + * The BWA_WD field can represent 0-63, but the control fields it
>> + * describes have a maximum of 16 bits.
>> + */
>> + props->bwa_wd = min(props->bwa_wd, 16);
>> +
>
> 16 may deserve a definition since it's a constant value that is referred
> to multiple times in this patch, if we need to give this series another
> respin :-)
Hmmm... I've left this, I'm not sure what you'd call it. U16_BITS? That sort of thing
might be needed for long/int etc. Here there is either a comment, or it's
accepting/returning a u16. I think it's fairly obvious where the number 16 is coming from.
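For readers following along, the conversion being discussed can be sketched in
plain C. This is only an illustration, not the helpers from the patch: every
sketch_* name is hypothetical, and the sketch assumes the fraction has 'wd'
significant bits left-aligned in a 16-bit field, with 100% saturating to
all-ones.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the code from the patch. */
#define SKETCH_FIELD_BITS 16u

/* Same idea as the quoted hunk: BWA_WD may report 0-63, but only 16 bits matter. */
static unsigned int sketch_clamp_bwa_wd(unsigned int hw_wd)
{
	return hw_wd < SKETCH_FIELD_BITS ? hw_wd : SKETCH_FIELD_BITS;
}

/* Convert a percentage into a fraction with 'wd' bits, left-aligned in 16 bits. */
static uint16_t sketch_percent_to_fixed(unsigned int pc, unsigned int wd)
{
	uint32_t scale, frac;

	assert(pc <= 100);
	assert(wd >= 1 && wd <= SKETCH_FIELD_BITS);

	scale = (uint32_t)1 << wd;
	frac = (pc * scale + 50) / 100;		/* round to nearest */
	if (frac > scale - 1)
		frac = scale - 1;		/* assumption: 100% is all-ones */

	return (uint16_t)(frac << (SKETCH_FIELD_BITS - wd));
}

/* Inverse of the above, rounding to the nearest whole percent. */
static unsigned int sketch_fixed_to_percent(uint16_t val, unsigned int wd)
{
	uint32_t scale, frac;

	assert(wd >= 1 && wd <= SKETCH_FIELD_BITS);

	scale = (uint32_t)1 << wd;
	frac = (uint32_t)val >> (SKETCH_FIELD_BITS - wd);

	return (frac * 100 + scale / 2) / scale;
}
```

With wd clamped to at most 16, the shift by (16 - wd) can never go negative,
which is the property the clamp in the hunk above provides.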
Thanks,
James
^ permalink raw reply
* Re: [PATCH v6 25/40] arm_mpam: resctrl: Add support for 'MB' resource
From: James Morse @ 2026-03-27 15:47 UTC (permalink / raw)
To: Gavin Shan, Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, jonathan.cameron, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm, zengheng4, linux-doc,
Shaopeng Tan
In-Reply-To: <3ae3356d-a901-4b71-90df-557d468e4785@redhat.com>
Hi Gavin,
On 23/03/2026 23:09, Gavin Shan wrote:
> On 3/14/26 12:46 AM, Ben Horgan wrote:
>> From: James Morse <james.morse@arm.com>
>>
>> resctrl supports 'MB', as a percentage throttling of traffic from the
>> L3. This is the control that mba_sc uses, so ideally the class chosen
>> should be as close as possible to the counters used for mbm_total. If there
>> is a single L3, it's the last cache, and the topology of the memory matches
>> then the traffic at the memory controller will be equivalent to that at
>> egress of the L3. If these conditions are met allow the memory class to
>> back MB.
>>
>> MB's percentage control should be backed either with the fixed point
>> fraction MBW_MAX or bandwidth portion bitmaps. The bandwidth portion
>> bitmaps are not used as it's tricky to pick which bits to use to avoid
>> contention, and it may be possible to expose this as something other than a
>> percentage in the future.
> One comment below and it deserves to be addressed if we have another respin:
>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
Thanks!
>> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
>> index 93c8a9608ed4..cad65cf7d12d 100644
>> --- a/drivers/resctrl/mpam_resctrl.c
>> +++ b/drivers/resctrl/mpam_resctrl.c
>> @@ -317,6 +344,166 @@ static u16 percent_to_mbw_max(u8 pc, struct mpam_props *cprops)
>> +/*
>> + * Test if the traffic for a class matches that at egress from the L3. For
>> + * MSC at memory controllers this is only possible if there is a single L3
>> + * as otherwise the counters at the memory can include bandwidth from the
>> + * non-local L3.
>> + */
>> +static bool traffic_matches_l3(struct mpam_class *class)
>> +{
>> + int err, cpu;
>> +
>> + lockdep_assert_cpus_held();
>> +
>> + if (class->type == MPAM_CLASS_CACHE && class->level == 3)
>> + return true;
>> +
>> + if (class->type == MPAM_CLASS_CACHE && class->level != 3) {
>> + pr_debug("class %u is a different cache from L3\n", class->level);
>> + return false;
>> + }
>> +
>> + if (class->type != MPAM_CLASS_MEMORY) {
>> + pr_debug("class %u is neither of type cache or memory\n", class->level);
>> + return false;
>> + }
>> +
>
> We bail if the class isn't MPAM_CLASS_MEMORY here ...
>
>> + cpumask_var_t __free(free_cpumask_var) tmp_cpumask = CPUMASK_VAR_NULL;
>> + if (!alloc_cpumask_var(&tmp_cpumask, GFP_KERNEL)) {
>> + pr_debug("cpumask allocation failed\n");
>> + return false;
>> + }
>> +
>> + if (class->type != MPAM_CLASS_MEMORY) {
>> + pr_debug("class %u is neither of type cache or memory\n",
>> + class->level);
>> + return false;
>> + }
>> +
>
> This check duplicates the previous one, so it can be dropped.
Heh, that looks like a rebase conflict! Thanks for spotting it.
Fixed locally.
James
^ permalink raw reply
* Re: [PATCH v6 36/40] arm_mpam: Add workaround for T241-MPAM-1
From: James Morse @ 2026-03-27 15:48 UTC (permalink / raw)
To: Gavin Shan, Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, jonathan.cameron, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm, zengheng4, linux-doc,
Shaopeng Tan
In-Reply-To: <7b73d10e-4bfd-434f-b05f-25c4859a7abd@redhat.com>
Hi Gavin,
On 24/03/2026 04:16, Gavin Shan wrote:
> On 3/14/26 12:46 AM, Ben Horgan wrote:
>> From: Shanker Donthineni <sdonthineni@nvidia.com>
>>
>> The MPAM bandwidth partitioning controls will not be correctly configured,
>> and hardware will retain default configuration register values, meaning
>> generally that bandwidth will remain unprovisioned.
>>
>> To address the issue, follow the below steps after updating the MBW_MIN
>> and/or MBW_MAX registers.
>>
>> - Perform 64b reads from all 12 bridge MPAM shadow registers at offsets
>> (0x360048 + slice*0x10000 + partid*8). These registers are read-only.
>> - Continue iterating until all 12 shadow register values match in a loop.
>> pr_warn_once if the values fail to match within the loop count 1000.
>> - Perform 64b writes with the value 0x0 to the two spare registers at
>> offsets 0x1b0000 and 0x1c0000.
>>
>> In the hardware, writes to the MPAMCFG_MBW_MAX and MPAMCFG_MBW_MIN registers
>> are transformed into broadcast writes to the 12 shadow registers. The
>> final two writes to the spare registers cause a final rank of downstream
>> micro-architectural MPAM registers to be updated from the shadow copies.
>> The intervening loop to read the 12 shadow registers helps avoid a race
>> condition where writes to the spare registers occur before all shadow
>> registers have been updated.
> One question below.
>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index e66631f3f732..b1753498f07f 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -630,7 +640,45 @@ static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
>> return ERR_PTR(-ENOENT);
>> }
>> +static int mpam_enable_quirk_nvidia_t241_1(struct mpam_msc *msc,
>> + const struct mpam_quirk *quirk)
>> +{
>> + s32 soc_id = arm_smccc_get_soc_id_version();
>> + struct resource *r;
>> + phys_addr_t phys;
>> +
>> + /*
>> + * A mapping to a device other than the MSC is needed, check
>> + * SOC_ID is NVIDIA T241 chip (036b:0241)
>> + */
>> + if (soc_id < 0 || soc_id != SMCCC_SOC_ID_T241)
>> + return -EINVAL;
>> +
>> + r = platform_get_resource(msc->pdev, IORESOURCE_MEM, 0);
>> + if (!r)
>> + return -EINVAL;
>> +
>> + /* Find the internal registers base addr from the CHIP ID */
>> + msc->t241_id = T241_CHIP_ID(r->start);
>> + phys = FIELD_PREP(GENMASK_ULL(45, 44), msc->t241_id) | 0x19000000ULL;
>> +
>> + t241_scratch_regs[msc->t241_id] = ioremap(phys, SZ_8M);
>> + if (WARN_ON_ONCE(!t241_scratch_regs[msc->t241_id]))
>> + return -EINVAL;
>
> Those IO regions aren't unmapped when the MSCs are removed. I guess it would be
> something to be improved? :-)
It's just leaking some VA space in the unlikely event the error interrupt goes off.
That is never expected to happen - all the errors indicate a software bug, so it's
not a case of being unlucky. (This assumes T241 supports the error interrupt!).
Adding some teardown would just be for this erratum, I expect it to be the only one
that needs to map some other device to poke at. I'm not sure it's worth it.
I'm also very nervous about changing this quirk as it's difficult for me to test!
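For anyone trying to follow the addresses involved, the arithmetic from the
commit message and the hunk above can be sketched in plain C. This is an
illustration only: the sketch_* names are hypothetical, the constants are the
ones quoted in this thread, and no MMIO is performed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the constants are the ones quoted in this thread. */
#define SKETCH_T241_SLICES		12u
#define SKETCH_T241_SHADOW_BASE		0x360048u
#define SKETCH_T241_SLICE_STRIDE	0x10000u

/* Chip ID goes in bits [45:44], OR'd with 0x19000000, as in the quoted hunk. */
static uint64_t sketch_t241_scratch_phys(unsigned int chip_id)
{
	assert(chip_id <= 3);
	return ((uint64_t)chip_id << 44) | 0x19000000ULL;
}

/* Offset of one shadow register: 0x360048 + slice*0x10000 + partid*8. */
static uint32_t sketch_t241_shadow_offset(unsigned int slice, unsigned int partid)
{
	assert(slice < SKETCH_T241_SLICES);
	return SKETCH_T241_SHADOW_BASE + slice * SKETCH_T241_SLICE_STRIDE +
	       partid * 8u;
}

/* The wait loop finishes once all 12 values read back are identical. */
static bool sketch_t241_shadows_match(const uint64_t vals[SKETCH_T241_SLICES])
{
	for (size_t i = 1; i < SKETCH_T241_SLICES; i++) {
		if (vals[i] != vals[0])
			return false;
	}
	return true;
}
```

The highest shadow offset for slice 11 is 0x410048 plus the PARTID term, which
sits below the 8M window mapped in the hunk for small PARTID values.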
>> +
>> + pr_info_once("Enabled workaround for NVIDIA T241 erratum T241-MPAM-1\n");
>> +
>> + return 0;
>> +}
>> +
>> static const struct mpam_quirk mpam_quirks[] = {
>> + {
>> + /* NVIDIA t241 erratum T241-MPAM-1 */
>> + .init = mpam_enable_quirk_nvidia_t241_1,
>> + .iidr = MPAM_IIDR_NVIDIA_T241,
>> + .iidr_mask = MPAM_IIDR_MATCH_ONE,
>> + .workaround = T241_SCRUB_SHADOW_REGS,
>
> Perhaps we need one more leading space for every line in the above block.
Sure, done locally.
>> + },
>> { NULL } /* Sentinel */
>> };
Thanks,
James
^ permalink raw reply
* Re: [PATCH 1/4] exec: inherit HWCAPs from the parent process
From: Mark Rutland @ 2026-03-27 16:06 UTC (permalink / raw)
To: Andrei Vagin
Cc: Will Deacon, Kees Cook, Andrew Morton, Marek Szyprowski,
Cyrill Gorcunov, Mike Rapoport, Alexander Mikhalitsyn,
linux-kernel, linux-fsdevel, linux-mm, criu, Catalin Marinas,
linux-arm-kernel, Chen Ridong, Christian Brauner,
David Hildenbrand, Eric Biederman, Lorenzo Stoakes, Michal Koutny,
Alexander Mikhalitsyn
In-Reply-To: <CAEWA0a7iR8YHooqXJfhersV6YhAXGMZDUhib3QQH5XGn=KNowA@mail.gmail.com>
On Tue, Mar 24, 2026 at 03:19:49PM -0700, Andrei Vagin wrote:
> Hi Mark and Will,
>
> Thanks for the feedback. Please read the inline comments.
>
> On Tue, Mar 24, 2026 at 3:28 AM Will Deacon <will@kernel.org> wrote:
> >
> > On Mon, Mar 23, 2026 at 06:21:22PM +0000, Mark Rutland wrote:
> > > On Mon, Mar 23, 2026 at 05:53:37PM +0000, Andrei Vagin wrote:
> > > > Introduces a mechanism to inherit hardware capabilities (AT_HWCAP,
> > > > AT_HWCAP2, etc.) from a parent process when they have been modified via
> > > > prctl.
> > > >
> > > > To support C/R operations (snapshots, live migration) in heterogeneous
> > > > clusters, we must ensure that processes utilize CPU features available
> > > > on all potential target nodes. To solve this, we need to advertise a
> > > > common feature set across the cluster.
> > > >
> > > > This patch adds a new mm flag MMF_USER_HWCAP, which is set when the
> > > > auxiliary vector is modified via prctl(PR_SET_MM, PR_SET_MM_AUXV). When
> > > > execve() is called, if the current process has MMF_USER_HWCAP set, the
> > > > HWCAP values are extracted from the current auxiliary vector and stored
> > > > in the linux_binprm structure. These values are then used to populate
> > > > the auxiliary vector of the new process, effectively inheriting the
> > > > hardware capabilities.
> > > >
> > > > The inherited HWCAPs are masked with the hardware capabilities supported
> > > > by the current kernel to ensure that we don't report more features than
> > > > actually supported. This is important to avoid unexpected behavior,
> > > > especially for processes with additional privileges.
> > >
> > > At a high level, I don't think that's going to be sufficient:
> > >
> > > * On an architecture with other userspace accessible feature
> > > identification mechanism registers (e.g. ID registers), userspace
> > > might read those. So you might need to hide stuff there too, and
> > > that's going to require architecture-specific interfaces to manage.
> > >
> > > It's possible that some code checks HWCAPs and others check ID
> > > registers, and mismatch between the two could be problematic.
> > >
> > > * If the HWCAPs can be inherited by a more privileged task, then a
> > > malicious user could use this to hide security features (e.g. shadow
> > > stack or pointer authentication on arm64), and make it easier to
> > > attack that task. While not a direct attack, it would undermine those
> > > features.
>
> I agree with Mark that only a privileged process should be able to mask
> certain hardware features. Currently, PR_SET_MM_AUXV is guarded by
> CAP_SYS_RESOURCE, but PR_SET_MM_MAP allows changing the auxiliary vector
> without specific capabilities. This is definitely an issue. To address
> this, I think we can consider introducing a new prctl command to enable
> HWCAP inheritance explicitly.
>
> > Yeah, this looks like a non-starter to me on arm64. Even if it was
> > extended to apply the same treatment to the idregs, many of the hwcap
> > features can't actually be disabled by the kernel and so you still run
> > the risk of a task that probes for the presence of a feature using
> > something like a SIGILL handler or, perhaps more likely, assumes that
> > the presence of one hwcap implies the presence of another. And then
> > there are the applications that just base everything off the MIDR...
>
> The goal of this mechanism is not to provide strict architectural
> enforcement or to trap the use of hardware features; rather, it is to
> provide a consistent discovery interface for applications. I chose the
> HWCAP vector because it mirrors the existing behavior of running an
> older kernel on newer hardware: while ID registers might report a
> feature as physically present, the HWCAPs will omit it if the kernel
> lacks support.
On arm64, the view of the ID registers that userspace gets *only*
exposes features that the kernel knows about, as userspace reads of
those registers are trapped+emulated by the kernel. On arm64 it's
not true to say that something appears in those but not the HWCAPs.
I understand that might be different on other architectures, and so
maybe this approach is sufficient on other architectures, but it is not
sufficient on arm64.
> Applications are generally expected to treat HWCAPs as
> the source of truth for which features are safe to use, even if the
> underlying hardware is technically capable of more.
I'm fairly certain that there are arm64 applications (and libraries)
which check only the ID register values, and not the HWCAPs.
Architecturally, there are features which are detected via other
mechanisms (e.g. CHKFEAT), for which HWCAPs are also irrelevant. Even if
that happens to be ok today, there are almost certainly future uses that
will not be compatible with the scheme you propose.
I don't think we can say "applications must check the HWCAPs", when we
know that applications and libraries legitimately don't always do that.
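As a point of reference, the masking step described in the quoted commit
message amounts to a bitwise AND of the inherited value with what the kernel
supports. The sketch below is illustrative only: the sketch_* names are
hypothetical and this is not the code from the patch.

```c
#include <assert.h>

/*
 * Hypothetical helper: the value reported to the new process is the
 * parent's (possibly reduced) HWCAP word, limited to what the running
 * kernel supports, so no bit can be added that the kernel lacks.
 */
static unsigned long sketch_inherit_hwcap(unsigned long parent_hwcap,
					  unsigned long kernel_hwcap)
{
	return parent_hwcap & kernel_hwcap;
}

/* Bits the kernel supports that the inherited view no longer reports. */
static unsigned long sketch_hidden_hwcap(unsigned long kernel_hwcap,
					 unsigned long reported_hwcap)
{
	return kernel_hwcap & ~reported_hwcap;
}
```

This only changes what the auxiliary vector reports. Code that reads the ID
registers, or uses another detection mechanism, still sees the bits returned
by sketch_hidden_hwcap(), which is the mismatch described above.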
> Another significant advantage of using HWCAPs is that many
> applications already rely on them for feature detection. This interface
> allows these applications to work correctly "out-of-the-box" in a
> migrated environment without requiring any userspace modifications. I
> understand that some apps may use other detection methods; however, there
> is no guarantee that these applications will work correctly after
> migration to another machine.
I think the existence of applications that detect features by other
(legitimate!) means implies that there's no guarantee that this feature
is useful and will remain useful going forwards.
For example, what do you plan to do if an application or library starts
doing something legitimate that causes it to become incompatible with
this scheme?
I don't want to be in a position where userspace is asked to steer clear
of legitimate mechanisms, or where architecture code suddenly has to
pick up a lot of complexity to make this work.
> > There's also kvm, which provides a roundabout way to query some features
> > of the underlying hardware.
> >
> > You're probably better off using/extending the idreg overrides we have
> > in arch/arm64/kernel/pi/idreg-override.c so that you can make your
> > cluster of heterogeneous machines look alike.
>
> IIRC, idreg-override/cpuid-masking usually works for an entire machine.
> We actually need to have a mechanism that will work on a per-container
> basis. Workloads inside one cluster can have different
> migration/snapshot requirements. Some are pinned to a specific node,
> others are never migrated, while others need to be migratable across a
> cluster or even between clusters. We need a mechanism that can be
> tunable on a per-container/per-process basis.
I think that's theoretically possible, BUT it will require substantially
more complexity, to address the issues that Will and I have mentioned. I
don't think people are very happy to pick up that complexity.
There are many other aspects that are going to be problematic for
heterogeneous migration. Even if you hide the HWCAP for a stateful
feature (e.g. SME), it might appear in one machine's signal frames (and
be mandatory there), but might not appear in another's, and so migration
might not work either way. Likewise, that state can appear via ptrace.
Thanks,
Mark.
^ permalink raw reply