Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] media: cedrus: skip invalid H.264 reference list entries
From: Paul Kocialkowski @ 2026-04-09 14:31 UTC (permalink / raw)
  To: Nicolas Dufresne
  Cc: Pengpeng Hou, mripard, mchehab, gregkh, wens, jernej.skrabec,
	samuel, linux-media, linux-staging, linux-arm-kernel, linux-sunxi,
	linux-kernel
In-Reply-To: <4bd5d70a6144f3e8d4356c182f314cf735f1921c.camel@collabora.com>

[-- Attachment #1: Type: text/plain, Size: 6000 bytes --]

Hi Nicolas,

On Thu 09 Apr 26, 10:00, Nicolas Dufresne wrote:
> Le jeudi 09 avril 2026 à 15:33 +0200, Paul Kocialkowski a écrit :
> > Hi,
> > 
> > On Tue 24 Mar 26, 16:08, Pengpeng Hou wrote:
> > > Cedrus consumes H.264 ref_pic_list0/ref_pic_list1 entries from the
> > > stateless slice control and later uses their indices to look up
> > > decode->dpb[] in _cedrus_write_ref_list().
> > > 
> > > Rejecting such controls in cedrus_try_ctrl() would break existing
> > > userspace, since stateless H.264 reference lists may legitimately carry
> > > out-of-range indices for missing references. Instead, guard the actual
> > > DPB lookup in Cedrus and skip entries whose indices do not fit the fixed
> > > V4L2_H264_NUM_DPB_ENTRIES array.
> > 
> > Could you explain why it is legitimate that userspace would pass indices that
> > are not in the dpb list? As far as I remember from the H.264 spec, the L0/L1
> > lists are constructed from active references only and the number of items
> > there
> > should be given by num_ref_idx_l0_active_minus1/num_ref_idx_l1_active_minus1.
> > We can tolerate invalid data beyond these indices, but certainly not as part
> > of the indices that should be valid.
> > 
> > However I agree that cedrus_try_ctrl is maybe not the right place to check it
> > since I'm not sure we are guaranteed that the slice params control will be
> > checked before the new DPB (from the same request) is applied, so we might end
> > up checking against the dpb from the previous decode request.
> > 
> > But I think we should error out and not just skip the invalid reference.
> 
> Its been a long time I haven't looked into this. But what happens here is that
> once you lost a reference, the userspace DPB will hold a gap picture, which as
> no backing storage. Since it has no backing storage, there is no cookie
> (timestamp) associated with it. This gap picture will still make it to the
> reference lists, since the position of the reference in the list is important
> (you cannot just remove an item). It is an established practice in userspace to
> simply fill the void with an invalid index, typically 0xff, which is always
> invalid. Because that's what some userspace do, it became part of our ABI.

Right we definitely need to keep the order of the L0/L1 lists even with missing
references and the question is whether the hardware can deal with it or not.

Our uAPI specification currently doesn't say anything about handling missing
references. I'm generally not very keen on considering that undefined behavior
becomes de-facto uAPI that should never be broken, because there are cases where
it is obviously incorrect and the fact that it didn't fail previously is the
result of a bug in the implementation.

But in this situation I agree we do need a way to indicate that references are
missing and using 0xff sounds like a good plan to me, given that we provide a
uAPI header define with this value and that the doc mentions it.

> Decoders are expected be fault tolerant, though the tolerance level is hardware
> specific, and so failing in the common code would be inappropriate (failing in
> Cedrus could be acceptable, assuming it can't work with missing references,
> which the implementation seems to be fine with).

Okay I agree that we should not fail in common code and tolerate a value to
indicate a missing reference.

What the current proposal is doing (skipping the reference) results in the
SRAM entry for the reference remaining untouched, which will keep its value
from the previous frame. This seems clearly incorrect.

> Hantro G1 notably have a flag to report missing reference to the HW, and it will
> manage concealement internally. G2/RKVDEC don't, and we try and pick the most
> recent frame as a replacement backing storage, which most of the time minimises
> the damages.

It sounds like an approach that could work for cedrus too.

> As future refinement, we need drivers in the long term to properly report the
> damages (perhaps through additional RO request controls). As discussed few years
> ago in the error handling wip for rkvdec, the V4L2 doc specify that any sort of
> damages known to exist in a frame shall results in the ERROR flag being set. We
> can deduce that the error flag with a payload of 0 indicates to userspace to not
> use the frame (which typically happen on hard errors, or errror at entropy
> decode staged) and ERROR flag with a correct payload signal some level of
> corruption, and its left to the application to decide what to do.

I think it make sense yes, but it would be good to document it in the uAPI
document too.

All the best,

Paul

> Nicolas
> 
> > 
> > All the best,
> > 
> > Paul
> > 
> > > 
> > > This keeps the fix local to the driver use site and avoids out-of-bounds
> > > reads from malformed or unsupported reference list entries.
> > > 
> > > Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
> > > ---
> > >  drivers/staging/media/sunxi/cedrus/cedrus_h264.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > > b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > > --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > > @@ -210,6 +210,9 @@ static void _cedrus_write_ref_list(struct cedrus_ctx
> > > *ctx,
> > >  		u8 dpb_idx;
> > >  
> > >  		dpb_idx = ref_list[i].index;
> > > +		if (dpb_idx >= V4L2_H264_NUM_DPB_ENTRIES)
> > > +			continue;
> > > +
> > >  		dpb = &decode->dpb[dpb_idx];
> > >  
> > >  		if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
> > > -- 
> > > 2.50.1
> > > 



-- 
Paul Kocialkowski,

Independent contractor - sys-base - https://www.sys-base.io/
Free software developer - https://www.paulk.fr/

Expert in multimedia, graphics and embedded hardware support with Linux.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v10 16/20] coresight: Add PM callbacks for sink device
From: James Clark @ 2026-04-09 14:30 UTC (permalink / raw)
  To: Suzuki K Poulose, Leo Yan
  Cc: coresight, linux-arm-kernel, Yeoreum Yun, Mark Rutland,
	Will Deacon, Yabin Cui, Keita Morisaki, Yuanfang Zhang,
	Greg Kroah-Hartman, Alexander Shishkin, Tamas Petz,
	Thomas Gleixner, Peter Zijlstra, Mike Leach
In-Reply-To: <f20b5a00-1d68-48d6-90c8-97dca2e279a4@arm.com>



On 09/04/2026 2:14 pm, Suzuki K Poulose wrote:
> On 09/04/2026 11:52, James Clark wrote:
>>
>>
>> On 05/04/2026 4:02 pm, Leo Yan wrote:
>>> Unlike system level sinks, per-CPU sinks may lose power during CPU idle
>>> states.  Currently, this applies specifically to TRBE.  This commit
>>> invokes save and restore callbacks for the sink in the CPU PM notifier.
>>>
>>> If the sink provides PM callbacks but the source does not, this is
>>> unsafe because the sink cannot be disabled safely unless the source
>>> can also be controlled, so veto low power entry to avoid lockups.
>>>
>>> Tested-by: James Clark <james.clark@linaro.org>
>>> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
>>> Reviewed-by: James Clark <james.clark@linaro.org>
>>> Signed-off-by: Leo Yan <leo.yan@arm.com>
>>> ---
>>>   drivers/hwtracing/coresight/coresight-core.c | 46 +++++++++++++++++ 
>>> + ++++++++--
>>>   1 file changed, 43 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/ 
>>> hwtracing/coresight/coresight-core.c
>>> index 
>>> c1e8debc76aba7eb5ecf7efe2a3b9b8b3e11b10c..a918bf6398a932de30fe9b4947020cc4c1cfb2f7 100644
>>> --- a/drivers/hwtracing/coresight/coresight-core.c
>>> +++ b/drivers/hwtracing/coresight/coresight-core.c
>>> @@ -1736,14 +1736,15 @@ static void coresight_release_device_list(void)
>>>   /* Return: 1 if PM is required, 0 if skip, <0 on error */
>>>   static int coresight_pm_check(struct coresight_path *path)
>>>   {
>>> -    struct coresight_device *source;
>>> -    bool source_has_cb;
>>> +    struct coresight_device *source, *sink;
>>> +    bool source_has_cb, sink_has_cb;
>>>       if (!path)
>>>           return 0;
>>>       source = coresight_get_source(path);
>>> -    if (!source)
>>> +    sink = coresight_get_sink(path);
>>> +    if (!source || !sink)
>>>           return 0;
>>>       /* Don't save and restore if the source is inactive */
>>> @@ -1759,16 +1760,36 @@ static int coresight_pm_check(struct 
>>> coresight_path *path)
>>>       if (source_has_cb)
>>>           return 1;
>>> +    sink_has_cb = coresight_ops(sink)->pm_save_disable &&
>>> +              coresight_ops(sink)->pm_restore_enable;
>>> +    /*
>>> +     * It is not permitted that the source has no callbacks while 
>>> the sink
>>> +     * does, as the sink cannot be disabled without disabling the 
>>> source,
>>> +     * which may lead to lockups. Alternatively, the ETM driver should
>>> +     * enable self-hosted PM mode at probe (see etm4_probe()).
>>> +     */
>>> +    if (sink_has_cb) {
>>> +        pr_warn_once("coresight PM failed: source has no PM 
>>> callbacks; "
>>> +                 "cannot safely control sink\n");
>>
>> This prints out on my Orion board on a fresh boot because of how 
>> pm_save_enable is setup there. Do we really need the configuration of 
>> pm_save_enable for ETE/TRBE if we know that it always needs saving?
>>
>> It also stops warning if I rmmod and modprobe the module after 
>> booting. Seems like pm_save_enable is different depending on how the 
>> module is loaded which doesn't seem right.
> 
> Thats because the warning is pr_warn_*once*()
> 
> Suzuki
> 
> 

I don't think so, I tested it with a printf instead of a warn once and 
also tested modprobeing straight after a reboot.

>>
>>> +        return -EINVAL;
>>> +    }
>>> +
>>>       return 0;
>>>   }
>>>   static int coresight_pm_device_save(struct coresight_device *csdev)
>>>   {
>>> +    if (!csdev || !coresight_ops(csdev)->pm_save_disable)
>>> +        return 0;
>>> +
>>>       return coresight_ops(csdev)->pm_save_disable(csdev);
>>>   }
>>>   static void coresight_pm_device_restore(struct coresight_device 
>>> *csdev)
>>>   {
>>> +    if (!csdev || !coresight_ops(csdev)->pm_restore_enable)
>>> +        return;
>>> +
>>>       coresight_ops(csdev)->pm_restore_enable(csdev);
>>>   }
>>> @@ -1787,15 +1808,32 @@ static int coresight_pm_save(struct 
>>> coresight_path *path)
>>>       to = list_prev_entry(coresight_path_last_node(path), link);
>>>       coresight_disable_path_from_to(path, from, to);
>>> +    ret = coresight_pm_device_save(coresight_get_sink(path));
>>> +    if (ret)
>>> +        goto sink_failed;
>>> +
>>
>> The comment directly above this says "Up to the node before sink to 
>> avoid latency". But then this line goes and saves the sink anyway. So 
>> I'm not sure what's meant by the comment?
>>
>>>       return 0;
>>> +
>>> +sink_failed:
>>> +    if (!coresight_enable_path_from_to(path, 
>>> coresight_get_mode(source),
>>> +                       from, to))
>>> +        coresight_pm_device_restore(source);
>>> +
>>> +    pr_err("Failed in coresight PM save on CPU%d: %d\n",
>>> +           smp_processor_id(), ret);
>>> +    this_cpu_write(percpu_pm_failed, true);
>>
>> Why does only a failing sink set percpu_pm_failed when failing to save 
>> the source exits early. Sashiko has a similar comment that this could 
>> result in restoring uninitialised source save data later, but a 
>> comment in this function about why the flow is like this would be 
>> helpful.
>>
>> We have coresight_disable_path_from_to() which always succeeds and 
>> doesn't return an error. TRBE is the only sink with a pm_save_disable()
>> callback, but it always succeeds anyway.
>>
>> Would it not be much simpler to require that sink save/restore 
>> callbacks always succeed and don't return anything? Seems like this 
>> percpu_pm_failed stuff is extra complexity for a scenario that doesn't 
>> exist? The only thing that can fail is saving the source but it 
>> doesn't goto sink_failed when that happens.
>>
>> Ideally etm4_cpu_save() wouldn't have a return value either. It would 
>> be good if we could find away to skip or ignore the timeouts in there 
>> somehow because that's the only reason it can fail.
>>
>>> +    return ret;
>>>   }
>>>   static void coresight_pm_restore(struct coresight_path *path)
>>>   {
>>>       struct coresight_device *source = coresight_get_source(path);
>>> +    struct coresight_device *sink = coresight_get_sink(path);
>>>       struct coresight_node *from, *to;
>>>       int ret;
>>> +    coresight_pm_device_restore(sink);
>>> +
>>>       from = coresight_path_first_node(path);
>>>       /* Up to the node before sink to avoid latency */
>>>       to = list_prev_entry(coresight_path_last_node(path), link);
>>> @@ -1808,6 +1846,8 @@ static void coresight_pm_restore(struct 
>>> coresight_path *path)
>>>       return;
>>>   path_failed:
>>> +    coresight_pm_device_save(sink);
>>> +
>>>       pr_err("Failed in coresight PM restore on CPU%d: %d\n",
>>>              smp_processor_id(), ret);
>>>
>>
> 



^ permalink raw reply

* Re: [RFC PATCH 1/7] media: v4l2-ctrls: Add V4L2_CID_MEMORY_USAGE control
From: Detlev Casanova @ 2026-04-09 14:29 UTC (permalink / raw)
  To: Nicolas Dufresne, Ming Qian(OSS), Frank Li
  Cc: linux-media, mchehab, hverkuil-cisco, sebastian.fricke, shawnguo,
	s.hauer, kernel, festevam, linux-imx, xiahong.bao, eagle.zhou,
	imx, linux-kernel, linux-arm-kernel
In-Reply-To: <8911674f2f86a4b75e1f44d6e9b66a28f6e74e56.camel@ndufresne.ca>



On 4/8/26 17:11, Nicolas Dufresne wrote:
> Le jeudi 02 avril 2026 à 11:14 +0800, Ming Qian(OSS) a écrit :
>> Hi Nicolas,
>>
>> On 4/1/2026 10:23 AM, Ming Qian(OSS) wrote:
>>> Hi Nicolas,
>>>
>>> On 3/31/2026 10:54 PM, Nicolas Dufresne wrote:
>>>> Le mardi 31 mars 2026 à 10:33 -0400, Frank Li a écrit :
>>>>> On Tue, Mar 31, 2026 at 03:23:11PM +0800, ming.qian@oss.nxp.com wrote:
>>>>>> From: Ming Qian <ming.qian@oss.nxp.com>
>>>>>>
>>>>>> Add a new read-only control V4L2_CID_MEMORY_USAGE that allows
>>>>>> applications to query the total amount of memory currently used
>>>>>> by a device instance.
>>>>>>
>>>>>> This control reports the memory consumption in bytes, including
>>>>>> internal buffers, intermediate processing data, and other
>>>>>> driver-managed allocations. Applications can use this information
>>>>>> for debugging, resource monitoring, or making informed decisions
>>>>>> about buffer allocation strategies.
>>>>>>
>>>>>> Signed-off-by: Ming Qian <ming.qian@oss.nxp.com>
>>>>>> ---
>>>>> Not sure why not export these information by debugfs, or any benefit vs
>>>>> debugfs?
>>>> There is also a on-going proposal that uses fdinfo.
>>>>
>>>> Nicolas
>>>>
>>> Thanks for the reminder about the ongoing fdinfo proposal.
>>>
>>> Just to confirm, you are referring to Detlev’s ongoing fdinfo proposal,
>>> specifically this series:
>>> https://lore.kernel.org/lkml/20260212162328.192217-1-
>>> detlev.casanova@collabora.com/
>>>
>>> I will align my work with it and switch to using fdinfo.
>>> Once the show_fdinfo support from that series is merged, I will prepare
>>> the next revision of my patch accordingly.
>>>
>>> Regards,
>>> Ming
>>>
>> Regarding the discussion about using fdinfo instead of a V4L2 control, I
>> have two questions:
>>
>> 	1. Key consistency in fdinfo
>> 	fdinfo uses key–value pairs, which is flexible, but if multiple
>> 	drivers want to expose the same “memory usage” information,
>> 	they need to agree on a common key name and meaning. Otherwise
>> 	user‑space must handle each driver differently. A V4L2 control
>> 	naturally provides a unified interface without this coordination
>> 	effort.
>>
>>
>> 	2. Lack of notification in fdinfo
>> 	With a control, user‑space can subscribe to control events and
>> 	receive notifications when the memory usage changes. fdinfo does
>> 	not have a built‑in event mechanism, so users must either poll
>> 	or rely on additional eventfd‑like or custom event mechanisms.
>>
>> Do you have any suggestions or existing practices to address these two
>> issues when using fdinfo?
>>
>> Thanks again for your time and comments.
> Added Detlev in CC. You can also refer to his work through:
>
> https://lore.kernel.org/all/20260212162328.192217-1-detlev.casanova@collabora.com/
>
> Nicolas
Hi Ming !

One of the reasons for using fdinfo is that it's already being used in 
the drm subsystem and it is working well.
Of course, in DRM, drivers don't allocate a lot of memory themselves, 
userspace drivers (in mesa) go through the DRM uAPI to allocate buffers, 
making the DRM subsystem aware of all allocated memory.
That lets DRM show memory stats in a standard way for all drm drivers. 
In v4l2, memory allocation is shared between userspace and the driver.
We could have drivers report memory usage through a callback and 
v4l2-core can add the standard field based on that.

For notifications, I don't really see a need for that, most tracing 
tools will use polling (I'm thinking perfetto, but also top-like tools).
We could have a max-mem-usage field if we'd want to make sure we don't 
miss the maximum memory usage between 2 polls.

Finally, I think v4l2 controls should only be used to control, configure 
and exchange data with video devices, not get stat information on what 
the driver is doing.

Detlev.
>
>> Regards,
>> Ming
>>
>>>>> Generanlly document should be first patch, then driver change.
>>>>>
>>>>> Frank
>>>>>
>>>>>>    drivers/media/v4l2-core/v4l2-ctrls-defs.c | 8 ++++++++
>>>>>>    include/uapi/linux/v4l2-controls.h        | 4 +++-
>>>>>>    2 files changed, 11 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/media/v4l2-core/v4l2-ctrls-defs.c b/drivers/
>>>>>> media/v4l2-core/v4l2-ctrls-defs.c
>>>>>> index 551426c4cd01..053db78ff661 100644
>>>>>> --- a/drivers/media/v4l2-core/v4l2-ctrls-defs.c
>>>>>> +++ b/drivers/media/v4l2-core/v4l2-ctrls-defs.c
>>>>>> @@ -831,6 +831,7 @@ const char *v4l2_ctrl_get_name(u32 id)
>>>>>>        case V4L2_CID_ALPHA_COMPONENT:        return "Alpha Component";
>>>>>>        case V4L2_CID_COLORFX_CBCR:        return "Color Effects, CbCr";
>>>>>>        case V4L2_CID_COLORFX_RGB:              return "Color Effects,
>>>>>> RGB";
>>>>>> +    case V4L2_CID_MEMORY_USAGE:        return "Memory Usage";
>>>>>>
>>>>>>        /*
>>>>>>         * Codec controls
>>>>>> @@ -1476,6 +1477,13 @@ void v4l2_ctrl_fill(u32 id, const char
>>>>>> **name, enum v4l2_ctrl_type *type,
>>>>>>            *min = 0;
>>>>>>            *max = 0xffff;
>>>>>>            break;
>>>>>> +    case V4L2_CID_MEMORY_USAGE:
>>>>>> +        *type = V4L2_CTRL_TYPE_INTEGER64;
>>>>>> +        *flags |= V4L2_CTRL_FLAG_READ_ONLY;
>>>>>> +        *min = 0;
>>>>>> +        *max = S64_MAX;
>>>>>> +        *step = 1;
>>>>>> +        break;
>>>>>>        case V4L2_CID_FLASH_FAULT:
>>>>>>        case V4L2_CID_JPEG_ACTIVE_MARKER:
>>>>>>        case V4L2_CID_3A_LOCK:
>>>>>> diff --git a/include/uapi/linux/v4l2-controls.h b/include/uapi/
>>>>>> linux/v4l2-controls.h
>>>>>> index 68dd0c4e47b2..02c6f960d38e 100644
>>>>>> --- a/include/uapi/linux/v4l2-controls.h
>>>>>> +++ b/include/uapi/linux/v4l2-controls.h
>>>>>> @@ -110,8 +110,10 @@ enum v4l2_colorfx {
>>>>>>    #define V4L2_CID_COLORFX_CBCR            (V4L2_CID_BASE+42)
>>>>>>    #define V4L2_CID_COLORFX_RGB            (V4L2_CID_BASE+43)
>>>>>>
>>>>>> +#define V4L2_CID_MEMORY_USAGE            (V4L2_CID_BASE+44)
>>>>>> +
>>>>>>    /* last CID + 1 */
>>>>>> -#define V4L2_CID_LASTP1                         (V4L2_CID_BASE+44)
>>>>>> +#define V4L2_CID_LASTP1                         (V4L2_CID_BASE+45)
>>>>>>
>>>>>>    /* USER-class private control IDs */
>>>>>>
>>>>>> -- 
>>>>>> 2.53.0
>>>>>>



^ permalink raw reply

* Re: [PATCH v2 2/4] perf: Fix uninitialized bitfields in perf_clear_branch_entry_bitfields()
From: Leo Yan @ 2026-04-09 14:28 UTC (permalink / raw)
  To: Puranjay Mohan
  Cc: bpf, Puranjay Mohan, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin KaFai Lau, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Will Deacon, Mark Rutland,
	Catalin Marinas, Rob Herring, Breno Leitao, linux-arm-kernel,
	linux-perf-users, kernel-team
In-Reply-To: <20260318171706.2840512-3-puranjay@kernel.org>

On Wed, Mar 18, 2026 at 10:16:56AM -0700, Puranjay Mohan wrote:
> perf_clear_branch_entry_bitfields() zeroes individual bitfields of struct
> perf_branch_entry but misses the new_type (4 bits) and priv (3 bits)
> fields. This means any code path that relies on this function to produce
> a clean entry may expose stale or uninitialised data in these fields to
> userspace.
> 
> The function was introduced by commit bfe4daf850f4 ("perf/core: Add
> perf_clear_branch_entry_bitfields() helper") specifically to "centralize
> the initialization to avoid missing a field in case more are added."
> Unfortunately, the commits that later added new_type and priv to struct
> perf_branch_entry only updated the UAPI header and did not update this
> clearing function.
> 
> Zero new_type and priv alongside the other bitfields.
> 
> Fixes: b190bc4ac9e6 ("perf: Extend branch type classification")
> Fixes: 5402d25aa571 ("perf: Capture branch privilege information")
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> ---
>  include/linux/perf_event.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 48d851fbd8ea..d7f39b7e9cda 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1481,6 +1481,8 @@ static inline void perf_clear_branch_entry_bitfields(struct perf_branch_entry *b
>  	br->cycles	= 0;
>  	br->type	= 0;
>  	br->spec	= PERF_BR_SPEC_NA;
> +	br->new_type	= 0;
> +	br->priv	= 0;
>  	br->reserved	= 0;
>  }

We already know this does not work well. Instead, we can define a union
for bitfield and use memset to clear it, later we will not bother for
this kind of issue anymore.

  struct perf_branch_entry {
          ...
  
          union {
                  struct {
                          __u64   mispred   :  1, /* target mispredicted */
                                  predicted :  1, /* target predicted */
                                  ...
                                  reserved  : 31;
                  };
                  __u64 bitfields;
          };
  };

static inline void perf_clear_branch_entry_bitfields(struct perf_branch_entry *br)
{
        memset(&br->bitfields, 0, sizeof(br->bitfields));
}

Thanks,
Leo


^ permalink raw reply

* Re: [PATCH RFC net-next 0/4] improve hw flow offload byte accounting
From: Daniel Golle @ 2026-04-09 14:21 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Felix Fietkau, John Crispin, Lorenzo Bianconi, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Matthias Brugger, AngeloGioacchino Del Regno, Simon Horman,
	Florian Westphal, Phil Sutter, netdev, linux-kernel,
	linux-arm-kernel, linux-mediatek, netfilter-devel, coreteam
In-Reply-To: <adevKeasLkEB5zZ4@chamomile>

On Thu, Apr 09, 2026 at 03:52:41PM +0200, Pablo Neira Ayuso wrote:
> On Thu, Apr 09, 2026 at 02:07:22PM +0100, Daniel Golle wrote:
> > Hardware flow counters report raw byte counts whose semantics
> > vary by vendor -- some count ingress L2 frames, others egress
> > L2, others L3. The nf_flow_table framework currently passes
> > these bytes straight to conntrack without conversion, and
> > sub-interfaces (VLAN, PPPoE) that are bypassed by hw offload
> > never see any counter updates at all.
> 
> I see, but that is part of the feature itself? Why pretend that these
> interface are really seeing traffic while they don't. This aspiration
> of trying to do all hardware offload fully transparent (when it is not
> the case, not mentioning semantic changes in how packet handling is
> done compared to the software plane) does not sound convincing to me.

Please explain what you mean by offloading not being fully
transparent. If the MAC hardware offloads VLAN encap/decap, for
example, we also maintain the counters correctly (it just so happens),
just the flow-offloading case results in a weird overall picture:
hardware interface counters keep increasing, encap interfaces (802.1Q,
PPPoE) don't. That makes it confusing and hard to understand what's
happening when only looking at the interface counters (ie. "what is
all that traffic on my physical WAN interface which isn't PPPoE? Can't
be that all of that is the modems management interface, SNMP, ...")

> 
> On top of this, this issue also exists in the software plane: Devices
> that are bypasses do not get their counters bumped.
> 
> Maybe if this is really a requirement, then this should address the
> issue for software too, but is it worth the effort to add
> infrastructure for this purpose?

To me it would feel more correct to see counters increasing also
for offloaded traffic on software interfaces such as PPPoE or VLAN.

I honestly didn't think about the software fastpath, and yes, I think
it should be addressed there too.

> > This series lets drivers declare what their counters represent,
> > so the framework can normalize to L3 for conntrack and
> > propagate per-layer stats to encap sub-interfaces.

This part could also been seen as an independent fix as currently
conntrack stats for the same traffic differ in case of software
offloading (pure L3 bytes) and hardware offloading (L2 ingress bytes
in case of mtk_ppe).


^ permalink raw reply

* Re: [PATCH] mm/arm: pgtable: remove young bit check for pte_valid_user
From: Russell King (Oracle) @ 2026-04-09 14:21 UTC (permalink / raw)
  To: Will Deacon; +Cc: Brian Ruley, Steve Capper, linux-arm-kernel, linux-kernel
In-Reply-To: <adewJetUv6wMiz9o@willie-the-truck>

On Thu, Apr 09, 2026 at 02:56:53PM +0100, Will Deacon wrote:
> On Thu, Apr 09, 2026 at 03:54:45PM +0300, Brian Ruley wrote:
> > Fixes cache desync, which can cause undefined instruction,
> > translation and permission faults under heavy memory use.
> > 
> > This is an old bug introduced in commit 1971188aa196 ("ARM: 7985/1: mm:
> > implement pte_accessible for faulting mappings"), which included a check
> > for the young bit of a PTE. The underlying assumption was that old pages
> > are not cached, therefore, `__sync_icache_dcache' could be skipped
> > entirely.
> > 
> > However, under extreme memory pressure, page migrations happen
> > frequently and the assumption of uncached "old" pages does not hold.
> > Especially for systems that do not have swap, the migrated pages are
> > unequivocally marked old. This presents a problem, as it is possible
> > for the original page to be immediately mapped to another VA that
> > happens to share the same cache index in VIPT I-cache (we found this
> > bug on Cortex-A9). Without cache invalidation, the CPU will see the
> > old mapping whose physical page can now be used for a different
> > purpose, as illustrated below:
> > 
> >                 Core                      Physical Memory
> >   +-------------------------------+     +------------------+
> >   | TLB                           |     |                  |
> >   |  VA_A 0xb6e6f -> pfn_q        |     | pfn_q: code      |
> >   +-------------------------------+     +------------------+
> >   | I-cache                       |
> >   |  set[VA_A bits] | tag=pfn_q   |
> >   +-------------------------------+
> > 
> > migrate (kcompactd):
> >   1. copy pfn_q --> pfn_r
> >   2. free pfn_q
> >   3. pte: VA_a -> pfn_r
> >   4. pte_mkold(pte) --> !young
> >   5. ICIALLUIS skipped (because !young)
> > 
> > pfn_src reused (OOM pressure):
> >   pte: VA_B -> pfn_q (different code)
> > 
> > bug:
> >                 Core                      Physical Memory
> >   +-------------------------------+     +------------------+
> >   | TLB (empty)                   |     | pfn_r: old code  |
> >   +-------------------------------+     | pfn_q: new code  |
> >   | I-cache                       |     +------------------+
> >   |  set[VA_A bits] | tag=pfn_q   |<--- wrong instructions
> >   +-------------------------------+
> 
> (nit: Do you have pfn_r and pfn_q mixed up in the "Physical Memory" box?)
> 
> > This was verified on ba16-based board (i.MX6Quad/Dual, Cortex-A9) by
> > instrumenting the migration code to track recently migrated pages in a
> > ring buffer and then dumping them in the undefined instruction fault
> > handler. The bug can be triggered with `stress-ng':
> > 
> >   stress-ng --vm 4 --vm-bytes 2G --vm-method zero-one --verify
> > 
> > Note that the system we tested on has only 2G of memory, so the test
> > triggered the OOM-killer in our case.
> > 
> > Fixes: 1971188aa196 ("ARM: 7985/1: mm: implement pte_accessible for faulting mappings")
> > Signed-off-by: Brian Ruley <brian.ruley@gehealthcare.com>
> > ---
> >  arch/arm/include/asm/pgtable.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
> > index 6fa9acd6a7f5..e3a5b4a9a65f 100644
> > --- a/arch/arm/include/asm/pgtable.h
> > +++ b/arch/arm/include/asm/pgtable.h
> > @@ -185,7 +185,7 @@ static inline pte_t *pmd_page_vaddr(pmd_t pmd)
> >  #define pte_exec(pte)		(pte_isclear((pte), L_PTE_XN))
> >  
> >  #define pte_valid_user(pte)	\
> > -	(pte_valid(pte) && pte_isset((pte), L_PTE_USER) && pte_young(pte))
> > +	(pte_valid(pte) && pte_isset((pte), L_PTE_USER))
> 
> This patch is from twelve years ago, so please forgive me for having
> forgotten all of the details. However, my recollection is that when using
> the classic/!lpae format (as you will be on Cortex-A9), page aging is
> implemented by using invalid (translation faulting) ptes for 'old'
> mappings.

It is.

> So in the case you describe, we may well elide the I-cache maintenance,
> but won't we also put down an invalid pte?

Correct.

> If we later take a fault
> on that, we should then perform the cache maintenance when installing
> the young entry (via ptep_set_access_flags()).

Correct again.

> The more interesting part
> is probably when the mapping for 'VA_B' is installed to map 'pfn_q' but,
> again, I would've expected the cache maintenance to happen just prior to
> installing the valid (young) mapping.

Also correct - for the new PTE to become accessible in userspace, we
would need to establish a young PTE, which will result in set_ptes()
being called, and that should trigger __flush_icache_all() which will
flush the _entire_ instruction cache, which will remove any stale
entries for the old mapping that is no longer accessible.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!


^ permalink raw reply

* Re: [PATCH v2 1/3] arm64: mm: Fix rodata=full block mapping support for realm guests
From: Suzuki K Poulose @ 2026-04-09 14:18 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Ryan Roberts, Will Deacon, David Hildenbrand (Arm), Dev Jain,
	Yang Shi, Jinjiang Tu, Kevin Brodsky, linux-arm-kernel,
	linux-kernel, stable
In-Reply-To: <d1ecba64-898f-433b-93d4-7a33b9c3f378@arm.com>



On 09/04/2026 10:38, Suzuki K Poulose wrote:
> On 07/04/2026 18:21, Catalin Marinas wrote:
>> On Tue, Apr 07, 2026 at 10:57:35AM +0100, Suzuki K Poulose wrote:
>>> On 02/04/2026 21:43, Catalin Marinas wrote:
>>>> On Mon, Mar 30, 2026 at 05:17:02PM +0100, Ryan Roberts wrote:
>>>>>    int split_kernel_leaf_mapping(unsigned long start, unsigned long 
>>>>> end)
>>>>>    {
>>>>>        int ret;
>>>>> -    /*
>>>>> -     * !BBML2_NOABORT systems should not be trying to change 
>>>>> permissions on
>>>>> -     * anything that is not pte-mapped in the first place. Just 
>>>>> return early
>>>>> -     * and let the permission change code raise a warning if not 
>>>>> already
>>>>> -     * pte-mapped.
>>>>> -     */
>>>>> -    if (!system_supports_bbml2_noabort())
>>>>> -        return 0;
>>>>> -
>>>>>        /*
>>>>>         * If the region is within a pte-mapped area, there is no 
>>>>> need to try to
>>>>>         * split. Additionally, CONFIG_DEBUG_PAGEALLOC and 
>>>>> CONFIG_KFENCE may
>>>>>         * change permissions from atomic context so for those cases 
>>>>> (which are
>>>>>         * always pte-mapped), we must not go any further because 
>>>>> taking the
>>>>> -     * mutex below may sleep.
>>>>> +     * mutex below may sleep. Do not call force_pte_mapping() here 
>>>>> because
>>>>> +     * it could return a confusing result if called from a 
>>>>> secondary cpu
>>>>> +     * prior to finalizing caps. Instead, 
>>>>> linear_map_requires_bbml2 gives us
>>>>> +     * what we need.
>>>>>         */
>>>>> -    if (force_pte_mapping() || is_kfence_address((void *)start))
>>>>> +    if (!linear_map_requires_bbml2 || is_kfence_address((void 
>>>>> *)start))
>>>>>            return 0;
>>>>> +    if (!system_supports_bbml2_noabort()) {
>>>>> +        /*
>>>>> +         * !BBML2_NOABORT systems should not be trying to change
>>>>> +         * permissions on anything that is not pte-mapped in the 
>>>>> first
>>>>> +         * place. Just return early and let the permission change 
>>>>> code
>>>>> +         * raise a warning if not already pte-mapped.
>>>>> +         */
>>>>> +        if (system_capabilities_finalized())
>>>>> +            return 0;
>>>>> +
>>>>> +        /*
>>>>> +         * Boot-time: split_kernel_leaf_mapping_locked() allocates 
>>>>> from
>>>>> +         * page allocator. Can't split until it's available.
>>>>> +         */
>>>>> +        if (WARN_ON(!page_alloc_available))
>>>>> +            return -EBUSY;
>>>>> +
>>>>> +        /*
>>>>> +         * Boot-time: Started secondary cpus but don't know if they
>>>>> +         * support BBML2_NOABORT yet. Can't allow splitting in this
>>>>> +         * window in case they don't.
>>>>> +         */
>>>>> +        if (WARN_ON(num_online_cpus() > 1))
>>>>> +            return -EBUSY;
>>>>> +    }
>>>>
>>>> I think sashiko is over cautions here
>>>> (https://sashiko.dev/#/patchset/20260330161705.3349825-1- 
>>>> ryan.roberts@arm.com)
>>>> but it has a somewhat valid point from the perspective of
>>>> num_online_cpus() semantics. We have have num_online_cpus() == 1 while
>>>> having a secondary CPU just booted and with its MMU enabled. I don't
>>>> think we can have any asynchronous tasks running at that point to
>>>> trigger a spit though. Even async_init() is called after smp_init().
>>>>
>>>> An option may be to attempt cpus_read_trylock() as this lock is 
>>>> taken by
>>>> _cpu_up(). If it fails, return -EBUSY, otherwise check 
>>>> num_online_cpus()
>>>> and unlock (and return -EBUSY if secondaries already started).
>>>>
>>>> Another thing I couldn't get my head around - IIUC is_realm_world()
>>>> won't return true for map_mem() yet (if in a realm).
>>>
>>> That is correct. map_mem() comes from paginig_init(), which gets called
>>> before arm64_rsi_init(). Realm check was delayed until psci_xx_init().
>>> We had a version which parsed the DT for PSCI conduit early enough
>>> to be able to make the SMC calls to detect the Realm. But there
>>> were concerns around it.
>>
>> Ah, yes, I remember.
>>
>> Does it mean that commit 42be24a4178f ("arm64: Enable memory encrypt for
>> Realms") was broken without rodata=full w.r.t. the linear map? Commit
> 
> Apparently, it looks like we missed this when we demoted the RSI
> detection later.
> 
>> a166563e7ec3 ("arm64: mm: support large block mapping when rodata=full")
>> introduced force_pte_mapping() but it just copied the logic in the
>> existing can_set_direct_map(). Looking at the linear_map_requires_bbml2
>> assignment, we get (!is_realm_world() && is_realm_world()) and it
>> cancels out, no effect on it but we don't get pte mappings either (even
>> if we don't have BBML2).
> 
> Yep, that's right.
>>
>> I think we need at least some safety checks:
>>
>> 1. BBML2_NOABORT support on the boot CPU - continue with the existing
>>     logic (as per Ryan's series)
>>
>> 2. !system_supports_bbml2_noabort() - split in
>>     linear_map_maybe_split_to_ptes(). This does not currently happen
>>     because linear_map_requires_bbml2 may be false in the absence of
>>     rodata=full. Not sure how to fix this without some variable telling
>>     us how the linear map was mapped. The requires_bbml2 flag doesn't
>>
>> 3. Panic in arm64_rsi_init() if !BBML2_NOABORT on the boot CPU _and_ we
>>     have block mappings already. People can avoid it with rodata=full
> 
> It looks like this will be a common case :-(

Having another look, by default, arm64 boots with rodata=full, and users
have to explicitly lower the bar by setting rodata=off or noalias. So
this has been keeping us running ;-).

With rodata=off, I get the following for a Realm boot:

[    0.000000] ------------[ cut here ]------------ 

[    0.000000] WARNING: arch/arm64/mm/pageattr.c:61 at 
pageattr_pmd_entry+0x78/0xe0, CPU#0: swapper/0
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 7.0.0-rc1+ 
#1889 PREEMPT
[    0.000000] Hardware name: linux,dummy-virt (DT)
[    0.000000] pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS 
BTYPE=--)
[    0.000000] pc : pageattr_pmd_entry+0x78/0xe0
[    0.000000] lr : walk_pgd_range+0x43c/0x970
[    0.000000] sp : ffff800082343b70
[    0.000000] x29: ffff800082343b70 x28: fff0000019600000 x27: 
fff0000019580000
[    0.000000] x26: ffff800082343c98 x25: fff000001d57ffff x24: 
fff000001fffe000
[    0.000000] x23: ffff8000810ae698 x22: fff000001fffd650 x21: 
fff0000019780000
[    0.000000] x20: fff000001d580000 x19: 0000000000000000 x18: 
0000000000000030
[    0.000000] x17: 0000000000004000 x16: 000000009fffc000 x15: 
0000000000000020
[    0.000000] x14: 0000000000003be4 x13: 0000000000000020 x12: 
0000000000000000
[    0.000000] x11: 0000000000000016 x10: 0000000000000015 x9 : 
0000000000000013
[    0.000000] x8 : 0000000000000015 x7 : 0000000080000000 x6 : 
0000000000000000
[    0.000000] x5 : 0078000099400405 x4 : fff000001fffd650 x3 : 
ffff800082343c98
[    0.000000] x2 : 0000000000080000 x1 : fff0000019580000 x0 : 
0000000000000001
[    0.000000] Call trace:
[    0.000000]  pageattr_pmd_entry+0x78/0xe0 (P)
[    0.000000]  walk_kernel_page_table_range_lockless+0x60/0xa0 

[    0.000000]  update_range_prot+0x80/0x128
[    0.000000]  __set_memory_enc_dec.part.0+0x88/0x258
[    0.000000]  realm_set_memory_decrypted+0x54/0x98
[    0.000000]  set_memory_decrypted+0x38/0x58
[    0.000000]  swiotlb_update_mem_attributes+0x44/0x58
[    0.000000]  mem_init+0x24/0x38
[    0.000000]  mm_core_init+0x94/0x140
[    0.000000]  start_kernel+0x544/0xa18
[    0.000000]  __primary_switched+0x88/0x98
[    0.000000] ---[ end trace 0000000000000000 ]---


Suzuki

> 
>>
>> 4. If (3) is a common case, a better alternative is to rewrite the
>>     linear map sometime after arm64_rsi_init() but before we call
>>     split_kernel_leaf_mapping().
> 
> We will explore this route.
> 
> The other option is to move the RSI detection (and the PSCI probe)
> earlier to be able to make better decisions early on. I will play with
> that a bit too.
> 
> Suzuki
> 
> 
>>
> 



^ permalink raw reply

* Re: [PATCH 7/7] media: rkvdec: Add multicore IOMMU support
From: Nicolas Dufresne @ 2026-04-09 14:19 UTC (permalink / raw)
  To: Detlev Casanova, Mauro Carvalho Chehab, Ezequiel Garcia,
	Heiko Stuebner, Hans Verkuil, Jonas Karlman
  Cc: kernel, linux-media, linux-kernel, linux-rockchip,
	linux-arm-kernel
In-Reply-To: <20260409-rkvdec-multicore-v1-7-62b316abf0f7@collabora.com>

[-- Attachment #1: Type: text/plain, Size: 10653 bytes --]

Le jeudi 09 avril 2026 à 09:50 -0400, Detlev Casanova a écrit :
> As each core has its own IOMMU core, buffers must be mapped in each
> core's IOMMU so that any run() call can use any core without having to
> remap everything.
> 
> To do that, we use rockchip iommu domain's iommu devices list.
> With that, one IOMMU domain can be mapped on multiple devices, meaning
> that each call to iommu_map() will flush the new mapping on all devices
> in the list.
> 
> The IOMMU domain that will have all devices in its list is the first
> core's default domain.
> 
> Another domain cannot be used because VB2 allocates buffers through the
> DMA engine, which uses iommu_get_dma_domain() to find the domain to map
> buffers through.
> 
> The IOMMU restore function can still work as before, but needs to be more
> explicit in what domain to attach the device to.
> That is because detaching the empty domain will reattach the core's default
> domain, which is wrong (except for the first "main" core).
> 
> The RCB temporary buffers are allocated in a dedicated SRAM, each
> core has its own SRAM, so the mapping for each core's SRAM is added in the
> global domain.
> 
> Everything else is mapped through the first core's default domain, making
> the driver write the mappings on both IOMMU cores.

Just raising an issue with the patch ordering here. I'm worried in a git bisect,
the driver will be broken until we apply this last patch. Can we make sure that
the driver bisects ? (or tell me if I'm wrong)

Nicolas

> 
> Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
> ---
>  .../media/platform/rockchip/rkvdec/rkvdec-rcb.c    | 21 ++++++-------
>  .../media/platform/rockchip/rkvdec/rkvdec-rcb.h    |  6 ++--
>  drivers/media/platform/rockchip/rkvdec/rkvdec.c    | 35 +++++++++++++++++-----
>  drivers/media/platform/rockchip/rkvdec/rkvdec.h    |  2 +-
>  4 files changed, 44 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.c
> index 190fb7438e8c..977e37cf209b 100644
> --- a/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.c
> +++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.c
> @@ -57,7 +57,7 @@ bool rkvdec_rcb_buf_validate_size(struct rkvdec_ctx *ctx)
>  	return ret;
>  }
>  
> -void rkvdec_free_rcb(struct rkvdec_core *core)
> +void rkvdec_free_rcb(struct rkvdec_dev *rkvdec, struct rkvdec_core *core)
>  {
>  	struct rkvdec_rcb_config *cfg = core->rcb_config;
>  	unsigned long virt_addr;
> @@ -76,12 +76,12 @@ void rkvdec_free_rcb(struct rkvdec_core *core)
>  		case RKVDEC_ALLOC_SRAM:
>  			virt_addr = (unsigned long)cfg->rcb_bufs[i].cpu;
>  
> -			if (core->iommu_domain)
> -				iommu_unmap(core->iommu_domain, virt_addr, rcb_size);
> +			if (rkvdec->iommu_global_domain)
> +				iommu_unmap(rkvdec->iommu_global_domain, virt_addr, rcb_size);
>  			gen_pool_free(core->sram_pool, virt_addr, rcb_size);
>  			break;
>  		case RKVDEC_ALLOC_DMA:
> -			dma_free_coherent(core->dev,
> +			dma_free_coherent(rkvdec->main_core->dev,
>  					  rcb_size,
>  					  cfg->rcb_bufs[i].cpu,
>  					  cfg->rcb_bufs[i].dma);
> @@ -97,7 +97,8 @@ void rkvdec_free_rcb(struct rkvdec_core *core)
>  	core->rcb_config = NULL;
>  }
>  
> -int rkvdec_allocate_rcb(struct rkvdec_core *core, u32 width, u32 height,
> +int rkvdec_allocate_rcb(struct rkvdec_dev *rkvdec, struct rkvdec_core *core,
> +			u32 width, u32 height,
>  			const struct rcb_size_info *size_info,
>  			size_t rcb_count)
>  {
> @@ -132,7 +133,7 @@ int rkvdec_allocate_rcb(struct rkvdec_core *core, u32 width, u32 height,
>  
>  		/* Try allocating an SRAM buffer */
>  		if (core->sram_pool) {
> -			if (core->iommu_domain)
> +			if (rkvdec->iommu_global_domain)
>  				rcb_size = ALIGN(rcb_size, SZ_4K);
>  
>  			cpu = gen_pool_dma_zalloc_align(core->sram_pool,
> @@ -142,11 +143,11 @@ int rkvdec_allocate_rcb(struct rkvdec_core *core, u32 width, u32 height,
>  		}
>  
>  		/* If an IOMMU is used, map the SRAM address through it */
> -		if (cpu && core->iommu_domain) {
> +		if (cpu && rkvdec->iommu_global_domain) {
>  			unsigned long virt_addr = (unsigned long)cpu;
>  			phys_addr_t phys_addr = dma;
>  
> -			ret = iommu_map(core->iommu_domain, virt_addr, phys_addr,
> +			ret = iommu_map(rkvdec->iommu_global_domain, virt_addr, phys_addr,
>  					rcb_size, IOMMU_READ | IOMMU_WRITE, 0);
>  			if (ret) {
>  				gen_pool_free(core->sram_pool,
> @@ -166,7 +167,7 @@ int rkvdec_allocate_rcb(struct rkvdec_core *core, u32 width, u32 height,
>  ram_fallback:
>  		/* Fallback to RAM */
>  		if (!cpu) {
> -			cpu = dma_alloc_coherent(core->dev,
> +			cpu = dma_alloc_coherent(rkvdec->main_core->dev,
>  						 rcb_size,
>  						 &dma,
>  						 GFP_KERNEL);
> @@ -189,7 +190,7 @@ int rkvdec_allocate_rcb(struct rkvdec_core *core, u32 width, u32 height,
>  	return 0;
>  
>  err_alloc:
> -	rkvdec_free_rcb(core);
> +	rkvdec_free_rcb(rkvdec, core);
>  
>  	return ret;
>  }
> diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.h b/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.h
> index a12af9b7dc2b..d1149afe7fda 100644
> --- a/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.h
> +++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.h
> @@ -8,6 +8,7 @@
>  
>  #include <linux/types.h>
>  
> +struct rkvdec_dev;
>  struct rkvdec_ctx;
>  struct rkvdec_core;
>  
> @@ -21,11 +22,12 @@ struct rcb_size_info {
>  	enum rcb_axis axis;
>  };
>  
> -int rkvdec_allocate_rcb(struct rkvdec_core *core, u32 width, u32 height,
> +int rkvdec_allocate_rcb(struct rkvdec_dev *rkvdec, struct rkvdec_core *core,
> +			u32 width, u32 height,
>  			const struct rcb_size_info *size_info,
>  			size_t rcb_count);
>  dma_addr_t rkvdec_rcb_buf_dma_addr(struct rkvdec_ctx *ctx, int id);
>  size_t rkvdec_rcb_buf_size(struct rkvdec_ctx *ctx, int id);
>  int rkvdec_rcb_buf_count(struct rkvdec_ctx *ctx);
>  bool rkvdec_rcb_buf_validate_size(struct rkvdec_ctx *ctx);
> -void rkvdec_free_rcb(struct rkvdec_core *core);
> +void rkvdec_free_rcb(struct rkvdec_dev *rkvdec, struct rkvdec_core *core);
> diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec.c b/drivers/media/platform/rockchip/rkvdec/rkvdec.c
> index c2818f1575ef..2930e9b64906 100644
> --- a/drivers/media/platform/rockchip/rkvdec/rkvdec.c
> +++ b/drivers/media/platform/rockchip/rkvdec/rkvdec.c
> @@ -1204,9 +1204,9 @@ static void rkvdec_device_run(void *priv)
>  	}
>  
>  	if (!rkvdec_rcb_buf_validate_size(ctx)) {
> -		rkvdec_free_rcb(ctx->core);
> +		rkvdec_free_rcb(ctx->dev, ctx->core);
>  
> -		ret = rkvdec_allocate_rcb(ctx->core,
> +		ret = rkvdec_allocate_rcb(ctx->dev, ctx->core,
>  					  ctx->decoded_fmt.fmt.pix_mp.width,
>  					  ctx->decoded_fmt.fmt.pix_mp.height,
>  					  ctx->dev->variant->rcb_sizes,
> @@ -1486,6 +1486,7 @@ static void rkvdec_v4l2_cleanup(struct rkvdec_dev *rkvdec)
>  
>  static void rkvdec_iommu_restore(struct rkvdec_core *core)
>  {
> +	int ret;
>  	if (core->empty_domain) {
>  		/*
>  		 * To rewrite mapping into the attached IOMMU core, attach a new empty domain that
> @@ -1494,8 +1495,14 @@ static void rkvdec_iommu_restore(struct rkvdec_core *core)
>  		 * This is safely done in this interrupt handler to make sure no memory get mapped
>  		 * through the IOMMU while the empty domain is attached.
>  		 */
> -		iommu_attach_device(core->empty_domain, core->dev);
> +		iommu_detach_device(core->curr_ctx->dev->iommu_global_domain, core->dev);
> +		ret = iommu_attach_device(core->empty_domain, core->dev);
> +		if (ret)
> +			dev_warn(core->dev, "Cannot attach empty domain: %d\n", ret);
>  		iommu_detach_device(core->empty_domain, core->dev);
> +		ret = iommu_attach_device(core->curr_ctx->dev->iommu_global_domain, core->dev);
> +		if (ret)
> +			dev_warn(core->dev, "Cannot attach global domain: %d\n", ret);
>  	}
>  }
>  
> @@ -1858,6 +1865,8 @@ static int rkvdec_probe(struct platform_device *pdev)
>  
>  	core = &rkvdec->cores[rkvdec->core_count++];
>  
> +	core->id = rkvdec->core_count - 1;
> +
>  	platform_set_drvdata(pdev, rkvdec);
>  	core->dev = &pdev->dev;
>  	INIT_DELAYED_WORK(&core->watchdog_work, rkvdec_watchdog_func);
> @@ -1883,12 +1892,24 @@ static int rkvdec_probe(struct platform_device *pdev)
>  			return PTR_ERR(core->link);
>  	}
>  
> -	core->iommu_domain = iommu_get_domain_for_dev(&pdev->dev);
> -	if (core->iommu_domain) {
> +	if (iommu_get_domain_for_dev(&pdev->dev)) {
>  		core->empty_domain = iommu_paging_domain_alloc(core->dev);
>  
> -		if (!core->empty_domain)
> +		if (IS_ERR(core->empty_domain))
>  			dev_warn(core->dev, "cannot alloc new empty domain\n");
> +
> +		if (!rkvdec->iommu_global_domain) {
> +			rkvdec->iommu_global_domain = iommu_get_domain_for_dev(core->dev);
> +
> +			if (IS_ERR(rkvdec->iommu_global_domain)) {
> +				rkvdec->iommu_global_domain = NULL;
> +				dev_warn_once(core->dev, "cannot alloc new global domain\n");
> +			}
> +		}
> +
> +		ret = iommu_attach_device(rkvdec->iommu_global_domain, core->dev);
> +		if (ret)
> +			dev_warn(core->dev, "cannot attach global domain to core %d\n", core->id);
>  	}
>  
>  	ret = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32));
> @@ -1961,7 +1982,7 @@ static void rkvdec_remove(struct platform_device *pdev)
>  		if (rkvdec->cores[i].empty_domain)
>  			iommu_domain_free(rkvdec->cores[i].empty_domain);
>  
> -		rkvdec_free_rcb(&rkvdec->cores[i]);
> +		rkvdec_free_rcb(rkvdec, &rkvdec->cores[i]);
>  	}
>  }
>  
> diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec.h b/drivers/media/platform/rockchip/rkvdec/rkvdec.h
> index 4f042a367dc0..ccd766b220c7 100644
> --- a/drivers/media/platform/rockchip/rkvdec/rkvdec.h
> +++ b/drivers/media/platform/rockchip/rkvdec/rkvdec.h
> @@ -135,7 +135,6 @@ struct rkvdec_core {
>  	void __iomem *link;
>  	struct delayed_work watchdog_work;
>  	struct gen_pool *sram_pool;
> -	struct iommu_domain *iommu_domain;
>  	struct iommu_domain *empty_domain;
>  	struct rkvdec_rcb_config *rcb_config;
>  	struct rkvdec_ctx *curr_ctx;
> @@ -155,6 +154,7 @@ struct rkvdec_dev {
>  	unsigned int available_core_count;
>  	spinlock_t cores_lock; /* serializes core list access */
>  	struct rkvdec_core *main_core;
> +	struct iommu_domain *iommu_global_domain;
>  };
>  
>  struct rkvdec_ctx {

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply

* Re: [PATCH] mm/arm: pgtable: remove young bit check for pte_valid_user
From: Russell King (Oracle) @ 2026-04-09 14:15 UTC (permalink / raw)
  To: Brian Ruley; +Cc: Steve Capper, Will Deacon, linux-arm-kernel, linux-kernel
In-Reply-To: <20260409125446.981747-1-brian.ruley@gehealthcare.com>

On Thu, Apr 09, 2026 at 03:54:45PM +0300, Brian Ruley wrote:
> Fixes cache desync, which can cause undefined instruction,
> translation and permission faults under heavy memory use.
> 
> This is an old bug introduced in commit 1971188aa196 ("ARM: 7985/1: mm:
> implement pte_accessible for faulting mappings"), which included a check
> for the young bit of a PTE. The underlying assumption was that old pages
> are not cached, therefore, `__sync_icache_dcache' could be skipped
> entirely.
> 
> However, under extreme memory pressure, page migrations happen
> frequently and the assumption of uncached "old" pages does not hold.

The first thing to point out is that PTEs that are marked as "old" are
not mapped into userspace. They need to take a fault to be marked
young, which will involve another call to set_pte(), at which point
pte_valid_user() should return true. Your assumption that this is
about "old" pages being uncached is totally incorrect - there has
never been such an assumption.

> Especially for systems that do not have swap, the migrated pages are
> unequivocally marked old. This presents a problem, as it is possible
> for the original page to be immediately mapped to another VA that
> happens to share the same cache index in VIPT I-cache (we found this
> bug on Cortex-A9). Without cache invalidation, the CPU will see the
> old mapping whose physical page can now be used for a different
> purpose, as illustrated below:



> 
>                 Core                      Physical Memory
>   +-------------------------------+     +------------------+
>   | TLB                           |     |                  |
>   |  VA_A 0xb6e6f -> pfn_q        |     | pfn_q: code      |
>   +-------------------------------+     +------------------+
>   | I-cache                       |
>   |  set[VA_A bits] | tag=pfn_q   |
>   +-------------------------------+
> 
> migrate (kcompactd):
>   1. copy pfn_q --> pfn_r
>   2. free pfn_q
>   3. pte: VA_a -> pfn_r
>   4. pte_mkold(pte) --> !young
>   5. ICIALLUIS skipped (because !young)

At this point, the hardware PTE will be set to zero and the TLB
invalidated. This _should_ mean that any future access should result
in a page permission fault being raised. That will then provoke the
MM to mark the PTE young, which will then result in set_ptes()
being called, and thus __sync_icache_dcache() will be called for
the _neew_ pte (which will be for pfn_r.)

> 
> pfn_src reused (OOM pressure):
>   pte: VA_B -> pfn_q (different code)
> 
> bug:
>                 Core                      Physical Memory
>   +-------------------------------+     +------------------+
>   | TLB (empty)                   |     | pfn_r: old code  |
>   +-------------------------------+     | pfn_q: new code  |
>   | I-cache                       |     +------------------+
>   |  set[VA_A bits] | tag=pfn_q   |<--- wrong instructions
>   +-------------------------------+
> 
> This was verified on ba16-based board (i.MX6Quad/Dual, Cortex-A9) by
> instrumenting the migration code to track recently migrated pages in a
> ring buffer and then dumping them in the undefined instruction fault
> handler. The bug can be triggered with `stress-ng':
> 
>   stress-ng --vm 4 --vm-bytes 2G --vm-method zero-one --verify
> 
> Note that the system we tested on has only 2G of memory, so the test
> triggered the OOM-killer in our case.

So you're saying that stress-ng doesn't reproduce this bug but triggers
the OOM-killer... confused.

Cortex-A9 has been around for a long time - I have systems that still
use Cortex-A9 every day without swap, and they have been rock solid.

If there was a bug like this, I would've expected to see problems, but
I'm not... so, I'm not convinced there's a problem here.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!


^ permalink raw reply

* Re: [PATCH v2 1/3] arm64: mm: Fix rodata=full block mapping support for realm guests
From: Catalin Marinas @ 2026-04-09 14:09 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: Ryan Roberts, Will Deacon, David Hildenbrand (Arm), Dev Jain,
	Yang Shi, Jinjiang Tu, Kevin Brodsky, linux-arm-kernel,
	linux-kernel, stable
In-Reply-To: <d1ecba64-898f-433b-93d4-7a33b9c3f378@arm.com>

On Thu, Apr 09, 2026 at 10:38:03AM +0100, Suzuki K Poulose wrote:
> On 07/04/2026 18:21, Catalin Marinas wrote:
> > a166563e7ec3 ("arm64: mm: support large block mapping when rodata=full")
> > introduced force_pte_mapping() but it just copied the logic in the
> > existing can_set_direct_map(). Looking at the linear_map_requires_bbml2
> > assignment, we get (!is_realm_world() && is_realm_world()) and it
> > cancels out, no effect on it but we don't get pte mappings either (even
> > if we don't have BBML2).
> 
> Yep, that's right.
> > 
> > I think we need at least some safety checks:
> > 
> > 1. BBML2_NOABORT support on the boot CPU - continue with the existing
> >     logic (as per Ryan's series)
> > 
> > 2. !system_supports_bbml2_noabort() - split in
> >     linear_map_maybe_split_to_ptes(). This does not currently happen
> >     because linear_map_requires_bbml2 may be false in the absence of
> >     rodata=full. Not sure how to fix this without some variable telling
> >     us how the linear map was mapped. The requires_bbml2 flag doesn't
> > 
> > 3. Panic in arm64_rsi_init() if !BBML2_NOABORT on the boot CPU _and_ we
> >     have block mappings already. People can avoid it with rodata=full
> 
> It looks like this will be a common case :-(
> 
> > 
> > 4. If (3) is a common case, a better alternative is to rewrite the
> >     linear map sometime after arm64_rsi_init() but before we call
> >     split_kernel_leaf_mapping().
> 
> We will explore this route.
> 
> The other option is to move the RSI detection (and the PSCI probe)
> earlier to be able to make better decisions early on. I will play with
> that a bit too.

I thought we could reuse linear_map_split_to_ptes() but this function
assumes that the primary CPU supports BBML2_NOABORT. To do this live,
we'd have to clone the active kernel pgtable hierarchy, switch to it and
then continue with the splitting. kasan_init_shadow() does a bit of this
but not fully as it only cares about the shadow mapping.

Hmm, maybe probing the RSI early is easier ;).

-- 
Catalin


^ permalink raw reply

* Re: [PATCH] media: cedrus: skip invalid H.264 reference list entries
From: Nicolas Dufresne @ 2026-04-09 14:00 UTC (permalink / raw)
  To: Pengpeng Hou, Maxime Ripard
  Cc: Paul Kocialkowski, Mauro Carvalho Chehab, Greg Kroah-Hartman,
	Chen-Yu Tsai, Jernej Skrabec, Samuel Holland, linux-media,
	linux-staging, linux-arm-kernel, linux-sunxi, linux-kernel
In-Reply-To: <20260409223000.0-cedrus-h264-active-reply-pengpeng@iscas.ac.cn>

[-- Attachment #1: Type: text/plain, Size: 648 bytes --]

Le jeudi 09 avril 2026 à 22:30 +0800, Pengpeng Hou a écrit :
> Hi Paul,
> 
> Thanks, that makes sense.
> 
> I agree Cedrus should not silently skip an invalid index in the active
> portion of ref_pic_list0/ref_pic_list1.
> 
> I'll respin this to keep the check at the Cedrus use site rather than
> cedrus_try_ctrl(), but return -EINVAL when one of the active reference
> entries points past V4L2_H264_NUM_DPB_ENTRIES. Entries beyond
> num_ref_idx_l0_active_minus1 / num_ref_idx_l1_active_minus1 will still
> be ignored as before.

Please, let the discussion continue before respinning.

Nicolas

> 
> Thanks,
> Pengpeng
> 

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply

* Re: [PATCH] media: cedrus: skip invalid H.264 reference list entries
From: Nicolas Dufresne @ 2026-04-09 14:00 UTC (permalink / raw)
  To: Paul Kocialkowski, Pengpeng Hou
  Cc: mripard, mchehab, gregkh, wens, jernej.skrabec, samuel,
	linux-media, linux-staging, linux-arm-kernel, linux-sunxi,
	linux-kernel
In-Reply-To: <adeqkmA9VhaPkSAk@shepard>

[-- Attachment #1: Type: text/plain, Size: 4292 bytes --]

Le jeudi 09 avril 2026 à 15:33 +0200, Paul Kocialkowski a écrit :
> Hi,
> 
> On Tue 24 Mar 26, 16:08, Pengpeng Hou wrote:
> > Cedrus consumes H.264 ref_pic_list0/ref_pic_list1 entries from the
> > stateless slice control and later uses their indices to look up
> > decode->dpb[] in _cedrus_write_ref_list().
> > 
> > Rejecting such controls in cedrus_try_ctrl() would break existing
> > userspace, since stateless H.264 reference lists may legitimately carry
> > out-of-range indices for missing references. Instead, guard the actual
> > DPB lookup in Cedrus and skip entries whose indices do not fit the fixed
> > V4L2_H264_NUM_DPB_ENTRIES array.
> 
> Could you explain why it is legitimate that userspace would pass indices that
> are not in the dpb list? As far as I remember from the H.264 spec, the L0/L1
> lists are constructed from active references only and the number of items
> there
> should be given by num_ref_idx_l0_active_minus1/num_ref_idx_l1_active_minus1.
> We can tolerate invalid data beyond these indices, but certainly not as part
> of the indices that should be valid.
> 
> However I agree that cedrus_try_ctrl is maybe not the right place to check it
> since I'm not sure we are guaranteed that the slice params control will be
> checked before the new DPB (from the same request) is applied, so we might end
> up checking against the dpb from the previous decode request.
> 
> But I think we should error out and not just skip the invalid reference.

Its been a long time I haven't looked into this. But what happens here is that
once you lost a reference, the userspace DPB will hold a gap picture, which as
no backing storage. Since it has no backing storage, there is no cookie
(timestamp) associated with it. This gap picture will still make it to the
reference lists, since the position of the reference in the list is important
(you cannot just remove an item). It is an established practice in userspace to
simply fill the void with an invalid index, typically 0xff, which is always
invalid. Because that's what some userspace do, it became part of our ABI.

Decoders are expected be fault tolerant, though the tolerance level is hardware
specific, and so failing in the common code would be inappropriate (failing in
Cedrus could be acceptable, assuming it can't work with missing references,
which the implementation seems to be fine with).

Hantro G1 notably have a flag to report missing reference to the HW, and it will
manage concealement internally. G2/RKVDEC don't, and we try and pick the most
recent frame as a replacement backing storage, which most of the time minimises
the damages.

As future refinement, we need drivers in the long term to properly report the
damages (perhaps through additional RO request controls). As discussed few years
ago in the error handling wip for rkvdec, the V4L2 doc specify that any sort of
damages known to exist in a frame shall results in the ERROR flag being set. We
can deduce that the error flag with a payload of 0 indicates to userspace to not
use the frame (which typically happen on hard errors, or errror at entropy
decode staged) and ERROR flag with a correct payload signal some level of
corruption, and its left to the application to decide what to do.

Nicolas

> 
> All the best,
> 
> Paul
> 
> > 
> > This keeps the fix local to the driver use site and avoids out-of-bounds
> > reads from malformed or unsupported reference list entries.
> > 
> > Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
> > ---
> >  drivers/staging/media/sunxi/cedrus/cedrus_h264.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > --- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > +++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
> > @@ -210,6 +210,9 @@ static void _cedrus_write_ref_list(struct cedrus_ctx
> > *ctx,
> >  		u8 dpb_idx;
> >  
> >  		dpb_idx = ref_list[i].index;
> > +		if (dpb_idx >= V4L2_H264_NUM_DPB_ENTRIES)
> > +			continue;
> > +
> >  		dpb = &decode->dpb[dpb_idx];
> >  
> >  		if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
> > -- 
> > 2.50.1
> > 

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply

* Re: [PATCH] mm/arm: pgtable: remove young bit check for pte_valid_user
From: Will Deacon @ 2026-04-09 13:56 UTC (permalink / raw)
  To: Brian Ruley
  Cc: Russell King, Steve Capper, Russell King, linux-arm-kernel,
	linux-kernel
In-Reply-To: <20260409125446.981747-1-brian.ruley@gehealthcare.com>

On Thu, Apr 09, 2026 at 03:54:45PM +0300, Brian Ruley wrote:
> Fixes cache desync, which can cause undefined instruction,
> translation and permission faults under heavy memory use.
> 
> This is an old bug introduced in commit 1971188aa196 ("ARM: 7985/1: mm:
> implement pte_accessible for faulting mappings"), which included a check
> for the young bit of a PTE. The underlying assumption was that old pages
> are not cached, therefore, `__sync_icache_dcache' could be skipped
> entirely.
> 
> However, under extreme memory pressure, page migrations happen
> frequently and the assumption of uncached "old" pages does not hold.
> Especially for systems that do not have swap, the migrated pages are
> unequivocally marked old. This presents a problem, as it is possible
> for the original page to be immediately mapped to another VA that
> happens to share the same cache index in VIPT I-cache (we found this
> bug on Cortex-A9). Without cache invalidation, the CPU will see the
> old mapping whose physical page can now be used for a different
> purpose, as illustrated below:
> 
>                 Core                      Physical Memory
>   +-------------------------------+     +------------------+
>   | TLB                           |     |                  |
>   |  VA_A 0xb6e6f -> pfn_q        |     | pfn_q: code      |
>   +-------------------------------+     +------------------+
>   | I-cache                       |
>   |  set[VA_A bits] | tag=pfn_q   |
>   +-------------------------------+
> 
> migrate (kcompactd):
>   1. copy pfn_q --> pfn_r
>   2. free pfn_q
>   3. pte: VA_a -> pfn_r
>   4. pte_mkold(pte) --> !young
>   5. ICIALLUIS skipped (because !young)
> 
> pfn_src reused (OOM pressure):
>   pte: VA_B -> pfn_q (different code)
> 
> bug:
>                 Core                      Physical Memory
>   +-------------------------------+     +------------------+
>   | TLB (empty)                   |     | pfn_r: old code  |
>   +-------------------------------+     | pfn_q: new code  |
>   | I-cache                       |     +------------------+
>   |  set[VA_A bits] | tag=pfn_q   |<--- wrong instructions
>   +-------------------------------+

(nit: Do you have pfn_r and pfn_q mixed up in the "Physical Memory" box?)

> This was verified on ba16-based board (i.MX6Quad/Dual, Cortex-A9) by
> instrumenting the migration code to track recently migrated pages in a
> ring buffer and then dumping them in the undefined instruction fault
> handler. The bug can be triggered with `stress-ng':
> 
>   stress-ng --vm 4 --vm-bytes 2G --vm-method zero-one --verify
> 
> Note that the system we tested on has only 2G of memory, so the test
> triggered the OOM-killer in our case.
> 
> Fixes: 1971188aa196 ("ARM: 7985/1: mm: implement pte_accessible for faulting mappings")
> Signed-off-by: Brian Ruley <brian.ruley@gehealthcare.com>
> ---
>  arch/arm/include/asm/pgtable.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
> index 6fa9acd6a7f5..e3a5b4a9a65f 100644
> --- a/arch/arm/include/asm/pgtable.h
> +++ b/arch/arm/include/asm/pgtable.h
> @@ -185,7 +185,7 @@ static inline pte_t *pmd_page_vaddr(pmd_t pmd)
>  #define pte_exec(pte)		(pte_isclear((pte), L_PTE_XN))
>  
>  #define pte_valid_user(pte)	\
> -	(pte_valid(pte) && pte_isset((pte), L_PTE_USER) && pte_young(pte))
> +	(pte_valid(pte) && pte_isset((pte), L_PTE_USER))

This patch is from twelve years ago, so please forgive me for having
forgotten all of the details. However, my recollection is that when using
the classic/!lpae format (as you will be on Cortex-A9), page aging is
implemented by using invalid (translation faulting) ptes for 'old'
mappings.

So in the case you describe, we may well elide the I-cache maintenance,
but won't we also put down an invalid pte? If we later take a fault
on that, we should then perform the cache maintenance when installing
the young entry (via ptep_set_access_flags()). The more interesting part
is probably when the mapping for 'VA_B' is installed to map 'pfn_q' but,
again, I would've expected the cache maintenance to happen just prior to
installing the valid (young) mapping.

Please can you help me to understand the problem better?

Will


^ permalink raw reply

* [PATCH v2] media: cedrus: reject invalid active H.264 ref indices
From: Pengpeng Hou @ 2026-04-09 13:30 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: Paul Kocialkowski, Mauro Carvalho Chehab, Greg Kroah-Hartman,
	Chen-Yu Tsai, Jernej Skrabec, Samuel Holland, Nicolas Dufresne,
	linux-media, linux-staging, linux-arm-kernel, linux-sunxi,
	linux-kernel, pengpeng
In-Reply-To: <20260324080856.56787-1-pengpeng@iscas.ac.cn>

Cedrus consumes the active ref_pic_list0/ref_pic_list1 entries and uses
their indices to look up decode->dpb[] in _cedrus_write_ref_list().

Those active portions are the first
num_ref_idx_l0_active_minus1 + 1 / num_ref_idx_l1_active_minus1 + 1
entries in the two reference lists.

An out-of-range index in that active portion can therefore read past the
fixed V4L2_H264_NUM_DPB_ENTRIES array.

Checking this in cedrus_try_ctrl() is awkward because the request-local
DPB state may not have been applied yet. Instead, validate the active
entries at the actual Cedrus use site and fail setup with -EINVAL if one
points past decode->dpb[].

Entries beyond the active reference counts remain ignored as before, so
this does not change how Cedrus treats unused tail data in the
reference-list controls.

Fixes: 6eb9b758e307 ("media: cedrus: Add H264 decoding support")
Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
---
Changes since v1:
- reject invalid indices in the active reference list entries instead of
  silently skipping them
- keep the validation at the Cedrus use site, but propagate -EINVAL back
  through setup

 drivers/staging/media/sunxi/cedrus/cedrus_h264.c | 32 ++++++++++++++--
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
index 3e2843ef6cce..58c411c580f3 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_h264.c
@@ -186,10 +186,10 @@ static int cedrus_write_frame_list(struct cedrus_ctx *ctx,
 
 #define CEDRUS_MAX_REF_IDX	32
 
-static void _cedrus_write_ref_list(struct cedrus_ctx *ctx,
-				   struct cedrus_run *run,
-				   const struct v4l2_h264_reference *ref_list,
-				   u8 num_ref, enum cedrus_h264_sram_off sram)
+static int _cedrus_write_ref_list(struct cedrus_ctx *ctx,
+				  struct cedrus_run *run,
+				  const struct v4l2_h264_reference *ref_list,
+				  u8 num_ref, enum cedrus_h264_sram_off sram)
 {
 	const struct v4l2_ctrl_h264_decode_params *decode = run->h264.decode_params;
 	struct vb2_queue *cap_q;
@@ -210,6 +210,9 @@ static void _cedrus_write_ref_list(struct cedrus_ctx *ctx,
 		u8 dpb_idx;
 
 		dpb_idx = ref_list[i].index;
+		if (dpb_idx >= V4L2_H264_NUM_DPB_ENTRIES)
+			return -EINVAL;
+
 		dpb = &decode->dpb[dpb_idx];
 
 		if (!(dpb->flags & V4L2_H264_DPB_ENTRY_FLAG_ACTIVE))
@@ -229,28 +232,30 @@ static void _cedrus_write_ref_list(struct cedrus_ctx *ctx,
 
 	size = min_t(size_t, ALIGN(num_ref, 4), sizeof(sram_array));
 	cedrus_h264_write_sram(dev, sram, &sram_array, size);
+
+	return 0;
 }
 
-static void cedrus_write_ref_list0(struct cedrus_ctx *ctx,
-				   struct cedrus_run *run)
+static int cedrus_write_ref_list0(struct cedrus_ctx *ctx,
+				  struct cedrus_run *run)
 {
 	const struct v4l2_ctrl_h264_slice_params *slice = run->h264.slice_params;
 
-	_cedrus_write_ref_list(ctx, run,
-			       slice->ref_pic_list0,
-			       slice->num_ref_idx_l0_active_minus1 + 1,
-			       CEDRUS_SRAM_H264_REF_LIST_0);
+	return _cedrus_write_ref_list(ctx, run,
+				      slice->ref_pic_list0,
+				      slice->num_ref_idx_l0_active_minus1 + 1,
+				      CEDRUS_SRAM_H264_REF_LIST_0);
 }
 
-static void cedrus_write_ref_list1(struct cedrus_ctx *ctx,
-				   struct cedrus_run *run)
+static int cedrus_write_ref_list1(struct cedrus_ctx *ctx,
+				  struct cedrus_run *run)
 {
 	const struct v4l2_ctrl_h264_slice_params *slice = run->h264.slice_params;
 
-	_cedrus_write_ref_list(ctx, run,
-			       slice->ref_pic_list1,
-			       slice->num_ref_idx_l1_active_minus1 + 1,
-			       CEDRUS_SRAM_H264_REF_LIST_1);
+	return _cedrus_write_ref_list(ctx, run,
+				      slice->ref_pic_list1,
+				      slice->num_ref_idx_l1_active_minus1 + 1,
+				      CEDRUS_SRAM_H264_REF_LIST_1);
 }
 
 static void cedrus_write_scaling_lists(struct cedrus_ctx *ctx,
@@ -338,8 +343,8 @@ static void cedrus_skip_bits(struct cedrus_dev *dev, int num)
 	}
 }
 
-static void cedrus_set_params(struct cedrus_ctx *ctx,
-			      struct cedrus_run *run)
+static int cedrus_set_params(struct cedrus_ctx *ctx,
+			     struct cedrus_run *run)
 {
 	const struct v4l2_ctrl_h264_decode_params *decode = run->h264.decode_params;
 	const struct v4l2_ctrl_h264_slice_params *slice = run->h264.slice_params;
@@ -351,6 +356,7 @@ static void cedrus_set_params(struct cedrus_ctx *ctx,
 	size_t slice_bytes = vb2_get_plane_payload(src_buf, 0);
 	unsigned int pic_width_in_mbs;
 	bool mbaff_pic;
+	int ret;
 	u32 reg;
 
 	cedrus_write(dev, VE_H264_VLD_LEN, slice_bytes * 8);
@@ -393,11 +399,17 @@ static void cedrus_set_params(struct cedrus_ctx *ctx,
 
 	if ((slice->slice_type == V4L2_H264_SLICE_TYPE_P) ||
 	    (slice->slice_type == V4L2_H264_SLICE_TYPE_SP) ||
-	    (slice->slice_type == V4L2_H264_SLICE_TYPE_B))
-		cedrus_write_ref_list0(ctx, run);
+	    (slice->slice_type == V4L2_H264_SLICE_TYPE_B)) {
+		ret = cedrus_write_ref_list0(ctx, run);
+		if (ret)
+			return ret;
+	}
 
-	if (slice->slice_type == V4L2_H264_SLICE_TYPE_B)
-		cedrus_write_ref_list1(ctx, run);
+	if (slice->slice_type == V4L2_H264_SLICE_TYPE_B) {
+		ret = cedrus_write_ref_list1(ctx, run);
+		if (ret)
+			return ret;
+	}
 
 	// picture parameters
 	reg = 0;
@@ -478,6 +490,8 @@ static void cedrus_set_params(struct cedrus_ctx *ctx,
 		     VE_H264_CTRL_SLICE_DECODE_INT |
 		     VE_H264_CTRL_DECODE_ERR_INT |
 		     VE_H264_CTRL_VLD_DATA_REQ_INT);
+
+	return 0;
 }
 
 static enum cedrus_irq_status
@@ -531,9 +545,7 @@ static int cedrus_h264_setup(struct cedrus_ctx *ctx, struct cedrus_run *run)
 	if (ret)
 		return ret;
 
-	cedrus_set_params(ctx, run);
-
-	return 0;
+	return cedrus_set_params(ctx, run);
 }
 
 static int cedrus_h264_start(struct cedrus_ctx *ctx)
-- 
2.50.1



^ permalink raw reply related

* Re: [PATCH] media: cedrus: skip invalid H.264 reference list entries
From: Pengpeng Hou @ 2026-04-09 14:30 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: Paul Kocialkowski, Mauro Carvalho Chehab, Greg Kroah-Hartman,
	Chen-Yu Tsai, Jernej Skrabec, Samuel Holland, Nicolas Dufresne,
	linux-media, linux-staging, linux-arm-kernel, linux-sunxi,
	linux-kernel, pengpeng
In-Reply-To: <20260324080856.56787-1-pengpeng@iscas.ac.cn>

Hi Paul,

Thanks, that makes sense.

I agree Cedrus should not silently skip an invalid index in the active
portion of ref_pic_list0/ref_pic_list1.

I'll respin this to keep the check at the Cedrus use site rather than
cedrus_try_ctrl(), but return -EINVAL when one of the active reference
entries points past V4L2_H264_NUM_DPB_ENTRIES. Entries beyond
num_ref_idx_l0_active_minus1 / num_ref_idx_l1_active_minus1 will still
be ignored as before.

Thanks,
Pengpeng




^ permalink raw reply

* Re: [PATCH] clk: clk-imx8mm: Initialize clocks in arch_initcall
From: Abel Vesa @ 2026-04-09 13:54 UTC (permalink / raw)
  To: Paul Geurts
  Cc: abelvesa, peng.fan, mturquette, sboyd, Frank.Li, s.hauer, kernel,
	festevam, shawnguo, linux-clk, imx, linux-arm-kernel,
	linux-kernel, martijn.de.gouw
In-Reply-To: <20260408101313.2082125-1-paul.geurts@prodrive-technologies.com>

On 26-04-08 12:13:13, Paul Geurts wrote:
> The i.MX8MM clock driver is implemented as module_platform_driver();,
> which makes it initialize in device_initcall(). This means that all
> drivers referencing the clock driver nodes in the device tree are
> deferred by fw_devlink, which are most of the i.MX8M platform drivers.
> 
> Explicitly initialize the clock driver in arch_initcall(), to make sure
> the clock driver is ready when the rest of the drivers are probed.
> 
> Fixes: af7e7ee0e428 ("clk: imx8mm: Switch to platform driver")
> Signed-off-by: Paul Geurts <paul.geurts@prodrive-technologies.com>

Nack.

Nothing wrong with probe deferring. It is there to ensure the order
the drivers probe in is correct.

Plus, moving it to arch_initcall won't solve anything.

fw_devlink will not stop the driver from probing if there is no provider
that this driver is waiting for. And if there is a provider that is
needed by this clock controller, moving it to arch_initcall won't
magically skip waiting for the provider anyway.


^ permalink raw reply

* Re: [PATCH RFC net-next 0/4] improve hw flow offload byte accounting
From: Pablo Neira Ayuso @ 2026-04-09 13:52 UTC (permalink / raw)
  To: Daniel Golle
  Cc: Felix Fietkau, John Crispin, Lorenzo Bianconi, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Matthias Brugger, AngeloGioacchino Del Regno, Simon Horman,
	Florian Westphal, Phil Sutter, netdev, linux-kernel,
	linux-arm-kernel, linux-mediatek, netfilter-devel, coreteam
In-Reply-To: <cover.1775739840.git.daniel@makrotopia.org>

On Thu, Apr 09, 2026 at 02:07:22PM +0100, Daniel Golle wrote:
> Hardware flow counters report raw byte counts whose semantics
> vary by vendor -- some count ingress L2 frames, others egress
> L2, others L3. The nf_flow_table framework currently passes
> these bytes straight to conntrack without conversion, and
> sub-interfaces (VLAN, PPPoE) that are bypassed by hw offload
> never see any counter updates at all.

I see, but that is part of the feature itself? Why pretend that these
interface are really seeing traffic while they don't. This aspiration
of trying to do all hardware offload fully transparent (when it is not
the case, not mentioning semantic changes in how packet handling is
done compared to the software plane) does not sound convincing to me.

On top of this, this issue also exists in the software plane: Devices
that are bypasses do not get their counters bumped.

Maybe if this is really a requirement, then this should address the
issue for software too, but is it worth the effort to add
infrastructure for this purpose?

> This series lets drivers declare what their counters represent,
> so the framework can normalize to L3 for conntrack and
> propagate per-layer stats to encap sub-interfaces.
> 
> Questions:
>  - Sub-interface stats accesses vlan_dev_priv() directly --
>    should there be a generic netdev callback instead?
>  - Are there hw offload drivers whose counters do not fit the
>    ingress-L2 / egress-L2 / L3 model?
> 
> Daniel Golle (4):
>   net: flow_offload: let drivers report byte counter semantics
>   nf_flow_table: track sub-interface and bridge ifindex in flow tuple
>   nf_flow_table: convert hw byte counts and update sub-interface stats
>   net: ethernet: mtk_eth_soc: report INGRESS_L2 byte_type in flow stats
> 
>  .../net/ethernet/mediatek/mtk_ppe_offload.c   |   1 +
>  include/net/flow_offload.h                    |   7 +
>  include/net/netfilter/nf_flow_table.h         |   5 +
>  net/netfilter/nf_flow_table_core.c            |   2 +
>  net/netfilter/nf_flow_table_offload.c         | 174 +++++++++++++++++-
>  net/netfilter/nf_flow_table_path.c            |   8 +
>  6 files changed, 195 insertions(+), 2 deletions(-)
> 
> -- 
> 2.53.0


^ permalink raw reply

* [PATCH 7/7] media: rkvdec: Add multicore IOMMU support
From: Detlev Casanova @ 2026-04-09 13:50 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Ezequiel Garcia, Heiko Stuebner,
	Nicolas Dufresne, Hans Verkuil, Jonas Karlman
  Cc: kernel, linux-media, linux-kernel, linux-rockchip,
	linux-arm-kernel, Detlev Casanova
In-Reply-To: <20260409-rkvdec-multicore-v1-0-62b316abf0f7@collabora.com>

As each core has its own IOMMU core, buffers must be mapped in each
core's IOMMU so that any run() call can use any core without having to
remap everything.

To do that, we use rockchip iommu domain's iommu devices list.
With that, one IOMMU domain can be mapped on multiple devices, meaning
that each call to iommu_map() will flush the new mapping on all devices
in the list.

The IOMMU domain that will have all devices in its list is the first
core's default domain.

Another domain cannot be used because VB2 allocates buffers through the
DMA engine, which uses iommu_get_dma_domain() to find the domain to map
buffers through.

The IOMMU restore function can still work as before, but needs to be more
explicit in what domain to attach the device to.
That is because detaching the empty domain will reattach the core's default
domain, which is wrong (except for the first "main" core).

The RCB temporary buffers are allocated in a dedicated SRAM, each
core has its own SRAM, so the mapping for each core's SRAM is added in the
global domain.

Everything else is mapped through the first core's default domain, making
the driver write the mappings on both IOMMU cores.

Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
---
 .../media/platform/rockchip/rkvdec/rkvdec-rcb.c    | 21 ++++++-------
 .../media/platform/rockchip/rkvdec/rkvdec-rcb.h    |  6 ++--
 drivers/media/platform/rockchip/rkvdec/rkvdec.c    | 35 +++++++++++++++++-----
 drivers/media/platform/rockchip/rkvdec/rkvdec.h    |  2 +-
 4 files changed, 44 insertions(+), 20 deletions(-)

diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.c
index 190fb7438e8c..977e37cf209b 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.c
@@ -57,7 +57,7 @@ bool rkvdec_rcb_buf_validate_size(struct rkvdec_ctx *ctx)
 	return ret;
 }
 
-void rkvdec_free_rcb(struct rkvdec_core *core)
+void rkvdec_free_rcb(struct rkvdec_dev *rkvdec, struct rkvdec_core *core)
 {
 	struct rkvdec_rcb_config *cfg = core->rcb_config;
 	unsigned long virt_addr;
@@ -76,12 +76,12 @@ void rkvdec_free_rcb(struct rkvdec_core *core)
 		case RKVDEC_ALLOC_SRAM:
 			virt_addr = (unsigned long)cfg->rcb_bufs[i].cpu;
 
-			if (core->iommu_domain)
-				iommu_unmap(core->iommu_domain, virt_addr, rcb_size);
+			if (rkvdec->iommu_global_domain)
+				iommu_unmap(rkvdec->iommu_global_domain, virt_addr, rcb_size);
 			gen_pool_free(core->sram_pool, virt_addr, rcb_size);
 			break;
 		case RKVDEC_ALLOC_DMA:
-			dma_free_coherent(core->dev,
+			dma_free_coherent(rkvdec->main_core->dev,
 					  rcb_size,
 					  cfg->rcb_bufs[i].cpu,
 					  cfg->rcb_bufs[i].dma);
@@ -97,7 +97,8 @@ void rkvdec_free_rcb(struct rkvdec_core *core)
 	core->rcb_config = NULL;
 }
 
-int rkvdec_allocate_rcb(struct rkvdec_core *core, u32 width, u32 height,
+int rkvdec_allocate_rcb(struct rkvdec_dev *rkvdec, struct rkvdec_core *core,
+			u32 width, u32 height,
 			const struct rcb_size_info *size_info,
 			size_t rcb_count)
 {
@@ -132,7 +133,7 @@ int rkvdec_allocate_rcb(struct rkvdec_core *core, u32 width, u32 height,
 
 		/* Try allocating an SRAM buffer */
 		if (core->sram_pool) {
-			if (core->iommu_domain)
+			if (rkvdec->iommu_global_domain)
 				rcb_size = ALIGN(rcb_size, SZ_4K);
 
 			cpu = gen_pool_dma_zalloc_align(core->sram_pool,
@@ -142,11 +143,11 @@ int rkvdec_allocate_rcb(struct rkvdec_core *core, u32 width, u32 height,
 		}
 
 		/* If an IOMMU is used, map the SRAM address through it */
-		if (cpu && core->iommu_domain) {
+		if (cpu && rkvdec->iommu_global_domain) {
 			unsigned long virt_addr = (unsigned long)cpu;
 			phys_addr_t phys_addr = dma;
 
-			ret = iommu_map(core->iommu_domain, virt_addr, phys_addr,
+			ret = iommu_map(rkvdec->iommu_global_domain, virt_addr, phys_addr,
 					rcb_size, IOMMU_READ | IOMMU_WRITE, 0);
 			if (ret) {
 				gen_pool_free(core->sram_pool,
@@ -166,7 +167,7 @@ int rkvdec_allocate_rcb(struct rkvdec_core *core, u32 width, u32 height,
 ram_fallback:
 		/* Fallback to RAM */
 		if (!cpu) {
-			cpu = dma_alloc_coherent(core->dev,
+			cpu = dma_alloc_coherent(rkvdec->main_core->dev,
 						 rcb_size,
 						 &dma,
 						 GFP_KERNEL);
@@ -189,7 +190,7 @@ int rkvdec_allocate_rcb(struct rkvdec_core *core, u32 width, u32 height,
 	return 0;
 
 err_alloc:
-	rkvdec_free_rcb(core);
+	rkvdec_free_rcb(rkvdec, core);
 
 	return ret;
 }
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.h b/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.h
index a12af9b7dc2b..d1149afe7fda 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.h
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.h
@@ -8,6 +8,7 @@
 
 #include <linux/types.h>
 
+struct rkvdec_dev;
 struct rkvdec_ctx;
 struct rkvdec_core;
 
@@ -21,11 +22,12 @@ struct rcb_size_info {
 	enum rcb_axis axis;
 };
 
-int rkvdec_allocate_rcb(struct rkvdec_core *core, u32 width, u32 height,
+int rkvdec_allocate_rcb(struct rkvdec_dev *rkvdec, struct rkvdec_core *core,
+			u32 width, u32 height,
 			const struct rcb_size_info *size_info,
 			size_t rcb_count);
 dma_addr_t rkvdec_rcb_buf_dma_addr(struct rkvdec_ctx *ctx, int id);
 size_t rkvdec_rcb_buf_size(struct rkvdec_ctx *ctx, int id);
 int rkvdec_rcb_buf_count(struct rkvdec_ctx *ctx);
 bool rkvdec_rcb_buf_validate_size(struct rkvdec_ctx *ctx);
-void rkvdec_free_rcb(struct rkvdec_core *core);
+void rkvdec_free_rcb(struct rkvdec_dev *rkvdec, struct rkvdec_core *core);
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec.c b/drivers/media/platform/rockchip/rkvdec/rkvdec.c
index c2818f1575ef..2930e9b64906 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec.c
@@ -1204,9 +1204,9 @@ static void rkvdec_device_run(void *priv)
 	}
 
 	if (!rkvdec_rcb_buf_validate_size(ctx)) {
-		rkvdec_free_rcb(ctx->core);
+		rkvdec_free_rcb(ctx->dev, ctx->core);
 
-		ret = rkvdec_allocate_rcb(ctx->core,
+		ret = rkvdec_allocate_rcb(ctx->dev, ctx->core,
 					  ctx->decoded_fmt.fmt.pix_mp.width,
 					  ctx->decoded_fmt.fmt.pix_mp.height,
 					  ctx->dev->variant->rcb_sizes,
@@ -1486,6 +1486,7 @@ static void rkvdec_v4l2_cleanup(struct rkvdec_dev *rkvdec)
 
 static void rkvdec_iommu_restore(struct rkvdec_core *core)
 {
+	int ret;
 	if (core->empty_domain) {
 		/*
 		 * To rewrite mapping into the attached IOMMU core, attach a new empty domain that
@@ -1494,8 +1495,14 @@ static void rkvdec_iommu_restore(struct rkvdec_core *core)
 		 * This is safely done in this interrupt handler to make sure no memory get mapped
 		 * through the IOMMU while the empty domain is attached.
 		 */
-		iommu_attach_device(core->empty_domain, core->dev);
+		iommu_detach_device(core->curr_ctx->dev->iommu_global_domain, core->dev);
+		ret = iommu_attach_device(core->empty_domain, core->dev);
+		if (ret)
+			dev_warn(core->dev, "Cannot attach empty domain: %d\n", ret);
 		iommu_detach_device(core->empty_domain, core->dev);
+		ret = iommu_attach_device(core->curr_ctx->dev->iommu_global_domain, core->dev);
+		if (ret)
+			dev_warn(core->dev, "Cannot attach global domain: %d\n", ret);
 	}
 }
 
@@ -1858,6 +1865,8 @@ static int rkvdec_probe(struct platform_device *pdev)
 
 	core = &rkvdec->cores[rkvdec->core_count++];
 
+	core->id = rkvdec->core_count - 1;
+
 	platform_set_drvdata(pdev, rkvdec);
 	core->dev = &pdev->dev;
 	INIT_DELAYED_WORK(&core->watchdog_work, rkvdec_watchdog_func);
@@ -1883,12 +1892,24 @@ static int rkvdec_probe(struct platform_device *pdev)
 			return PTR_ERR(core->link);
 	}
 
-	core->iommu_domain = iommu_get_domain_for_dev(&pdev->dev);
-	if (core->iommu_domain) {
+	if (iommu_get_domain_for_dev(&pdev->dev)) {
 		core->empty_domain = iommu_paging_domain_alloc(core->dev);
 
-		if (!core->empty_domain)
+		if (IS_ERR(core->empty_domain))
 			dev_warn(core->dev, "cannot alloc new empty domain\n");
+
+		if (!rkvdec->iommu_global_domain) {
+			rkvdec->iommu_global_domain = iommu_get_domain_for_dev(core->dev);
+
+			if (IS_ERR(rkvdec->iommu_global_domain)) {
+				rkvdec->iommu_global_domain = NULL;
+				dev_warn_once(core->dev, "cannot alloc new global domain\n");
+			}
+		}
+
+		ret = iommu_attach_device(rkvdec->iommu_global_domain, core->dev);
+		if (ret)
+			dev_warn(core->dev, "cannot attach global domain to core %d\n", core->id);
 	}
 
 	ret = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32));
@@ -1961,7 +1982,7 @@ static void rkvdec_remove(struct platform_device *pdev)
 		if (rkvdec->cores[i].empty_domain)
 			iommu_domain_free(rkvdec->cores[i].empty_domain);
 
-		rkvdec_free_rcb(&rkvdec->cores[i]);
+		rkvdec_free_rcb(rkvdec, &rkvdec->cores[i]);
 	}
 }
 
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec.h b/drivers/media/platform/rockchip/rkvdec/rkvdec.h
index 4f042a367dc0..ccd766b220c7 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec.h
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec.h
@@ -135,7 +135,6 @@ struct rkvdec_core {
 	void __iomem *link;
 	struct delayed_work watchdog_work;
 	struct gen_pool *sram_pool;
-	struct iommu_domain *iommu_domain;
 	struct iommu_domain *empty_domain;
 	struct rkvdec_rcb_config *rcb_config;
 	struct rkvdec_ctx *curr_ctx;
@@ -155,6 +154,7 @@ struct rkvdec_dev {
 	unsigned int available_core_count;
 	spinlock_t cores_lock; /* serializes core list access */
 	struct rkvdec_core *main_core;
+	struct iommu_domain *iommu_global_domain;
 };
 
 struct rkvdec_ctx {

-- 
2.53.0



^ permalink raw reply related

* [PATCH 5/7] media: rkvdec: Add multicore support
From: Detlev Casanova @ 2026-04-09 13:50 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Ezequiel Garcia, Heiko Stuebner,
	Nicolas Dufresne, Hans Verkuil, Jonas Karlman
  Cc: kernel, linux-media, linux-kernel, linux-rockchip,
	linux-arm-kernel, Detlev Casanova
In-Reply-To: <20260409-rkvdec-multicore-v1-0-62b316abf0f7@collabora.com>

Rockchip SoCs like the RK3588 have multiple independent decoder cores
They are defined in the device tree as separate DT nodes sharing the same
compatible.
Rework the driver to discover and drive all of them from the same v4l2
device.

During probe, rkvdec_probe_get_first() finds the first available DT
node with a matching compatible. If the probing device is that node, a
new rkvdec_dev is allocated and a rkvdec_core is added to it to become
the main core.
Otherwise the existing rkvdec_dev is retrieved via platform_get_drvdata()
and the new core is appended to the list.
V4L2/m2m registration happens only once, on the main core.

Per-core hardware resources (MMIO, clocks, SRAM pool, IOMMU domain,
watchdog, RCB buffers) are moved into a new struct rkvdec_core.
A spinlock-protected pool of available cores tracks which cores are
idle and acquire_core() is used to pop a core before each decode, with
release_core() returning it after completion.

The m2m job_ready callback is implemented to prevent scheduling a
context that is already actively decoding on a core, allowing the
framework to move on to the next queued context instead.

Buffer completion is split from job completion:
v4l2_m2m_buf_done_manual() marks buffers done and release_core()
returns the core, then v4l2_m2m_try_schedule() is called to wake
waiters. v4l2_m2m_job_finish() is called unconditionally at the end
of device_run() so the m2m framework can immediately schedule the next
job for enother core, while this one is still running.

This mechanism allows jobs from multiple contexts to run simultaneously,
but does not allow parallele decoding within a single context.
Scheduling such jobs would need more analyses on reference frames
availability for each job to run, and could not yield impressive
performance gain if frames depend on the previous one being fully
decoded.

Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
---
 .../media/platform/rockchip/rkvdec/rkvdec-h264.c   |  17 +-
 .../media/platform/rockchip/rkvdec/rkvdec-hevc.c   |  16 +-
 .../media/platform/rockchip/rkvdec/rkvdec-rcb.c    |  58 ++-
 .../media/platform/rockchip/rkvdec/rkvdec-rcb.h    |   5 +-
 .../platform/rockchip/rkvdec/rkvdec-vdpu381-h264.c |  24 +-
 .../platform/rockchip/rkvdec/rkvdec-vdpu381-hevc.c |  24 +-
 .../platform/rockchip/rkvdec/rkvdec-vdpu383-h264.c |  24 +-
 .../platform/rockchip/rkvdec/rkvdec-vdpu383-hevc.c |  26 +-
 .../media/platform/rockchip/rkvdec/rkvdec-vp9.c    |  27 +-
 drivers/media/platform/rockchip/rkvdec/rkvdec.c    | 430 +++++++++++++--------
 drivers/media/platform/rockchip/rkvdec/rkvdec.h    |  29 +-
 11 files changed, 401 insertions(+), 279 deletions(-)

diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-h264.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-h264.c
index d3202cecb988..215676c55069 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-h264.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-h264.c
@@ -248,6 +248,7 @@ static void set_poc_reg(struct rkvdec_regs *regs, uint32_t poc, int id, bool bot
 static void config_registers(struct rkvdec_ctx *ctx,
 			     struct rkvdec_h264_run *run)
 {
+	struct rkvdec_core *core = ctx->core;
 	struct rkvdec_dev *rkvdec = ctx->dev;
 	const struct v4l2_ctrl_h264_decode_params *dec_params = run->decode_params;
 	const struct v4l2_ctrl_h264_sps *sps = run->sps;
@@ -354,7 +355,7 @@ static void config_registers(struct rkvdec_ctx *ctx,
 	offset = offsetof(struct rkvdec_h264_priv_tbl, err_info);
 	regs->h26x.errorinfo_base = priv_start_addr + offset;
 
-	rkvdec_memcpy_toio(rkvdec->regs, regs,
+	rkvdec_memcpy_toio(core->regs, regs,
 			   MIN(sizeof(*regs), sizeof(u32) * rkvdec->variant->num_regs));
 }
 
@@ -379,7 +380,7 @@ static int rkvdec_h264_start(struct rkvdec_ctx *ctx)
 	if (!h264_ctx)
 		return -ENOMEM;
 
-	priv_tbl = dma_alloc_coherent(rkvdec->dev, sizeof(*priv_tbl),
+	priv_tbl = dma_alloc_coherent(rkvdec->main_core->dev, sizeof(*priv_tbl),
 				      &h264_ctx->priv_tbl.dma, GFP_KERNEL);
 	if (!priv_tbl) {
 		ret = -ENOMEM;
@@ -404,7 +405,7 @@ static void rkvdec_h264_stop(struct rkvdec_ctx *ctx)
 	struct rkvdec_h264_ctx *h264_ctx = ctx->priv;
 	struct rkvdec_dev *rkvdec = ctx->dev;
 
-	dma_free_coherent(rkvdec->dev, h264_ctx->priv_tbl.size,
+	dma_free_coherent(rkvdec->main_core->dev, h264_ctx->priv_tbl.size,
 			  h264_ctx->priv_tbl.cpu, h264_ctx->priv_tbl.dma);
 	kfree(h264_ctx);
 }
@@ -412,7 +413,7 @@ static void rkvdec_h264_stop(struct rkvdec_ctx *ctx)
 static int rkvdec_h264_run(struct rkvdec_ctx *ctx)
 {
 	struct v4l2_h264_reflist_builder reflist_builder;
-	struct rkvdec_dev *rkvdec = ctx->dev;
+	struct rkvdec_core *core = ctx->core;
 	struct rkvdec_h264_ctx *h264_ctx = ctx->priv;
 	struct rkvdec_h264_run run;
 	struct rkvdec_h264_priv_tbl *tbl = h264_ctx->priv_tbl.cpu;
@@ -434,15 +435,15 @@ static int rkvdec_h264_run(struct rkvdec_ctx *ctx)
 
 	rkvdec_run_postamble(ctx, &run.base);
 
-	schedule_delayed_work(&rkvdec->watchdog_work, msecs_to_jiffies(2000));
+	schedule_delayed_work(&core->watchdog_work, msecs_to_jiffies(2000));
 
-	writel(1, rkvdec->regs + RKVDEC_REG_PREF_LUMA_CACHE_COMMAND);
-	writel(1, rkvdec->regs + RKVDEC_REG_PREF_CHR_CACHE_COMMAND);
+	writel(1, core->regs + RKVDEC_REG_PREF_LUMA_CACHE_COMMAND);
+	writel(1, core->regs + RKVDEC_REG_PREF_CHR_CACHE_COMMAND);
 
 	/* Start decoding! */
 	writel(RKVDEC_INTERRUPT_DEC_E | RKVDEC_CONFIG_DEC_CLK_GATE_E |
 	       RKVDEC_TIMEOUT_E | RKVDEC_BUF_EMPTY_E,
-	       rkvdec->regs + RKVDEC_REG_INTERRUPT);
+	       core->regs + RKVDEC_REG_INTERRUPT);
 
 	return 0;
 }
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-hevc.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-hevc.c
index ac8b825d080a..fd664ac63698 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-hevc.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-hevc.c
@@ -402,6 +402,7 @@ static void assemble_sw_rps(struct rkvdec_ctx *ctx,
 static void config_registers(struct rkvdec_ctx *ctx,
 			     struct rkvdec_hevc_run *run)
 {
+	struct rkvdec_core *core = ctx->core;
 	struct rkvdec_dev *rkvdec = ctx->dev;
 	const struct v4l2_ctrl_hevc_decode_params *decode_params = run->decode_params;
 	const struct v4l2_ctrl_hevc_sps *sps = run->sps;
@@ -498,7 +499,7 @@ static void config_registers(struct rkvdec_ctx *ctx,
 	offset = offsetof(struct rkvdec_hevc_priv_tbl, rps);
 	regs->h26x.rps_base = priv_start_addr + offset;
 
-	rkvdec_memcpy_toio(rkvdec->regs, regs,
+	rkvdec_memcpy_toio(core->regs, regs,
 			   MIN(sizeof(*regs), sizeof(u32) * rkvdec->variant->num_regs));
 }
 
@@ -532,7 +533,7 @@ static int rkvdec_hevc_start(struct rkvdec_ctx *ctx)
 	if (!hevc_ctx)
 		return -ENOMEM;
 
-	priv_tbl = dma_alloc_coherent(rkvdec->dev, sizeof(*priv_tbl),
+	priv_tbl = dma_alloc_coherent(rkvdec->main_core->dev, sizeof(*priv_tbl),
 				      &hevc_ctx->priv_tbl.dma, GFP_KERNEL);
 	if (!priv_tbl) {
 		kfree(hevc_ctx);
@@ -553,13 +554,14 @@ static void rkvdec_hevc_stop(struct rkvdec_ctx *ctx)
 	struct rkvdec_hevc_ctx *hevc_ctx = ctx->priv;
 	struct rkvdec_dev *rkvdec = ctx->dev;
 
-	dma_free_coherent(rkvdec->dev, hevc_ctx->priv_tbl.size,
+	dma_free_coherent(rkvdec->main_core->dev, hevc_ctx->priv_tbl.size,
 			  hevc_ctx->priv_tbl.cpu, hevc_ctx->priv_tbl.dma);
 	kfree(hevc_ctx);
 }
 
 static int rkvdec_hevc_run(struct rkvdec_ctx *ctx)
 {
+	struct rkvdec_core *core = ctx->core;
 	struct rkvdec_dev *rkvdec = ctx->dev;
 	struct rkvdec_hevc_run run;
 	struct rkvdec_hevc_ctx *hevc_ctx = ctx->priv;
@@ -576,10 +578,10 @@ static int rkvdec_hevc_run(struct rkvdec_ctx *ctx)
 
 	rkvdec_run_postamble(ctx, &run.base);
 
-	schedule_delayed_work(&rkvdec->watchdog_work, msecs_to_jiffies(2000));
+	schedule_delayed_work(&core->watchdog_work, msecs_to_jiffies(2000));
 
-	writel(1, rkvdec->regs + RKVDEC_REG_PREF_LUMA_CACHE_COMMAND);
-	writel(1, rkvdec->regs + RKVDEC_REG_PREF_CHR_CACHE_COMMAND);
+	writel(1, core->regs + RKVDEC_REG_PREF_LUMA_CACHE_COMMAND);
+	writel(1, core->regs + RKVDEC_REG_PREF_CHR_CACHE_COMMAND);
 
 	if (rkvdec->variant->quirks & RKVDEC_QUIRK_DISABLE_QOS)
 		rkvdec_quirks_disable_qos(ctx);
@@ -589,7 +591,7 @@ static int rkvdec_hevc_run(struct rkvdec_ctx *ctx)
 		0 : RKVDEC_WR_DDR_ALIGN_EN;
 	writel(RKVDEC_INTERRUPT_DEC_E | RKVDEC_CONFIG_DEC_CLK_GATE_E |
 	       RKVDEC_TIMEOUT_E | RKVDEC_BUF_EMPTY_E | reg,
-	       rkvdec->regs + RKVDEC_REG_INTERRUPT);
+	       core->regs + RKVDEC_REG_INTERRUPT);
 
 	return 0;
 }
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.c
index 191f78278c01..190fb7438e8c 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.c
@@ -29,38 +29,37 @@ static size_t rkvdec_rcb_size(const struct rcb_size_info *size_info,
 
 dma_addr_t rkvdec_rcb_buf_dma_addr(struct rkvdec_ctx *ctx, int id)
 {
-	return ctx->rcb_config->rcb_bufs[id].dma;
+	return ctx->core->rcb_config->rcb_bufs[id].dma;
 }
 
 size_t rkvdec_rcb_buf_size(struct rkvdec_ctx *ctx, int id)
 {
-	return ctx->rcb_config->rcb_bufs[id].size;
+	return ctx->core->rcb_config->rcb_bufs[id].size;
 }
 
 int rkvdec_rcb_buf_count(struct rkvdec_ctx *ctx)
 {
-	return ctx->rcb_config->rcb_count;
+	return ctx->core->rcb_config->rcb_count;
 }
 
 bool rkvdec_rcb_buf_validate_size(struct rkvdec_ctx *ctx)
 {
-	struct rkvdec_rcb_config *cfg = ctx->rcb_config;
+	struct rkvdec_rcb_config *cfg = ctx->core->rcb_config;
 
 	bool ret = cfg && cfg->height >= ctx->decoded_fmt.fmt.pix_mp.height &&
 		   cfg->width >= ctx->decoded_fmt.fmt.pix_mp.width;
 
 	if (!ret && cfg) {
-		dev_dbg(ctx->dev->dev, "RCB size %ux%u -> %ux%u\n", cfg->width, cfg->height,
+		dev_dbg(ctx->core->dev, "RCB size %ux%u -> %ux%u\n", cfg->width, cfg->height,
 			ctx->decoded_fmt.fmt.pix_mp.width, ctx->decoded_fmt.fmt.pix_mp.height);
 	}
 
 	return ret;
 }
 
-void rkvdec_free_rcb(struct rkvdec_ctx *ctx)
+void rkvdec_free_rcb(struct rkvdec_core *core)
 {
-	struct rkvdec_dev *dev = ctx->dev;
-	struct rkvdec_rcb_config *cfg = ctx->rcb_config;
+	struct rkvdec_rcb_config *cfg = core->rcb_config;
 	unsigned long virt_addr;
 	int i;
 
@@ -77,12 +76,12 @@ void rkvdec_free_rcb(struct rkvdec_ctx *ctx)
 		case RKVDEC_ALLOC_SRAM:
 			virt_addr = (unsigned long)cfg->rcb_bufs[i].cpu;
 
-			if (dev->iommu_domain)
-				iommu_unmap(dev->iommu_domain, virt_addr, rcb_size);
-			gen_pool_free(dev->sram_pool, virt_addr, rcb_size);
+			if (core->iommu_domain)
+				iommu_unmap(core->iommu_domain, virt_addr, rcb_size);
+			gen_pool_free(core->sram_pool, virt_addr, rcb_size);
 			break;
 		case RKVDEC_ALLOC_DMA:
-			dma_free_coherent(dev->dev,
+			dma_free_coherent(core->dev,
 					  rcb_size,
 					  cfg->rcb_bufs[i].cpu,
 					  cfg->rcb_bufs[i].dma);
@@ -91,33 +90,32 @@ void rkvdec_free_rcb(struct rkvdec_ctx *ctx)
 	}
 
 	if (cfg->rcb_bufs)
-		devm_kfree(dev->dev, cfg->rcb_bufs);
+		devm_kfree(core->dev, cfg->rcb_bufs);
 
-	devm_kfree(dev->dev, cfg);
+	devm_kfree(core->dev, cfg);
 
-	ctx->rcb_config = NULL;
+	core->rcb_config = NULL;
 }
 
-int rkvdec_allocate_rcb(struct rkvdec_ctx *ctx, u32 width, u32 height,
+int rkvdec_allocate_rcb(struct rkvdec_core *core, u32 width, u32 height,
 			const struct rcb_size_info *size_info,
 			size_t rcb_count)
 {
 	int ret, i;
-	struct rkvdec_dev *rkvdec = ctx->dev;
 	struct rkvdec_rcb_config *cfg;
 
 	if (!size_info || !rcb_count) {
-		ctx->rcb_config = NULL;
+		core->rcb_config = NULL;
 		return 0;
 	}
 
-	ctx->rcb_config = devm_kzalloc(rkvdec->dev, sizeof(*ctx->rcb_config), GFP_KERNEL);
-	if (!ctx->rcb_config)
+	core->rcb_config = devm_kzalloc(core->dev, sizeof(*core->rcb_config), GFP_KERNEL);
+	if (!core->rcb_config)
 		return -ENOMEM;
 
-	cfg = ctx->rcb_config;
+	cfg = core->rcb_config;
 
-	cfg->rcb_bufs = devm_kzalloc(rkvdec->dev, sizeof(*cfg->rcb_bufs) * rcb_count, GFP_KERNEL);
+	cfg->rcb_bufs = devm_kzalloc(core->dev, sizeof(*cfg->rcb_bufs) * rcb_count, GFP_KERNEL);
 	if (!cfg->rcb_bufs) {
 		ret = -ENOMEM;
 		goto err_alloc;
@@ -133,25 +131,25 @@ int rkvdec_allocate_rcb(struct rkvdec_ctx *ctx, u32 width, u32 height,
 		enum rkvdec_alloc_type alloc_type = RKVDEC_ALLOC_SRAM;
 
 		/* Try allocating an SRAM buffer */
-		if (ctx->dev->sram_pool) {
-			if (rkvdec->iommu_domain)
+		if (core->sram_pool) {
+			if (core->iommu_domain)
 				rcb_size = ALIGN(rcb_size, SZ_4K);
 
-			cpu = gen_pool_dma_zalloc_align(ctx->dev->sram_pool,
+			cpu = gen_pool_dma_zalloc_align(core->sram_pool,
 							rcb_size,
 							&dma,
 							SZ_4K);
 		}
 
 		/* If an IOMMU is used, map the SRAM address through it */
-		if (cpu && rkvdec->iommu_domain) {
+		if (cpu && core->iommu_domain) {
 			unsigned long virt_addr = (unsigned long)cpu;
 			phys_addr_t phys_addr = dma;
 
-			ret = iommu_map(rkvdec->iommu_domain, virt_addr, phys_addr,
+			ret = iommu_map(core->iommu_domain, virt_addr, phys_addr,
 					rcb_size, IOMMU_READ | IOMMU_WRITE, 0);
 			if (ret) {
-				gen_pool_free(ctx->dev->sram_pool,
+				gen_pool_free(core->sram_pool,
 					      (unsigned long)cpu,
 					      rcb_size);
 				cpu = NULL;
@@ -168,7 +166,7 @@ int rkvdec_allocate_rcb(struct rkvdec_ctx *ctx, u32 width, u32 height,
 ram_fallback:
 		/* Fallback to RAM */
 		if (!cpu) {
-			cpu = dma_alloc_coherent(ctx->dev->dev,
+			cpu = dma_alloc_coherent(core->dev,
 						 rcb_size,
 						 &dma,
 						 GFP_KERNEL);
@@ -191,7 +189,7 @@ int rkvdec_allocate_rcb(struct rkvdec_ctx *ctx, u32 width, u32 height,
 	return 0;
 
 err_alloc:
-	rkvdec_free_rcb(ctx);
+	rkvdec_free_rcb(core);
 
 	return ret;
 }
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.h b/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.h
index 0662a4359bdf..a12af9b7dc2b 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.h
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.h
@@ -9,6 +9,7 @@
 #include <linux/types.h>
 
 struct rkvdec_ctx;
+struct rkvdec_core;
 
 enum rcb_axis {
 	PIC_WIDTH = 0,
@@ -20,11 +21,11 @@ struct rcb_size_info {
 	enum rcb_axis axis;
 };
 
-int rkvdec_allocate_rcb(struct rkvdec_ctx *ctx, u32 width, u32 height,
+int rkvdec_allocate_rcb(struct rkvdec_core *core, u32 width, u32 height,
 			const struct rcb_size_info *size_info,
 			size_t rcb_count);
 dma_addr_t rkvdec_rcb_buf_dma_addr(struct rkvdec_ctx *ctx, int id);
 size_t rkvdec_rcb_buf_size(struct rkvdec_ctx *ctx, int id);
 int rkvdec_rcb_buf_count(struct rkvdec_ctx *ctx);
 bool rkvdec_rcb_buf_validate_size(struct rkvdec_ctx *ctx);
-void rkvdec_free_rcb(struct rkvdec_ctx *ctx);
+void rkvdec_free_rcb(struct rkvdec_core *core);
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-h264.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-h264.c
index b961fddc8583..667c5d36f3ea 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-h264.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-h264.c
@@ -185,22 +185,22 @@ static void assemble_hw_pps(struct rkvdec_ctx *ctx,
 
 static void rkvdec_write_regs(struct rkvdec_ctx *ctx)
 {
-	struct rkvdec_dev *rkvdec = ctx->dev;
+	struct rkvdec_core *core = ctx->core;
 	struct rkvdec_h264_ctx *h264_ctx = ctx->priv;
 
-	rkvdec_memcpy_toio(rkvdec->regs + OFFSET_COMMON_REGS,
+	rkvdec_memcpy_toio(core->regs + OFFSET_COMMON_REGS,
 			   &h264_ctx->regs.common,
 			   sizeof(h264_ctx->regs.common));
-	rkvdec_memcpy_toio(rkvdec->regs + OFFSET_CODEC_PARAMS_REGS,
+	rkvdec_memcpy_toio(core->regs + OFFSET_CODEC_PARAMS_REGS,
 			   &h264_ctx->regs.h264_param,
 			   sizeof(h264_ctx->regs.h264_param));
-	rkvdec_memcpy_toio(rkvdec->regs + OFFSET_COMMON_ADDR_REGS,
+	rkvdec_memcpy_toio(core->regs + OFFSET_COMMON_ADDR_REGS,
 			   &h264_ctx->regs.common_addr,
 			   sizeof(h264_ctx->regs.common_addr));
-	rkvdec_memcpy_toio(rkvdec->regs + OFFSET_CODEC_ADDR_REGS,
+	rkvdec_memcpy_toio(core->regs + OFFSET_CODEC_ADDR_REGS,
 			   &h264_ctx->regs.h264_addr,
 			   sizeof(h264_ctx->regs.h264_addr));
-	rkvdec_memcpy_toio(rkvdec->regs + OFFSET_POC_HIGHBIT_REGS,
+	rkvdec_memcpy_toio(core->regs + OFFSET_POC_HIGHBIT_REGS,
 			   &h264_ctx->regs.h264_highpoc,
 			   sizeof(h264_ctx->regs.h264_highpoc));
 }
@@ -368,7 +368,6 @@ static void config_registers(struct rkvdec_ctx *ctx,
 
 static int rkvdec_h264_start(struct rkvdec_ctx *ctx)
 {
-	struct rkvdec_dev *rkvdec = ctx->dev;
 	struct rkvdec_h264_priv_tbl *priv_tbl;
 	struct rkvdec_h264_ctx *h264_ctx;
 	struct v4l2_ctrl *ctrl;
@@ -387,7 +386,7 @@ static int rkvdec_h264_start(struct rkvdec_ctx *ctx)
 	if (!h264_ctx)
 		return -ENOMEM;
 
-	priv_tbl = dma_alloc_coherent(rkvdec->dev, sizeof(*priv_tbl),
+	priv_tbl = dma_alloc_coherent(ctx->dev->main_core->dev, sizeof(*priv_tbl),
 				      &h264_ctx->priv_tbl.dma, GFP_KERNEL);
 	if (!priv_tbl) {
 		ret = -ENOMEM;
@@ -410,9 +409,8 @@ static int rkvdec_h264_start(struct rkvdec_ctx *ctx)
 static void rkvdec_h264_stop(struct rkvdec_ctx *ctx)
 {
 	struct rkvdec_h264_ctx *h264_ctx = ctx->priv;
-	struct rkvdec_dev *rkvdec = ctx->dev;
 
-	dma_free_coherent(rkvdec->dev, h264_ctx->priv_tbl.size,
+	dma_free_coherent(ctx->dev->main_core->dev, h264_ctx->priv_tbl.size,
 			  h264_ctx->priv_tbl.cpu, h264_ctx->priv_tbl.dma);
 	kfree(h264_ctx);
 }
@@ -420,7 +418,7 @@ static void rkvdec_h264_stop(struct rkvdec_ctx *ctx)
 static int rkvdec_h264_run(struct rkvdec_ctx *ctx)
 {
 	struct v4l2_h264_reflist_builder reflist_builder;
-	struct rkvdec_dev *rkvdec = ctx->dev;
+	struct rkvdec_core *core = ctx->core;
 	struct rkvdec_h264_ctx *h264_ctx = ctx->priv;
 	struct rkvdec_h264_priv_tbl *tbl = h264_ctx->priv_tbl.cpu;
 	struct rkvdec_h264_run run;
@@ -443,10 +441,10 @@ static int rkvdec_h264_run(struct rkvdec_ctx *ctx)
 
 	rkvdec_run_postamble(ctx, &run.base);
 
-	rkvdec_schedule_watchdog(rkvdec, h264_ctx->regs.common.reg032_timeout_threshold);
+	rkvdec_schedule_watchdog(core, h264_ctx->regs.common.reg032_timeout_threshold);
 
 	/* Start decoding! */
-	writel(VDPU381_DEC_E_BIT, rkvdec->regs + VDPU381_REG_DEC_E);
+	writel(VDPU381_DEC_E_BIT, core->regs + VDPU381_REG_DEC_E);
 
 	return 0;
 }
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-hevc.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-hevc.c
index fe6414a17551..bd68120b74c6 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-hevc.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-hevc.c
@@ -356,22 +356,22 @@ static void set_ref_valid(struct rkvdec_vdpu381_regs_hevc *regs, int id, u32 val
 
 static void rkvdec_write_regs(struct rkvdec_ctx *ctx)
 {
-	struct rkvdec_dev *rkvdec = ctx->dev;
+	struct rkvdec_core *core = ctx->core;
 	struct rkvdec_hevc_ctx *hevc_ctx = ctx->priv;
 
-	rkvdec_memcpy_toio(rkvdec->regs + OFFSET_COMMON_REGS,
+	rkvdec_memcpy_toio(core->regs + OFFSET_COMMON_REGS,
 			   &hevc_ctx->regs.common,
 			   sizeof(hevc_ctx->regs.common));
-	rkvdec_memcpy_toio(rkvdec->regs + OFFSET_CODEC_PARAMS_REGS,
+	rkvdec_memcpy_toio(core->regs + OFFSET_CODEC_PARAMS_REGS,
 			   &hevc_ctx->regs.hevc_param,
 			   sizeof(hevc_ctx->regs.hevc_param));
-	rkvdec_memcpy_toio(rkvdec->regs + OFFSET_COMMON_ADDR_REGS,
+	rkvdec_memcpy_toio(core->regs + OFFSET_COMMON_ADDR_REGS,
 			   &hevc_ctx->regs.common_addr,
 			   sizeof(hevc_ctx->regs.common_addr));
-	rkvdec_memcpy_toio(rkvdec->regs + OFFSET_CODEC_ADDR_REGS,
+	rkvdec_memcpy_toio(core->regs + OFFSET_CODEC_ADDR_REGS,
 			   &hevc_ctx->regs.hevc_addr,
 			   sizeof(hevc_ctx->regs.hevc_addr));
-	rkvdec_memcpy_toio(rkvdec->regs + OFFSET_POC_HIGHBIT_REGS,
+	rkvdec_memcpy_toio(core->regs + OFFSET_POC_HIGHBIT_REGS,
 			   &hevc_ctx->regs.hevc_highpoc,
 			   sizeof(hevc_ctx->regs.hevc_highpoc));
 }
@@ -555,7 +555,7 @@ static int rkvdec_hevc_start(struct rkvdec_ctx *ctx)
 	if (!hevc_ctx)
 		return -ENOMEM;
 
-	priv_tbl = dma_alloc_coherent(rkvdec->dev, sizeof(*priv_tbl),
+	priv_tbl = dma_alloc_coherent(rkvdec->main_core->dev, sizeof(*priv_tbl),
 				      &hevc_ctx->priv_tbl.dma, GFP_KERNEL);
 	if (!priv_tbl) {
 		ret = -ENOMEM;
@@ -580,14 +580,14 @@ static void rkvdec_hevc_stop(struct rkvdec_ctx *ctx)
 	struct rkvdec_hevc_ctx *hevc_ctx = ctx->priv;
 	struct rkvdec_dev *rkvdec = ctx->dev;
 
-	dma_free_coherent(rkvdec->dev, hevc_ctx->priv_tbl.size,
+	dma_free_coherent(rkvdec->main_core->dev, hevc_ctx->priv_tbl.size,
 			  hevc_ctx->priv_tbl.cpu, hevc_ctx->priv_tbl.dma);
 	kfree(hevc_ctx);
 }
 
 static int rkvdec_hevc_run(struct rkvdec_ctx *ctx)
 {
-	struct rkvdec_dev *rkvdec = ctx->dev;
+	struct rkvdec_core *core = ctx->core;
 	struct rkvdec_hevc_run run;
 	struct rkvdec_hevc_ctx *hevc_ctx = ctx->priv;
 	struct rkvdec_hevc_priv_tbl *tbl = hevc_ctx->priv_tbl.cpu;
@@ -604,7 +604,7 @@ static int rkvdec_hevc_run(struct rkvdec_ctx *ctx)
 	 */
 	if ((!ctx->has_sps_lt_rps && run.sps->num_long_term_ref_pics_sps) ||
 	    (!ctx->has_sps_st_rps && run.sps->num_short_term_ref_pic_sets)) {
-		dev_warn_ratelimited(rkvdec->dev, "Long and short term RPS not set\n");
+		dev_warn_ratelimited(core->dev, "Long and short term RPS not set\n");
 	} else {
 		rkvdec_hevc_assemble_hw_rps(&run, &tbl->rps, &hevc_ctx->st_cache);
 	}
@@ -613,10 +613,10 @@ static int rkvdec_hevc_run(struct rkvdec_ctx *ctx)
 
 	rkvdec_run_postamble(ctx, &run.base);
 
-	rkvdec_schedule_watchdog(rkvdec, hevc_ctx->regs.common.reg032_timeout_threshold);
+	rkvdec_schedule_watchdog(core, hevc_ctx->regs.common.reg032_timeout_threshold);
 
 	/* Start decoding! */
-	writel(VDPU381_DEC_E_BIT, rkvdec->regs + VDPU381_REG_DEC_E);
+	writel(VDPU381_DEC_E_BIT, ctx->core->regs + VDPU381_REG_DEC_E);
 
 	return 0;
 }
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-h264.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-h264.c
index fb4f849d7366..f3a1c31d84d3 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-h264.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-h264.c
@@ -291,19 +291,19 @@ static void assemble_hw_pps(struct rkvdec_ctx *ctx,
 
 static void rkvdec_write_regs(struct rkvdec_ctx *ctx)
 {
-	struct rkvdec_dev *rkvdec = ctx->dev;
+	struct rkvdec_core *core = ctx->core;
 	struct rkvdec_h264_ctx *h264_ctx = ctx->priv;
 
-	rkvdec_memcpy_toio(rkvdec->regs + VDPU383_OFFSET_COMMON_REGS,
+	rkvdec_memcpy_toio(core->regs + VDPU383_OFFSET_COMMON_REGS,
 			   &h264_ctx->regs.common,
 			   sizeof(h264_ctx->regs.common));
-	rkvdec_memcpy_toio(rkvdec->regs + VDPU383_OFFSET_COMMON_ADDR_REGS,
+	rkvdec_memcpy_toio(core->regs + VDPU383_OFFSET_COMMON_ADDR_REGS,
 			   &h264_ctx->regs.common_addr,
 			   sizeof(h264_ctx->regs.common_addr));
-	rkvdec_memcpy_toio(rkvdec->regs + VDPU383_OFFSET_CODEC_PARAMS_REGS,
+	rkvdec_memcpy_toio(core->regs + VDPU383_OFFSET_CODEC_PARAMS_REGS,
 			   &h264_ctx->regs.h26x_params,
 			   sizeof(h264_ctx->regs.h26x_params));
-	rkvdec_memcpy_toio(rkvdec->regs + VDPU383_OFFSET_CODEC_ADDR_REGS,
+	rkvdec_memcpy_toio(core->regs + VDPU383_OFFSET_CODEC_ADDR_REGS,
 			   &h264_ctx->regs.h26x_addr,
 			   sizeof(h264_ctx->regs.h26x_addr));
 }
@@ -455,7 +455,7 @@ static int rkvdec_h264_start(struct rkvdec_ctx *ctx)
 	if (!h264_ctx)
 		return -ENOMEM;
 
-	priv_tbl = dma_alloc_coherent(rkvdec->dev, sizeof(*priv_tbl),
+	priv_tbl = dma_alloc_coherent(rkvdec->main_core->dev, sizeof(*priv_tbl),
 				      &h264_ctx->priv_tbl.dma, GFP_KERNEL);
 	if (!priv_tbl) {
 		ret = -ENOMEM;
@@ -481,7 +481,7 @@ static void rkvdec_h264_stop(struct rkvdec_ctx *ctx)
 	struct rkvdec_h264_ctx *h264_ctx = ctx->priv;
 	struct rkvdec_dev *rkvdec = ctx->dev;
 
-	dma_free_coherent(rkvdec->dev, h264_ctx->priv_tbl.size,
+	dma_free_coherent(rkvdec->main_core->dev, h264_ctx->priv_tbl.size,
 			  h264_ctx->priv_tbl.cpu, h264_ctx->priv_tbl.dma);
 	kfree(h264_ctx);
 }
@@ -489,7 +489,7 @@ static void rkvdec_h264_stop(struct rkvdec_ctx *ctx)
 static int rkvdec_h264_run(struct rkvdec_ctx *ctx)
 {
 	struct v4l2_h264_reflist_builder reflist_builder;
-	struct rkvdec_dev *rkvdec = ctx->dev;
+	struct rkvdec_core *core = ctx->core;
 	struct rkvdec_h264_ctx *h264_ctx = ctx->priv;
 	struct rkvdec_h264_run run;
 	struct rkvdec_h264_priv_tbl *tbl = h264_ctx->priv_tbl.cpu;
@@ -514,12 +514,12 @@ static int rkvdec_h264_run(struct rkvdec_ctx *ctx)
 	rkvdec_run_postamble(ctx, &run.base);
 
 	timeout_threshold = h264_ctx->regs.common.reg013_core_timeout_threshold;
-	rkvdec_schedule_watchdog(rkvdec, timeout_threshold);
+	rkvdec_schedule_watchdog(core, timeout_threshold);
 
 	/* Start decoding! */
-	writel(timeout_threshold, rkvdec->link + VDPU383_LINK_TIMEOUT_THRESHOLD);
-	writel(0, rkvdec->link + VDPU383_LINK_IP_ENABLE);
-	writel(VDPU383_DEC_E_BIT, rkvdec->link + VDPU383_LINK_DEC_ENABLE);
+	writel(timeout_threshold, core->link + VDPU383_LINK_TIMEOUT_THRESHOLD);
+	writel(0, core->link + VDPU383_LINK_IP_ENABLE);
+	writel(VDPU383_DEC_E_BIT, core->link + VDPU383_LINK_DEC_ENABLE);
 
 	return 0;
 }
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-hevc.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-hevc.c
index 96d938ee70b0..5170ca35fd90 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-hevc.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-hevc.c
@@ -381,19 +381,19 @@ static void assemble_hw_pps(struct rkvdec_ctx *ctx,
 
 static void rkvdec_write_regs(struct rkvdec_ctx *ctx)
 {
-	struct rkvdec_dev *rkvdec = ctx->dev;
+	struct rkvdec_core *core = ctx->core;
 	struct rkvdec_hevc_ctx *h265_ctx = ctx->priv;
 
-	rkvdec_memcpy_toio(rkvdec->regs + VDPU383_OFFSET_COMMON_REGS,
+	rkvdec_memcpy_toio(core->regs + VDPU383_OFFSET_COMMON_REGS,
 			   &h265_ctx->regs.common,
 			   sizeof(h265_ctx->regs.common));
-	rkvdec_memcpy_toio(rkvdec->regs + VDPU383_OFFSET_COMMON_ADDR_REGS,
+	rkvdec_memcpy_toio(core->regs + VDPU383_OFFSET_COMMON_ADDR_REGS,
 			   &h265_ctx->regs.common_addr,
 			   sizeof(h265_ctx->regs.common_addr));
-	rkvdec_memcpy_toio(rkvdec->regs + VDPU383_OFFSET_CODEC_PARAMS_REGS,
+	rkvdec_memcpy_toio(core->regs + VDPU383_OFFSET_CODEC_PARAMS_REGS,
 			   &h265_ctx->regs.h26x_params,
 			   sizeof(h265_ctx->regs.h26x_params));
-	rkvdec_memcpy_toio(rkvdec->regs + VDPU383_OFFSET_CODEC_ADDR_REGS,
+	rkvdec_memcpy_toio(core->regs + VDPU383_OFFSET_CODEC_ADDR_REGS,
 			   &h265_ctx->regs.h26x_addr,
 			   sizeof(h265_ctx->regs.h26x_addr));
 }
@@ -563,7 +563,7 @@ static int rkvdec_hevc_start(struct rkvdec_ctx *ctx)
 	if (!hevc_ctx)
 		return -ENOMEM;
 
-	priv_tbl = dma_alloc_coherent(rkvdec->dev, sizeof(*priv_tbl),
+	priv_tbl = dma_alloc_coherent(rkvdec->main_core->dev, sizeof(*priv_tbl),
 				      &hevc_ctx->priv_tbl.dma, GFP_KERNEL);
 	if (!priv_tbl) {
 		ret = -ENOMEM;
@@ -588,14 +588,14 @@ static void rkvdec_hevc_stop(struct rkvdec_ctx *ctx)
 	struct rkvdec_hevc_ctx *hevc_ctx = ctx->priv;
 	struct rkvdec_dev *rkvdec = ctx->dev;
 
-	dma_free_coherent(rkvdec->dev, hevc_ctx->priv_tbl.size,
+	dma_free_coherent(rkvdec->main_core->dev, hevc_ctx->priv_tbl.size,
 			  hevc_ctx->priv_tbl.cpu, hevc_ctx->priv_tbl.dma);
 	kfree(hevc_ctx);
 }
 
 static int rkvdec_hevc_run(struct rkvdec_ctx *ctx)
 {
-	struct rkvdec_dev *rkvdec = ctx->dev;
+	struct rkvdec_core *core = ctx->core;
 	struct rkvdec_hevc_run run;
 	struct rkvdec_hevc_ctx *hevc_ctx = ctx->priv;
 	struct rkvdec_hevc_priv_tbl *tbl = hevc_ctx->priv_tbl.cpu;
@@ -610,7 +610,7 @@ static int rkvdec_hevc_run(struct rkvdec_ctx *ctx)
 	 */
 	if ((!ctx->has_sps_lt_rps && run.sps->num_long_term_ref_pics_sps) ||
 	    (!ctx->has_sps_st_rps && run.sps->num_short_term_ref_pic_sets)) {
-		dev_err_ratelimited(rkvdec->dev, "Long and short term RPS not set\n");
+		dev_err_ratelimited(core->dev, "Long and short term RPS not set\n");
 		return -EINVAL;
 	}
 
@@ -624,12 +624,12 @@ static int rkvdec_hevc_run(struct rkvdec_ctx *ctx)
 	rkvdec_run_postamble(ctx, &run.base);
 
 	timeout_threshold = hevc_ctx->regs.common.reg013_core_timeout_threshold;
-	rkvdec_schedule_watchdog(rkvdec, timeout_threshold);
+	rkvdec_schedule_watchdog(core, timeout_threshold);
 
 	/* Start decoding! */
-	writel(timeout_threshold, rkvdec->link + VDPU383_LINK_TIMEOUT_THRESHOLD);
-	writel(VDPU383_IP_CRU_MODE, rkvdec->link + VDPU383_LINK_IP_ENABLE);
-	writel(VDPU383_DEC_E_BIT, rkvdec->link + VDPU383_LINK_DEC_ENABLE);
+	writel(timeout_threshold, core->link + VDPU383_LINK_TIMEOUT_THRESHOLD);
+	writel(VDPU383_IP_CRU_MODE, core->link + VDPU383_LINK_IP_ENABLE);
+	writel(VDPU383_DEC_E_BIT, core->link + VDPU383_LINK_DEC_ENABLE);
 
 	return 0;
 }
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c
index 2751f5396ee8..0b7d6b29bcfa 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c
@@ -482,6 +482,7 @@ static void config_registers(struct rkvdec_ctx *ctx,
 	struct rkvdec_vp9_ctx *vp9_ctx = ctx->priv;
 	struct rkvdec_regs *regs = &vp9_ctx->regs;
 	const struct v4l2_vp9_segmentation *seg;
+	struct rkvdec_core *core = ctx->core;
 	struct rkvdec_dev *rkvdec = ctx->dev;
 	dma_addr_t addr;
 	bool intra_only;
@@ -657,18 +658,19 @@ static void config_registers(struct rkvdec_ctx *ctx,
 
 	regs->vp9.reg44.strmd_error_e = 0xe;
 
-	rkvdec_memcpy_toio(rkvdec->regs, regs,
+	rkvdec_memcpy_toio(core->regs, regs,
 			   MIN(sizeof(*regs), sizeof(u32) * rkvdec->variant->num_regs));
 }
 
 static int validate_dec_params(struct rkvdec_ctx *ctx,
 			       const struct v4l2_ctrl_vp9_frame *dec_params)
 {
+	struct rkvdec_core *core = ctx->core;
 	unsigned int aligned_width, aligned_height;
 
 	/* We only support profile 0. */
 	if (dec_params->profile != 0) {
-		dev_err(ctx->dev->dev, "unsupported profile %d\n",
+		dev_err(core->dev, "unsupported profile %d\n",
 			dec_params->profile);
 		return -EINVAL;
 	}
@@ -682,7 +684,7 @@ static int validate_dec_params(struct rkvdec_ctx *ctx,
 	 */
 	if (aligned_width != ctx->decoded_fmt.fmt.pix_mp.width ||
 	    aligned_height != ctx->decoded_fmt.fmt.pix_mp.height) {
-		dev_err(ctx->dev->dev,
+		dev_err(core->dev,
 			"unexpected bitstream resolution %dx%d\n",
 			dec_params->frame_width_minus_1 + 1,
 			dec_params->frame_height_minus_1 + 1);
@@ -768,6 +770,7 @@ static int rkvdec_vp9_run_preamble(struct rkvdec_ctx *ctx,
 
 static int rkvdec_vp9_run(struct rkvdec_ctx *ctx)
 {
+	struct rkvdec_core *core = ctx->core;
 	struct rkvdec_dev *rkvdec = ctx->dev;
 	struct rkvdec_vp9_run run = { };
 	int ret;
@@ -786,10 +789,10 @@ static int rkvdec_vp9_run(struct rkvdec_ctx *ctx)
 
 	rkvdec_run_postamble(ctx, &run.base);
 
-	schedule_delayed_work(&rkvdec->watchdog_work, msecs_to_jiffies(2000));
+	schedule_delayed_work(&core->watchdog_work, msecs_to_jiffies(2000));
 
-	writel(1, rkvdec->regs + RKVDEC_REG_PREF_LUMA_CACHE_COMMAND);
-	writel(1, rkvdec->regs + RKVDEC_REG_PREF_CHR_CACHE_COMMAND);
+	writel(1, core->regs + RKVDEC_REG_PREF_LUMA_CACHE_COMMAND);
+	writel(1, core->regs + RKVDEC_REG_PREF_CHR_CACHE_COMMAND);
 
 	if (rkvdec->variant->quirks & RKVDEC_QUIRK_DISABLE_QOS)
 		rkvdec_quirks_disable_qos(ctx);
@@ -797,7 +800,7 @@ static int rkvdec_vp9_run(struct rkvdec_ctx *ctx)
 	/* Start decoding! */
 	writel(RKVDEC_INTERRUPT_DEC_E | RKVDEC_CONFIG_DEC_CLK_GATE_E |
 	       RKVDEC_TIMEOUT_E | RKVDEC_BUF_EMPTY_E,
-	       rkvdec->regs + RKVDEC_REG_INTERRUPT);
+	       core->regs + RKVDEC_REG_INTERRUPT);
 
 	return 0;
 }
@@ -979,7 +982,7 @@ static int rkvdec_vp9_start(struct rkvdec_ctx *ctx)
 	ctx->priv = vp9_ctx;
 
 	BUILD_BUG_ON(sizeof(priv_tbl->probs) % 16); /* ensure probs size is 128-bit aligned */
-	priv_tbl = dma_alloc_coherent(rkvdec->dev, sizeof(*priv_tbl),
+	priv_tbl = dma_alloc_coherent(rkvdec->main_core->dev, sizeof(*priv_tbl),
 				      &vp9_ctx->priv_tbl.dma, GFP_KERNEL);
 	if (!priv_tbl) {
 		ret = -ENOMEM;
@@ -989,7 +992,7 @@ static int rkvdec_vp9_start(struct rkvdec_ctx *ctx)
 	vp9_ctx->priv_tbl.size = sizeof(*priv_tbl);
 	vp9_ctx->priv_tbl.cpu = priv_tbl;
 
-	count_tbl = dma_alloc_coherent(rkvdec->dev, RKVDEC_VP9_COUNT_SIZE,
+	count_tbl = dma_alloc_coherent(rkvdec->main_core->dev, RKVDEC_VP9_COUNT_SIZE,
 				       &vp9_ctx->count_tbl.dma, GFP_KERNEL);
 	if (!count_tbl) {
 		ret = -ENOMEM;
@@ -1003,7 +1006,7 @@ static int rkvdec_vp9_start(struct rkvdec_ctx *ctx)
 	return 0;
 
 err_free_priv_tbl:
-	dma_free_coherent(rkvdec->dev, vp9_ctx->priv_tbl.size,
+	dma_free_coherent(rkvdec->main_core->dev, vp9_ctx->priv_tbl.size,
 			  vp9_ctx->priv_tbl.cpu, vp9_ctx->priv_tbl.dma);
 
 err_free_ctx:
@@ -1016,9 +1019,9 @@ static void rkvdec_vp9_stop(struct rkvdec_ctx *ctx)
 	struct rkvdec_vp9_ctx *vp9_ctx = ctx->priv;
 	struct rkvdec_dev *rkvdec = ctx->dev;
 
-	dma_free_coherent(rkvdec->dev, vp9_ctx->count_tbl.size,
+	dma_free_coherent(rkvdec->main_core->dev, vp9_ctx->count_tbl.size,
 			  vp9_ctx->count_tbl.cpu, vp9_ctx->count_tbl.dma);
-	dma_free_coherent(rkvdec->dev, vp9_ctx->priv_tbl.size,
+	dma_free_coherent(rkvdec->main_core->dev, vp9_ctx->priv_tbl.size,
 			  vp9_ctx->priv_tbl.cpu, vp9_ctx->priv_tbl.dma);
 	kfree(vp9_ctx);
 }
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec.c b/drivers/media/platform/rockchip/rkvdec/rkvdec.c
index db2731af06cf..5667d625f016 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec.c
@@ -17,6 +17,7 @@
 #include <linux/module.h>
 #include <linux/of.h>
 #include <linux/of_device.h>
+#include <linux/of_platform.h>
 #include <linux/platform_device.h>
 #include <linux/pm.h>
 #include <linux/pm_runtime.h>
@@ -655,11 +656,11 @@ static int rkvdec_querycap(struct file *file, void *priv,
 	struct rkvdec_dev *rkvdec = video_drvdata(file);
 	struct video_device *vdev = video_devdata(file);
 
-	strscpy(cap->driver, rkvdec->dev->driver->name,
+	strscpy(cap->driver, rkvdec->main_core->dev->driver->name,
 		sizeof(cap->driver));
 	strscpy(cap->card, vdev->name, sizeof(cap->card));
 	snprintf(cap->bus_info, sizeof(cap->bus_info), "platform:%s",
-		 rkvdec->dev->driver->name);
+		 rkvdec->main_core->dev->driver->name);
 	return 0;
 }
 
@@ -1026,8 +1027,6 @@ static void rkvdec_stop_streaming(struct vb2_queue *q)
 
 		if (desc->ops->stop)
 			desc->ops->stop(ctx);
-
-		rkvdec_free_rcb(ctx);
 	}
 
 	rkvdec_queue_cleanup(q, VB2_BUF_STATE_ERROR);
@@ -1061,28 +1060,66 @@ static const struct media_device_ops rkvdec_media_ops = {
 	.req_queue = v4l2_m2m_request_queue,
 };
 
-static void rkvdec_job_finish_no_pm(struct rkvdec_ctx *ctx,
-				    enum vb2_buffer_state result)
+/**
+ * Return a core that is available for decoding or null if no core is found.
+ * The caller should make sure to call release_core() when the core is no longer needed.
+ */
+static struct rkvdec_core *acquire_core(struct rkvdec_dev *rkvdec, struct rkvdec_ctx *ctx)
+{
+	struct rkvdec_core *core = NULL;
+
+	guard(spinlock_irqsave)(&rkvdec->cores_lock);
+
+	if (rkvdec->available_core_count) {
+		core = rkvdec->available_cores[--rkvdec->available_core_count];
+
+		// Set the current core's ctx to this ctx
+		core->curr_ctx = ctx;
+	}
+
+	return core;
+}
+
+/**
+ * Release the core to make it available for a next job.
+ */
+static void release_core(struct rkvdec_dev *rkvdec, struct rkvdec_core *core)
+{
+	guard(spinlock_irqsave)(&rkvdec->cores_lock);
+
+	core->curr_ctx = NULL;
+	rkvdec->available_cores[rkvdec->available_core_count++] = core;
+}
+
+static void rkvdec_buf_done_no_pm(struct rkvdec_ctx *ctx,
+				  enum vb2_buffer_state result)
 {
+	struct v4l2_m2m_ctx *m2m_ctx = ctx->fh.m2m_ctx;
+	struct v4l2_m2m_dev *m2m_dev = m2m_ctx->m2m_dev;
+
 	if (ctx->coded_fmt_desc->ops->done) {
 		struct vb2_v4l2_buffer *src_buf, *dst_buf;
 
-		src_buf = v4l2_m2m_next_src_buf(ctx->fh.m2m_ctx);
-		dst_buf = v4l2_m2m_next_dst_buf(ctx->fh.m2m_ctx);
+		src_buf = v4l2_m2m_next_src_buf(m2m_ctx);
+		dst_buf = v4l2_m2m_next_dst_buf(m2m_ctx);
 		ctx->coded_fmt_desc->ops->done(ctx, src_buf, dst_buf, result);
 	}
 
-	v4l2_m2m_buf_done_and_job_finish(ctx->dev->m2m_dev, ctx->fh.m2m_ctx,
-					 result);
+	v4l2_m2m_buf_done_manual(m2m_dev, m2m_ctx, result);
+
+	if (ctx->core) {
+		release_core(ctx->dev, ctx->core);
+		v4l2_m2m_try_schedule(m2m_ctx);
+	}
 }
 
-static void rkvdec_job_finish(struct rkvdec_ctx *ctx,
-			      enum vb2_buffer_state result)
+static void rkvdec_buf_done(struct rkvdec_ctx *ctx,
+			    enum vb2_buffer_state result)
 {
-	struct rkvdec_dev *rkvdec = ctx->dev;
+	pm_runtime_mark_last_busy(ctx->core->dev);
+	pm_runtime_put_autosuspend(ctx->core->dev);
 
-	pm_runtime_put_autosuspend(rkvdec->dev);
-	rkvdec_job_finish_no_pm(ctx, result);
+	rkvdec_buf_done_no_pm(ctx, result);
 }
 
 void rkvdec_run_preamble(struct rkvdec_ctx *ctx, struct rkvdec_run *run)
@@ -1112,14 +1149,14 @@ void rkvdec_run_postamble(struct rkvdec_ctx *ctx, struct rkvdec_run *run)
 
 void rkvdec_quirks_disable_qos(struct rkvdec_ctx *ctx)
 {
-	struct rkvdec_dev *rkvdec = ctx->dev;
+	struct rkvdec_core *core = ctx->core;
 	u32 reg;
 
 	/* Set undocumented swreg_block_gating_e field */
-	reg = readl(rkvdec->regs + RKVDEC_REG_QOS_CTRL);
+	reg = readl(core->regs + RKVDEC_REG_QOS_CTRL);
 	reg &= GENMASK(31, 16);
 	reg |= 0xEFFF;
-	writel(reg, rkvdec->regs + RKVDEC_REG_QOS_CTRL);
+	writel(reg, core->regs + RKVDEC_REG_QOS_CTRL);
 }
 
 void rkvdec_memcpy_toio(void __iomem *dst, void *src, size_t len)
@@ -1131,57 +1168,81 @@ void rkvdec_memcpy_toio(void __iomem *dst, void *src, size_t len)
 #endif
 }
 
-void rkvdec_schedule_watchdog(struct rkvdec_dev *rkvdec, u32 timeout_threshold)
+void rkvdec_schedule_watchdog(struct rkvdec_core *core, u32 timeout_threshold)
 {
 	/* Set watchdog at 2 times the hardware timeout threshold */
 	u32 watchdog_time;
-	unsigned long axi_rate = clk_get_rate(rkvdec->axi_clk);
+	unsigned long axi_rate = clk_get_rate(core->axi_clk);
 
 	if (axi_rate)
 		watchdog_time = 2 * div_u64(1000 * (u64)timeout_threshold, axi_rate);
 	else
 		watchdog_time = 2000;
 
-	schedule_delayed_work(&rkvdec->watchdog_work, msecs_to_jiffies(watchdog_time));
+	schedule_delayed_work(&core->watchdog_work, msecs_to_jiffies(watchdog_time));
 }
 
 static void rkvdec_device_run(void *priv)
 {
 	struct rkvdec_ctx *ctx = priv;
-	struct rkvdec_dev *rkvdec = ctx->dev;
 	const struct rkvdec_coded_fmt_desc *desc = ctx->coded_fmt_desc;
 	int ret;
 
 	if (WARN_ON(!desc))
-		return;
+		goto finish;
 
-	ret = pm_runtime_resume_and_get(rkvdec->dev);
+	ctx->core = acquire_core(ctx->dev, ctx);
+	if (!ctx->core)
+		goto finish;
+
+	ret = pm_runtime_resume_and_get(ctx->core->dev);
 	if (ret < 0) {
-		rkvdec_job_finish_no_pm(ctx, VB2_BUF_STATE_ERROR);
-		return;
+		rkvdec_buf_done_no_pm(ctx, VB2_BUF_STATE_ERROR);
+		goto finish;
 	}
 
 	if (!rkvdec_rcb_buf_validate_size(ctx)) {
-		rkvdec_free_rcb(ctx);
+		rkvdec_free_rcb(ctx->core);
 
-		ret = rkvdec_allocate_rcb(ctx,
+		ret = rkvdec_allocate_rcb(ctx->core,
 					  ctx->decoded_fmt.fmt.pix_mp.width,
 					  ctx->decoded_fmt.fmt.pix_mp.height,
 					  ctx->dev->variant->rcb_sizes,
 					  ctx->dev->variant->num_rcb_sizes);
 		if (ret) {
-			rkvdec_job_finish(ctx, VB2_BUF_STATE_ERROR);
-			return;
+			rkvdec_buf_done(ctx, VB2_BUF_STATE_ERROR);
+			goto finish;
 		}
 	}
 
 	ret = desc->ops->run(ctx);
-	if (ret)
-		rkvdec_job_finish(ctx, VB2_BUF_STATE_ERROR);
+	if (ret) {
+		rkvdec_buf_done(ctx, VB2_BUF_STATE_ERROR);
+		goto finish;
+	}
+
+finish:
+	v4l2_m2m_job_finish(ctx->dev->m2m_dev, ctx->fh.m2m_ctx);
+}
+
+static int rkvdec_job_ready(void *priv)
+{
+	struct rkvdec_ctx *ctx = priv;
+	struct rkvdec_dev *rkvdec = ctx->dev;
+
+	guard(spinlock_irqsave)(&rkvdec->cores_lock);
+
+	for (int i = 0; i < rkvdec->core_count; i++) {
+		if (rkvdec->cores[i].curr_ctx == ctx)
+			return 0;
+	}
+
+	return 1;
 }
 
 static const struct v4l2_m2m_ops rkvdec_m2m_ops = {
 	.device_run = rkvdec_device_run,
+	.job_ready = rkvdec_job_ready,
 };
 
 static int rkvdec_queue_init(void *priv,
@@ -1340,10 +1401,11 @@ static const struct v4l2_file_operations rkvdec_fops = {
 static int rkvdec_v4l2_init(struct rkvdec_dev *rkvdec)
 {
 	int ret;
+	struct device *dev = rkvdec->main_core->dev;
 
-	ret = v4l2_device_register(rkvdec->dev, &rkvdec->v4l2_dev);
+	ret = v4l2_device_register(dev, &rkvdec->v4l2_dev);
 	if (ret) {
-		dev_err(rkvdec->dev, "Failed to register V4L2 device\n");
+		dev_err(dev, "Failed to register V4L2 device\n");
 		return ret;
 	}
 
@@ -1354,7 +1416,7 @@ static int rkvdec_v4l2_init(struct rkvdec_dev *rkvdec)
 		goto err_unregister_v4l2;
 	}
 
-	rkvdec->mdev.dev = rkvdec->dev;
+	rkvdec->mdev.dev = dev;
 	strscpy(rkvdec->mdev.model, "rkvdec", sizeof(rkvdec->mdev.model));
 	strscpy(rkvdec->mdev.bus_info, "platform:rkvdec",
 		sizeof(rkvdec->mdev.bus_info));
@@ -1420,9 +1482,9 @@ static void rkvdec_v4l2_cleanup(struct rkvdec_dev *rkvdec)
 	v4l2_device_unregister(&rkvdec->v4l2_dev);
 }
 
-static void rkvdec_iommu_restore(struct rkvdec_dev *rkvdec)
+static void rkvdec_iommu_restore(struct rkvdec_core *core)
 {
-	if (rkvdec->empty_domain) {
+	if (core->empty_domain) {
 		/*
 		 * To rewrite mapping into the attached IOMMU core, attach a new empty domain that
 		 * will program an empty table, then detach it to restore the default domain and
@@ -1430,42 +1492,42 @@ static void rkvdec_iommu_restore(struct rkvdec_dev *rkvdec)
 		 * This is safely done in this interrupt handler to make sure no memory get mapped
 		 * through the IOMMU while the empty domain is attached.
 		 */
-		iommu_attach_device(rkvdec->empty_domain, rkvdec->dev);
-		iommu_detach_device(rkvdec->empty_domain, rkvdec->dev);
+		iommu_attach_device(core->empty_domain, core->dev);
+		iommu_detach_device(core->empty_domain, core->dev);
 	}
 }
 
 static irqreturn_t rk3399_irq_handler(struct rkvdec_ctx *ctx)
 {
-	struct rkvdec_dev *rkvdec = ctx->dev;
+	struct rkvdec_core *core = ctx->core;
 	enum vb2_buffer_state state;
 	u32 status;
 
-	status = readl(rkvdec->regs + RKVDEC_REG_INTERRUPT);
-	writel(0, rkvdec->regs + RKVDEC_REG_INTERRUPT);
+	status = readl(core->regs + RKVDEC_REG_INTERRUPT);
+	writel(0, core->regs + RKVDEC_REG_INTERRUPT);
 
 	if (status & RKVDEC_RDY_STA) {
 		state = VB2_BUF_STATE_DONE;
 	} else {
 		state = VB2_BUF_STATE_ERROR;
 		if (status & RKVDEC_SOFTRESET_RDY)
-			rkvdec_iommu_restore(rkvdec);
+			rkvdec_iommu_restore(core);
 	}
 
-	if (cancel_delayed_work(&rkvdec->watchdog_work))
-		rkvdec_job_finish(ctx, state);
+	if (cancel_delayed_work(&core->watchdog_work))
+		rkvdec_buf_done(ctx, state);
 
 	return IRQ_HANDLED;
 }
 
 static irqreturn_t vdpu381_irq_handler(struct rkvdec_ctx *ctx)
 {
-	struct rkvdec_dev *rkvdec = ctx->dev;
+	struct rkvdec_core *core = ctx->core;
 	enum vb2_buffer_state state;
 	u32 status;
 
-	status = readl(rkvdec->regs + VDPU381_REG_STA_INT);
-	writel(0, rkvdec->regs + VDPU381_REG_STA_INT);
+	status = readl(core->regs + VDPU381_REG_STA_INT);
+	writel(0, core->regs + VDPU381_REG_STA_INT);
 
 	if (status & VDPU381_STA_INT_DEC_RDY_STA) {
 		state = VB2_BUF_STATE_DONE;
@@ -1474,47 +1536,50 @@ static irqreturn_t vdpu381_irq_handler(struct rkvdec_ctx *ctx)
 		if (status & (VDPU381_STA_INT_SOFTRESET_RDY |
 			      VDPU381_STA_INT_TIMEOUT |
 			      VDPU381_STA_INT_ERROR))
-			rkvdec_iommu_restore(rkvdec);
+			rkvdec_iommu_restore(core);
 	}
 
-	if (cancel_delayed_work(&rkvdec->watchdog_work))
-		rkvdec_job_finish(ctx, state);
+	if (cancel_delayed_work(&core->watchdog_work))
+		rkvdec_buf_done(ctx, state);
 
 	return IRQ_HANDLED;
 }
 
 static irqreturn_t vdpu383_irq_handler(struct rkvdec_ctx *ctx)
 {
-	struct rkvdec_dev *rkvdec = ctx->dev;
+	struct rkvdec_core *core = ctx->core;
 	enum vb2_buffer_state state;
 	u32 status;
 
-	status = readl(rkvdec->link + VDPU383_LINK_STA_INT);
-	writel(FIELD_PREP_WM16(VDPU383_STA_INT_ALL, 0), rkvdec->link + VDPU383_LINK_STA_INT);
+	status = readl(core->link + VDPU383_LINK_STA_INT);
+	writel(FIELD_PREP_WM16(VDPU383_STA_INT_ALL, 0), core->link + VDPU383_LINK_STA_INT);
 	/* On vdpu383, the interrupts must be disabled */
 	writel(FIELD_PREP_WM16(VDPU383_INT_EN_IRQ | VDPU383_INT_EN_LINE_IRQ, 0),
-	       rkvdec->link + VDPU383_LINK_INT_EN);
+	       core->link + VDPU383_LINK_INT_EN);
 
 	if (status & VDPU383_STA_INT_DEC_RDY_STA) {
 		state = VB2_BUF_STATE_DONE;
 	} else {
 		state = VB2_BUF_STATE_ERROR;
-		rkvdec_iommu_restore(rkvdec);
+		rkvdec_iommu_restore(core);
 	}
 
-	if (cancel_delayed_work(&rkvdec->watchdog_work))
-		rkvdec_job_finish(ctx, state);
+	if (cancel_delayed_work(&core->watchdog_work))
+		rkvdec_buf_done(ctx, state);
 
 	return IRQ_HANDLED;
 }
 
 static irqreturn_t rkvdec_irq_handler(int irq, void *priv)
 {
-	struct rkvdec_dev *rkvdec = priv;
-	struct rkvdec_ctx *ctx = v4l2_m2m_get_curr_priv(rkvdec->m2m_dev);
-	const struct rkvdec_variant *variant = rkvdec->variant;
+	irqreturn_t ret;
+	struct rkvdec_core *core = priv;
+	struct rkvdec_ctx *ctx = core->curr_ctx;
+	const struct rkvdec_variant *variant = ctx->dev->variant;
+
+	ret = variant->ops->irq_handler(ctx);
 
-	return variant->ops->irq_handler(ctx);
+	return ret;
 }
 
 /*
@@ -1591,62 +1656,19 @@ static void vdpu383_flatten_matrices(u8 *output, const u8 *input, int matrices,
 
 static void rkvdec_watchdog_func(struct work_struct *work)
 {
-	struct rkvdec_dev *rkvdec;
+	struct rkvdec_core *core;
 	struct rkvdec_ctx *ctx;
 
-	rkvdec = container_of(to_delayed_work(work), struct rkvdec_dev,
+	core = container_of(to_delayed_work(work), struct rkvdec_core,
 			      watchdog_work);
-	ctx = v4l2_m2m_get_curr_priv(rkvdec->m2m_dev);
+	ctx = core->curr_ctx;
 	if (ctx) {
-		dev_err(rkvdec->dev, "Frame processing timed out!\n");
-		writel(RKVDEC_IRQ_DIS, rkvdec->regs + RKVDEC_REG_INTERRUPT);
-		rkvdec_job_finish(ctx, VB2_BUF_STATE_ERROR);
+		dev_err(core->dev, "Frame processing timed out!\n");
+		writel(RKVDEC_IRQ_DIS, core->regs + RKVDEC_REG_INTERRUPT);
+		rkvdec_buf_done(ctx, VB2_BUF_STATE_ERROR);
 	}
 }
 
-/*
- * Some SoCs, like RK3588 have multiple identical VDPU cores, but the
- * kernel is currently missing support for multi-core handling. Exposing
- * separate devices for each core to userspace is bad, since that does
- * not allow scheduling tasks properly (and creates ABI). With this workaround
- * the driver will only probe for the first core and early exit for the other
- * cores. Once the driver gains multi-core support, the same technique
- * for detecting the first core can be used to cluster all cores together.
- */
-static int rkvdec_disable_multicore(struct rkvdec_dev *rkvdec)
-{
-	struct device_node *node = NULL;
-	const char *compatible;
-	bool is_first_core;
-	int ret;
-
-	/* Intentionally ignores the fallback strings */
-	ret = of_property_read_string(rkvdec->dev->of_node, "compatible", &compatible);
-	if (ret)
-		return ret;
-
-	/* The first compatible and available node found is considered the main core */
-	do {
-		node = of_find_compatible_node(node, NULL, compatible);
-		if (of_device_is_available(node))
-			break;
-	} while (node);
-
-	if (!node)
-		return -EINVAL;
-
-	is_first_core = (rkvdec->dev->of_node == node);
-
-	of_node_put(node);
-
-	if (!is_first_core) {
-		dev_info(rkvdec->dev, "missing multi-core support, ignoring this instance\n");
-		return -ENODEV;
-	}
-
-	return 0;
-}
-
 static const struct rkvdec_variant_ops rk3399_variant_ops = {
 	.irq_handler = rk3399_irq_handler,
 	.colmv_size = rkvdec_colmv_size,
@@ -1757,49 +1779,114 @@ static const struct of_device_id of_rkvdec_match[] = {
 };
 MODULE_DEVICE_TABLE(of, of_rkvdec_match);
 
+static struct rkvdec_dev *device_node_to_rkvdec(struct device_node *node)
+{
+	struct platform_device *pdev = of_find_device_by_node(node);
+	struct rkvdec_dev *rkvdec = NULL;
+
+	if (!pdev)
+		return NULL;
+
+	rkvdec = platform_get_drvdata(pdev);
+
+	platform_device_put(pdev);
+
+	return rkvdec;
+}
+
+/*
+ * Probe a new core based on the given device.
+ * If it is the first probed core of the same compatible, a new rkvdec instance is
+ * created and the core is added to it.
+ * If not, the core is added to the existing rkvdec instance.
+ */
+static struct rkvdec_dev *rkvdec_probe_get_first(struct device *dev)
+{
+	struct device_node *first_node = NULL;
+	struct rkvdec_dev *rkvdec = NULL;
+	const char *compatible;
+	bool is_first_core;
+	int ret = 0;
+
+	/* Intentionally ignores the fallback strings */
+	ret = of_property_read_string(dev->of_node, "compatible", &compatible);
+	if (ret)
+		return ERR_PTR(-EINVAL);
+
+	/* The first compatible and available node found is considered the main core */
+	do {
+		first_node = of_find_compatible_node(first_node, NULL, compatible);
+		if (of_device_is_available(first_node))
+			break;
+	} while (first_node);
+
+	if (!first_node)
+		return ERR_PTR(-EINVAL);
+
+	is_first_core = (dev->of_node == first_node);
+
+	if (is_first_core) {
+		of_node_put(first_node);
+
+		rkvdec = devm_kzalloc(dev, sizeof(*rkvdec), GFP_KERNEL);
+		if (!rkvdec)
+			return ERR_PTR(-ENOMEM);
+
+		rkvdec->variant = of_device_get_match_data(dev);
+
+		mutex_init(&rkvdec->vdev_lock);
+		spin_lock_init(&rkvdec->cores_lock);
+	} else {
+		rkvdec = device_node_to_rkvdec(first_node);
+		of_node_put(first_node);
+	}
+
+	return rkvdec;
+}
+
 static int rkvdec_probe(struct platform_device *pdev)
 {
-	const struct rkvdec_variant *variant;
 	struct rkvdec_dev *rkvdec;
+	struct rkvdec_core *core;
 	int ret, irq;
 
-	variant = of_device_get_match_data(&pdev->dev);
-	if (!variant)
-		return -EINVAL;
+	rkvdec = rkvdec_probe_get_first(&pdev->dev);
+	if (IS_ERR(rkvdec))
+		return PTR_ERR(rkvdec);
 
-	rkvdec = devm_kzalloc(&pdev->dev, sizeof(*rkvdec), GFP_KERNEL);
-	if (!rkvdec)
-		return -ENOMEM;
+	core = &rkvdec->cores[rkvdec->core_count++];
 
 	platform_set_drvdata(pdev, rkvdec);
-	rkvdec->dev = &pdev->dev;
-	rkvdec->variant = variant;
-	mutex_init(&rkvdec->vdev_lock);
-	INIT_DELAYED_WORK(&rkvdec->watchdog_work, rkvdec_watchdog_func);
-
-	ret = rkvdec_disable_multicore(rkvdec);
-	if (ret)
-		return ret;
+	core->dev = &pdev->dev;
+	INIT_DELAYED_WORK(&core->watchdog_work, rkvdec_watchdog_func);
 
-	ret = devm_clk_bulk_get_all_enabled(&pdev->dev, &rkvdec->clocks);
+	ret = devm_clk_bulk_get_all_enabled(&pdev->dev, &core->clocks);
 	if (ret < 0)
 		return ret;
 
-	rkvdec->num_clocks = ret;
-	rkvdec->axi_clk = devm_clk_get(&pdev->dev, "axi");
+	core->num_clocks = ret;
+	core->axi_clk = devm_clk_get(&pdev->dev, "axi");
 
 	if (rkvdec->variant->has_single_reg_region) {
-		rkvdec->regs = devm_platform_ioremap_resource(pdev, 0);
-		if (IS_ERR(rkvdec->regs))
-			return PTR_ERR(rkvdec->regs);
+		core->regs = devm_platform_ioremap_resource(pdev, 0);
+		if (IS_ERR(core->regs))
+			return PTR_ERR(core->regs);
 	} else {
-		rkvdec->regs = devm_platform_ioremap_resource_byname(pdev, "function");
-		if (IS_ERR(rkvdec->regs))
-			return PTR_ERR(rkvdec->regs);
+		core->regs = devm_platform_ioremap_resource_byname(pdev, "function");
+		if (IS_ERR(core->regs))
+			return PTR_ERR(core->regs);
+
+		core->link = devm_platform_ioremap_resource_byname(pdev, "link");
+		if (IS_ERR(core->link))
+			return PTR_ERR(core->link);
+	}
 
-		rkvdec->link = devm_platform_ioremap_resource_byname(pdev, "link");
-		if (IS_ERR(rkvdec->link))
-			return PTR_ERR(rkvdec->link);
+	core->iommu_domain = iommu_get_domain_for_dev(&pdev->dev);
+	if (core->iommu_domain) {
+		core->empty_domain = iommu_paging_domain_alloc(core->dev);
+
+		if (!core->empty_domain)
+			dev_warn(core->dev, "cannot alloc new empty domain\n");
 	}
 
 	ret = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32));
@@ -1816,42 +1903,41 @@ static int rkvdec_probe(struct platform_device *pdev)
 
 	ret = devm_request_threaded_irq(&pdev->dev, irq, NULL,
 					rkvdec_irq_handler, IRQF_ONESHOT,
-					dev_name(&pdev->dev), rkvdec);
+					dev_name(&pdev->dev), core);
 	if (ret) {
-		dev_err(&pdev->dev, "Could not request vdec IRQ\n");
+		dev_err(&pdev->dev, "Could not request core IRQ\n");
 		return ret;
 	}
 
-	rkvdec->sram_pool = of_gen_pool_get(pdev->dev.of_node, "sram", 0);
-	if (!rkvdec->sram_pool && rkvdec->variant->num_rcb_sizes > 0)
+	core->sram_pool = of_gen_pool_get(pdev->dev.of_node, "sram", 0);
+	if (!core->sram_pool && rkvdec->variant->num_rcb_sizes > 0)
 		dev_info(&pdev->dev, "No sram node, RCB will be stored in RAM\n");
 
 	pm_runtime_set_autosuspend_delay(&pdev->dev, 100);
 	pm_runtime_use_autosuspend(&pdev->dev);
 	pm_runtime_enable(&pdev->dev);
 
-	ret = rkvdec_v4l2_init(rkvdec);
-	if (ret)
-		goto err_disable_runtime_pm;
-
-	rkvdec->iommu_domain = iommu_get_domain_for_dev(&pdev->dev);
-	if (rkvdec->iommu_domain) {
-		rkvdec->empty_domain = iommu_paging_domain_alloc(rkvdec->dev);
+	/* Only init v4l2 for the first (main) core. */
+	if (rkvdec->core_count == 1) {
+		rkvdec->main_core = core;
 
-		if (IS_ERR(rkvdec->empty_domain)) {
-			rkvdec->empty_domain = NULL;
-			dev_warn(rkvdec->dev, "cannot alloc new empty domain\n");
-		}
+		ret = rkvdec_v4l2_init(rkvdec);
+		if (ret)
+			goto err_disable_runtime_pm;
 	}
 
+	release_core(rkvdec, core);
+
+	dev_info(core->dev, "Registered core %d\n", core->id);
+
 	return 0;
 
 err_disable_runtime_pm:
 	pm_runtime_dont_use_autosuspend(&pdev->dev);
 	pm_runtime_disable(&pdev->dev);
 
-	if (rkvdec->sram_pool)
-		gen_pool_destroy(rkvdec->sram_pool);
+	if (core->sram_pool)
+		gen_pool_destroy(core->sram_pool);
 
 	return ret;
 }
@@ -1859,30 +1945,50 @@ static int rkvdec_probe(struct platform_device *pdev)
 static void rkvdec_remove(struct platform_device *pdev)
 {
 	struct rkvdec_dev *rkvdec = platform_get_drvdata(pdev);
+	int i;
 
-	cancel_delayed_work_sync(&rkvdec->watchdog_work);
+	for (i = 0; i < rkvdec->core_count; i++)
+		cancel_delayed_work_sync(&rkvdec->cores[i].watchdog_work);
 
 	rkvdec_v4l2_cleanup(rkvdec);
-	pm_runtime_disable(&pdev->dev);
-	pm_runtime_dont_use_autosuspend(&pdev->dev);
 
-	if (rkvdec->empty_domain)
-		iommu_domain_free(rkvdec->empty_domain);
+	for (i = 0; i < rkvdec->core_count; i++) {
+		pm_runtime_disable(rkvdec->cores[i].dev);
+		pm_runtime_dont_use_autosuspend(rkvdec->cores[i].dev);
+
+		if (rkvdec->cores[i].empty_domain)
+			iommu_domain_free(rkvdec->cores[i].empty_domain);
+
+		rkvdec_free_rcb(&rkvdec->cores[i]);
+	}
 }
 
 #ifdef CONFIG_PM
 static int rkvdec_runtime_resume(struct device *dev)
 {
 	struct rkvdec_dev *rkvdec = dev_get_drvdata(dev);
+	int i, ret;
 
-	return clk_bulk_prepare_enable(rkvdec->num_clocks, rkvdec->clocks);
+	for (i = 0; i < rkvdec->core_count; i++) {
+		ret = clk_bulk_prepare_enable(rkvdec->cores[i].num_clocks,
+					      rkvdec->cores[i].clocks);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
 }
 
 static int rkvdec_runtime_suspend(struct device *dev)
 {
 	struct rkvdec_dev *rkvdec = dev_get_drvdata(dev);
+	int i;
+
+	for (i = 0; i < rkvdec->core_count; i++) {
+		clk_bulk_disable_unprepare(rkvdec->cores[i].num_clocks,
+					   rkvdec->cores[i].clocks);
+	}
 
-	clk_bulk_disable_unprepare(rkvdec->num_clocks, rkvdec->clocks);
 	return 0;
 }
 #endif
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec.h b/drivers/media/platform/rockchip/rkvdec/rkvdec.h
index a24be6638b6b..4f042a367dc0 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec.h
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec.h
@@ -15,6 +15,7 @@
 #include <linux/videodev2.h>
 #include <linux/wait.h>
 #include <linux/clk.h>
+#include <linux/spinlock.h>
 
 #include <media/v4l2-ctrls.h>
 #include <media/v4l2-device.h>
@@ -125,23 +126,35 @@ struct rkvdec_coded_fmt_desc {
 	u32 subsystem_flags;
 };
 
-struct rkvdec_dev {
-	struct v4l2_device v4l2_dev;
-	struct media_device mdev;
-	struct video_device vdev;
-	struct v4l2_m2m_dev *m2m_dev;
+struct rkvdec_core {
 	struct device *dev;
 	struct clk_bulk_data *clocks;
 	unsigned int num_clocks;
 	struct clk *axi_clk;
 	void __iomem *regs;
 	void __iomem *link;
-	struct mutex vdev_lock; /* serializes ioctls */
 	struct delayed_work watchdog_work;
 	struct gen_pool *sram_pool;
 	struct iommu_domain *iommu_domain;
 	struct iommu_domain *empty_domain;
+	struct rkvdec_rcb_config *rcb_config;
+	struct rkvdec_ctx *curr_ctx;
+	int id;
+};
+
+struct rkvdec_dev {
+	struct v4l2_device v4l2_dev;
+	struct media_device mdev;
+	struct video_device vdev;
+	struct v4l2_m2m_dev *m2m_dev;
+	struct mutex vdev_lock; /* serializes ioctls */
 	const struct rkvdec_variant *variant;
+	struct rkvdec_core cores[2];
+	int core_count;
+	struct rkvdec_core *available_cores[2];
+	unsigned int available_core_count;
+	spinlock_t cores_lock; /* serializes core list access */
+	struct rkvdec_core *main_core;
 };
 
 struct rkvdec_ctx {
@@ -152,8 +165,8 @@ struct rkvdec_ctx {
 	struct v4l2_ctrl_handler ctrl_hdl;
 	struct rkvdec_dev *dev;
 	enum rkvdec_image_fmt image_fmt;
-	struct rkvdec_rcb_config *rcb_config;
 	u32 colmv_offset;
+	struct rkvdec_core *core;
 	void *priv;
 	u8 has_sps_st_rps: 1;
 	u8 has_sps_lt_rps: 1;
@@ -179,7 +192,7 @@ struct rkvdec_aux_buf {
 void rkvdec_run_preamble(struct rkvdec_ctx *ctx, struct rkvdec_run *run);
 void rkvdec_run_postamble(struct rkvdec_ctx *ctx, struct rkvdec_run *run);
 void rkvdec_memcpy_toio(void __iomem *dst, void *src, size_t len);
-void rkvdec_schedule_watchdog(struct rkvdec_dev *rkvdec, u32 timeout_threshold);
+void rkvdec_schedule_watchdog(struct rkvdec_core *core, u32 timeout_threshold);
 
 void rkvdec_quirks_disable_qos(struct rkvdec_ctx *ctx);
 

-- 
2.53.0



^ permalink raw reply related

* [PATCH 6/7] media: rkvdec: Wait for all buffers before stop_streaming
From: Detlev Casanova @ 2026-04-09 13:50 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Ezequiel Garcia, Heiko Stuebner,
	Nicolas Dufresne, Hans Verkuil, Jonas Karlman
  Cc: kernel, linux-media, linux-kernel, linux-rockchip,
	linux-arm-kernel, Detlev Casanova
In-Reply-To: <20260409-rkvdec-multicore-v1-0-62b316abf0f7@collabora.com>

Because the jobs are marked as finished before the buffer are marked
as done, the stop_streaming callback can be called while the decoder
is still running.

This could even go further and deallocate buffers that are still
being used by the hardware.

Fortunately, to avoid that, the vb2_wait_for_all_buffers() function
can be used at the beginning of the stop_streaming callback to make sure
that cleanup functions are called after the last buffer has been returned
to the queue.

Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
---
 drivers/media/platform/rockchip/rkvdec/rkvdec.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec.c b/drivers/media/platform/rockchip/rkvdec/rkvdec.c
index 5667d625f016..c2818f1575ef 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec.c
@@ -1027,6 +1027,8 @@ static void rkvdec_stop_streaming(struct vb2_queue *q)
 
 		if (desc->ops->stop)
 			desc->ops->stop(ctx);
+
+		vb2_wait_for_all_buffers(q);
 	}
 
 	rkvdec_queue_cleanup(q, VB2_BUF_STATE_ERROR);

-- 
2.53.0



^ permalink raw reply related

* [PATCH 4/7] media: rkvdec: Remove unused need_reset
From: Detlev Casanova @ 2026-04-09 13:50 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Ezequiel Garcia, Heiko Stuebner,
	Nicolas Dufresne, Hans Verkuil, Jonas Karlman
  Cc: kernel, linux-media, linux-kernel, linux-rockchip,
	linux-arm-kernel, Detlev Casanova
In-Reply-To: <20260409-rkvdec-multicore-v1-0-62b316abf0f7@collabora.com>

A left-over from the iommu restore mecanism was forgotten.
As need_reset is never set to true, the if has no use.

The actual restore function is called above it vase the IRQ isn't in a
success status.

Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
---
 drivers/media/platform/rockchip/rkvdec/rkvdec.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec.c b/drivers/media/platform/rockchip/rkvdec/rkvdec.c
index 31ddfcc58894..db2731af06cf 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec.c
@@ -1462,7 +1462,6 @@ static irqreturn_t vdpu381_irq_handler(struct rkvdec_ctx *ctx)
 {
 	struct rkvdec_dev *rkvdec = ctx->dev;
 	enum vb2_buffer_state state;
-	bool need_reset = 0;
 	u32 status;
 
 	status = readl(rkvdec->regs + VDPU381_REG_STA_INT);
@@ -1478,9 +1477,6 @@ static irqreturn_t vdpu381_irq_handler(struct rkvdec_ctx *ctx)
 			rkvdec_iommu_restore(rkvdec);
 	}
 
-	if (need_reset)
-		rkvdec_iommu_restore(rkvdec);
-
 	if (cancel_delayed_work(&rkvdec->watchdog_work))
 		rkvdec_job_finish(ctx, state);
 
@@ -1491,7 +1487,6 @@ static irqreturn_t vdpu383_irq_handler(struct rkvdec_ctx *ctx)
 {
 	struct rkvdec_dev *rkvdec = ctx->dev;
 	enum vb2_buffer_state state;
-	bool need_reset = 0;
 	u32 status;
 
 	status = readl(rkvdec->link + VDPU383_LINK_STA_INT);
@@ -1507,9 +1502,6 @@ static irqreturn_t vdpu383_irq_handler(struct rkvdec_ctx *ctx)
 		rkvdec_iommu_restore(rkvdec);
 	}
 
-	if (need_reset)
-		rkvdec_iommu_restore(rkvdec);
-
 	if (cancel_delayed_work(&rkvdec->watchdog_work))
 		rkvdec_job_finish(ctx, state);
 

-- 
2.53.0



^ permalink raw reply related

* [PATCH 2/7] media: v4l2-mem2mem: Remove WARN_ON() in v4l2_m2m_job_finish()
From: Detlev Casanova @ 2026-04-09 13:50 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Ezequiel Garcia, Heiko Stuebner,
	Nicolas Dufresne, Hans Verkuil, Jonas Karlman
  Cc: kernel, linux-media, linux-kernel, linux-rockchip,
	linux-arm-kernel, Detlev Casanova
In-Reply-To: <20260409-rkvdec-multicore-v1-0-62b316abf0f7@collabora.com>

This warning was added because there was no way, for drivers that support
holding capture buffers to mark jobs done correctly without calling
v4l2_m2m_buf_done_and_job_finish().

Now that v4l2_m2m_buf_done_manual() has been introduced, it has become
possible and drivers can use it with v4l2_m2m_job_finish() for finer
grained buffer management.

Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
---
 drivers/media/v4l2-core/v4l2-mem2mem.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
index 7f9fad4f6807..27f7dd7974b5 100644
--- a/drivers/media/v4l2-core/v4l2-mem2mem.c
+++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
@@ -487,13 +487,6 @@ void v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
 	unsigned long flags;
 	bool schedule_next;
 
-	/*
-	 * This function should not be used for drivers that support
-	 * holding capture buffers. Those should use
-	 * v4l2_m2m_buf_done_and_job_finish() instead.
-	 */
-	WARN_ON(m2m_ctx->out_q_ctx.q.subsystem_flags &
-		VB2_V4L2_FL_SUPPORTS_M2M_HOLD_CAPTURE_BUF);
 	spin_lock_irqsave(&m2m_dev->job_spinlock, flags);
 	schedule_next = _v4l2_m2m_job_finish(m2m_dev, m2m_ctx);
 	spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);

-- 
2.53.0



^ permalink raw reply related

* [PATCH 3/7] media: rkvdec: Keep RCB to the correct size
From: Detlev Casanova @ 2026-04-09 13:50 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Ezequiel Garcia, Heiko Stuebner,
	Nicolas Dufresne, Hans Verkuil, Jonas Karlman
  Cc: kernel, linux-media, linux-kernel, linux-rockchip,
	linux-arm-kernel, Detlev Casanova
In-Reply-To: <20260409-rkvdec-multicore-v1-0-62b316abf0f7@collabora.com>

Currently, if a video changes resolution, the RCB size might be too small
and the HW could try to write out of the allocated buffer.

To fix that, make sure that the RCB size is validated for each run and
increase the buffer size when needed.

Fixes: e5640dbb991c ("media: rkvdec: Add RCB and SRAM support")
Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
---
 .../media/platform/rockchip/rkvdec/rkvdec-rcb.c    | 26 +++++++++++++++---
 .../media/platform/rockchip/rkvdec/rkvdec-rcb.h    |  3 ++-
 drivers/media/platform/rockchip/rkvdec/rkvdec.c    | 31 +++++++++++-----------
 3 files changed, 40 insertions(+), 20 deletions(-)

diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.c
index fdcf1f177379..191f78278c01 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.c
@@ -17,6 +17,8 @@
 struct rkvdec_rcb_config {
 	struct rkvdec_aux_buf *rcb_bufs;
 	size_t rcb_count;
+	u32 width;
+	u32 height;
 };
 
 static size_t rkvdec_rcb_size(const struct rcb_size_info *size_info,
@@ -40,6 +42,21 @@ int rkvdec_rcb_buf_count(struct rkvdec_ctx *ctx)
 	return ctx->rcb_config->rcb_count;
 }
 
+bool rkvdec_rcb_buf_validate_size(struct rkvdec_ctx *ctx)
+{
+	struct rkvdec_rcb_config *cfg = ctx->rcb_config;
+
+	bool ret = cfg && cfg->height >= ctx->decoded_fmt.fmt.pix_mp.height &&
+		   cfg->width >= ctx->decoded_fmt.fmt.pix_mp.width;
+
+	if (!ret && cfg) {
+		dev_dbg(ctx->dev->dev, "RCB size %ux%u -> %ux%u\n", cfg->width, cfg->height,
+			ctx->decoded_fmt.fmt.pix_mp.width, ctx->decoded_fmt.fmt.pix_mp.height);
+	}
+
+	return ret;
+}
+
 void rkvdec_free_rcb(struct rkvdec_ctx *ctx)
 {
 	struct rkvdec_dev *dev = ctx->dev;
@@ -77,14 +94,15 @@ void rkvdec_free_rcb(struct rkvdec_ctx *ctx)
 		devm_kfree(dev->dev, cfg->rcb_bufs);
 
 	devm_kfree(dev->dev, cfg);
+
+	ctx->rcb_config = NULL;
 }
 
-int rkvdec_allocate_rcb(struct rkvdec_ctx *ctx,
+int rkvdec_allocate_rcb(struct rkvdec_ctx *ctx, u32 width, u32 height,
 			const struct rcb_size_info *size_info,
 			size_t rcb_count)
 {
 	int ret, i;
-	u32 width, height;
 	struct rkvdec_dev *rkvdec = ctx->dev;
 	struct rkvdec_rcb_config *cfg;
 
@@ -105,8 +123,8 @@ int rkvdec_allocate_rcb(struct rkvdec_ctx *ctx,
 		goto err_alloc;
 	}
 
-	width = ctx->decoded_fmt.fmt.pix_mp.width;
-	height = ctx->decoded_fmt.fmt.pix_mp.height;
+	cfg->width = width;
+	cfg->height = height;
 
 	for (i = 0; i < rcb_count; i++) {
 		void *cpu = NULL;
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.h b/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.h
index 30e8002555c8..0662a4359bdf 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.h
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-rcb.h
@@ -20,10 +20,11 @@ struct rcb_size_info {
 	enum rcb_axis axis;
 };
 
-int rkvdec_allocate_rcb(struct rkvdec_ctx *ctx,
+int rkvdec_allocate_rcb(struct rkvdec_ctx *ctx, u32 width, u32 height,
 			const struct rcb_size_info *size_info,
 			size_t rcb_count);
 dma_addr_t rkvdec_rcb_buf_dma_addr(struct rkvdec_ctx *ctx, int id);
 size_t rkvdec_rcb_buf_size(struct rkvdec_ctx *ctx, int id);
 int rkvdec_rcb_buf_count(struct rkvdec_ctx *ctx);
+bool rkvdec_rcb_buf_validate_size(struct rkvdec_ctx *ctx);
 void rkvdec_free_rcb(struct rkvdec_ctx *ctx);
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec.c b/drivers/media/platform/rockchip/rkvdec/rkvdec.c
index 1d1e9bfef8e9..31ddfcc58894 100644
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec.c
@@ -978,8 +978,7 @@ static int rkvdec_start_streaming(struct vb2_queue *q, unsigned int count)
 {
 	struct rkvdec_ctx *ctx = vb2_get_drv_priv(q);
 	const struct rkvdec_coded_fmt_desc *desc;
-	const struct rkvdec_variant *variant = ctx->dev->variant;
-	int ret;
+	int ret = 0;
 
 	if (V4L2_TYPE_IS_CAPTURE(q->type))
 		return 0;
@@ -988,20 +987,8 @@ static int rkvdec_start_streaming(struct vb2_queue *q, unsigned int count)
 	if (WARN_ON(!desc))
 		return -EINVAL;
 
-	ret = rkvdec_allocate_rcb(ctx, variant->rcb_sizes, variant->num_rcb_sizes);
-	if (ret)
-		return ret;
-
-	if (desc->ops->start) {
+	if (desc->ops->start)
 		ret = desc->ops->start(ctx);
-		if (ret)
-			goto err_ops_start;
-	}
-
-	return 0;
-
-err_ops_start:
-	rkvdec_free_rcb(ctx);
 
 	return ret;
 }
@@ -1174,6 +1161,20 @@ static void rkvdec_device_run(void *priv)
 		return;
 	}
 
+	if (!rkvdec_rcb_buf_validate_size(ctx)) {
+		rkvdec_free_rcb(ctx);
+
+		ret = rkvdec_allocate_rcb(ctx,
+					  ctx->decoded_fmt.fmt.pix_mp.width,
+					  ctx->decoded_fmt.fmt.pix_mp.height,
+					  ctx->dev->variant->rcb_sizes,
+					  ctx->dev->variant->num_rcb_sizes);
+		if (ret) {
+			rkvdec_job_finish(ctx, VB2_BUF_STATE_ERROR);
+			return;
+		}
+	}
+
 	ret = desc->ops->run(ctx);
 	if (ret)
 		rkvdec_job_finish(ctx, VB2_BUF_STATE_ERROR);

-- 
2.53.0



^ permalink raw reply related

* [PATCH 1/7] media: v4l2-mem2mem: Add v4l2_m2m_buf_done_manual()
From: Detlev Casanova @ 2026-04-09 13:50 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Ezequiel Garcia, Heiko Stuebner,
	Nicolas Dufresne, Hans Verkuil, Jonas Karlman
  Cc: kernel, linux-media, linux-kernel, linux-rockchip,
	linux-arm-kernel, Detlev Casanova
In-Reply-To: <20260409-rkvdec-multicore-v1-0-62b316abf0f7@collabora.com>

This function can be used to mark buffers as done, handling locking, but
not finishing the job as it is done by v4l2_m2m_buf_done_and_finish_job().

To avoid copying similar code, a static function is added with an extra
finish argument.
The code path of v4l2_m2m_buf_done_and_finish_job() is unchanged.

This allows for finer grained buffer management in drivers, scheduling
new jobs before the previous one is finished and prepares for enabling
multicore support in rkvdec.

Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
---
 drivers/media/v4l2-core/v4l2-mem2mem.c | 27 ++++++++++++++++++++++-----
 include/media/v4l2-mem2mem.h           | 20 ++++++++++++++++++++
 2 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/drivers/media/v4l2-core/v4l2-mem2mem.c b/drivers/media/v4l2-core/v4l2-mem2mem.c
index a65cbb124cfe..7f9fad4f6807 100644
--- a/drivers/media/v4l2-core/v4l2-mem2mem.c
+++ b/drivers/media/v4l2-core/v4l2-mem2mem.c
@@ -503,9 +503,9 @@ void v4l2_m2m_job_finish(struct v4l2_m2m_dev *m2m_dev,
 }
 EXPORT_SYMBOL(v4l2_m2m_job_finish);
 
-void v4l2_m2m_buf_done_and_job_finish(struct v4l2_m2m_dev *m2m_dev,
-				      struct v4l2_m2m_ctx *m2m_ctx,
-				      enum vb2_buffer_state state)
+static void _buf_done_and_job_finish(struct v4l2_m2m_dev *m2m_dev,
+				     struct v4l2_m2m_ctx *m2m_ctx,
+				     enum vb2_buffer_state state, bool finish)
 {
 	struct vb2_v4l2_buffer *src_buf, *dst_buf;
 	bool schedule_next = false;
@@ -532,13 +532,30 @@ void v4l2_m2m_buf_done_and_job_finish(struct v4l2_m2m_dev *m2m_dev,
 	 * before the CAPTURE buffer is done.
 	 */
 	v4l2_m2m_buf_done(src_buf, state);
-	schedule_next = _v4l2_m2m_job_finish(m2m_dev, m2m_ctx);
+	if (finish)
+		schedule_next = _v4l2_m2m_job_finish(m2m_dev, m2m_ctx);
 unlock:
 	spin_unlock_irqrestore(&m2m_dev->job_spinlock, flags);
 
-	if (schedule_next)
+	if (schedule_next || !finish)
 		v4l2_m2m_schedule_next_job(m2m_dev, m2m_ctx);
 }
+
+
+void v4l2_m2m_buf_done_manual(struct v4l2_m2m_dev *m2m_dev,
+			      struct v4l2_m2m_ctx *m2m_ctx,
+			      enum vb2_buffer_state state)
+{
+	_buf_done_and_job_finish(m2m_dev, m2m_ctx, state, false);
+}
+EXPORT_SYMBOL(v4l2_m2m_buf_done_manual);
+
+void v4l2_m2m_buf_done_and_job_finish(struct v4l2_m2m_dev *m2m_dev,
+				      struct v4l2_m2m_ctx *m2m_ctx,
+				      enum vb2_buffer_state state)
+{
+	_buf_done_and_job_finish(m2m_dev, m2m_ctx, state, true);
+}
 EXPORT_SYMBOL(v4l2_m2m_buf_done_and_job_finish);
 
 void v4l2_m2m_suspend(struct v4l2_m2m_dev *m2m_dev)
diff --git a/include/media/v4l2-mem2mem.h b/include/media/v4l2-mem2mem.h
index 31de25d792b9..6a36fc885f5f 100644
--- a/include/media/v4l2-mem2mem.h
+++ b/include/media/v4l2-mem2mem.h
@@ -227,6 +227,26 @@ void v4l2_m2m_buf_done_and_job_finish(struct v4l2_m2m_dev *m2m_dev,
 				      struct v4l2_m2m_ctx *m2m_ctx,
 				      enum vb2_buffer_state state);
 
+/**
+ * v4l2_m2m_buf_done_manual() - manually mark the job as done, but do not
+ * finish it.
+ *
+ * @m2m_dev: opaque pointer to the internal data to handle M2M context
+ * @m2m_ctx: m2m context assigned to the instance given by struct &v4l2_m2m_ctx
+ * @state: vb2 buffer state passed to v4l2_m2m_buf_done().
+ *
+ * The function works the same way as v4l2_m2m_buf_done_and_job_finish()
+ * but does not inform the framework that the job has been finished,
+ * leaving the user the responsability to call v4l2_m2m_job_finish()
+ * when a buffer can be released to userspace.
+ *
+ * It allows driver to process new buffers, before the previous one is
+ * done.
+ */
+void v4l2_m2m_buf_done_manual(struct v4l2_m2m_dev *m2m_dev,
+			      struct v4l2_m2m_ctx *m2m_ctx,
+			      enum vb2_buffer_state state);
+
 static inline void
 v4l2_m2m_buf_done(struct vb2_v4l2_buffer *buf, enum vb2_buffer_state state)
 {

-- 
2.53.0



^ permalink raw reply related

* [PATCH 0/7] media: rkvdec: Enable multi-core support
From: Detlev Casanova @ 2026-04-09 13:50 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Ezequiel Garcia, Heiko Stuebner,
	Nicolas Dufresne, Hans Verkuil, Jonas Karlman
  Cc: kernel, linux-media, linux-kernel, linux-rockchip,
	linux-arm-kernel, Detlev Casanova

Since the driver is used for decoding on rk3588 and that the SoC has
2 identical decoding cores, enable support for it.

Instead of exposing 2 v4l2 devices to userspace, the driver will only
expose one and handle the 2 cores transparently.

The 2 cores are able to work in parallel, but only contexts are
parallelized: 1 stream, that uses 1 context, will only be able to use
1 core as it usually needs previous frames already decoded to use as
reference frames.
To avoid complex scheduling, only different streams can use cores at the
same time.

To achieve this, the v4l2_m2m_buf_done_and_job_finish() had to be split
in a done and a finish part (still keeping the unsplit function for other
drivers). That allows the driver to get new jobs to run while the previous
one is still running.
The job_ready() callback is used to avoid scheduling multiple jobs from
the same context.

The IOMMU support is in a different commit, as it needed a bit more
thought to work correctly, but I'm wondering if it should be merged with
the multicore support commit.

A fix for the RCB (Row and Cols Buffer) size computation is also provided
as it was causing issues with some fluster tests.

Performance-wise, fluster doesn't seem to run much faster, but I tested
with an HEVC test video from Jellyfin and observed that frames start
dropping with 6 concurrent gstreamer instances, instead of 4 without
multi-core enabled.

Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
---
Detlev Casanova (7):
      media: v4l2-mem2mem: Add v4l2_m2m_buf_done_manual()
      media: v4l2-mem2mem: Remove WARN_ON() in v4l2_m2m_job_finish()
      media: rkvdec: Keep RCB to the correct size
      media: rkvdec: Remove unused need_reset
      media: rkvdec: Add multicore support
      media: rkvdec: Wait for all buffers before stop_streaming
      media: rkvdec: Add multicore IOMMU support

 .../media/platform/rockchip/rkvdec/rkvdec-h264.c   |  17 +-
 .../media/platform/rockchip/rkvdec/rkvdec-hevc.c   |  16 +-
 .../media/platform/rockchip/rkvdec/rkvdec-rcb.c    |  77 ++--
 .../media/platform/rockchip/rkvdec/rkvdec-rcb.h    |   8 +-
 .../platform/rockchip/rkvdec/rkvdec-vdpu381-h264.c |  24 +-
 .../platform/rockchip/rkvdec/rkvdec-vdpu381-hevc.c |  24 +-
 .../platform/rockchip/rkvdec/rkvdec-vdpu383-h264.c |  24 +-
 .../platform/rockchip/rkvdec/rkvdec-vdpu383-hevc.c |  26 +-
 .../media/platform/rockchip/rkvdec/rkvdec-vp9.c    |  27 +-
 drivers/media/platform/rockchip/rkvdec/rkvdec.c    | 482 +++++++++++++--------
 drivers/media/platform/rockchip/rkvdec/rkvdec.h    |  31 +-
 drivers/media/v4l2-core/v4l2-mem2mem.c             |  34 +-
 include/media/v4l2-mem2mem.h                       |  20 +
 13 files changed, 500 insertions(+), 310 deletions(-)
---
base-commit: 3036cd0d3328220a1858b1ab390be8b562774e8a
change-id: 20260408-rkvdec-multicore-98831da51072

Best regards,
--  
Detlev Casanova <detlev.casanova@collabora.com>



^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox