linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/7] Performance improvement of decoder
@ 2025-05-22  7:25 Jackson.lee
  2025-05-22  7:26 ` [PATCH v2 1/7] media: chips-media: wave5: Fix Null reference while testing fluster Jackson.lee
                   ` (6 more replies)
  0 siblings, 7 replies; 29+ messages in thread
From: Jackson.lee @ 2025-05-22  7:25 UTC (permalink / raw)
  To: mchehab, hverkuil-cisco, sebastian.fricke, nicolas.dufresne,
	bob.beckett, dafna.hirschfeld
  Cc: linux-media, linux-kernel, jackson.lee, lafley.kim, b-brnich,
	hverkuil, nas.chung

From: Jackson Lee <jackson.lee@chipsnmedia.com>

v4l2-compliance results:
========================

v4l2-compliance 1.28.1-5233, 64 bits, 64-bit time_t

Buffer ioctls:
                warn: v4l2-test-buffers.cpp(693): VIDIOC_CREATE_BUFS not supported
                warn: v4l2-test-buffers.cpp(693): VIDIOC_CREATE_BUFS not supported
        test VIDIOC_REQBUFS/CREATE_BUFS/QUERYBUF: OK
        test CREATE_BUFS maximum buffers: OK
        test VIDIOC_EXPBUF: OK
        test Requests: OK (Not Supported)

Total for wave5-dec device /dev/video0: 46, Succeeded: 46, Failed: 0, Warnings: 2 Total for wave5-enc device /dev/video1: 46, Succeeded: 46, Failed: 0, Warnings: 0

Fluster test results:
=====================

Running test suite JCT-VC-HEVC_V1 with decoder GStreamer-H.265-V4L2-Gst1.0 Using 3 parallel job(s)
Ran 133/147 tests successfully               in 40.982 secs

(1 test fails because of not supporting to parse multi frames, 1 test fails because of a missing frame and slight corruption,
 2 tests fail because of sizes which are incompatible with the IP, 11 tests fail because of unsupported 10 bit format)


Running test suite JVT-AVC_V1 with decoder GStreamer-H.264-V4L2-Gst1.0 Using 3 parallel job(s)
Ran 78/135 tests successfully               in 43.445 secs

(57 fail because the hardware is unable to decode  MBAFF / FMO / Field / Extended profile streams.)

Running test suite JVT-FR-EXT with decoder GStreamer-H.264-V4L2-Gst1.0 Using 3 parallel job(s)
Ran 25/69 tests successfully               in 30.118 secs

(44 fail because the hardware does not support field encoded and 422
encoded stream)

Seek test
=====================
1. gst-play-1.0 seek.264
2. this will use waylandsink since gst-play-1.0 uses playbin.
   if you don't want to hook up display,
   you can run gst-play-1.0 seek.264 --videosink=fakevideosink instead 3. Let pipeline run for 2-3 seconds 4. press SPACE key to pause 5. press 0 to reset press SPACE to start again

gst-play-1.0 seek.264 --videosink=fakevideosink Press 'k' to see a list of keyboard shortcuts.
Now playing /root/seek.264
Redistribute latency...
Redistribute latency...
Redistribute latency...
Redistribute latency...
Redistribute latency...aused
0:00:09.9 / 0:00:09.7
Reached end of play list.

Sequence Change test
=====================
gst-launch-1.0 filesrc location=./drc.h264 ! h264parse ! v4l2h264dec ! filesink location=./h264_output_420.yuv
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Redistribute latency...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Redistribute latency...
Got EOS from element "pipeline0".
Execution ended after 0:00:00.113620590
Setting pipeline to NULL ...
Freeing pipeline ...

Change since v1:
===================
* For [PATCH v1 2/7] media: chips-media: wave5: Improve performance of decoder
 - change log to dbg level

Change since v0:
===================
* For [PATCH v1 2/7] media: chips-media: wave5: Improve performance of decoder
 - separates the previous patch to a few patches

* For [PATCH v1 3/7] media: chips-media: wave5: Fix not to be closed
 - separated from the previous patch of performance improvement of
   decoder

* For [PATCH v1 4/7] media: chips-media: wave5: Use spinlock whenever state is changed
 - separated from the previous patch of performance improvement of
   decoder

* For [PATCH v1 5/7] media: chips-media: wave5: Fix not to free resources normally when
    instance was destroyed
 - separated from the previous patch of performance improvement of
   decoder

* For [PATCH v1 7/7] media: chips-media: wave5: Fix SError of kernel panic when closed
 - separated from the previous patch of performance improvement of
   decoder


Jackson Lee (7):
  media: chips-media: wave5: Fix Null reference while testing fluster
  media: chips-media: wave5: Improve performance of decoder
  media: chips-media: wave5: Fix not to be closed
  media: chips-media: wave5: Use spinlock whenever statue is changed
  media: chips-media: wave5: Fix not to free resources normally when
    instance was destroyed
  media: chips-media: wave5: Reduce high CPU load
  media: chips-media: wave5: Fix SError of kernel panic when closed

 .../platform/chips-media/wave5/wave5-helper.c |  10 +-
 .../platform/chips-media/wave5/wave5-hw.c     |   2 +-
 .../chips-media/wave5/wave5-vpu-dec.c         | 117 +++++++++++-------
 .../chips-media/wave5/wave5-vpu-enc.c         |   8 +-
 .../platform/chips-media/wave5/wave5-vpu.c    |  71 +++++++++--
 .../platform/chips-media/wave5/wave5-vpuapi.c |  36 +++---
 .../platform/chips-media/wave5/wave5-vpuapi.h |  10 ++
 .../chips-media/wave5/wave5-vpuconfig.h       |   1 +
 8 files changed, 181 insertions(+), 74 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v2 1/7] media: chips-media: wave5: Fix Null reference while testing fluster
  2025-05-22  7:25 [PATCH v2 0/7] Performance improvement of decoder Jackson.lee
@ 2025-05-22  7:26 ` Jackson.lee
  2025-05-23 17:20   ` Nicolas Dufresne
  2025-05-22  7:26 ` [PATCH v2 2/7] media: chips-media: wave5: Improve performance of decoder Jackson.lee
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Jackson.lee @ 2025-05-22  7:26 UTC (permalink / raw)
  To: mchehab, hverkuil-cisco, sebastian.fricke, nicolas.dufresne,
	bob.beckett, dafna.hirschfeld
  Cc: linux-media, linux-kernel, jackson.lee, lafley.kim, b-brnich,
	hverkuil, nas.chung

From: Jackson Lee <jackson.lee@chipsnmedia.com>

When multi instances are created/destroyed, many interrupts happens
or structures for decoder are removed.
"struct vpu_instance" this structure is shared for all flow in decoder,
so if the structure is not protected by lock, Null reference exception
could happens sometimes.
IRQ Handler was spilt to two phases and Lock was added as well.

Fixes: 9707a6254a8a ("media: chips-media: wave5: Add the v4l2 layer")
Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
---
 .../platform/chips-media/wave5/wave5-helper.c | 10 ++-
 .../chips-media/wave5/wave5-vpu-dec.c         |  5 ++
 .../chips-media/wave5/wave5-vpu-enc.c         |  5 ++
 .../platform/chips-media/wave5/wave5-vpu.c    | 69 ++++++++++++++++---
 .../platform/chips-media/wave5/wave5-vpuapi.h |  6 ++
 5 files changed, 86 insertions(+), 9 deletions(-)

diff --git a/drivers/media/platform/chips-media/wave5/wave5-helper.c b/drivers/media/platform/chips-media/wave5/wave5-helper.c
index 2c9d8cbca6e4..5d9969bb7ada 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-helper.c
+++ b/drivers/media/platform/chips-media/wave5/wave5-helper.c
@@ -49,7 +49,7 @@ void wave5_cleanup_instance(struct vpu_instance *inst)
 		v4l2_fh_del(&inst->v4l2_fh);
 		v4l2_fh_exit(&inst->v4l2_fh);
 	}
-	list_del_init(&inst->list);
+	kfifo_free(&inst->irq_status);
 	ida_free(&inst->dev->inst_ida, inst->id);
 	kfree(inst->codec_info);
 	kfree(inst);
@@ -61,8 +61,16 @@ int wave5_vpu_release_device(struct file *filp,
 {
 	struct vpu_instance *inst = wave5_to_vpu_inst(filp->private_data);
 	int ret = 0;
+	unsigned long flags;
 
 	v4l2_m2m_ctx_release(inst->v4l2_fh.m2m_ctx);
+	ret = mutex_lock_interruptible(&inst->dev->irq_lock);
+	if (ret)
+		return ret;
+	spin_lock_irqsave(&inst->dev->irq_spinlock, flags);
+	list_del_init(&inst->list);
+	spin_unlock_irqrestore(&inst->dev->irq_spinlock, flags);
+	mutex_unlock(&inst->dev->irq_lock);
 	if (inst->state != VPU_INST_STATE_NONE) {
 		u32 fail_res;
 
diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
index fd71f0c43ac3..32de43de1870 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
@@ -1811,6 +1811,11 @@ static int wave5_vpu_open_dec(struct file *filp)
 	inst->xfer_func = V4L2_XFER_FUNC_DEFAULT;
 
 	init_completion(&inst->irq_done);
+	ret = kfifo_alloc(&inst->irq_status, 16 * sizeof(int), GFP_KERNEL);
+	if (ret) {
+		dev_err(inst->dev->dev, "failed to allocate fifo\n");
+		goto cleanup_inst;
+	}
 
 	inst->id = ida_alloc(&inst->dev->inst_ida, GFP_KERNEL);
 	if (inst->id < 0) {
diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c b/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
index 1e5fc5f8b856..52a1a00fd9bb 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
@@ -1760,6 +1760,11 @@ static int wave5_vpu_open_enc(struct file *filp)
 	inst->frame_rate = 30;
 
 	init_completion(&inst->irq_done);
+	ret = kfifo_alloc(&inst->irq_status, 16 * sizeof(int), GFP_KERNEL);
+	if (ret) {
+		dev_err(inst->dev->dev, "failed to allocate fifo\n");
+		goto cleanup_inst;
+	}
 
 	inst->id = ida_alloc(&inst->dev->inst_ida, GFP_KERNEL);
 	if (inst->id < 0) {
diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu.c b/drivers/media/platform/chips-media/wave5/wave5-vpu.c
index e1715d3f43b0..a2c09523d76b 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpu.c
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpu.c
@@ -51,8 +51,11 @@ static void wave5_vpu_handle_irq(void *dev_id)
 	u32 seq_done;
 	u32 cmd_done;
 	u32 irq_reason;
-	struct vpu_instance *inst;
+	u32 irq_subreason;
+	struct vpu_instance *inst, *tmp;
 	struct vpu_device *dev = dev_id;
+	int val;
+	unsigned long flags;
 
 	irq_reason = wave5_vdi_read_register(dev, W5_VPU_VINT_REASON);
 	seq_done = wave5_vdi_read_register(dev, W5_RET_SEQ_DONE_INSTANCE_INFO);
@@ -60,7 +63,8 @@ static void wave5_vpu_handle_irq(void *dev_id)
 	wave5_vdi_write_register(dev, W5_VPU_VINT_REASON_CLR, irq_reason);
 	wave5_vdi_write_register(dev, W5_VPU_VINT_CLEAR, 0x1);
 
-	list_for_each_entry(inst, &dev->instances, list) {
+	spin_lock_irqsave(&dev->irq_spinlock, flags);
+	list_for_each_entry_safe(inst, tmp, &dev->instances, list) {
 
 		if (irq_reason & BIT(INT_WAVE5_INIT_SEQ) ||
 		    irq_reason & BIT(INT_WAVE5_ENC_SET_PARAM)) {
@@ -82,14 +86,22 @@ static void wave5_vpu_handle_irq(void *dev_id)
 		    irq_reason & BIT(INT_WAVE5_ENC_PIC)) {
 			if (cmd_done & BIT(inst->id)) {
 				cmd_done &= ~BIT(inst->id);
-				wave5_vdi_write_register(dev, W5_RET_QUEUE_CMD_DONE_INST,
-							 cmd_done);
-				inst->ops->finish_process(inst);
+				if (dev->irq >= 0) {
+					irq_subreason =
+						wave5_vdi_read_register(dev, W5_VPU_VINT_REASON);
+					if (!(irq_subreason & BIT(INT_WAVE5_DEC_PIC)))
+						wave5_vdi_write_register(dev,
+									 W5_RET_QUEUE_CMD_DONE_INST,
+									 cmd_done);
+				}
+				val = BIT(INT_WAVE5_DEC_PIC);
+				kfifo_in(&inst->irq_status, &val, sizeof(int));
 			}
 		}
-
-		wave5_vpu_clear_interrupt(inst, irq_reason);
 	}
+	spin_unlock_irqrestore(&dev->irq_spinlock, flags);
+
+	up(&dev->irq_sem);
 }
 
 static irqreturn_t wave5_vpu_irq_thread(int irq, void *dev_id)
@@ -121,6 +133,35 @@ static enum hrtimer_restart wave5_vpu_timer_callback(struct hrtimer *timer)
 	return HRTIMER_RESTART;
 }
 
+static int irq_thread(void *data)
+{
+	struct vpu_device *dev = (struct vpu_device *)data;
+	struct vpu_instance *inst, *tmp;
+	int irq_status, ret;
+
+	while (!kthread_should_stop()) {
+		if (down_interruptible(&dev->irq_sem))
+			continue;
+
+		if (kthread_should_stop())
+			break;
+
+		mutex_lock(&dev->irq_lock);
+		list_for_each_entry_safe(inst, tmp, &dev->instances, list) {
+			while (kfifo_len(&inst->irq_status)) {
+				ret = kfifo_out(&inst->irq_status, &irq_status, sizeof(int));
+				if (!ret)
+					break;
+
+				inst->ops->finish_process(inst);
+			}
+		}
+		mutex_unlock(&dev->irq_lock);
+	}
+
+	return 0;
+}
+
 static int wave5_vpu_load_firmware(struct device *dev, const char *fw_name,
 				   u32 *revision)
 {
@@ -224,6 +265,8 @@ static int wave5_vpu_probe(struct platform_device *pdev)
 
 	mutex_init(&dev->dev_lock);
 	mutex_init(&dev->hw_lock);
+	mutex_init(&dev->irq_lock);
+	spin_lock_init(&dev->irq_spinlock);
 	dev_set_drvdata(&pdev->dev, dev);
 	dev->dev = &pdev->dev;
 
@@ -266,6 +309,10 @@ static int wave5_vpu_probe(struct platform_device *pdev)
 	}
 	dev->product = wave5_vpu_get_product_id(dev);
 
+	sema_init(&dev->irq_sem, 1);
+	INIT_LIST_HEAD(&dev->instances);
+	dev->irq_thread = kthread_run(irq_thread, dev, "irq thread");
+
 	dev->irq = platform_get_irq(pdev, 0);
 	if (dev->irq < 0) {
 		dev_err(&pdev->dev, "failed to get irq resource, falling back to polling\n");
@@ -288,7 +335,6 @@ static int wave5_vpu_probe(struct platform_device *pdev)
 		}
 	}
 
-	INIT_LIST_HEAD(&dev->instances);
 	ret = v4l2_device_register(&pdev->dev, &dev->v4l2_dev);
 	if (ret) {
 		dev_err(&pdev->dev, "v4l2_device_register, fail: %d\n", ret);
@@ -351,6 +397,12 @@ static void wave5_vpu_remove(struct platform_device *pdev)
 {
 	struct vpu_device *dev = dev_get_drvdata(&pdev->dev);
 
+	if (dev->irq_thread) {
+		kthread_stop(dev->irq_thread);
+		up(&dev->irq_sem);
+		dev->irq_thread = NULL;
+	}
+
 	if (dev->irq < 0) {
 		kthread_destroy_worker(dev->worker);
 		hrtimer_cancel(&dev->hrtimer);
@@ -361,6 +413,7 @@ static void wave5_vpu_remove(struct platform_device *pdev)
 
 	mutex_destroy(&dev->dev_lock);
 	mutex_destroy(&dev->hw_lock);
+	mutex_destroy(&dev->irq_lock);
 	reset_control_assert(dev->resets);
 	clk_bulk_disable_unprepare(dev->num_clks, dev->clks);
 	wave5_vpu_enc_unregister_device(dev);
diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
index 45615c15beca..f3c1ad6fb3be 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
@@ -8,6 +8,7 @@
 #ifndef VPUAPI_H_INCLUDED
 #define VPUAPI_H_INCLUDED
 
+#include <linux/kfifo.h>
 #include <linux/idr.h>
 #include <linux/genalloc.h>
 #include <media/v4l2-device.h>
@@ -747,6 +748,7 @@ struct vpu_device {
 	struct video_device *video_dev_enc;
 	struct mutex dev_lock; /* lock for the src, dst v4l2 queues */
 	struct mutex hw_lock; /* lock hw configurations */
+	struct mutex irq_lock;
 	int irq;
 	enum product_id product;
 	struct vpu_attr attr;
@@ -764,7 +766,10 @@ struct vpu_device {
 	struct kthread_worker *worker;
 	int vpu_poll_interval;
 	int num_clks;
+	struct task_struct *irq_thread;
+	struct semaphore irq_sem;
 	struct reset_control *resets;
+	spinlock_t irq_spinlock; /* protect instances list */
 };
 
 struct vpu_instance;
@@ -788,6 +793,7 @@ struct vpu_instance {
 	enum v4l2_ycbcr_encoding ycbcr_enc;
 	enum v4l2_quantization quantization;
 
+	struct kfifo irq_status;
 	enum vpu_instance_state state;
 	enum vpu_instance_type type;
 	const struct vpu_instance_ops *ops;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 2/7] media: chips-media: wave5: Improve performance of decoder
  2025-05-22  7:25 [PATCH v2 0/7] Performance improvement of decoder Jackson.lee
  2025-05-22  7:26 ` [PATCH v2 1/7] media: chips-media: wave5: Fix Null reference while testing fluster Jackson.lee
@ 2025-05-22  7:26 ` Jackson.lee
  2025-05-23 17:39   ` Nicolas Dufresne
  2025-05-22  7:26 ` [PATCH v2 3/7] media: chips-media: wave5: Fix not to be closed Jackson.lee
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Jackson.lee @ 2025-05-22  7:26 UTC (permalink / raw)
  To: mchehab, hverkuil-cisco, sebastian.fricke, nicolas.dufresne,
	bob.beckett, dafna.hirschfeld
  Cc: linux-media, linux-kernel, jackson.lee, lafley.kim, b-brnich,
	hverkuil, nas.chung

From: Jackson Lee <jackson.lee@chipsnmedia.com>

The current decoding method  was to wait until each frame was
decoded after feeding a bitstream. As a result, performance was low
and Wave5 could not achieve max pixel processing rate.

Update driver to use an asynchronous approach for decoding and feeding a
bitstream in order to achieve full capabilities of the device.

WAVE5 supports command-queueing to maximize performance by pipelining
internal commands and by hiding wait cycle taken to receive a command
from Host processor.

Instead of waiting for each command to be executed before sending the
next command, Host processor just places all the commands in the
command-queue and goes on doing other things while the commands in the
queue are processed by VPU.

While Host processor handles its own tasks, it can receive VPU interrupt
request (IRQ).
In this case, host processor can simply exit interrupt service routine
(ISR) without accessing to host interface to read the result of the
command reported by VPU.
After host processor completed its tasks, host processor can read the
command result when host processor needs the reports and does
response processing.

To archive this goal, the device_run() calls v4l2_m2m_job_finish
so that next command can be sent to VPU continuously, if there is
any result, then irq is triggered and gets decoded frames and returns
them to upper layer.
Theses processes work independently each other without waiting
a decoded frame.

Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
---
 .../platform/chips-media/wave5/wave5-hw.c     |  2 +-
 .../chips-media/wave5/wave5-vpu-dec.c         | 84 +++++++++++--------
 .../platform/chips-media/wave5/wave5-vpuapi.c |  2 +
 .../platform/chips-media/wave5/wave5-vpuapi.h |  3 +
 4 files changed, 57 insertions(+), 34 deletions(-)

diff --git a/drivers/media/platform/chips-media/wave5/wave5-hw.c b/drivers/media/platform/chips-media/wave5/wave5-hw.c
index d94cf84c3ee5..687ce6ccf3ae 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-hw.c
+++ b/drivers/media/platform/chips-media/wave5/wave5-hw.c
@@ -102,7 +102,7 @@ static void _wave5_print_reg_err(struct vpu_device *vpu_dev, u32 reg_fail_reason
 		dev_dbg(dev, "%s: queueing failure: 0x%x\n", func, reg_val);
 		break;
 	case WAVE5_SYSERR_RESULT_NOT_READY:
-		dev_err(dev, "%s: result not ready: 0x%x\n", func, reg_fail_reason);
+		dev_dbg(dev, "%s: result not ready: 0x%x\n", func, reg_fail_reason);
 		break;
 	case WAVE5_SYSERR_ACCESS_VIOLATION_HW:
 		dev_err(dev, "%s: access violation: 0x%x\n", func, reg_fail_reason);
diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
index 32de43de1870..995234a3a6d6 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
@@ -347,13 +347,12 @@ static void wave5_vpu_dec_finish_decode(struct vpu_instance *inst)
 	struct vb2_v4l2_buffer *dec_buf = NULL;
 	struct vb2_v4l2_buffer *disp_buf = NULL;
 	struct vb2_queue *dst_vq = v4l2_m2m_get_dst_vq(m2m_ctx);
-	struct queue_status_info q_status;
 
 	dev_dbg(inst->dev->dev, "%s: Fetch output info from firmware.", __func__);
 
 	ret = wave5_vpu_dec_get_output_info(inst, &dec_info);
 	if (ret) {
-		dev_warn(inst->dev->dev, "%s: could not get output info.", __func__);
+		dev_dbg(inst->dev->dev, "%s: could not get output info.", __func__);
 		v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
 		return;
 	}
@@ -441,20 +440,6 @@ static void wave5_vpu_dec_finish_decode(struct vpu_instance *inst)
 		}
 		spin_unlock_irqrestore(&inst->state_spinlock, flags);
 	}
-
-	/*
-	 * During a resolution change and while draining, the firmware may flush
-	 * the reorder queue regardless of having a matching decoding operation
-	 * pending. Only terminate the job if there are no more IRQ coming.
-	 */
-	wave5_vpu_dec_give_command(inst, DEC_GET_QUEUE_STATUS, &q_status);
-	if (q_status.report_queue_count == 0 &&
-	    (q_status.instance_queue_count == 0 || dec_info.sequence_changed)) {
-		dev_dbg(inst->dev->dev, "%s: finishing job.\n", __func__);
-		pm_runtime_mark_last_busy(inst->dev->dev);
-		pm_runtime_put_autosuspend(inst->dev->dev);
-		v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
-	}
 }
 
 static int wave5_vpu_dec_querycap(struct file *file, void *fh, struct v4l2_capability *cap)
@@ -1146,8 +1131,8 @@ static int write_to_ringbuffer(struct vpu_instance *inst, void *buffer, size_t b
 static int fill_ringbuffer(struct vpu_instance *inst)
 {
 	struct v4l2_m2m_ctx *m2m_ctx = inst->v4l2_fh.m2m_ctx;
-	struct v4l2_m2m_buffer *buf, *n;
-	int ret;
+	struct vpu_src_buffer *vpu_buf;
+	int ret = 0;
 
 	if (m2m_ctx->last_src_buf)  {
 		struct vpu_src_buffer *vpu_buf = wave5_to_vpu_src_buf(m2m_ctx->last_src_buf);
@@ -1158,9 +1143,8 @@ static int fill_ringbuffer(struct vpu_instance *inst)
 		}
 	}
 
-	v4l2_m2m_for_each_src_buf_safe(m2m_ctx, buf, n) {
-		struct vb2_v4l2_buffer *vbuf = &buf->vb;
-		struct vpu_src_buffer *vpu_buf = wave5_to_vpu_src_buf(vbuf);
+	list_for_each_entry(vpu_buf, &inst->avail_src_bufs, list) {
+		struct vb2_v4l2_buffer *vbuf = &vpu_buf->v4l2_m2m_buf.vb;
 		struct vpu_buf *ring_buffer = &inst->bitstream_vbuf;
 		size_t src_size = vb2_get_plane_payload(&vbuf->vb2_buf, 0);
 		void *src_buf = vb2_plane_vaddr(&vbuf->vb2_buf, 0);
@@ -1220,9 +1204,13 @@ static int fill_ringbuffer(struct vpu_instance *inst)
 			dev_dbg(inst->dev->dev, "last src buffer written to the ring buffer\n");
 			break;
 		}
+
+		inst->queuing_num++;
+		list_del_init(&vpu_buf->list);
+		break;
 	}
 
-	return 0;
+	return ret;
 }
 
 static void wave5_vpu_dec_buf_queue_src(struct vb2_buffer *vb)
@@ -1236,6 +1224,11 @@ static void wave5_vpu_dec_buf_queue_src(struct vb2_buffer *vb)
 	vbuf->sequence = inst->queued_src_buf_num++;
 
 	v4l2_m2m_buf_queue(m2m_ctx, vbuf);
+
+	INIT_LIST_HEAD(&vpu_buf->list);
+	mutex_lock(&inst->feed_lock);
+	list_add_tail(&vpu_buf->list, &inst->avail_src_bufs);
+	mutex_unlock(&inst->feed_lock);
 }
 
 static void wave5_vpu_dec_buf_queue_dst(struct vb2_buffer *vb)
@@ -1385,6 +1378,13 @@ static int streamoff_output(struct vb2_queue *q)
 	dma_addr_t new_rd_ptr;
 	struct dec_output_info dec_info;
 	unsigned int i;
+	struct vpu_src_buffer *vpu_buf, *tmp;
+
+	inst->retry = false;
+	inst->queuing_num = 0;
+
+	list_for_each_entry_safe(vpu_buf, tmp, &inst->avail_src_bufs, list)
+		list_del_init(&vpu_buf->list);
 
 	for (i = 0; i < v4l2_m2m_num_dst_bufs_ready(m2m_ctx); i++) {
 		ret = wave5_vpu_dec_set_disp_flag(inst, i);
@@ -1580,10 +1580,19 @@ static void wave5_vpu_dec_device_run(void *priv)
 
 	dev_dbg(inst->dev->dev, "%s: Fill the ring buffer with new bitstream data", __func__);
 	pm_runtime_resume_and_get(inst->dev->dev);
-	ret = fill_ringbuffer(inst);
-	if (ret) {
-		dev_warn(inst->dev->dev, "Filling ring buffer failed\n");
-		goto finish_job_and_return;
+	if (!inst->retry) {
+		mutex_lock(&inst->feed_lock);
+		ret = fill_ringbuffer(inst);
+		mutex_unlock(&inst->feed_lock);
+		if (ret < 0) {
+			dev_warn(inst->dev->dev, "Filling ring buffer failed\n");
+			goto finish_job_and_return;
+		} else if (!inst->eos &&
+				inst->queuing_num == 0 &&
+				inst->state == VPU_INST_STATE_PIC_RUN) {
+			dev_dbg(inst->dev->dev, "%s: no bitstream for feeding, so skip ", __func__);
+			goto finish_job_and_return;
+		}
 	}
 
 	switch (inst->state) {
@@ -1639,7 +1648,7 @@ static void wave5_vpu_dec_device_run(void *priv)
 		}
 
 		if (q_status.instance_queue_count) {
-			dev_dbg(inst->dev->dev, "%s: leave with active job", __func__);
+			v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
 			return;
 		}
 
@@ -1650,14 +1659,21 @@ static void wave5_vpu_dec_device_run(void *priv)
 			dev_err(inst->dev->dev,
 				"Frame decoding on m2m context (%p), fail: %d (result: %d)\n",
 				m2m_ctx, ret, fail_res);
-			break;
+			goto finish_job_and_return;
+		}
+
+		if (fail_res == WAVE5_SYSERR_QUEUEING_FAIL) {
+			inst->retry = true;
+		} else {
+			inst->retry = false;
+			if (!inst->eos)
+				inst->queuing_num--;
 		}
-		/* Return so that we leave this job active */
-		dev_dbg(inst->dev->dev, "%s: leave with active job", __func__);
-		return;
-	default:
-		WARN(1, "Execution of a job in state %s illegal.\n", state_to_str(inst->state));
 		break;
+	default:
+		dev_dbg(inst->dev->dev, "Execution of a job in state %s illegal.\n",
+			state_to_str(inst->state));
+
 	}
 
 finish_job_and_return:
@@ -1755,6 +1771,8 @@ static int wave5_vpu_open_dec(struct file *filp)
 	inst->ops = &wave5_vpu_dec_inst_ops;
 
 	spin_lock_init(&inst->state_spinlock);
+	mutex_init(&inst->feed_lock);
+	INIT_LIST_HEAD(&inst->avail_src_bufs);
 
 	inst->codec_info = kzalloc(sizeof(*inst->codec_info), GFP_KERNEL);
 	if (!inst->codec_info)
diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
index e5e879a13e8b..68d86625538f 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
@@ -255,6 +255,8 @@ int wave5_vpu_dec_close(struct vpu_instance *inst, u32 *fail_res)
 	if (inst_count == 1)
 		pm_runtime_dont_use_autosuspend(vpu_dev->dev);
 
+	mutex_destroy(&inst->feed_lock);
+
 unlock_and_return:
 	mutex_unlock(&vpu_dev->hw_lock);
 	pm_runtime_put_sync(inst->dev->dev);
diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
index f3c1ad6fb3be..fd0aef0bac4e 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
@@ -818,6 +818,9 @@ struct vpu_instance {
 	bool cbcr_interleave;
 	bool nv21;
 	bool eos;
+	bool retry; /* retry to feed bitstream if failure reason is WAVE5_SYSERR_QUEUEING_FAIL*/
+	int queuing_num; /* check if there is input buffer or not */
+	struct mutex feed_lock; /* lock for feeding bitstream buffers */
 	struct vpu_buf bitstream_vbuf;
 	dma_addr_t last_rd_ptr;
 	size_t remaining_consumed_bytes;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 3/7] media: chips-media: wave5: Fix not to be closed
  2025-05-22  7:25 [PATCH v2 0/7] Performance improvement of decoder Jackson.lee
  2025-05-22  7:26 ` [PATCH v2 1/7] media: chips-media: wave5: Fix Null reference while testing fluster Jackson.lee
  2025-05-22  7:26 ` [PATCH v2 2/7] media: chips-media: wave5: Improve performance of decoder Jackson.lee
@ 2025-05-22  7:26 ` Jackson.lee
  2025-05-22  7:26 ` [PATCH v2 4/7] media: chips-media: wave5: Use spinlock whenever statue is changed Jackson.lee
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 29+ messages in thread
From: Jackson.lee @ 2025-05-22  7:26 UTC (permalink / raw)
  To: mchehab, hverkuil-cisco, sebastian.fricke, nicolas.dufresne,
	bob.beckett, dafna.hirschfeld
  Cc: linux-media, linux-kernel, jackson.lee, lafley.kim, b-brnich,
	hverkuil, nas.chung

From: Jackson Lee <jackson.lee@chipsnmedia.com>

The current code was to wait interrupt if queue or report queue is not 0,
but since applying the performance patch, input and output is not any
more synchronized.
So even if queue count is not 0, an interrupt could not be triggered.

Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
---
 drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c   | 6 +++---
 drivers/media/platform/chips-media/wave5/wave5-vpuconfig.h | 1 +
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
index 995234a3a6d6..42981c3b49bc 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
@@ -1481,11 +1481,11 @@ static void wave5_vpu_dec_stop_streaming(struct vb2_queue *q)
 
 		wave5_vpu_dec_give_command(inst, DEC_GET_QUEUE_STATUS, &q_status);
 
-		if (q_status.report_queue_count == 0)
+		if ((inst->state == VPU_INST_STATE_STOP || q_status.instance_queue_count == 0) &&
+		    q_status.report_queue_count == 0)
 			break;
 
-		if (wave5_vpu_wait_interrupt(inst, VPU_DEC_TIMEOUT) < 0)
-			break;
+		wave5_vpu_wait_interrupt(inst, VPU_DEC_STOP_TIMEOUT);
 
 		if (wave5_vpu_dec_get_output_info(inst, &dec_output_info))
 			dev_dbg(inst->dev->dev, "there is no output info\n");
diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuconfig.h b/drivers/media/platform/chips-media/wave5/wave5-vpuconfig.h
index 1ea9f5f31499..4ebd48d5550e 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpuconfig.h
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpuconfig.h
@@ -59,6 +59,7 @@
 //  application specific configuration
 #define VPU_ENC_TIMEOUT                 60000
 #define VPU_DEC_TIMEOUT                 60000
+#define VPU_DEC_STOP_TIMEOUT            10
 
 // for WAVE encoder
 #define USE_SRC_PRP_AXI         0
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 4/7] media: chips-media: wave5: Use spinlock whenever statue is changed
  2025-05-22  7:25 [PATCH v2 0/7] Performance improvement of decoder Jackson.lee
                   ` (2 preceding siblings ...)
  2025-05-22  7:26 ` [PATCH v2 3/7] media: chips-media: wave5: Fix not to be closed Jackson.lee
@ 2025-05-22  7:26 ` Jackson.lee
  2025-05-23 17:41   ` Nicolas Dufresne
  2025-05-22  7:26 ` [PATCH v2 5/7] media: chips-media: wave5: Fix not to free resources normally when instance was destroyed Jackson.lee
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Jackson.lee @ 2025-05-22  7:26 UTC (permalink / raw)
  To: mchehab, hverkuil-cisco, sebastian.fricke, nicolas.dufresne,
	bob.beckett, dafna.hirschfeld
  Cc: linux-media, linux-kernel, jackson.lee, lafley.kim, b-brnich,
	hverkuil, nas.chung

From: Jackson Lee <jackson.lee@chipsnmedia.com>

The device_run and finish_decode is not any more synchronized,
so lock was needed in the device_run whenever state was changed.

Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
---
 drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
index 42981c3b49bc..719c5527eb7f 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
@@ -1577,6 +1577,7 @@ static void wave5_vpu_dec_device_run(void *priv)
 	struct queue_status_info q_status;
 	u32 fail_res = 0;
 	int ret = 0;
+	unsigned long flags;
 
 	dev_dbg(inst->dev->dev, "%s: Fill the ring buffer with new bitstream data", __func__);
 	pm_runtime_resume_and_get(inst->dev->dev);
@@ -1617,7 +1618,9 @@ static void wave5_vpu_dec_device_run(void *priv)
 			}
 			spin_unlock_irqrestore(&inst->state_spinlock, flags);
 		} else {
+			spin_lock_irqsave(&inst->state_spinlock, flags);
 			switch_state(inst, VPU_INST_STATE_INIT_SEQ);
+			spin_unlock_irqrestore(&inst->state_spinlock, flags);
 		}
 
 		break;
@@ -1628,8 +1631,9 @@ static void wave5_vpu_dec_device_run(void *priv)
 		 * we had a chance to switch, which leads to an invalid state
 		 * change.
 		 */
+		spin_lock_irqsave(&inst->state_spinlock, flags);
 		switch_state(inst, VPU_INST_STATE_PIC_RUN);
-
+		spin_unlock_irqrestore(&inst->state_spinlock, flags);
 		/*
 		 * During DRC, the picture decoding remains pending, so just leave the job
 		 * active until this decode operation completes.
@@ -1643,7 +1647,9 @@ static void wave5_vpu_dec_device_run(void *priv)
 		ret = wave5_prepare_fb(inst);
 		if (ret) {
 			dev_warn(inst->dev->dev, "Framebuffer preparation, fail: %d\n", ret);
+			spin_lock_irqsave(&inst->state_spinlock, flags);
 			switch_state(inst, VPU_INST_STATE_STOP);
+			spin_unlock_irqrestore(&inst->state_spinlock, flags);
 			break;
 		}
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 5/7] media: chips-media: wave5: Fix not to free resources normally when instance was destroyed
  2025-05-22  7:25 [PATCH v2 0/7] Performance improvement of decoder Jackson.lee
                   ` (3 preceding siblings ...)
  2025-05-22  7:26 ` [PATCH v2 4/7] media: chips-media: wave5: Use spinlock whenever statue is changed Jackson.lee
@ 2025-05-22  7:26 ` Jackson.lee
  2025-05-23 17:42   ` Nicolas Dufresne
  2025-05-22  7:26 ` [PATCH v2 6/7] media: chips-media: wave5: Reduce high CPU load Jackson.lee
  2025-05-22  7:26 ` [PATCH v2 7/7] media: chips-media: wave5: Fix SError of kernel panic when closed Jackson.lee
  6 siblings, 1 reply; 29+ messages in thread
From: Jackson.lee @ 2025-05-22  7:26 UTC (permalink / raw)
  To: mchehab, hverkuil-cisco, sebastian.fricke, nicolas.dufresne,
	bob.beckett, dafna.hirschfeld
  Cc: linux-media, linux-kernel, jackson.lee, lafley.kim, b-brnich,
	hverkuil, nas.chung

From: Jackson Lee <jackson.lee@chipsnmedia.com>

Since applying performance patch, there was a problem not to free
resources, the root cause was that timeout sometimes happened after
calling the wave5_vpu_dec_finish_seq() when application was closed
forcibly,so if failure reason is WAVE5_SYSERR_VPU_STILL_RUNNING,
the wave5_vpu_dec_get_output_info() should be called to flush videos
decoded before closed.

Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
---
 .../platform/chips-media/wave5/wave5-vpuapi.c | 20 +++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
index 68d86625538f..d7318d596b73 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
@@ -209,6 +209,7 @@ int wave5_vpu_dec_close(struct vpu_instance *inst, u32 *fail_res)
 	int i;
 	int inst_count = 0;
 	struct vpu_instance *inst_elm;
+	struct dec_output_info dec_info;
 
 	*fail_res = 0;
 	if (!inst->codec_info)
@@ -229,11 +230,26 @@ int wave5_vpu_dec_close(struct vpu_instance *inst, u32 *fail_res)
 			goto unlock_and_return;
 		}
 
-		if (*fail_res == WAVE5_SYSERR_VPU_STILL_RUNNING &&
-		    retry++ >= MAX_FIRMWARE_CALL_RETRY) {
+		if (ret == 0)
+			break;
+
+		if (*fail_res != WAVE5_SYSERR_VPU_STILL_RUNNING) {
+			dev_warn(inst->dev->dev, "dec_finish_seq timed out\n");
+			goto unlock_and_return;
+		}
+
+		if (retry++ >= MAX_FIRMWARE_CALL_RETRY) {
 			ret = -ETIMEDOUT;
 			goto unlock_and_return;
 		}
+
+		mutex_unlock(&vpu_dev->hw_lock);
+		wave5_vpu_dec_get_output_info(inst, &dec_info);
+		ret = mutex_lock_interruptible(&vpu_dev->hw_lock);
+		if (ret) {
+			pm_runtime_put_sync(inst->dev->dev);
+			return ret;
+		}
 	} while (ret != 0);
 
 	dev_dbg(inst->dev->dev, "%s: dec_finish_seq complete\n", __func__);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 6/7] media: chips-media: wave5: Reduce high CPU load
  2025-05-22  7:25 [PATCH v2 0/7] Performance improvement of decoder Jackson.lee
                   ` (4 preceding siblings ...)
  2025-05-22  7:26 ` [PATCH v2 5/7] media: chips-media: wave5: Fix not to free resources normally when instance was destroyed Jackson.lee
@ 2025-05-22  7:26 ` Jackson.lee
  2025-05-23 17:43   ` Nicolas Dufresne
  2025-05-22  7:26 ` [PATCH v2 7/7] media: chips-media: wave5: Fix SError of kernel panic when closed Jackson.lee
  6 siblings, 1 reply; 29+ messages in thread
From: Jackson.lee @ 2025-05-22  7:26 UTC (permalink / raw)
  To: mchehab, hverkuil-cisco, sebastian.fricke, nicolas.dufresne,
	bob.beckett, dafna.hirschfeld
  Cc: linux-media, linux-kernel, jackson.lee, lafley.kim, b-brnich,
	hverkuil, nas.chung

From: Jackson Lee <jackson.lee@chipsnmedia.com>

Since applying changes for performance improvement of decoder,
there was a problem related to high CPU load.
CPU load was more than 4 times when comparing CPU load.
The root cause was the device_run was called many times even if
there was no bitstream which should be queued.

Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
---
 .../media/platform/chips-media/wave5/wave5-vpu-dec.c | 12 +++++++++---
 .../media/platform/chips-media/wave5/wave5-vpuapi.h  |  1 +
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
index 719c5527eb7f..421a9e1a6f15 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
@@ -1280,10 +1280,13 @@ static void wave5_vpu_dec_buf_queue(struct vb2_buffer *vb)
 		__func__, vb->type, vb->index, vb2_plane_size(&vbuf->vb2_buf, 0),
 		vb2_plane_size(&vbuf->vb2_buf, 1), vb2_plane_size(&vbuf->vb2_buf, 2));
 
-	if (vb->type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE)
+	if (vb->type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE) {
+		if (inst->empty_queue)
+			inst->empty_queue = false;
 		wave5_vpu_dec_buf_queue_src(vb);
-	else if (vb->type == V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE)
+	} else if (vb->type == V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE) {
 		wave5_vpu_dec_buf_queue_dst(vb);
+	}
 }
 
 static int wave5_vpu_dec_allocate_ring_buffer(struct vpu_instance *inst)
@@ -1474,6 +1477,7 @@ static void wave5_vpu_dec_stop_streaming(struct vb2_queue *q)
 
 	dev_dbg(inst->dev->dev, "%s: type: %u\n", __func__, q->type);
 	pm_runtime_resume_and_get(inst->dev->dev);
+	inst->empty_queue = false;
 
 	while (check_cmd) {
 		struct queue_status_info q_status;
@@ -1592,6 +1596,7 @@ static void wave5_vpu_dec_device_run(void *priv)
 				inst->queuing_num == 0 &&
 				inst->state == VPU_INST_STATE_PIC_RUN) {
 			dev_dbg(inst->dev->dev, "%s: no bitstream for feeding, so skip ", __func__);
+			inst->empty_queue = true;
 			goto finish_job_and_return;
 		}
 	}
@@ -1737,7 +1742,8 @@ static int wave5_vpu_dec_job_ready(void *priv)
 				"No capture buffer ready to decode!\n");
 			break;
 		} else if (!wave5_is_draining_or_eos(inst) &&
-			   !v4l2_m2m_num_src_bufs_ready(m2m_ctx)) {
+			   (!v4l2_m2m_num_src_bufs_ready(m2m_ctx) ||
+			    inst->empty_queue)) {
 			dev_dbg(inst->dev->dev,
 				"No bitstream data to decode!\n");
 			break;
diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
index fd0aef0bac4e..f2596af08cdf 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
@@ -821,6 +821,7 @@ struct vpu_instance {
 	bool retry; /* retry to feed bitstream if failure reason is WAVE5_SYSERR_QUEUEING_FAIL*/
 	int queuing_num; /* check if there is input buffer or not */
 	struct mutex feed_lock; /* lock for feeding bitstream buffers */
+	bool empty_queue;
 	struct vpu_buf bitstream_vbuf;
 	dma_addr_t last_rd_ptr;
 	size_t remaining_consumed_bytes;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 7/7] media: chips-media: wave5: Fix SError of kernel panic when closed
  2025-05-22  7:25 [PATCH v2 0/7] Performance improvement of decoder Jackson.lee
                   ` (5 preceding siblings ...)
  2025-05-22  7:26 ` [PATCH v2 6/7] media: chips-media: wave5: Reduce high CPU load Jackson.lee
@ 2025-05-22  7:26 ` Jackson.lee
  2025-05-23 17:48   ` Nicolas Dufresne
  6 siblings, 1 reply; 29+ messages in thread
From: Jackson.lee @ 2025-05-22  7:26 UTC (permalink / raw)
  To: mchehab, hverkuil-cisco, sebastian.fricke, nicolas.dufresne,
	bob.beckett, dafna.hirschfeld
  Cc: linux-media, linux-kernel, jackson.lee, lafley.kim, b-brnich,
	hverkuil, nas.chung

From: Jackson Lee <jackson.lee@chipsnmedia.com>

Since applying "Reduce high CPU load" patch, SError of kernel panic rarely
happened while testing fluster.
The root cause was to enter suspend mode because timeout of autosuspend
delay happened.

[   48.834439] SError Interrupt on CPU0, code 0x00000000bf000000 -- SError
[   48.834455] CPU: 0 UID: 0 PID: 1067 Comm: v4l2h265dec0:sr Not tainted 6.12.9-gc9e21a1ebd75-dirty #7
[   48.834461] Hardware name: ti Texas Instruments J721S2 EVM/Texas Instruments J721S2 EVM, BIOS 2025.01-00345-gbaf3aaa8ecfa 01/01/2025
[   48.834464] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   48.834468] pc : wave5_dec_clr_disp_flag+0x40/0x80 [wave5]
[   48.834488] lr : wave5_dec_clr_disp_flag+0x40/0x80 [wave5]
[   48.834495] sp : ffff8000856e3a30
[   48.834497] x29: ffff8000856e3a30 x28: ffff0008093f6010 x27: ffff000809158130
[   48.834504] x26: 0000000000000000 x25: ffff00080b625000 x24: ffff000804a9ba80
[   48.834509] x23: ffff000802343028 x22: ffff000809158150 x21: ffff000802218000
[   48.834513] x20: ffff0008093f6000 x19: ffff0008093f6000 x18: 0000000000000000
[   48.834518] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffff74009618
[   48.834523] x14: 000000010000000c x13: 0000000000000000 x12: 0000000000000000
[   48.834527] x11: ffffffffffffffff x10: ffffffffffffffff x9 : ffff000802343028
[   48.834532] x8 : ffff00080b6252a0 x7 : 0000000000000038 x6 : 0000000000000000
[   48.834536] x5 : ffff00080b625060 x4 : 0000000000000000 x3 : 0000000000000000
[   48.834541] x2 : 0000000000000000 x1 : ffff800084bf0118 x0 : ffff800084bf0000
[   48.834547] Kernel panic - not syncing: Asynchronous SError Interrupt
[   48.834549] CPU: 0 UID: 0 PID: 1067 Comm: v4l2h265dec0:sr Not tainted 6.12.9-gc9e21a1ebd75-dirty #7
[   48.834554] Hardware name: ti Texas Instruments J721S2 EVM/Texas Instruments J721S2 EVM, BIOS 2025.01-00345-gbaf3aaa8ecfa 01/01/2025
[   48.834556] Call trace:
[   48.834559]  dump_backtrace+0x94/0xec
[   48.834574]  show_stack+0x18/0x24
[   48.834579]  dump_stack_lvl+0x38/0x90
[   48.834585]  dump_stack+0x18/0x24
[   48.834588]  panic+0x35c/0x3e0
[   48.834592]  nmi_panic+0x40/0x8c
[   48.834595]  arm64_serror_panic+0x64/0x70
[   48.834598]  do_serror+0x3c/0x78
[   48.834601]  el1h_64_error_handler+0x34/0x4c
[   48.834605]  el1h_64_error+0x64/0x68
[   48.834608]  wave5_dec_clr_disp_flag+0x40/0x80 [wave5]
[   48.834615]  wave5_vpu_dec_clr_disp_flag+0x54/0x80 [wave5]
[   48.834622]  wave5_vpu_dec_buf_queue+0x19c/0x1a0 [wave5]
[   48.834628]  __enqueue_in_driver+0x3c/0x74 [videobuf2_common]
[   48.834639]  vb2_core_qbuf+0x508/0x61c [videobuf2_common]
[   48.834646]  vb2_qbuf+0xa4/0x168 [videobuf2_v4l2]
[   48.834656]  v4l2_m2m_qbuf+0x80/0x238 [v4l2_mem2mem]
[   48.834666]  v4l2_m2m_ioctl_qbuf+0x18/0x24 [v4l2_mem2mem]
[   48.834673]  v4l_qbuf+0x48/0x5c [videodev]
[   48.834704]  __video_do_ioctl+0x180/0x3f0 [videodev]
[   48.834725]  video_usercopy+0x2ec/0x68c [videodev]
[   48.834745]  video_ioctl2+0x18/0x24 [videodev]
[   48.834766]  v4l2_ioctl+0x40/0x60 [videodev]
[   48.834786]  __arm64_sys_ioctl+0xa8/0xec
[   48.834793]  invoke_syscall+0x44/0x100
[   48.834800]  el0_svc_common.constprop.0+0xc0/0xe0
[   48.834804]  do_el0_svc+0x1c/0x28
[   48.834809]  el0_svc+0x30/0xd0
[   48.834813]  el0t_64_sync_handler+0xc0/0xc4
[   48.834816]  el0t_64_sync+0x190/0x194
[   48.834820] SMP: stopping secondary CPUs
[   48.834831] Kernel Offset: disabled
[   48.834833] CPU features: 0x08,00002002,80200000,4200421b
[   48.834837] Memory Limit: none
[   49.161404] ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---

Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
---
 .../platform/chips-media/wave5/wave5-vpu-dec.c   |  3 ---
 .../platform/chips-media/wave5/wave5-vpu-enc.c   |  3 ---
 .../media/platform/chips-media/wave5/wave5-vpu.c |  2 +-
 .../platform/chips-media/wave5/wave5-vpuapi.c    | 16 ----------------
 4 files changed, 1 insertion(+), 23 deletions(-)

diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
index 421a9e1a6f15..b4b522d7fa84 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
@@ -1865,9 +1865,6 @@ static int wave5_vpu_open_dec(struct file *filp)
 	if (ret)
 		goto cleanup_inst;
 
-	if (list_empty(&dev->instances))
-		pm_runtime_use_autosuspend(inst->dev->dev);
-
 	list_add_tail(&inst->list, &dev->instances);
 
 	mutex_unlock(&dev->dev_lock);
diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c b/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
index 52a1a00fd9bb..7f1aa392805f 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
@@ -1779,9 +1779,6 @@ static int wave5_vpu_open_enc(struct file *filp)
 	if (ret)
 		goto cleanup_inst;
 
-	if (list_empty(&dev->instances))
-		pm_runtime_use_autosuspend(inst->dev->dev);
-
 	list_add_tail(&inst->list, &dev->instances);
 
 	mutex_unlock(&dev->dev_lock);
diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu.c b/drivers/media/platform/chips-media/wave5/wave5-vpu.c
index a2c09523d76b..24a9001966e7 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpu.c
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpu.c
@@ -368,7 +368,7 @@ static int wave5_vpu_probe(struct platform_device *pdev)
 	dev_info(&pdev->dev, "Product Code:      0x%x\n", dev->product_code);
 	dev_info(&pdev->dev, "Firmware Revision: %u\n", fw_revision);
 
-	pm_runtime_set_autosuspend_delay(&pdev->dev, 100);
+	pm_runtime_set_autosuspend_delay(&pdev->dev, 500);
 	pm_runtime_use_autosuspend(&pdev->dev);
 	pm_runtime_enable(&pdev->dev);
 	wave5_vpu_sleep_wake(&pdev->dev, true, NULL, 0);
diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
index d7318d596b73..1f7f4d214b3c 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
@@ -207,8 +207,6 @@ int wave5_vpu_dec_close(struct vpu_instance *inst, u32 *fail_res)
 	int retry = 0;
 	struct vpu_device *vpu_dev = inst->dev;
 	int i;
-	int inst_count = 0;
-	struct vpu_instance *inst_elm;
 	struct dec_output_info dec_info;
 
 	*fail_res = 0;
@@ -265,12 +263,6 @@ int wave5_vpu_dec_close(struct vpu_instance *inst, u32 *fail_res)
 	}
 
 	wave5_vdi_free_dma_memory(vpu_dev, &p_dec_info->vb_task);
-
-	list_for_each_entry(inst_elm, &vpu_dev->instances, list)
-		inst_count++;
-	if (inst_count == 1)
-		pm_runtime_dont_use_autosuspend(vpu_dev->dev);
-
 	mutex_destroy(&inst->feed_lock);
 
 unlock_and_return:
@@ -738,8 +730,6 @@ int wave5_vpu_enc_close(struct vpu_instance *inst, u32 *fail_res)
 	int ret;
 	int retry = 0;
 	struct vpu_device *vpu_dev = inst->dev;
-	int inst_count = 0;
-	struct vpu_instance *inst_elm;
 
 	*fail_res = 0;
 	if (!inst->codec_info)
@@ -782,12 +772,6 @@ int wave5_vpu_enc_close(struct vpu_instance *inst, u32 *fail_res)
 	}
 
 	wave5_vdi_free_dma_memory(vpu_dev, &p_enc_info->vb_task);
-
-	list_for_each_entry(inst_elm, &vpu_dev->instances, list)
-		inst_count++;
-	if (inst_count == 1)
-		pm_runtime_dont_use_autosuspend(vpu_dev->dev);
-
 	mutex_unlock(&vpu_dev->hw_lock);
 	pm_runtime_put_sync(inst->dev->dev);
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/7] media: chips-media: wave5: Fix Null reference while testing fluster
  2025-05-22  7:26 ` [PATCH v2 1/7] media: chips-media: wave5: Fix Null reference while testing fluster Jackson.lee
@ 2025-05-23 17:20   ` Nicolas Dufresne
  2025-05-27  4:05     ` jackson.lee
  0 siblings, 1 reply; 29+ messages in thread
From: Nicolas Dufresne @ 2025-05-23 17:20 UTC (permalink / raw)
  To: Jackson.lee, mchehab, hverkuil-cisco, sebastian.fricke,
	bob.beckett, dafna.hirschfeld
  Cc: linux-media, linux-kernel, lafley.kim, b-brnich, hverkuil,
	nas.chung

Hi,

Le jeudi 22 mai 2025 à 16:26 +0900, Jackson.lee a écrit :
> From: Jackson Lee <jackson.lee@chipsnmedia.com>
> 
> When multi instances are created/destroyed, many interrupts happens
> or structures for decoder are removed.
> "struct vpu_instance" this structure is shared for all flow in decoder,
> so if the structure is not protected by lock, Null reference exception
> could happens sometimes.
> IRQ Handler was spilt to two phases and Lock was added as well.
> 
> Fixes: 9707a6254a8a ("media: chips-media: wave5: Add the v4l2 layer")
> Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
> Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
> ---
>  .../platform/chips-media/wave5/wave5-helper.c | 10 ++-
>  .../chips-media/wave5/wave5-vpu-dec.c         |  5 ++
>  .../chips-media/wave5/wave5-vpu-enc.c         |  5 ++
>  .../platform/chips-media/wave5/wave5-vpu.c    | 69 ++++++++++++++++---
>  .../platform/chips-media/wave5/wave5-vpuapi.h |  6 ++
>  5 files changed, 86 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-helper.c b/drivers/media/platform/chips-media/wave5/wave5-
> helper.c
> index 2c9d8cbca6e4..5d9969bb7ada 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-helper.c
> +++ b/drivers/media/platform/chips-media/wave5/wave5-helper.c
> @@ -49,7 +49,7 @@ void wave5_cleanup_instance(struct vpu_instance *inst)
>  		v4l2_fh_del(&inst->v4l2_fh);
>  		v4l2_fh_exit(&inst->v4l2_fh);
>  	}
> -	list_del_init(&inst->list);
> +	kfifo_free(&inst->irq_status);
>  	ida_free(&inst->dev->inst_ida, inst->id);
>  	kfree(inst->codec_info);
>  	kfree(inst);
> @@ -61,8 +61,16 @@ int wave5_vpu_release_device(struct file *filp,
>  {
>  	struct vpu_instance *inst = wave5_to_vpu_inst(filp->private_data);
>  	int ret = 0;
> +	unsigned long flags;
>  
>  	v4l2_m2m_ctx_release(inst->v4l2_fh.m2m_ctx);
> +	ret = mutex_lock_interruptible(&inst->dev->irq_lock);
> +	if (ret)
> +		return ret;

This is quite some heavy locking, why do you need both the mutex and
the spinlock ?

> +	spin_lock_irqsave(&inst->dev->irq_spinlock, flags);
> +	list_del_init(&inst->list);
> +	spin_unlock_irqrestore(&inst->dev->irq_spinlock, flags);
> +	mutex_unlock(&inst->dev->irq_lock);
>  	if (inst->state != VPU_INST_STATE_NONE) {
>  		u32 fail_res;
>  
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c b/drivers/media/platform/chips-
> media/wave5/wave5-vpu-dec.c
> index fd71f0c43ac3..32de43de1870 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> @@ -1811,6 +1811,11 @@ static int wave5_vpu_open_dec(struct file *filp)
>  	inst->xfer_func = V4L2_XFER_FUNC_DEFAULT;
>  
>  	init_completion(&inst->irq_done);
> +	ret = kfifo_alloc(&inst->irq_status, 16 * sizeof(int), GFP_KERNEL);
> +	if (ret) {
> +		dev_err(inst->dev->dev, "failed to allocate fifo\n");
> +		goto cleanup_inst;
> +	}
>  
>  	inst->id = ida_alloc(&inst->dev->inst_ida, GFP_KERNEL);
>  	if (inst->id < 0) {
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c b/drivers/media/platform/chips-
> media/wave5/wave5-vpu-enc.c
> index 1e5fc5f8b856..52a1a00fd9bb 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
> @@ -1760,6 +1760,11 @@ static int wave5_vpu_open_enc(struct file *filp)
>  	inst->frame_rate = 30;
>  
>  	init_completion(&inst->irq_done);
> +	ret = kfifo_alloc(&inst->irq_status, 16 * sizeof(int), GFP_KERNEL);
> +	if (ret) {
> +		dev_err(inst->dev->dev, "failed to allocate fifo\n");
> +		goto cleanup_inst;
> +	}
>  
>  	inst->id = ida_alloc(&inst->dev->inst_ida, GFP_KERNEL);
>  	if (inst->id < 0) {
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu.c b/drivers/media/platform/chips-media/wave5/wave5-
> vpu.c
> index e1715d3f43b0..a2c09523d76b 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-vpu.c
> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu.c
> @@ -51,8 +51,11 @@ static void wave5_vpu_handle_irq(void *dev_id)
>  	u32 seq_done;
>  	u32 cmd_done;
>  	u32 irq_reason;
> -	struct vpu_instance *inst;
> +	u32 irq_subreason;
> +	struct vpu_instance *inst, *tmp;
>  	struct vpu_device *dev = dev_id;
> +	int val;
> +	unsigned long flags;
>  
>  	irq_reason = wave5_vdi_read_register(dev, W5_VPU_VINT_REASON);
>  	seq_done = wave5_vdi_read_register(dev, W5_RET_SEQ_DONE_INSTANCE_INFO);
> @@ -60,7 +63,8 @@ static void wave5_vpu_handle_irq(void *dev_id)
>  	wave5_vdi_write_register(dev, W5_VPU_VINT_REASON_CLR, irq_reason);
>  	wave5_vdi_write_register(dev, W5_VPU_VINT_CLEAR, 0x1);
>  
> -	list_for_each_entry(inst, &dev->instances, list) {
> +	spin_lock_irqsave(&dev->irq_spinlock, flags);
> +	list_for_each_entry_safe(inst, tmp, &dev->instances, list) {
>  
>  		if (irq_reason & BIT(INT_WAVE5_INIT_SEQ) ||
>  		    irq_reason & BIT(INT_WAVE5_ENC_SET_PARAM)) {
> @@ -82,14 +86,22 @@ static void wave5_vpu_handle_irq(void *dev_id)
>  		    irq_reason & BIT(INT_WAVE5_ENC_PIC)) {
>  			if (cmd_done & BIT(inst->id)) {
>  				cmd_done &= ~BIT(inst->id);
> -				wave5_vdi_write_register(dev, W5_RET_QUEUE_CMD_DONE_INST,
> -							 cmd_done);
> -				inst->ops->finish_process(inst);
> +				if (dev->irq >= 0) {
> +					irq_subreason =
> +						wave5_vdi_read_register(dev, W5_VPU_VINT_REASON);
> +					if (!(irq_subreason & BIT(INT_WAVE5_DEC_PIC)))
> +						wave5_vdi_write_register(dev,
> +									 W5_RET_QUEUE_CMD_DONE_INST,
> +									 cmd_done);
> +				}
> +				val = BIT(INT_WAVE5_DEC_PIC);
> +				kfifo_in(&inst->irq_status, &val, sizeof(int));
>  			}
>  		}
> -
> -		wave5_vpu_clear_interrupt(inst, irq_reason);
>  	}
> +	spin_unlock_irqrestore(&dev->irq_spinlock, flags);
> +
> +	up(&dev->irq_sem);
>  }
>  
>  static irqreturn_t wave5_vpu_irq_thread(int irq, void *dev_id)
> @@ -121,6 +133,35 @@ static enum hrtimer_restart wave5_vpu_timer_callback(struct hrtimer *timer)
>  	return HRTIMER_RESTART;
>  }
>  
> +static int irq_thread(void *data)
> +{
> +	struct vpu_device *dev = (struct vpu_device *)data;
> +	struct vpu_instance *inst, *tmp;
> +	int irq_status, ret;
> +
> +	while (!kthread_should_stop()) {
> +		if (down_interruptible(&dev->irq_sem))
> +			continue;
> +
> +		if (kthread_should_stop())
> +			break;
> +
> +		mutex_lock(&dev->irq_lock);
> +		list_for_each_entry_safe(inst, tmp, &dev->instances, list) {
> +			while (kfifo_len(&inst->irq_status)) {
> +				ret = kfifo_out(&inst->irq_status, &irq_status, sizeof(int));
> +				if (!ret)
> +					break;
> +
> +				inst->ops->finish_process(inst);
> +			}
> +		}
> +		mutex_unlock(&dev->irq_lock);
> +	}
> +
> +	return 0;
> +}
> +
>  static int wave5_vpu_load_firmware(struct device *dev, const char *fw_name,
>  				   u32 *revision)
>  {
> @@ -224,6 +265,8 @@ static int wave5_vpu_probe(struct platform_device *pdev)
>  
>  	mutex_init(&dev->dev_lock);
>  	mutex_init(&dev->hw_lock);
> +	mutex_init(&dev->irq_lock);
> +	spin_lock_init(&dev->irq_spinlock);
>  	dev_set_drvdata(&pdev->dev, dev);
>  	dev->dev = &pdev->dev;
>  
> @@ -266,6 +309,10 @@ static int wave5_vpu_probe(struct platform_device *pdev)
>  	}
>  	dev->product = wave5_vpu_get_product_id(dev);
>  
> +	sema_init(&dev->irq_sem, 1);
> +	INIT_LIST_HEAD(&dev->instances);
> +	dev->irq_thread = kthread_run(irq_thread, dev, "irq thread");
> +
>  	dev->irq = platform_get_irq(pdev, 0);
>  	if (dev->irq < 0) {
>  		dev_err(&pdev->dev, "failed to get irq resource, falling back to polling\n");
> @@ -288,7 +335,6 @@ static int wave5_vpu_probe(struct platform_device *pdev)
>  		}
>  	}
>  
> -	INIT_LIST_HEAD(&dev->instances);
>  	ret = v4l2_device_register(&pdev->dev, &dev->v4l2_dev);
>  	if (ret) {
>  		dev_err(&pdev->dev, "v4l2_device_register, fail: %d\n", ret);
> @@ -351,6 +397,12 @@ static void wave5_vpu_remove(struct platform_device *pdev)
>  {
>  	struct vpu_device *dev = dev_get_drvdata(&pdev->dev);
>  
> +	if (dev->irq_thread) {
> +		kthread_stop(dev->irq_thread);
> +		up(&dev->irq_sem);
> +		dev->irq_thread = NULL;
> +	}
> +
>  	if (dev->irq < 0) {
>  		kthread_destroy_worker(dev->worker);
>  		hrtimer_cancel(&dev->hrtimer);
> @@ -361,6 +413,7 @@ static void wave5_vpu_remove(struct platform_device *pdev)
>  
>  	mutex_destroy(&dev->dev_lock);
>  	mutex_destroy(&dev->hw_lock);
> +	mutex_destroy(&dev->irq_lock);
>  	reset_control_assert(dev->resets);
>  	clk_bulk_disable_unprepare(dev->num_clks, dev->clks);
>  	wave5_vpu_enc_unregister_device(dev);
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h b/drivers/media/platform/chips-media/wave5/wave5-
> vpuapi.h
> index 45615c15beca..f3c1ad6fb3be 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> @@ -8,6 +8,7 @@
>  #ifndef VPUAPI_H_INCLUDED
>  #define VPUAPI_H_INCLUDED
>  
> +#include <linux/kfifo.h>
>  #include <linux/idr.h>
>  #include <linux/genalloc.h>
>  #include <media/v4l2-device.h>
> @@ -747,6 +748,7 @@ struct vpu_device {
>  	struct video_device *video_dev_enc;
>  	struct mutex dev_lock; /* lock for the src, dst v4l2 queues */
>  	struct mutex hw_lock; /* lock hw configurations */
> +	struct mutex irq_lock;
>  	int irq;
>  	enum product_id product;
>  	struct vpu_attr attr;
> @@ -764,7 +766,10 @@ struct vpu_device {
>  	struct kthread_worker *worker;
>  	int vpu_poll_interval;
>  	int num_clks;
> +	struct task_struct *irq_thread;
> +	struct semaphore irq_sem;

Can you comment on what they actually protect ?

>  	struct reset_control *resets;
> +	spinlock_t irq_spinlock; /* protect instances list */
>  };
>  
>  struct vpu_instance;
> @@ -788,6 +793,7 @@ struct vpu_instance {
>  	enum v4l2_ycbcr_encoding ycbcr_enc;
>  	enum v4l2_quantization quantization;
>  
> +	struct kfifo irq_status;
>  	enum vpu_instance_state state;
>  	enum vpu_instance_type type;
>  	const struct vpu_instance_ops *ops;

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 2/7] media: chips-media: wave5: Improve performance of decoder
  2025-05-22  7:26 ` [PATCH v2 2/7] media: chips-media: wave5: Improve performance of decoder Jackson.lee
@ 2025-05-23 17:39   ` Nicolas Dufresne
  2025-05-27  4:58     ` jackson.lee
  0 siblings, 1 reply; 29+ messages in thread
From: Nicolas Dufresne @ 2025-05-23 17:39 UTC (permalink / raw)
  To: Jackson.lee, mchehab, hverkuil-cisco, sebastian.fricke,
	bob.beckett, dafna.hirschfeld
  Cc: linux-media, linux-kernel, lafley.kim, b-brnich, hverkuil,
	nas.chung

Hi,

Le jeudi 22 mai 2025 à 16:26 +0900, Jackson.lee a écrit :
> From: Jackson Lee <jackson.lee@chipsnmedia.com>
> 
> The current decoding method  was to wait until each frame was
> decoded after feeding a bitstream. As a result, performance was low
> and Wave5 could not achieve max pixel processing rate.
> 
> Update driver to use an asynchronous approach for decoding and feeding a
> bitstream in order to achieve full capabilities of the device.
> 
> WAVE5 supports command-queueing to maximize performance by pipelining
> internal commands and by hiding wait cycle taken to receive a command
> from Host processor.
> 
> Instead of waiting for each command to be executed before sending the
> next command, Host processor just places all the commands in the
> command-queue and goes on doing other things while the commands in the
> queue are processed by VPU.
> 
> While Host processor handles its own tasks, it can receive VPU interrupt
> request (IRQ).
> In this case, host processor can simply exit interrupt service routine
> (ISR) without accessing to host interface to read the result of the
> command reported by VPU.
> After host processor completed its tasks, host processor can read the
> command result when host processor needs the reports and does
> response processing.
> 
> To archive this goal, the device_run() calls v4l2_m2m_job_finish
> so that next command can be sent to VPU continuously, if there is
> any result, then irq is triggered and gets decoded frames and returns
> them to upper layer.
> Theses processes work independently each other without waiting
> a decoded frame.
> 
> Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
> Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
> ---
>  .../platform/chips-media/wave5/wave5-hw.c     |  2 +-
>  .../chips-media/wave5/wave5-vpu-dec.c         | 84 +++++++++++--------
>  .../platform/chips-media/wave5/wave5-vpuapi.c |  2 +
>  .../platform/chips-media/wave5/wave5-vpuapi.h |  3 +
>  4 files changed, 57 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-hw.c b/drivers/media/platform/chips-media/wave5/wave5-hw.c
> index d94cf84c3ee5..687ce6ccf3ae 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-hw.c
> +++ b/drivers/media/platform/chips-media/wave5/wave5-hw.c
> @@ -102,7 +102,7 @@ static void _wave5_print_reg_err(struct vpu_device *vpu_dev, u32 reg_fail_reason
>  		dev_dbg(dev, "%s: queueing failure: 0x%x\n", func, reg_val);
>  		break;
>  	case WAVE5_SYSERR_RESULT_NOT_READY:
> -		dev_err(dev, "%s: result not ready: 0x%x\n", func, reg_fail_reason);
> +		dev_dbg(dev, "%s: result not ready: 0x%x\n", func, reg_fail_reason);
>  		break;
>  	case WAVE5_SYSERR_ACCESS_VIOLATION_HW:
>  		dev_err(dev, "%s: access violation: 0x%x\n", func, reg_fail_reason);
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c b/drivers/media/platform/chips-
> media/wave5/wave5-vpu-dec.c
> index 32de43de1870..995234a3a6d6 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> @@ -347,13 +347,12 @@ static void wave5_vpu_dec_finish_decode(struct vpu_instance *inst)
>  	struct vb2_v4l2_buffer *dec_buf = NULL;
>  	struct vb2_v4l2_buffer *disp_buf = NULL;
>  	struct vb2_queue *dst_vq = v4l2_m2m_get_dst_vq(m2m_ctx);
> -	struct queue_status_info q_status;
>  
>  	dev_dbg(inst->dev->dev, "%s: Fetch output info from firmware.", __func__);
>  
>  	ret = wave5_vpu_dec_get_output_info(inst, &dec_info);
>  	if (ret) {
> -		dev_warn(inst->dev->dev, "%s: could not get output info.", __func__);
> +		dev_dbg(inst->dev->dev, "%s: could not get output info.", __func__);

Wouldn't it be better to check the return value to possibly differentiate some errors
from something similar to EGAIN?

>  		v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
>  		return;
>  	}
> @@ -441,20 +440,6 @@ static void wave5_vpu_dec_finish_decode(struct vpu_instance *inst)
>  		}
>  		spin_unlock_irqrestore(&inst->state_spinlock, flags);
>  	}
> -
> -	/*
> -	 * During a resolution change and while draining, the firmware may flush
> -	 * the reorder queue regardless of having a matching decoding operation
> -	 * pending. Only terminate the job if there are no more IRQ coming.
> -	 */
> -	wave5_vpu_dec_give_command(inst, DEC_GET_QUEUE_STATUS, &q_status);
> -	if (q_status.report_queue_count == 0 &&
> -	    (q_status.instance_queue_count == 0 || dec_info.sequence_changed)) {
> -		dev_dbg(inst->dev->dev, "%s: finishing job.\n", __func__);
> -		pm_runtime_mark_last_busy(inst->dev->dev);
> -		pm_runtime_put_autosuspend(inst->dev->dev);
> -		v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
> -	}
>  }
>  
>  static int wave5_vpu_dec_querycap(struct file *file, void *fh, struct v4l2_capability *cap)
> @@ -1146,8 +1131,8 @@ static int write_to_ringbuffer(struct vpu_instance *inst, void *buffer, size_t b
>  static int fill_ringbuffer(struct vpu_instance *inst)
>  {
>  	struct v4l2_m2m_ctx *m2m_ctx = inst->v4l2_fh.m2m_ctx;
> -	struct v4l2_m2m_buffer *buf, *n;
> -	int ret;
> +	struct vpu_src_buffer *vpu_buf;
> +	int ret = 0;
>  
>  	if (m2m_ctx->last_src_buf)  {
>  		struct vpu_src_buffer *vpu_buf = wave5_to_vpu_src_buf(m2m_ctx->last_src_buf);
> @@ -1158,9 +1143,8 @@ static int fill_ringbuffer(struct vpu_instance *inst)
>  		}
>  	}
>  
> -	v4l2_m2m_for_each_src_buf_safe(m2m_ctx, buf, n) {
> -		struct vb2_v4l2_buffer *vbuf = &buf->vb;
> -		struct vpu_src_buffer *vpu_buf = wave5_to_vpu_src_buf(vbuf);
> +	list_for_each_entry(vpu_buf, &inst->avail_src_bufs, list) {
> +		struct vb2_v4l2_buffer *vbuf = &vpu_buf->v4l2_m2m_buf.vb;
>  		struct vpu_buf *ring_buffer = &inst->bitstream_vbuf;
>  		size_t src_size = vb2_get_plane_payload(&vbuf->vb2_buf, 0);
>  		void *src_buf = vb2_plane_vaddr(&vbuf->vb2_buf, 0);
> @@ -1220,9 +1204,13 @@ static int fill_ringbuffer(struct vpu_instance *inst)
>  			dev_dbg(inst->dev->dev, "last src buffer written to the ring buffer\n");
>  			break;
>  		}
> +
> +		inst->queuing_num++;
> +		list_del_init(&vpu_buf->list);
> +		break;
>  	}
>  
> -	return 0;
> +	return ret;
>  }
>  
>  static void wave5_vpu_dec_buf_queue_src(struct vb2_buffer *vb)
> @@ -1236,6 +1224,11 @@ static void wave5_vpu_dec_buf_queue_src(struct vb2_buffer *vb)
>  	vbuf->sequence = inst->queued_src_buf_num++;
>  
>  	v4l2_m2m_buf_queue(m2m_ctx, vbuf);
> +
> +	INIT_LIST_HEAD(&vpu_buf->list);
> +	mutex_lock(&inst->feed_lock);
> +	list_add_tail(&vpu_buf->list, &inst->avail_src_bufs);
> +	mutex_unlock(&inst->feed_lock);
>  }
>  
>  static void wave5_vpu_dec_buf_queue_dst(struct vb2_buffer *vb)
> @@ -1385,6 +1378,13 @@ static int streamoff_output(struct vb2_queue *q)
>  	dma_addr_t new_rd_ptr;
>  	struct dec_output_info dec_info;
>  	unsigned int i;
> +	struct vpu_src_buffer *vpu_buf, *tmp;
> +
> +	inst->retry = false;
> +	inst->queuing_num = 0;
> +
> +	list_for_each_entry_safe(vpu_buf, tmp, &inst->avail_src_bufs, list)
> +		list_del_init(&vpu_buf->list);
>  
>  	for (i = 0; i < v4l2_m2m_num_dst_bufs_ready(m2m_ctx); i++) {
>  		ret = wave5_vpu_dec_set_disp_flag(inst, i);
> @@ -1580,10 +1580,19 @@ static void wave5_vpu_dec_device_run(void *priv)
>  
>  	dev_dbg(inst->dev->dev, "%s: Fill the ring buffer with new bitstream data", __func__);
>  	pm_runtime_resume_and_get(inst->dev->dev);
> -	ret = fill_ringbuffer(inst);
> -	if (ret) {
> -		dev_warn(inst->dev->dev, "Filling ring buffer failed\n");
> -		goto finish_job_and_return;
> +	if (!inst->retry) {
> +		mutex_lock(&inst->feed_lock);
> +		ret = fill_ringbuffer(inst);
> +		mutex_unlock(&inst->feed_lock);
> +		if (ret < 0) {
> +			dev_warn(inst->dev->dev, "Filling ring buffer failed\n");
> +			goto finish_job_and_return;
> +		} else if (!inst->eos &&
> +				inst->queuing_num == 0 &&
> +				inst->state == VPU_INST_STATE_PIC_RUN) {
> +			dev_dbg(inst->dev->dev, "%s: no bitstream for feeding, so skip ", __func__);
> +			goto finish_job_and_return;
> +		}
>  	}
>  
>  	switch (inst->state) {
> @@ -1639,7 +1648,7 @@ static void wave5_vpu_dec_device_run(void *priv)
>  		}
>  
>  		if (q_status.instance_queue_count) {
> -			dev_dbg(inst->dev->dev, "%s: leave with active job", __func__);
> +			v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
>  			return;
>  		}
>  
> @@ -1650,14 +1659,21 @@ static void wave5_vpu_dec_device_run(void *priv)
>  			dev_err(inst->dev->dev,
>  				"Frame decoding on m2m context (%p), fail: %d (result: %d)\n",
>  				m2m_ctx, ret, fail_res);
> -			break;
> +			goto finish_job_and_return;
> +		}
> +
> +		if (fail_res == WAVE5_SYSERR_QUEUEING_FAIL) {
> +			inst->retry = true;
> +		} else {
> +			inst->retry = false;
> +			if (!inst->eos)
> +				inst->queuing_num--;

I looked into the original state machine violation you had in previous version, and I got
the impression that the reason you did hit that was that you actually call device_run
passed inst->eos. Its probably not that simple in practice, but I think you forgot to adapt
the job_ready() ops to prevent more device_run() called passed CMD_STOP and having all
pending buffer written in the ring buffer.

As a side effect, you endup calling device_run() in a race with the finish() setting
the state to STOP. I really think there is a way to use inst->eos boolean to prevent
that race in the first place. Might need to be combined with checking if you have buffers
prior to command stop that did not yet fit into the ring buffer.

>  		}
> -		/* Return so that we leave this job active */
> -		dev_dbg(inst->dev->dev, "%s: leave with active job", __func__);
> -		return;
> -	default:
> -		WARN(1, "Execution of a job in state %s illegal.\n", state_to_str(inst->state));
>  		break;
> +	default:
> +		dev_dbg(inst->dev->dev, "Execution of a job in state %s illegal.\n",
> +			state_to_str(inst->state));
> +
>  	}
>  
>  finish_job_and_return:
> @@ -1755,6 +1771,8 @@ static int wave5_vpu_open_dec(struct file *filp)
>  	inst->ops = &wave5_vpu_dec_inst_ops;
>  
>  	spin_lock_init(&inst->state_spinlock);
> +	mutex_init(&inst->feed_lock);
> +	INIT_LIST_HEAD(&inst->avail_src_bufs);
>  
>  	inst->codec_info = kzalloc(sizeof(*inst->codec_info), GFP_KERNEL);
>  	if (!inst->codec_info)
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c b/drivers/media/platform/chips-media/wave5/wave5-
> vpuapi.c
> index e5e879a13e8b..68d86625538f 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> @@ -255,6 +255,8 @@ int wave5_vpu_dec_close(struct vpu_instance *inst, u32 *fail_res)
>  	if (inst_count == 1)
>  		pm_runtime_dont_use_autosuspend(vpu_dev->dev);
>  
> +	mutex_destroy(&inst->feed_lock);
> +
>  unlock_and_return:
>  	mutex_unlock(&vpu_dev->hw_lock);
>  	pm_runtime_put_sync(inst->dev->dev);
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h b/drivers/media/platform/chips-media/wave5/wave5-
> vpuapi.h
> index f3c1ad6fb3be..fd0aef0bac4e 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> @@ -818,6 +818,9 @@ struct vpu_instance {
>  	bool cbcr_interleave;
>  	bool nv21;
>  	bool eos;
> +	bool retry; /* retry to feed bitstream if failure reason is WAVE5_SYSERR_QUEUEING_FAIL*/
> +	int queuing_num; /* check if there is input buffer or not */

This is described as a boolean, but is implemented as a counter. What does it count exactly ?
I think it needs a better name too.

Nicolas

> +	struct mutex feed_lock; /* lock for feeding bitstream buffers */
>  	struct vpu_buf bitstream_vbuf;
>  	dma_addr_t last_rd_ptr;
>  	size_t remaining_consumed_bytes;

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 4/7] media: chips-media: wave5: Use spinlock whenever statue is changed
  2025-05-22  7:26 ` [PATCH v2 4/7] media: chips-media: wave5: Use spinlock whenever statue is changed Jackson.lee
@ 2025-05-23 17:41   ` Nicolas Dufresne
  2025-05-27  5:02     ` jackson.lee
  0 siblings, 1 reply; 29+ messages in thread
From: Nicolas Dufresne @ 2025-05-23 17:41 UTC (permalink / raw)
  To: Jackson.lee, mchehab, hverkuil-cisco, sebastian.fricke,
	bob.beckett, dafna.hirschfeld
  Cc: linux-media, linux-kernel, lafley.kim, b-brnich, hverkuil,
	nas.chung

Hi,

Le jeudi 22 mai 2025 à 16:26 +0900, Jackson.lee a écrit :
> From: Jackson Lee <jackson.lee@chipsnmedia.com>
> 
> The device_run and finish_decode is not any more synchronized,
> so lock was needed in the device_run whenever state was changed.

Can you try to introduce the locking ahead of the patches, otherwise
this break bisectability as the in-between become racy.

Nicolas

> 
> Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
> Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
> ---
>  drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c b/drivers/media/platform/chips-
> media/wave5/wave5-vpu-dec.c
> index 42981c3b49bc..719c5527eb7f 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> @@ -1577,6 +1577,7 @@ static void wave5_vpu_dec_device_run(void *priv)
>  	struct queue_status_info q_status;
>  	u32 fail_res = 0;
>  	int ret = 0;
> +	unsigned long flags;
>  
>  	dev_dbg(inst->dev->dev, "%s: Fill the ring buffer with new bitstream data", __func__);
>  	pm_runtime_resume_and_get(inst->dev->dev);
> @@ -1617,7 +1618,9 @@ static void wave5_vpu_dec_device_run(void *priv)
>  			}
>  			spin_unlock_irqrestore(&inst->state_spinlock, flags);
>  		} else {
> +			spin_lock_irqsave(&inst->state_spinlock, flags);
>  			switch_state(inst, VPU_INST_STATE_INIT_SEQ);
> +			spin_unlock_irqrestore(&inst->state_spinlock, flags);
>  		}
>  
>  		break;
> @@ -1628,8 +1631,9 @@ static void wave5_vpu_dec_device_run(void *priv)
>  		 * we had a chance to switch, which leads to an invalid state
>  		 * change.
>  		 */
> +		spin_lock_irqsave(&inst->state_spinlock, flags);
>  		switch_state(inst, VPU_INST_STATE_PIC_RUN);
> -
> +		spin_unlock_irqrestore(&inst->state_spinlock, flags);
>  		/*
>  		 * During DRC, the picture decoding remains pending, so just leave the job
>  		 * active until this decode operation completes.
> @@ -1643,7 +1647,9 @@ static void wave5_vpu_dec_device_run(void *priv)
>  		ret = wave5_prepare_fb(inst);
>  		if (ret) {
>  			dev_warn(inst->dev->dev, "Framebuffer preparation, fail: %d\n", ret);
> +			spin_lock_irqsave(&inst->state_spinlock, flags);
>  			switch_state(inst, VPU_INST_STATE_STOP);
> +			spin_unlock_irqrestore(&inst->state_spinlock, flags);
>  			break;
>  		}
>  

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 5/7] media: chips-media: wave5: Fix not to free resources normally when instance was destroyed
  2025-05-22  7:26 ` [PATCH v2 5/7] media: chips-media: wave5: Fix not to free resources normally when instance was destroyed Jackson.lee
@ 2025-05-23 17:42   ` Nicolas Dufresne
  2025-05-27  5:04     ` jackson.lee
  0 siblings, 1 reply; 29+ messages in thread
From: Nicolas Dufresne @ 2025-05-23 17:42 UTC (permalink / raw)
  To: Jackson.lee, mchehab, hverkuil-cisco, sebastian.fricke,
	bob.beckett, dafna.hirschfeld
  Cc: linux-media, linux-kernel, lafley.kim, b-brnich, hverkuil,
	nas.chung

Le jeudi 22 mai 2025 à 16:26 +0900, Jackson.lee a écrit :
> From: Jackson Lee <jackson.lee@chipsnmedia.com>
> 
> Since applying performance patch, there was a problem not to free
> resources, the root cause was that timeout sometimes happened after
> calling the wave5_vpu_dec_finish_seq() when application was closed
> forcibly,so if failure reason is WAVE5_SYSERR_VPU_STILL_RUNNING,
> the wave5_vpu_dec_get_output_info() should be called to flush videos
> decoded before closed.

Either squash, or try to bring before too.

Nicolas

> 
> Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
> Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
> ---
>  .../platform/chips-media/wave5/wave5-vpuapi.c | 20 +++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c b/drivers/media/platform/chips-media/wave5/wave5-
> vpuapi.c
> index 68d86625538f..d7318d596b73 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> @@ -209,6 +209,7 @@ int wave5_vpu_dec_close(struct vpu_instance *inst, u32 *fail_res)
>  	int i;
>  	int inst_count = 0;
>  	struct vpu_instance *inst_elm;
> +	struct dec_output_info dec_info;
>  
>  	*fail_res = 0;
>  	if (!inst->codec_info)
> @@ -229,11 +230,26 @@ int wave5_vpu_dec_close(struct vpu_instance *inst, u32 *fail_res)
>  			goto unlock_and_return;
>  		}
>  
> -		if (*fail_res == WAVE5_SYSERR_VPU_STILL_RUNNING &&
> -		    retry++ >= MAX_FIRMWARE_CALL_RETRY) {
> +		if (ret == 0)
> +			break;
> +
> +		if (*fail_res != WAVE5_SYSERR_VPU_STILL_RUNNING) {
> +			dev_warn(inst->dev->dev, "dec_finish_seq timed out\n");
> +			goto unlock_and_return;
> +		}
> +
> +		if (retry++ >= MAX_FIRMWARE_CALL_RETRY) {
>  			ret = -ETIMEDOUT;
>  			goto unlock_and_return;
>  		}
> +
> +		mutex_unlock(&vpu_dev->hw_lock);
> +		wave5_vpu_dec_get_output_info(inst, &dec_info);
> +		ret = mutex_lock_interruptible(&vpu_dev->hw_lock);
> +		if (ret) {
> +			pm_runtime_put_sync(inst->dev->dev);
> +			return ret;
> +		}
>  	} while (ret != 0);
>  
>  	dev_dbg(inst->dev->dev, "%s: dec_finish_seq complete\n", __func__);

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 6/7] media: chips-media: wave5: Reduce high CPU load
  2025-05-22  7:26 ` [PATCH v2 6/7] media: chips-media: wave5: Reduce high CPU load Jackson.lee
@ 2025-05-23 17:43   ` Nicolas Dufresne
  2025-05-27  5:05     ` jackson.lee
  0 siblings, 1 reply; 29+ messages in thread
From: Nicolas Dufresne @ 2025-05-23 17:43 UTC (permalink / raw)
  To: Jackson.lee, mchehab, hverkuil-cisco, sebastian.fricke,
	bob.beckett, dafna.hirschfeld
  Cc: linux-media, linux-kernel, lafley.kim, b-brnich, hverkuil,
	nas.chung

Hi,

Le jeudi 22 mai 2025 à 16:26 +0900, Jackson.lee a écrit :
> From: Jackson Lee <jackson.lee@chipsnmedia.com>
> 
> Since applying changes for performance improvement of decoder,
> there was a problem related to high CPU load.
> CPU load was more than 4 times when comparing CPU load.
> The root cause was the device_run was called many times even if
> there was no bitstream which should be queued.

You should squash this.

Nicolas

> 
> Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
> Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
> ---
>  .../media/platform/chips-media/wave5/wave5-vpu-dec.c | 12 +++++++++---
>  .../media/platform/chips-media/wave5/wave5-vpuapi.h  |  1 +
>  2 files changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c b/drivers/media/platform/chips-
> media/wave5/wave5-vpu-dec.c
> index 719c5527eb7f..421a9e1a6f15 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> @@ -1280,10 +1280,13 @@ static void wave5_vpu_dec_buf_queue(struct vb2_buffer *vb)
>  		__func__, vb->type, vb->index, vb2_plane_size(&vbuf->vb2_buf, 0),
>  		vb2_plane_size(&vbuf->vb2_buf, 1), vb2_plane_size(&vbuf->vb2_buf, 2));
>  
> -	if (vb->type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE)
> +	if (vb->type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE) {
> +		if (inst->empty_queue)
> +			inst->empty_queue = false;
>  		wave5_vpu_dec_buf_queue_src(vb);
> -	else if (vb->type == V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE)
> +	} else if (vb->type == V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE) {
>  		wave5_vpu_dec_buf_queue_dst(vb);
> +	}
>  }
>  
>  static int wave5_vpu_dec_allocate_ring_buffer(struct vpu_instance *inst)
> @@ -1474,6 +1477,7 @@ static void wave5_vpu_dec_stop_streaming(struct vb2_queue *q)
>  
>  	dev_dbg(inst->dev->dev, "%s: type: %u\n", __func__, q->type);
>  	pm_runtime_resume_and_get(inst->dev->dev);
> +	inst->empty_queue = false;
>  
>  	while (check_cmd) {
>  		struct queue_status_info q_status;
> @@ -1592,6 +1596,7 @@ static void wave5_vpu_dec_device_run(void *priv)
>  				inst->queuing_num == 0 &&
>  				inst->state == VPU_INST_STATE_PIC_RUN) {
>  			dev_dbg(inst->dev->dev, "%s: no bitstream for feeding, so skip ", __func__);
> +			inst->empty_queue = true;
>  			goto finish_job_and_return;
>  		}
>  	}
> @@ -1737,7 +1742,8 @@ static int wave5_vpu_dec_job_ready(void *priv)
>  				"No capture buffer ready to decode!\n");
>  			break;
>  		} else if (!wave5_is_draining_or_eos(inst) &&
> -			   !v4l2_m2m_num_src_bufs_ready(m2m_ctx)) {
> +			   (!v4l2_m2m_num_src_bufs_ready(m2m_ctx) ||
> +			    inst->empty_queue)) {
>  			dev_dbg(inst->dev->dev,
>  				"No bitstream data to decode!\n");
>  			break;
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h b/drivers/media/platform/chips-media/wave5/wave5-
> vpuapi.h
> index fd0aef0bac4e..f2596af08cdf 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> @@ -821,6 +821,7 @@ struct vpu_instance {
>  	bool retry; /* retry to feed bitstream if failure reason is WAVE5_SYSERR_QUEUEING_FAIL*/
>  	int queuing_num; /* check if there is input buffer or not */
>  	struct mutex feed_lock; /* lock for feeding bitstream buffers */
> +	bool empty_queue;
>  	struct vpu_buf bitstream_vbuf;
>  	dma_addr_t last_rd_ptr;
>  	size_t remaining_consumed_bytes;

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 7/7] media: chips-media: wave5: Fix SError of kernel panic when closed
  2025-05-22  7:26 ` [PATCH v2 7/7] media: chips-media: wave5: Fix SError of kernel panic when closed Jackson.lee
@ 2025-05-23 17:48   ` Nicolas Dufresne
  2025-05-27  5:07     ` jackson.lee
  0 siblings, 1 reply; 29+ messages in thread
From: Nicolas Dufresne @ 2025-05-23 17:48 UTC (permalink / raw)
  To: Jackson.lee, mchehab, hverkuil-cisco, sebastian.fricke,
	bob.beckett, dafna.hirschfeld
  Cc: linux-media, linux-kernel, lafley.kim, b-brnich, hverkuil,
	nas.chung

Hi,


Le jeudi 22 mai 2025 à 16:26 +0900, Jackson.lee a écrit :
> From: Jackson Lee <jackson.lee@chipsnmedia.com>
> 
> Since applying "Reduce high CPU load" patch, SError of kernel panic rarely
> happened while testing fluster.
> The root cause was to enter suspend mode because timeout of autosuspend
> delay happened.

This would need to be done ahead of other patches, before it breaks. Toggling
auto-suspend seems suspicious. I'm pretty sure this was always a bit fishy, so
I'm fine with removing that. Normally get/put once everything is configured
should be fine.

> 
> [   48.834439] SError Interrupt on CPU0, code 0x00000000bf000000 -- SError
> [   48.834455] CPU: 0 UID: 0 PID: 1067 Comm: v4l2h265dec0:sr Not tainted 6.12.9-gc9e21a1ebd75-dirty #7
> [   48.834461] Hardware name: ti Texas Instruments J721S2 EVM/Texas Instruments J721S2 EVM, BIOS 2025.01-00345-
> gbaf3aaa8ecfa 01/01/2025
> [   48.834464] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [   48.834468] pc : wave5_dec_clr_disp_flag+0x40/0x80 [wave5]
> [   48.834488] lr : wave5_dec_clr_disp_flag+0x40/0x80 [wave5]
> [   48.834495] sp : ffff8000856e3a30
> [   48.834497] x29: ffff8000856e3a30 x28: ffff0008093f6010 x27: ffff000809158130
> [   48.834504] x26: 0000000000000000 x25: ffff00080b625000 x24: ffff000804a9ba80
> [   48.834509] x23: ffff000802343028 x22: ffff000809158150 x21: ffff000802218000
> [   48.834513] x20: ffff0008093f6000 x19: ffff0008093f6000 x18: 0000000000000000
> [   48.834518] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffff74009618
> [   48.834523] x14: 000000010000000c x13: 0000000000000000 x12: 0000000000000000
> [   48.834527] x11: ffffffffffffffff x10: ffffffffffffffff x9 : ffff000802343028
> [   48.834532] x8 : ffff00080b6252a0 x7 : 0000000000000038 x6 : 0000000000000000
> [   48.834536] x5 : ffff00080b625060 x4 : 0000000000000000 x3 : 0000000000000000
> [   48.834541] x2 : 0000000000000000 x1 : ffff800084bf0118 x0 : ffff800084bf0000
> [   48.834547] Kernel panic - not syncing: Asynchronous SError Interrupt
> [   48.834549] CPU: 0 UID: 0 PID: 1067 Comm: v4l2h265dec0:sr Not tainted 6.12.9-gc9e21a1ebd75-dirty #7

Hopefully you also test on mainline.

Nicolas

> [   48.834554] Hardware name: ti Texas Instruments J721S2 EVM/Texas Instruments J721S2 EVM, BIOS 2025.01-00345-
> gbaf3aaa8ecfa 01/01/2025
> [   48.834556] Call trace:
> [   48.834559]  dump_backtrace+0x94/0xec
> [   48.834574]  show_stack+0x18/0x24
> [   48.834579]  dump_stack_lvl+0x38/0x90
> [   48.834585]  dump_stack+0x18/0x24
> [   48.834588]  panic+0x35c/0x3e0
> [   48.834592]  nmi_panic+0x40/0x8c
> [   48.834595]  arm64_serror_panic+0x64/0x70
> [   48.834598]  do_serror+0x3c/0x78
> [   48.834601]  el1h_64_error_handler+0x34/0x4c
> [   48.834605]  el1h_64_error+0x64/0x68
> [   48.834608]  wave5_dec_clr_disp_flag+0x40/0x80 [wave5]
> [   48.834615]  wave5_vpu_dec_clr_disp_flag+0x54/0x80 [wave5]
> [   48.834622]  wave5_vpu_dec_buf_queue+0x19c/0x1a0 [wave5]
> [   48.834628]  __enqueue_in_driver+0x3c/0x74 [videobuf2_common]
> [   48.834639]  vb2_core_qbuf+0x508/0x61c [videobuf2_common]
> [   48.834646]  vb2_qbuf+0xa4/0x168 [videobuf2_v4l2]
> [   48.834656]  v4l2_m2m_qbuf+0x80/0x238 [v4l2_mem2mem]
> [   48.834666]  v4l2_m2m_ioctl_qbuf+0x18/0x24 [v4l2_mem2mem]
> [   48.834673]  v4l_qbuf+0x48/0x5c [videodev]
> [   48.834704]  __video_do_ioctl+0x180/0x3f0 [videodev]
> [   48.834725]  video_usercopy+0x2ec/0x68c [videodev]
> [   48.834745]  video_ioctl2+0x18/0x24 [videodev]
> [   48.834766]  v4l2_ioctl+0x40/0x60 [videodev]
> [   48.834786]  __arm64_sys_ioctl+0xa8/0xec
> [   48.834793]  invoke_syscall+0x44/0x100
> [   48.834800]  el0_svc_common.constprop.0+0xc0/0xe0
> [   48.834804]  do_el0_svc+0x1c/0x28
> [   48.834809]  el0_svc+0x30/0xd0
> [   48.834813]  el0t_64_sync_handler+0xc0/0xc4
> [   48.834816]  el0t_64_sync+0x190/0x194
> [   48.834820] SMP: stopping secondary CPUs
> [   48.834831] Kernel Offset: disabled
> [   48.834833] CPU features: 0x08,00002002,80200000,4200421b
> [   48.834837] Memory Limit: none
> [   49.161404] ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
> 
> Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
> Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
> ---
>  .../platform/chips-media/wave5/wave5-vpu-dec.c   |  3 ---
>  .../platform/chips-media/wave5/wave5-vpu-enc.c   |  3 ---
>  .../media/platform/chips-media/wave5/wave5-vpu.c |  2 +-
>  .../platform/chips-media/wave5/wave5-vpuapi.c    | 16 ----------------
>  4 files changed, 1 insertion(+), 23 deletions(-)
> 
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c b/drivers/media/platform/chips-
> media/wave5/wave5-vpu-dec.c
> index 421a9e1a6f15..b4b522d7fa84 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> @@ -1865,9 +1865,6 @@ static int wave5_vpu_open_dec(struct file *filp)
>  	if (ret)
>  		goto cleanup_inst;
>  
> -	if (list_empty(&dev->instances))
> -		pm_runtime_use_autosuspend(inst->dev->dev);
> -
>  	list_add_tail(&inst->list, &dev->instances);
>  
>  	mutex_unlock(&dev->dev_lock);
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c b/drivers/media/platform/chips-
> media/wave5/wave5-vpu-enc.c
> index 52a1a00fd9bb..7f1aa392805f 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
> @@ -1779,9 +1779,6 @@ static int wave5_vpu_open_enc(struct file *filp)
>  	if (ret)
>  		goto cleanup_inst;
>  
> -	if (list_empty(&dev->instances))
> -		pm_runtime_use_autosuspend(inst->dev->dev);
> -
>  	list_add_tail(&inst->list, &dev->instances);
>  
>  	mutex_unlock(&dev->dev_lock);
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu.c b/drivers/media/platform/chips-media/wave5/wave5-
> vpu.c
> index a2c09523d76b..24a9001966e7 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-vpu.c
> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu.c
> @@ -368,7 +368,7 @@ static int wave5_vpu_probe(struct platform_device *pdev)
>  	dev_info(&pdev->dev, "Product Code:      0x%x\n", dev->product_code);
>  	dev_info(&pdev->dev, "Firmware Revision: %u\n", fw_revision);
>  
> -	pm_runtime_set_autosuspend_delay(&pdev->dev, 100);
> +	pm_runtime_set_autosuspend_delay(&pdev->dev, 500);
>  	pm_runtime_use_autosuspend(&pdev->dev);
>  	pm_runtime_enable(&pdev->dev);
>  	wave5_vpu_sleep_wake(&pdev->dev, true, NULL, 0);
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c b/drivers/media/platform/chips-media/wave5/wave5-
> vpuapi.c
> index d7318d596b73..1f7f4d214b3c 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> @@ -207,8 +207,6 @@ int wave5_vpu_dec_close(struct vpu_instance *inst, u32 *fail_res)
>  	int retry = 0;
>  	struct vpu_device *vpu_dev = inst->dev;
>  	int i;
> -	int inst_count = 0;
> -	struct vpu_instance *inst_elm;
>  	struct dec_output_info dec_info;
>  
>  	*fail_res = 0;
> @@ -265,12 +263,6 @@ int wave5_vpu_dec_close(struct vpu_instance *inst, u32 *fail_res)
>  	}
>  
>  	wave5_vdi_free_dma_memory(vpu_dev, &p_dec_info->vb_task);
> -
> -	list_for_each_entry(inst_elm, &vpu_dev->instances, list)
> -		inst_count++;
> -	if (inst_count == 1)
> -		pm_runtime_dont_use_autosuspend(vpu_dev->dev);
> -
>  	mutex_destroy(&inst->feed_lock);
>  
>  unlock_and_return:
> @@ -738,8 +730,6 @@ int wave5_vpu_enc_close(struct vpu_instance *inst, u32 *fail_res)
>  	int ret;
>  	int retry = 0;
>  	struct vpu_device *vpu_dev = inst->dev;
> -	int inst_count = 0;
> -	struct vpu_instance *inst_elm;
>  
>  	*fail_res = 0;
>  	if (!inst->codec_info)
> @@ -782,12 +772,6 @@ int wave5_vpu_enc_close(struct vpu_instance *inst, u32 *fail_res)
>  	}
>  
>  	wave5_vdi_free_dma_memory(vpu_dev, &p_enc_info->vb_task);
> -
> -	list_for_each_entry(inst_elm, &vpu_dev->instances, list)
> -		inst_count++;
> -	if (inst_count == 1)
> -		pm_runtime_dont_use_autosuspend(vpu_dev->dev);
> -
>  	mutex_unlock(&vpu_dev->hw_lock);
>  	pm_runtime_put_sync(inst->dev->dev);
>  

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v2 1/7] media: chips-media: wave5: Fix Null reference while testing fluster
  2025-05-23 17:20   ` Nicolas Dufresne
@ 2025-05-27  4:05     ` jackson.lee
  2025-05-27 12:57       ` Nicolas Dufresne
  0 siblings, 1 reply; 29+ messages in thread
From: jackson.lee @ 2025-05-27  4:05 UTC (permalink / raw)
  To: Nicolas Dufresne, mchehab@kernel.org, hverkuil-cisco@xs4all.nl,
	sebastian.fricke@collabora.com, bob.beckett@collabora.com,
	dafna.hirschfeld@collabora.com
  Cc: linux-media@vger.kernel.org, linux-kernel@vger.kernel.org,
	lafley.kim, b-brnich@ti.com, hverkuil@xs4all.nl, Nas Chung

Hi Nicolas


> -----Original Message-----
> From: Nicolas Dufresne <nicolas.dufresne@collabora.com>
> Sent: Saturday, May 24, 2025 2:20 AM
> To: jackson.lee <jackson.lee@chipsnmedia.com>; mchehab@kernel.org;
> hverkuil-cisco@xs4all.nl; sebastian.fricke@collabora.com;
> bob.beckett@collabora.com; dafna.hirschfeld@collabora.com
> Cc: linux-media@vger.kernel.org; linux-kernel@vger.kernel.org; lafley.kim
> <lafley.kim@chipsnmedia.com>; b-brnich@ti.com; hverkuil@xs4all.nl; Nas
> Chung <nas.chung@chipsnmedia.com>
> Subject: Re: [PATCH v2 1/7] media: chips-media: wave5: Fix Null reference
> while testing fluster
> 
> Hi,
> 
> Le jeudi 22 mai 2025 à 16:26 +0900, Jackson.lee a écrit :
> > From: Jackson Lee <jackson.lee@chipsnmedia.com>
> >
> > When multi instances are created/destroyed, many interrupts happens or
> > structures for decoder are removed.
> > "struct vpu_instance" this structure is shared for all flow in
> > decoder, so if the structure is not protected by lock, Null reference
> > exception could happens sometimes.
> > IRQ Handler was spilt to two phases and Lock was added as well.
> >
> > Fixes: 9707a6254a8a ("media: chips-media: wave5: Add the v4l2 layer")
> > Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
> > Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
> > ---
> >  .../platform/chips-media/wave5/wave5-helper.c | 10 ++-
> >  .../chips-media/wave5/wave5-vpu-dec.c         |  5 ++
> >  .../chips-media/wave5/wave5-vpu-enc.c         |  5 ++
> >  .../platform/chips-media/wave5/wave5-vpu.c    | 69
> > ++++++++++++++++---
> >  .../platform/chips-media/wave5/wave5-vpuapi.h |  6 ++
> >  5 files changed, 86 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/media/platform/chips-media/wave5/wave5-helper.c
> > b/drivers/media/platform/chips-media/wave5/wave5-
> > helper.c
> > index 2c9d8cbca6e4..5d9969bb7ada 100644
> > --- a/drivers/media/platform/chips-media/wave5/wave5-helper.c
> > +++ b/drivers/media/platform/chips-media/wave5/wave5-helper.c
> > @@ -49,7 +49,7 @@ void wave5_cleanup_instance(struct vpu_instance *inst)
> >  		v4l2_fh_del(&inst->v4l2_fh);
> >  		v4l2_fh_exit(&inst->v4l2_fh);
> >  	}
> > -	list_del_init(&inst->list);
> > +	kfifo_free(&inst->irq_status);
> >  	ida_free(&inst->dev->inst_ida, inst->id);
> >  	kfree(inst->codec_info);
> >  	kfree(inst);
> > @@ -61,8 +61,16 @@ int wave5_vpu_release_device(struct file *filp,
> >  {
> >  	struct vpu_instance *inst = wave5_to_vpu_inst(filp->private_data);
> >  	int ret = 0;
> > +	unsigned long flags;
> >
> >  	v4l2_m2m_ctx_release(inst->v4l2_fh.m2m_ctx);
> > +	ret = mutex_lock_interruptible(&inst->dev->irq_lock);
> > +	if (ret)
> > +		return ret;
> 
> This is quite some heavy locking, why do you need both the mutex and the
> spinlock ?


To prevent Null reference exception, the existing irq handler were separated to two modules
One is to queue interrupt reason in the irq hander, the other is to call  the wave5_vpu_dec_finish_decode to get decoded frame.
The list of instances should be protected between all flow of the decoding process, but to protect the list in the irq_handler, spin lock should be used, and 
mutex should be used in the irq_thread because spin lock is not able to be used because mutex is being used in the wave5_vpu_dec_finish_decode.
So to protect the list in the release function, spin lock and mutex were used.



> 
> > +	spin_lock_irqsave(&inst->dev->irq_spinlock, flags);
> > +	list_del_init(&inst->list);
> > +	spin_unlock_irqrestore(&inst->dev->irq_spinlock, flags);
> > +	mutex_unlock(&inst->dev->irq_lock);
> >  	if (inst->state != VPU_INST_STATE_NONE) {
> >  		u32 fail_res;
> >
> > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > b/drivers/media/platform/chips- media/wave5/wave5-vpu-dec.c index
> > fd71f0c43ac3..32de43de1870 100644
> > --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > @@ -1811,6 +1811,11 @@ static int wave5_vpu_open_dec(struct file *filp)
> >  	inst->xfer_func = V4L2_XFER_FUNC_DEFAULT;
> >
> >  	init_completion(&inst->irq_done);
> > +	ret = kfifo_alloc(&inst->irq_status, 16 * sizeof(int), GFP_KERNEL);
> > +	if (ret) {
> > +		dev_err(inst->dev->dev, "failed to allocate fifo\n");
> > +		goto cleanup_inst;
> > +	}
> >
> >  	inst->id = ida_alloc(&inst->dev->inst_ida, GFP_KERNEL);
> >  	if (inst->id < 0) {
> > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
> > b/drivers/media/platform/chips- media/wave5/wave5-vpu-enc.c index
> > 1e5fc5f8b856..52a1a00fd9bb 100644
> > --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
> > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
> > @@ -1760,6 +1760,11 @@ static int wave5_vpu_open_enc(struct file *filp)
> >  	inst->frame_rate = 30;
> >
> >  	init_completion(&inst->irq_done);
> > +	ret = kfifo_alloc(&inst->irq_status, 16 * sizeof(int), GFP_KERNEL);
> > +	if (ret) {
> > +		dev_err(inst->dev->dev, "failed to allocate fifo\n");
> > +		goto cleanup_inst;
> > +	}
> >
> >  	inst->id = ida_alloc(&inst->dev->inst_ida, GFP_KERNEL);
> >  	if (inst->id < 0) {
> > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu.c
> > b/drivers/media/platform/chips-media/wave5/wave5-
> > vpu.c
> > index e1715d3f43b0..a2c09523d76b 100644
> > --- a/drivers/media/platform/chips-media/wave5/wave5-vpu.c
> > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu.c
> > @@ -51,8 +51,11 @@ static void wave5_vpu_handle_irq(void *dev_id)
> >  	u32 seq_done;
> >  	u32 cmd_done;
> >  	u32 irq_reason;
> > -	struct vpu_instance *inst;
> > +	u32 irq_subreason;
> > +	struct vpu_instance *inst, *tmp;
> >  	struct vpu_device *dev = dev_id;
> > +	int val;
> > +	unsigned long flags;
> >
> >  	irq_reason = wave5_vdi_read_register(dev, W5_VPU_VINT_REASON);
> >  	seq_done = wave5_vdi_read_register(dev,
> > W5_RET_SEQ_DONE_INSTANCE_INFO); @@ -60,7 +63,8 @@ static void
> wave5_vpu_handle_irq(void *dev_id)
> >  	wave5_vdi_write_register(dev, W5_VPU_VINT_REASON_CLR, irq_reason);
> >  	wave5_vdi_write_register(dev, W5_VPU_VINT_CLEAR, 0x1);
> >
> > -	list_for_each_entry(inst, &dev->instances, list) {
> > +	spin_lock_irqsave(&dev->irq_spinlock, flags);
> > +	list_for_each_entry_safe(inst, tmp, &dev->instances, list) {
> >
> >  		if (irq_reason & BIT(INT_WAVE5_INIT_SEQ) ||
> >  		    irq_reason & BIT(INT_WAVE5_ENC_SET_PARAM)) { @@ -82,14
> +86,22
> > @@ static void wave5_vpu_handle_irq(void *dev_id)
> >  		    irq_reason & BIT(INT_WAVE5_ENC_PIC)) {
> >  			if (cmd_done & BIT(inst->id)) {
> >  				cmd_done &= ~BIT(inst->id);
> > -				wave5_vdi_write_register(dev,
> W5_RET_QUEUE_CMD_DONE_INST,
> > -							 cmd_done);
> > -				inst->ops->finish_process(inst);
> > +				if (dev->irq >= 0) {
> > +					irq_subreason =
> > +						wave5_vdi_read_register(dev,
> W5_VPU_VINT_REASON);
> > +					if (!(irq_subreason &
> BIT(INT_WAVE5_DEC_PIC)))
> > +						wave5_vdi_write_register(dev,
> > +
> W5_RET_QUEUE_CMD_DONE_INST,
> > +									 cmd_done);
> > +				}
> > +				val = BIT(INT_WAVE5_DEC_PIC);
> > +				kfifo_in(&inst->irq_status, &val, sizeof(int));
> >  			}
> >  		}
> > -
> > -		wave5_vpu_clear_interrupt(inst, irq_reason);
> >  	}
> > +	spin_unlock_irqrestore(&dev->irq_spinlock, flags);
> > +
> > +	up(&dev->irq_sem);
> >  }
> >
> >  static irqreturn_t wave5_vpu_irq_thread(int irq, void *dev_id) @@
> > -121,6 +133,35 @@ static enum hrtimer_restart
> wave5_vpu_timer_callback(struct hrtimer *timer)
> >  	return HRTIMER_RESTART;
> >  }
> >
> > +static int irq_thread(void *data)
> > +{
> > +	struct vpu_device *dev = (struct vpu_device *)data;
> > +	struct vpu_instance *inst, *tmp;
> > +	int irq_status, ret;
> > +
> > +	while (!kthread_should_stop()) {
> > +		if (down_interruptible(&dev->irq_sem))
> > +			continue;
> > +
> > +		if (kthread_should_stop())
> > +			break;
> > +
> > +		mutex_lock(&dev->irq_lock);
> > +		list_for_each_entry_safe(inst, tmp, &dev->instances, list) {
> > +			while (kfifo_len(&inst->irq_status)) {
> > +				ret = kfifo_out(&inst->irq_status, &irq_status,
> sizeof(int));
> > +				if (!ret)
> > +					break;
> > +
> > +				inst->ops->finish_process(inst);
> > +			}
> > +		}
> > +		mutex_unlock(&dev->irq_lock);
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> >  static int wave5_vpu_load_firmware(struct device *dev, const char
> *fw_name,
> >  				   u32 *revision)
> >  {
> > @@ -224,6 +265,8 @@ static int wave5_vpu_probe(struct platform_device
> > *pdev)
> >
> >  	mutex_init(&dev->dev_lock);
> >  	mutex_init(&dev->hw_lock);
> > +	mutex_init(&dev->irq_lock);
> > +	spin_lock_init(&dev->irq_spinlock);
> >  	dev_set_drvdata(&pdev->dev, dev);
> >  	dev->dev = &pdev->dev;
> >
> > @@ -266,6 +309,10 @@ static int wave5_vpu_probe(struct platform_device
> *pdev)
> >  	}
> >  	dev->product = wave5_vpu_get_product_id(dev);
> >
> > +	sema_init(&dev->irq_sem, 1);
> > +	INIT_LIST_HEAD(&dev->instances);
> > +	dev->irq_thread = kthread_run(irq_thread, dev, "irq thread");
> > +
> >  	dev->irq = platform_get_irq(pdev, 0);
> >  	if (dev->irq < 0) {
> >  		dev_err(&pdev->dev, "failed to get irq resource, falling
> back to
> > polling\n"); @@ -288,7 +335,6 @@ static int wave5_vpu_probe(struct
> platform_device *pdev)
> >  		}
> >  	}
> >
> > -	INIT_LIST_HEAD(&dev->instances);
> >  	ret = v4l2_device_register(&pdev->dev, &dev->v4l2_dev);
> >  	if (ret) {
> >  		dev_err(&pdev->dev, "v4l2_device_register, fail: %d\n", ret);
> @@
> > -351,6 +397,12 @@ static void wave5_vpu_remove(struct platform_device
> > *pdev)
> >  {
> >  	struct vpu_device *dev = dev_get_drvdata(&pdev->dev);
> >
> > +	if (dev->irq_thread) {
> > +		kthread_stop(dev->irq_thread);
> > +		up(&dev->irq_sem);
> > +		dev->irq_thread = NULL;
> > +	}
> > +
> >  	if (dev->irq < 0) {
> >  		kthread_destroy_worker(dev->worker);
> >  		hrtimer_cancel(&dev->hrtimer);
> > @@ -361,6 +413,7 @@ static void wave5_vpu_remove(struct
> > platform_device *pdev)
> >
> >  	mutex_destroy(&dev->dev_lock);
> >  	mutex_destroy(&dev->hw_lock);
> > +	mutex_destroy(&dev->irq_lock);
> >  	reset_control_assert(dev->resets);
> >  	clk_bulk_disable_unprepare(dev->num_clks, dev->clks);
> >  	wave5_vpu_enc_unregister_device(dev);
> > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> > b/drivers/media/platform/chips-media/wave5/wave5-
> > vpuapi.h
> > index 45615c15beca..f3c1ad6fb3be 100644
> > --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> > @@ -8,6 +8,7 @@
> >  #ifndef VPUAPI_H_INCLUDED
> >  #define VPUAPI_H_INCLUDED
> >
> > +#include <linux/kfifo.h>
> >  #include <linux/idr.h>
> >  #include <linux/genalloc.h>
> >  #include <media/v4l2-device.h>
> > @@ -747,6 +748,7 @@ struct vpu_device {
> >  	struct video_device *video_dev_enc;
> >  	struct mutex dev_lock; /* lock for the src, dst v4l2 queues */
> >  	struct mutex hw_lock; /* lock hw configurations */
> > +	struct mutex irq_lock;
> >  	int irq;
> >  	enum product_id product;
> >  	struct vpu_attr attr;
> > @@ -764,7 +766,10 @@ struct vpu_device {
> >  	struct kthread_worker *worker;
> >  	int vpu_poll_interval;
> >  	int num_clks;
> > +	struct task_struct *irq_thread;
> > +	struct semaphore irq_sem;
> 
> Can you comment on what they actually protect ?

Okay

> 
> >  	struct reset_control *resets;
> > +	spinlock_t irq_spinlock; /* protect instances list */
> >  };
> >
> >  struct vpu_instance;
> > @@ -788,6 +793,7 @@ struct vpu_instance {
> >  	enum v4l2_ycbcr_encoding ycbcr_enc;
> >  	enum v4l2_quantization quantization;
> >
> > +	struct kfifo irq_status;
> >  	enum vpu_instance_state state;
> >  	enum vpu_instance_type type;
> >  	const struct vpu_instance_ops *ops;

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v2 2/7] media: chips-media: wave5: Improve performance of decoder
  2025-05-23 17:39   ` Nicolas Dufresne
@ 2025-05-27  4:58     ` jackson.lee
  2025-05-28 13:46       ` Nicolas Dufresne
  2025-05-30 14:33       ` Nicolas Dufresne
  0 siblings, 2 replies; 29+ messages in thread
From: jackson.lee @ 2025-05-27  4:58 UTC (permalink / raw)
  To: Nicolas Dufresne, mchehab@kernel.org, hverkuil-cisco@xs4all.nl,
	bob.beckett@collabora.com
  Cc: linux-media@vger.kernel.org, linux-kernel@vger.kernel.org,
	lafley.kim, b-brnich@ti.com, hverkuil@xs4all.nl, Nas Chung



> -----Original Message-----
> From: Nicolas Dufresne <nicolas.dufresne@collabora.com>
> Sent: Saturday, May 24, 2025 2:39 AM
> To: jackson.lee <jackson.lee@chipsnmedia.com>; mchehab@kernel.org;
> hverkuil-cisco@xs4all.nl; sebastian.fricke@collabora.com;
> bob.beckett@collabora.com; dafna.hirschfeld@collabora.com
> Cc: linux-media@vger.kernel.org; linux-kernel@vger.kernel.org; lafley.kim
> <lafley.kim@chipsnmedia.com>; b-brnich@ti.com; hverkuil@xs4all.nl; Nas
> Chung <nas.chung@chipsnmedia.com>
> Subject: Re: [PATCH v2 2/7] media: chips-media: wave5: Improve performance
> of decoder
> 
> Hi,
> 
> Le jeudi 22 mai 2025 à 16:26 +0900, Jackson.lee a écrit :
> > From: Jackson Lee <jackson.lee@chipsnmedia.com>
> >
> > The current decoding method  was to wait until each frame was decoded
> > after feeding a bitstream. As a result, performance was low and Wave5
> > could not achieve max pixel processing rate.
> >
> > Update driver to use an asynchronous approach for decoding and feeding
> > a bitstream in order to achieve full capabilities of the device.
> >
> > WAVE5 supports command-queueing to maximize performance by pipelining
> > internal commands and by hiding wait cycle taken to receive a command
> > from Host processor.
> >
> > Instead of waiting for each command to be executed before sending the
> > next command, Host processor just places all the commands in the
> > command-queue and goes on doing other things while the commands in the
> > queue are processed by VPU.
> >
> > While Host processor handles its own tasks, it can receive VPU
> > interrupt request (IRQ).
> > In this case, host processor can simply exit interrupt service routine
> > (ISR) without accessing to host interface to read the result of the
> > command reported by VPU.
> > After host processor completed its tasks, host processor can read the
> > command result when host processor needs the reports and does response
> > processing.
> >
> > To archive this goal, the device_run() calls v4l2_m2m_job_finish so
> > that next command can be sent to VPU continuously, if there is any
> > result, then irq is triggered and gets decoded frames and returns them
> > to upper layer.
> > Theses processes work independently each other without waiting a
> > decoded frame.
> >
> > Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
> > Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
> > ---
> >  .../platform/chips-media/wave5/wave5-hw.c     |  2 +-
> >  .../chips-media/wave5/wave5-vpu-dec.c         | 84
> > +++++++++++--------
> >  .../platform/chips-media/wave5/wave5-vpuapi.c |  2 +
> >  .../platform/chips-media/wave5/wave5-vpuapi.h |  3 +
> >  4 files changed, 57 insertions(+), 34 deletions(-)
> >
> > diff --git a/drivers/media/platform/chips-media/wave5/wave5-hw.c
> > b/drivers/media/platform/chips-media/wave5/wave5-hw.c
> > index d94cf84c3ee5..687ce6ccf3ae 100644
> > --- a/drivers/media/platform/chips-media/wave5/wave5-hw.c
> > +++ b/drivers/media/platform/chips-media/wave5/wave5-hw.c
> > @@ -102,7 +102,7 @@ static void _wave5_print_reg_err(struct vpu_device
> *vpu_dev, u32 reg_fail_reason
> >  		dev_dbg(dev, "%s: queueing failure: 0x%x\n", func, reg_val);
> >  		break;
> >  	case WAVE5_SYSERR_RESULT_NOT_READY:
> > -		dev_err(dev, "%s: result not ready: 0x%x\n", func,
> reg_fail_reason);
> > +		dev_dbg(dev, "%s: result not ready: 0x%x\n", func,
> > +reg_fail_reason);
> >  		break;
> >  	case WAVE5_SYSERR_ACCESS_VIOLATION_HW:
> >  		dev_err(dev, "%s: access violation: 0x%x\n", func,
> > reg_fail_reason); diff --git
> > a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > b/drivers/media/platform/chips- media/wave5/wave5-vpu-dec.c index
> > 32de43de1870..995234a3a6d6 100644
> > --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > @@ -347,13 +347,12 @@ static void wave5_vpu_dec_finish_decode(struct
> vpu_instance *inst)
> >  	struct vb2_v4l2_buffer *dec_buf = NULL;
> >  	struct vb2_v4l2_buffer *disp_buf = NULL;
> >  	struct vb2_queue *dst_vq = v4l2_m2m_get_dst_vq(m2m_ctx);
> > -	struct queue_status_info q_status;
> >
> >  	dev_dbg(inst->dev->dev, "%s: Fetch output info from firmware.",
> > __func__);
> >
> >  	ret = wave5_vpu_dec_get_output_info(inst, &dec_info);
> >  	if (ret) {
> > -		dev_warn(inst->dev->dev, "%s: could not get output info.",
> __func__);
> > +		dev_dbg(inst->dev->dev, "%s: could not get output info.",
> > +__func__);
> 
> Wouldn't it be better to check the return value to possibly differentiate
> some errors from something similar to EGAIN?
> 

In case of this, when get_result command is requested to VPU, there could be no output.
So it is not error case and EAGIAN is not proper because the wave5_vpu_dec_finish_decode is triggered by PIC_Done interrupt.
So I think it is proper code.


> >  		v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
> >  		return;
> >  	}
> > @@ -441,20 +440,6 @@ static void wave5_vpu_dec_finish_decode(struct
> vpu_instance *inst)
> >  		}
> >  		spin_unlock_irqrestore(&inst->state_spinlock, flags);
> >  	}
> > -
> > -	/*
> > -	 * During a resolution change and while draining, the firmware may
> flush
> > -	 * the reorder queue regardless of having a matching decoding
> operation
> > -	 * pending. Only terminate the job if there are no more IRQ coming.
> > -	 */
> > -	wave5_vpu_dec_give_command(inst, DEC_GET_QUEUE_STATUS, &q_status);
> > -	if (q_status.report_queue_count == 0 &&
> > -	    (q_status.instance_queue_count == 0 ||
> dec_info.sequence_changed)) {
> > -		dev_dbg(inst->dev->dev, "%s: finishing job.\n", __func__);
> > -		pm_runtime_mark_last_busy(inst->dev->dev);
> > -		pm_runtime_put_autosuspend(inst->dev->dev);
> > -		v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
> > -	}
> >  }
> >
> >  static int wave5_vpu_dec_querycap(struct file *file, void *fh, struct
> > v4l2_capability *cap) @@ -1146,8 +1131,8 @@ static int
> > write_to_ringbuffer(struct vpu_instance *inst, void *buffer, size_t b
> >  static int fill_ringbuffer(struct vpu_instance *inst)
> >  {
> >  	struct v4l2_m2m_ctx *m2m_ctx = inst->v4l2_fh.m2m_ctx;
> > -	struct v4l2_m2m_buffer *buf, *n;
> > -	int ret;
> > +	struct vpu_src_buffer *vpu_buf;
> > +	int ret = 0;
> >
> >  	if (m2m_ctx->last_src_buf)  {
> >  		struct vpu_src_buffer *vpu_buf =
> > wave5_to_vpu_src_buf(m2m_ctx->last_src_buf);
> > @@ -1158,9 +1143,8 @@ static int fill_ringbuffer(struct vpu_instance
> *inst)
> >  		}
> >  	}
> >
> > -	v4l2_m2m_for_each_src_buf_safe(m2m_ctx, buf, n) {
> > -		struct vb2_v4l2_buffer *vbuf = &buf->vb;
> > -		struct vpu_src_buffer *vpu_buf = wave5_to_vpu_src_buf(vbuf);
> > +	list_for_each_entry(vpu_buf, &inst->avail_src_bufs, list) {
> > +		struct vb2_v4l2_buffer *vbuf = &vpu_buf->v4l2_m2m_buf.vb;
> >  		struct vpu_buf *ring_buffer = &inst->bitstream_vbuf;
> >  		size_t src_size = vb2_get_plane_payload(&vbuf->vb2_buf, 0);
> >  		void *src_buf = vb2_plane_vaddr(&vbuf->vb2_buf, 0); @@ -
> 1220,9
> > +1204,13 @@ static int fill_ringbuffer(struct vpu_instance *inst)
> >  			dev_dbg(inst->dev->dev, "last src buffer written to
> the ring buffer\n");
> >  			break;
> >  		}
> > +
> > +		inst->queuing_num++;
> > +		list_del_init(&vpu_buf->list);
> > +		break;
> >  	}
> >
> > -	return 0;
> > +	return ret;
> >  }
> >
> >  static void wave5_vpu_dec_buf_queue_src(struct vb2_buffer *vb) @@
> > -1236,6 +1224,11 @@ static void wave5_vpu_dec_buf_queue_src(struct
> vb2_buffer *vb)
> >  	vbuf->sequence = inst->queued_src_buf_num++;
> >
> >  	v4l2_m2m_buf_queue(m2m_ctx, vbuf);
> > +
> > +	INIT_LIST_HEAD(&vpu_buf->list);
> > +	mutex_lock(&inst->feed_lock);
> > +	list_add_tail(&vpu_buf->list, &inst->avail_src_bufs);
> > +	mutex_unlock(&inst->feed_lock);
> >  }
> >
> >  static void wave5_vpu_dec_buf_queue_dst(struct vb2_buffer *vb) @@
> > -1385,6 +1378,13 @@ static int streamoff_output(struct vb2_queue *q)
> >  	dma_addr_t new_rd_ptr;
> >  	struct dec_output_info dec_info;
> >  	unsigned int i;
> > +	struct vpu_src_buffer *vpu_buf, *tmp;
> > +
> > +	inst->retry = false;
> > +	inst->queuing_num = 0;
> > +
> > +	list_for_each_entry_safe(vpu_buf, tmp, &inst->avail_src_bufs, list)
> > +		list_del_init(&vpu_buf->list);
> >
> >  	for (i = 0; i < v4l2_m2m_num_dst_bufs_ready(m2m_ctx); i++) {
> >  		ret = wave5_vpu_dec_set_disp_flag(inst, i); @@ -1580,10
> +1580,19 @@
> > static void wave5_vpu_dec_device_run(void *priv)
> >
> >  	dev_dbg(inst->dev->dev, "%s: Fill the ring buffer with new
> bitstream data", __func__);
> >  	pm_runtime_resume_and_get(inst->dev->dev);
> > -	ret = fill_ringbuffer(inst);
> > -	if (ret) {
> > -		dev_warn(inst->dev->dev, "Filling ring buffer failed\n");
> > -		goto finish_job_and_return;
> > +	if (!inst->retry) {
> > +		mutex_lock(&inst->feed_lock);
> > +		ret = fill_ringbuffer(inst);
> > +		mutex_unlock(&inst->feed_lock);
> > +		if (ret < 0) {
> > +			dev_warn(inst->dev->dev, "Filling ring buffer
> failed\n");
> > +			goto finish_job_and_return;
> > +		} else if (!inst->eos &&
> > +				inst->queuing_num == 0 &&
> > +				inst->state == VPU_INST_STATE_PIC_RUN) {
> > +			dev_dbg(inst->dev->dev, "%s: no bitstream for feeding,
> so skip ", __func__);
> > +			goto finish_job_and_return;
> > +		}
> >  	}
> >
> >  	switch (inst->state) {
> > @@ -1639,7 +1648,7 @@ static void wave5_vpu_dec_device_run(void *priv)
> >  		}
> >
> >  		if (q_status.instance_queue_count) {
> > -			dev_dbg(inst->dev->dev, "%s: leave with active job",
> __func__);
> > +			v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
> >  			return;
> >  		}
> >
> > @@ -1650,14 +1659,21 @@ static void wave5_vpu_dec_device_run(void *priv)
> >  			dev_err(inst->dev->dev,
> >  				"Frame decoding on m2m context (%p), fail: %d
> (result: %d)\n",
> >  				m2m_ctx, ret, fail_res);
> > -			break;
> > +			goto finish_job_and_return;
> > +		}
> > +
> > +		if (fail_res == WAVE5_SYSERR_QUEUEING_FAIL) {
> > +			inst->retry = true;
> > +		} else {
> > +			inst->retry = false;
> > +			if (!inst->eos)
> > +				inst->queuing_num--;
> 
> I looked into the original state machine violation you had in previous
> version, and I got the impression that the reason you did hit that was
> that you actually call device_run passed inst->eos. Its probably not that
> simple in practice, but I think you forgot to adapt the job_ready() ops to
> prevent more device_run() called passed CMD_STOP and having all pending
> buffer written in the ring buffer.
> 
> As a side effect, you endup calling device_run() in a race with the
> finish() setting the state to STOP. I really think there is a way to use
> inst->eos boolean to prevent that race in the first place. Might need to
> be combined with checking if you have buffers prior to command stop that
> did not yet fit into the ring buffer.
> 


The queuing_num is used to check if there is input data or not, so it was declared in the int type.
If there is no input data, then the device_run will be not called until queuing input data.
In case of eos sent, device_run should be ran continuously until getting eos from VPU, the code was needed.
If my answer is not correct, please let me know.


> >  		}
> > -		/* Return so that we leave this job active */
> > -		dev_dbg(inst->dev->dev, "%s: leave with active job",
> __func__);
> > -		return;
> > -	default:
> > -		WARN(1, "Execution of a job in state %s illegal.\n",
> state_to_str(inst->state));
> >  		break;
> > +	default:
> > +		dev_dbg(inst->dev->dev, "Execution of a job in state %s
> illegal.\n",
> > +			state_to_str(inst->state));
> > +
> >  	}
> >
> >  finish_job_and_return:
> > @@ -1755,6 +1771,8 @@ static int wave5_vpu_open_dec(struct file *filp)
> >  	inst->ops = &wave5_vpu_dec_inst_ops;
> >
> >  	spin_lock_init(&inst->state_spinlock);
> > +	mutex_init(&inst->feed_lock);
> > +	INIT_LIST_HEAD(&inst->avail_src_bufs);
> >
> >  	inst->codec_info = kzalloc(sizeof(*inst->codec_info), GFP_KERNEL);
> >  	if (!inst->codec_info)
> > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> > b/drivers/media/platform/chips-media/wave5/wave5-
> > vpuapi.c
> > index e5e879a13e8b..68d86625538f 100644
> > --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> > @@ -255,6 +255,8 @@ int wave5_vpu_dec_close(struct vpu_instance *inst,
> u32 *fail_res)
> >  	if (inst_count == 1)
> >  		pm_runtime_dont_use_autosuspend(vpu_dev->dev);
> >
> > +	mutex_destroy(&inst->feed_lock);
> > +
> >  unlock_and_return:
> >  	mutex_unlock(&vpu_dev->hw_lock);
> >  	pm_runtime_put_sync(inst->dev->dev);
> > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> > b/drivers/media/platform/chips-media/wave5/wave5-
> > vpuapi.h
> > index f3c1ad6fb3be..fd0aef0bac4e 100644
> > --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> > @@ -818,6 +818,9 @@ struct vpu_instance {
> >  	bool cbcr_interleave;
> >  	bool nv21;
> >  	bool eos;
> > +	bool retry; /* retry to feed bitstream if failure reason is
> WAVE5_SYSERR_QUEUEING_FAIL*/
> > +	int queuing_num; /* check if there is input buffer or not */
> 
> This is described as a boolean, but is implemented as a counter. What does
> it count exactly ?
> I think it needs a better name too.
> 

Please refer to the above comment.

Thanks
Jackson


> Nicolas
> 
> > +	struct mutex feed_lock; /* lock for feeding bitstream buffers */
> >  	struct vpu_buf bitstream_vbuf;
> >  	dma_addr_t last_rd_ptr;
> >  	size_t remaining_consumed_bytes;

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v2 4/7] media: chips-media: wave5: Use spinlock whenever statue is changed
  2025-05-23 17:41   ` Nicolas Dufresne
@ 2025-05-27  5:02     ` jackson.lee
  2025-05-28 13:49       ` Nicolas Dufresne
  0 siblings, 1 reply; 29+ messages in thread
From: jackson.lee @ 2025-05-27  5:02 UTC (permalink / raw)
  To: Nicolas Dufresne, mchehab@kernel.org, hverkuil-cisco@xs4all.nl,
	bob.beckett@collabora.com
  Cc: linux-media@vger.kernel.org, linux-kernel@vger.kernel.org,
	lafley.kim, b-brnich@ti.com, hverkuil@xs4all.nl, Nas Chung

Hi Nicolas

> -----Original Message-----
> From: Nicolas Dufresne <nicolas.dufresne@collabora.com>
> Sent: Saturday, May 24, 2025 2:41 AM
> To: jackson.lee <jackson.lee@chipsnmedia.com>; mchehab@kernel.org;
> hverkuil-cisco@xs4all.nl; sebastian.fricke@collabora.com;
> bob.beckett@collabora.com; dafna.hirschfeld@collabora.com
> Cc: linux-media@vger.kernel.org; linux-kernel@vger.kernel.org; lafley.kim
> <lafley.kim@chipsnmedia.com>; b-brnich@ti.com; hverkuil@xs4all.nl; Nas
> Chung <nas.chung@chipsnmedia.com>
> Subject: Re: [PATCH v2 4/7] media: chips-media: wave5: Use spinlock
> whenever statue is changed
> 
> Hi,
> 
> Le jeudi 22 mai 2025 à 16:26 +0900, Jackson.lee a écrit :
> > From: Jackson Lee <jackson.lee@chipsnmedia.com>
> >
> > The device_run and finish_decode is not any more synchronized, so lock
> > was needed in the device_run whenever state was changed.
> 
> Can you try to introduce the locking ahead of the patches, otherwise this
> break bisectability as the in-between become racy.


Do you want to introduce this patch ahead of the performance patch?

Thanks
Jackson

> 
> Nicolas
> 
> >
> > Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
> > Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
> > ---
> >  drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > b/drivers/media/platform/chips- media/wave5/wave5-vpu-dec.c index
> > 42981c3b49bc..719c5527eb7f 100644
> > --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > @@ -1577,6 +1577,7 @@ static void wave5_vpu_dec_device_run(void *priv)
> >  	struct queue_status_info q_status;
> >  	u32 fail_res = 0;
> >  	int ret = 0;
> > +	unsigned long flags;
> >
> >  	dev_dbg(inst->dev->dev, "%s: Fill the ring buffer with new
> bitstream data", __func__);
> >  	pm_runtime_resume_and_get(inst->dev->dev);
> > @@ -1617,7 +1618,9 @@ static void wave5_vpu_dec_device_run(void *priv)
> >  			}
> >  			spin_unlock_irqrestore(&inst->state_spinlock, flags);
> >  		} else {
> > +			spin_lock_irqsave(&inst->state_spinlock, flags);
> >  			switch_state(inst, VPU_INST_STATE_INIT_SEQ);
> > +			spin_unlock_irqrestore(&inst->state_spinlock, flags);
> >  		}
> >
> >  		break;
> > @@ -1628,8 +1631,9 @@ static void wave5_vpu_dec_device_run(void *priv)
> >  		 * we had a chance to switch, which leads to an invalid state
> >  		 * change.
> >  		 */
> > +		spin_lock_irqsave(&inst->state_spinlock, flags);
> >  		switch_state(inst, VPU_INST_STATE_PIC_RUN);
> > -
> > +		spin_unlock_irqrestore(&inst->state_spinlock, flags);
> >  		/*
> >  		 * During DRC, the picture decoding remains pending, so just
> leave the job
> >  		 * active until this decode operation completes.
> > @@ -1643,7 +1647,9 @@ static void wave5_vpu_dec_device_run(void *priv)
> >  		ret = wave5_prepare_fb(inst);
> >  		if (ret) {
> >  			dev_warn(inst->dev->dev, "Framebuffer preparation,
> fail: %d\n",
> > ret);
> > +			spin_lock_irqsave(&inst->state_spinlock, flags);
> >  			switch_state(inst, VPU_INST_STATE_STOP);
> > +			spin_unlock_irqrestore(&inst->state_spinlock, flags);
> >  			break;
> >  		}
> >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v2 5/7] media: chips-media: wave5: Fix not to free resources normally when instance was destroyed
  2025-05-23 17:42   ` Nicolas Dufresne
@ 2025-05-27  5:04     ` jackson.lee
  0 siblings, 0 replies; 29+ messages in thread
From: jackson.lee @ 2025-05-27  5:04 UTC (permalink / raw)
  To: Nicolas Dufresne, mchehab@kernel.org, hverkuil-cisco@xs4all.nl,
	sebastian.fricke@collabora.com, bob.beckett@collabora.com,
	dafna.hirschfeld@collabora.com
  Cc: linux-media@vger.kernel.org, linux-kernel@vger.kernel.org,
	lafley.kim, b-brnich@ti.com, hverkuil@xs4all.nl, Nas Chung

Hi Nicolas

> -----Original Message-----
> From: Nicolas Dufresne <nicolas.dufresne@collabora.com>
> Sent: Saturday, May 24, 2025 2:42 AM
> To: jackson.lee <jackson.lee@chipsnmedia.com>; mchehab@kernel.org;
> hverkuil-cisco@xs4all.nl; sebastian.fricke@collabora.com;
> bob.beckett@collabora.com; dafna.hirschfeld@collabora.com
> Cc: linux-media@vger.kernel.org; linux-kernel@vger.kernel.org; lafley.kim
> <lafley.kim@chipsnmedia.com>; b-brnich@ti.com; hverkuil@xs4all.nl; Nas
> Chung <nas.chung@chipsnmedia.com>
> Subject: Re: [PATCH v2 5/7] media: chips-media: wave5: Fix not to free
> resources normally when instance was destroyed
> 
> Le jeudi 22 mai 2025 à 16:26 +0900, Jackson.lee a écrit :
> > From: Jackson Lee <jackson.lee@chipsnmedia.com>
> >
> > Since applying performance patch, there was a problem not to free
> > resources, the root cause was that timeout sometimes happened after
> > calling the wave5_vpu_dec_finish_seq() when application was closed
> > forcibly,so if failure reason is WAVE5_SYSERR_VPU_STILL_RUNNING, the
> > wave5_vpu_dec_get_output_info() should be called to flush videos
> > decoded before closed.
> 
> Either squash, or try to bring before too.
> 

I will squash this patch to the performance patch.

Thanks
Jackson

> Nicolas
> 
> >
> > Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
> > Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
> > ---
> >  .../platform/chips-media/wave5/wave5-vpuapi.c | 20
> > +++++++++++++++++--
> >  1 file changed, 18 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> > b/drivers/media/platform/chips-media/wave5/wave5-
> > vpuapi.c
> > index 68d86625538f..d7318d596b73 100644
> > --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> > @@ -209,6 +209,7 @@ int wave5_vpu_dec_close(struct vpu_instance *inst,
> u32 *fail_res)
> >  	int i;
> >  	int inst_count = 0;
> >  	struct vpu_instance *inst_elm;
> > +	struct dec_output_info dec_info;
> >
> >  	*fail_res = 0;
> >  	if (!inst->codec_info)
> > @@ -229,11 +230,26 @@ int wave5_vpu_dec_close(struct vpu_instance *inst,
> u32 *fail_res)
> >  			goto unlock_and_return;
> >  		}
> >
> > -		if (*fail_res == WAVE5_SYSERR_VPU_STILL_RUNNING &&
> > -		    retry++ >= MAX_FIRMWARE_CALL_RETRY) {
> > +		if (ret == 0)
> > +			break;
> > +
> > +		if (*fail_res != WAVE5_SYSERR_VPU_STILL_RUNNING) {
> > +			dev_warn(inst->dev->dev, "dec_finish_seq timed out\n");
> > +			goto unlock_and_return;
> > +		}
> > +
> > +		if (retry++ >= MAX_FIRMWARE_CALL_RETRY) {
> >  			ret = -ETIMEDOUT;
> >  			goto unlock_and_return;
> >  		}
> > +
> > +		mutex_unlock(&vpu_dev->hw_lock);
> > +		wave5_vpu_dec_get_output_info(inst, &dec_info);
> > +		ret = mutex_lock_interruptible(&vpu_dev->hw_lock);
> > +		if (ret) {
> > +			pm_runtime_put_sync(inst->dev->dev);
> > +			return ret;
> > +		}
> >  	} while (ret != 0);
> >
> >  	dev_dbg(inst->dev->dev, "%s: dec_finish_seq complete\n", __func__);

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v2 6/7] media: chips-media: wave5: Reduce high CPU load
  2025-05-23 17:43   ` Nicolas Dufresne
@ 2025-05-27  5:05     ` jackson.lee
  0 siblings, 0 replies; 29+ messages in thread
From: jackson.lee @ 2025-05-27  5:05 UTC (permalink / raw)
  To: Nicolas Dufresne, mchehab@kernel.org, hverkuil-cisco@xs4all.nl,
	sebastian.fricke@collabora.com, bob.beckett@collabora.com,
	dafna.hirschfeld@collabora.com
  Cc: linux-media@vger.kernel.org, linux-kernel@vger.kernel.org,
	lafley.kim, b-brnich@ti.com, hverkuil@xs4all.nl, Nas Chung

Hi Nicolas

> -----Original Message-----
> From: Nicolas Dufresne <nicolas.dufresne@collabora.com>
> Sent: Saturday, May 24, 2025 2:43 AM
> To: jackson.lee <jackson.lee@chipsnmedia.com>; mchehab@kernel.org;
> hverkuil-cisco@xs4all.nl; sebastian.fricke@collabora.com;
> bob.beckett@collabora.com; dafna.hirschfeld@collabora.com
> Cc: linux-media@vger.kernel.org; linux-kernel@vger.kernel.org; lafley.kim
> <lafley.kim@chipsnmedia.com>; b-brnich@ti.com; hverkuil@xs4all.nl; Nas
> Chung <nas.chung@chipsnmedia.com>
> Subject: Re: [PATCH v2 6/7] media: chips-media: wave5: Reduce high CPU
> load
> 
> Hi,
> 
> Le jeudi 22 mai 2025 à 16:26 +0900, Jackson.lee a écrit :
> > From: Jackson Lee <jackson.lee@chipsnmedia.com>
> >
> > Since applying changes for performance improvement of decoder, there
> > was a problem related to high CPU load.
> > CPU load was more than 4 times when comparing CPU load.
> > The root cause was the device_run was called many times even if there
> > was no bitstream which should be queued.
> 
> You should squash this.
> 
> Nicolas
> 


I will also squash this to the performance patch.

Thanks
Jackson


> >
> > Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
> > Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
> > ---
> >  .../media/platform/chips-media/wave5/wave5-vpu-dec.c | 12
> > +++++++++---
> >  .../media/platform/chips-media/wave5/wave5-vpuapi.h  |  1 +
> >  2 files changed, 10 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > b/drivers/media/platform/chips- media/wave5/wave5-vpu-dec.c index
> > 719c5527eb7f..421a9e1a6f15 100644
> > --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > @@ -1280,10 +1280,13 @@ static void wave5_vpu_dec_buf_queue(struct
> vb2_buffer *vb)
> >  		__func__, vb->type, vb->index, vb2_plane_size(&vbuf->vb2_buf,
> 0),
> >  		vb2_plane_size(&vbuf->vb2_buf, 1), vb2_plane_size(&vbuf-
> >vb2_buf,
> > 2));
> >
> > -	if (vb->type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE)
> > +	if (vb->type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE) {
> > +		if (inst->empty_queue)
> > +			inst->empty_queue = false;
> >  		wave5_vpu_dec_buf_queue_src(vb);
> > -	else if (vb->type == V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE)
> > +	} else if (vb->type == V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE) {
> >  		wave5_vpu_dec_buf_queue_dst(vb);
> > +	}
> >  }
> >
> >  static int wave5_vpu_dec_allocate_ring_buffer(struct vpu_instance
> > *inst) @@ -1474,6 +1477,7 @@ static void
> > wave5_vpu_dec_stop_streaming(struct vb2_queue *q)
> >
> >  	dev_dbg(inst->dev->dev, "%s: type: %u\n", __func__, q->type);
> >  	pm_runtime_resume_and_get(inst->dev->dev);
> > +	inst->empty_queue = false;
> >
> >  	while (check_cmd) {
> >  		struct queue_status_info q_status;
> > @@ -1592,6 +1596,7 @@ static void wave5_vpu_dec_device_run(void *priv)
> >  				inst->queuing_num == 0 &&
> >  				inst->state == VPU_INST_STATE_PIC_RUN) {
> >  			dev_dbg(inst->dev->dev, "%s: no bitstream for feeding,
> so skip ",
> > __func__);
> > +			inst->empty_queue = true;
> >  			goto finish_job_and_return;
> >  		}
> >  	}
> > @@ -1737,7 +1742,8 @@ static int wave5_vpu_dec_job_ready(void *priv)
> >  				"No capture buffer ready to decode!\n");
> >  			break;
> >  		} else if (!wave5_is_draining_or_eos(inst) &&
> > -			   !v4l2_m2m_num_src_bufs_ready(m2m_ctx)) {
> > +			   (!v4l2_m2m_num_src_bufs_ready(m2m_ctx) ||
> > +			    inst->empty_queue)) {
> >  			dev_dbg(inst->dev->dev,
> >  				"No bitstream data to decode!\n");
> >  			break;
> > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> > b/drivers/media/platform/chips-media/wave5/wave5-
> > vpuapi.h
> > index fd0aef0bac4e..f2596af08cdf 100644
> > --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> > @@ -821,6 +821,7 @@ struct vpu_instance {
> >  	bool retry; /* retry to feed bitstream if failure reason is
> WAVE5_SYSERR_QUEUEING_FAIL*/
> >  	int queuing_num; /* check if there is input buffer or not */
> >  	struct mutex feed_lock; /* lock for feeding bitstream buffers */
> > +	bool empty_queue;
> >  	struct vpu_buf bitstream_vbuf;
> >  	dma_addr_t last_rd_ptr;
> >  	size_t remaining_consumed_bytes;

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v2 7/7] media: chips-media: wave5: Fix SError of kernel panic when closed
  2025-05-23 17:48   ` Nicolas Dufresne
@ 2025-05-27  5:07     ` jackson.lee
  0 siblings, 0 replies; 29+ messages in thread
From: jackson.lee @ 2025-05-27  5:07 UTC (permalink / raw)
  To: Nicolas Dufresne, mchehab@kernel.org, hverkuil-cisco@xs4all.nl,
	sebastian.fricke@collabora.com, bob.beckett@collabora.com,
	dafna.hirschfeld@collabora.com
  Cc: linux-media@vger.kernel.org, linux-kernel@vger.kernel.org,
	lafley.kim, b-brnich@ti.com, hverkuil@xs4all.nl, Nas Chung

Hi Nicolas

> -----Original Message-----
> From: Nicolas Dufresne <nicolas.dufresne@collabora.com>
> Sent: Saturday, May 24, 2025 2:48 AM
> To: jackson.lee <jackson.lee@chipsnmedia.com>; mchehab@kernel.org;
> hverkuil-cisco@xs4all.nl; sebastian.fricke@collabora.com;
> bob.beckett@collabora.com; dafna.hirschfeld@collabora.com
> Cc: linux-media@vger.kernel.org; linux-kernel@vger.kernel.org; lafley.kim
> <lafley.kim@chipsnmedia.com>; b-brnich@ti.com; hverkuil@xs4all.nl; Nas
> Chung <nas.chung@chipsnmedia.com>
> Subject: Re: [PATCH v2 7/7] media: chips-media: wave5: Fix SError of
> kernel panic when closed
> 
> Hi,
> 
> 
> Le jeudi 22 mai 2025 à 16:26 +0900, Jackson.lee a écrit :
> > From: Jackson Lee <jackson.lee@chipsnmedia.com>
> >
> > Since applying "Reduce high CPU load" patch, SError of kernel panic
> > rarely happened while testing fluster.
> > The root cause was to enter suspend mode because timeout of
> > autosuspend delay happened.
> 
> This would need to be done ahead of other patches, before it breaks.
> Toggling auto-suspend seems suspicious. I'm pretty sure this was always a
> bit fishy, so I'm fine with removing that. Normally get/put once
> everything is configured should be fine.
> 

Okay

Thanks
Jackson


> >
> > [   48.834439] SError Interrupt on CPU0, code 0x00000000bf000000 --
> > SError [   48.834455] CPU: 0 UID: 0 PID: 1067 Comm: v4l2h265dec0:sr
> > Not tainted 6.12.9-gc9e21a1ebd75-dirty #7 [   48.834461] Hardware
> > name: ti Texas Instruments J721S2 EVM/Texas Instruments J721S2 EVM,
> > BIOS 2025.01-00345- gbaf3aaa8ecfa 01/01/2025 [   48.834464] pstate:
> > 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [   48.834468]
> > pc : wave5_dec_clr_disp_flag+0x40/0x80 [wave5] [   48.834488] lr :
> > wave5_dec_clr_disp_flag+0x40/0x80 [wave5] [   48.834495] sp :
> > ffff8000856e3a30 [   48.834497] x29: ffff8000856e3a30 x28:
> > ffff0008093f6010 x27: ffff000809158130 [   48.834504] x26:
> > 0000000000000000 x25: ffff00080b625000 x24: ffff000804a9ba80 [
> > 48.834509] x23: ffff000802343028 x22: ffff000809158150 x21:
> > ffff000802218000 [   48.834513] x20: ffff0008093f6000 x19:
> > ffff0008093f6000 x18: 0000000000000000 [   48.834518] x17:
> > 0000000000000000 x16: 0000000000000000 x15: 0000ffff74009618 [
> > 48.834523] x14: 000000010000000c x13: 0000000000000000 x12:
> > 0000000000000000 [   48.834527] x11: ffffffffffffffff x10:
> > ffffffffffffffff x9 : ffff000802343028 [   48.834532] x8 :
> > ffff00080b6252a0 x7 : 0000000000000038 x6 : 0000000000000000 [
> > 48.834536] x5 : ffff00080b625060 x4 : 0000000000000000 x3 :
> > 0000000000000000 [   48.834541] x2 : 0000000000000000 x1 :
> > ffff800084bf0118 x0 : ffff800084bf0000 [   48.834547] Kernel panic -
> > not syncing: Asynchronous SError Interrupt [   48.834549] CPU: 0 UID:
> > 0 PID: 1067 Comm: v4l2h265dec0:sr Not tainted
> > 6.12.9-gc9e21a1ebd75-dirty #7
> 
> Hopefully you also test on mainline.
> 
> Nicolas
> 
> > [   48.834554] Hardware name: ti Texas Instruments J721S2 EVM/Texas
> > Instruments J721S2 EVM, BIOS 2025.01-00345- gbaf3aaa8ecfa 01/01/2025 [
> > 48.834556] Call trace:
> > [   48.834559]  dump_backtrace+0x94/0xec [   48.834574]
> > show_stack+0x18/0x24 [   48.834579]  dump_stack_lvl+0x38/0x90 [
> > 48.834585]  dump_stack+0x18/0x24 [   48.834588]  panic+0x35c/0x3e0 [
> > 48.834592]  nmi_panic+0x40/0x8c [   48.834595]
> > arm64_serror_panic+0x64/0x70 [   48.834598]  do_serror+0x3c/0x78 [
> > 48.834601]  el1h_64_error_handler+0x34/0x4c [   48.834605]
> > el1h_64_error+0x64/0x68 [   48.834608]
> > wave5_dec_clr_disp_flag+0x40/0x80 [wave5] [   48.834615]
> > wave5_vpu_dec_clr_disp_flag+0x54/0x80 [wave5] [   48.834622]
> > wave5_vpu_dec_buf_queue+0x19c/0x1a0 [wave5] [   48.834628]
> > __enqueue_in_driver+0x3c/0x74 [videobuf2_common] [   48.834639]
> > vb2_core_qbuf+0x508/0x61c [videobuf2_common] [   48.834646]
> > vb2_qbuf+0xa4/0x168 [videobuf2_v4l2] [   48.834656]
> > v4l2_m2m_qbuf+0x80/0x238 [v4l2_mem2mem] [   48.834666]
> > v4l2_m2m_ioctl_qbuf+0x18/0x24 [v4l2_mem2mem] [   48.834673]
> > v4l_qbuf+0x48/0x5c [videodev] [   48.834704]
> > __video_do_ioctl+0x180/0x3f0 [videodev] [   48.834725]
> > video_usercopy+0x2ec/0x68c [videodev] [   48.834745]
> > video_ioctl2+0x18/0x24 [videodev] [   48.834766]  v4l2_ioctl+0x40/0x60
> > [videodev] [   48.834786]  __arm64_sys_ioctl+0xa8/0xec [   48.834793]
> > invoke_syscall+0x44/0x100 [   48.834800]
> > el0_svc_common.constprop.0+0xc0/0xe0
> > [   48.834804]  do_el0_svc+0x1c/0x28
> > [   48.834809]  el0_svc+0x30/0xd0
> > [   48.834813]  el0t_64_sync_handler+0xc0/0xc4 [   48.834816]
> > el0t_64_sync+0x190/0x194 [   48.834820] SMP: stopping secondary CPUs [
> > 48.834831] Kernel Offset: disabled [   48.834833] CPU features:
> > 0x08,00002002,80200000,4200421b [   48.834837] Memory Limit: none [
> > 49.161404] ---[ end Kernel panic - not syncing: Asynchronous SError
> > Interrupt ]---
> >
> > Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
> > Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
> > ---
> >  .../platform/chips-media/wave5/wave5-vpu-dec.c   |  3 ---
> >  .../platform/chips-media/wave5/wave5-vpu-enc.c   |  3 ---
> >  .../media/platform/chips-media/wave5/wave5-vpu.c |  2 +-
> >  .../platform/chips-media/wave5/wave5-vpuapi.c    | 16
> > ----------------
> >  4 files changed, 1 insertion(+), 23 deletions(-)
> >
> > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > b/drivers/media/platform/chips- media/wave5/wave5-vpu-dec.c index
> > 421a9e1a6f15..b4b522d7fa84 100644
> > --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > @@ -1865,9 +1865,6 @@ static int wave5_vpu_open_dec(struct file *filp)
> >  	if (ret)
> >  		goto cleanup_inst;
> >
> > -	if (list_empty(&dev->instances))
> > -		pm_runtime_use_autosuspend(inst->dev->dev);
> > -
> >  	list_add_tail(&inst->list, &dev->instances);
> >
> >  	mutex_unlock(&dev->dev_lock);
> > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
> > b/drivers/media/platform/chips- media/wave5/wave5-vpu-enc.c index
> > 52a1a00fd9bb..7f1aa392805f 100644
> > --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
> > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
> > @@ -1779,9 +1779,6 @@ static int wave5_vpu_open_enc(struct file *filp)
> >  	if (ret)
> >  		goto cleanup_inst;
> >
> > -	if (list_empty(&dev->instances))
> > -		pm_runtime_use_autosuspend(inst->dev->dev);
> > -
> >  	list_add_tail(&inst->list, &dev->instances);
> >
> >  	mutex_unlock(&dev->dev_lock);
> > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu.c
> > b/drivers/media/platform/chips-media/wave5/wave5-
> > vpu.c
> > index a2c09523d76b..24a9001966e7 100644
> > --- a/drivers/media/platform/chips-media/wave5/wave5-vpu.c
> > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu.c
> > @@ -368,7 +368,7 @@ static int wave5_vpu_probe(struct platform_device
> *pdev)
> >  	dev_info(&pdev->dev, "Product Code:      0x%x\n", dev-
> >product_code);
> >  	dev_info(&pdev->dev, "Firmware Revision: %u\n", fw_revision);
> >
> > -	pm_runtime_set_autosuspend_delay(&pdev->dev, 100);
> > +	pm_runtime_set_autosuspend_delay(&pdev->dev, 500);
> >  	pm_runtime_use_autosuspend(&pdev->dev);
> >  	pm_runtime_enable(&pdev->dev);
> >  	wave5_vpu_sleep_wake(&pdev->dev, true, NULL, 0); diff --git
> > a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> > b/drivers/media/platform/chips-media/wave5/wave5-
> > vpuapi.c
> > index d7318d596b73..1f7f4d214b3c 100644
> > --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> > @@ -207,8 +207,6 @@ int wave5_vpu_dec_close(struct vpu_instance *inst,
> u32 *fail_res)
> >  	int retry = 0;
> >  	struct vpu_device *vpu_dev = inst->dev;
> >  	int i;
> > -	int inst_count = 0;
> > -	struct vpu_instance *inst_elm;
> >  	struct dec_output_info dec_info;
> >
> >  	*fail_res = 0;
> > @@ -265,12 +263,6 @@ int wave5_vpu_dec_close(struct vpu_instance *inst,
> u32 *fail_res)
> >  	}
> >
> >  	wave5_vdi_free_dma_memory(vpu_dev, &p_dec_info->vb_task);
> > -
> > -	list_for_each_entry(inst_elm, &vpu_dev->instances, list)
> > -		inst_count++;
> > -	if (inst_count == 1)
> > -		pm_runtime_dont_use_autosuspend(vpu_dev->dev);
> > -
> >  	mutex_destroy(&inst->feed_lock);
> >
> >  unlock_and_return:
> > @@ -738,8 +730,6 @@ int wave5_vpu_enc_close(struct vpu_instance *inst,
> u32 *fail_res)
> >  	int ret;
> >  	int retry = 0;
> >  	struct vpu_device *vpu_dev = inst->dev;
> > -	int inst_count = 0;
> > -	struct vpu_instance *inst_elm;
> >
> >  	*fail_res = 0;
> >  	if (!inst->codec_info)
> > @@ -782,12 +772,6 @@ int wave5_vpu_enc_close(struct vpu_instance *inst,
> u32 *fail_res)
> >  	}
> >
> >  	wave5_vdi_free_dma_memory(vpu_dev, &p_enc_info->vb_task);
> > -
> > -	list_for_each_entry(inst_elm, &vpu_dev->instances, list)
> > -		inst_count++;
> > -	if (inst_count == 1)
> > -		pm_runtime_dont_use_autosuspend(vpu_dev->dev);
> > -
> >  	mutex_unlock(&vpu_dev->hw_lock);
> >  	pm_runtime_put_sync(inst->dev->dev);
> >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 1/7] media: chips-media: wave5: Fix Null reference while testing fluster
  2025-05-27  4:05     ` jackson.lee
@ 2025-05-27 12:57       ` Nicolas Dufresne
  0 siblings, 0 replies; 29+ messages in thread
From: Nicolas Dufresne @ 2025-05-27 12:57 UTC (permalink / raw)
  To: jackson.lee, mchehab@kernel.org, hverkuil-cisco@xs4all.nl,
	sebastian.fricke@collabora.com, bob.beckett@collabora.com,
	dafna.hirschfeld@collabora.com
  Cc: linux-media@vger.kernel.org, linux-kernel@vger.kernel.org,
	lafley.kim, b-brnich@ti.com, hverkuil@xs4all.nl, Nas Chung

Hi,

Le mardi 27 mai 2025 à 04:05 +0000, jackson.lee a écrit :
> Hi Nicolas
> 
> 
> > -----Original Message-----
> > From: Nicolas Dufresne <nicolas.dufresne@collabora.com>
> > Sent: Saturday, May 24, 2025 2:20 AM
> > To: jackson.lee <jackson.lee@chipsnmedia.com>; mchehab@kernel.org;
> > hverkuil-cisco@xs4all.nl; sebastian.fricke@collabora.com;
> > bob.beckett@collabora.com; dafna.hirschfeld@collabora.com
> > Cc: linux-media@vger.kernel.org; linux-kernel@vger.kernel.org; lafley.kim
> > <lafley.kim@chipsnmedia.com>; b-brnich@ti.com; hverkuil@xs4all.nl; Nas
> > Chung <nas.chung@chipsnmedia.com>
> > Subject: Re: [PATCH v2 1/7] media: chips-media: wave5: Fix Null reference
> > while testing fluster
> > 
> > Hi,
> > 
> > Le jeudi 22 mai 2025 à 16:26 +0900, Jackson.lee a écrit :
> > > From: Jackson Lee <jackson.lee@chipsnmedia.com>
> > > 
> > > When multi instances are created/destroyed, many interrupts happens or
> > > structures for decoder are removed.
> > > "struct vpu_instance" this structure is shared for all flow in
> > > decoder, so if the structure is not protected by lock, Null reference
> > > exception could happens sometimes.
> > > IRQ Handler was spilt to two phases and Lock was added as well.
> > > 
> > > Fixes: 9707a6254a8a ("media: chips-media: wave5: Add the v4l2 layer")
> > > Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
> > > Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
> > > ---
> > >  .../platform/chips-media/wave5/wave5-helper.c | 10 ++-
> > >  .../chips-media/wave5/wave5-vpu-dec.c         |  5 ++
> > >  .../chips-media/wave5/wave5-vpu-enc.c         |  5 ++
> > >  .../platform/chips-media/wave5/wave5-vpu.c    | 69
> > > ++++++++++++++++---
> > >  .../platform/chips-media/wave5/wave5-vpuapi.h |  6 ++
> > >  5 files changed, 86 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/drivers/media/platform/chips-media/wave5/wave5-helper.c
> > > b/drivers/media/platform/chips-media/wave5/wave5-
> > > helper.c
> > > index 2c9d8cbca6e4..5d9969bb7ada 100644
> > > --- a/drivers/media/platform/chips-media/wave5/wave5-helper.c
> > > +++ b/drivers/media/platform/chips-media/wave5/wave5-helper.c
> > > @@ -49,7 +49,7 @@ void wave5_cleanup_instance(struct vpu_instance *inst)
> > >  		v4l2_fh_del(&inst->v4l2_fh);
> > >  		v4l2_fh_exit(&inst->v4l2_fh);
> > >  	}
> > > -	list_del_init(&inst->list);
> > > +	kfifo_free(&inst->irq_status);
> > >  	ida_free(&inst->dev->inst_ida, inst->id);
> > >  	kfree(inst->codec_info);
> > >  	kfree(inst);
> > > @@ -61,8 +61,16 @@ int wave5_vpu_release_device(struct file *filp,
> > >  {
> > >  	struct vpu_instance *inst = wave5_to_vpu_inst(filp->private_data);
> > >  	int ret = 0;
> > > +	unsigned long flags;
> > > 
> > >  	v4l2_m2m_ctx_release(inst->v4l2_fh.m2m_ctx);
> > > +	ret = mutex_lock_interruptible(&inst->dev->irq_lock);
> > > +	if (ret)
> > > +		return ret;
> > 
> > This is quite some heavy locking, why do you need both the mutex and the
> > spinlock ?
> 
> 
> To prevent Null reference exception, the existing irq handler were separated to two modules
> One is to queue interrupt reason in the irq hander, the other is to call  the wave5_vpu_dec_finish_decode to get
> decoded frame.
> The list of instances should be protected between all flow of the decoding process, but to protect the list in the
> irq_handler, spin lock should be used, and 
> mutex should be used in the irq_thread because spin lock is not able to be used because mutex is being used in the
> wave5_vpu_dec_finish_decode.
> So to protect the list in the release function, spin lock and mutex were used.

I don't have a better solution without massively refactoring this driver,
so I'd say its fine. But all this explanation should be found in the code
as comment as its not obvious and quite easy to break.

regards,
Nicolas

> 
> 
> 
> > 
> > > +	spin_lock_irqsave(&inst->dev->irq_spinlock, flags);
> > > +	list_del_init(&inst->list);
> > > +	spin_unlock_irqrestore(&inst->dev->irq_spinlock, flags);
> > > +	mutex_unlock(&inst->dev->irq_lock);
> > >  	if (inst->state != VPU_INST_STATE_NONE) {
> > >  		u32 fail_res;
> > > 
> > > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > > b/drivers/media/platform/chips- media/wave5/wave5-vpu-dec.c index
> > > fd71f0c43ac3..32de43de1870 100644
> > > --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > > @@ -1811,6 +1811,11 @@ static int wave5_vpu_open_dec(struct file *filp)
> > >  	inst->xfer_func = V4L2_XFER_FUNC_DEFAULT;
> > > 
> > >  	init_completion(&inst->irq_done);
> > > +	ret = kfifo_alloc(&inst->irq_status, 16 * sizeof(int), GFP_KERNEL);
> > > +	if (ret) {
> > > +		dev_err(inst->dev->dev, "failed to allocate fifo\n");
> > > +		goto cleanup_inst;
> > > +	}
> > > 
> > >  	inst->id = ida_alloc(&inst->dev->inst_ida, GFP_KERNEL);
> > >  	if (inst->id < 0) {
> > > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
> > > b/drivers/media/platform/chips- media/wave5/wave5-vpu-enc.c index
> > > 1e5fc5f8b856..52a1a00fd9bb 100644
> > > --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
> > > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
> > > @@ -1760,6 +1760,11 @@ static int wave5_vpu_open_enc(struct file *filp)
> > >  	inst->frame_rate = 30;
> > > 
> > >  	init_completion(&inst->irq_done);
> > > +	ret = kfifo_alloc(&inst->irq_status, 16 * sizeof(int), GFP_KERNEL);
> > > +	if (ret) {
> > > +		dev_err(inst->dev->dev, "failed to allocate fifo\n");
> > > +		goto cleanup_inst;
> > > +	}
> > > 
> > >  	inst->id = ida_alloc(&inst->dev->inst_ida, GFP_KERNEL);
> > >  	if (inst->id < 0) {
> > > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu.c
> > > b/drivers/media/platform/chips-media/wave5/wave5-
> > > vpu.c
> > > index e1715d3f43b0..a2c09523d76b 100644
> > > --- a/drivers/media/platform/chips-media/wave5/wave5-vpu.c
> > > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu.c
> > > @@ -51,8 +51,11 @@ static void wave5_vpu_handle_irq(void *dev_id)
> > >  	u32 seq_done;
> > >  	u32 cmd_done;
> > >  	u32 irq_reason;
> > > -	struct vpu_instance *inst;
> > > +	u32 irq_subreason;
> > > +	struct vpu_instance *inst, *tmp;
> > >  	struct vpu_device *dev = dev_id;
> > > +	int val;
> > > +	unsigned long flags;
> > > 
> > >  	irq_reason = wave5_vdi_read_register(dev, W5_VPU_VINT_REASON);
> > >  	seq_done = wave5_vdi_read_register(dev,
> > > W5_RET_SEQ_DONE_INSTANCE_INFO); @@ -60,7 +63,8 @@ static void
> > wave5_vpu_handle_irq(void *dev_id)
> > >  	wave5_vdi_write_register(dev, W5_VPU_VINT_REASON_CLR, irq_reason);
> > >  	wave5_vdi_write_register(dev, W5_VPU_VINT_CLEAR, 0x1);
> > > 
> > > -	list_for_each_entry(inst, &dev->instances, list) {
> > > +	spin_lock_irqsave(&dev->irq_spinlock, flags);
> > > +	list_for_each_entry_safe(inst, tmp, &dev->instances, list) {
> > > 
> > >  		if (irq_reason & BIT(INT_WAVE5_INIT_SEQ) ||
> > >  		    irq_reason & BIT(INT_WAVE5_ENC_SET_PARAM)) { @@ -82,14
> > +86,22
> > > @@ static void wave5_vpu_handle_irq(void *dev_id)
> > >  		    irq_reason & BIT(INT_WAVE5_ENC_PIC)) {
> > >  			if (cmd_done & BIT(inst->id)) {
> > >  				cmd_done &= ~BIT(inst->id);
> > > -				wave5_vdi_write_register(dev,
> > W5_RET_QUEUE_CMD_DONE_INST,
> > > -							 cmd_done);
> > > -				inst->ops->finish_process(inst);
> > > +				if (dev->irq >= 0) {
> > > +					irq_subreason =
> > > +						wave5_vdi_read_register(dev,
> > W5_VPU_VINT_REASON);
> > > +					if (!(irq_subreason &
> > BIT(INT_WAVE5_DEC_PIC)))
> > > +						wave5_vdi_write_register(dev,
> > > +
> > W5_RET_QUEUE_CMD_DONE_INST,
> > > +									 cmd_done);
> > > +				}
> > > +				val = BIT(INT_WAVE5_DEC_PIC);
> > > +				kfifo_in(&inst->irq_status, &val, sizeof(int));
> > >  			}
> > >  		}
> > > -
> > > -		wave5_vpu_clear_interrupt(inst, irq_reason);
> > >  	}
> > > +	spin_unlock_irqrestore(&dev->irq_spinlock, flags);
> > > +
> > > +	up(&dev->irq_sem);
> > >  }
> > > 
> > >  static irqreturn_t wave5_vpu_irq_thread(int irq, void *dev_id) @@
> > > -121,6 +133,35 @@ static enum hrtimer_restart
> > wave5_vpu_timer_callback(struct hrtimer *timer)
> > >  	return HRTIMER_RESTART;
> > >  }
> > > 
> > > +static int irq_thread(void *data)
> > > +{
> > > +	struct vpu_device *dev = (struct vpu_device *)data;
> > > +	struct vpu_instance *inst, *tmp;
> > > +	int irq_status, ret;
> > > +
> > > +	while (!kthread_should_stop()) {
> > > +		if (down_interruptible(&dev->irq_sem))
> > > +			continue;
> > > +
> > > +		if (kthread_should_stop())
> > > +			break;
> > > +
> > > +		mutex_lock(&dev->irq_lock);
> > > +		list_for_each_entry_safe(inst, tmp, &dev->instances, list) {
> > > +			while (kfifo_len(&inst->irq_status)) {
> > > +				ret = kfifo_out(&inst->irq_status, &irq_status,
> > sizeof(int));
> > > +				if (!ret)
> > > +					break;
> > > +
> > > +				inst->ops->finish_process(inst);
> > > +			}
> > > +		}
> > > +		mutex_unlock(&dev->irq_lock);
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > > +
> > >  static int wave5_vpu_load_firmware(struct device *dev, const char
> > *fw_name,
> > >  				   u32 *revision)
> > >  {
> > > @@ -224,6 +265,8 @@ static int wave5_vpu_probe(struct platform_device
> > > *pdev)
> > > 
> > >  	mutex_init(&dev->dev_lock);
> > >  	mutex_init(&dev->hw_lock);
> > > +	mutex_init(&dev->irq_lock);
> > > +	spin_lock_init(&dev->irq_spinlock);
> > >  	dev_set_drvdata(&pdev->dev, dev);
> > >  	dev->dev = &pdev->dev;
> > > 
> > > @@ -266,6 +309,10 @@ static int wave5_vpu_probe(struct platform_device
> > *pdev)
> > >  	}
> > >  	dev->product = wave5_vpu_get_product_id(dev);
> > > 
> > > +	sema_init(&dev->irq_sem, 1);
> > > +	INIT_LIST_HEAD(&dev->instances);
> > > +	dev->irq_thread = kthread_run(irq_thread, dev, "irq thread");
> > > +
> > >  	dev->irq = platform_get_irq(pdev, 0);
> > >  	if (dev->irq < 0) {
> > >  		dev_err(&pdev->dev, "failed to get irq resource, falling
> > back to
> > > polling\n"); @@ -288,7 +335,6 @@ static int wave5_vpu_probe(struct
> > platform_device *pdev)
> > >  		}
> > >  	}
> > > 
> > > -	INIT_LIST_HEAD(&dev->instances);
> > >  	ret = v4l2_device_register(&pdev->dev, &dev->v4l2_dev);
> > >  	if (ret) {
> > >  		dev_err(&pdev->dev, "v4l2_device_register, fail: %d\n", ret);
> > @@
> > > -351,6 +397,12 @@ static void wave5_vpu_remove(struct platform_device
> > > *pdev)
> > >  {
> > >  	struct vpu_device *dev = dev_get_drvdata(&pdev->dev);
> > > 
> > > +	if (dev->irq_thread) {
> > > +		kthread_stop(dev->irq_thread);
> > > +		up(&dev->irq_sem);
> > > +		dev->irq_thread = NULL;
> > > +	}
> > > +
> > >  	if (dev->irq < 0) {
> > >  		kthread_destroy_worker(dev->worker);
> > >  		hrtimer_cancel(&dev->hrtimer);
> > > @@ -361,6 +413,7 @@ static void wave5_vpu_remove(struct
> > > platform_device *pdev)
> > > 
> > >  	mutex_destroy(&dev->dev_lock);
> > >  	mutex_destroy(&dev->hw_lock);
> > > +	mutex_destroy(&dev->irq_lock);
> > >  	reset_control_assert(dev->resets);
> > >  	clk_bulk_disable_unprepare(dev->num_clks, dev->clks);
> > >  	wave5_vpu_enc_unregister_device(dev);
> > > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> > > b/drivers/media/platform/chips-media/wave5/wave5-
> > > vpuapi.h
> > > index 45615c15beca..f3c1ad6fb3be 100644
> > > --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> > > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> > > @@ -8,6 +8,7 @@
> > >  #ifndef VPUAPI_H_INCLUDED
> > >  #define VPUAPI_H_INCLUDED
> > > 
> > > +#include <linux/kfifo.h>
> > >  #include <linux/idr.h>
> > >  #include <linux/genalloc.h>
> > >  #include <media/v4l2-device.h>
> > > @@ -747,6 +748,7 @@ struct vpu_device {
> > >  	struct video_device *video_dev_enc;
> > >  	struct mutex dev_lock; /* lock for the src, dst v4l2 queues */
> > >  	struct mutex hw_lock; /* lock hw configurations */
> > > +	struct mutex irq_lock;
> > >  	int irq;
> > >  	enum product_id product;
> > >  	struct vpu_attr attr;
> > > @@ -764,7 +766,10 @@ struct vpu_device {
> > >  	struct kthread_worker *worker;
> > >  	int vpu_poll_interval;
> > >  	int num_clks;
> > > +	struct task_struct *irq_thread;
> > > +	struct semaphore irq_sem;
> > 
> > Can you comment on what they actually protect ?
> 
> Okay
> 
> > 
> > >  	struct reset_control *resets;
> > > +	spinlock_t irq_spinlock; /* protect instances list */
> > >  };
> > > 
> > >  struct vpu_instance;
> > > @@ -788,6 +793,7 @@ struct vpu_instance {
> > >  	enum v4l2_ycbcr_encoding ycbcr_enc;
> > >  	enum v4l2_quantization quantization;
> > > 
> > > +	struct kfifo irq_status;
> > >  	enum vpu_instance_state state;
> > >  	enum vpu_instance_type type;
> > >  	const struct vpu_instance_ops *ops;

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 2/7] media: chips-media: wave5: Improve performance of decoder
  2025-05-27  4:58     ` jackson.lee
@ 2025-05-28 13:46       ` Nicolas Dufresne
  2025-06-04  4:09         ` jackson.lee
  2025-05-30 14:33       ` Nicolas Dufresne
  1 sibling, 1 reply; 29+ messages in thread
From: Nicolas Dufresne @ 2025-05-28 13:46 UTC (permalink / raw)
  To: jackson.lee, mchehab@kernel.org, hverkuil-cisco@xs4all.nl,
	bob.beckett@collabora.com
  Cc: linux-media@vger.kernel.org, linux-kernel@vger.kernel.org,
	lafley.kim, b-brnich@ti.com, hverkuil@xs4all.nl, Nas Chung

Hi,

Le mardi 27 mai 2025 à 04:58 +0000, jackson.lee a écrit :
> 
> 
> > -----Original Message-----
> > From: Nicolas Dufresne <nicolas.dufresne@collabora.com>
> > Sent: Saturday, May 24, 2025 2:39 AM
> > To: jackson.lee <jackson.lee@chipsnmedia.com>; mchehab@kernel.org;
> > hverkuil-cisco@xs4all.nl; sebastian.fricke@collabora.com;
> > bob.beckett@collabora.com; dafna.hirschfeld@collabora.com
> > Cc: linux-media@vger.kernel.org; linux-kernel@vger.kernel.org; lafley.kim
> > <lafley.kim@chipsnmedia.com>; b-brnich@ti.com; hverkuil@xs4all.nl; Nas
> > Chung <nas.chung@chipsnmedia.com>
> > Subject: Re: [PATCH v2 2/7] media: chips-media: wave5: Improve performance
> > of decoder
> > 
> > Hi,
> > 
> > Le jeudi 22 mai 2025 à 16:26 +0900, Jackson.lee a écrit :
> > > From: Jackson Lee <jackson.lee@chipsnmedia.com>
> > > 
> > > The current decoding method  was to wait until each frame was decoded
> > > after feeding a bitstream. As a result, performance was low and Wave5
> > > could not achieve max pixel processing rate.
> > > 
> > > Update driver to use an asynchronous approach for decoding and feeding
> > > a bitstream in order to achieve full capabilities of the device.
> > > 
> > > WAVE5 supports command-queueing to maximize performance by pipelining
> > > internal commands and by hiding wait cycle taken to receive a command
> > > from Host processor.
> > > 
> > > Instead of waiting for each command to be executed before sending the
> > > next command, Host processor just places all the commands in the
> > > command-queue and goes on doing other things while the commands in the
> > > queue are processed by VPU.
> > > 
> > > While Host processor handles its own tasks, it can receive VPU
> > > interrupt request (IRQ).
> > > In this case, host processor can simply exit interrupt service routine
> > > (ISR) without accessing to host interface to read the result of the
> > > command reported by VPU.
> > > After host processor completed its tasks, host processor can read the
> > > command result when host processor needs the reports and does response
> > > processing.
> > > 
> > > To archive this goal, the device_run() calls v4l2_m2m_job_finish so
> > > that next command can be sent to VPU continuously, if there is any
> > > result, then irq is triggered and gets decoded frames and returns them
> > > to upper layer.
> > > Theses processes work independently each other without waiting a
> > > decoded frame.
> > > 
> > > Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
> > > Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
> > > ---
> > >  .../platform/chips-media/wave5/wave5-hw.c     |  2 +-
> > >  .../chips-media/wave5/wave5-vpu-dec.c         | 84
> > > +++++++++++--------
> > >  .../platform/chips-media/wave5/wave5-vpuapi.c |  2 +
> > >  .../platform/chips-media/wave5/wave5-vpuapi.h |  3 +
> > >  4 files changed, 57 insertions(+), 34 deletions(-)
> > > 
> > > diff --git a/drivers/media/platform/chips-media/wave5/wave5-hw.c
> > > b/drivers/media/platform/chips-media/wave5/wave5-hw.c
> > > index d94cf84c3ee5..687ce6ccf3ae 100644
> > > --- a/drivers/media/platform/chips-media/wave5/wave5-hw.c
> > > +++ b/drivers/media/platform/chips-media/wave5/wave5-hw.c
> > > @@ -102,7 +102,7 @@ static void _wave5_print_reg_err(struct vpu_device
> > *vpu_dev, u32 reg_fail_reason
> > >  		dev_dbg(dev, "%s: queueing failure: 0x%x\n", func, reg_val);
> > >  		break;
> > >  	case WAVE5_SYSERR_RESULT_NOT_READY:
> > > -		dev_err(dev, "%s: result not ready: 0x%x\n", func,
> > reg_fail_reason);
> > > +		dev_dbg(dev, "%s: result not ready: 0x%x\n", func,
> > > +reg_fail_reason);
> > >  		break;
> > >  	case WAVE5_SYSERR_ACCESS_VIOLATION_HW:
> > >  		dev_err(dev, "%s: access violation: 0x%x\n", func,
> > > reg_fail_reason); diff --git
> > > a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > > b/drivers/media/platform/chips- media/wave5/wave5-vpu-dec.c index
> > > 32de43de1870..995234a3a6d6 100644
> > > --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > > @@ -347,13 +347,12 @@ static void wave5_vpu_dec_finish_decode(struct
> > vpu_instance *inst)
> > >  	struct vb2_v4l2_buffer *dec_buf = NULL;
> > >  	struct vb2_v4l2_buffer *disp_buf = NULL;
> > >  	struct vb2_queue *dst_vq = v4l2_m2m_get_dst_vq(m2m_ctx);
> > > -	struct queue_status_info q_status;
> > > 
> > >  	dev_dbg(inst->dev->dev, "%s: Fetch output info from firmware.",
> > > __func__);
> > > 
> > >  	ret = wave5_vpu_dec_get_output_info(inst, &dec_info);
> > >  	if (ret) {
> > > -		dev_warn(inst->dev->dev, "%s: could not get output info.",
> > __func__);
> > > +		dev_dbg(inst->dev->dev, "%s: could not get output info.",
> > > +__func__);
> > 
> > Wouldn't it be better to check the return value to possibly differentiate
> > some errors from something similar to EGAIN?
> > 
> 
> In case of this, when get_result command is requested to VPU, there could be no output.
> So it is not error case and EAGIAN is not proper because the wave5_vpu_dec_finish_decode is triggered by PIC_Done
> interrupt.
> So I think it is proper code.

My worry is that you are silencing possible problems. I checked further, it may return
EINVAL, EIO and EAGAIN and whatever read_poll_timeout() returns on timeout. For most, this
trace was noise, so now I agree with the change, but can you go find the EINVAL and add
a warn_on around it, since this is a programming error, it would happen if dec_info is
a null pointer. Can be patch in itself, before this one.

> 
> 
> > >  		v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
> > >  		return;
> > >  	}
> > > @@ -441,20 +440,6 @@ static void wave5_vpu_dec_finish_decode(struct
> > vpu_instance *inst)
> > >  		}
> > >  		spin_unlock_irqrestore(&inst->state_spinlock, flags);
> > >  	}
> > > -
> > > -	/*
> > > -	 * During a resolution change and while draining, the firmware may
> > flush
> > > -	 * the reorder queue regardless of having a matching decoding
> > operation
> > > -	 * pending. Only terminate the job if there are no more IRQ coming.
> > > -	 */
> > > -	wave5_vpu_dec_give_command(inst, DEC_GET_QUEUE_STATUS, &q_status);
> > > -	if (q_status.report_queue_count == 0 &&
> > > -	    (q_status.instance_queue_count == 0 ||
> > dec_info.sequence_changed)) {
> > > -		dev_dbg(inst->dev->dev, "%s: finishing job.\n", __func__);
> > > -		pm_runtime_mark_last_busy(inst->dev->dev);
> > > -		pm_runtime_put_autosuspend(inst->dev->dev);
> > > -		v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
> > > -	}
> > >  }
> > > 
> > >  static int wave5_vpu_dec_querycap(struct file *file, void *fh, struct
> > > v4l2_capability *cap) @@ -1146,8 +1131,8 @@ static int
> > > write_to_ringbuffer(struct vpu_instance *inst, void *buffer, size_t b
> > >  static int fill_ringbuffer(struct vpu_instance *inst)
> > >  {
> > >  	struct v4l2_m2m_ctx *m2m_ctx = inst->v4l2_fh.m2m_ctx;
> > > -	struct v4l2_m2m_buffer *buf, *n;
> > > -	int ret;
> > > +	struct vpu_src_buffer *vpu_buf;
> > > +	int ret = 0;
> > > 
> > >  	if (m2m_ctx->last_src_buf)  {
> > >  		struct vpu_src_buffer *vpu_buf =
> > > wave5_to_vpu_src_buf(m2m_ctx->last_src_buf);
> > > @@ -1158,9 +1143,8 @@ static int fill_ringbuffer(struct vpu_instance
> > *inst)
> > >  		}
> > >  	}
> > > 
> > > -	v4l2_m2m_for_each_src_buf_safe(m2m_ctx, buf, n) {
> > > -		struct vb2_v4l2_buffer *vbuf = &buf->vb;
> > > -		struct vpu_src_buffer *vpu_buf = wave5_to_vpu_src_buf(vbuf);
> > > +	list_for_each_entry(vpu_buf, &inst->avail_src_bufs, list) {
> > > +		struct vb2_v4l2_buffer *vbuf = &vpu_buf->v4l2_m2m_buf.vb;
> > >  		struct vpu_buf *ring_buffer = &inst->bitstream_vbuf;
> > >  		size_t src_size = vb2_get_plane_payload(&vbuf->vb2_buf, 0);
> > >  		void *src_buf = vb2_plane_vaddr(&vbuf->vb2_buf, 0); @@ -
> > 1220,9
> > > +1204,13 @@ static int fill_ringbuffer(struct vpu_instance *inst)
> > >  			dev_dbg(inst->dev->dev, "last src buffer written to
> > the ring buffer\n");
> > >  			break;
> > >  		}
> > > +
> > > +		inst->queuing_num++;
> > > +		list_del_init(&vpu_buf->list);
> > > +		break;
> > >  	}
> > > 
> > > -	return 0;
> > > +	return ret;
> > >  }
> > > 
> > >  static void wave5_vpu_dec_buf_queue_src(struct vb2_buffer *vb) @@
> > > -1236,6 +1224,11 @@ static void wave5_vpu_dec_buf_queue_src(struct
> > vb2_buffer *vb)
> > >  	vbuf->sequence = inst->queued_src_buf_num++;
> > > 
> > >  	v4l2_m2m_buf_queue(m2m_ctx, vbuf);
> > > +
> > > +	INIT_LIST_HEAD(&vpu_buf->list);
> > > +	mutex_lock(&inst->feed_lock);
> > > +	list_add_tail(&vpu_buf->list, &inst->avail_src_bufs);
> > > +	mutex_unlock(&inst->feed_lock);
> > >  }
> > > 
> > >  static void wave5_vpu_dec_buf_queue_dst(struct vb2_buffer *vb) @@
> > > -1385,6 +1378,13 @@ static int streamoff_output(struct vb2_queue *q)
> > >  	dma_addr_t new_rd_ptr;
> > >  	struct dec_output_info dec_info;
> > >  	unsigned int i;
> > > +	struct vpu_src_buffer *vpu_buf, *tmp;
> > > +
> > > +	inst->retry = false;
> > > +	inst->queuing_num = 0;
> > > +
> > > +	list_for_each_entry_safe(vpu_buf, tmp, &inst->avail_src_bufs, list)
> > > +		list_del_init(&vpu_buf->list);
> > > 
> > >  	for (i = 0; i < v4l2_m2m_num_dst_bufs_ready(m2m_ctx); i++) {
> > >  		ret = wave5_vpu_dec_set_disp_flag(inst, i); @@ -1580,10
> > +1580,19 @@
> > > static void wave5_vpu_dec_device_run(void *priv)
> > > 
> > >  	dev_dbg(inst->dev->dev, "%s: Fill the ring buffer with new
> > bitstream data", __func__);
> > >  	pm_runtime_resume_and_get(inst->dev->dev);
> > > -	ret = fill_ringbuffer(inst);
> > > -	if (ret) {
> > > -		dev_warn(inst->dev->dev, "Filling ring buffer failed\n");
> > > -		goto finish_job_and_return;
> > > +	if (!inst->retry) {
> > > +		mutex_lock(&inst->feed_lock);
> > > +		ret = fill_ringbuffer(inst);
> > > +		mutex_unlock(&inst->feed_lock);
> > > +		if (ret < 0) {
> > > +			dev_warn(inst->dev->dev, "Filling ring buffer
> > failed\n");
> > > +			goto finish_job_and_return;
> > > +		} else if (!inst->eos &&
> > > +				inst->queuing_num == 0 &&
> > > +				inst->state == VPU_INST_STATE_PIC_RUN) {
> > > +			dev_dbg(inst->dev->dev, "%s: no bitstream for feeding,
> > so skip ", __func__);
> > > +			goto finish_job_and_return;
> > > +		}
> > >  	}
> > > 
> > >  	switch (inst->state) {
> > > @@ -1639,7 +1648,7 @@ static void wave5_vpu_dec_device_run(void *priv)
> > >  		}
> > > 
> > >  		if (q_status.instance_queue_count) {
> > > -			dev_dbg(inst->dev->dev, "%s: leave with active job",
> > __func__);
> > > +			v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
> > >  			return;
> > >  		}
> > > 
> > > @@ -1650,14 +1659,21 @@ static void wave5_vpu_dec_device_run(void *priv)
> > >  			dev_err(inst->dev->dev,
> > >  				"Frame decoding on m2m context (%p), fail: %d
> > (result: %d)\n",
> > >  				m2m_ctx, ret, fail_res);
> > > -			break;
> > > +			goto finish_job_and_return;
> > > +		}
> > > +
> > > +		if (fail_res == WAVE5_SYSERR_QUEUEING_FAIL) {
> > > +			inst->retry = true;
> > > +		} else {
> > > +			inst->retry = false;
> > > +			if (!inst->eos)
> > > +				inst->queuing_num--;
> > 
> > I looked into the original state machine violation you had in previous
> > version, and I got the impression that the reason you did hit that was
> > that you actually call device_run passed inst->eos. Its probably not that
> > simple in practice, but I think you forgot to adapt the job_ready() ops to
> > prevent more device_run() called passed CMD_STOP and having all pending
> > buffer written in the ring buffer.
> > 
> > As a side effect, you endup calling device_run() in a race with the
> > finish() setting the state to STOP. I really think there is a way to use
> > inst->eos boolean to prevent that race in the first place. Might need to
> > be combined with checking if you have buffers prior to command stop that
> > did not yet fit into the ring buffer.
> > 
> 
> 
> The queuing_num is used to check if there is input data or not, so it was declared in the int type.
> If there is no input data, then the device_run will be not called until queuing input data.
> In case of eos sent, device_run should be ran continuously until getting eos from VPU, the code was needed.
> If my answer is not correct, please let me know.

Running in loop anything is never the right approach. The device_run() should be run
when a useful event occur and filtered by the job_ready() ops. I believe I'm proposing some
hint how to solve this design issue. The issue is quite clear with the follow up patch trying
to reduce the CPU usage due to spinning.

> 
> 
> > >  		}
> > > -		/* Return so that we leave this job active */
> > > -		dev_dbg(inst->dev->dev, "%s: leave with active job",
> > __func__);
> > > -		return;
> > > -	default:
> > > -		WARN(1, "Execution of a job in state %s illegal.\n",
> > state_to_str(inst->state));
> > >  		break;
> > > +	default:
> > > +		dev_dbg(inst->dev->dev, "Execution of a job in state %s
> > illegal.\n",
> > > +			state_to_str(inst->state));
> > > +
> > >  	}
> > > 
> > >  finish_job_and_return:
> > > @@ -1755,6 +1771,8 @@ static int wave5_vpu_open_dec(struct file *filp)
> > >  	inst->ops = &wave5_vpu_dec_inst_ops;
> > > 
> > >  	spin_lock_init(&inst->state_spinlock);
> > > +	mutex_init(&inst->feed_lock);
> > > +	INIT_LIST_HEAD(&inst->avail_src_bufs);
> > > 
> > >  	inst->codec_info = kzalloc(sizeof(*inst->codec_info), GFP_KERNEL);
> > >  	if (!inst->codec_info)
> > > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> > > b/drivers/media/platform/chips-media/wave5/wave5-
> > > vpuapi.c
> > > index e5e879a13e8b..68d86625538f 100644
> > > --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> > > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> > > @@ -255,6 +255,8 @@ int wave5_vpu_dec_close(struct vpu_instance *inst,
> > u32 *fail_res)
> > >  	if (inst_count == 1)
> > >  		pm_runtime_dont_use_autosuspend(vpu_dev->dev);
> > > 
> > > +	mutex_destroy(&inst->feed_lock);
> > > +
> > >  unlock_and_return:
> > >  	mutex_unlock(&vpu_dev->hw_lock);
> > >  	pm_runtime_put_sync(inst->dev->dev);
> > > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> > > b/drivers/media/platform/chips-media/wave5/wave5-
> > > vpuapi.h
> > > index f3c1ad6fb3be..fd0aef0bac4e 100644
> > > --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> > > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> > > @@ -818,6 +818,9 @@ struct vpu_instance {
> > >  	bool cbcr_interleave;
> > >  	bool nv21;
> > >  	bool eos;
> > > +	bool retry; /* retry to feed bitstream if failure reason is
> > WAVE5_SYSERR_QUEUEING_FAIL*/
> > > +	int queuing_num; /* check if there is input buffer or not */
> > 
> > This is described as a boolean, but is implemented as a counter. What does
> > it count exactly ?
> > I think it needs a better name too.
> > 
> 
> Please refer to the above comment.

That won't work for me, I'm requesting you to correct the comment to properly
say this is a counter.

Nicolas

> 
> Thanks
> Jackson
> 
> 
> > Nicolas
> > 
> > > +	struct mutex feed_lock; /* lock for feeding bitstream buffers */
> > >  	struct vpu_buf bitstream_vbuf;
> > >  	dma_addr_t last_rd_ptr;
> > >  	size_t remaining_consumed_bytes;

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 4/7] media: chips-media: wave5: Use spinlock whenever statue is changed
  2025-05-27  5:02     ` jackson.lee
@ 2025-05-28 13:49       ` Nicolas Dufresne
  0 siblings, 0 replies; 29+ messages in thread
From: Nicolas Dufresne @ 2025-05-28 13:49 UTC (permalink / raw)
  To: jackson.lee, mchehab@kernel.org, hverkuil-cisco@xs4all.nl,
	bob.beckett@collabora.com
  Cc: linux-media@vger.kernel.org, linux-kernel@vger.kernel.org,
	lafley.kim, b-brnich@ti.com, hverkuil@xs4all.nl, Nas Chung

Le mardi 27 mai 2025 à 05:02 +0000, jackson.lee a écrit :
> Hi Nicolas
> 
> > -----Original Message-----
> > From: Nicolas Dufresne <nicolas.dufresne@collabora.com>
> > Sent: Saturday, May 24, 2025 2:41 AM
> > To: jackson.lee <jackson.lee@chipsnmedia.com>; mchehab@kernel.org;
> > hverkuil-cisco@xs4all.nl; sebastian.fricke@collabora.com;
> > bob.beckett@collabora.com; dafna.hirschfeld@collabora.com
> > Cc: linux-media@vger.kernel.org; linux-kernel@vger.kernel.org; lafley.kim
> > <lafley.kim@chipsnmedia.com>; b-brnich@ti.com; hverkuil@xs4all.nl; Nas
> > Chung <nas.chung@chipsnmedia.com>
> > Subject: Re: [PATCH v2 4/7] media: chips-media: wave5: Use spinlock
> > whenever statue is changed
> > 
> > Hi,
> > 
> > Le jeudi 22 mai 2025 à 16:26 +0900, Jackson.lee a écrit :
> > > From: Jackson Lee <jackson.lee@chipsnmedia.com>
> > > 
> > > The device_run and finish_decode is not any more synchronized, so lock
> > > was needed in the device_run whenever state was changed.
> > 
> > Can you try to introduce the locking ahead of the patches, otherwise this
> > break bisectability as the in-between become racy.
> 
> 
> Do you want to introduce this patch ahead of the performance patch?

I'm sure you can find the right anwser if you understand why I'm asking this. The way
patchset should be layout is so that at any step applying the set, the driver
should remain stable. If one patch above breaks something, and you fix it in the
next patch, this is not a bisectable set.

git bisect does not know about "sets" and shouldn't need to know about it either.

regards,
Nicolas

> 
> Thanks
> Jackson
> 
> > 
> > Nicolas
> > 
> > > 
> > > Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
> > > Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
> > > ---
> > >  drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c | 8 +++++++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > > b/drivers/media/platform/chips- media/wave5/wave5-vpu-dec.c index
> > > 42981c3b49bc..719c5527eb7f 100644
> > > --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > > @@ -1577,6 +1577,7 @@ static void wave5_vpu_dec_device_run(void *priv)
> > >  	struct queue_status_info q_status;
> > >  	u32 fail_res = 0;
> > >  	int ret = 0;
> > > +	unsigned long flags;
> > > 
> > >  	dev_dbg(inst->dev->dev, "%s: Fill the ring buffer with new
> > bitstream data", __func__);
> > >  	pm_runtime_resume_and_get(inst->dev->dev);
> > > @@ -1617,7 +1618,9 @@ static void wave5_vpu_dec_device_run(void *priv)
> > >  			}
> > >  			spin_unlock_irqrestore(&inst->state_spinlock, flags);
> > >  		} else {
> > > +			spin_lock_irqsave(&inst->state_spinlock, flags);
> > >  			switch_state(inst, VPU_INST_STATE_INIT_SEQ);
> > > +			spin_unlock_irqrestore(&inst->state_spinlock, flags);
> > >  		}
> > > 
> > >  		break;
> > > @@ -1628,8 +1631,9 @@ static void wave5_vpu_dec_device_run(void *priv)
> > >  		 * we had a chance to switch, which leads to an invalid state
> > >  		 * change.
> > >  		 */
> > > +		spin_lock_irqsave(&inst->state_spinlock, flags);
> > >  		switch_state(inst, VPU_INST_STATE_PIC_RUN);
> > > -
> > > +		spin_unlock_irqrestore(&inst->state_spinlock, flags);
> > >  		/*
> > >  		 * During DRC, the picture decoding remains pending, so just
> > leave the job
> > >  		 * active until this decode operation completes.
> > > @@ -1643,7 +1647,9 @@ static void wave5_vpu_dec_device_run(void *priv)
> > >  		ret = wave5_prepare_fb(inst);
> > >  		if (ret) {
> > >  			dev_warn(inst->dev->dev, "Framebuffer preparation,
> > fail: %d\n",
> > > ret);
> > > +			spin_lock_irqsave(&inst->state_spinlock, flags);
> > >  			switch_state(inst, VPU_INST_STATE_STOP);
> > > +			spin_unlock_irqrestore(&inst->state_spinlock, flags);
> > >  			break;
> > >  		}
> > > 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 2/7] media: chips-media: wave5: Improve performance of decoder
  2025-05-27  4:58     ` jackson.lee
  2025-05-28 13:46       ` Nicolas Dufresne
@ 2025-05-30 14:33       ` Nicolas Dufresne
  1 sibling, 0 replies; 29+ messages in thread
From: Nicolas Dufresne @ 2025-05-30 14:33 UTC (permalink / raw)
  To: jackson.lee, mchehab@kernel.org, hverkuil-cisco@xs4all.nl,
	bob.beckett@collabora.com
  Cc: linux-media@vger.kernel.org, linux-kernel@vger.kernel.org,
	lafley.kim, b-brnich@ti.com, hverkuil@xs4all.nl, Nas Chung

Hi,

Le mardi 27 mai 2025 à 04:58 +0000, jackson.lee a écrit :
> 
> 
> > -----Original Message-----
> > From: Nicolas Dufresne <nicolas.dufresne@collabora.com>
> > Sent: Saturday, May 24, 2025 2:39 AM
> > To: jackson.lee <jackson.lee@chipsnmedia.com>; mchehab@kernel.org;
> > hverkuil-cisco@xs4all.nl; sebastian.fricke@collabora.com;
> > bob.beckett@collabora.com; dafna.hirschfeld@collabora.com
> > Cc: linux-media@vger.kernel.org; linux-kernel@vger.kernel.org; lafley.kim
> > <lafley.kim@chipsnmedia.com>; b-brnich@ti.com; hverkuil@xs4all.nl; Nas
> > Chung <nas.chung@chipsnmedia.com>
> > Subject: Re: [PATCH v2 2/7] media: chips-media: wave5: Improve performance
> > of decoder
> > 
> > Hi,
> > 
> > Le jeudi 22 mai 2025 à 16:26 +0900, Jackson.lee a écrit :
> > > From: Jackson Lee <jackson.lee@chipsnmedia.com>
> > > 
> > > The current decoding method  was to wait until each frame was decoded
> > > after feeding a bitstream. As a result, performance was low and Wave5
> > > could not achieve max pixel processing rate.
> > > 
> > > Update driver to use an asynchronous approach for decoding and feeding
> > > a bitstream in order to achieve full capabilities of the device.
> > > 
> > > WAVE5 supports command-queueing to maximize performance by pipelining
> > > internal commands and by hiding wait cycle taken to receive a command
> > > from Host processor.
> > > 
> > > Instead of waiting for each command to be executed before sending the
> > > next command, Host processor just places all the commands in the
> > > command-queue and goes on doing other things while the commands in the
> > > queue are processed by VPU.
> > > 
> > > While Host processor handles its own tasks, it can receive VPU
> > > interrupt request (IRQ).
> > > In this case, host processor can simply exit interrupt service routine
> > > (ISR) without accessing to host interface to read the result of the
> > > command reported by VPU.
> > > After host processor completed its tasks, host processor can read the
> > > command result when host processor needs the reports and does response
> > > processing.
> > > 
> > > To archive this goal, the device_run() calls v4l2_m2m_job_finish so
> > > that next command can be sent to VPU continuously, if there is any
> > > result, then irq is triggered and gets decoded frames and returns them
> > > to upper layer.
> > > Theses processes work independently each other without waiting a
> > > decoded frame.
> > > 
> > > Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
> > > Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
> > > ---
> > >  .../platform/chips-media/wave5/wave5-hw.c     |  2 +-
> > >  .../chips-media/wave5/wave5-vpu-dec.c         | 84
> > > +++++++++++--------
> > >  .../platform/chips-media/wave5/wave5-vpuapi.c |  2 +
> > >  .../platform/chips-media/wave5/wave5-vpuapi.h |  3 +
> > >  4 files changed, 57 insertions(+), 34 deletions(-)
> > > 
> > > diff --git a/drivers/media/platform/chips-media/wave5/wave5-hw.c
> > > b/drivers/media/platform/chips-media/wave5/wave5-hw.c
> > > index d94cf84c3ee5..687ce6ccf3ae 100644
> > > --- a/drivers/media/platform/chips-media/wave5/wave5-hw.c
> > > +++ b/drivers/media/platform/chips-media/wave5/wave5-hw.c
> > > @@ -102,7 +102,7 @@ static void _wave5_print_reg_err(struct vpu_device
> > *vpu_dev, u32 reg_fail_reason
> > >  		dev_dbg(dev, "%s: queueing failure: 0x%x\n", func, reg_val);
> > >  		break;
> > >  	case WAVE5_SYSERR_RESULT_NOT_READY:
> > > -		dev_err(dev, "%s: result not ready: 0x%x\n", func,
> > reg_fail_reason);
> > > +		dev_dbg(dev, "%s: result not ready: 0x%x\n", func,
> > > +reg_fail_reason);
> > >  		break;
> > >  	case WAVE5_SYSERR_ACCESS_VIOLATION_HW:
> > >  		dev_err(dev, "%s: access violation: 0x%x\n", func,
> > > reg_fail_reason); diff --git
> > > a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > > b/drivers/media/platform/chips- media/wave5/wave5-vpu-dec.c index
> > > 32de43de1870..995234a3a6d6 100644
> > > --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > > @@ -347,13 +347,12 @@ static void wave5_vpu_dec_finish_decode(struct
> > vpu_instance *inst)
> > >  	struct vb2_v4l2_buffer *dec_buf = NULL;
> > >  	struct vb2_v4l2_buffer *disp_buf = NULL;
> > >  	struct vb2_queue *dst_vq = v4l2_m2m_get_dst_vq(m2m_ctx);
> > > -	struct queue_status_info q_status;
> > > 
> > >  	dev_dbg(inst->dev->dev, "%s: Fetch output info from firmware.",
> > > __func__);
> > > 
> > >  	ret = wave5_vpu_dec_get_output_info(inst, &dec_info);
> > >  	if (ret) {
> > > -		dev_warn(inst->dev->dev, "%s: could not get output info.",
> > __func__);
> > > +		dev_dbg(inst->dev->dev, "%s: could not get output info.",
> > > +__func__);
> > 
> > Wouldn't it be better to check the return value to possibly differentiate
> > some errors from something similar to EGAIN?
> > 
> 
> In case of this, when get_result command is requested to VPU, there could be no output.
> So it is not error case and EAGIAN is not proper because the wave5_vpu_dec_finish_decode is triggered by PIC_Done
> interrupt.
> So I think it is proper code.

Ack. You should consider working on v3 from there.

regards,
Nicolas

> 
> 
> > >  		v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
> > >  		return;
> > >  	}
> > > @@ -441,20 +440,6 @@ static void wave5_vpu_dec_finish_decode(struct
> > vpu_instance *inst)
> > >  		}
> > >  		spin_unlock_irqrestore(&inst->state_spinlock, flags);
> > >  	}
> > > -
> > > -	/*
> > > -	 * During a resolution change and while draining, the firmware may
> > flush
> > > -	 * the reorder queue regardless of having a matching decoding
> > operation
> > > -	 * pending. Only terminate the job if there are no more IRQ coming.
> > > -	 */
> > > -	wave5_vpu_dec_give_command(inst, DEC_GET_QUEUE_STATUS, &q_status);
> > > -	if (q_status.report_queue_count == 0 &&
> > > -	    (q_status.instance_queue_count == 0 ||
> > dec_info.sequence_changed)) {
> > > -		dev_dbg(inst->dev->dev, "%s: finishing job.\n", __func__);
> > > -		pm_runtime_mark_last_busy(inst->dev->dev);
> > > -		pm_runtime_put_autosuspend(inst->dev->dev);
> > > -		v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
> > > -	}
> > >  }
> > > 
> > >  static int wave5_vpu_dec_querycap(struct file *file, void *fh, struct
> > > v4l2_capability *cap) @@ -1146,8 +1131,8 @@ static int
> > > write_to_ringbuffer(struct vpu_instance *inst, void *buffer, size_t b
> > >  static int fill_ringbuffer(struct vpu_instance *inst)
> > >  {
> > >  	struct v4l2_m2m_ctx *m2m_ctx = inst->v4l2_fh.m2m_ctx;
> > > -	struct v4l2_m2m_buffer *buf, *n;
> > > -	int ret;
> > > +	struct vpu_src_buffer *vpu_buf;
> > > +	int ret = 0;
> > > 
> > >  	if (m2m_ctx->last_src_buf)  {
> > >  		struct vpu_src_buffer *vpu_buf =
> > > wave5_to_vpu_src_buf(m2m_ctx->last_src_buf);
> > > @@ -1158,9 +1143,8 @@ static int fill_ringbuffer(struct vpu_instance
> > *inst)
> > >  		}
> > >  	}
> > > 
> > > -	v4l2_m2m_for_each_src_buf_safe(m2m_ctx, buf, n) {
> > > -		struct vb2_v4l2_buffer *vbuf = &buf->vb;
> > > -		struct vpu_src_buffer *vpu_buf = wave5_to_vpu_src_buf(vbuf);
> > > +	list_for_each_entry(vpu_buf, &inst->avail_src_bufs, list) {
> > > +		struct vb2_v4l2_buffer *vbuf = &vpu_buf->v4l2_m2m_buf.vb;
> > >  		struct vpu_buf *ring_buffer = &inst->bitstream_vbuf;
> > >  		size_t src_size = vb2_get_plane_payload(&vbuf->vb2_buf, 0);
> > >  		void *src_buf = vb2_plane_vaddr(&vbuf->vb2_buf, 0); @@ -
> > 1220,9
> > > +1204,13 @@ static int fill_ringbuffer(struct vpu_instance *inst)
> > >  			dev_dbg(inst->dev->dev, "last src buffer written to
> > the ring buffer\n");
> > >  			break;
> > >  		}
> > > +
> > > +		inst->queuing_num++;
> > > +		list_del_init(&vpu_buf->list);
> > > +		break;
> > >  	}
> > > 
> > > -	return 0;
> > > +	return ret;
> > >  }
> > > 
> > >  static void wave5_vpu_dec_buf_queue_src(struct vb2_buffer *vb) @@
> > > -1236,6 +1224,11 @@ static void wave5_vpu_dec_buf_queue_src(struct
> > vb2_buffer *vb)
> > >  	vbuf->sequence = inst->queued_src_buf_num++;
> > > 
> > >  	v4l2_m2m_buf_queue(m2m_ctx, vbuf);
> > > +
> > > +	INIT_LIST_HEAD(&vpu_buf->list);
> > > +	mutex_lock(&inst->feed_lock);
> > > +	list_add_tail(&vpu_buf->list, &inst->avail_src_bufs);
> > > +	mutex_unlock(&inst->feed_lock);
> > >  }
> > > 
> > >  static void wave5_vpu_dec_buf_queue_dst(struct vb2_buffer *vb) @@
> > > -1385,6 +1378,13 @@ static int streamoff_output(struct vb2_queue *q)
> > >  	dma_addr_t new_rd_ptr;
> > >  	struct dec_output_info dec_info;
> > >  	unsigned int i;
> > > +	struct vpu_src_buffer *vpu_buf, *tmp;
> > > +
> > > +	inst->retry = false;
> > > +	inst->queuing_num = 0;
> > > +
> > > +	list_for_each_entry_safe(vpu_buf, tmp, &inst->avail_src_bufs, list)
> > > +		list_del_init(&vpu_buf->list);
> > > 
> > >  	for (i = 0; i < v4l2_m2m_num_dst_bufs_ready(m2m_ctx); i++) {
> > >  		ret = wave5_vpu_dec_set_disp_flag(inst, i); @@ -1580,10
> > +1580,19 @@
> > > static void wave5_vpu_dec_device_run(void *priv)
> > > 
> > >  	dev_dbg(inst->dev->dev, "%s: Fill the ring buffer with new
> > bitstream data", __func__);
> > >  	pm_runtime_resume_and_get(inst->dev->dev);
> > > -	ret = fill_ringbuffer(inst);
> > > -	if (ret) {
> > > -		dev_warn(inst->dev->dev, "Filling ring buffer failed\n");
> > > -		goto finish_job_and_return;
> > > +	if (!inst->retry) {
> > > +		mutex_lock(&inst->feed_lock);
> > > +		ret = fill_ringbuffer(inst);
> > > +		mutex_unlock(&inst->feed_lock);
> > > +		if (ret < 0) {
> > > +			dev_warn(inst->dev->dev, "Filling ring buffer
> > failed\n");
> > > +			goto finish_job_and_return;
> > > +		} else if (!inst->eos &&
> > > +				inst->queuing_num == 0 &&
> > > +				inst->state == VPU_INST_STATE_PIC_RUN) {
> > > +			dev_dbg(inst->dev->dev, "%s: no bitstream for feeding,
> > so skip ", __func__);
> > > +			goto finish_job_and_return;
> > > +		}
> > >  	}
> > > 
> > >  	switch (inst->state) {
> > > @@ -1639,7 +1648,7 @@ static void wave5_vpu_dec_device_run(void *priv)
> > >  		}
> > > 
> > >  		if (q_status.instance_queue_count) {
> > > -			dev_dbg(inst->dev->dev, "%s: leave with active job",
> > __func__);
> > > +			v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
> > >  			return;
> > >  		}
> > > 
> > > @@ -1650,14 +1659,21 @@ static void wave5_vpu_dec_device_run(void *priv)
> > >  			dev_err(inst->dev->dev,
> > >  				"Frame decoding on m2m context (%p), fail: %d
> > (result: %d)\n",
> > >  				m2m_ctx, ret, fail_res);
> > > -			break;
> > > +			goto finish_job_and_return;
> > > +		}
> > > +
> > > +		if (fail_res == WAVE5_SYSERR_QUEUEING_FAIL) {
> > > +			inst->retry = true;
> > > +		} else {
> > > +			inst->retry = false;
> > > +			if (!inst->eos)
> > > +				inst->queuing_num--;
> > 
> > I looked into the original state machine violation you had in previous
> > version, and I got the impression that the reason you did hit that was
> > that you actually call device_run passed inst->eos. Its probably not that
> > simple in practice, but I think you forgot to adapt the job_ready() ops to
> > prevent more device_run() called passed CMD_STOP and having all pending
> > buffer written in the ring buffer.
> > 
> > As a side effect, you endup calling device_run() in a race with the
> > finish() setting the state to STOP. I really think there is a way to use
> > inst->eos boolean to prevent that race in the first place. Might need to
> > be combined with checking if you have buffers prior to command stop that
> > did not yet fit into the ring buffer.
> > 
> 
> 
> The queuing_num is used to check if there is input data or not, so it was declared in the int type.
> If there is no input data, then the device_run will be not called until queuing input data.
> In case of eos sent, device_run should be ran continuously until getting eos from VPU, the code was needed.
> If my answer is not correct, please let me know.
> 
> 
> > >  		}
> > > -		/* Return so that we leave this job active */
> > > -		dev_dbg(inst->dev->dev, "%s: leave with active job",
> > __func__);
> > > -		return;
> > > -	default:
> > > -		WARN(1, "Execution of a job in state %s illegal.\n",
> > state_to_str(inst->state));
> > >  		break;
> > > +	default:
> > > +		dev_dbg(inst->dev->dev, "Execution of a job in state %s
> > illegal.\n",
> > > +			state_to_str(inst->state));
> > > +
> > >  	}
> > > 
> > >  finish_job_and_return:
> > > @@ -1755,6 +1771,8 @@ static int wave5_vpu_open_dec(struct file *filp)
> > >  	inst->ops = &wave5_vpu_dec_inst_ops;
> > > 
> > >  	spin_lock_init(&inst->state_spinlock);
> > > +	mutex_init(&inst->feed_lock);
> > > +	INIT_LIST_HEAD(&inst->avail_src_bufs);
> > > 
> > >  	inst->codec_info = kzalloc(sizeof(*inst->codec_info), GFP_KERNEL);
> > >  	if (!inst->codec_info)
> > > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> > > b/drivers/media/platform/chips-media/wave5/wave5-
> > > vpuapi.c
> > > index e5e879a13e8b..68d86625538f 100644
> > > --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> > > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> > > @@ -255,6 +255,8 @@ int wave5_vpu_dec_close(struct vpu_instance *inst,
> > u32 *fail_res)
> > >  	if (inst_count == 1)
> > >  		pm_runtime_dont_use_autosuspend(vpu_dev->dev);
> > > 
> > > +	mutex_destroy(&inst->feed_lock);
> > > +
> > >  unlock_and_return:
> > >  	mutex_unlock(&vpu_dev->hw_lock);
> > >  	pm_runtime_put_sync(inst->dev->dev);
> > > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> > > b/drivers/media/platform/chips-media/wave5/wave5-
> > > vpuapi.h
> > > index f3c1ad6fb3be..fd0aef0bac4e 100644
> > > --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> > > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> > > @@ -818,6 +818,9 @@ struct vpu_instance {
> > >  	bool cbcr_interleave;
> > >  	bool nv21;
> > >  	bool eos;
> > > +	bool retry; /* retry to feed bitstream if failure reason is
> > WAVE5_SYSERR_QUEUEING_FAIL*/
> > > +	int queuing_num; /* check if there is input buffer or not */
> > 
> > This is described as a boolean, but is implemented as a counter. What does
> > it count exactly ?
> > I think it needs a better name too.
> > 
> 
> Please refer to the above comment.
> 
> Thanks
> Jackson
> 
> 
> > Nicolas
> > 
> > > +	struct mutex feed_lock; /* lock for feeding bitstream buffers */
> > >  	struct vpu_buf bitstream_vbuf;
> > >  	dma_addr_t last_rd_ptr;
> > >  	size_t remaining_consumed_bytes;

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v2 2/7] media: chips-media: wave5: Improve performance of decoder
  2025-05-28 13:46       ` Nicolas Dufresne
@ 2025-06-04  4:09         ` jackson.lee
  2025-06-04 13:47           ` Nicolas Dufresne
  0 siblings, 1 reply; 29+ messages in thread
From: jackson.lee @ 2025-06-04  4:09 UTC (permalink / raw)
  To: Nicolas Dufresne, mchehab@kernel.org, hverkuil-cisco@xs4all.nl,
	bob.beckett@collabora.com
  Cc: linux-media@vger.kernel.org, linux-kernel@vger.kernel.org,
	lafley.kim, b-brnich@ti.com, hverkuil@xs4all.nl, Nas Chung

Hi

> -----Original Message-----
> From: Nicolas Dufresne <nicolas.dufresne@collabora.com>
> Sent: Wednesday, May 28, 2025 10:47 PM
> To: jackson.lee <jackson.lee@chipsnmedia.com>; mchehab@kernel.org;
> hverkuil-cisco@xs4all.nl; bob.beckett@collabora.com
> Cc: linux-media@vger.kernel.org; linux-kernel@vger.kernel.org; lafley.kim
> <lafley.kim@chipsnmedia.com>; b-brnich@ti.com; hverkuil@xs4all.nl; Nas
> Chung <nas.chung@chipsnmedia.com>
> Subject: Re: [PATCH v2 2/7] media: chips-media: wave5: Improve performance
> of decoder
> 
> Hi,
> 
> Le mardi 27 mai 2025 à 04:58 +0000, jackson.lee a écrit :
> >
> >
> > > -----Original Message-----
> > > From: Nicolas Dufresne <nicolas.dufresne@collabora.com>
> > > Sent: Saturday, May 24, 2025 2:39 AM
> > > To: jackson.lee <jackson.lee@chipsnmedia.com>; mchehab@kernel.org;
> > > hverkuil-cisco@xs4all.nl; sebastian.fricke@collabora.com;
> > > bob.beckett@collabora.com; dafna.hirschfeld@collabora.com
> > > Cc: linux-media@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > lafley.kim <lafley.kim@chipsnmedia.com>; b-brnich@ti.com;
> > > hverkuil@xs4all.nl; Nas Chung <nas.chung@chipsnmedia.com>
> > > Subject: Re: [PATCH v2 2/7] media: chips-media: wave5: Improve
> > > performance of decoder
> > >
> > > Hi,
> > >
> > > Le jeudi 22 mai 2025 à 16:26 +0900, Jackson.lee a écrit :
> > > > From: Jackson Lee <jackson.lee@chipsnmedia.com>
> > > >
> > > > The current decoding method  was to wait until each frame was
> > > > decoded after feeding a bitstream. As a result, performance was
> > > > low and Wave5 could not achieve max pixel processing rate.
> > > >
> > > > Update driver to use an asynchronous approach for decoding and
> > > > feeding a bitstream in order to achieve full capabilities of the
> device.
> > > >
> > > > WAVE5 supports command-queueing to maximize performance by
> > > > pipelining internal commands and by hiding wait cycle taken to
> > > > receive a command from Host processor.
> > > >
> > > > Instead of waiting for each command to be executed before sending
> > > > the next command, Host processor just places all the commands in
> > > > the command-queue and goes on doing other things while the
> > > > commands in the queue are processed by VPU.
> > > >
> > > > While Host processor handles its own tasks, it can receive VPU
> > > > interrupt request (IRQ).
> > > > In this case, host processor can simply exit interrupt service
> > > > routine
> > > > (ISR) without accessing to host interface to read the result of
> > > > the command reported by VPU.
> > > > After host processor completed its tasks, host processor can read
> > > > the command result when host processor needs the reports and does
> > > > response processing.
> > > >
> > > > To archive this goal, the device_run() calls v4l2_m2m_job_finish
> > > > so that next command can be sent to VPU continuously, if there is
> > > > any result, then irq is triggered and gets decoded frames and
> > > > returns them to upper layer.
> > > > Theses processes work independently each other without waiting a
> > > > decoded frame.
> > > >
> > > > Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
> > > > Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
> > > > ---
> > > >  .../platform/chips-media/wave5/wave5-hw.c     |  2 +-
> > > >  .../chips-media/wave5/wave5-vpu-dec.c         | 84
> > > > +++++++++++--------
> > > >  .../platform/chips-media/wave5/wave5-vpuapi.c |  2 +
> > > >  .../platform/chips-media/wave5/wave5-vpuapi.h |  3 +
> > > >  4 files changed, 57 insertions(+), 34 deletions(-)
> > > >
> > > > diff --git a/drivers/media/platform/chips-media/wave5/wave5-hw.c
> > > > b/drivers/media/platform/chips-media/wave5/wave5-hw.c
> > > > index d94cf84c3ee5..687ce6ccf3ae 100644
> > > > --- a/drivers/media/platform/chips-media/wave5/wave5-hw.c
> > > > +++ b/drivers/media/platform/chips-media/wave5/wave5-hw.c
> > > > @@ -102,7 +102,7 @@ static void _wave5_print_reg_err(struct
> > > > vpu_device
> > > *vpu_dev, u32 reg_fail_reason
> > > >  		dev_dbg(dev, "%s: queueing failure: 0x%x\n", func,
> reg_val);
> > > >  		break;
> > > >  	case WAVE5_SYSERR_RESULT_NOT_READY:
> > > > -		dev_err(dev, "%s: result not ready: 0x%x\n", func,
> > > reg_fail_reason);
> > > > +		dev_dbg(dev, "%s: result not ready: 0x%x\n", func,
> > > > +reg_fail_reason);
> > > >  		break;
> > > >  	case WAVE5_SYSERR_ACCESS_VIOLATION_HW:
> > > >  		dev_err(dev, "%s: access violation: 0x%x\n", func,
> > > > reg_fail_reason); diff --git
> > > > a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > > > b/drivers/media/platform/chips- media/wave5/wave5-vpu-dec.c index
> > > > 32de43de1870..995234a3a6d6 100644
> > > > --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > > > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> > > > @@ -347,13 +347,12 @@ static void
> > > > wave5_vpu_dec_finish_decode(struct
> > > vpu_instance *inst)
> > > >  	struct vb2_v4l2_buffer *dec_buf = NULL;
> > > >  	struct vb2_v4l2_buffer *disp_buf = NULL;
> > > >  	struct vb2_queue *dst_vq = v4l2_m2m_get_dst_vq(m2m_ctx);
> > > > -	struct queue_status_info q_status;
> > > >
> > > >  	dev_dbg(inst->dev->dev, "%s: Fetch output info from
> firmware.",
> > > > __func__);
> > > >
> > > >  	ret = wave5_vpu_dec_get_output_info(inst, &dec_info);
> > > >  	if (ret) {
> > > > -		dev_warn(inst->dev->dev, "%s: could not get output
> info.",
> > > __func__);
> > > > +		dev_dbg(inst->dev->dev, "%s: could not get output
> info.",
> > > > +__func__);
> > >
> > > Wouldn't it be better to check the return value to possibly
> > > differentiate some errors from something similar to EGAIN?
> > >
> >
> > In case of this, when get_result command is requested to VPU, there
> could be no output.
> > So it is not error case and EAGIAN is not proper because the
> > wave5_vpu_dec_finish_decode is triggered by PIC_Done interrupt.
> > So I think it is proper code.
> 
> My worry is that you are silencing possible problems. I checked further,
> it may return EINVAL, EIO and EAGAIN and whatever read_poll_timeout()
> returns on timeout. For most, this trace was noise, so now I agree with
> the change, but can you go find the EINVAL and add a warn_on around it,
> since this is a programming error, it would happen if dec_info is a null
> pointer. Can be patch in itself, before this one.
> 
> >
> >
> > > >  		v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
> > > >  		return;
> > > >  	}
> > > > @@ -441,20 +440,6 @@ static void
> > > > wave5_vpu_dec_finish_decode(struct
> > > vpu_instance *inst)
> > > >  		}
> > > >  		spin_unlock_irqrestore(&inst->state_spinlock, flags);
> > > >  	}
> > > > -
> > > > -	/*
> > > > -	 * During a resolution change and while draining, the
> firmware may
> > > flush
> > > > -	 * the reorder queue regardless of having a matching decoding
> > > operation
> > > > -	 * pending. Only terminate the job if there are no more IRQ
> coming.
> > > > -	 */
> > > > -	wave5_vpu_dec_give_command(inst, DEC_GET_QUEUE_STATUS,
> &q_status);
> > > > -	if (q_status.report_queue_count == 0 &&
> > > > -	    (q_status.instance_queue_count == 0 ||
> > > dec_info.sequence_changed)) {
> > > > -		dev_dbg(inst->dev->dev, "%s: finishing job.\n",
> __func__);
> > > > -		pm_runtime_mark_last_busy(inst->dev->dev);
> > > > -		pm_runtime_put_autosuspend(inst->dev->dev);
> > > > -		v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
> > > > -	}
> > > >  }
> > > >
> > > >  static int wave5_vpu_dec_querycap(struct file *file, void *fh,
> > > > struct v4l2_capability *cap) @@ -1146,8 +1131,8 @@ static int
> > > > write_to_ringbuffer(struct vpu_instance *inst, void *buffer,
> > > > size_t b
> > > >  static int fill_ringbuffer(struct vpu_instance *inst)
> > > >  {
> > > >  	struct v4l2_m2m_ctx *m2m_ctx = inst->v4l2_fh.m2m_ctx;
> > > > -	struct v4l2_m2m_buffer *buf, *n;
> > > > -	int ret;
> > > > +	struct vpu_src_buffer *vpu_buf;
> > > > +	int ret = 0;
> > > >
> > > >  	if (m2m_ctx->last_src_buf)  {
> > > >  		struct vpu_src_buffer *vpu_buf =
> > > > wave5_to_vpu_src_buf(m2m_ctx->last_src_buf);
> > > > @@ -1158,9 +1143,8 @@ static int fill_ringbuffer(struct
> > > > vpu_instance
> > > *inst)
> > > >  		}
> > > >  	}
> > > >
> > > > -	v4l2_m2m_for_each_src_buf_safe(m2m_ctx, buf, n) {
> > > > -		struct vb2_v4l2_buffer *vbuf = &buf->vb;
> > > > -		struct vpu_src_buffer *vpu_buf =
> wave5_to_vpu_src_buf(vbuf);
> > > > +	list_for_each_entry(vpu_buf, &inst->avail_src_bufs, list) {
> > > > +		struct vb2_v4l2_buffer *vbuf = &vpu_buf-
> >v4l2_m2m_buf.vb;
> > > >  		struct vpu_buf *ring_buffer = &inst->bitstream_vbuf;
> > > >  		size_t src_size = vb2_get_plane_payload(&vbuf->vb2_buf,
> 0);
> > > >  		void *src_buf = vb2_plane_vaddr(&vbuf->vb2_buf, 0); @@
> -
> > > 1220,9
> > > > +1204,13 @@ static int fill_ringbuffer(struct vpu_instance *inst)
> > > >  			dev_dbg(inst->dev->dev, "last src buffer
> written to
> > > the ring buffer\n");
> > > >  			break;
> > > >  		}
> > > > +
> > > > +		inst->queuing_num++;
> > > > +		list_del_init(&vpu_buf->list);
> > > > +		break;
> > > >  	}
> > > >
> > > > -	return 0;
> > > > +	return ret;
> > > >  }
> > > >
> > > >  static void wave5_vpu_dec_buf_queue_src(struct vb2_buffer *vb) @@
> > > > -1236,6 +1224,11 @@ static void wave5_vpu_dec_buf_queue_src(struct
> > > vb2_buffer *vb)
> > > >  	vbuf->sequence = inst->queued_src_buf_num++;
> > > >
> > > >  	v4l2_m2m_buf_queue(m2m_ctx, vbuf);
> > > > +
> > > > +	INIT_LIST_HEAD(&vpu_buf->list);
> > > > +	mutex_lock(&inst->feed_lock);
> > > > +	list_add_tail(&vpu_buf->list, &inst->avail_src_bufs);
> > > > +	mutex_unlock(&inst->feed_lock);
> > > >  }
> > > >
> > > >  static void wave5_vpu_dec_buf_queue_dst(struct vb2_buffer *vb) @@
> > > > -1385,6 +1378,13 @@ static int streamoff_output(struct vb2_queue *q)
> > > >  	dma_addr_t new_rd_ptr;
> > > >  	struct dec_output_info dec_info;
> > > >  	unsigned int i;
> > > > +	struct vpu_src_buffer *vpu_buf, *tmp;
> > > > +
> > > > +	inst->retry = false;
> > > > +	inst->queuing_num = 0;
> > > > +
> > > > +	list_for_each_entry_safe(vpu_buf, tmp, &inst->avail_src_bufs,
> list)
> > > > +		list_del_init(&vpu_buf->list);
> > > >
> > > >  	for (i = 0; i < v4l2_m2m_num_dst_bufs_ready(m2m_ctx); i++) {
> > > >  		ret = wave5_vpu_dec_set_disp_flag(inst, i); @@ -
> 1580,10
> > > +1580,19 @@
> > > > static void wave5_vpu_dec_device_run(void *priv)
> > > >
> > > >  	dev_dbg(inst->dev->dev, "%s: Fill the ring buffer with new
> > > bitstream data", __func__);
> > > >  	pm_runtime_resume_and_get(inst->dev->dev);
> > > > -	ret = fill_ringbuffer(inst);
> > > > -	if (ret) {
> > > > -		dev_warn(inst->dev->dev, "Filling ring buffer
> failed\n");
> > > > -		goto finish_job_and_return;
> > > > +	if (!inst->retry) {
> > > > +		mutex_lock(&inst->feed_lock);
> > > > +		ret = fill_ringbuffer(inst);
> > > > +		mutex_unlock(&inst->feed_lock);
> > > > +		if (ret < 0) {
> > > > +			dev_warn(inst->dev->dev, "Filling ring buffer
> > > failed\n");
> > > > +			goto finish_job_and_return;
> > > > +		} else if (!inst->eos &&
> > > > +				inst->queuing_num == 0 &&
> > > > +				inst->state == VPU_INST_STATE_PIC_RUN) {
> > > > +			dev_dbg(inst->dev->dev, "%s: no bitstream for
> feeding,
> > > so skip ", __func__);
> > > > +			goto finish_job_and_return;
> > > > +		}
> > > >  	}
> > > >
> > > >  	switch (inst->state) {
> > > > @@ -1639,7 +1648,7 @@ static void wave5_vpu_dec_device_run(void
> *priv)
> > > >  		}
> > > >
> > > >  		if (q_status.instance_queue_count) {
> > > > -			dev_dbg(inst->dev->dev, "%s: leave with active
> job",
> > > __func__);
> > > > +			v4l2_m2m_job_finish(inst->v4l2_m2m_dev,
> m2m_ctx);
> > > >  			return;
> > > >  		}
> > > >
> > > > @@ -1650,14 +1659,21 @@ static void wave5_vpu_dec_device_run(void
> *priv)
> > > >  			dev_err(inst->dev->dev,
> > > >  				"Frame decoding on m2m context (%p),
> fail: %d
> > > (result: %d)\n",
> > > >  				m2m_ctx, ret, fail_res);
> > > > -			break;
> > > > +			goto finish_job_and_return;
> > > > +		}
> > > > +
> > > > +		if (fail_res == WAVE5_SYSERR_QUEUEING_FAIL) {
> > > > +			inst->retry = true;
> > > > +		} else {
> > > > +			inst->retry = false;
> > > > +			if (!inst->eos)
> > > > +				inst->queuing_num--;
> > >
> > > I looked into the original state machine violation you had in
> > > previous version, and I got the impression that the reason you did
> > > hit that was that you actually call device_run passed inst->eos. Its
> > > probably not that simple in practice, but I think you forgot to
> > > adapt the job_ready() ops to prevent more device_run() called passed
> > > CMD_STOP and having all pending buffer written in the ring buffer.
> > >
> > > As a side effect, you endup calling device_run() in a race with the
> > > finish() setting the state to STOP. I really think there is a way to
> > > use
> > > inst->eos boolean to prevent that race in the first place. Might
> > > inst->need to
> > > be combined with checking if you have buffers prior to command stop
> > > that did not yet fit into the ring buffer.
> > >
> >
> >
> > The queuing_num is used to check if there is input data or not, so it
> was declared in the int type.
> > If there is no input data, then the device_run will be not called until
> queuing input data.
> > In case of eos sent, device_run should be ran continuously until getting
> eos from VPU, the code was needed.
> > If my answer is not correct, please let me know.
> 
> Running in loop anything is never the right approach. The device_run()
> should be run when a useful event occur and filtered by the job_ready()
> ops. I believe I'm proposing some hint how to solve this design issue. The
> issue is quite clear with the follow up patch trying to reduce the CPU
> usage due to spinning.



Thanks for your feedback.
But there is one thing to say to you.
After receiving EOS from application, we have to periodically run the device_run to send the DEC_PIC command so that VPU can trigger interrupt  until getting all decoded frames and EOS from Get_Result command.
So even if we sent EOS to VPU once, we should run the device_run function continuously, the above code was added. 
If the job_ready returns false to prevent running the device_run after sending EOS to VPU, then GStreamer pipeline will not be terminated normally because of not receiving all decoded frames.


Thanks


> 
> >
> >
> > > >  		}
> > > > -		/* Return so that we leave this job active */
> > > > -		dev_dbg(inst->dev->dev, "%s: leave with active job",
> > > __func__);
> > > > -		return;
> > > > -	default:
> > > > -		WARN(1, "Execution of a job in state %s illegal.\n",
> > > state_to_str(inst->state));
> > > >  		break;
> > > > +	default:
> > > > +		dev_dbg(inst->dev->dev, "Execution of a job in
> state %s
> > > illegal.\n",
> > > > +			state_to_str(inst->state));
> > > > +
> > > >  	}
> > > >
> > > >  finish_job_and_return:
> > > > @@ -1755,6 +1771,8 @@ static int wave5_vpu_open_dec(struct file
> *filp)
> > > >  	inst->ops = &wave5_vpu_dec_inst_ops;
> > > >
> > > >  	spin_lock_init(&inst->state_spinlock);
> > > > +	mutex_init(&inst->feed_lock);
> > > > +	INIT_LIST_HEAD(&inst->avail_src_bufs);
> > > >
> > > >  	inst->codec_info = kzalloc(sizeof(*inst->codec_info),
> GFP_KERNEL);
> > > >  	if (!inst->codec_info)
> > > > diff --git
> > > > a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> > > > b/drivers/media/platform/chips-media/wave5/wave5-
> > > > vpuapi.c
> > > > index e5e879a13e8b..68d86625538f 100644
> > > > --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> > > > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> > > > @@ -255,6 +255,8 @@ int wave5_vpu_dec_close(struct vpu_instance
> > > > *inst,
> > > u32 *fail_res)
> > > >  	if (inst_count == 1)
> > > >  		pm_runtime_dont_use_autosuspend(vpu_dev->dev);
> > > >
> > > > +	mutex_destroy(&inst->feed_lock);
> > > > +
> > > >  unlock_and_return:
> > > >  	mutex_unlock(&vpu_dev->hw_lock);
> > > >  	pm_runtime_put_sync(inst->dev->dev);
> > > > diff --git
> > > > a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> > > > b/drivers/media/platform/chips-media/wave5/wave5-
> > > > vpuapi.h
> > > > index f3c1ad6fb3be..fd0aef0bac4e 100644
> > > > --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> > > > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> > > > @@ -818,6 +818,9 @@ struct vpu_instance {
> > > >  	bool cbcr_interleave;
> > > >  	bool nv21;
> > > >  	bool eos;
> > > > +	bool retry; /* retry to feed bitstream if failure reason is
> > > WAVE5_SYSERR_QUEUEING_FAIL*/
> > > > +	int queuing_num; /* check if there is input buffer or not */
> > >
> > > This is described as a boolean, but is implemented as a counter.
> > > What does it count exactly ?
> > > I think it needs a better name too.
> > >
> >
> > Please refer to the above comment.
> 
> That won't work for me, I'm requesting you to correct the comment to
> properly say this is a counter.
> 
> Nicolas
> 
> >
> > Thanks
> > Jackson
> >
> >
> > > Nicolas
> > >
> > > > +	struct mutex feed_lock; /* lock for feeding bitstream
> buffers */
> > > >  	struct vpu_buf bitstream_vbuf;
> > > >  	dma_addr_t last_rd_ptr;
> > > >  	size_t remaining_consumed_bytes;

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 2/7] media: chips-media: wave5: Improve performance of decoder
  2025-06-04  4:09         ` jackson.lee
@ 2025-06-04 13:47           ` Nicolas Dufresne
  2025-06-05  4:50             ` jackson.lee
  0 siblings, 1 reply; 29+ messages in thread
From: Nicolas Dufresne @ 2025-06-04 13:47 UTC (permalink / raw)
  To: jackson.lee, mchehab@kernel.org, hverkuil-cisco@xs4all.nl,
	bob.beckett@collabora.com
  Cc: linux-media@vger.kernel.org, linux-kernel@vger.kernel.org,
	lafley.kim, b-brnich@ti.com, hverkuil@xs4all.nl, Nas Chung

Le mercredi 04 juin 2025 à 04:09 +0000, jackson.lee a écrit :
> > Running in loop anything is never the right approach. The device_run()
> > should be run when a useful event occur and filtered by the job_ready()
> > ops. I believe I'm proposing some hint how to solve this design issue. The
> > issue is quite clear with the follow up patch trying to reduce the CPU
> > usage due to spinning.
> 
> 
> 
> Thanks for your feedback.
> But there is one thing to say to you.
> After receiving EOS from application, we have to periodically run the device_run
> to send the DEC_PIC command so that VPU can trigger interrupt  until getting all 
> decoded frames and EOS from Get_Result command.
> So even if we sent EOS to VPU once, we should run the device_run function continuously,
> the above code was added. If the job_ready returns false to prevent running the
> device_run after sending EOS to VPU, then GStreamer pipeline will not be terminated 
> normally because of not receiving all decoded frames.

This, in my opinion, boils down to a small flaw, either in the firmware or the driver.
This is why there was this code:

-
-	/*
-	 * During a resolution change and while draining, the firmware may flush
-	 * the reorder queue regardless of having a matching decoding operation
-	 * pending. Only terminate the job if there are no more IRQ coming.
-	 */
-	wave5_vpu_dec_give_command(inst, DEC_GET_QUEUE_STATUS, &q_status);
-	if (q_status.report_queue_count == 0 &&
-	    (q_status.instance_queue_count == 0 || dec_info.sequence_changed)) {
-		dev_dbg(inst->dev->dev, "%s: finishing job.\n", __func__);
-		pm_runtime_mark_last_busy(inst->dev->dev);
-		pm_runtime_put_autosuspend(inst->dev->dev);
-		v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
-	}

Which you removed in this patch, as it makes it impossible to utilise the HW queues.
In the specific case you described, if my memory is right, the CMD_STOP (EOS in your
terms) comes in race with the queue being consumed, leading to possibly having no
event to figure-out when we are done with that sequenece.

V4L2 M2M is all event based, and v4l2_m2m_job_finish() is one of those. But in the
new implementation, this event no longer correlate with the HW being idle.
This is fine, don't read me wrong. It now matches the driver being ready to try and
queue more work.

So my question is, is there a way to know, at CMD_STOP call, that the HW
has gone idle, and that no more events will allow handling the EOS case?

Nicolas

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v2 2/7] media: chips-media: wave5: Improve performance of decoder
  2025-06-04 13:47           ` Nicolas Dufresne
@ 2025-06-05  4:50             ` jackson.lee
  2025-06-05 13:28               ` Nicolas Dufresne
  0 siblings, 1 reply; 29+ messages in thread
From: jackson.lee @ 2025-06-05  4:50 UTC (permalink / raw)
  To: Nicolas Dufresne, mchehab@kernel.org, hverkuil-cisco@xs4all.nl,
	bob.beckett@collabora.com
  Cc: linux-media@vger.kernel.org, linux-kernel@vger.kernel.org,
	lafley.kim, b-brnich@ti.com, hverkuil@xs4all.nl, Nas Chung

Hi Nicolas

> -----Original Message-----
> From: Nicolas Dufresne <nicolas.dufresne@collabora.com>
> Sent: Wednesday, June 4, 2025 10:48 PM
> To: jackson.lee <jackson.lee@chipsnmedia.com>; mchehab@kernel.org;
> hverkuil-cisco@xs4all.nl; bob.beckett@collabora.com
> Cc: linux-media@vger.kernel.org; linux-kernel@vger.kernel.org; lafley.kim
> <lafley.kim@chipsnmedia.com>; b-brnich@ti.com; hverkuil@xs4all.nl; Nas
> Chung <nas.chung@chipsnmedia.com>
> Subject: Re: [PATCH v2 2/7] media: chips-media: wave5: Improve performance
> of decoder
> 
> Le mercredi 04 juin 2025 à 04:09 +0000, jackson.lee a écrit :
> > > Running in loop anything is never the right approach. The
> > > device_run() should be run when a useful event occur and filtered by
> > > the job_ready() ops. I believe I'm proposing some hint how to solve
> > > this design issue. The issue is quite clear with the follow up patch
> > > trying to reduce the CPU usage due to spinning.
> >
> >
> >
> > Thanks for your feedback.
> > But there is one thing to say to you.
> > After receiving EOS from application, we have to periodically run the
> > device_run to send the DEC_PIC command so that VPU can trigger
> > interrupt  until getting all decoded frames and EOS from Get_Result
> command.
> > So even if we sent EOS to VPU once, we should run the device_run
> > function continuously, the above code was added. If the job_ready
> > returns false to prevent running the device_run after sending EOS to
> > VPU, then GStreamer pipeline will not be terminated normally because of
> not receiving all decoded frames.
> 
> This, in my opinion, boils down to a small flaw, either in the firmware or
> the driver.
> This is why there was this code:
> 
> -
> -	/*
> -	 * During a resolution change and while draining, the firmware may
> flush
> -	 * the reorder queue regardless of having a matching decoding
> operation
> -	 * pending. Only terminate the job if there are no more IRQ coming.
> -	 */
> -	wave5_vpu_dec_give_command(inst, DEC_GET_QUEUE_STATUS, &q_status);
> -	if (q_status.report_queue_count == 0 &&
> -	    (q_status.instance_queue_count == 0 ||
> dec_info.sequence_changed)) {
> -		dev_dbg(inst->dev->dev, "%s: finishing job.\n", __func__);
> -		pm_runtime_mark_last_busy(inst->dev->dev);
> -		pm_runtime_put_autosuspend(inst->dev->dev);
> -		v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
> -	}
> 
> Which you removed in this patch, as it makes it impossible to utilise the
> HW queues.
> In the specific case you described, if my memory is right, the CMD_STOP
> (EOS in your
> terms) comes in race with the queue being consumed, leading to possibly
> having no event to figure-out when we are done with that sequenece.
> 
> V4L2 M2M is all event based, and v4l2_m2m_job_finish() is one of those.
> But in the new implementation, this event no longer correlate with the HW
> being idle.
> This is fine, don't read me wrong. It now matches the driver being ready
> to try and queue more work.
> 
> So my question is, is there a way to know, at CMD_STOP call, that the HW
> has gone idle, and that no more events will allow handling the EOS case?
> 
> Nicolas


Thanks for your reply.	

Now there is only one thing to know if there is more events or not to handle the EOS case.
It is that driver sends DEC_PIC command to VPU continuously until display index is -2(it means EOS) from VPU.
VPU should trigger interrupts to get display index from the finish_decode function.
So we have to run device_run to send DEC_PIC command.

Thanks
Jackson

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 2/7] media: chips-media: wave5: Improve performance of decoder
  2025-06-05  4:50             ` jackson.lee
@ 2025-06-05 13:28               ` Nicolas Dufresne
  2025-06-09  8:47                 ` jackson.lee
  0 siblings, 1 reply; 29+ messages in thread
From: Nicolas Dufresne @ 2025-06-05 13:28 UTC (permalink / raw)
  To: jackson.lee, mchehab@kernel.org, hverkuil-cisco@xs4all.nl,
	bob.beckett@collabora.com
  Cc: linux-media@vger.kernel.org, linux-kernel@vger.kernel.org,
	lafley.kim, b-brnich@ti.com, hverkuil@xs4all.nl, Nas Chung

Hi,

Le jeudi 05 juin 2025 à 04:50 +0000, jackson.lee a écrit :
> > So my question is, is there a way to know, at CMD_STOP call, that the HW
> > has gone idle, and that no more events will allow handling the EOS case?
> > 
> > Nicolas
> 
> 
> Thanks for your reply.	
> 
> Now there is only one thing to know if there is more events or not to handle the EOS case.
> It is that driver sends DEC_PIC command to VPU continuously until display index is -2(it means EOS) from VPU.
> VPU should trigger interrupts to get display index from the finish_decode function.
> So we have to run device_run to send DEC_PIC command.

What don't want to see is a loop where we do:

	device_run()
		finish_job()
			device_run()
				finish_job()
					....

What I see now, is that we simply bang on the trigger until it completes, which
is very wasteful in power and CPU time. In your next version, make sure to
find a mitigation to that active loop, and document it please.

Nicolas

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v2 2/7] media: chips-media: wave5: Improve performance of decoder
  2025-06-05 13:28               ` Nicolas Dufresne
@ 2025-06-09  8:47                 ` jackson.lee
  0 siblings, 0 replies; 29+ messages in thread
From: jackson.lee @ 2025-06-09  8:47 UTC (permalink / raw)
  To: Nicolas Dufresne, mchehab@kernel.org, hverkuil-cisco@xs4all.nl,
	bob.beckett@collabora.com
  Cc: linux-media@vger.kernel.org, linux-kernel@vger.kernel.org,
	lafley.kim, b-brnich@ti.com, hverkuil@xs4all.nl, Nas Chung

Hi Nicolas

I will find the way to reduce overhead after receiving the STOP_CMD.

thanks

> -----Original Message-----
> From: Nicolas Dufresne <nicolas.dufresne@collabora.com>
> Sent: Thursday, June 5, 2025 10:29 PM
> To: jackson.lee <jackson.lee@chipsnmedia.com>; mchehab@kernel.org;
> hverkuil-cisco@xs4all.nl; bob.beckett@collabora.com
> Cc: linux-media@vger.kernel.org; linux-kernel@vger.kernel.org; lafley.kim
> <lafley.kim@chipsnmedia.com>; b-brnich@ti.com; hverkuil@xs4all.nl; Nas
> Chung <nas.chung@chipsnmedia.com>
> Subject: Re: [PATCH v2 2/7] media: chips-media: wave5: Improve performance
> of decoder
> 
> Hi,
> 
> Le jeudi 05 juin 2025 à 04:50 +0000, jackson.lee a écrit :
> > > So my question is, is there a way to know, at CMD_STOP call, that
> > > the HW has gone idle, and that no more events will allow handling the
> EOS case?
> > >
> > > Nicolas
> >
> >
> > Thanks for your reply.
> >
> > Now there is only one thing to know if there is more events or not to
> handle the EOS case.
> > It is that driver sends DEC_PIC command to VPU continuously until
> display index is -2(it means EOS) from VPU.
> > VPU should trigger interrupts to get display index from the
> finish_decode function.
> > So we have to run device_run to send DEC_PIC command.
> 
> What don't want to see is a loop where we do:
> 
> 	device_run()
> 		finish_job()
> 			device_run()
> 				finish_job()
> 					....
> 
> What I see now, is that we simply bang on the trigger until it completes,
> which is very wasteful in power and CPU time. In your next version, make
> sure to find a mitigation to that active loop, and document it please.
> 
> Nicolas

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2025-06-09  8:47 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-22  7:25 [PATCH v2 0/7] Performance improvement of decoder Jackson.lee
2025-05-22  7:26 ` [PATCH v2 1/7] media: chips-media: wave5: Fix Null reference while testing fluster Jackson.lee
2025-05-23 17:20   ` Nicolas Dufresne
2025-05-27  4:05     ` jackson.lee
2025-05-27 12:57       ` Nicolas Dufresne
2025-05-22  7:26 ` [PATCH v2 2/7] media: chips-media: wave5: Improve performance of decoder Jackson.lee
2025-05-23 17:39   ` Nicolas Dufresne
2025-05-27  4:58     ` jackson.lee
2025-05-28 13:46       ` Nicolas Dufresne
2025-06-04  4:09         ` jackson.lee
2025-06-04 13:47           ` Nicolas Dufresne
2025-06-05  4:50             ` jackson.lee
2025-06-05 13:28               ` Nicolas Dufresne
2025-06-09  8:47                 ` jackson.lee
2025-05-30 14:33       ` Nicolas Dufresne
2025-05-22  7:26 ` [PATCH v2 3/7] media: chips-media: wave5: Fix not to be closed Jackson.lee
2025-05-22  7:26 ` [PATCH v2 4/7] media: chips-media: wave5: Use spinlock whenever statue is changed Jackson.lee
2025-05-23 17:41   ` Nicolas Dufresne
2025-05-27  5:02     ` jackson.lee
2025-05-28 13:49       ` Nicolas Dufresne
2025-05-22  7:26 ` [PATCH v2 5/7] media: chips-media: wave5: Fix not to free resources normally when instance was destroyed Jackson.lee
2025-05-23 17:42   ` Nicolas Dufresne
2025-05-27  5:04     ` jackson.lee
2025-05-22  7:26 ` [PATCH v2 6/7] media: chips-media: wave5: Reduce high CPU load Jackson.lee
2025-05-23 17:43   ` Nicolas Dufresne
2025-05-27  5:05     ` jackson.lee
2025-05-22  7:26 ` [PATCH v2 7/7] media: chips-media: wave5: Fix SError of kernel panic when closed Jackson.lee
2025-05-23 17:48   ` Nicolas Dufresne
2025-05-27  5:07     ` jackson.lee

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).