* [PATCH v3 0/4] Performance improvement of decoder
@ 2025-06-23 0:21 Jackson.lee
2025-06-23 0:21 ` [PATCH v3 1/4] media: chips-media: wave5: Fix SError of kernel panic when closed Jackson.lee
` (5 more replies)
0 siblings, 6 replies; 12+ messages in thread
From: Jackson.lee @ 2025-06-23 0:21 UTC (permalink / raw)
To: mchehab, hverkuil-cisco, nicolas.dufresne, bob.beckett
Cc: linux-media, linux-kernel, jackson.lee, lafley.kim, b-brnich,
hverkuil, nas.chung
From: Jackson Lee <jackson.lee@chipsnmedia.com>
v4l2-compliance results:
========================
v4l2-compliance 1.28.1-5233, 64 bits, 64-bit time_t
Buffer ioctls:
warn: v4l2-test-buffers.cpp(693): VIDIOC_CREATE_BUFS not supported
warn: v4l2-test-buffers.cpp(693): VIDIOC_CREATE_BUFS not supported
test VIDIOC_REQBUFS/CREATE_BUFS/QUERYBUF: OK
test CREATE_BUFS maximum buffers: OK
test VIDIOC_EXPBUF: OK
test Requests: OK (Not Supported)
Total for wave5-dec device /dev/video0: 46, Succeeded: 46, Failed: 0, Warnings: 2 Total for wave5-enc device /dev/video1: 46, Succeeded: 46, Failed: 0, Warnings: 0
Fluster test results:
=====================
Running test suite JCT-VC-HEVC_V1 with decoder GStreamer-H.265-V4L2-Gst1.0 Using 3 parallel job(s)
Ran 133/147 tests successfully in 40.114 secs
(1 test fails because of not supporting to parse multi frames, 1 test fails because of a missing frame and slight corruption,
2 tests fail because of sizes which are incompatible with the IP, 11 tests fail because of unsupported 10 bit format)
Running test suite JVT-AVC_V1 with decoder GStreamer-H.264-V4L2-Gst1.0 Using 3 parallel job(s)
Ran 78/135 tests successfully in 43.364 secs
(57 fail because the hardware is unable to decode MBAFF / FMO / Field / Extended profile streams.)
Running test suite JVT-FR-EXT with decoder GStreamer-H.264-V4L2-Gst1.0 Using 3 parallel job(s)
Ran 25/69 tests successfully in 40.411 secs
(44 fail because the hardware does not support field encoded and 422 encoded stream)
Seek test
=====================
1. gst-play-1.0 seek.264
2. this will use waylandsink since gst-play-1.0 uses playbin.
if you don't want to hook up display,
you can run gst-play-1.0 seek.264 --videosink=fakevideosink instead 3. Let pipeline run for 2-3 seconds 4. press SPACE key to pause 5. press 0 to reset press SPACE to start again
gst-play-1.0 seek.264 --videosink=fakevideosink Press 'k' to see a list of keyboard shortcuts.
Now playing /root/seek.264
Redistribute latency...
Redistribute latency...
Redistribute latency...
Redistribute latency...
Redistribute latency...aused
0:00:09.9 / 0:00:09.7
Reached end of play list.
Sequence Change test
=====================
gst-launch-1.0 filesrc location=./drc.h264 ! h264parse ! v4l2h264dec ! filesink location=./h264_output_420.yuv Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Redistribute latency...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Redistribute latency...
Got EOS from element "pipeline0".
Execution ended after 0:00:00.113620590
Setting pipeline to NULL ...
Freeing pipeline ...
Change since v2:
==================
* For [PATCH v3 4/4] media: chips-media: wave5: Improve performance of decoder
- squash v2's #3~#6 to #4 patch of v3
Change since v1:
===================
* For [PATCH v2 2/7] media: chips-media: wave5: Improve performance of decoder
- change log to dbg level
Change since v0:
===================
* For [PATCH v1 2/7] media: chips-media: wave5: Improve performance of decoder
- separates the previous patch to a few patches
* For [PATCH v1 3/7] media: chips-media: wave5: Fix not to be closed
- separated from the previous patch of performance improvement of
decoder
* For [PATCH v1 4/7] media: chips-media: wave5: Use spinlock whenever state is changed
- separated from the previous patch of performance improvement of
decoder
* For [PATCH v1 5/7] media: chips-media: wave5: Fix not to free resources normally when
instance was destroyed
- separated from the previous patch of performance improvement of
decoder
* For [PATCH v1 7/7] media: chips-media: wave5: Fix SError of kernel panic when closed
- separated from the previous patch of performance improvement of
decoder
Jackson Lee (4):
media: chips-media: wave5: Fix SError of kernel panic when closed
media: chips-media: wave5: Fix Null reference while testing fluster
media: chips-media: wave5: Add WARN_ON to check if dec_output_info is
NULL
media: chips-media: wave5: Improve performance of decoder
.../platform/chips-media/wave5/wave5-helper.c | 23 ++-
.../platform/chips-media/wave5/wave5-hw.c | 2 +-
.../chips-media/wave5/wave5-vpu-dec.c | 139 ++++++++++++------
.../chips-media/wave5/wave5-vpu-enc.c | 8 +-
.../platform/chips-media/wave5/wave5-vpu.c | 71 +++++++--
.../platform/chips-media/wave5/wave5-vpuapi.c | 37 ++---
.../platform/chips-media/wave5/wave5-vpuapi.h | 11 ++
.../chips-media/wave5/wave5-vpuconfig.h | 1 +
8 files changed, 219 insertions(+), 73 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v3 1/4] media: chips-media: wave5: Fix SError of kernel panic when closed
2025-06-23 0:21 [PATCH v3 0/4] Performance improvement of decoder Jackson.lee
@ 2025-06-23 0:21 ` Jackson.lee
2025-08-29 18:03 ` Nicolas Dufresne
2025-06-23 0:21 ` [PATCH v3 2/4] media: chips-media: wave5: Fix Null reference while testing fluster Jackson.lee
` (4 subsequent siblings)
5 siblings, 1 reply; 12+ messages in thread
From: Jackson.lee @ 2025-06-23 0:21 UTC (permalink / raw)
To: mchehab, hverkuil-cisco, nicolas.dufresne, bob.beckett
Cc: linux-media, linux-kernel, jackson.lee, lafley.kim, b-brnich,
hverkuil, nas.chung
From: Jackson Lee <jackson.lee@chipsnmedia.com>
SError of kernel panic rarely happened while testing fluster.
The root cause was to enter suspend mode because timeout of autosuspend
delay happened.
[ 48.834439] SError Interrupt on CPU0, code 0x00000000bf000000 -- SError
[ 48.834455] CPU: 0 UID: 0 PID: 1067 Comm: v4l2h265dec0:sr Not tainted 6.12.9-gc9e21a1ebd75-dirty #7
[ 48.834461] Hardware name: ti Texas Instruments J721S2 EVM/Texas Instruments J721S2 EVM, BIOS 2025.01-00345-gbaf3aaa8ecfa 01/01/2025
[ 48.834464] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 48.834468] pc : wave5_dec_clr_disp_flag+0x40/0x80 [wave5]
[ 48.834488] lr : wave5_dec_clr_disp_flag+0x40/0x80 [wave5]
[ 48.834495] sp : ffff8000856e3a30
[ 48.834497] x29: ffff8000856e3a30 x28: ffff0008093f6010 x27: ffff000809158130
[ 48.834504] x26: 0000000000000000 x25: ffff00080b625000 x24: ffff000804a9ba80
[ 48.834509] x23: ffff000802343028 x22: ffff000809158150 x21: ffff000802218000
[ 48.834513] x20: ffff0008093f6000 x19: ffff0008093f6000 x18: 0000000000000000
[ 48.834518] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffff74009618
[ 48.834523] x14: 000000010000000c x13: 0000000000000000 x12: 0000000000000000
[ 48.834527] x11: ffffffffffffffff x10: ffffffffffffffff x9 : ffff000802343028
[ 48.834532] x8 : ffff00080b6252a0 x7 : 0000000000000038 x6 : 0000000000000000
[ 48.834536] x5 : ffff00080b625060 x4 : 0000000000000000 x3 : 0000000000000000
[ 48.834541] x2 : 0000000000000000 x1 : ffff800084bf0118 x0 : ffff800084bf0000
[ 48.834547] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 48.834549] CPU: 0 UID: 0 PID: 1067 Comm: v4l2h265dec0:sr Not tainted 6.12.9-gc9e21a1ebd75-dirty #7
[ 48.834554] Hardware name: ti Texas Instruments J721S2 EVM/Texas Instruments J721S2 EVM, BIOS 2025.01-00345-gbaf3aaa8ecfa 01/01/2025
[ 48.834556] Call trace:
[ 48.834559] dump_backtrace+0x94/0xec
[ 48.834574] show_stack+0x18/0x24
[ 48.834579] dump_stack_lvl+0x38/0x90
[ 48.834585] dump_stack+0x18/0x24
[ 48.834588] panic+0x35c/0x3e0
[ 48.834592] nmi_panic+0x40/0x8c
[ 48.834595] arm64_serror_panic+0x64/0x70
[ 48.834598] do_serror+0x3c/0x78
[ 48.834601] el1h_64_error_handler+0x34/0x4c
[ 48.834605] el1h_64_error+0x64/0x68
[ 48.834608] wave5_dec_clr_disp_flag+0x40/0x80 [wave5]
[ 48.834615] wave5_vpu_dec_clr_disp_flag+0x54/0x80 [wave5]
[ 48.834622] wave5_vpu_dec_buf_queue+0x19c/0x1a0 [wave5]
[ 48.834628] __enqueue_in_driver+0x3c/0x74 [videobuf2_common]
[ 48.834639] vb2_core_qbuf+0x508/0x61c [videobuf2_common]
[ 48.834646] vb2_qbuf+0xa4/0x168 [videobuf2_v4l2]
[ 48.834656] v4l2_m2m_qbuf+0x80/0x238 [v4l2_mem2mem]
[ 48.834666] v4l2_m2m_ioctl_qbuf+0x18/0x24 [v4l2_mem2mem]
[ 48.834673] v4l_qbuf+0x48/0x5c [videodev]
[ 48.834704] __video_do_ioctl+0x180/0x3f0 [videodev]
[ 48.834725] video_usercopy+0x2ec/0x68c [videodev]
[ 48.834745] video_ioctl2+0x18/0x24 [videodev]
[ 48.834766] v4l2_ioctl+0x40/0x60 [videodev]
[ 48.834786] __arm64_sys_ioctl+0xa8/0xec
[ 48.834793] invoke_syscall+0x44/0x100
[ 48.834800] el0_svc_common.constprop.0+0xc0/0xe0
[ 48.834804] do_el0_svc+0x1c/0x28
[ 48.834809] el0_svc+0x30/0xd0
[ 48.834813] el0t_64_sync_handler+0xc0/0xc4
[ 48.834816] el0t_64_sync+0x190/0x194
[ 48.834820] SMP: stopping secondary CPUs
[ 48.834831] Kernel Offset: disabled
[ 48.834833] CPU features: 0x08,00002002,80200000,4200421b
[ 48.834837] Memory Limit: none
[ 49.161404] ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
Fixes: 2092b3833487 ("media: chips-media: wave5: Support runtime suspend/resume")
Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
---
.../platform/chips-media/wave5/wave5-vpu-dec.c | 3 ---
.../platform/chips-media/wave5/wave5-vpu-enc.c | 3 ---
.../media/platform/chips-media/wave5/wave5-vpu.c | 2 +-
.../platform/chips-media/wave5/wave5-vpuapi.c | 15 ---------------
4 files changed, 1 insertion(+), 22 deletions(-)
diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
index fd71f0c43ac3..216b024c42d8 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
@@ -1830,9 +1830,6 @@ static int wave5_vpu_open_dec(struct file *filp)
if (ret)
goto cleanup_inst;
- if (list_empty(&dev->instances))
- pm_runtime_use_autosuspend(inst->dev->dev);
-
list_add_tail(&inst->list, &dev->instances);
mutex_unlock(&dev->dev_lock);
diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c b/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
index 1e5fc5f8b856..cf20f774ed1b 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
@@ -1774,9 +1774,6 @@ static int wave5_vpu_open_enc(struct file *filp)
if (ret)
goto cleanup_inst;
- if (list_empty(&dev->instances))
- pm_runtime_use_autosuspend(inst->dev->dev);
-
list_add_tail(&inst->list, &dev->instances);
mutex_unlock(&dev->dev_lock);
diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu.c b/drivers/media/platform/chips-media/wave5/wave5-vpu.c
index e1715d3f43b0..b3c633dd3df1 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpu.c
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpu.c
@@ -322,7 +322,7 @@ static int wave5_vpu_probe(struct platform_device *pdev)
dev_info(&pdev->dev, "Product Code: 0x%x\n", dev->product_code);
dev_info(&pdev->dev, "Firmware Revision: %u\n", fw_revision);
- pm_runtime_set_autosuspend_delay(&pdev->dev, 100);
+ pm_runtime_set_autosuspend_delay(&pdev->dev, 500);
pm_runtime_use_autosuspend(&pdev->dev);
pm_runtime_enable(&pdev->dev);
wave5_vpu_sleep_wake(&pdev->dev, true, NULL, 0);
diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
index e5e879a13e8b..e94d6ebc9f81 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
@@ -207,8 +207,6 @@ int wave5_vpu_dec_close(struct vpu_instance *inst, u32 *fail_res)
int retry = 0;
struct vpu_device *vpu_dev = inst->dev;
int i;
- int inst_count = 0;
- struct vpu_instance *inst_elm;
*fail_res = 0;
if (!inst->codec_info)
@@ -250,11 +248,6 @@ int wave5_vpu_dec_close(struct vpu_instance *inst, u32 *fail_res)
wave5_vdi_free_dma_memory(vpu_dev, &p_dec_info->vb_task);
- list_for_each_entry(inst_elm, &vpu_dev->instances, list)
- inst_count++;
- if (inst_count == 1)
- pm_runtime_dont_use_autosuspend(vpu_dev->dev);
-
unlock_and_return:
mutex_unlock(&vpu_dev->hw_lock);
pm_runtime_put_sync(inst->dev->dev);
@@ -720,8 +713,6 @@ int wave5_vpu_enc_close(struct vpu_instance *inst, u32 *fail_res)
int ret;
int retry = 0;
struct vpu_device *vpu_dev = inst->dev;
- int inst_count = 0;
- struct vpu_instance *inst_elm;
*fail_res = 0;
if (!inst->codec_info)
@@ -764,12 +755,6 @@ int wave5_vpu_enc_close(struct vpu_instance *inst, u32 *fail_res)
}
wave5_vdi_free_dma_memory(vpu_dev, &p_enc_info->vb_task);
-
- list_for_each_entry(inst_elm, &vpu_dev->instances, list)
- inst_count++;
- if (inst_count == 1)
- pm_runtime_dont_use_autosuspend(vpu_dev->dev);
-
mutex_unlock(&vpu_dev->hw_lock);
pm_runtime_put_sync(inst->dev->dev);
--
2.43.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v3 2/4] media: chips-media: wave5: Fix Null reference while testing fluster
2025-06-23 0:21 [PATCH v3 0/4] Performance improvement of decoder Jackson.lee
2025-06-23 0:21 ` [PATCH v3 1/4] media: chips-media: wave5: Fix SError of kernel panic when closed Jackson.lee
@ 2025-06-23 0:21 ` Jackson.lee
2025-08-29 18:13 ` Nicolas Dufresne
2025-06-23 0:21 ` [PATCH v3 3/4] media: chips-media: wave5: Add WARN_ON to check if dec_output_info is NULL Jackson.lee
` (3 subsequent siblings)
5 siblings, 1 reply; 12+ messages in thread
From: Jackson.lee @ 2025-06-23 0:21 UTC (permalink / raw)
To: mchehab, hverkuil-cisco, nicolas.dufresne, bob.beckett
Cc: linux-media, linux-kernel, jackson.lee, lafley.kim, b-brnich,
hverkuil, nas.chung
From: Jackson Lee <jackson.lee@chipsnmedia.com>
When multi instances are created/destroyed, many interrupts happens
or structures for decoder are removed.
"struct vpu_instance" this structure is shared for all flow in decoder,
so if the structure is not protected by lock, Null reference exception
could happens sometimes.
IRQ Handler was spilt to two phases and Lock was added as well.
Fixes: 9707a6254a8a ("media: chips-media: wave5: Add the v4l2 layer")
Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
---
.../platform/chips-media/wave5/wave5-helper.c | 23 ++++++-
.../chips-media/wave5/wave5-vpu-dec.c | 5 ++
.../chips-media/wave5/wave5-vpu-enc.c | 5 ++
.../platform/chips-media/wave5/wave5-vpu.c | 69 ++++++++++++++++---
.../platform/chips-media/wave5/wave5-vpuapi.h | 6 ++
5 files changed, 99 insertions(+), 9 deletions(-)
diff --git a/drivers/media/platform/chips-media/wave5/wave5-helper.c b/drivers/media/platform/chips-media/wave5/wave5-helper.c
index 2c9d8cbca6e4..02914dc1ca56 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-helper.c
+++ b/drivers/media/platform/chips-media/wave5/wave5-helper.c
@@ -49,7 +49,7 @@ void wave5_cleanup_instance(struct vpu_instance *inst)
v4l2_fh_del(&inst->v4l2_fh);
v4l2_fh_exit(&inst->v4l2_fh);
}
- list_del_init(&inst->list);
+ kfifo_free(&inst->irq_status);
ida_free(&inst->dev->inst_ida, inst->id);
kfree(inst->codec_info);
kfree(inst);
@@ -61,8 +61,29 @@ int wave5_vpu_release_device(struct file *filp,
{
struct vpu_instance *inst = wave5_to_vpu_inst(filp->private_data);
int ret = 0;
+ unsigned long flags;
v4l2_m2m_ctx_release(inst->v4l2_fh.m2m_ctx);
+ /*
+ * To prevent Null reference exception, the existing irq handler were
+ * separated to two modules.
+ * One is to queue interrupt reason into the irq handler,
+ * the other is irq_thread to call the wave5_vpu_dec_finish_decode
+ * to get decoded frame.
+ * The list of instances should be protected between all flow of the
+ * decoding process, but to protect the list in the irq_handler, spin lock
+ * should be used, and mutex should be used in the irq_thread because spin lock
+ * is not able to be used because mutex is already being used
+ * in the wave5_vpu_dec_finish_decode.
+ * So the spin lock and mutex were used to protect the list in the release function.
+ */
+ ret = mutex_lock_interruptible(&inst->dev->irq_lock);
+ if (ret)
+ return ret;
+ spin_lock_irqsave(&inst->dev->irq_spinlock, flags);
+ list_del_init(&inst->list);
+ spin_unlock_irqrestore(&inst->dev->irq_spinlock, flags);
+ mutex_unlock(&inst->dev->irq_lock);
if (inst->state != VPU_INST_STATE_NONE) {
u32 fail_res;
diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
index 216b024c42d8..2df7668575f4 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
@@ -1811,6 +1811,11 @@ static int wave5_vpu_open_dec(struct file *filp)
inst->xfer_func = V4L2_XFER_FUNC_DEFAULT;
init_completion(&inst->irq_done);
+ ret = kfifo_alloc(&inst->irq_status, 16 * sizeof(int), GFP_KERNEL);
+ if (ret) {
+ dev_err(inst->dev->dev, "failed to allocate fifo\n");
+ goto cleanup_inst;
+ }
inst->id = ida_alloc(&inst->dev->inst_ida, GFP_KERNEL);
if (inst->id < 0) {
diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c b/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
index cf20f774ed1b..7f1aa392805f 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
@@ -1760,6 +1760,11 @@ static int wave5_vpu_open_enc(struct file *filp)
inst->frame_rate = 30;
init_completion(&inst->irq_done);
+ ret = kfifo_alloc(&inst->irq_status, 16 * sizeof(int), GFP_KERNEL);
+ if (ret) {
+ dev_err(inst->dev->dev, "failed to allocate fifo\n");
+ goto cleanup_inst;
+ }
inst->id = ida_alloc(&inst->dev->inst_ida, GFP_KERNEL);
if (inst->id < 0) {
diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu.c b/drivers/media/platform/chips-media/wave5/wave5-vpu.c
index b3c633dd3df1..24a9001966e7 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpu.c
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpu.c
@@ -51,8 +51,11 @@ static void wave5_vpu_handle_irq(void *dev_id)
u32 seq_done;
u32 cmd_done;
u32 irq_reason;
- struct vpu_instance *inst;
+ u32 irq_subreason;
+ struct vpu_instance *inst, *tmp;
struct vpu_device *dev = dev_id;
+ int val;
+ unsigned long flags;
irq_reason = wave5_vdi_read_register(dev, W5_VPU_VINT_REASON);
seq_done = wave5_vdi_read_register(dev, W5_RET_SEQ_DONE_INSTANCE_INFO);
@@ -60,7 +63,8 @@ static void wave5_vpu_handle_irq(void *dev_id)
wave5_vdi_write_register(dev, W5_VPU_VINT_REASON_CLR, irq_reason);
wave5_vdi_write_register(dev, W5_VPU_VINT_CLEAR, 0x1);
- list_for_each_entry(inst, &dev->instances, list) {
+ spin_lock_irqsave(&dev->irq_spinlock, flags);
+ list_for_each_entry_safe(inst, tmp, &dev->instances, list) {
if (irq_reason & BIT(INT_WAVE5_INIT_SEQ) ||
irq_reason & BIT(INT_WAVE5_ENC_SET_PARAM)) {
@@ -82,14 +86,22 @@ static void wave5_vpu_handle_irq(void *dev_id)
irq_reason & BIT(INT_WAVE5_ENC_PIC)) {
if (cmd_done & BIT(inst->id)) {
cmd_done &= ~BIT(inst->id);
- wave5_vdi_write_register(dev, W5_RET_QUEUE_CMD_DONE_INST,
- cmd_done);
- inst->ops->finish_process(inst);
+ if (dev->irq >= 0) {
+ irq_subreason =
+ wave5_vdi_read_register(dev, W5_VPU_VINT_REASON);
+ if (!(irq_subreason & BIT(INT_WAVE5_DEC_PIC)))
+ wave5_vdi_write_register(dev,
+ W5_RET_QUEUE_CMD_DONE_INST,
+ cmd_done);
+ }
+ val = BIT(INT_WAVE5_DEC_PIC);
+ kfifo_in(&inst->irq_status, &val, sizeof(int));
}
}
-
- wave5_vpu_clear_interrupt(inst, irq_reason);
}
+ spin_unlock_irqrestore(&dev->irq_spinlock, flags);
+
+ up(&dev->irq_sem);
}
static irqreturn_t wave5_vpu_irq_thread(int irq, void *dev_id)
@@ -121,6 +133,35 @@ static enum hrtimer_restart wave5_vpu_timer_callback(struct hrtimer *timer)
return HRTIMER_RESTART;
}
+static int irq_thread(void *data)
+{
+ struct vpu_device *dev = (struct vpu_device *)data;
+ struct vpu_instance *inst, *tmp;
+ int irq_status, ret;
+
+ while (!kthread_should_stop()) {
+ if (down_interruptible(&dev->irq_sem))
+ continue;
+
+ if (kthread_should_stop())
+ break;
+
+ mutex_lock(&dev->irq_lock);
+ list_for_each_entry_safe(inst, tmp, &dev->instances, list) {
+ while (kfifo_len(&inst->irq_status)) {
+ ret = kfifo_out(&inst->irq_status, &irq_status, sizeof(int));
+ if (!ret)
+ break;
+
+ inst->ops->finish_process(inst);
+ }
+ }
+ mutex_unlock(&dev->irq_lock);
+ }
+
+ return 0;
+}
+
static int wave5_vpu_load_firmware(struct device *dev, const char *fw_name,
u32 *revision)
{
@@ -224,6 +265,8 @@ static int wave5_vpu_probe(struct platform_device *pdev)
mutex_init(&dev->dev_lock);
mutex_init(&dev->hw_lock);
+ mutex_init(&dev->irq_lock);
+ spin_lock_init(&dev->irq_spinlock);
dev_set_drvdata(&pdev->dev, dev);
dev->dev = &pdev->dev;
@@ -266,6 +309,10 @@ static int wave5_vpu_probe(struct platform_device *pdev)
}
dev->product = wave5_vpu_get_product_id(dev);
+ sema_init(&dev->irq_sem, 1);
+ INIT_LIST_HEAD(&dev->instances);
+ dev->irq_thread = kthread_run(irq_thread, dev, "irq thread");
+
dev->irq = platform_get_irq(pdev, 0);
if (dev->irq < 0) {
dev_err(&pdev->dev, "failed to get irq resource, falling back to polling\n");
@@ -288,7 +335,6 @@ static int wave5_vpu_probe(struct platform_device *pdev)
}
}
- INIT_LIST_HEAD(&dev->instances);
ret = v4l2_device_register(&pdev->dev, &dev->v4l2_dev);
if (ret) {
dev_err(&pdev->dev, "v4l2_device_register, fail: %d\n", ret);
@@ -351,6 +397,12 @@ static void wave5_vpu_remove(struct platform_device *pdev)
{
struct vpu_device *dev = dev_get_drvdata(&pdev->dev);
+ if (dev->irq_thread) {
+ kthread_stop(dev->irq_thread);
+ up(&dev->irq_sem);
+ dev->irq_thread = NULL;
+ }
+
if (dev->irq < 0) {
kthread_destroy_worker(dev->worker);
hrtimer_cancel(&dev->hrtimer);
@@ -361,6 +413,7 @@ static void wave5_vpu_remove(struct platform_device *pdev)
mutex_destroy(&dev->dev_lock);
mutex_destroy(&dev->hw_lock);
+ mutex_destroy(&dev->irq_lock);
reset_control_assert(dev->resets);
clk_bulk_disable_unprepare(dev->num_clks, dev->clks);
wave5_vpu_enc_unregister_device(dev);
diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
index 45615c15beca..bc101397204d 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
@@ -8,6 +8,7 @@
#ifndef VPUAPI_H_INCLUDED
#define VPUAPI_H_INCLUDED
+#include <linux/kfifo.h>
#include <linux/idr.h>
#include <linux/genalloc.h>
#include <media/v4l2-device.h>
@@ -747,6 +748,7 @@ struct vpu_device {
struct video_device *video_dev_enc;
struct mutex dev_lock; /* lock for the src, dst v4l2 queues */
struct mutex hw_lock; /* lock hw configurations */
+ struct mutex irq_lock;
int irq;
enum product_id product;
struct vpu_attr attr;
@@ -764,7 +766,10 @@ struct vpu_device {
struct kthread_worker *worker;
int vpu_poll_interval;
int num_clks;
+ struct task_struct *irq_thread;
+ struct semaphore irq_sem; /* signal to irq_thread when interrupt happens*/
struct reset_control *resets;
+ spinlock_t irq_spinlock; /* protect instances list */
};
struct vpu_instance;
@@ -788,6 +793,7 @@ struct vpu_instance {
enum v4l2_ycbcr_encoding ycbcr_enc;
enum v4l2_quantization quantization;
+ struct kfifo irq_status;
enum vpu_instance_state state;
enum vpu_instance_type type;
const struct vpu_instance_ops *ops;
--
2.43.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v3 3/4] media: chips-media: wave5: Add WARN_ON to check if dec_output_info is NULL
2025-06-23 0:21 [PATCH v3 0/4] Performance improvement of decoder Jackson.lee
2025-06-23 0:21 ` [PATCH v3 1/4] media: chips-media: wave5: Fix SError of kernel panic when closed Jackson.lee
2025-06-23 0:21 ` [PATCH v3 2/4] media: chips-media: wave5: Fix Null reference while testing fluster Jackson.lee
@ 2025-06-23 0:21 ` Jackson.lee
2025-08-29 18:15 ` Nicolas Dufresne
2025-06-23 0:21 ` [PATCH v3 4/4] media: chips-media: wave5: Improve performance of decoder Jackson.lee
` (2 subsequent siblings)
5 siblings, 1 reply; 12+ messages in thread
From: Jackson.lee @ 2025-06-23 0:21 UTC (permalink / raw)
To: mchehab, hverkuil-cisco, nicolas.dufresne, bob.beckett
Cc: linux-media, linux-kernel, jackson.lee, lafley.kim, b-brnich,
hverkuil, nas.chung
From: Jackson Lee <jackson.lee@chipsnmedia.com>
The dec_output_info could be a null pointer, so WARN_ON around it
was added.
Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
---
drivers/media/platform/chips-media/wave5/wave5-vpuapi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
index e94d6ebc9f81..5b10f9f49b9f 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
@@ -485,7 +485,7 @@ int wave5_vpu_dec_get_output_info(struct vpu_instance *inst, struct dec_output_i
struct vpu_device *vpu_dev = inst->dev;
struct dec_output_info *disp_info;
- if (!info)
+ if (WARN_ON(!info))
return -EINVAL;
p_dec_info = &inst->codec_info->dec_info;
--
2.43.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v3 4/4] media: chips-media: wave5: Improve performance of decoder
2025-06-23 0:21 [PATCH v3 0/4] Performance improvement of decoder Jackson.lee
` (2 preceding siblings ...)
2025-06-23 0:21 ` [PATCH v3 3/4] media: chips-media: wave5: Add WARN_ON to check if dec_output_info is NULL Jackson.lee
@ 2025-06-23 0:21 ` Jackson.lee
2025-08-29 18:01 ` Nicolas Dufresne
2025-07-07 0:37 ` [PATCH v3 0/4] Performance improvement " jackson.lee
2025-08-29 15:40 ` Nicolas Dufresne
5 siblings, 1 reply; 12+ messages in thread
From: Jackson.lee @ 2025-06-23 0:21 UTC (permalink / raw)
To: mchehab, hverkuil-cisco, nicolas.dufresne, bob.beckett
Cc: linux-media, linux-kernel, jackson.lee, lafley.kim, b-brnich,
hverkuil, nas.chung
From: Jackson Lee <jackson.lee@chipsnmedia.com>
The current decoding method was to wait until each frame was
decoded after feeding a bitstream. As a result, performance was low
and Wave5 could not achieve max pixel processing rate.
Update driver to use an asynchronous approach for decoding and feeding a
bitstream in order to achieve full capabilities of the device.
WAVE5 supports command-queueing to maximize performance by pipelining
internal commands and by hiding wait cycle taken to receive a command
from Host processor.
Instead of waiting for each command to be executed before sending the
next command, Host processor just places all the commands in the
command-queue and goes on doing other things while the commands in the
queue are processed by VPU.
While Host processor handles its own tasks, it can receive VPU interrupt
request (IRQ).
In this case, host processor can simply exit interrupt service routine
(ISR) without accessing to host interface to read the result of the
command reported by VPU.
After host processor completed its tasks, host processor can read the
command result when host processor needs the reports and does
response processing.
To archive this goal, the device_run() calls v4l2_m2m_job_finish
so that next command can be sent to VPU continuously, if there is
any result, then irq is triggered and gets decoded frames and returns
them to upper layer.
Theses processes work independently each other without waiting
a decoded frame.
Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
---
.../platform/chips-media/wave5/wave5-hw.c | 2 +-
.../chips-media/wave5/wave5-vpu-dec.c | 131 ++++++++++++------
.../platform/chips-media/wave5/wave5-vpuapi.c | 22 ++-
.../platform/chips-media/wave5/wave5-vpuapi.h | 5 +
.../chips-media/wave5/wave5-vpuconfig.h | 1 +
5 files changed, 119 insertions(+), 42 deletions(-)
diff --git a/drivers/media/platform/chips-media/wave5/wave5-hw.c b/drivers/media/platform/chips-media/wave5/wave5-hw.c
index d94cf84c3ee5..687ce6ccf3ae 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-hw.c
+++ b/drivers/media/platform/chips-media/wave5/wave5-hw.c
@@ -102,7 +102,7 @@ static void _wave5_print_reg_err(struct vpu_device *vpu_dev, u32 reg_fail_reason
dev_dbg(dev, "%s: queueing failure: 0x%x\n", func, reg_val);
break;
case WAVE5_SYSERR_RESULT_NOT_READY:
- dev_err(dev, "%s: result not ready: 0x%x\n", func, reg_fail_reason);
+ dev_dbg(dev, "%s: result not ready: 0x%x\n", func, reg_fail_reason);
break;
case WAVE5_SYSERR_ACCESS_VIOLATION_HW:
dev_err(dev, "%s: access violation: 0x%x\n", func, reg_fail_reason);
diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
index 2df7668575f4..4554a24df8a1 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
@@ -268,6 +268,7 @@ static void send_eos_event(struct vpu_instance *inst)
v4l2_event_queue_fh(&inst->v4l2_fh, &vpu_event_eos);
inst->eos = false;
+ inst->sent_eos = true;
}
static int handle_dynamic_resolution_change(struct vpu_instance *inst)
@@ -347,13 +348,12 @@ static void wave5_vpu_dec_finish_decode(struct vpu_instance *inst)
struct vb2_v4l2_buffer *dec_buf = NULL;
struct vb2_v4l2_buffer *disp_buf = NULL;
struct vb2_queue *dst_vq = v4l2_m2m_get_dst_vq(m2m_ctx);
- struct queue_status_info q_status;
dev_dbg(inst->dev->dev, "%s: Fetch output info from firmware.", __func__);
ret = wave5_vpu_dec_get_output_info(inst, &dec_info);
if (ret) {
- dev_warn(inst->dev->dev, "%s: could not get output info.", __func__);
+ dev_dbg(inst->dev->dev, "%s: could not get output info.", __func__);
v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
return;
}
@@ -442,18 +442,14 @@ static void wave5_vpu_dec_finish_decode(struct vpu_instance *inst)
spin_unlock_irqrestore(&inst->state_spinlock, flags);
}
- /*
- * During a resolution change and while draining, the firmware may flush
- * the reorder queue regardless of having a matching decoding operation
- * pending. Only terminate the job if there are no more IRQ coming.
- */
- wave5_vpu_dec_give_command(inst, DEC_GET_QUEUE_STATUS, &q_status);
- if (q_status.report_queue_count == 0 &&
- (q_status.instance_queue_count == 0 || dec_info.sequence_changed)) {
- dev_dbg(inst->dev->dev, "%s: finishing job.\n", __func__);
- pm_runtime_mark_last_busy(inst->dev->dev);
- pm_runtime_put_autosuspend(inst->dev->dev);
- v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
+ if (inst->sent_eos &&
+ v4l2_m2m_get_curr_priv(inst->v4l2_m2m_dev)) {
+ struct queue_status_info q_status;
+
+ wave5_vpu_dec_give_command(inst, DEC_GET_QUEUE_STATUS, &q_status);
+ if (q_status.report_queue_count == 0 &&
+ q_status.instance_queue_count == 0)
+ v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
}
}
@@ -1146,8 +1142,8 @@ static int write_to_ringbuffer(struct vpu_instance *inst, void *buffer, size_t b
static int fill_ringbuffer(struct vpu_instance *inst)
{
struct v4l2_m2m_ctx *m2m_ctx = inst->v4l2_fh.m2m_ctx;
- struct v4l2_m2m_buffer *buf, *n;
- int ret;
+ struct vpu_src_buffer *vpu_buf;
+ int ret = 0;
if (m2m_ctx->last_src_buf) {
struct vpu_src_buffer *vpu_buf = wave5_to_vpu_src_buf(m2m_ctx->last_src_buf);
@@ -1158,9 +1154,8 @@ static int fill_ringbuffer(struct vpu_instance *inst)
}
}
- v4l2_m2m_for_each_src_buf_safe(m2m_ctx, buf, n) {
- struct vb2_v4l2_buffer *vbuf = &buf->vb;
- struct vpu_src_buffer *vpu_buf = wave5_to_vpu_src_buf(vbuf);
+ list_for_each_entry(vpu_buf, &inst->avail_src_bufs, list) {
+ struct vb2_v4l2_buffer *vbuf = &vpu_buf->v4l2_m2m_buf.vb;
struct vpu_buf *ring_buffer = &inst->bitstream_vbuf;
size_t src_size = vb2_get_plane_payload(&vbuf->vb2_buf, 0);
void *src_buf = vb2_plane_vaddr(&vbuf->vb2_buf, 0);
@@ -1220,9 +1215,13 @@ static int fill_ringbuffer(struct vpu_instance *inst)
dev_dbg(inst->dev->dev, "last src buffer written to the ring buffer\n");
break;
}
+
+ inst->queuing_num++;
+ list_del_init(&vpu_buf->list);
+ break;
}
- return 0;
+ return ret;
}
static void wave5_vpu_dec_buf_queue_src(struct vb2_buffer *vb)
@@ -1236,6 +1235,11 @@ static void wave5_vpu_dec_buf_queue_src(struct vb2_buffer *vb)
vbuf->sequence = inst->queued_src_buf_num++;
v4l2_m2m_buf_queue(m2m_ctx, vbuf);
+
+ INIT_LIST_HEAD(&vpu_buf->list);
+ mutex_lock(&inst->feed_lock);
+ list_add_tail(&vpu_buf->list, &inst->avail_src_bufs);
+ mutex_unlock(&inst->feed_lock);
}
static void wave5_vpu_dec_buf_queue_dst(struct vb2_buffer *vb)
@@ -1287,10 +1291,13 @@ static void wave5_vpu_dec_buf_queue(struct vb2_buffer *vb)
__func__, vb->type, vb->index, vb2_plane_size(&vbuf->vb2_buf, 0),
vb2_plane_size(&vbuf->vb2_buf, 1), vb2_plane_size(&vbuf->vb2_buf, 2));
- if (vb->type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE)
+ if (vb->type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE) {
+ if (inst->empty_queue)
+ inst->empty_queue = false;
wave5_vpu_dec_buf_queue_src(vb);
- else if (vb->type == V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE)
+ } else if (vb->type == V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE) {
wave5_vpu_dec_buf_queue_dst(vb);
+ }
}
static int wave5_vpu_dec_allocate_ring_buffer(struct vpu_instance *inst)
@@ -1385,6 +1392,13 @@ static int streamoff_output(struct vb2_queue *q)
dma_addr_t new_rd_ptr;
struct dec_output_info dec_info;
unsigned int i;
+ struct vpu_src_buffer *vpu_buf, *tmp;
+
+ inst->retry = false;
+ inst->queuing_num = 0;
+
+ list_for_each_entry_safe(vpu_buf, tmp, &inst->avail_src_bufs, list)
+ list_del_init(&vpu_buf->list);
for (i = 0; i < v4l2_m2m_num_dst_bufs_ready(m2m_ctx); i++) {
ret = wave5_vpu_dec_set_disp_flag(inst, i);
@@ -1474,6 +1488,8 @@ static void wave5_vpu_dec_stop_streaming(struct vb2_queue *q)
dev_dbg(inst->dev->dev, "%s: type: %u\n", __func__, q->type);
pm_runtime_resume_and_get(inst->dev->dev);
+ inst->empty_queue = false;
+ inst->sent_eos = false;
while (check_cmd) {
struct queue_status_info q_status;
@@ -1481,11 +1497,11 @@ static void wave5_vpu_dec_stop_streaming(struct vb2_queue *q)
wave5_vpu_dec_give_command(inst, DEC_GET_QUEUE_STATUS, &q_status);
- if (q_status.report_queue_count == 0)
+ if ((inst->state == VPU_INST_STATE_STOP || q_status.instance_queue_count == 0) &&
+ q_status.report_queue_count == 0)
break;
- if (wave5_vpu_wait_interrupt(inst, VPU_DEC_TIMEOUT) < 0)
- break;
+ wave5_vpu_wait_interrupt(inst, VPU_DEC_STOP_TIMEOUT);
if (wave5_vpu_dec_get_output_info(inst, &dec_output_info))
dev_dbg(inst->dev->dev, "there is no output info\n");
@@ -1577,13 +1593,24 @@ static void wave5_vpu_dec_device_run(void *priv)
struct queue_status_info q_status;
u32 fail_res = 0;
int ret = 0;
+ unsigned long flags;
dev_dbg(inst->dev->dev, "%s: Fill the ring buffer with new bitstream data", __func__);
pm_runtime_resume_and_get(inst->dev->dev);
- ret = fill_ringbuffer(inst);
- if (ret) {
- dev_warn(inst->dev->dev, "Filling ring buffer failed\n");
- goto finish_job_and_return;
+ if (!inst->retry) {
+ mutex_lock(&inst->feed_lock);
+ ret = fill_ringbuffer(inst);
+ mutex_unlock(&inst->feed_lock);
+ if (ret < 0) {
+ dev_warn(inst->dev->dev, "Filling ring buffer failed\n");
+ goto finish_job_and_return;
+ } else if (!inst->eos &&
+ inst->queuing_num == 0 &&
+ inst->state == VPU_INST_STATE_PIC_RUN) {
+ dev_dbg(inst->dev->dev, "%s: no bitstream for feeding, so skip ", __func__);
+ inst->empty_queue = true;
+ goto finish_job_and_return;
+ }
}
switch (inst->state) {
@@ -1608,7 +1635,9 @@ static void wave5_vpu_dec_device_run(void *priv)
}
spin_unlock_irqrestore(&inst->state_spinlock, flags);
} else {
+ spin_lock_irqsave(&inst->state_spinlock, flags);
switch_state(inst, VPU_INST_STATE_INIT_SEQ);
+ spin_unlock_irqrestore(&inst->state_spinlock, flags);
}
break;
@@ -1619,8 +1648,9 @@ static void wave5_vpu_dec_device_run(void *priv)
* we had a chance to switch, which leads to an invalid state
* change.
*/
+ spin_lock_irqsave(&inst->state_spinlock, flags);
switch_state(inst, VPU_INST_STATE_PIC_RUN);
-
+ spin_unlock_irqrestore(&inst->state_spinlock, flags);
/*
* During DRC, the picture decoding remains pending, so just leave the job
* active until this decode operation completes.
@@ -1634,12 +1664,14 @@ static void wave5_vpu_dec_device_run(void *priv)
ret = wave5_prepare_fb(inst);
if (ret) {
dev_warn(inst->dev->dev, "Framebuffer preparation, fail: %d\n", ret);
+ spin_lock_irqsave(&inst->state_spinlock, flags);
switch_state(inst, VPU_INST_STATE_STOP);
+ spin_unlock_irqrestore(&inst->state_spinlock, flags);
break;
}
if (q_status.instance_queue_count) {
- dev_dbg(inst->dev->dev, "%s: leave with active job", __func__);
+ v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
return;
}
@@ -1650,26 +1682,42 @@ static void wave5_vpu_dec_device_run(void *priv)
dev_err(inst->dev->dev,
"Frame decoding on m2m context (%p), fail: %d (result: %d)\n",
m2m_ctx, ret, fail_res);
- break;
+ goto finish_job_and_return;
+ }
+
+ if (fail_res == WAVE5_SYSERR_QUEUEING_FAIL) {
+ inst->retry = true;
+ } else {
+ inst->retry = false;
+ if (!inst->eos)
+ inst->queuing_num--;
}
- /* Return so that we leave this job active */
- dev_dbg(inst->dev->dev, "%s: leave with active job", __func__);
- return;
- default:
- WARN(1, "Execution of a job in state %s illegal.\n", state_to_str(inst->state));
break;
+ default:
+ dev_dbg(inst->dev->dev, "Execution of a job in state %s illegal.\n",
+ state_to_str(inst->state));
}
finish_job_and_return:
dev_dbg(inst->dev->dev, "%s: leave and finish job", __func__);
pm_runtime_mark_last_busy(inst->dev->dev);
pm_runtime_put_autosuspend(inst->dev->dev);
- v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
+ /*
+ * After receiving CMD_STOP, there is no input, but we have to run device_run
+ * to send DEC_PIC command until display index == -1, so job_finish was always
+ * called in the device_run to archive it, the logic was very wasteful
+ * in power and CPU time.
+ * If EOS is passed, device_run will not call job_finish no more, it is called
+ * only if HW is idle status in order to reduce overhead.
+ */
+ if (!inst->sent_eos)
+ v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
}
static void wave5_vpu_dec_job_abort(void *priv)
{
struct vpu_instance *inst = priv;
+ struct v4l2_m2m_ctx *m2m_ctx = inst->v4l2_fh.m2m_ctx;
int ret;
ret = switch_state(inst, VPU_INST_STATE_STOP);
@@ -1680,6 +1728,8 @@ static void wave5_vpu_dec_job_abort(void *priv)
if (ret)
dev_warn(inst->dev->dev,
"Setting EOS for the bitstream, fail: %d\n", ret);
+
+ v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
}
static int wave5_vpu_dec_job_ready(void *priv)
@@ -1715,7 +1765,8 @@ static int wave5_vpu_dec_job_ready(void *priv)
"No capture buffer ready to decode!\n");
break;
} else if (!wave5_is_draining_or_eos(inst) &&
- !v4l2_m2m_num_src_bufs_ready(m2m_ctx)) {
+ (!v4l2_m2m_num_src_bufs_ready(m2m_ctx) ||
+ inst->empty_queue)) {
dev_dbg(inst->dev->dev,
"No bitstream data to decode!\n");
break;
@@ -1755,6 +1806,8 @@ static int wave5_vpu_open_dec(struct file *filp)
inst->ops = &wave5_vpu_dec_inst_ops;
spin_lock_init(&inst->state_spinlock);
+ mutex_init(&inst->feed_lock);
+ INIT_LIST_HEAD(&inst->avail_src_bufs);
inst->codec_info = kzalloc(sizeof(*inst->codec_info), GFP_KERNEL);
if (!inst->codec_info)
diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
index 5b10f9f49b9f..edbe69540ef1 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
@@ -207,6 +207,7 @@ int wave5_vpu_dec_close(struct vpu_instance *inst, u32 *fail_res)
int retry = 0;
struct vpu_device *vpu_dev = inst->dev;
int i;
+ struct dec_output_info dec_info;
*fail_res = 0;
if (!inst->codec_info)
@@ -227,11 +228,26 @@ int wave5_vpu_dec_close(struct vpu_instance *inst, u32 *fail_res)
goto unlock_and_return;
}
- if (*fail_res == WAVE5_SYSERR_VPU_STILL_RUNNING &&
- retry++ >= MAX_FIRMWARE_CALL_RETRY) {
+ if (ret == 0)
+ break;
+
+ if (*fail_res != WAVE5_SYSERR_VPU_STILL_RUNNING) {
+ dev_warn(inst->dev->dev, "dec_finish_seq timed out\n");
+ goto unlock_and_return;
+ }
+
+ if (retry++ >= MAX_FIRMWARE_CALL_RETRY) {
ret = -ETIMEDOUT;
goto unlock_and_return;
}
+
+ mutex_unlock(&vpu_dev->hw_lock);
+ wave5_vpu_dec_get_output_info(inst, &dec_info);
+ ret = mutex_lock_interruptible(&vpu_dev->hw_lock);
+ if (ret) {
+ pm_runtime_put_sync(inst->dev->dev);
+ return ret;
+ }
} while (ret != 0);
dev_dbg(inst->dev->dev, "%s: dec_finish_seq complete\n", __func__);
@@ -248,6 +264,8 @@ int wave5_vpu_dec_close(struct vpu_instance *inst, u32 *fail_res)
wave5_vdi_free_dma_memory(vpu_dev, &p_dec_info->vb_task);
+ mutex_destroy(&inst->feed_lock);
+
unlock_and_return:
mutex_unlock(&vpu_dev->hw_lock);
pm_runtime_put_sync(inst->dev->dev);
diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
index bc101397204d..adfbc104f939 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
@@ -818,6 +818,11 @@ struct vpu_instance {
bool cbcr_interleave;
bool nv21;
bool eos;
+ bool sent_eos; /* check if EOS is sent to application */
+ bool retry; /* retry to feed bitstream if failure reason is WAVE5_SYSERR_QUEUEING_FAIL*/
+ int queuing_num; /* count of bitstream queued */
+ struct mutex feed_lock; /* lock for feeding bitstream buffers */
+ bool empty_queue;
struct vpu_buf bitstream_vbuf;
dma_addr_t last_rd_ptr;
size_t remaining_consumed_bytes;
diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuconfig.h b/drivers/media/platform/chips-media/wave5/wave5-vpuconfig.h
index 1ea9f5f31499..4ebd48d5550e 100644
--- a/drivers/media/platform/chips-media/wave5/wave5-vpuconfig.h
+++ b/drivers/media/platform/chips-media/wave5/wave5-vpuconfig.h
@@ -59,6 +59,7 @@
// application specific configuration
#define VPU_ENC_TIMEOUT 60000
#define VPU_DEC_TIMEOUT 60000
+#define VPU_DEC_STOP_TIMEOUT 10
// for WAVE encoder
#define USE_SRC_PRP_AXI 0
--
2.43.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* RE: [PATCH v3 0/4] Performance improvement of decoder
2025-06-23 0:21 [PATCH v3 0/4] Performance improvement of decoder Jackson.lee
` (3 preceding siblings ...)
2025-06-23 0:21 ` [PATCH v3 4/4] media: chips-media: wave5: Improve performance of decoder Jackson.lee
@ 2025-07-07 0:37 ` jackson.lee
2025-08-12 7:51 ` jackson.lee
2025-08-29 15:40 ` Nicolas Dufresne
5 siblings, 1 reply; 12+ messages in thread
From: jackson.lee @ 2025-07-07 0:37 UTC (permalink / raw)
To: mchehab@kernel.org, hverkuil-cisco@xs4all.nl,
nicolas.dufresne@collabora.com, bob.beckett@collabora.com
Cc: linux-media@vger.kernel.org, linux-kernel@vger.kernel.org,
lafley.kim, b-brnich@ti.com, hverkuil@xs4all.nl, Nas Chung
Hi Nicolas
Can you review these patch series?
These patch series includes the below changes as well.
1. The explanation for change of Fix Null reference while testing fluster
2. Reducing the active loop because of overhead and explanation for change of Improve performance decoder
3. Making new patch set in order to check EINVAL and add a warn_on around it.
Thanks
Jackson
> -----Original Message-----
> From: jackson.lee <jackson.lee@chipsnmedia.com>
> Sent: Monday, June 23, 2025 9:22 AM
> To: mchehab@kernel.org; hverkuil-cisco@xs4all.nl;
> nicolas.dufresne@collabora.com; bob.beckett@collabora.com
> Cc: linux-media@vger.kernel.org; linux-kernel@vger.kernel.org; jackson.lee
> <jackson.lee@chipsnmedia.com>; lafley.kim <lafley.kim@chipsnmedia.com>; b-
> brnich@ti.com; hverkuil@xs4all.nl; Nas Chung <nas.chung@chipsnmedia.com>
> Subject: [PATCH v3 0/4] Performance improvement of decoder
>
> From: Jackson Lee <jackson.lee@chipsnmedia.com>
>
> v4l2-compliance results:
> ========================
>
> v4l2-compliance 1.28.1-5233, 64 bits, 64-bit time_t
>
> Buffer ioctls:
> warn: v4l2-test-buffers.cpp(693): VIDIOC_CREATE_BUFS not
> supported
> warn: v4l2-test-buffers.cpp(693): VIDIOC_CREATE_BUFS not
> supported
> test VIDIOC_REQBUFS/CREATE_BUFS/QUERYBUF: OK
> test CREATE_BUFS maximum buffers: OK
> test VIDIOC_EXPBUF: OK
> test Requests: OK (Not Supported)
>
> Total for wave5-dec device /dev/video0: 46, Succeeded: 46, Failed: 0,
> Warnings: 2 Total for wave5-enc device /dev/video1: 46, Succeeded: 46,
> Failed: 0, Warnings: 0
>
> Fluster test results:
> =====================
>
> Running test suite JCT-VC-HEVC_V1 with decoder GStreamer-H.265-V4L2-Gst1.0
> Using 3 parallel job(s)
> Ran 133/147 tests successfully in 40.114 secs
>
> (1 test fails because of not supporting to parse multi frames, 1 test
> fails because of a missing frame and slight corruption,
> 2 tests fail because of sizes which are incompatible with the IP, 11
> tests fail because of unsupported 10 bit format)
>
>
> Running test suite JVT-AVC_V1 with decoder GStreamer-H.264-V4L2-Gst1.0
> Using 3 parallel job(s)
> Ran 78/135 tests successfully in 43.364 secs
>
> (57 fail because the hardware is unable to decode MBAFF / FMO / Field /
> Extended profile streams.)
>
> Running test suite JVT-FR-EXT with decoder GStreamer-H.264-V4L2-Gst1.0
> Using 3 parallel job(s)
> Ran 25/69 tests successfully in 40.411 secs
>
> (44 fail because the hardware does not support field encoded and 422
> encoded stream)
>
> Seek test
> =====================
> 1. gst-play-1.0 seek.264
> 2. this will use waylandsink since gst-play-1.0 uses playbin.
> if you don't want to hook up display,
> you can run gst-play-1.0 seek.264 --videosink=fakevideosink instead 3.
> Let pipeline run for 2-3 seconds 4. press SPACE key to pause 5. press 0 to
> reset press SPACE to start again
>
> gst-play-1.0 seek.264 --videosink=fakevideosink Press 'k' to see a list of
> keyboard shortcuts.
> Now playing /root/seek.264
> Redistribute latency...
> Redistribute latency...
> Redistribute latency...
> Redistribute latency...
> Redistribute latency...aused
> 0:00:09.9 / 0:00:09.7
> Reached end of play list.
>
> Sequence Change test
> =====================
> gst-launch-1.0 filesrc location=./drc.h264 ! h264parse ! v4l2h264dec !
> filesink location=./h264_output_420.yuv Setting pipeline to PAUSED ...
> Pipeline is PREROLLING ...
> Redistribute latency...
> Pipeline is PREROLLED ...
> Setting pipeline to PLAYING ...
> New clock: GstSystemClock
> Redistribute latency...
> Got EOS from element "pipeline0".
> Execution ended after 0:00:00.113620590
> Setting pipeline to NULL ...
> Freeing pipeline ...
>
> Change since v2:
> ==================
> * For [PATCH v3 4/4] media: chips-media: wave5: Improve performance of
> decoder
> - squash v2's #3~#6 to #4 patch of v3
>
> Change since v1:
> ===================
> * For [PATCH v2 2/7] media: chips-media: wave5: Improve performance of
> decoder
> - change log to dbg level
>
> Change since v0:
> ===================
> * For [PATCH v1 2/7] media: chips-media: wave5: Improve performance of
> decoder
> - separates the previous patch to a few patches
>
> * For [PATCH v1 3/7] media: chips-media: wave5: Fix not to be closed
> - separated from the previous patch of performance improvement of
> decoder
>
> * For [PATCH v1 4/7] media: chips-media: wave5: Use spinlock whenever
> state is changed
> - separated from the previous patch of performance improvement of
> decoder
>
> * For [PATCH v1 5/7] media: chips-media: wave5: Fix not to free resources
> normally when
> instance was destroyed
> - separated from the previous patch of performance improvement of
> decoder
>
> * For [PATCH v1 7/7] media: chips-media: wave5: Fix SError of kernel panic
> when closed
> - separated from the previous patch of performance improvement of
> decoder
>
>
> Jackson Lee (4):
> media: chips-media: wave5: Fix SError of kernel panic when closed
> media: chips-media: wave5: Fix Null reference while testing fluster
> media: chips-media: wave5: Add WARN_ON to check if dec_output_info is
> NULL
> media: chips-media: wave5: Improve performance of decoder
>
> .../platform/chips-media/wave5/wave5-helper.c | 23 ++-
> .../platform/chips-media/wave5/wave5-hw.c | 2 +-
> .../chips-media/wave5/wave5-vpu-dec.c | 139 ++++++++++++------
> .../chips-media/wave5/wave5-vpu-enc.c | 8 +-
> .../platform/chips-media/wave5/wave5-vpu.c | 71 +++++++--
> .../platform/chips-media/wave5/wave5-vpuapi.c | 37 ++---
> .../platform/chips-media/wave5/wave5-vpuapi.h | 11 ++
> .../chips-media/wave5/wave5-vpuconfig.h | 1 +
> 8 files changed, 219 insertions(+), 73 deletions(-)
>
> --
> 2.43.0
^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: [PATCH v3 0/4] Performance improvement of decoder
2025-07-07 0:37 ` [PATCH v3 0/4] Performance improvement " jackson.lee
@ 2025-08-12 7:51 ` jackson.lee
0 siblings, 0 replies; 12+ messages in thread
From: jackson.lee @ 2025-08-12 7:51 UTC (permalink / raw)
To: jackson.lee, mchehab@kernel.org, hverkuil-cisco@xs4all.nl,
nicolas.dufresne@collabora.com, bob.beckett@collabora.com
Cc: linux-media@vger.kernel.org, linux-kernel@vger.kernel.org,
lafley.kim, b-brnich@ti.com, hverkuil@xs4all.nl, Nas Chung
Hi Nicolas
Can you please review these patch series ?
Thanks
> -----Original Message-----
> From: jackson.lee <jackson.lee@chipsnmedia.com>
> Sent: Monday, July 7, 2025 9:37 AM
> To: mchehab@kernel.org; hverkuil-cisco@xs4all.nl;
> nicolas.dufresne@collabora.com; bob.beckett@collabora.com
> Cc: linux-media@vger.kernel.org; linux-kernel@vger.kernel.org; lafley.kim
> <lafley.kim@chipsnmedia.com>; b-brnich@ti.com; hverkuil@xs4all.nl; Nas
> Chung <nas.chung@chipsnmedia.com>
> Subject: RE: [PATCH v3 0/4] Performance improvement of decoder
>
> Hi Nicolas
>
> Can you review these patch series?
>
> These patch series includes the below changes as well.
>
> 1. The explanation for change of Fix Null reference while testing fluster
> 2. Reducing the active loop because of overhead and explanation for change
> of Improve performance decoder 3. Making new patch set in order to check
> EINVAL and add a warn_on around it.
>
> Thanks
> Jackson
>
> > -----Original Message-----
> > From: jackson.lee <jackson.lee@chipsnmedia.com>
> > Sent: Monday, June 23, 2025 9:22 AM
> > To: mchehab@kernel.org; hverkuil-cisco@xs4all.nl;
> > nicolas.dufresne@collabora.com; bob.beckett@collabora.com
> > Cc: linux-media@vger.kernel.org; linux-kernel@vger.kernel.org;
> > jackson.lee <jackson.lee@chipsnmedia.com>; lafley.kim
> > <lafley.kim@chipsnmedia.com>; b- brnich@ti.com; hverkuil@xs4all.nl;
> > Nas Chung <nas.chung@chipsnmedia.com>
> > Subject: [PATCH v3 0/4] Performance improvement of decoder
> >
> > From: Jackson Lee <jackson.lee@chipsnmedia.com>
> >
> > v4l2-compliance results:
> > ========================
> >
> > v4l2-compliance 1.28.1-5233, 64 bits, 64-bit time_t
> >
> > Buffer ioctls:
> > warn: v4l2-test-buffers.cpp(693): VIDIOC_CREATE_BUFS
> > not supported
> > warn: v4l2-test-buffers.cpp(693): VIDIOC_CREATE_BUFS
> > not supported
> > test VIDIOC_REQBUFS/CREATE_BUFS/QUERYBUF: OK
> > test CREATE_BUFS maximum buffers: OK
> > test VIDIOC_EXPBUF: OK
> > test Requests: OK (Not Supported)
> >
> > Total for wave5-dec device /dev/video0: 46, Succeeded: 46, Failed: 0,
> > Warnings: 2 Total for wave5-enc device /dev/video1: 46, Succeeded: 46,
> > Failed: 0, Warnings: 0
> >
> > Fluster test results:
> > =====================
> >
> > Running test suite JCT-VC-HEVC_V1 with decoder
> > GStreamer-H.265-V4L2-Gst1.0 Using 3 parallel job(s)
> > Ran 133/147 tests successfully in 40.114 secs
> >
> > (1 test fails because of not supporting to parse multi frames, 1 test
> > fails because of a missing frame and slight corruption,
> > 2 tests fail because of sizes which are incompatible with the IP, 11
> > tests fail because of unsupported 10 bit format)
> >
> >
> > Running test suite JVT-AVC_V1 with decoder GStreamer-H.264-V4L2-Gst1.0
> > Using 3 parallel job(s)
> > Ran 78/135 tests successfully in 43.364 secs
> >
> > (57 fail because the hardware is unable to decode MBAFF / FMO / Field
> > / Extended profile streams.)
> >
> > Running test suite JVT-FR-EXT with decoder GStreamer-H.264-V4L2-Gst1.0
> > Using 3 parallel job(s)
> > Ran 25/69 tests successfully in 40.411 secs
> >
> > (44 fail because the hardware does not support field encoded and 422
> > encoded stream)
> >
> > Seek test
> > =====================
> > 1. gst-play-1.0 seek.264
> > 2. this will use waylandsink since gst-play-1.0 uses playbin.
> > if you don't want to hook up display,
> > you can run gst-play-1.0 seek.264 --videosink=fakevideosink instead 3.
> > Let pipeline run for 2-3 seconds 4. press SPACE key to pause 5. press
> > 0 to reset press SPACE to start again
> >
> > gst-play-1.0 seek.264 --videosink=fakevideosink Press 'k' to see a
> > list of keyboard shortcuts.
> > Now playing /root/seek.264
> > Redistribute latency...
> > Redistribute latency...
> > Redistribute latency...
> > Redistribute latency...
> > Redistribute latency...aused
> > 0:00:09.9 / 0:00:09.7
> > Reached end of play list.
> >
> > Sequence Change test
> > =====================
> > gst-launch-1.0 filesrc location=./drc.h264 ! h264parse ! v4l2h264dec !
> > filesink location=./h264_output_420.yuv Setting pipeline to PAUSED ...
> > Pipeline is PREROLLING ...
> > Redistribute latency...
> > Pipeline is PREROLLED ...
> > Setting pipeline to PLAYING ...
> > New clock: GstSystemClock
> > Redistribute latency...
> > Got EOS from element "pipeline0".
> > Execution ended after 0:00:00.113620590 Setting pipeline to NULL ...
> > Freeing pipeline ...
> >
> > Change since v2:
> > ==================
> > * For [PATCH v3 4/4] media: chips-media: wave5: Improve performance of
> > decoder
> > - squash v2's #3~#6 to #4 patch of v3
> >
> > Change since v1:
> > ===================
> > * For [PATCH v2 2/7] media: chips-media: wave5: Improve performance of
> > decoder
> > - change log to dbg level
> >
> > Change since v0:
> > ===================
> > * For [PATCH v1 2/7] media: chips-media: wave5: Improve performance of
> > decoder
> > - separates the previous patch to a few patches
> >
> > * For [PATCH v1 3/7] media: chips-media: wave5: Fix not to be closed
> > - separated from the previous patch of performance improvement of
> > decoder
> >
> > * For [PATCH v1 4/7] media: chips-media: wave5: Use spinlock whenever
> > state is changed
> > - separated from the previous patch of performance improvement of
> > decoder
> >
> > * For [PATCH v1 5/7] media: chips-media: wave5: Fix not to free
> > resources normally when
> > instance was destroyed
> > - separated from the previous patch of performance improvement of
> > decoder
> >
> > * For [PATCH v1 7/7] media: chips-media: wave5: Fix SError of kernel
> > panic when closed
> > - separated from the previous patch of performance improvement of
> > decoder
> >
> >
> > Jackson Lee (4):
> > media: chips-media: wave5: Fix SError of kernel panic when closed
> > media: chips-media: wave5: Fix Null reference while testing fluster
> > media: chips-media: wave5: Add WARN_ON to check if dec_output_info is
> > NULL
> > media: chips-media: wave5: Improve performance of decoder
> >
> > .../platform/chips-media/wave5/wave5-helper.c | 23 ++-
> > .../platform/chips-media/wave5/wave5-hw.c | 2 +-
> > .../chips-media/wave5/wave5-vpu-dec.c | 139 ++++++++++++------
> > .../chips-media/wave5/wave5-vpu-enc.c | 8 +-
> > .../platform/chips-media/wave5/wave5-vpu.c | 71 +++++++--
> > .../platform/chips-media/wave5/wave5-vpuapi.c | 37 ++---
> > .../platform/chips-media/wave5/wave5-vpuapi.h | 11 ++
> > .../chips-media/wave5/wave5-vpuconfig.h | 1 +
> > 8 files changed, 219 insertions(+), 73 deletions(-)
> >
> > --
> > 2.43.0
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 0/4] Performance improvement of decoder
2025-06-23 0:21 [PATCH v3 0/4] Performance improvement of decoder Jackson.lee
` (4 preceding siblings ...)
2025-07-07 0:37 ` [PATCH v3 0/4] Performance improvement " jackson.lee
@ 2025-08-29 15:40 ` Nicolas Dufresne
5 siblings, 0 replies; 12+ messages in thread
From: Nicolas Dufresne @ 2025-08-29 15:40 UTC (permalink / raw)
To: Jackson.lee, mchehab, hverkuil-cisco, bob.beckett
Cc: linux-media, linux-kernel, lafley.kim, b-brnich, hverkuil,
nas.chung
[-- Attachment #1: Type: text/plain, Size: 9590 bytes --]
Hi Jackson,
Le lundi 23 juin 2025 à 09:21 +0900, Jackson.lee a écrit :
> From: Jackson Lee <jackson.lee@chipsnmedia.com>
>
> v4l2-compliance results:
> ========================
>
> v4l2-compliance 1.28.1-5233, 64 bits, 64-bit time_t
>
> Buffer ioctls:
> warn: v4l2-test-buffers.cpp(693): VIDIOC_CREATE_BUFS not supported
> warn: v4l2-test-buffers.cpp(693): VIDIOC_CREATE_BUFS not supported
> test VIDIOC_REQBUFS/CREATE_BUFS/QUERYBUF: OK
> test CREATE_BUFS maximum buffers: OK
> test VIDIOC_EXPBUF: OK
> test Requests: OK (Not Supported)
>
> Total for wave5-dec device /dev/video0: 46, Succeeded: 46, Failed: 0, Warnings: 2 Total for wave5-enc device /dev/video1: 46, Succeeded: 46, Failed: 0, Warnings: 0
>
> Fluster test results:
> =====================
>
> Running test suite JCT-VC-HEVC_V1 with decoder GStreamer-H.265-V4L2-Gst1.0 Using 3 parallel job(s)
> Ran 133/147 tests successfully in 40.114 secs
>
> (1 test fails because of not supporting to parse multi frames, 1 test fails because of a missing frame and slight corruption,
> 2 tests fail because of sizes which are incompatible with the IP, 11 tests fail because of unsupported 10 bit format)
>
>
> Running test suite JVT-AVC_V1 with decoder GStreamer-H.264-V4L2-Gst1.0 Using 3 parallel job(s)
> Ran 78/135 tests successfully in 43.364 secs
>
> (57 fail because the hardware is unable to decode MBAFF / FMO / Field / Extended profile streams.)
>
> Running test suite JVT-FR-EXT with decoder GStreamer-H.264-V4L2-Gst1.0 Using 3 parallel job(s)
> Ran 25/69 tests successfully in 40.411 secs
Ack, same results here and consistent.
>
> (44 fail because the hardware does not support field encoded and 422 encoded stream)
>
> Seek test
> =====================
> 1. gst-play-1.0 seek.264
> 2. this will use waylandsink since gst-play-1.0 uses playbin.
> if you don't want to hook up display,
> you can run gst-play-1.0 seek.264 --videosink=fakevideosink instead 3. Let pipeline run for 2-3 seconds 4. press SPACE key to pause 5. press 0 to reset press SPACE to start again
>
> gst-play-1.0 seek.264 --videosink=fakevideosink Press 'k' to see a list of keyboard shortcuts.
> Now playing /root/seek.264
> Redistribute latency...
> Redistribute latency...
> Redistribute latency...
> Redistribute latency...
> Redistribute latency...aused
> 0:00:09.9 / 0:00:09.7
> Reached end of play list.
>
> Sequence Change test
> =====================
> gst-launch-1.0 filesrc location=./drc.h264 ! h264parse ! v4l2h264dec ! filesink location=./h264_output_420.yuv Setting pipeline to PAUSED ...
> Pipeline is PREROLLING ...
> Redistribute latency...
> Pipeline is PREROLLED ...
> Setting pipeline to PLAYING ...
> New clock: GstSystemClock
> Redistribute latency...
> Got EOS from element "pipeline0".
> Execution ended after 0:00:00.113620590
> Setting pipeline to NULL ...
> Freeing pipeline ...
I tried and reproduce your results. I've used an ISOMP4 file, nothing big, 720p
10min video. After 30s of seeking back and forth I've got a deadlock, with the
following kernel log:
vdec 4210000.video-codec: wave5_vpu_firmware_command_queue_error_check: still running: 0x1000
I don't know if its worse then before, but the bug is severe enough to be
concern. To reproduce easily, I pick a longer video, seek forward close to the
end, and then seek back (gst-play so smaller steps back) very quickly till it
reaches position 0, and repeat.
This happened without resolution change happening concurrent to seeks, just a
flat, single resolution video. Once I do the same test with an agressive DRC in
place, I hit kernel crash. I will share in private email the DRC H.264 sample
I'm using, and how to make it bigger so its manually seekable.
[ 678.819859] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000358
[ 678.828746] Mem abort info:
[ 678.832378] ESR = 0x0000000096000004
[ 678.838555] EC = 0x25: DABT (current EL), IL = 32 bits
[ 678.845921] SET = 0, FnV = 0
[ 678.849882] EA = 0, S1PTW = 0
[ 678.854241] FSC = 0x04: level 0 translation fault
[ 678.860098] Data abort info:
[ 678.864410] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 678.871000] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 678.877384] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 678.887785] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000882215000
[ 678.901210] [0000000000000358] pgd=0000000000000000, p4d=0000000000000000
[ 678.908585] Internal error: Oops: 0000000096000004 [#1] SMP
[ 678.914266] Modules linked in: rfkill qrtr rpmsg_ctrl rpmsg_char phy_cadence_torrent tps6594_esm tps6594_pfsm tps6594_regulator rtc_tps6594 ti_am335x_adc kfifo_buf pinctrl_tps6594 gpio_regmap cdns3 cdns_usb_common mux_gpio omap_mailbox ti_k3_r5_remoteproc phy_j721e_wiz phy_can_transceiver wave5 v4l2_mem2mem powervr videobuf2_dma_contig drm_gpuvm videobuf2_memops videobuf2_v4l2 drm_exec at24 drm_shmem_helper tps6594_i2c videodev gpu_sched tps6594_core videobuf2_common k3_j72xx_bandgap ti_k3_dsp_remoteproc mc drm_kms_helper ti_k3_common sa2ul ti_am335x_tscadc authenc m_can_platform m_can can_dev cdns3_ti rti_wdt fuse drm dm_mod backlight ipv6
[ 678.971012] CPU: 1 UID: 0 PID: 51 Comm: kworker/1:1 Not tainted 6.17.0-rc3-jacinto+ #2 PREEMPT
[ 678.979704] Hardware name: Texas Instruments J721S2 EVM (DT)
[ 678.985358] Workqueue: events v4l2_m2m_device_run_work [v4l2_mem2mem]
[ 678.991811] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 678.998767] pc : v4l2_m2m_try_run+0x74/0x13c [v4l2_mem2mem]
[ 679.004345] lr : v4l2_m2m_try_run+0x60/0x13c [v4l2_mem2mem]
[ 679.009922] sp : ffff800083333d60
[ 679.013232] x29: ffff800083333d60 x28: 0000000000000000 x27: 0000000000000000
[ 679.020358] x26: ffff000b7dfa8468 x25: 0000000000000000 x24: ffff000800012205
[ 679.027480] x23: ffff0008011aa300 x22: ffff000b7dfa8440 x21: ffff0008053f2220
[ 679.034602] x20: ffff000800012200 x19: ffff0008053f2000 x18: 0000000000000000
[ 679.041724] x17: 0000000000000000 x16: 0000000000000000 x15: 009f729c552fd3f8
[ 679.048846] x14: 00000000000002ae x13: ffff8000811f4790 x12: 0000000000000537
[ 679.055968] x11: 00000000000000c0 x10: 0000000000000ab0 x9 : ffff800083333c80
[ 679.063090] x8 : ffff0008011aae10 x7 : 0000000000002d02 x6 : 000000000000ba6b
[ 679.070212] x5 : ffff000827f68b40 x4 : ffff0008011aa300 x3 : ffff00080b2bb480
[ 679.077333] x2 : 0000000000000000 x1 : ffff80007a972538 x0 : 0000000000000000
[ 679.084456] Call trace:
[ 679.086893] v4l2_m2m_try_run+0x74/0x13c [v4l2_mem2mem] (P)
[ 679.092462] v4l2_m2m_device_run_work+0x14/0x20 [v4l2_mem2mem]
[ 679.098285] process_one_work+0x150/0x290
[ 679.102294] worker_thread+0x2d0/0x3ec
[ 679.106034] kthread+0x12c/0x210
[ 679.109255] ret_from_fork+0x10/0x20
[ 679.112825] Code: 39530000 370005c0 f9400260 f9412661 (f941ac00)
[ 679.118905] ---[ end trace 0000000000000000 ]---
>
> Change since v2:
> ==================
> * For [PATCH v3 4/4] media: chips-media: wave5: Improve performance of decoder
> - squash v2's #3~#6 to #4 patch of v3
Thanks for this update, I'll check if anything is left appart from stability and
provide feedback. I'm looking forward you input on the disclosed bug I have hit.
Nicolas
>
> Change since v1:
> ===================
> * For [PATCH v2 2/7] media: chips-media: wave5: Improve performance of decoder
> - change log to dbg level
>
> Change since v0:
> ===================
> * For [PATCH v1 2/7] media: chips-media: wave5: Improve performance of decoder
> - separates the previous patch to a few patches
>
> * For [PATCH v1 3/7] media: chips-media: wave5: Fix not to be closed
> - separated from the previous patch of performance improvement of
> decoder
>
> * For [PATCH v1 4/7] media: chips-media: wave5: Use spinlock whenever state is changed
> - separated from the previous patch of performance improvement of
> decoder
>
> * For [PATCH v1 5/7] media: chips-media: wave5: Fix not to free resources normally when
> instance was destroyed
> - separated from the previous patch of performance improvement of
> decoder
>
> * For [PATCH v1 7/7] media: chips-media: wave5: Fix SError of kernel panic when closed
> - separated from the previous patch of performance improvement of
> decoder
>
>
> Jackson Lee (4):
> media: chips-media: wave5: Fix SError of kernel panic when closed
> media: chips-media: wave5: Fix Null reference while testing fluster
> media: chips-media: wave5: Add WARN_ON to check if dec_output_info is
> NULL
> media: chips-media: wave5: Improve performance of decoder
>
> .../platform/chips-media/wave5/wave5-helper.c | 23 ++-
> .../platform/chips-media/wave5/wave5-hw.c | 2 +-
> .../chips-media/wave5/wave5-vpu-dec.c | 139 ++++++++++++------
> .../chips-media/wave5/wave5-vpu-enc.c | 8 +-
> .../platform/chips-media/wave5/wave5-vpu.c | 71 +++++++--
> .../platform/chips-media/wave5/wave5-vpuapi.c | 37 ++---
> .../platform/chips-media/wave5/wave5-vpuapi.h | 11 ++
> .../chips-media/wave5/wave5-vpuconfig.h | 1 +
> 8 files changed, 219 insertions(+), 73 deletions(-)
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 4/4] media: chips-media: wave5: Improve performance of decoder
2025-06-23 0:21 ` [PATCH v3 4/4] media: chips-media: wave5: Improve performance of decoder Jackson.lee
@ 2025-08-29 18:01 ` Nicolas Dufresne
0 siblings, 0 replies; 12+ messages in thread
From: Nicolas Dufresne @ 2025-08-29 18:01 UTC (permalink / raw)
To: Jackson.lee, mchehab, hverkuil-cisco, bob.beckett
Cc: linux-media, linux-kernel, lafley.kim, b-brnich, hverkuil,
nas.chung
[-- Attachment #1: Type: text/plain, Size: 18781 bytes --]
Le lundi 23 juin 2025 à 09:21 +0900, Jackson.lee a écrit :
> From: Jackson Lee <jackson.lee@chipsnmedia.com>
>
> The current decoding method was to wait until each frame was
> decoded after feeding a bitstream. As a result, performance was low
> and Wave5 could not achieve max pixel processing rate.
>
> Update driver to use an asynchronous approach for decoding and feeding a
> bitstream in order to achieve full capabilities of the device.
>
> WAVE5 supports command-queueing to maximize performance by pipelining
> internal commands and by hiding wait cycle taken to receive a command
> from Host processor.
>
> Instead of waiting for each command to be executed before sending the
> next command, Host processor just places all the commands in the
> command-queue and goes on doing other things while the commands in the
> queue are processed by VPU.
>
> While Host processor handles its own tasks, it can receive VPU interrupt
> request (IRQ).
> In this case, host processor can simply exit interrupt service routine
> (ISR) without accessing to host interface to read the result of the
> command reported by VPU.
> After host processor completed its tasks, host processor can read the
> command result when host processor needs the reports and does
> response processing.
>
> To archive this goal, the device_run() calls v4l2_m2m_job_finish
> so that next command can be sent to VPU continuously, if there is
> any result, then irq is triggered and gets decoded frames and returns
> them to upper layer.
> Theses processes work independently each other without waiting
> a decoded frame.
>
> Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
> Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
> ---
> .../platform/chips-media/wave5/wave5-hw.c | 2 +-
> .../chips-media/wave5/wave5-vpu-dec.c | 131 ++++++++++++------
> .../platform/chips-media/wave5/wave5-vpuapi.c | 22 ++-
> .../platform/chips-media/wave5/wave5-vpuapi.h | 5 +
> .../chips-media/wave5/wave5-vpuconfig.h | 1 +
> 5 files changed, 119 insertions(+), 42 deletions(-)
>
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-hw.c b/drivers/media/platform/chips-media/wave5/wave5-hw.c
> index d94cf84c3ee5..687ce6ccf3ae 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-hw.c
> +++ b/drivers/media/platform/chips-media/wave5/wave5-hw.c
> @@ -102,7 +102,7 @@ static void _wave5_print_reg_err(struct vpu_device *vpu_dev, u32 reg_fail_reason
> dev_dbg(dev, "%s: queueing failure: 0x%x\n", func, reg_val);
> break;
> case WAVE5_SYSERR_RESULT_NOT_READY:
> - dev_err(dev, "%s: result not ready: 0x%x\n", func, reg_fail_reason);
> + dev_dbg(dev, "%s: result not ready: 0x%x\n", func, reg_fail_reason);
> break;
> case WAVE5_SYSERR_ACCESS_VIOLATION_HW:
> dev_err(dev, "%s: access violation: 0x%x\n", func, reg_fail_reason);
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> index 2df7668575f4..4554a24df8a1 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> @@ -268,6 +268,7 @@ static void send_eos_event(struct vpu_instance *inst)
>
> v4l2_event_queue_fh(&inst->v4l2_fh, &vpu_event_eos);
> inst->eos = false;
> + inst->sent_eos = true;
> }
>
> static int handle_dynamic_resolution_change(struct vpu_instance *inst)
> @@ -347,13 +348,12 @@ static void wave5_vpu_dec_finish_decode(struct vpu_instance *inst)
> struct vb2_v4l2_buffer *dec_buf = NULL;
> struct vb2_v4l2_buffer *disp_buf = NULL;
> struct vb2_queue *dst_vq = v4l2_m2m_get_dst_vq(m2m_ctx);
> - struct queue_status_info q_status;
>
> dev_dbg(inst->dev->dev, "%s: Fetch output info from firmware.", __func__);
>
> ret = wave5_vpu_dec_get_output_info(inst, &dec_info);
> if (ret) {
> - dev_warn(inst->dev->dev, "%s: could not get output info.", __func__);
> + dev_dbg(inst->dev->dev, "%s: could not get output info.", __func__);
> v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
> return;
> }
> @@ -442,18 +442,14 @@ static void wave5_vpu_dec_finish_decode(struct vpu_instance *inst)
> spin_unlock_irqrestore(&inst->state_spinlock, flags);
> }
>
> - /*
> - * During a resolution change and while draining, the firmware may flush
> - * the reorder queue regardless of having a matching decoding operation
> - * pending. Only terminate the job if there are no more IRQ coming.
> - */
> - wave5_vpu_dec_give_command(inst, DEC_GET_QUEUE_STATUS, &q_status);
> - if (q_status.report_queue_count == 0 &&
> - (q_status.instance_queue_count == 0 || dec_info.sequence_changed)) {
> - dev_dbg(inst->dev->dev, "%s: finishing job.\n", __func__);
> - pm_runtime_mark_last_busy(inst->dev->dev);
> - pm_runtime_put_autosuspend(inst->dev->dev);
> - v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
> + if (inst->sent_eos &&
> + v4l2_m2m_get_curr_priv(inst->v4l2_m2m_dev)) {
> + struct queue_status_info q_status;
> +
> + wave5_vpu_dec_give_command(inst, DEC_GET_QUEUE_STATUS, &q_status);
> + if (q_status.report_queue_count == 0 &&
> + q_status.instance_queue_count == 0)
> + v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
> }
> }
>
> @@ -1146,8 +1142,8 @@ static int write_to_ringbuffer(struct vpu_instance *inst, void *buffer, size_t b
> static int fill_ringbuffer(struct vpu_instance *inst)
> {
> struct v4l2_m2m_ctx *m2m_ctx = inst->v4l2_fh.m2m_ctx;
> - struct v4l2_m2m_buffer *buf, *n;
> - int ret;
> + struct vpu_src_buffer *vpu_buf;
> + int ret = 0;
>
> if (m2m_ctx->last_src_buf) {
> struct vpu_src_buffer *vpu_buf = wave5_to_vpu_src_buf(m2m_ctx->last_src_buf);
> @@ -1158,9 +1154,8 @@ static int fill_ringbuffer(struct vpu_instance *inst)
> }
> }
>
> - v4l2_m2m_for_each_src_buf_safe(m2m_ctx, buf, n) {
> - struct vb2_v4l2_buffer *vbuf = &buf->vb;
> - struct vpu_src_buffer *vpu_buf = wave5_to_vpu_src_buf(vbuf);
> + list_for_each_entry(vpu_buf, &inst->avail_src_bufs, list) {
> + struct vb2_v4l2_buffer *vbuf = &vpu_buf->v4l2_m2m_buf.vb;
> struct vpu_buf *ring_buffer = &inst->bitstream_vbuf;
> size_t src_size = vb2_get_plane_payload(&vbuf->vb2_buf, 0);
> void *src_buf = vb2_plane_vaddr(&vbuf->vb2_buf, 0);
> @@ -1220,9 +1215,13 @@ static int fill_ringbuffer(struct vpu_instance *inst)
> dev_dbg(inst->dev->dev, "last src buffer written to the ring buffer\n");
> break;
> }
> +
> + inst->queuing_num++;
> + list_del_init(&vpu_buf->list);
> + break;
> }
>
> - return 0;
> + return ret;
> }
>
> static void wave5_vpu_dec_buf_queue_src(struct vb2_buffer *vb)
> @@ -1236,6 +1235,11 @@ static void wave5_vpu_dec_buf_queue_src(struct vb2_buffer *vb)
> vbuf->sequence = inst->queued_src_buf_num++;
>
> v4l2_m2m_buf_queue(m2m_ctx, vbuf);
> +
> + INIT_LIST_HEAD(&vpu_buf->list);
> + mutex_lock(&inst->feed_lock);
> + list_add_tail(&vpu_buf->list, &inst->avail_src_bufs);
> + mutex_unlock(&inst->feed_lock);
Needs some comment. Also why not do everything in the mutex ?
> }
>
> static void wave5_vpu_dec_buf_queue_dst(struct vb2_buffer *vb)
> @@ -1287,10 +1291,13 @@ static void wave5_vpu_dec_buf_queue(struct vb2_buffer *vb)
> __func__, vb->type, vb->index, vb2_plane_size(&vbuf->vb2_buf, 0),
> vb2_plane_size(&vbuf->vb2_buf, 1), vb2_plane_size(&vbuf->vb2_buf, 2));
>
> - if (vb->type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE)
> + if (vb->type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE) {
> + if (inst->empty_queue)
> + inst->empty_queue = false;
> wave5_vpu_dec_buf_queue_src(vb);
> - else if (vb->type == V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE)
> + } else if (vb->type == V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE) {
> wave5_vpu_dec_buf_queue_dst(vb);
> + }
> }
>
> static int wave5_vpu_dec_allocate_ring_buffer(struct vpu_instance *inst)
> @@ -1385,6 +1392,13 @@ static int streamoff_output(struct vb2_queue *q)
> dma_addr_t new_rd_ptr;
> struct dec_output_info dec_info;
> unsigned int i;
> + struct vpu_src_buffer *vpu_buf, *tmp;
> +
> + inst->retry = false;
> + inst->queuing_num = 0;
> +
> + list_for_each_entry_safe(vpu_buf, tmp, &inst->avail_src_bufs, list)
> + list_del_init(&vpu_buf->list);
>
> for (i = 0; i < v4l2_m2m_num_dst_bufs_ready(m2m_ctx); i++) {
> ret = wave5_vpu_dec_set_disp_flag(inst, i);
> @@ -1474,6 +1488,8 @@ static void wave5_vpu_dec_stop_streaming(struct vb2_queue *q)
>
> dev_dbg(inst->dev->dev, "%s: type: %u\n", __func__, q->type);
> pm_runtime_resume_and_get(inst->dev->dev);
> + inst->empty_queue = false;
> + inst->sent_eos = false;
>
> while (check_cmd) {
> struct queue_status_info q_status;
> @@ -1481,11 +1497,11 @@ static void wave5_vpu_dec_stop_streaming(struct vb2_queue *q)
>
> wave5_vpu_dec_give_command(inst, DEC_GET_QUEUE_STATUS, &q_status);
>
> - if (q_status.report_queue_count == 0)
> + if ((inst->state == VPU_INST_STATE_STOP || q_status.instance_queue_count == 0) &&
> + q_status.report_queue_count == 0)
> break;
>
> - if (wave5_vpu_wait_interrupt(inst, VPU_DEC_TIMEOUT) < 0)
> - break;
> + wave5_vpu_wait_interrupt(inst, VPU_DEC_STOP_TIMEOUT);
>
> if (wave5_vpu_dec_get_output_info(inst, &dec_output_info))
> dev_dbg(inst->dev->dev, "there is no output info\n");
> @@ -1577,13 +1593,24 @@ static void wave5_vpu_dec_device_run(void *priv)
> struct queue_status_info q_status;
> u32 fail_res = 0;
> int ret = 0;
> + unsigned long flags;
>
> dev_dbg(inst->dev->dev, "%s: Fill the ring buffer with new bitstream data", __func__);
> pm_runtime_resume_and_get(inst->dev->dev);
> - ret = fill_ringbuffer(inst);
> - if (ret) {
> - dev_warn(inst->dev->dev, "Filling ring buffer failed\n");
> - goto finish_job_and_return;
> + if (!inst->retry) {
> + mutex_lock(&inst->feed_lock);
> + ret = fill_ringbuffer(inst);
> + mutex_unlock(&inst->feed_lock);
> + if (ret < 0) {
> + dev_warn(inst->dev->dev, "Filling ring buffer failed\n");
> + goto finish_job_and_return;
> + } else if (!inst->eos &&
> + inst->queuing_num == 0 &&
> + inst->state == VPU_INST_STATE_PIC_RUN) {
> + dev_dbg(inst->dev->dev, "%s: no bitstream for feeding, so skip ", __func__);
> + inst->empty_queue = true;
> + goto finish_job_and_return;
> + }
> }
>
> switch (inst->state) {
> @@ -1608,7 +1635,9 @@ static void wave5_vpu_dec_device_run(void *priv)
> }
> spin_unlock_irqrestore(&inst->state_spinlock, flags);
> } else {
> + spin_lock_irqsave(&inst->state_spinlock, flags);
If you are to lock/only for every state switch, just do that inside the helper
function, that will reduce the footprint. Also, consider using guard().
> switch_state(inst, VPU_INST_STATE_INIT_SEQ);
> + spin_unlock_irqrestore(&inst->state_spinlock, flags);
> }
>
> break;
> @@ -1619,8 +1648,9 @@ static void wave5_vpu_dec_device_run(void *priv)
> * we had a chance to switch, which leads to an invalid state
> * change.
> */
> + spin_lock_irqsave(&inst->state_spinlock, flags);
> switch_state(inst, VPU_INST_STATE_PIC_RUN);
> -
> + spin_unlock_irqrestore(&inst->state_spinlock, flags);
> /*
> * During DRC, the picture decoding remains pending, so just leave the job
> * active until this decode operation completes.
> @@ -1634,12 +1664,14 @@ static void wave5_vpu_dec_device_run(void *priv)
> ret = wave5_prepare_fb(inst);
> if (ret) {
> dev_warn(inst->dev->dev, "Framebuffer preparation, fail: %d\n", ret);
> + spin_lock_irqsave(&inst->state_spinlock, flags);
> switch_state(inst, VPU_INST_STATE_STOP);
> + spin_unlock_irqrestore(&inst->state_spinlock, flags);
> break;
> }
>
> if (q_status.instance_queue_count) {
> - dev_dbg(inst->dev->dev, "%s: leave with active job", __func__);
> + v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
> return;
> }
>
> @@ -1650,26 +1682,42 @@ static void wave5_vpu_dec_device_run(void *priv)
> dev_err(inst->dev->dev,
> "Frame decoding on m2m context (%p), fail: %d (result: %d)\n",
> m2m_ctx, ret, fail_res);
> - break;
> + goto finish_job_and_return;
> + }
> +
> + if (fail_res == WAVE5_SYSERR_QUEUEING_FAIL) {
> + inst->retry = true;
> + } else {
> + inst->retry = false;
> + if (!inst->eos)
> + inst->queuing_num--;
> }
> - /* Return so that we leave this job active */
> - dev_dbg(inst->dev->dev, "%s: leave with active job", __func__);
> - return;
> - default:
> - WARN(1, "Execution of a job in state %s illegal.\n", state_to_str(inst->state));
> break;
> + default:
> + dev_dbg(inst->dev->dev, "Execution of a job in state %s illegal.\n",
> + state_to_str(inst->state));
> }
>
> finish_job_and_return:
> dev_dbg(inst->dev->dev, "%s: leave and finish job", __func__);
> pm_runtime_mark_last_busy(inst->dev->dev);
> pm_runtime_put_autosuspend(inst->dev->dev);
> - v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
> + /*
> + * After receiving CMD_STOP, there is no input, but we have to run device_run
> + * to send DEC_PIC command until display index == -1, so job_finish was always
> + * called in the device_run to archive it, the logic was very wasteful
> + * in power and CPU time.
> + * If EOS is passed, device_run will not call job_finish no more, it is called
> + * only if HW is idle status in order to reduce overhead.
> + */
> + if (!inst->sent_eos)
> + v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
> }
>
> static void wave5_vpu_dec_job_abort(void *priv)
> {
> struct vpu_instance *inst = priv;
> + struct v4l2_m2m_ctx *m2m_ctx = inst->v4l2_fh.m2m_ctx;
> int ret;
>
> ret = switch_state(inst, VPU_INST_STATE_STOP);
> @@ -1680,6 +1728,8 @@ static void wave5_vpu_dec_job_abort(void *priv)
> if (ret)
> dev_warn(inst->dev->dev,
> "Setting EOS for the bitstream, fail: %d\n", ret);
> +
> + v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx);
> }
>
> static int wave5_vpu_dec_job_ready(void *priv)
> @@ -1715,7 +1765,8 @@ static int wave5_vpu_dec_job_ready(void *priv)
> "No capture buffer ready to decode!\n");
> break;
> } else if (!wave5_is_draining_or_eos(inst) &&
> - !v4l2_m2m_num_src_bufs_ready(m2m_ctx)) {
> + (!v4l2_m2m_num_src_bufs_ready(m2m_ctx) ||
> + inst->empty_queue)) {
> dev_dbg(inst->dev->dev,
> "No bitstream data to decode!\n");
> break;
> @@ -1755,6 +1806,8 @@ static int wave5_vpu_open_dec(struct file *filp)
> inst->ops = &wave5_vpu_dec_inst_ops;
>
> spin_lock_init(&inst->state_spinlock);
> + mutex_init(&inst->feed_lock);
> + INIT_LIST_HEAD(&inst->avail_src_bufs);
>
> inst->codec_info = kzalloc(sizeof(*inst->codec_info), GFP_KERNEL);
> if (!inst->codec_info)
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> index 5b10f9f49b9f..edbe69540ef1 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> @@ -207,6 +207,7 @@ int wave5_vpu_dec_close(struct vpu_instance *inst, u32 *fail_res)
> int retry = 0;
> struct vpu_device *vpu_dev = inst->dev;
> int i;
> + struct dec_output_info dec_info;
>
> *fail_res = 0;
> if (!inst->codec_info)
> @@ -227,11 +228,26 @@ int wave5_vpu_dec_close(struct vpu_instance *inst, u32 *fail_res)
> goto unlock_and_return;
> }
>
> - if (*fail_res == WAVE5_SYSERR_VPU_STILL_RUNNING &&
> - retry++ >= MAX_FIRMWARE_CALL_RETRY) {
> + if (ret == 0)
> + break;
> +
> + if (*fail_res != WAVE5_SYSERR_VPU_STILL_RUNNING) {
> + dev_warn(inst->dev->dev, "dec_finish_seq timed out\n");
> + goto unlock_and_return;
> + }
> +
> + if (retry++ >= MAX_FIRMWARE_CALL_RETRY) {
> ret = -ETIMEDOUT;
> goto unlock_and_return;
> }
> +
> + mutex_unlock(&vpu_dev->hw_lock);
> + wave5_vpu_dec_get_output_info(inst, &dec_info);
> + ret = mutex_lock_interruptible(&vpu_dev->hw_lock);
> + if (ret) {
> + pm_runtime_put_sync(inst->dev->dev);
> + return ret;
> + }
> } while (ret != 0);
>
> dev_dbg(inst->dev->dev, "%s: dec_finish_seq complete\n", __func__);
> @@ -248,6 +264,8 @@ int wave5_vpu_dec_close(struct vpu_instance *inst, u32 *fail_res)
>
> wave5_vdi_free_dma_memory(vpu_dev, &p_dec_info->vb_task);
>
> + mutex_destroy(&inst->feed_lock);
> +
> unlock_and_return:
> mutex_unlock(&vpu_dev->hw_lock);
> pm_runtime_put_sync(inst->dev->dev);
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> index bc101397204d..adfbc104f939 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> @@ -818,6 +818,11 @@ struct vpu_instance {
> bool cbcr_interleave;
> bool nv21;
> bool eos;
> + bool sent_eos; /* check if EOS is sent to application */
> + bool retry; /* retry to feed bitstream if failure reason is WAVE5_SYSERR_QUEUEING_FAIL*/
> + int queuing_num; /* count of bitstream queued */
> + struct mutex feed_lock; /* lock for feeding bitstream buffers */
> + bool empty_queue;
So my overall impression is that most of this change make sense if you really
don't want to architecture the VPU library part. I left a minor remark that
needs addressing, along with fixing the crash and deadlock found during testing.
regards,
Nicolas
> struct vpu_buf bitstream_vbuf;
> dma_addr_t last_rd_ptr;
> size_t remaining_consumed_bytes;
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuconfig.h b/drivers/media/platform/chips-media/wave5/wave5-vpuconfig.h
> index 1ea9f5f31499..4ebd48d5550e 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-vpuconfig.h
> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuconfig.h
> @@ -59,6 +59,7 @@
> // application specific configuration
> #define VPU_ENC_TIMEOUT 60000
> #define VPU_DEC_TIMEOUT 60000
> +#define VPU_DEC_STOP_TIMEOUT 10
>
> // for WAVE encoder
> #define USE_SRC_PRP_AXI 0
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 1/4] media: chips-media: wave5: Fix SError of kernel panic when closed
2025-06-23 0:21 ` [PATCH v3 1/4] media: chips-media: wave5: Fix SError of kernel panic when closed Jackson.lee
@ 2025-08-29 18:03 ` Nicolas Dufresne
0 siblings, 0 replies; 12+ messages in thread
From: Nicolas Dufresne @ 2025-08-29 18:03 UTC (permalink / raw)
To: Jackson.lee, mchehab, hverkuil-cisco, bob.beckett
Cc: linux-media, linux-kernel, lafley.kim, b-brnich, hverkuil,
nas.chung
[-- Attachment #1: Type: text/plain, Size: 8439 bytes --]
Le lundi 23 juin 2025 à 09:21 +0900, Jackson.lee a écrit :
> From: Jackson Lee <jackson.lee@chipsnmedia.com>
>
> SError of kernel panic rarely happened while testing fluster.
> The root cause was to enter suspend mode because timeout of autosuspend
> delay happened.
>
> [ 48.834439] SError Interrupt on CPU0, code 0x00000000bf000000 -- SError
> [ 48.834455] CPU: 0 UID: 0 PID: 1067 Comm: v4l2h265dec0:sr Not tainted
> 6.12.9-gc9e21a1ebd75-dirty #7
> [ 48.834461] Hardware name: ti Texas Instruments J721S2 EVM/Texas
> Instruments J721S2 EVM, BIOS 2025.01-00345-gbaf3aaa8ecfa 01/01/2025
> [ 48.834464] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 48.834468] pc : wave5_dec_clr_disp_flag+0x40/0x80 [wave5]
> [ 48.834488] lr : wave5_dec_clr_disp_flag+0x40/0x80 [wave5]
> [ 48.834495] sp : ffff8000856e3a30
> [ 48.834497] x29: ffff8000856e3a30 x28: ffff0008093f6010 x27:
> ffff000809158130
> [ 48.834504] x26: 0000000000000000 x25: ffff00080b625000 x24:
> ffff000804a9ba80
> [ 48.834509] x23: ffff000802343028 x22: ffff000809158150 x21:
> ffff000802218000
> [ 48.834513] x20: ffff0008093f6000 x19: ffff0008093f6000 x18:
> 0000000000000000
> [ 48.834518] x17: 0000000000000000 x16: 0000000000000000 x15:
> 0000ffff74009618
> [ 48.834523] x14: 000000010000000c x13: 0000000000000000 x12:
> 0000000000000000
> [ 48.834527] x11: ffffffffffffffff x10: ffffffffffffffff x9 :
> ffff000802343028
> [ 48.834532] x8 : ffff00080b6252a0 x7 : 0000000000000038 x6 :
> 0000000000000000
> [ 48.834536] x5 : ffff00080b625060 x4 : 0000000000000000 x3 :
> 0000000000000000
> [ 48.834541] x2 : 0000000000000000 x1 : ffff800084bf0118 x0 :
> ffff800084bf0000
> [ 48.834547] Kernel panic - not syncing: Asynchronous SError Interrupt
> [ 48.834549] CPU: 0 UID: 0 PID: 1067 Comm: v4l2h265dec0:sr Not tainted
> 6.12.9-gc9e21a1ebd75-dirty #7
> [ 48.834554] Hardware name: ti Texas Instruments J721S2 EVM/Texas
> Instruments J721S2 EVM, BIOS 2025.01-00345-gbaf3aaa8ecfa 01/01/2025
> [ 48.834556] Call trace:
> [ 48.834559] dump_backtrace+0x94/0xec
> [ 48.834574] show_stack+0x18/0x24
> [ 48.834579] dump_stack_lvl+0x38/0x90
> [ 48.834585] dump_stack+0x18/0x24
> [ 48.834588] panic+0x35c/0x3e0
> [ 48.834592] nmi_panic+0x40/0x8c
> [ 48.834595] arm64_serror_panic+0x64/0x70
> [ 48.834598] do_serror+0x3c/0x78
> [ 48.834601] el1h_64_error_handler+0x34/0x4c
> [ 48.834605] el1h_64_error+0x64/0x68
> [ 48.834608] wave5_dec_clr_disp_flag+0x40/0x80 [wave5]
> [ 48.834615] wave5_vpu_dec_clr_disp_flag+0x54/0x80 [wave5]
> [ 48.834622] wave5_vpu_dec_buf_queue+0x19c/0x1a0 [wave5]
> [ 48.834628] __enqueue_in_driver+0x3c/0x74 [videobuf2_common]
> [ 48.834639] vb2_core_qbuf+0x508/0x61c [videobuf2_common]
> [ 48.834646] vb2_qbuf+0xa4/0x168 [videobuf2_v4l2]
> [ 48.834656] v4l2_m2m_qbuf+0x80/0x238 [v4l2_mem2mem]
> [ 48.834666] v4l2_m2m_ioctl_qbuf+0x18/0x24 [v4l2_mem2mem]
> [ 48.834673] v4l_qbuf+0x48/0x5c [videodev]
> [ 48.834704] __video_do_ioctl+0x180/0x3f0 [videodev]
> [ 48.834725] video_usercopy+0x2ec/0x68c [videodev]
> [ 48.834745] video_ioctl2+0x18/0x24 [videodev]
> [ 48.834766] v4l2_ioctl+0x40/0x60 [videodev]
> [ 48.834786] __arm64_sys_ioctl+0xa8/0xec
> [ 48.834793] invoke_syscall+0x44/0x100
> [ 48.834800] el0_svc_common.constprop.0+0xc0/0xe0
> [ 48.834804] do_el0_svc+0x1c/0x28
> [ 48.834809] el0_svc+0x30/0xd0
> [ 48.834813] el0t_64_sync_handler+0xc0/0xc4
> [ 48.834816] el0t_64_sync+0x190/0x194
> [ 48.834820] SMP: stopping secondary CPUs
> [ 48.834831] Kernel Offset: disabled
> [ 48.834833] CPU features: 0x08,00002002,80200000,4200421b
> [ 48.834837] Memory Limit: none
> [ 49.161404] ---[ end Kernel panic - not syncing: Asynchronous SError
> Interrupt ]---
>
> Fixes: 2092b3833487 ("media: chips-media: wave5: Support runtime
> suspend/resume")
> Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
> Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
cheers,
Nicolas
> ---
> .../platform/chips-media/wave5/wave5-vpu-dec.c | 3 ---
> .../platform/chips-media/wave5/wave5-vpu-enc.c | 3 ---
> .../media/platform/chips-media/wave5/wave5-vpu.c | 2 +-
> .../platform/chips-media/wave5/wave5-vpuapi.c | 15 ---------------
> 4 files changed, 1 insertion(+), 22 deletions(-)
>
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> index fd71f0c43ac3..216b024c42d8 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> @@ -1830,9 +1830,6 @@ static int wave5_vpu_open_dec(struct file *filp)
> if (ret)
> goto cleanup_inst;
>
> - if (list_empty(&dev->instances))
> - pm_runtime_use_autosuspend(inst->dev->dev);
> -
> list_add_tail(&inst->list, &dev->instances);
>
> mutex_unlock(&dev->dev_lock);
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
> b/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
> index 1e5fc5f8b856..cf20f774ed1b 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
> @@ -1774,9 +1774,6 @@ static int wave5_vpu_open_enc(struct file *filp)
> if (ret)
> goto cleanup_inst;
>
> - if (list_empty(&dev->instances))
> - pm_runtime_use_autosuspend(inst->dev->dev);
> -
> list_add_tail(&inst->list, &dev->instances);
>
> mutex_unlock(&dev->dev_lock);
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu.c
> b/drivers/media/platform/chips-media/wave5/wave5-vpu.c
> index e1715d3f43b0..b3c633dd3df1 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-vpu.c
> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu.c
> @@ -322,7 +322,7 @@ static int wave5_vpu_probe(struct platform_device *pdev)
> dev_info(&pdev->dev, "Product Code: 0x%x\n", dev->product_code);
> dev_info(&pdev->dev, "Firmware Revision: %u\n", fw_revision);
>
> - pm_runtime_set_autosuspend_delay(&pdev->dev, 100);
> + pm_runtime_set_autosuspend_delay(&pdev->dev, 500);
> pm_runtime_use_autosuspend(&pdev->dev);
> pm_runtime_enable(&pdev->dev);
> wave5_vpu_sleep_wake(&pdev->dev, true, NULL, 0);
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> index e5e879a13e8b..e94d6ebc9f81 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> @@ -207,8 +207,6 @@ int wave5_vpu_dec_close(struct vpu_instance *inst, u32
> *fail_res)
> int retry = 0;
> struct vpu_device *vpu_dev = inst->dev;
> int i;
> - int inst_count = 0;
> - struct vpu_instance *inst_elm;
>
> *fail_res = 0;
> if (!inst->codec_info)
> @@ -250,11 +248,6 @@ int wave5_vpu_dec_close(struct vpu_instance *inst, u32
> *fail_res)
>
> wave5_vdi_free_dma_memory(vpu_dev, &p_dec_info->vb_task);
>
> - list_for_each_entry(inst_elm, &vpu_dev->instances, list)
> - inst_count++;
> - if (inst_count == 1)
> - pm_runtime_dont_use_autosuspend(vpu_dev->dev);
> -
> unlock_and_return:
> mutex_unlock(&vpu_dev->hw_lock);
> pm_runtime_put_sync(inst->dev->dev);
> @@ -720,8 +713,6 @@ int wave5_vpu_enc_close(struct vpu_instance *inst, u32
> *fail_res)
> int ret;
> int retry = 0;
> struct vpu_device *vpu_dev = inst->dev;
> - int inst_count = 0;
> - struct vpu_instance *inst_elm;
>
> *fail_res = 0;
> if (!inst->codec_info)
> @@ -764,12 +755,6 @@ int wave5_vpu_enc_close(struct vpu_instance *inst, u32
> *fail_res)
> }
>
> wave5_vdi_free_dma_memory(vpu_dev, &p_enc_info->vb_task);
> -
> - list_for_each_entry(inst_elm, &vpu_dev->instances, list)
> - inst_count++;
> - if (inst_count == 1)
> - pm_runtime_dont_use_autosuspend(vpu_dev->dev);
> -
> mutex_unlock(&vpu_dev->hw_lock);
> pm_runtime_put_sync(inst->dev->dev);
>
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 2/4] media: chips-media: wave5: Fix Null reference while testing fluster
2025-06-23 0:21 ` [PATCH v3 2/4] media: chips-media: wave5: Fix Null reference while testing fluster Jackson.lee
@ 2025-08-29 18:13 ` Nicolas Dufresne
0 siblings, 0 replies; 12+ messages in thread
From: Nicolas Dufresne @ 2025-08-29 18:13 UTC (permalink / raw)
To: Jackson.lee, mchehab, hverkuil-cisco, bob.beckett
Cc: linux-media, linux-kernel, lafley.kim, b-brnich, hverkuil,
nas.chung
[-- Attachment #1: Type: text/plain, Size: 11615 bytes --]
Le lundi 23 juin 2025 à 09:21 +0900, Jackson.lee a écrit :
> From: Jackson Lee <jackson.lee@chipsnmedia.com>
>
> When multi instances are created/destroyed, many interrupts happens
> or structures for decoder are removed.
Did you mean "and" instead of "or" ?
> "struct vpu_instance" this structure is shared for all flow in decoder,
* in the decoder
> so if the structure is not protected by lock, Null reference exception
Null dereference
> could happens sometimes.
> IRQ Handler was spilt to two phases and Lock was added as well.
>
> Fixes: 9707a6254a8a ("media: chips-media: wave5: Add the v4l2 layer")
> Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
> Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
> ---
> .../platform/chips-media/wave5/wave5-helper.c | 23 ++++++-
> .../chips-media/wave5/wave5-vpu-dec.c | 5 ++
> .../chips-media/wave5/wave5-vpu-enc.c | 5 ++
> .../platform/chips-media/wave5/wave5-vpu.c | 69 ++++++++++++++++---
> .../platform/chips-media/wave5/wave5-vpuapi.h | 6 ++
> 5 files changed, 99 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-helper.c b/drivers/media/platform/chips-media/wave5/wave5-helper.c
> index 2c9d8cbca6e4..02914dc1ca56 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-helper.c
> +++ b/drivers/media/platform/chips-media/wave5/wave5-helper.c
> @@ -49,7 +49,7 @@ void wave5_cleanup_instance(struct vpu_instance *inst)
> v4l2_fh_del(&inst->v4l2_fh);
> v4l2_fh_exit(&inst->v4l2_fh);
> }
> - list_del_init(&inst->list);
> + kfifo_free(&inst->irq_status);
> ida_free(&inst->dev->inst_ida, inst->id);
> kfree(inst->codec_info);
> kfree(inst);
> @@ -61,8 +61,29 @@ int wave5_vpu_release_device(struct file *filp,
> {
> struct vpu_instance *inst = wave5_to_vpu_inst(filp->private_data);
> int ret = 0;
> + unsigned long flags;
>
> v4l2_m2m_ctx_release(inst->v4l2_fh.m2m_ctx);
> + /*
> + * To prevent Null reference exception, the existing irq handler were
> + * separated to two modules.
> + * One is to queue interrupt reason into the irq handler,
> + * the other is irq_thread to call the wave5_vpu_dec_finish_decode
> + * to get decoded frame.
> + * The list of instances should be protected between all flow of the
> + * decoding process, but to protect the list in the irq_handler, spin lock
> + * should be used, and mutex should be used in the irq_thread because spin lock
> + * is not able to be used because mutex is already being used
> + * in the wave5_vpu_dec_finish_decode.
> + * So the spin lock and mutex were used to protect the list in the release function.
> + */
> + ret = mutex_lock_interruptible(&inst->dev->irq_lock);
> + if (ret)
> + return ret;
> + spin_lock_irqsave(&inst->dev->irq_spinlock, flags);
> + list_del_init(&inst->list);
> + spin_unlock_irqrestore(&inst->dev->irq_spinlock, flags);
> + mutex_unlock(&inst->dev->irq_lock);
> if (inst->state != VPU_INST_STATE_NONE) {
> u32 fail_res;
>
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> index 216b024c42d8..2df7668575f4 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c
> @@ -1811,6 +1811,11 @@ static int wave5_vpu_open_dec(struct file *filp)
> inst->xfer_func = V4L2_XFER_FUNC_DEFAULT;
>
> init_completion(&inst->irq_done);
> + ret = kfifo_alloc(&inst->irq_status, 16 * sizeof(int), GFP_KERNEL);
> + if (ret) {
> + dev_err(inst->dev->dev, "failed to allocate fifo\n");
> + goto cleanup_inst;
> + }
>
> inst->id = ida_alloc(&inst->dev->inst_ida, GFP_KERNEL);
> if (inst->id < 0) {
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c b/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
> index cf20f774ed1b..7f1aa392805f 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-enc.c
> @@ -1760,6 +1760,11 @@ static int wave5_vpu_open_enc(struct file *filp)
> inst->frame_rate = 30;
>
> init_completion(&inst->irq_done);
> + ret = kfifo_alloc(&inst->irq_status, 16 * sizeof(int), GFP_KERNEL);
> + if (ret) {
> + dev_err(inst->dev->dev, "failed to allocate fifo\n");
> + goto cleanup_inst;
> + }
>
> inst->id = ida_alloc(&inst->dev->inst_ida, GFP_KERNEL);
> if (inst->id < 0) {
Can you prepend a patch that deduplicates this initialization code, put this in
wave5-helper.c, this way you won't have to copy paste this fix. The
kfifo_alloc/free will also now be in the same C file.<
Question here, I've been wondering if the list is strictly required on real
hardware ? I often thought this complex machinary was made to support a really
slow simulated CPU. But all this happened way before me being involved.
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu.c b/drivers/media/platform/chips-media/wave5/wave5-vpu.c
> index b3c633dd3df1..24a9001966e7 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-vpu.c
> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu.c
> @@ -51,8 +51,11 @@ static void wave5_vpu_handle_irq(void *dev_id)
> u32 seq_done;
> u32 cmd_done;
> u32 irq_reason;
> - struct vpu_instance *inst;
> + u32 irq_subreason;
> + struct vpu_instance *inst, *tmp;
> struct vpu_device *dev = dev_id;
> + int val;
> + unsigned long flags;
>
> irq_reason = wave5_vdi_read_register(dev, W5_VPU_VINT_REASON);
> seq_done = wave5_vdi_read_register(dev, W5_RET_SEQ_DONE_INSTANCE_INFO);
> @@ -60,7 +63,8 @@ static void wave5_vpu_handle_irq(void *dev_id)
> wave5_vdi_write_register(dev, W5_VPU_VINT_REASON_CLR, irq_reason);
> wave5_vdi_write_register(dev, W5_VPU_VINT_CLEAR, 0x1);
>
> - list_for_each_entry(inst, &dev->instances, list) {
> + spin_lock_irqsave(&dev->irq_spinlock, flags);
> + list_for_each_entry_safe(inst, tmp, &dev->instances, list) {
>
> if (irq_reason & BIT(INT_WAVE5_INIT_SEQ) ||
> irq_reason & BIT(INT_WAVE5_ENC_SET_PARAM)) {
> @@ -82,14 +86,22 @@ static void wave5_vpu_handle_irq(void *dev_id)
> irq_reason & BIT(INT_WAVE5_ENC_PIC)) {
> if (cmd_done & BIT(inst->id)) {
> cmd_done &= ~BIT(inst->id);
> - wave5_vdi_write_register(dev, W5_RET_QUEUE_CMD_DONE_INST,
> - cmd_done);
> - inst->ops->finish_process(inst);
> + if (dev->irq >= 0) {
> + irq_subreason =
> + wave5_vdi_read_register(dev, W5_VPU_VINT_REASON);
> + if (!(irq_subreason & BIT(INT_WAVE5_DEC_PIC)))
> + wave5_vdi_write_register(dev,
> + W5_RET_QUEUE_CMD_DONE_INST,
> + cmd_done);
> + }
> + val = BIT(INT_WAVE5_DEC_PIC);
> + kfifo_in(&inst->irq_status, &val, sizeof(int));
> }
> }
> -
> - wave5_vpu_clear_interrupt(inst, irq_reason);
> }
> + spin_unlock_irqrestore(&dev->irq_spinlock, flags);
> +
> + up(&dev->irq_sem);
> }
>
> static irqreturn_t wave5_vpu_irq_thread(int irq, void *dev_id)
> @@ -121,6 +133,35 @@ static enum hrtimer_restart wave5_vpu_timer_callback(struct hrtimer *timer)
> return HRTIMER_RESTART;
> }
>
> +static int irq_thread(void *data)
Have you considered using threaded IRQ instead ? Is that an option ? Otherwise,
why not ?
cheers,
Nicolas
> +{
> + struct vpu_device *dev = (struct vpu_device *)data;
> + struct vpu_instance *inst, *tmp;
> + int irq_status, ret;
> +
> + while (!kthread_should_stop()) {
> + if (down_interruptible(&dev->irq_sem))
> + continue;
> +
> + if (kthread_should_stop())
> + break;
> +
> + mutex_lock(&dev->irq_lock);
> + list_for_each_entry_safe(inst, tmp, &dev->instances, list) {
> + while (kfifo_len(&inst->irq_status)) {
> + ret = kfifo_out(&inst->irq_status, &irq_status, sizeof(int));
> + if (!ret)
> + break;
> +
> + inst->ops->finish_process(inst);
> + }
> + }
> + mutex_unlock(&dev->irq_lock);
> + }
> +
> + return 0;
> +}
> +
> static int wave5_vpu_load_firmware(struct device *dev, const char *fw_name,
> u32 *revision)
> {
> @@ -224,6 +265,8 @@ static int wave5_vpu_probe(struct platform_device *pdev)
>
> mutex_init(&dev->dev_lock);
> mutex_init(&dev->hw_lock);
> + mutex_init(&dev->irq_lock);
> + spin_lock_init(&dev->irq_spinlock);
> dev_set_drvdata(&pdev->dev, dev);
> dev->dev = &pdev->dev;
>
> @@ -266,6 +309,10 @@ static int wave5_vpu_probe(struct platform_device *pdev)
> }
> dev->product = wave5_vpu_get_product_id(dev);
>
> + sema_init(&dev->irq_sem, 1);
> + INIT_LIST_HEAD(&dev->instances);
> + dev->irq_thread = kthread_run(irq_thread, dev, "irq thread");
> +
> dev->irq = platform_get_irq(pdev, 0);
> if (dev->irq < 0) {
> dev_err(&pdev->dev, "failed to get irq resource, falling back to polling\n");
> @@ -288,7 +335,6 @@ static int wave5_vpu_probe(struct platform_device *pdev)
> }
> }
>
> - INIT_LIST_HEAD(&dev->instances);
> ret = v4l2_device_register(&pdev->dev, &dev->v4l2_dev);
> if (ret) {
> dev_err(&pdev->dev, "v4l2_device_register, fail: %d\n", ret);
> @@ -351,6 +397,12 @@ static void wave5_vpu_remove(struct platform_device *pdev)
> {
> struct vpu_device *dev = dev_get_drvdata(&pdev->dev);
>
> + if (dev->irq_thread) {
> + kthread_stop(dev->irq_thread);
> + up(&dev->irq_sem);
> + dev->irq_thread = NULL;
> + }
> +
> if (dev->irq < 0) {
> kthread_destroy_worker(dev->worker);
> hrtimer_cancel(&dev->hrtimer);
> @@ -361,6 +413,7 @@ static void wave5_vpu_remove(struct platform_device *pdev)
>
> mutex_destroy(&dev->dev_lock);
> mutex_destroy(&dev->hw_lock);
> + mutex_destroy(&dev->irq_lock);
> reset_control_assert(dev->resets);
> clk_bulk_disable_unprepare(dev->num_clks, dev->clks);
> wave5_vpu_enc_unregister_device(dev);
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> index 45615c15beca..bc101397204d 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.h
> @@ -8,6 +8,7 @@
> #ifndef VPUAPI_H_INCLUDED
> #define VPUAPI_H_INCLUDED
>
> +#include <linux/kfifo.h>
> #include <linux/idr.h>
> #include <linux/genalloc.h>
> #include <media/v4l2-device.h>
> @@ -747,6 +748,7 @@ struct vpu_device {
> struct video_device *video_dev_enc;
> struct mutex dev_lock; /* lock for the src, dst v4l2 queues */
> struct mutex hw_lock; /* lock hw configurations */
> + struct mutex irq_lock;
> int irq;
> enum product_id product;
> struct vpu_attr attr;
> @@ -764,7 +766,10 @@ struct vpu_device {
> struct kthread_worker *worker;
> int vpu_poll_interval;
> int num_clks;
> + struct task_struct *irq_thread;
> + struct semaphore irq_sem; /* signal to irq_thread when interrupt happens*/
> struct reset_control *resets;
> + spinlock_t irq_spinlock; /* protect instances list */
> };
>
> struct vpu_instance;
> @@ -788,6 +793,7 @@ struct vpu_instance {
> enum v4l2_ycbcr_encoding ycbcr_enc;
> enum v4l2_quantization quantization;
>
> + struct kfifo irq_status;
> enum vpu_instance_state state;
> enum vpu_instance_type type;
> const struct vpu_instance_ops *ops;
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 3/4] media: chips-media: wave5: Add WARN_ON to check if dec_output_info is NULL
2025-06-23 0:21 ` [PATCH v3 3/4] media: chips-media: wave5: Add WARN_ON to check if dec_output_info is NULL Jackson.lee
@ 2025-08-29 18:15 ` Nicolas Dufresne
0 siblings, 0 replies; 12+ messages in thread
From: Nicolas Dufresne @ 2025-08-29 18:15 UTC (permalink / raw)
To: Jackson.lee, mchehab, hverkuil-cisco, bob.beckett
Cc: linux-media, linux-kernel, lafley.kim, b-brnich, hverkuil,
nas.chung
[-- Attachment #1: Type: text/plain, Size: 1381 bytes --]
Le lundi 23 juin 2025 à 09:21 +0900, Jackson.lee a écrit :
> From: Jackson Lee <jackson.lee@chipsnmedia.com>
>
> The dec_output_info could be a null pointer, so WARN_ON around it
> was added.
I think to warrant a WARN_ON, its should be that the info should NOT be null.
WARN_ON should be used for driver programming issues. If this is what you mean,
I would reword:
The dec_output_info should not be a null pointer, WARN_ON around it to
indicates a driver issue.
cheers,
Nicolas
>
> Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
> Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
> ---
> drivers/media/platform/chips-media/wave5/wave5-vpuapi.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> index e94d6ebc9f81..5b10f9f49b9f 100644
> --- a/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpuapi.c
> @@ -485,7 +485,7 @@ int wave5_vpu_dec_get_output_info(struct vpu_instance *inst, struct dec_output_i
> struct vpu_device *vpu_dev = inst->dev;
> struct dec_output_info *disp_info;
>
> - if (!info)
> + if (WARN_ON(!info))
> return -EINVAL;
>
> p_dec_info = &inst->codec_info->dec_info;
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-08-29 18:15 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-23 0:21 [PATCH v3 0/4] Performance improvement of decoder Jackson.lee
2025-06-23 0:21 ` [PATCH v3 1/4] media: chips-media: wave5: Fix SError of kernel panic when closed Jackson.lee
2025-08-29 18:03 ` Nicolas Dufresne
2025-06-23 0:21 ` [PATCH v3 2/4] media: chips-media: wave5: Fix Null reference while testing fluster Jackson.lee
2025-08-29 18:13 ` Nicolas Dufresne
2025-06-23 0:21 ` [PATCH v3 3/4] media: chips-media: wave5: Add WARN_ON to check if dec_output_info is NULL Jackson.lee
2025-08-29 18:15 ` Nicolas Dufresne
2025-06-23 0:21 ` [PATCH v3 4/4] media: chips-media: wave5: Improve performance of decoder Jackson.lee
2025-08-29 18:01 ` Nicolas Dufresne
2025-07-07 0:37 ` [PATCH v3 0/4] Performance improvement " jackson.lee
2025-08-12 7:51 ` jackson.lee
2025-08-29 15:40 ` Nicolas Dufresne
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).