* [PATCH] accel/ivpu: Add handling of VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW
@ 2025-04-08 9:57 Jacek Lawrynowicz
2025-04-10 7:49 ` Jacek Lawrynowicz
2025-04-22 12:17 ` Greg KH
0 siblings, 2 replies; 8+ messages in thread
From: Jacek Lawrynowicz @ 2025-04-08 9:57 UTC (permalink / raw)
To: linux-kernel; +Cc: stable, Karol Wachowski, Jacek Lawrynowicz
From: Karol Wachowski <karol.wachowski@intel.com>
commit dad945c27a42dfadddff1049cf5ae417209a8996 upstream.
Trigger recovery of the NPU upon receiving HW context violation from
the firmware. The context violation error is a fatal error that prevents
any subsequent jobs from being executed. Without this fix it is
necessary to reload the driver to restore the NPU operational state.
This is simplified version of upstream commit as the full implementation
would require all engine reset/resume logic to be backported.
Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250107173238.381120-13-maciej.falkowski@linux.intel.com
Fixes: 0adff3b0ef12 ("accel/ivpu: Share NPU busy time in sysfs")
Cc: <stable@vger.kernel.org> # v6.11+
---
drivers/accel/ivpu/ivpu_job.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/accel/ivpu/ivpu_job.c b/drivers/accel/ivpu/ivpu_job.c
index be2e2bf0f43f0..70b3676974407 100644
--- a/drivers/accel/ivpu/ivpu_job.c
+++ b/drivers/accel/ivpu/ivpu_job.c
@@ -482,6 +482,8 @@ static struct ivpu_job *ivpu_job_remove_from_submitted_jobs(struct ivpu_device *
return job;
}
+#define VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW 0xEU
+
static int ivpu_job_signal_and_destroy(struct ivpu_device *vdev, u32 job_id, u32 job_status)
{
struct ivpu_job *job;
@@ -490,6 +492,9 @@ static int ivpu_job_signal_and_destroy(struct ivpu_device *vdev, u32 job_id, u32
if (!job)
return -ENOENT;
+ if (job_status == VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW)
+ ivpu_pm_trigger_recovery(vdev, "HW context violation");
+
if (job->file_priv->has_mmu_faults)
job_status = DRM_IVPU_JOB_STATUS_ABORTED;
--
2.45.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] accel/ivpu: Add handling of VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW
2025-04-08 9:57 [PATCH] accel/ivpu: Add handling of VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW Jacek Lawrynowicz
@ 2025-04-10 7:49 ` Jacek Lawrynowicz
2025-04-10 8:03 ` Greg Kroah-Hartman
2025-04-22 12:17 ` Greg KH
1 sibling, 1 reply; 8+ messages in thread
From: Jacek Lawrynowicz @ 2025-04-10 7:49 UTC (permalink / raw)
To: linux-kernel, Greg Kroah-Hartman; +Cc: stable, Karol Wachowski
Hi,
This is an important patch for the Intel NPU.
Is there anything it is missing to be included in stable?
Regards,
Jacek
On 4/8/2025 11:57 AM, Jacek Lawrynowicz wrote:
> From: Karol Wachowski <karol.wachowski@intel.com>
>
> commit dad945c27a42dfadddff1049cf5ae417209a8996 upstream.
>
> Trigger recovery of the NPU upon receiving HW context violation from
> the firmware. The context violation error is a fatal error that prevents
> any subsequent jobs from being executed. Without this fix it is
> necessary to reload the driver to restore the NPU operational state.
>
> This is simplified version of upstream commit as the full implementation
> would require all engine reset/resume logic to be backported.
>
> Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
> Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
> Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
> Link: https://patchwork.freedesktop.org/patch/msgid/20250107173238.381120-13-maciej.falkowski@linux.intel.com
> Fixes: 0adff3b0ef12 ("accel/ivpu: Share NPU busy time in sysfs")
> Cc: <stable@vger.kernel.org> # v6.11+
> ---
> drivers/accel/ivpu/ivpu_job.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/accel/ivpu/ivpu_job.c b/drivers/accel/ivpu/ivpu_job.c
> index be2e2bf0f43f0..70b3676974407 100644
> --- a/drivers/accel/ivpu/ivpu_job.c
> +++ b/drivers/accel/ivpu/ivpu_job.c
> @@ -482,6 +482,8 @@ static struct ivpu_job *ivpu_job_remove_from_submitted_jobs(struct ivpu_device *
> return job;
> }
>
> +#define VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW 0xEU
> +
> static int ivpu_job_signal_and_destroy(struct ivpu_device *vdev, u32 job_id, u32 job_status)
> {
> struct ivpu_job *job;
> @@ -490,6 +492,9 @@ static int ivpu_job_signal_and_destroy(struct ivpu_device *vdev, u32 job_id, u32
> if (!job)
> return -ENOENT;
>
> + if (job_status == VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW)
> + ivpu_pm_trigger_recovery(vdev, "HW context violation");
> +
> if (job->file_priv->has_mmu_faults)
> job_status = DRM_IVPU_JOB_STATUS_ABORTED;
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] accel/ivpu: Add handling of VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW
2025-04-10 7:49 ` Jacek Lawrynowicz
@ 2025-04-10 8:03 ` Greg Kroah-Hartman
2025-04-10 9:27 ` Jacek Lawrynowicz
0 siblings, 1 reply; 8+ messages in thread
From: Greg Kroah-Hartman @ 2025-04-10 8:03 UTC (permalink / raw)
To: Jacek Lawrynowicz; +Cc: linux-kernel, stable, Karol Wachowski
On Thu, Apr 10, 2025 at 09:49:37AM +0200, Jacek Lawrynowicz wrote:
> Hi,
>
> This is an important patch for the Intel NPU.
> Is there anything it is missing to be included in stable?
Patience, you only sent this:
> On 4/8/2025 11:57 AM, Jacek Lawrynowicz wrote:
2 days ago, AFTER the latest of -rc releases was sent out for review,
and those kernels have NOT even been released yet!
[rant about how you all know this process works deleted as it was
just snarky on my side, although quite cathartic, thanks for letting me
vent...]
Relax, it will get handled when we can get to it. To help out, please
take the time to review pending stable backported patches that have been
submitted to the mailing list ahead of yours.
greg k-h
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] accel/ivpu: Add handling of VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW
2025-04-10 8:03 ` Greg Kroah-Hartman
@ 2025-04-10 9:27 ` Jacek Lawrynowicz
0 siblings, 0 replies; 8+ messages in thread
From: Jacek Lawrynowicz @ 2025-04-10 9:27 UTC (permalink / raw)
To: Greg Kroah-Hartman; +Cc: linux-kernel, stable, Karol Wachowski
Hi,
On 4/10/2025 10:03 AM, Greg Kroah-Hartman wrote:
> On Thu, Apr 10, 2025 at 09:49:37AM +0200, Jacek Lawrynowicz wrote:
>> Hi,
>>
>> This is an important patch for the Intel NPU.
>> Is there anything it is missing to be included in stable?
>
> Patience, you only sent this:
>
>> On 4/8/2025 11:57 AM, Jacek Lawrynowicz wrote:
>
> 2 days ago, AFTER the latest of -rc releases was sent out for review,
> and those kernels have NOT even been released yet!
>
> [rant about how you all know this process works deleted as it was
> just snarky on my side, although quite cathartic, thanks for letting me
> vent...]
>
> Relax, it will get handled when we can get to it. To help out, please
> take the time to review pending stable backported patches that have been
> submitted to the mailing list ahead of yours.
Yeah, sorry about that. It seems I'm still learning the art of patience in the kernel processes.
I didn't mean to rush anything. I was just worried my patch might have been overlooked due to a typo or a wrong commit message tag.
I'll make sure to review the backported patches in the meantime.
And thanks for holding back on the rant. I really think that the standard set by Linus is not healthy.
Regards,
Jacek
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] accel/ivpu: Add handling of VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW
2025-04-08 9:57 [PATCH] accel/ivpu: Add handling of VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW Jacek Lawrynowicz
2025-04-10 7:49 ` Jacek Lawrynowicz
@ 2025-04-22 12:17 ` Greg KH
2025-04-24 10:22 ` Jacek Lawrynowicz
1 sibling, 1 reply; 8+ messages in thread
From: Greg KH @ 2025-04-22 12:17 UTC (permalink / raw)
To: Jacek Lawrynowicz; +Cc: linux-kernel, stable, Karol Wachowski
On Tue, Apr 08, 2025 at 11:57:11AM +0200, Jacek Lawrynowicz wrote:
> From: Karol Wachowski <karol.wachowski@intel.com>
>
> commit dad945c27a42dfadddff1049cf5ae417209a8996 upstream.
>
> Trigger recovery of the NPU upon receiving HW context violation from
> the firmware. The context violation error is a fatal error that prevents
> any subsequent jobs from being executed. Without this fix it is
> necessary to reload the driver to restore the NPU operational state.
>
> This is simplified version of upstream commit as the full implementation
> would require all engine reset/resume logic to be backported.
We REALLY do not like taking patches that are not upstream. Why not
backport all of the needed patches instead, how many would that be?
Taking one-off patches like this just makes it harder/impossible to
maintain the code over time as further fixes in this same area will NOT
apply properly at all.
Think about what you want to be touching 5 years from now, a one-off
change that doesn't match the rest of the kernel tree, or something that
is the same?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] accel/ivpu: Add handling of VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW
2025-04-22 12:17 ` Greg KH
@ 2025-04-24 10:22 ` Jacek Lawrynowicz
2025-04-24 10:34 ` Greg KH
0 siblings, 1 reply; 8+ messages in thread
From: Jacek Lawrynowicz @ 2025-04-24 10:22 UTC (permalink / raw)
To: Greg KH; +Cc: linux-kernel, stable, Karol Wachowski
Hi,
On 4/22/2025 2:17 PM, Greg KH wrote:
> On Tue, Apr 08, 2025 at 11:57:11AM +0200, Jacek Lawrynowicz wrote:
>> From: Karol Wachowski <karol.wachowski@intel.com>
>>
>> commit dad945c27a42dfadddff1049cf5ae417209a8996 upstream.
>>
>> Trigger recovery of the NPU upon receiving HW context violation from
>> the firmware. The context violation error is a fatal error that prevents
>> any subsequent jobs from being executed. Without this fix it is
>> necessary to reload the driver to restore the NPU operational state.
>>
>> This is simplified version of upstream commit as the full implementation
>> would require all engine reset/resume logic to be backported.
>
> We REALLY do not like taking patches that are not upstream. Why not
> backport all of the needed patches instead, how many would that be?
> Taking one-off patches like this just makes it harder/impossible to
> maintain the code over time as further fixes in this same area will NOT
> apply properly at all.
>
> Think about what you want to be touching 5 years from now, a one-off
> change that doesn't match the rest of the kernel tree, or something that
> is the same?
Sure, I'm totally on board with backporting all required patches.
I thought it was not possible due to 100 line limit.
This would be the minimum set of patches:
Patch 1:
drivers/accel/ivpu/ivpu_drv.c | 32 +++-----------
drivers/accel/ivpu/ivpu_drv.h | 2 +
drivers/accel/ivpu/ivpu_job.c | 78 ++++++++++++++++++++++++++-------
drivers/accel/ivpu/ivpu_job.h | 1 +
drivers/accel/ivpu/ivpu_mmu.c | 3 +-
drivers/accel/ivpu/ivpu_sysfs.c | 5 ++-
6 files changed, 75 insertions(+), 46 deletions(-)
Patch 2:
drivers/accel/ivpu/ivpu_job.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
Patch 3:
drivers/accel/ivpu/ivpu_job.c | 2 +-
drivers/accel/ivpu/ivpu_jsm_msg.c | 3 +-
drivers/accel/ivpu/vpu_boot_api.h | 45 +++--
drivers/accel/ivpu/vpu_jsm_api.h | 303 +++++++++++++++++++++++++-----
4 files changed, 293 insertions(+), 60 deletions(-)
Patch 4:
drivers/accel/ivpu/ivpu_job.c | 27 ++++++++++++++++++++++++++-
1 file changed, 26 insertions(+), 1 deletion(-)
First patch needs some changes to apply correctly to 6.12 but the rest of them apply pretty cleanly.
Is this acceptable?
Regards,
Jacek
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] accel/ivpu: Add handling of VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW
2025-04-24 10:22 ` Jacek Lawrynowicz
@ 2025-04-24 10:34 ` Greg KH
2025-04-30 12:50 ` Jacek Lawrynowicz
0 siblings, 1 reply; 8+ messages in thread
From: Greg KH @ 2025-04-24 10:34 UTC (permalink / raw)
To: Jacek Lawrynowicz; +Cc: linux-kernel, stable, Karol Wachowski
On Thu, Apr 24, 2025 at 12:22:31PM +0200, Jacek Lawrynowicz wrote:
> Hi,
>
> On 4/22/2025 2:17 PM, Greg KH wrote:
> > On Tue, Apr 08, 2025 at 11:57:11AM +0200, Jacek Lawrynowicz wrote:
> >> From: Karol Wachowski <karol.wachowski@intel.com>
> >>
> >> commit dad945c27a42dfadddff1049cf5ae417209a8996 upstream.
> >>
> >> Trigger recovery of the NPU upon receiving HW context violation from
> >> the firmware. The context violation error is a fatal error that prevents
> >> any subsequent jobs from being executed. Without this fix it is
> >> necessary to reload the driver to restore the NPU operational state.
> >>
> >> This is simplified version of upstream commit as the full implementation
> >> would require all engine reset/resume logic to be backported.
> >
> > We REALLY do not like taking patches that are not upstream. Why not
> > backport all of the needed patches instead, how many would that be?
> > Taking one-off patches like this just makes it harder/impossible to
> > maintain the code over time as further fixes in this same area will NOT
> > apply properly at all.
> >
> > Think about what you want to be touching 5 years from now, a one-off
> > change that doesn't match the rest of the kernel tree, or something that
> > is the same?
>
> Sure, I'm totally on board with backporting all required patches.
> I thought it was not possible due to 100 line limit.
>
> This would be the minimum set of patches:
>
> Patch 1:
> drivers/accel/ivpu/ivpu_drv.c | 32 +++-----------
> drivers/accel/ivpu/ivpu_drv.h | 2 +
> drivers/accel/ivpu/ivpu_job.c | 78 ++++++++++++++++++++++++++-------
> drivers/accel/ivpu/ivpu_job.h | 1 +
> drivers/accel/ivpu/ivpu_mmu.c | 3 +-
> drivers/accel/ivpu/ivpu_sysfs.c | 5 ++-
> 6 files changed, 75 insertions(+), 46 deletions(-)
>
> Patch 2:
> drivers/accel/ivpu/ivpu_job.c | 15 ++++++---------
> 1 file changed, 6 insertions(+), 9 deletions(-)
>
> Patch 3:
> drivers/accel/ivpu/ivpu_job.c | 2 +-
> drivers/accel/ivpu/ivpu_jsm_msg.c | 3 +-
> drivers/accel/ivpu/vpu_boot_api.h | 45 +++--
> drivers/accel/ivpu/vpu_jsm_api.h | 303 +++++++++++++++++++++++++-----
> 4 files changed, 293 insertions(+), 60 deletions(-)
>
> Patch 4:
> drivers/accel/ivpu/ivpu_job.c | 27 ++++++++++++++++++++++++++-
> 1 file changed, 26 insertions(+), 1 deletion(-)
>
> First patch needs some changes to apply correctly to 6.12 but the rest of them apply pretty cleanly.
> Is this acceptable?
Totally acceptable, that's trivial compared to many of the larger
backports we have taken over the years :)
thanks,
greg k-h
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] accel/ivpu: Add handling of VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW
2025-04-24 10:34 ` Greg KH
@ 2025-04-30 12:50 ` Jacek Lawrynowicz
0 siblings, 0 replies; 8+ messages in thread
From: Jacek Lawrynowicz @ 2025-04-30 12:50 UTC (permalink / raw)
To: Greg KH; +Cc: linux-kernel, stable, Karol Wachowski
Hi,
On 4/24/2025 12:34 PM, Greg KH wrote:
> On Thu, Apr 24, 2025 at 12:22:31PM +0200, Jacek Lawrynowicz wrote:
>> Hi,
>>
>> On 4/22/2025 2:17 PM, Greg KH wrote:
>>> On Tue, Apr 08, 2025 at 11:57:11AM +0200, Jacek Lawrynowicz wrote:
>>>> From: Karol Wachowski <karol.wachowski@intel.com>
>>>>
>>>> commit dad945c27a42dfadddff1049cf5ae417209a8996 upstream.
>>>>
>>>> Trigger recovery of the NPU upon receiving HW context violation from
>>>> the firmware. The context violation error is a fatal error that prevents
>>>> any subsequent jobs from being executed. Without this fix it is
>>>> necessary to reload the driver to restore the NPU operational state.
>>>>
>>>> This is simplified version of upstream commit as the full implementation
>>>> would require all engine reset/resume logic to be backported.
>>>
>>> We REALLY do not like taking patches that are not upstream. Why not
>>> backport all of the needed patches instead, how many would that be?
>>> Taking one-off patches like this just makes it harder/impossible to
>>> maintain the code over time as further fixes in this same area will NOT
>>> apply properly at all.
>>>
>>> Think about what you want to be touching 5 years from now, a one-off
>>> change that doesn't match the rest of the kernel tree, or something that
>>> is the same?
>>
>> Sure, I'm totally on board with backporting all required patches.
>> I thought it was not possible due to 100 line limit.
>>
>> This would be the minimum set of patches:
>>
>> Patch 1:
>> drivers/accel/ivpu/ivpu_drv.c | 32 +++-----------
>> drivers/accel/ivpu/ivpu_drv.h | 2 +
>> drivers/accel/ivpu/ivpu_job.c | 78 ++++++++++++++++++++++++++-------
>> drivers/accel/ivpu/ivpu_job.h | 1 +
>> drivers/accel/ivpu/ivpu_mmu.c | 3 +-
>> drivers/accel/ivpu/ivpu_sysfs.c | 5 ++-
>> 6 files changed, 75 insertions(+), 46 deletions(-)
>>
>> Patch 2:
>> drivers/accel/ivpu/ivpu_job.c | 15 ++++++---------
>> 1 file changed, 6 insertions(+), 9 deletions(-)
>>
>> Patch 3:
>> drivers/accel/ivpu/ivpu_job.c | 2 +-
>> drivers/accel/ivpu/ivpu_jsm_msg.c | 3 +-
>> drivers/accel/ivpu/vpu_boot_api.h | 45 +++--
>> drivers/accel/ivpu/vpu_jsm_api.h | 303 +++++++++++++++++++++++++-----
>> 4 files changed, 293 insertions(+), 60 deletions(-)
>>
>> Patch 4:
>> drivers/accel/ivpu/ivpu_job.c | 27 ++++++++++++++++++++++++++-
>> 1 file changed, 26 insertions(+), 1 deletion(-)
>>
>> First patch needs some changes to apply correctly to 6.12 but the rest of them apply pretty cleanly.
>> Is this acceptable?
>
> Totally acceptable, that's trivial compared to many of the larger
> backports we have taken over the years :)
OK, I've sent two separate patchses for 6.12 and 6.14 that contain minimal number of patches.
I've rebased only two patches in each patchsets and the rest is as-is from upstream.
Let me know in case I messed something up.
Regards,
Jacek
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-04-30 12:50 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-08 9:57 [PATCH] accel/ivpu: Add handling of VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW Jacek Lawrynowicz
2025-04-10 7:49 ` Jacek Lawrynowicz
2025-04-10 8:03 ` Greg Kroah-Hartman
2025-04-10 9:27 ` Jacek Lawrynowicz
2025-04-22 12:17 ` Greg KH
2025-04-24 10:22 ` Jacek Lawrynowicz
2025-04-24 10:34 ` Greg KH
2025-04-30 12:50 ` Jacek Lawrynowicz
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox