All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dmitry Osipenko <digetx@gmail.com>
To: Mikko Perttunen <cyndis@kapsi.fi>,
	Mikko Perttunen <mperttunen@nvidia.com>,
	thierry.reding@gmail.com, jonathanh@nvidia.com, airlied@linux.ie,
	daniel@ffwll.ch
Cc: linux-tegra@vger.kernel.org, dri-devel@lists.freedesktop.org,
	talho@nvidia.com, bhuntsman@nvidia.com
Subject: Re: [RFC PATCH v2 10/17] WIP: gpu: host1x: Add no-recovery mode
Date: Sun, 13 Sep 2020 00:51:01 +0300	[thread overview]
Message-ID: <3f80aff2-23ce-9b1f-d242-e46e974fbeed@gmail.com> (raw)
In-Reply-To: <28f18a23-b588-004d-4945-91b7a593607a@kapsi.fi>

12.09.2020 16:31, Mikko Perttunen пишет:
...
>> I'm now taking a closer look at this patch and it raises some more
>> questions, like for example by looking at the "On job timeout, we stop
>> the channel, NOP all future jobs on the channel using the same syncpoint
>> ..." through the prism of grate-kernel experience, I'm not sure how it
>> could co-exist with the drm-scheduler and why it's needed at all. But I
>> think we need a feature-complete version (at least a rough version), so
>> that we could start the testing, and then it should be easier to review
>> and discuss such things.
>>
> 
> The reason this is needed is that if a job times out and we don't do its
> syncpoint increments on the CPU, then a successive job incrementing that
> same syncpoint would cause fences to signal incorrectly. The job that
> was supposed to signal those fences didn't actually run; and any data
> those fences were protecting would still be garbage.

I'll need to re-read the previous discussion because IIRC, I was
suggesting that once job is hung, all jobs should be removed from
queue/PB and re-submitted, then the re-submitted jobs will use the
new/updated sync point base.

And we probably should need another drm_tegra_submit_cmd type that waits
for a relative sync point increment. The
drm_tegra_submit_cmd_wait_syncpt uses absolute sync point value and it
shouldn't be used for sync point increments that are internal to a job
because it complicates the recovery.

All waits that are internal to a job should only wait for relative sync
point increments.

In the grate-kernel every job uses unique-and-clean sync point (which is
also internal to the kernel driver) and a relative wait [1] is used for
the job's internal sync point increments [2][3][4], and thus, kernel
driver simply jumps over a hung job by updating DMAGET to point at the
start of a next job.

[1]
https://github.com/grate-driver/linux/blob/master/drivers/gpu/drm/tegra/uapi/patching.c#L367

[2]
https://github.com/grate-driver/xf86-video-opentegra/blob/master/src/gpu/gr3d.c#L486
[3]
https://github.com/grate-driver/xf86-video-opentegra/blob/master/src/exa/copy_2d.c#L389
[4]
https://github.com/grate-driver/xf86-video-opentegra/blob/master/src/gpu/tegra_stream_v2.c#L536

WARNING: multiple messages have this Message-ID (diff)
From: Dmitry Osipenko <digetx@gmail.com>
To: Mikko Perttunen <cyndis@kapsi.fi>,
	Mikko Perttunen <mperttunen@nvidia.com>,
	thierry.reding@gmail.com, jonathanh@nvidia.com, airlied@linux.ie,
	daniel@ffwll.ch
Cc: linux-tegra@vger.kernel.org, talho@nvidia.com,
	bhuntsman@nvidia.com, dri-devel@lists.freedesktop.org
Subject: Re: [RFC PATCH v2 10/17] WIP: gpu: host1x: Add no-recovery mode
Date: Sun, 13 Sep 2020 00:51:01 +0300	[thread overview]
Message-ID: <3f80aff2-23ce-9b1f-d242-e46e974fbeed@gmail.com> (raw)
In-Reply-To: <28f18a23-b588-004d-4945-91b7a593607a@kapsi.fi>

12.09.2020 16:31, Mikko Perttunen пишет:
...
>> I'm now taking a closer look at this patch and it raises some more
>> questions, like for example by looking at the "On job timeout, we stop
>> the channel, NOP all future jobs on the channel using the same syncpoint
>> ..." through the prism of grate-kernel experience, I'm not sure how it
>> could co-exist with the drm-scheduler and why it's needed at all. But I
>> think we need a feature-complete version (at least a rough version), so
>> that we could start the testing, and then it should be easier to review
>> and discuss such things.
>>
> 
> The reason this is needed is that if a job times out and we don't do its
> syncpoint increments on the CPU, then a successive job incrementing that
> same syncpoint would cause fences to signal incorrectly. The job that
> was supposed to signal those fences didn't actually run; and any data
> those fences were protecting would still be garbage.

I'll need to re-read the previous discussion because IIRC, I was
suggesting that once job is hung, all jobs should be removed from
queue/PB and re-submitted, then the re-submitted jobs will use the
new/updated sync point base.

And we probably should need another drm_tegra_submit_cmd type that waits
for a relative sync point increment. The
drm_tegra_submit_cmd_wait_syncpt uses absolute sync point value and it
shouldn't be used for sync point increments that are internal to a job
because it complicates the recovery.

All waits that are internal to a job should only wait for relative sync
point increments.

In the grate-kernel every job uses unique-and-clean sync point (which is
also internal to the kernel driver) and a relative wait [1] is used for
the job's internal sync point increments [2][3][4], and thus, kernel
driver simply jumps over a hung job by updating DMAGET to point at the
start of a next job.

[1]
https://github.com/grate-driver/linux/blob/master/drivers/gpu/drm/tegra/uapi/patching.c#L367

[2]
https://github.com/grate-driver/xf86-video-opentegra/blob/master/src/gpu/gr3d.c#L486
[3]
https://github.com/grate-driver/xf86-video-opentegra/blob/master/src/exa/copy_2d.c#L389
[4]
https://github.com/grate-driver/xf86-video-opentegra/blob/master/src/gpu/tegra_stream_v2.c#L536
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

  reply	other threads:[~2020-09-12 21:51 UTC|newest]

Thread overview: 128+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-05 10:34 [RFC PATCH v2 00/17] Host1x/TegraDRM UAPI Mikko Perttunen
2020-09-05 10:34 ` Mikko Perttunen
2020-09-05 10:34 ` [RFC PATCH v2 01/17] gpu: host1x: Use different lock classes for each client Mikko Perttunen
2020-09-05 10:34   ` Mikko Perttunen
2020-09-05 10:34 ` [RFC PATCH v2 02/17] gpu: host1x: Allow syncpoints without associated client Mikko Perttunen
2020-09-05 10:34   ` Mikko Perttunen
2020-09-05 10:34 ` [RFC PATCH v2 03/17] gpu: host1x: Show number of pending waiters in debugfs Mikko Perttunen
2020-09-05 10:34   ` Mikko Perttunen
2020-09-05 10:34 ` [RFC PATCH v2 04/17] gpu: host1x: Remove cancelled waiters immediately Mikko Perttunen
2020-09-05 10:34   ` Mikko Perttunen
2020-09-05 10:34 ` [RFC PATCH v2 05/17] gpu: host1x: Use HW-equivalent syncpoint expiration check Mikko Perttunen
2020-09-05 10:34   ` Mikko Perttunen
2020-09-05 10:34 ` [RFC PATCH v2 06/17] gpu: host1x: Cleanup and refcounting for syncpoints Mikko Perttunen
2020-09-05 10:34   ` Mikko Perttunen
2020-09-05 14:30   ` Dmitry Osipenko
2020-09-05 14:30     ` Dmitry Osipenko
2020-09-05 14:53     ` Mikko Perttunen
2020-09-05 14:53       ` Mikko Perttunen
2020-09-09  0:07       ` Dmitry Osipenko
2020-09-09  0:07         ` Dmitry Osipenko
2020-09-09  8:03         ` Mikko Perttunen
2020-09-09  8:03           ` Mikko Perttunen
2020-09-05 10:34 ` [RFC PATCH v2 07/17] gpu: host1x: Introduce UAPI header Mikko Perttunen
2020-09-05 10:34   ` Mikko Perttunen
2020-09-05 10:34 ` [RFC PATCH v2 08/17] gpu: host1x: Implement /dev/host1x device node Mikko Perttunen
2020-09-05 10:34   ` Mikko Perttunen
2020-09-05 10:34 ` [RFC PATCH v2 09/17] gpu: host1x: DMA fences and userspace fence creation Mikko Perttunen
2020-09-05 10:34   ` Mikko Perttunen
2020-09-10 22:00   ` Dmitry Osipenko
2020-09-10 22:00     ` Dmitry Osipenko
2020-09-11  9:07     ` Mikko Perttunen
2020-09-11  9:07       ` Mikko Perttunen
2020-09-05 10:34 ` [RFC PATCH v2 10/17] WIP: gpu: host1x: Add no-recovery mode Mikko Perttunen
2020-09-05 10:34   ` Mikko Perttunen
2020-09-11 16:40   ` Dmitry Osipenko
2020-09-11 16:40     ` Dmitry Osipenko
2020-09-11 22:11     ` Mikko Perttunen
2020-09-11 22:11       ` Mikko Perttunen
2020-09-12 12:53       ` Dmitry Osipenko
2020-09-12 12:53         ` Dmitry Osipenko
2020-09-12 13:31         ` Mikko Perttunen
2020-09-12 13:31           ` Mikko Perttunen
2020-09-12 21:51           ` Dmitry Osipenko [this message]
2020-09-12 21:51             ` Dmitry Osipenko
2020-09-13  9:51             ` Mikko Perttunen
2020-09-13  9:51               ` Mikko Perttunen
2020-09-13 18:37               ` Dmitry Osipenko
2020-09-13 18:37                 ` Dmitry Osipenko
2020-09-15 10:57                 ` Mikko Perttunen
2020-09-15 10:57                   ` Mikko Perttunen
2020-09-05 10:34 ` [RFC PATCH v2 11/17] gpu: host1x: Add job release callback Mikko Perttunen
2020-09-05 10:34   ` Mikko Perttunen
2020-09-05 10:34 ` [RFC PATCH v2 12/17] gpu: host1x: Add support for syncpoint waits in CDMA pushbuffer Mikko Perttunen
2020-09-05 10:34   ` Mikko Perttunen
2020-09-05 10:34 ` [RFC PATCH v2 13/17] gpu: host1x: Reset max value when freeing a syncpoint Mikko Perttunen
2020-09-05 10:34   ` Mikko Perttunen
2020-09-16 19:44   ` Dmitry Osipenko
2020-09-16 19:44     ` Dmitry Osipenko
2020-09-16 20:43     ` Mikko Perttunen
2020-09-16 20:43       ` Mikko Perttunen
2020-09-16 21:37       ` Dmitry Osipenko
2020-09-16 21:37         ` Dmitry Osipenko
2020-09-17  7:25         ` Mikko Perttunen
2020-09-17  7:25           ` Mikko Perttunen
2020-09-05 10:34 ` [RFC PATCH v2 14/17] drm/tegra: Add new UAPI to header Mikko Perttunen
2020-09-05 10:34   ` Mikko Perttunen
2020-09-08 23:45   ` DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC Dmitry Osipenko
2020-09-08 23:45     ` DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC Dmitry Osipenko
2020-09-09  8:10     ` DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC Mikko Perttunen
2020-09-09  8:10       ` DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC Mikko Perttunen
2020-09-10 22:15       ` DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC Dmitry Osipenko
2020-09-10 22:15         ` DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC Dmitry Osipenko
2020-09-11  9:52         ` DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC Mikko Perttunen
2020-09-11  9:52           ` DRM_TEGRA_SUBMIT_BUF_WRITE_RELOC Mikko Perttunen
2020-09-05 10:34 ` [RFC PATCH v2 15/17] drm/tegra: Add power_on/power_off engine callbacks Mikko Perttunen
2020-09-05 10:34   ` Mikko Perttunen
2020-09-09  0:16   ` Dmitry Osipenko
2020-09-09  0:16     ` Dmitry Osipenko
2020-09-09  8:11     ` Mikko Perttunen
2020-09-09  8:11       ` Mikko Perttunen
2020-09-05 10:34 ` [RFC PATCH v2 16/17] drm/tegra: Allocate per-engine channel in core code Mikko Perttunen
2020-09-05 10:34   ` Mikko Perttunen
2020-09-05 10:34 ` [RFC PATCH v2 17/17] WIP: drm/tegra: Implement new UAPI Mikko Perttunen
2020-09-05 10:34   ` Mikko Perttunen
2020-09-09  0:47   ` Dmitry Osipenko
2020-09-09  0:47     ` Dmitry Osipenko
2020-09-09  8:19     ` Mikko Perttunen
2020-09-09  8:19       ` Mikko Perttunen
2020-09-10 21:59       ` Dmitry Osipenko
2020-09-10 21:59         ` Dmitry Osipenko
2020-09-09  1:13   ` [RFC PATCH v2 17/17] WIP: drm/tegra: Implement new UAPI (submit_handle_syncpts) Dmitry Osipenko
2020-09-09  1:13     ` Dmitry Osipenko
2020-09-09  1:24     ` Dmitry Osipenko
2020-09-09  1:24       ` Dmitry Osipenko
2020-09-09  8:26       ` Mikko Perttunen
2020-09-09  8:26         ` Mikko Perttunen
2020-09-10 21:58         ` Dmitry Osipenko
2020-09-10 21:58           ` Dmitry Osipenko
2020-09-09  2:06   ` [RFC PATCH v2 17/17] WIP: drm/tegra: Implement new UAPI Dmitry Osipenko
2020-09-09  2:06     ` Dmitry Osipenko
2020-09-09  8:26     ` Mikko Perttunen
2020-09-09  8:26       ` Mikko Perttunen
2020-09-09  2:10   ` Dmitry Osipenko
2020-09-09  2:10     ` Dmitry Osipenko
2020-09-09  2:34     ` Dmitry Osipenko
2020-09-09  2:34       ` Dmitry Osipenko
2020-09-09  8:36       ` Mikko Perttunen
2020-09-09  8:36         ` Mikko Perttunen
2020-09-10 21:57         ` Dmitry Osipenko
2020-09-10 21:57           ` Dmitry Osipenko
2020-09-11  9:59           ` Mikko Perttunen
2020-09-11  9:59             ` Mikko Perttunen
2020-09-11 16:30             ` Dmitry Osipenko
2020-09-11 16:30               ` Dmitry Osipenko
2020-09-15 11:08               ` Mikko Perttunen
2020-09-15 11:08                 ` Mikko Perttunen
2020-09-08 23:36 ` [RFC PATCH v2 00/17] Host1x/TegraDRM UAPI Dmitry Osipenko
2020-09-08 23:36   ` Dmitry Osipenko
2020-09-09  8:40   ` Mikko Perttunen
2020-09-09  8:40     ` Mikko Perttunen
2020-09-10 22:09     ` Dmitry Osipenko
2020-09-10 22:09       ` Dmitry Osipenko
2020-09-09  2:20 ` Dmitry Osipenko
2020-09-09  2:20   ` Dmitry Osipenko
2020-09-09  8:44   ` Mikko Perttunen
2020-09-09  8:44     ` Mikko Perttunen
2020-09-10 21:53     ` Dmitry Osipenko
2020-09-10 21:53       ` Dmitry Osipenko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3f80aff2-23ce-9b1f-d242-e46e974fbeed@gmail.com \
    --to=digetx@gmail.com \
    --cc=airlied@linux.ie \
    --cc=bhuntsman@nvidia.com \
    --cc=cyndis@kapsi.fi \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=jonathanh@nvidia.com \
    --cc=linux-tegra@vger.kernel.org \
    --cc=mperttunen@nvidia.com \
    --cc=talho@nvidia.com \
    --cc=thierry.reding@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.