From: "Ceraolo Spurio, Daniele" <daniele.ceraolospurio@intel.com>
To: "Zhang, Carl" <carl.zhang@intel.com>,
"Ye, Tony" <tony.ye@intel.com>,
Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
"intel-gfx@lists.freedesktop.org"
<intel-gfx@lists.freedesktop.org>
Cc: "Usyskin, Alexander" <alexander.usyskin@intel.com>,
"Teres Alexis, Alan Previn" <alan.previn.teres.alexis@intel.com>
Subject: Re: [Intel-gfx] [PATCH 00/15] HuC loading for DG2
Date: Tue, 5 Jul 2022 16:30:18 -0700
Message-ID: <a120b625-4042-f616-b314-aed2013f324b@intel.com>
In-Reply-To: <PH0PR11MB557934FC60F660B9ABB96CA987AC9@PH0PR11MB5579.namprd11.prod.outlook.com>
On 6/15/2022 7:28 PM, Zhang, Carl wrote:
>> From: Ye, Tony <tony.ye@intel.com>
>> Sent: Thursday, June 16, 2022 12:15 AM
>>
>>
>> On 6/15/2022 3:13 AM, Tvrtko Ursulin wrote:
>>> On 15/06/2022 00:15, Ye, Tony wrote:
>>>> On 6/14/2022 8:30 AM, Ceraolo Spurio, Daniele wrote:
>>>>> On 6/14/2022 12:44 AM, Tvrtko Ursulin wrote:
>>>>>> On 13/06/2022 19:13, Ceraolo Spurio, Daniele wrote:
>>>>>>> On 6/13/2022 10:39 AM, Tvrtko Ursulin wrote:
>>>>>>>> On 13/06/2022 18:06, Ceraolo Spurio, Daniele wrote:
>>>>>>>>> On 6/13/2022 9:56 AM, Tvrtko Ursulin wrote:
>>>>>>>>>> On 13/06/2022 17:41, Ceraolo Spurio, Daniele wrote:
>>>>>>>>>>> On 6/13/2022 9:31 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>>> On 13/06/2022 16:39, Ceraolo Spurio, Daniele wrote:
>>>>>>>>>>>>> On 6/13/2022 1:16 AM, Tvrtko Ursulin wrote:
>>>>>>>>>>>>>> On 10/06/2022 00:19, Daniele Ceraolo Spurio wrote:
>>>>>>>>>>>>>>> On DG2, HuC loading is performed by the GSC, via a PXP
>>>>>>>>>>>>>>> command. The load operation itself is relatively simple
>>>>>>>>>>>>>>> (just send a message to the GSC with the physical address
>>>>>>>>>>>>>>> of the HuC in LMEM), but there are timing changes that
>>>>>>>>>>>>>>> require special attention. In particular, to send a PXP
>>>>>>>>>>>>>>> command we need to first export the GSC driver and then
>>>>>>>>>>>>>>> wait for the mei-gsc and mei-pxp modules to start, which
>>>>>>>>>>>>>>> means that HuC load will complete after i915 load is
>>>>>>>>>>>>>>> complete. This means that there is a small window of time
>>>>>>>>>>>>>>> after i915 is registered and before HuC is loaded during
>>>>>>>>>>>>>>> which userspace could submit and/or check the HuC load
>>>>>>>>>>>>>>> status, although this is quite unlikely to happen (HuC is
>>>>>>>>>>>>>>> usually loaded before kernel init/resume completes).
>>>>>>>>>>>>>>> We've consulted with the media team in regards to how to
>>>>>>>>>>>>>>> handle this and they've asked us to do the following:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) Report HuC as loaded in the getparam IOCTL even if load
>>>>>>>>>>>>>>> is still in progress. The media driver uses the IOCTL as a
>>>>>>>>>>>>>>> way to check if HuC is enabled and then includes a
>>>>>>>>>>>>>>> secondary check in the batches to get the actual status,
>>>>>>>>>>>>>>> so doing it this way allows userspace to keep working
>>>>>>>>>>>>>>> without changes.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2) Stall all userspace VCS submission until HuC is loaded.
>>>>>>>>>>>>>>> Stalls are
>>>>>>>>>>>>>>> expected to be very rare (if any), due to the fact that
>>>>>>>>>>>>>>> HuC is usually loaded before kernel init/resume is
>>>>>>>>>>>>>>> completed.
>>>>>>>>>>>>>> Motivation to add these complications into i915 is not
>>>>>>>>>>>>>> clear to me here. I mean there is no HuC on DG2 _yet_ is
>>>>>>>>>>>>>> the premise of the series, right? So no backwards
>>>>>>>>>>>>>> compatibility concerns. In this case why jump through the
>>>>>>>>>>>>>> hoops and not let userspace handle all of this by just
>>>>>>>>>>>>>> leaving the getparam return the true status?
>>>>>>>>>>>>> The main areas impacted by the fact that we can't guarantee
>>>>>>>>>>>>> that HuC load is complete when i915 starts accepting
>>>>>>>>>>>>> submissions are boot and suspend/resume, with the latter
>>>>>>>>>>>>> being the main problem; GT reset is not a concern because
>>>>>>>>>>>>> HuC now survives it. A suspend/resume can be transparent to
>>>>>>>>>>>>> userspace and therefore the HuC status can temporarily flip
>>>>>>>>>>>>> from loaded to not without userspace knowledge, especially
>>>>>>>>>>>>> if we start going into deeper suspend states and start
>>>>>>>>>>>>> causing HuC resets when we go into runtime suspend. Note
>>>>>>>>>>>>> that this is different from what happens during GT reset for
>>>>>>>>>>>>> older platforms, because in that scenario we guarantee that
>>>>>>>>>>>>> HuC reload is complete before we restart the submission
>>>>>>>>>>>>> back-end, so userspace doesn't notice the HuC status
>>>>>>>>>>>>> change. We had an internal discussion about this problem
>>>>>>>>>>>>> with both media and i915 archs and the conclusion was that
>>>>>>>>>>>>> the best option is for i915 to stall media submission while
>>>>>>>>>>>>> HuC (re-)load is in progress.
>>>>>>>>>>>> Resume is potentially a good reason - I did not pick up on
>>>>>>>>>>>> that from the cover letter. I read the statement about the
>>>>>>>>>>>> unlikely and small window where HuC is not loaded during
>>>>>>>>>>>> kernel init/resume and I guess did not pick up on the resume
>>>>>>>>>>>> part.
>>>>>>>>>>>>
>>>>>>>>>>>> Waiting for GSC to load HuC from i915 resume is not an option?
>>>>>>>>>>> GSC is an aux device exported by i915, so AFAIU GSC resume
>>>>>>>>>>> can't start until i915 resume completes.
>>>>>>>>>> I'll dig into this in the next few days since I want to
>>>>>>>>>> understand how exactly it works. Or someone can help explain.
>>>>>>>>>>
>>>>>>>>>> If in the end conclusion will be that i915 resume indeed cannot
>>>>>>>>>> wait for GSC, then I think auto-blocking of queued up contexts
>>>>>>>>>> on media engines indeed sounds unavoidable. Otherwise, as you
>>>>>>>>>> explained, user experience post resume wouldn't be good.
>>>>>>>>> Even if we could implement a wait, I'm not sure we should. GSC
>>>>>>>>> resume and HuC reload take ~300ms in most cases, I don't think
>>>>>>>>> we want to block within the i915 resume path for that long.
>>>>>>>> Yeah maybe not. But entertaining the idea that it is technically
>>>>>>>> possible to block - we could perhaps add uapi for userspace to
>>>>>>>> mark contexts which want HuC access. Then track if there are any
>>>>>>>> such contexts with outstanding submissions and only wait in
>>>>>>>> resume if there are. If that would end up significantly less code
>>>>>>>> on the i915 side to maintain is an open question.
>>>>>>>>
>>>>>>>> What would be the end result from users point of view in case
>>>>>>>> where it suspended during video playback? The proposed solution
>>>>>>>> from this series sees the video stuck after resume. Maybe
>>>>>>>> compositor blocks as well since I am not sure how well they
>>>>>>>> handle one window not providing new data. Probably depends on the
>>>>>>>> compositor.
>>>>>>>>
>>>>>>>> And then with a simpler solution definitely the whole resume
>>>>>>>> would be delayed by 300ms.
>>>>>>>>
>>>>>>>> With my ChromeOS hat the stalled media engines does sound like a
>>>>>>>> better solution. But with the maintainer hat I'd like all options
>>>>>>>> evaluated since there is attractiveness if a good enough solution
>>>>>>>> can be achieved with significantly less kernel code.
>>>>>>>>
>>>>>>>> You say 300ms is typical time for HuC load. How long it is on
>>>>>>>> other platforms? If much faster then why is it so slow here?
>>>>>>> The GSC itself has to come out of suspend before it can perform
>>>>>>> the load, which takes a few tens of ms I believe. AFAIU the GSC is
>>>>>>> also slower in processing the HuC load and auth compared to the
>>>>>>> legacy path. The GSC FW team gave a 250ms limit for the time the
>>>>>>> GSC FW needs from start of the resume flow to HuC load complete,
>>>>>>> so I bumped that to ~300ms to account for all other SW
>>>>>>> interactions, plus a bit of buffer. Note that a bit of the SW
>>>>>>> overhead is caused by the fact that we have 2 mei modules in play
>>>>>>> here: mei-gsc, which manages the GSC device itself (including
>>>>>>> resume), and mei-pxp, which owns the pxp messaging, including HuC
>>>>>>> load.
>>>>>> And how long on other platforms (not DG2) do you know? Presumably
>>>>>> there the wait is on the i915 resume path?
>>>>> I don't have "official" expected load times at hand, but looking at
>>>>> the BAT boot logs for this series for DG1 I see it takes ~10 ms to
>>>>> load both GuC and HuC:
>>>>>
>>>>> <7>[ 8.157838] i915 0000:03:00.0: [drm:intel_huc_init [i915]] GSC loads huc=no
>>>>> <6>[ 8.158632] i915 0000:03:00.0: [drm] GuC firmware i915/dg1_guc_70.1.1.bin version 70.1
>>>>> <6>[ 8.158634] i915 0000:03:00.0: [drm] HuC firmware i915/dg1_huc_7.9.3.bin version 7.9
>>>>> <7>[ 8.164255] i915 0000:03:00.0: [drm:guc_enable_communication [i915]] GuC communication enabled
>>>>> <6>[ 8.166111] i915 0000:03:00.0: [drm] HuC authenticated
>>>>>
>>>>> Note that we increase the GT frequency all the way to the max before
>>>>> starting the FW load, which speeds things up.
>>>>>
>>>>>>>>>> However, do we really need to lie in the getparam? How about
>>>>>>>>>> extend or add a new one to separate the loading vs loaded
>>>>>>>>>> states? Since userspace does not support DG2 HuC yet this
>>>>>>>>>> should be doable.
>>>>>>>>> I don't really have a preference here. The media team asked us
>>>>>>>>> to do it this way because they wouldn't have a use for the
>>>>>>>>> different "in progress" and "done" states. If they're ok with
>>>>>>>>> having separate flags that's fine by me.
>>>>>>>>> Tony, any feedback here?
>>>>>>>> We don't even have any docs in i915_drm.h in terms of what it means:
>>>>>>>> #define I915_PARAM_HUC_STATUS 42
>>>>>>>>
>>>>>>>> Seems to be a boolean. Status false vs true? Could you add some
>>>>>>>> docs?
>>>>>>> There is documentation above intel_huc_check_status(), which is
>>>>>>> also updated in this series. I can move that to i915_drm.h.
>>>>>> That would be great, thanks.
>>>>>>
>>>>>> And with so rich return codes already documented and exposed via
>>>>>> uapi - would we really need to add anything new for DG2 apart for
>>>>>> userspace to know that if zero is returned (not a negative error
>>>>>> value) it should retry? I mean is there another negative error
>>>>>> missing which would prevent zero transitioning to one?
>>>>> I think if the auth fails we currently return 0, because the uc
>>>>> state in that case would be "TRANSFERRED", i.e. DMA complete but not
>>>>> fully enabled. I don't have anything against changing the FW state
>>>>> to "ERROR" in this scenario and leave the 0 to mean "not done yet",
>>>>> but I'd prefer the media team to comment on their needs for this
>>>>> IOCTL before committing to anything.
>>>>
>>>> Currently media doesn't differentiate "delayed loading is in
>>>> progress" with "HuC is authenticated and running". If the HuC
>>>> authentication eventually fails, the user needs to check the debugfs
>>>> to know the reason. IMHO, it's not a big problem as this is what we
>>>> do even when the IOCTL returns non-zero values. + Carl to comment.
>>> (Side note - debugfs can be assumed to not exist so it is not
>>> interesting to users.)
>>>
>>> There isn't currently a "delayed loading is in progress" state, that's
>>> the discussion in this thread, if and how to add it.
>>>
>>> Getparam currently documents these states:
>>>
>>> -ENODEV if HuC is not present on this platform,
>>> -EOPNOTSUPP if HuC firmware is disabled,
>>> -ENOPKG if HuC firmware was not installed,
>>> -ENOEXEC if HuC firmware is invalid or mismatched,
>>> 0 if HuC firmware is not running,
>>> 1 if HuC firmware is authenticated and running.
>>>
>>> This patch proposed to change this to:
>>>
>>> 1 if HuC firmware is authenticated and running or if delayed load is
>>> in progress,
>>> 0 if HuC firmware is not running and delayed load is not in progress
>>>
>>> Alternative idea is for DG2 (well in general) to add some more fine
>>> grained states, so that i915 does not have to use 1 for both running
>>> and loading. This may be adding a new error code for auth fails as
>>> Daniele mentioned. Then UMD can know that if 0 is returned and
>>> platform is DG2 it needs to query it again since it will transition to
>>> either 1 or error eventually. This would mean the non error states
>>> would be:
>>>
>>> 0 not running (aka loading)
>>> 1 running (and authenticated)
>>>
>>> @Daniele - one more thing - can you make sure in the series (if you
>>> haven't already) that if HuC status was in any error before suspend
>>> reload is not re-tried on resume? My thinking is that the error is
>>> likely to persist and we don't want to impose long delay on every
>>> resume afterwards. Makes sense to you?
>>>
>>> @Tony - one more question for the UMD. Or two.
>>>
>>> How prevalent is usage of HuC on DG2 depending on what codecs need it?
>>> Do you know in advance, before creating a GEM context, that HuC
>>> commands will be sent to the engine or this changes at runtime?
>> HuC is needed for all codecs while HW bit rate control (CBR, VBR) is in use.
>> It's also used by content protection. And UMD doesn't know if it will be used
>> later at context creation time.
>>
> From the UMD perspective, we don't care much about the normal initialization process,
> because I can't imagine a system booting up and the user selecting encrypted content
> to play back while HuC is still not ready.
> Of course, we are also OK with querying the HuC status twice and waiting if the status is "0 not running"
> to avoid potential issues.
>
> I suppose the main possible issue will happen in the hibernate/resume process, which is transparent to the UMD.
> The UMD will not call the ioctl to query the HuC status during this process, and will continue to send command buffers to the KMD.
I think there is agreement that it is OK to return 0 to mark the load as
still in progress and 1 for load & auth complete. However, on double
checking the code, it turns out that we currently return 0 on load
failure, even if that's not particularly clear from the comment. I can
easily change that to be an error code, but I'm not sure whether that
would be considered an API breakage, given that it's not a
well-documented behavior. I believe that on pre-DG2 platforms userspace
considers 1 as OK and everything else as failure, so changing the ioctl
to return an error code on failure and 0 for load pending (with the
latter being a DG2-exclusive code for now) should be safe, but I'd like
confirmation that I'm not breaking API before sending the relevant code.
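
To make the proposed semantics concrete, here is a rough sketch of how
userspace might interpret the value under this proposal: 1 means loaded
and authenticated, 0 means load still pending (DG2 only), and a negative
errno means failure. This is only an illustration under those
assumptions, not code from the series or from the media driver. The
names huc_state, huc_classify, huc_wait_ready and fake_query are made up
for this example, and the actual getparam call is hidden behind a
callback so the logic can be exercised without hardware.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical names, for illustration only; not part of the i915 uapi. */
enum huc_state {
	HUC_READY,	/* status 1: loaded and authenticated */
	HUC_PENDING,	/* status 0: load still in progress (DG2 only) */
	HUC_FAILED,	/* negative errno: absent, disabled or failed to load */
};

/* Map the value reported for I915_PARAM_HUC_STATUS to a state. */
static enum huc_state huc_classify(int status)
{
	if (status > 0)
		return HUC_READY;
	if (status == 0)
		return HUC_PENDING;
	return HUC_FAILED;
}

/* Callback performing the actual query; returns 1, 0 or -errno. */
typedef int (*huc_query_fn)(void *ctx);

/*
 * Re-query until the status leaves the pending state or max_tries is
 * exhausted. Returns the last non-pending status, or -ETIMEDOUT if the
 * status was still 0 after max_tries attempts.
 */
static int huc_wait_ready(huc_query_fn query, void *ctx, unsigned int max_tries)
{
	assert(query != NULL);

	for (unsigned int i = 0; i < max_tries; i++) {
		int status = query(ctx);

		if (huc_classify(status) != HUC_PENDING)
			return status;
		/* A real caller would sleep here between attempts. */
	}
	return -ETIMEDOUT;
}

/*
 * Stand-in for the real getparam call: reports "pending" for the number
 * of calls stored in *ctx, then reports "ready".
 */
static int fake_query(void *ctx)
{
	int *pending = ctx;

	if (*pending > 0) {
		(*pending)--;
		return 0;
	}
	return 1;
}
```

A pre-DG2 caller that only checks for 1 keeps working unchanged with
this mapping; only a DG2-aware caller would need the retry path.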
Thanks,
Daniele
>
>> Thanks,
>>
>> Tony
>>
>>> Regards,
>>>
>>> Tvrtko
Thread overview: 54+ messages
2022-06-09 23:19 [Intel-gfx] [PATCH 00/15] HuC loading for DG2 Daniele Ceraolo Spurio
2022-06-09 23:19 ` [Intel-gfx] [PATCH 01/15] HAX: mei: GSC support for XeHP SDV and DG2 platform Daniele Ceraolo Spurio
2022-06-09 23:19 ` [Intel-gfx] [PATCH 02/15] mei: add support to GSC extended header Daniele Ceraolo Spurio
2022-08-03 22:07 ` Teres Alexis, Alan Previn
2022-08-16 20:49 ` Winkler, Tomas
2022-06-09 23:19 ` [Intel-gfx] [PATCH 03/15] mei: bus: enable sending gsc commands Daniele Ceraolo Spurio
2022-06-09 23:19 ` [Intel-gfx] [PATCH 04/15] mei: bus: extend bus API to support command streamer API Daniele Ceraolo Spurio
2022-06-09 23:19 ` [Intel-gfx] [PATCH 05/15] mei: pxp: add command streamer API to the PXP driver Daniele Ceraolo Spurio
2022-07-27 1:42 ` Teres Alexis, Alan Previn
2022-06-09 23:19 ` [Intel-gfx] [PATCH 06/15] mei: pxp: support matching with a gfx discrete card Daniele Ceraolo Spurio
2022-07-27 1:01 ` Teres Alexis, Alan Previn
2022-06-09 23:19 ` [Intel-gfx] [PATCH 07/15] drm/i915/pxp: load the pxp module when we have a gsc-loaded huc Daniele Ceraolo Spurio
2022-06-18 7:27 ` Teres Alexis, Alan Previn
2022-06-09 23:19 ` [Intel-gfx] [PATCH 08/15] drm/i915/pxp: implement function for sending tee stream command Daniele Ceraolo Spurio
2022-06-18 8:07 ` Teres Alexis, Alan Previn
2022-06-09 23:19 ` [Intel-gfx] [PATCH 09/15] drm/i915/pxp: add huc authentication and loading command Daniele Ceraolo Spurio
2022-06-21 6:33 ` Teres Alexis, Alan Previn
2022-06-09 23:19 ` [Intel-gfx] [PATCH 10/15] drm/i915/dg2: setup HuC loading via GSC Daniele Ceraolo Spurio
2022-07-05 22:35 ` Teres Alexis, Alan Previn
2022-06-09 23:19 ` [Intel-gfx] [PATCH 11/15] drm/i915/huc: track delayed HuC load with a fence Daniele Ceraolo Spurio
2022-07-06 4:42 ` Teres Alexis, Alan Previn
2022-06-09 23:19 ` [Intel-gfx] [PATCH 12/15] drm/i915/huc: stall media submission until HuC is loaded Daniele Ceraolo Spurio
2022-07-27 0:33 ` Teres Alexis, Alan Previn
2022-06-09 23:19 ` [Intel-gfx] [PATCH 13/15] drm/i915/huc: report HuC as loaded even if load still in progress Daniele Ceraolo Spurio
2022-07-06 4:49 ` Teres Alexis, Alan Previn
2022-06-09 23:19 ` [Intel-gfx] [PATCH 14/15] drm/i915/huc: define gsc-compatible HuC fw for DG2 Daniele Ceraolo Spurio
2022-06-22 17:55 ` Teres Alexis, Alan Previn
2022-06-22 18:16 ` Teres Alexis, Alan Previn
2022-06-09 23:19 ` [Intel-gfx] [PATCH 15/15] HAX: drm/i915: force INTEL_MEI_GSC and INTEL_MEI_PXP on for CI Daniele Ceraolo Spurio
2022-06-10 0:07 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for HuC loading for DG2 Patchwork
2022-06-10 0:07 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-06-10 8:01 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-06-11 8:01 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2022-06-13 8:16 ` [Intel-gfx] [PATCH 00/15] " Tvrtko Ursulin
2022-06-13 15:39 ` Ceraolo Spurio, Daniele
2022-06-13 16:31 ` Tvrtko Ursulin
2022-06-13 16:41 ` Ceraolo Spurio, Daniele
2022-06-13 16:56 ` Tvrtko Ursulin
2022-06-13 17:06 ` Ceraolo Spurio, Daniele
2022-06-13 17:39 ` Tvrtko Ursulin
2022-06-13 18:13 ` Ceraolo Spurio, Daniele
2022-06-14 7:44 ` Tvrtko Ursulin
2022-06-14 15:30 ` Ceraolo Spurio, Daniele
2022-06-14 23:15 ` Ye, Tony
2022-06-15 10:13 ` Tvrtko Ursulin
2022-06-15 14:35 ` Ceraolo Spurio, Daniele
2022-06-15 14:53 ` Tvrtko Ursulin
2022-06-15 16:14 ` Ye, Tony
2022-06-16 2:28 ` Zhang, Carl
2022-07-05 23:30 ` Ceraolo Spurio, Daniele [this message]
2022-07-06 17:26 ` Ye, Tony
2022-07-06 19:29 ` Ceraolo Spurio, Daniele
2022-07-06 20:11 ` Ye, Tony
2022-06-16 7:10 ` Tvrtko Ursulin