From mboxrd@z Thu Jan 1 00:00:00 1970 From: Yu Dai Subject: Re: [PATCH 06/18] drm/i915: Defer default hardware context initialisation until first open Date: Mon, 30 Mar 2015 12:11:51 -0700 Message-ID: <55199FF7.8030100@intel.com> References: <1427398885-31988-1-git-send-email-yu.dai@intel.com> <1427398885-31988-7-git-send-email-yu.dai@intel.com> <20150327084527.GI23521@phenom.ffwll.local> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0406090382==" Return-path: Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by gabe.freedesktop.org (Postfix) with ESMTP id 646647206E for ; Mon, 30 Mar 2015 12:12:55 -0700 (PDT) In-Reply-To: <20150327084527.GI23521@phenom.ffwll.local> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" To: Daniel Vetter Cc: intel-gfx@lists.freedesktop.org List-Id: intel-gfx@lists.freedesktop.org This is a multi-part message in MIME format. --===============0406090382== Content-Type: multipart/alternative; boundary="------------050804070708050903060009" This is a multi-part message in MIME format. --------------050804070708050903060009 Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit On 03/27/2015 01:45 AM, Daniel Vetter wrote: > On Thu, Mar 26, 2015 at 12:41:13PM -0700, yu.dai@intel.com wrote: > > From: Dave Gordon > > > > In order to fully initialise the default contexts, we have to execute > > batchbuffer commands on the GPU engines. But we can't do that until any > > required firmware has been loaded, which may not be possible during > > driver load, because the filesystem(s) containing the firmware may not > > be mounted until later. > > > > Therefore, we now allow the first call to the firmware-loading code to > > return -EAGAIN to indicate that it's not yet ready, and that it should > > be retried when the device is first opened from user code, by which > > time we expect that all required filesystems will have been mounted. > > The late-retry code will then re-attempt to load the firmware if the > > early attempt failed. > > We've tried a similar approach a while back and it doesn't work well in > conjunction with rps - the hw tends to fall over if the context state > isn't properly initialized when going into rc6. I believe patch 18/18 of this series is to notify GuC when RC6 is on and off. I don't know much details of implementation inside firmware, but I believe GuC will take care of it properly. > Why exactly can't we load that firmware right at boot-up, or at least > stall correctly until it's there? Dave G. wrote a very good comment about this. Sorry I lost it during patch squashing. Here is a copy of it. I will amend the comment in next version. The GuC loader uses an asynchronous thread to fetch the firmware image (aka "binary blob") from a file. This thread has then had to wait for the mainline driver loading code to complete GEM initialisation before it can convert the blob into a GEM object and transfer it to the GuC's memory. Unfortunately, with this scheme, the GuC loading was occurring*after* the internally-generated batches used to initialise contexts had already been submitted (using direct access to the ELSP, since the GuC wasn't ready). In addition, the one-way synchronisation mechanism resulted in the firmware image being transfeered to the GuC at an indeterminate time (just "sometime" after the mainline thread releases the device's struct_mutex), with consequent confusion. Rather than complicate the loader further by adding a second sync point and arranging the handover of the struct_mutex from one thread to the other, this commit reverses the synchronisation so that the mainline thread waits for the asynchronous thread rather than vice versa. The firmware loader now only has to save the reference to the blob and signal a completion; the mainline thread can then continue with the rest of the loading process when it catches up. The result is a much simpler (and fully deterministic) process for loading the GuC firmware. -Alex --------------050804070708050903060009 Content-Type: text/html; charset=windows-1252 Content-Transfer-Encoding: 7bit

On 03/27/2015 01:45 AM, Daniel Vetter wrote:
On Thu, Mar 26, 2015 at 12:41:13PM -0700, yu.dai@intel.com wrote:
> From: Dave Gordon <david.s.gordon@intel.com>
> 
> In order to fully initialise the default contexts, we have to execute
> batchbuffer commands on the GPU engines. But we can't do that until any
> required firmware has been loaded, which may not be possible during
> driver load, because the filesystem(s) containing the firmware may not
> be mounted until later.
> 
> Therefore, we now allow the first call to the firmware-loading code to
> return -EAGAIN to indicate that it's not yet ready, and that it should
> be retried when the device is first opened from user code, by which
> time we expect that all required filesystems will have been mounted.
> The late-retry code will then re-attempt to load the firmware if the
> early attempt failed.

We've tried a similar approach a while back and it doesn't work well in
conjunction with rps - the hw tends to fall over if the context state
isn't properly initialized when going into rc6.
I believe patch 18/18 of this series is to notify GuC when RC6 is on and off. I don't know much details of implementation inside firmware, but I believe GuC will take care of it properly.
Why exactly can't we load that firmware right at boot-up, or at least
stall correctly until it's there?
Dave G. wrote a very good comment about this. Sorry I lost it during patch squashing. Here is a copy of it. I will amend the comment in next version.
The GuC loader uses an asynchronous thread to fetch the firmware image
(aka "binary blob") from a file. This thread has then had to wait for
the mainline driver loading code to complete GEM initialisation before
it can convert the blob into a GEM object and transfer it to the GuC's
memory.

Unfortunately, with this scheme, the GuC loading was occurring *after*
the internally-generated batches used to initialise contexts had already
been submitted (using direct access to the ELSP, since the GuC wasn't
ready).

In addition, the one-way synchronisation mechanism resulted in the
firmware image being transfeered to the GuC at an indeterminate time
(just "sometime" after the mainline thread releases the device's
struct_mutex), with consequent confusion.

Rather than complicate the loader further by adding a second sync point
and arranging the handover of the struct_mutex from one thread to the
other, this commit reverses the synchronisation so that the mainline
thread waits for the asynchronous thread rather than vice versa. The
firmware loader now only has to save the reference to the blob and
signal a completion; the mainline thread can then continue with the rest
of the loading process when it catches up. The result is a much simpler
(and fully deterministic) process for loading the GuC firmware.

-Alex
--------------050804070708050903060009-- --===============0406090382== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KSW50ZWwtZ2Z4 IG1haWxpbmcgbGlzdApJbnRlbC1nZnhAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHA6Ly9saXN0 cy5mcmVlZGVza3RvcC5vcmcvbWFpbG1hbi9saXN0aW5mby9pbnRlbC1nZngK --===============0406090382==--