From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
To: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>,
Intel graphics driver community testing & development
<intel-gfx@lists.freedesktop.org>
Subject: Re: [PATCH v2] drm/i915: Sanitize GuC client initialization
Date: Tue, 21 Feb 2017 17:38:24 +0200 [thread overview]
Message-ID: <1487691504.3137.37.camel@linux.intel.com> (raw)
In-Reply-To: <9a46eb2b-a2da-2a8e-deb4-e626f7ed27ab@intel.com>
On ke, 2017-02-15 at 18:28 -0800, Daniele Ceraolo Spurio wrote:
>
> On 14/02/17 05:53, Joonas Lahtinen wrote:
> >
> > Started adding proper teardown to guc_client_alloc, ended up removing
> > quite a few dead ends where errors communicating with the GuC were
> > silently ignored. There also seemed to be quite a few erronous
> > teardown actions performed in case of an error (ordering wrong).
> >
> > v2:
> > - Increase function symmetry/proximity (Michal/Daniele)
> > - Fix __reserve_doorbell accounting for high priority (Daniele)
> > - Call __update_doorbell_desc! (Daniele)
> > - Isolate __guc_{,de}allocate_doorbell (Michal/Daniele)
> >
> > Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
> > Cc: Oscar Mateo <oscar.mateo@intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
<SNIP>
> > +static bool has_doorbell(struct i915_guc_client *client)
> > +{
> > + if (client->doorbell_id == GUC_DOORBELL_INVALID)
> > + return false;
> >
> > - /* Activate the new doorbell */
> > - __set_bit(new_id, doorbell_bitmap);
> > + return test_bit(client->doorbell_id, client->guc->doorbell_bitmap);
>
> Should we warn/gem_bug if client->doorbell_id != GUC_DOORBELL_INVALID
> and the bit in guc->doorbell_bitmap is not set? Not sure if you plan to
> decouple them more in the future, but with the current code it should
> always be impossible to have a valid doorbell without the bit in the
> bitmask being set. Maybe we can just return client->doorbell_id !=
> GUC_DOORBELL_INVALID here if we have no plan for a case where they can
> be out of sync.
Kinda a leftover from restructuring, the selection and reservation were
a different thing at some point. Changed into GEM_BUG_ON.
> > +static int __destroy_doorbell(struct i915_guc_client *client)
> > {
> > - (void)guc_update_doorbell_id(guc, client, GUC_INVALID_DOORBELL_ID);
> > + struct guc_doorbell_info *doorbell;
> >
> > - /* XXX: wait for any interrupts */
> > - /* XXX: wait for workqueue to drain */
> > + doorbell = __get_doorbell(client);
> > + doorbell->db_status = GUC_DOORBELL_DISABLED;
> > + doorbell->cookie = 0;
> > +
>
> Not from this patch (but since we're at it...), I did a bit of digging
> and I've found that doorbell release flow requires SW to poll the
> GEN8_DRBREGL(db_id) register after updating doorbell->db_status to wait
> for the GEN8_DRB_VALID bit to go to zero. This ensures that any pending
> events are processed before we call into GuC to release the doorbell.
> Note that GuC will fail the DEALLOCATE_DOORBELL action if the bit has
> not gone to zero yet. This is currently not an issue, probably because
> we use a single doorbell and we don't ring it near release time, but the
> situation could change in the future so I believe it is worth to fix it
> now. I can follow up with an additional patch if you want to keep this
> one as refactoring only.
Ack, add a follow-up on top of your series.
> > -static uint32_t select_doorbell_cacheline(struct intel_guc *guc)
> > + return 0;
> > +}
> > +
> > +static unsigned long __reserve_cacheline(struct intel_guc* guc)
>
> "reserve_cacheline" doesn't really reflect what happens, because more
> doorbells can use the same cacheline (while they are on different
> physical pages) and there is no concept of unreserving the cacheline.
> The idea is to try to avoid having lots of doorbells on the same
> cacheline if possible to make the monitoring more efficient on the HW,
> so I'd keep the "select_cacheline" naming for this function. We should
> probably also look at modifying the function to use something more
> elaborated than a simple round robin scheme to ensure doorbells are
> equally distribuited on cachelines, but that can probably wait until we
> use more doorbells.
Renamed.
> > /*
> > * Borrow the first client to set up & tear down each unused doorbell
> > * in turn, to ensure that all doorbell h/w is (re)initialised.
> > */
> > -static void guc_init_doorbell_hw(struct intel_guc *guc)
> > +static int guc_init_doorbell_hw(struct intel_guc *guc)
> > {
> > struct i915_guc_client *client = guc->execbuf_client;
> > - uint16_t db_id;
> > - int i, err;
> > + int err;
> > + int i;
> >
> > - guc_disable_doorbell(guc, client);
> > + if (has_doorbell(client))
> > + destroy_doorbell(client);
> >
> > - for (i = 0; i < GUC_MAX_DOORBELLS; ++i) {
> > - /* Skip if doorbell is OK */
> > - if (guc_doorbell_check(guc, i))
> > + for (i = 0; i < GUC_NUM_DOORBELLS; ++i) {
> > + if (doorbell_ok(guc, i))
> > continue;
> >
>
> As mentioned in the previous patch version, here we assume that all the
> doorbells should be disabled an we want to reset HW that has been left
> enabled from previous users, so the doorbell_bitmap should be clear.
> Maybe adding a simple
>
> GEM_BUG_ON(test_bit(i, guc->doorbell_bitmap));
>
> will help making sure that we're not leaving bits set when
> releasing/resetting.
The functions don't actually even touch doorbell_bitmap, they just nuke
all active clients. That's what the previous code did, as I read it.
> > - err = guc_update_doorbell_id(guc, client, i);
> > - if (err)
> > - DRM_DEBUG_DRIVER("Doorbell %d update failed, err %d\n",
> > - i, err);
> > + err = __reset_doorbell(client, i);
>
> If the reset fails the doorbell is in a bad state, so it might be worth
> to ensure that the bit is set in the bitmask to make sure we don't
> assign that doorbell to any client, but we'd have to make sure to clear
> the bitmask on GuC reset (see comment above).
Clearing the bits now in guc_init_doorbell_hw().
> > + WARN(err, "Doorbell %d reset failed, err %d\n", i, err);
> > }
> >
> > - db_id = select_doorbell_register(guc, client->priority);
> > - WARN_ON(db_id == GUC_INVALID_DOORBELL_ID);
> > -
> > - err = guc_update_doorbell_id(guc, client, db_id);
> > + err = __reserve_doorbell(client);
> > if (err)
> > - DRM_WARN("Failed to restore doorbell to %d, err %d\n",
> > - db_id, err);
> > + return err;
>
> Continuing the discussion from the previous patch revision:
>
> <snip>
> > Shouldn't this be create_doorbell() instead of reserve?
>
> That would end up calling the __update_doorbell, which we can't as the
> desc is for some strange reason not mapped yet.
> </snip>
>
> As far as I can see, everything should already be allocated and mapped
> here. Aren't we anyway already calling __update_doorbell both in the
> client_alloc and in __reset_doorbell above?
> Also, where are we re-creating the doorbell that we destroyed at the
> beginning of the function?
>
Hmm, refactored the extra destroy cycle out.
<SNIP>
> > @@ -753,27 +781,35 @@ guc_client_alloc(struct drm_i915_private *dev_priv,
> > guc_proc_desc_init(guc, client);
> > guc_ctx_desc_init(guc, client);
> >
> > - /* For runtime client allocation we need to enable the doorbell. Not
> > - * required yet for the static execbuf_client as this special kernel
> > - * client is enabled from i915_guc_submission_enable().
> > - *
> > - * guc_update_doorbell_id(guc, client, db_id);
> > - */
> > + /* For runtime client allocation we need to enable the doorbell. */
>
> this comment is a bit unclear now because you're not deferring the
> initialization of execbuf_client to guc_init_doorbell_hw anymore, so
> there is no difference between execbuf_client and "runtime" clients.
> Maybe we can just remove the comment.
Removed.
> Note that creating the doorbell here will cause it to be destroyed and
> re-allocated when i915_guc_submission_enable is called. A worth
> compromise IMO to avoid special cases, but worth pointing out :)
Refactored a bit.
Regards, Joonas
--
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2017-02-21 15:38 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-02-14 13:53 [PATCH v2] drm/i915: Sanitize GuC client initialization Joonas Lahtinen
2017-02-14 17:52 ` ✓ Fi.CI.BAT: success for drm/i915: Sanitize GuC client initialization (rev2) Patchwork
2017-02-16 2:28 ` [PATCH v2] drm/i915: Sanitize GuC client initialization Daniele Ceraolo Spurio
2017-02-16 7:44 ` Chris Wilson
2017-02-16 16:22 ` Daniele Ceraolo Spurio
2017-02-21 15:38 ` Joonas Lahtinen [this message]
2017-02-22 11:44 ` Joonas Lahtinen
2017-02-16 10:50 ` Oscar Mateo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1487691504.3137.37.camel@linux.intel.com \
--to=joonas.lahtinen@linux.intel.com \
--cc=daniele.ceraolospurio@intel.com \
--cc=intel-gfx@lists.freedesktop.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.