From: Dave Gordon <david.s.gordon@intel.com>
To: Mika Kuoppala <mika.kuoppala@linux.intel.com>,
intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/i915: cleanup lrc engine state when lrc is freed
Date: Fri, 09 Jan 2015 18:47:41 +0000 [thread overview]
Message-ID: <54B0224D.2050407@intel.com> (raw)
In-Reply-To: <1420724407-11767-1-git-send-email-mika.kuoppala@intel.com>
On 08/01/15 13:40, Mika Kuoppala wrote:
> i915_gem_validate_context() will check the engine->state to see if it can
> submit into a ringbuffer. But when we are releasing the context we leave the
> engine state to a non null value. Thus after a successful hang recovery
> we might mistakenly submit to a non initialized ringbuffer resulting in:
>
> [ 1991.356418] ------------[ cut here ]------------
> [ 1991.359192] WARNING: CPU: 1 PID: 2335 at lib/iomap.c:43 bad_io_access+0x3d/0x40()
> [ 1991.361966] Bad IO access at port 0x24 (outl(val,port))
> [ 1991.364750] Modules linked in: snd_hda_codec_hdmi i915 x86_pkg_temp_thermal coretemp kvm_intel kvm snd_hda_intel snd_hda_controller snd_hda_codec crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hwdep snd_pcm aesni_intel aes_x86_64 glue_helper lrw i2c_algo_bit gf128mul ablk_helper drm_kms_helper cryptd snd_seq_midi snd_seq_midi_event serio_raw drm snd_rawmidi snd_seq snd_seq_device snd_timer video snd soundcore mei_me lpc_ich bnep mac_hid acpi_pad mei rfcomm bluetooth parport_pc ppdev lp parport nls_iso8859_1 e1000e ptp ahci libahci pps_core sdhci_acpi sdhci
> [ 1991.370827] CPU: 1 PID: 2335 Comm: gem_ringfill Tainted: G W 3.19.0-rc3+ #50
> [ 1991.373838] Hardware name: Intel Corporation Broadwell Client platform/SawTooth Peak, BIOS BDW-E1R1.86C.0092.R00.1408311942 08/31/2014
> [ 1991.376902] ffffffff81aa1a46 ffff88014910fac8 ffffffff8173dbcf 0000000000000001
> [ 1991.379978] ffff88014910fb18 ffff88014910fb08 ffffffff8107007a ffff88014910fb28
> [ 1991.383037] ffff880147209940 ffff8800aafa8718 ffff8800aafa0000 ffff8800aafa1918
> [ 1991.386094] Call Trace:
> [ 1991.389140] [<ffffffff8173dbcf>] dump_stack+0x45/0x57
> [ 1991.392207] [<ffffffff8107007a>] warn_slowpath_common+0x8a/0xc0
> [ 1991.395268] [<ffffffff810700f6>] warn_slowpath_fmt+0x46/0x50
> [ 1991.398330] [<ffffffffa053290c>] ? intel_logical_ring_begin+0x3c/0x240 [i915]
> [ 1991.401395] [<ffffffff813985bd>] bad_io_access+0x3d/0x40
> [ 1991.404462] [<ffffffff81398763>] iowrite32+0x33/0x40
> [ 1991.407529] [<ffffffffa0533585>] gen8_init_rcs_context+0xd5/0x170 [i915]
> [ 1991.410605] [<ffffffffa0533d17>] intel_lr_context_deferred_create+0x657/0x8e0 [i915]
> [ 1991.413668] [<ffffffffa050eff1>] i915_gem_do_execbuffer.isra.22+0xed1/0xf60 [i915]
> [ 1991.416736] [<ffffffff811c0125>] ? __kmalloc+0x55/0x1b0
> [ 1991.419801] [<ffffffffa051029c>] ? i915_gem_execbuffer2+0x6c/0x2c0 [i915]
> [ 1991.422772] [<ffffffffa05102e1>] i915_gem_execbuffer2+0xb1/0x2c0 [i915]
> [ 1991.425632] [<ffffffffa01b8ab4>] drm_ioctl+0x1a4/0x630 [drm]
> [ 1991.428454] [<ffffffff811258bc>] ? acct_account_cputime+0x1c/0x20
> [ 1991.431255] [<ffffffff811ee378>] do_vfs_ioctl+0x2f8/0x510
> [ 1991.434009] [<ffffffff8109f834>] ? vtime_account_user+0x54/0x60
> [ 1991.436778] [<ffffffff811ee611>] SyS_ioctl+0x81/0xa0
> [ 1991.439553] [<ffffffff81745cb4>] ? int_check_syscall_exit_work+0x34/0x3d
> [ 1991.442306] [<ffffffff81745a2d>] system_call_fastpath+0x16/0x1b
>
> Fix this by setting all the engine fields properly when lrc is freed.
>
> Cc: Thomas Daniel <thomas.daniel@intel.com>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
> drivers/gpu/drm/i915/intel_lrc.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 7670a0f..32684d9 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1777,6 +1777,10 @@ void intel_lr_context_free(struct intel_context *ctx)
> intel_destroy_ringbuffer_obj(ringbuf);
> kfree(ringbuf);
> drm_gem_object_unreference(&ctx_obj->base);
> + WARN_ON(ctx->engine[i].unpin_count != 0);
> + ctx->engine[i].unpin_count = 0;
> + ctx->engine[i].ringbuf = NULL;
> + ctx->engine[i].state = NULL;
> }
> }
> }
Hi,
I don't quite see how this can fix the problem illustrated by the stack
trace above. AFAICS intel_lr_context_free() is called /only/ from
i915_gem_context_free(), which should mean that the refcount on the
intel_context object is already zero, and that it will be freed on
return. So the contents of ctx->engine[] should be irrelevant ...
void i915_gem_context_free(struct kref *ctx_ref)
{
struct intel_context *ctx = container_of(ctx_ref,
typeof(*ctx), ref);
trace_i915_context_free(ctx);
if (i915.enable_execlists)
intel_lr_context_free(ctx);
i915_ppgtt_put(ctx->ppgtt);
if (ctx->legacy_hw_ctx.rcs_state)
drm_gem_object_unreference(&ctx->legacy_hw_ctx.rcs_state->base);
list_del(&ctx->link);
kfree(ctx);
}
... unless something is trying to reuse the context while it is still in
the process of being deleted :(
In addition, the stack trace above implies that ctx->engine[].state WAS
NULL when i915_gem_validate_context() was called, otherwise it would not
have called intel_lr_context_deferred_create()
if (i915.enable_execlists && !ctx->engine[ring->id].state) {
int ret = intel_lr_context_deferred_create(ctx, ring);
if (ret) {
DRM_DEBUG("Could not create LRC %u: %d\n", ctx_id, ret);
return ERR_PTR(ret);
}
}
and likewise that function would not have called gen8_init_rcs_context()
unless this was a new context:
if (ctx == ring->default_context)
lrc_setup_hardware_status_page(ring, ctx_obj);
else if (ring->id == RCS && !ctx->rcs_initialized) {
if (ring->init_context) {
ret = ring->init_context(ring, ctx);
if (ret) {
DRM_ERROR("ring init context: %d\n", ret);
ctx->engine[ring->id].ringbuf = NULL;
ctx->engine[ring->id].state = NULL;
goto error;
}
}
ctx->rcs_initialized = true;
}
Note that rcs_initialized is never cleared, even with your change, so
that in a use-after-free situation we wouldn't end up in this path. So I
think the mystery is how this context ended up in an inconsistent state:
has it been partially freed and then reused, or has some part of the new
context allocation path failed but not been unwound correctly?
And if setting to NULL a pointer that's inside a structure that's in the
process of being freed actually makes a difference, doesn't that mean
there's a use-after-free issue somewhere?
.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2015-01-09 18:49 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-01-08 13:40 [PATCH] drm/i915: cleanup lrc engine state when lrc is freed Mika Kuoppala
2015-01-08 18:22 ` shuang.he
2015-01-09 18:47 ` Dave Gordon [this message]
2015-01-12 16:24 ` Ville Syrjälä
2015-01-20 16:04 ` Mika Kuoppala
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=54B0224D.2050407@intel.com \
--to=david.s.gordon@intel.com \
--cc=intel-gfx@lists.freedesktop.org \
--cc=mika.kuoppala@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.