From: "Teres Alexis, Alan Previn" <alan.previn.teres.alexis@intel.com>
To: John Harrison <john.c.harrison@intel.com>,
<intel-gfx@lists.freedesktop.org>
Subject: Re: [Intel-gfx] [PATCH v2 1/1] drm/i915/reset: Fix error_state_read ptr + offset use
Date: Tue, 8 Mar 2022 11:47:13 -0800 [thread overview]
Message-ID: <dc58f444-12f6-6a33-1be7-f0fa004d4b93@intel.com> (raw)
In-Reply-To: <c4976a17-1254-c893-6501-e6fa6620a855@intel.com>
On 3/1/2022 1:37 PM, John Harrison wrote:
> On 2/25/2022 22:27, Alan Previn wrote:
>> ...
>> This fixes a kernel page fault can happen when
>> multiple tests are running concurrently in a loop
>> and one is producing engine resets and consuming
>> the i915 error_state dump while the other is
>> forcing full GT resets. (takes a while to trigger).
> Does need a fixes tag given that it is fixing a bug in an earlier
> patch. Especially when it is a kernel BUG.
> E.g.:
> 13:23> dim fixes 0e39037b31655
> Fixes: 0e39037b3165 ("drm/i915: Cache the error string")
>
okay, will add that.
diff --git a/drivers/gpu/drm/i915/i915_sysfs.c
b/drivers/gpu/drm/i915/i915_sysfs.c
>> index a4d1759375b9..c40e51298066 100644
>> --- a/drivers/gpu/drm/i915/i915_sysfs.c
>> +++ b/drivers/gpu/drm/i915/i915_sysfs.c
>> @@ -432,7 +432,7 @@ static ssize_t error_state_read(struct file
>> *filp, struct kobject *kobj,
>> struct device *kdev = kobj_to_dev(kobj);
>> struct drm_i915_private *i915 = kdev_minor_to_i915(kdev);
>> struct i915_gpu_coredump *gpu;
>> - ssize_t ret;
>> + ssize_t ret = 0;
>> gpu = i915_first_error_state(i915);
>> if (IS_ERR(gpu)) {
>> @@ -444,8 +444,10 @@ static ssize_t error_state_read(struct file
>> *filp, struct kobject *kobj,
>> const char *str = "No error state collected\n";
>> size_t len = strlen(str);
>> - ret = min_t(size_t, count, len - off);
>> - memcpy(buf, str + off, ret);
>> + if (off < len) {
>> + ret = min_t(size_t, count, len - off);
>> + memcpy(buf, str + off, ret);
>> + }
> So the problem is that the error dump disappeared while it was being
> read? That seems like it cause more problems than just this
> out-of-range access. E.g. what if the dump was freed and a new one
> created? The entity doing the partial reads would end up with half of
> one dump and half of the next.
>
> I think we should at least add a FIXME comment to the code that fast
> recycling dumps could cause inconsistent sysfs reads.
>
> I guess you could do something like add a unique id to the gpu
> coredump structure. Then, when a partial sysfs read occurs starting at
> zero (i.e. a new read), take a note of the id somewhere (e.g. inside
> the i915 structure). When the next non-zero read comes in, compare the
> save id with the current coredump's id and return an error if there is
> a mis-match.
>
> Not sure if this would be viewed as an important enough problem to be
> worth fixing. I'd be happy with just a FIXME comment for now.
For now I shall add a FIXME additional checks might impact IGT's that
currently dump and check the error state. I would assume with that
additional check in kernel, we would need to return a specific error
value like -ENODATA or -ENOLINK and handle it accordingly in the igt.
>
> I would also change the test to be 'if (off)' rather than 'if (off <
> len)'. Technically, the user could have read the first 10 bytes of a
> coredump and then gets "tate collected\n" as the remainder, for
> example. If we ensure that 'off' is zero then we know that this is a
> fresh read from scratch.
>
> John.
>
I'm a little confused, did u mean: in the case we dont have a gpu-state
to report, and then the user offset is zero (i.e. "if (!off)" ) then we
copy the string vs if we dont have a gpu-state to report and the
user-offset is non-zero, then we return an error (like the -ENOLINK?).
If thats what you meant, then it does mean we are assuming that (in the
case we dont have a gpu-state) the user will never come in with a
first-time-read where the user's buffer size of less than the string
length and be expected continue to read the rest of the string-length.
This i guess is okay since even 6 chars are enough to say "No err". :)
>> }
>> return ret;
>
next prev parent reply other threads:[~2022-03-08 19:47 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-02-26 6:27 [Intel-gfx] [PATCH v2 0/1] Fix i915 error_state_read ptr use Alan Previn
2022-02-26 6:27 ` [Intel-gfx] [PATCH v2 1/1] drm/i915/reset: Fix error_state_read ptr + offset use Alan Previn
2022-03-01 21:37 ` John Harrison
2022-03-08 19:47 ` Teres Alexis, Alan Previn [this message]
2022-03-10 1:18 ` John Harrison
2022-03-10 1:45 ` Teres Alexis, Alan Previn
2022-03-11 4:52 ` Teres Alexis, Alan Previn
2022-02-26 8:00 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Fix i915 error_state_read ptr use (rev2) Patchwork
2022-02-26 8:32 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-02-27 4:41 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=dc58f444-12f6-6a33-1be7-f0fa004d4b93@intel.com \
--to=alan.previn.teres.alexis@intel.com \
--cc=intel-gfx@lists.freedesktop.org \
--cc=john.c.harrison@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox