From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Hans de Goede <hdegoede@redhat.com>,
intel-gfx <intel-gfx@lists.freedesktop.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
"Thorsten Leemhuis (regressions address)"
<regressions@leemhuis.info>
Subject: Re: [Intel-gfx] alderlake crashes (random memory corruption?) with 6.0 i915 / ucode related
Date: Mon, 17 Oct 2022 09:17:24 +0100
Message-ID: <4cad6411-86af-dca5-09c7-92a4c5b5f7d3@linux.intel.com>
In-Reply-To: <e4f7b16e-5b6f-1b2c-5f88-fc4a129ae28f@redhat.com>
+ Jani and Ville for the intel_bios.c warn - no idea if that is relevant.
Hi,
On 15/10/2022 15:25, Hans de Goede wrote:
> Hi,
>
> On 10/13/22 22:33, Hans de Goede wrote:
>> Hi All,
>>
>> Yesterday I got a new Lenovo ThinkPad X1 Yoga Gen 7 laptop. Since I plan
>> to make this my new day-to-day laptop, I have copied over the entire
>> rootfs, /home, etc. from my current laptop to avoid having to tweak
>> everything to my liking again.
>>
>> This meant I had an initramfs generated for the other laptop, which should
>> be fine since both are Intel machines and the old 5.19.y initramfs-es
>> worked fine. But 6.0.0 crashed with what seems like random memory
>> corruption (list integrity checks failing) until I regenerated the initrd ...
>>
>> Comparing the old vs regenerated initrds showed no relevant differences,
>> which made me think this is a CPU ucode issue (the microcode is prefixed
>> to the initrd for early microcode loading).
>>
>> After some tests I have the following observations with 6.0.0:
>>
>> 1. The least stable is the old initrd (so with the wrong
>> ucode prefixed); this crashes before ever reaching gdm.
>> I believe that this is caused by late microcode loading
>> kicking in in this case (I thought that was being removed?),
>> and doing late microcode loading on the i7-1260P with its
>> mix of P + E cores seems to seriously mess things up.
>>
>> 2. Slightly more stable, lasting at least a few minutes
>> before crashing, is using dis_ucode_ldr.
>>
>> 3. Using nomodeset seems to stabilize things even with
>> the old initrd with the wrong microcode prefixed
>>
>> 4. 5.19, with an old initrd and with normal modesetting
>> enabled works fine, so in a way this is a 6.0.0 regression
>>
>> 5. Using 6.0 with the new initrd with the new microcode
>> seems mostly stable, although sometimes this seems to
>> hang very early during boot, esp. if a previous boot
>> crashed and I have not run this for a long time yet.
>>
>> 6. After crashes it seems to be necessary to power-cycle
>> the machine to get things back in working condition.
>>
>>
>> With 6.0 the following WARN triggers:
>> drivers/gpu/drm/i915/display/intel_bios.c:477:
>>
>> drm_WARN(&i915->drm, min_size == 0,
>> "Block %d min_size is zero\n", section_id);
>>
>> Since nomodeset helps, this might be quite relevant. In 5.19.13
>> this does not happen, but I'm not sure if 5.19 has this check
>> at all.
>>
>>
>> There is a 2022/10/07 BIOS update available from Lenovo which includes
>> a CPU microcode update. I have not applied this yet in case
>> people want to investigate this further first.
>
> A quick update on this: the microcode being in the initrd or not
> seems to be a bit of a red herring. Yesterday the machine crashed
> twice at boot with 6.0.0 with an initrd which did correctly have
> the alderlake microcode cpio archive prefixed.
>
> Whereas with 5.19 it boots correctly every time. I will try to
> make some time to git bisect this sometime next week. I expect
> this is an i915 issue though, since 6.0.0 with nomodeset on
> the cmdline does seem to boot successfully every time.
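
For anyone wanting to double-check that an initrd really has the early
microcode archive prefixed, here is a minimal Python sketch. It assumes the
prefix is an uncompressed "newc" cpio archive containing
kernel/x86/microcode/GenuineIntel.bin, as described in the kernel's x86
microcode loading documentation. The function names are made up for
illustration and are not part of any existing tool.

```python
HEADER_LEN = 110          # 6-byte magic + 13 fields of 8 hex digits each
MAGIC = b"070701"         # "newc" cpio magic
INTEL_UCODE_PATH = "kernel/x86/microcode/GenuineIntel.bin"


def cpio_newc_names(blob):
    """Return the entry names of a leading uncompressed newc cpio archive.

    Parsing stops at the TRAILER!!! entry or at the first bytes that do
    not look like a newc header (e.g. the compressed main initrd).
    """
    names = []
    off = 0
    while off + HEADER_LEN <= len(blob) and blob[off:off + 6] == MAGIC:
        fields = [
            int(blob[off + 6 + 8 * i:off + 14 + 8 * i].decode("ascii"), 16)
            for i in range(13)
        ]
        filesize, namesize = fields[6], fields[11]
        name_start = off + HEADER_LEN
        # namesize includes the trailing NUL byte
        name = blob[name_start:name_start + namesize - 1].decode("ascii", "replace")
        if name == "TRAILER!!!":
            break
        names.append(name)
        # header+name and the file data are each padded to 4 bytes
        data_start = (name_start + namesize + 3) & ~3
        off = (data_start + filesize + 3) & ~3
    return names


def has_early_intel_ucode(blob):
    """True if the leading cpio archive carries the Intel microcode blob."""
    return INTEL_UCODE_PATH in cpio_newc_names(blob)
```

Usage would be along the lines of
has_early_intel_ucode(open(initrd_path, "rb").read()), where initrd_path is a
placeholder for the image being booted. This only shows that the archive is
present, not that it contains an update matching the CPU.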
Maybe try with KASAN to see if it catches something before random list
corruption starts happening?
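
As a rough sketch, a config fragment along these lines should be enough,
assuming generic KASAN on x86_64. The option names are from the mainline
Kconfig; the exact selection is only a suggestion and not something tested on
this machine.

```config
# Assumed debug-build fragment; merge into the existing .config.
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
CONFIG_KASAN_INLINE=y
# List integrity checks (probably already enabled, given the reported failures)
CONFIG_DEBUG_LIST=y
```

Any KASAN hit would show up in dmesg as a "BUG: KASAN:" report, ideally
pointing at the first bad access rather than at the later list corruption.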
Regards,
Tvrtko
Thread overview: 13+ messages
2022-10-13 20:33 [Intel-gfx] alderlake crashes (random memory corruption?) with 6.0 i915 / ucode related Hans de Goede
2022-10-15 14:25 ` Hans de Goede
2022-10-17 8:17 ` Tvrtko Ursulin [this message]
2022-10-17 8:30 ` Jani Nikula
2022-10-17 8:32 ` Hans de Goede
2022-10-17 8:39 ` Jani Nikula
2022-10-17 10:48 ` Hans de Goede
2022-10-17 11:19 ` Thorsten Leemhuis
2022-10-17 13:14 ` Hans de Goede
2022-10-17 13:35 ` Jani Nikula
2022-10-17 14:32 ` Hans de Goede
2022-10-18 10:32 ` Ville Syrjälä
2022-10-17 11:40 ` Jani Nikula