* REGRESSION WITH BISECT: v6.5-rc6 TPM patch breaks S3 on some Intel systems
@ 2023-08-17 21:09 Todd Brandt
2023-08-17 21:47 ` Jarkko Sakkinen
2023-08-18 0:10 ` Bagas Sanjaya
0 siblings, 2 replies; 16+ messages in thread
From: Todd Brandt @ 2023-08-17 21:09 UTC (permalink / raw)
To: mario.limonciello, linux-integrity
Cc: linux-kernel, len.brown, charles.d.prestopine, rafael.j.wysocki
While testing S3 on 6.5.0-rc6 we've found that 5 systems are seeing a
crash and reboot situation when S3 suspend is initiated. To reproduce
it, this call is all that's required "sudo sleepgraph -m mem -rtcwake
15".
Ive created a Bugzilla to track this issue here:
https://bugzilla.kernel.org/show_bug.cgi?id=217804
I've bisected the issue to this patch:
commit 554b841d470338a3b1d6335b14ee1cd0c8f5d754
Author: Mario Limonciello <mario.limonciello@amd.com>
Date: Wed Aug 2 07:25:33 2023 -0500
tpm: Disable RNG for all AMD fTPMs
The TPM RNG functionality is not necessary for entropy when the CPU
already supports the RDRAND instruction. The TPM RNG functionality
was previously disabled on a subset of AMD fTPM series, but reports
continue to show problems on some systems causing stutter root
caused
to TPM RNG functionality.
Expand disabling TPM RNG use for all AMD fTPMs whether they have
versions
that claim to have fixed or not. To accomplish this, move the
detection
into part of the TPM CRB registration and add a flag indicating
that
the TPM should opt-out of registration to hwrng.
By reverting this patch in 6.5.0-rc6 the problem goes away, so it's
pretty clear that this commit is at fault. I've done further debugging
and I've found that if I simply comment out these lines in 6.5.0-rc6
the problem goes away. So the "crb_check_flags" call is the root cause.
diff --git a/drivers/char/tpm/tpm_crb.c b/drivers/char/tpm/tpm_crb.c
index 9eb1a1859012..20ce8102e6bd 100644
--- a/drivers/char/tpm/tpm_crb.c
+++ b/drivers/char/tpm/tpm_crb.c
@@ -826,9 +826,9 @@ static int crb_acpi_add(struct acpi_device *device)
if (rc)
goto out;
- rc = crb_check_flags(chip);
- if (rc)
- goto out;
+// rc = crb_check_flags(chip);
+// if (rc)
+// goto out;
rc = tpm_chip_register(chip);
^ permalink raw reply related [flat|nested] 16+ messages in thread* Re: REGRESSION WITH BISECT: v6.5-rc6 TPM patch breaks S3 on some Intel systems 2023-08-17 21:09 REGRESSION WITH BISECT: v6.5-rc6 TPM patch breaks S3 on some Intel systems Todd Brandt @ 2023-08-17 21:47 ` Jarkko Sakkinen 2023-08-17 22:25 ` Todd Brandt 2023-08-18 0:10 ` Bagas Sanjaya 1 sibling, 1 reply; 16+ messages in thread From: Jarkko Sakkinen @ 2023-08-17 21:47 UTC (permalink / raw) To: todd.e.brandt, mario.limonciello, linux-integrity Cc: linux-kernel, len.brown, charles.d.prestopine, rafael.j.wysocki On Fri Aug 18, 2023 at 12:09 AM EEST, Todd Brandt wrote: > While testing S3 on 6.5.0-rc6 we've found that 5 systems are seeing a > crash and reboot situation when S3 suspend is initiated. To reproduce > it, this call is all that's required "sudo sleepgraph -m mem -rtcwake > 15". 1. Are there logs available? 2. Is this the test case: https://pypi.org/project/sleepgraph/ (never used it before). I'll see if I can repeat it with QEMU + swtpm. > Ive created a Bugzilla to track this issue here: > https://bugzilla.kernel.org/show_bug.cgi?id=217804 Thank you for reporting this. BR, Jarkko ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: REGRESSION WITH BISECT: v6.5-rc6 TPM patch breaks S3 on some Intel systems 2023-08-17 21:47 ` Jarkko Sakkinen @ 2023-08-17 22:25 ` Todd Brandt 2023-08-17 22:33 ` Jarkko Sakkinen 0 siblings, 1 reply; 16+ messages in thread From: Todd Brandt @ 2023-08-17 22:25 UTC (permalink / raw) To: Jarkko Sakkinen, mario.limonciello, linux-integrity Cc: linux-kernel, len.brown, charles.d.prestopine, rafael.j.wysocki On Fri, 2023-08-18 at 00:47 +0300, Jarkko Sakkinen wrote: > On Fri Aug 18, 2023 at 12:09 AM EEST, Todd Brandt wrote: > > While testing S3 on 6.5.0-rc6 we've found that 5 systems are seeing > > a > > crash and reboot situation when S3 suspend is initiated. To > > reproduce > > it, this call is all that's required "sudo sleepgraph -m mem > > -rtcwake > > 15". > > 1. Are there logs available? > 2. Is this the test case: https://pypi.org/project/sleepgraph/ (never > used it before). There are no dmesg logs because the S3 crash wipes them out. Sleepgraph isn't actually necessary to activate it, just an S3 suspend "echo mem > /sys/power/state". So far it appears to only have affected test systems, not production hardware, and none of them have TPM chips, so I'm beginning to wonder if this patch just inadvertently activated a bug somewhere else in the kernel that happens to affect test hardware. I'll continue to debug it, this isn't an emergency as so far I haven't seen it in production hardware. > > I'll see if I can repeat it with QEMU + swtpm. > > > I’ve created a Bugzilla to track this issue here: > > https://bugzilla.kernel.org/show_bug.cgi?id=217804 > > Thank you for reporting this. > > BR, Jarkko ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: REGRESSION WITH BISECT: v6.5-rc6 TPM patch breaks S3 on some Intel systems 2023-08-17 22:25 ` Todd Brandt @ 2023-08-17 22:33 ` Jarkko Sakkinen 2023-08-18 1:58 ` Limonciello, Mario 2023-08-18 16:51 ` Jarkko Sakkinen 0 siblings, 2 replies; 16+ messages in thread From: Jarkko Sakkinen @ 2023-08-17 22:33 UTC (permalink / raw) To: todd.e.brandt, mario.limonciello, linux-integrity Cc: linux-kernel, len.brown, charles.d.prestopine, rafael.j.wysocki On Fri Aug 18, 2023 at 1:25 AM EEST, Todd Brandt wrote: > On Fri, 2023-08-18 at 00:47 +0300, Jarkko Sakkinen wrote: > > On Fri Aug 18, 2023 at 12:09 AM EEST, Todd Brandt wrote: > > > While testing S3 on 6.5.0-rc6 we've found that 5 systems are seeing > > > a > > > crash and reboot situation when S3 suspend is initiated. To > > > reproduce > > > it, this call is all that's required "sudo sleepgraph -m mem > > > -rtcwake > > > 15". > > > > 1. Are there logs available? > > 2. Is this the test case: https://pypi.org/project/sleepgraph/ (never > > used it before). > > There are no dmesg logs because the S3 crash wipes them out. Sleepgraph > isn't actually necessary to activate it, just an S3 suspend "echo mem > > /sys/power/state". > > So far it appears to only have affected test systems, not production > hardware, and none of them have TPM chips, so I'm beginning to wonder > if this patch just inadvertently activated a bug somewhere else in the > kernel that happens to affect test hardware. > > I'll continue to debug it, this isn't an emergency as so far I haven't > seen it in production hardware. OK, I'll still see if I could reproduce it just in case. BR, Jarkko ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: REGRESSION WITH BISECT: v6.5-rc6 TPM patch breaks S3 on some Intel systems 2023-08-17 22:33 ` Jarkko Sakkinen @ 2023-08-18 1:58 ` Limonciello, Mario 2023-08-18 17:00 ` Jarkko Sakkinen 2023-08-18 16:51 ` Jarkko Sakkinen 1 sibling, 1 reply; 16+ messages in thread From: Limonciello, Mario @ 2023-08-18 1:58 UTC (permalink / raw) To: Jarkko Sakkinen, todd.e.brandt, linux-integrity Cc: linux-kernel, len.brown, charles.d.prestopine, rafael.j.wysocki On 8/17/2023 5:33 PM, Jarkko Sakkinen wrote: > On Fri Aug 18, 2023 at 1:25 AM EEST, Todd Brandt wrote: >> On Fri, 2023-08-18 at 00:47 +0300, Jarkko Sakkinen wrote: >>> On Fri Aug 18, 2023 at 12:09 AM EEST, Todd Brandt wrote: >>>> While testing S3 on 6.5.0-rc6 we've found that 5 systems are seeing >>>> a >>>> crash and reboot situation when S3 suspend is initiated. To >>>> reproduce >>>> it, this call is all that's required "sudo sleepgraph -m mem >>>> -rtcwake >>>> 15". >>> >>> 1. Are there logs available? >>> 2. Is this the test case: https://pypi.org/project/sleepgraph/ (never >>> used it before). >> >> There are no dmesg logs because the S3 crash wipes them out. Sleepgraph >> isn't actually necessary to activate it, just an S3 suspend "echo mem > >> /sys/power/state". >> >> So far it appears to only have affected test systems, not production >> hardware, and none of them have TPM chips, so I'm beginning to wonder >> if this patch just inadvertently activated a bug somewhere else in the >> kernel that happens to affect test hardware. >> >> I'll continue to debug it, this isn't an emergency as so far I haven't >> seen it in production hardware. > > OK, I'll still see if I could reproduce it just in case. > > BR, Jarkko I'd like to better understand what kind of TPM initialization path has run. Does the machine have some sort of TPM that failed to fully initialize perhaps? If you can't share a full bootup dmesg, can you at least share # dmesg | grep -i tpm I think it would be really good to try the following: 0) Start with a machine without this patch in place 1) At bootup run this command: # tpm2_getcap properties-fixed 2) Suspend machine 3) Repeat that command after suspend 4) Reboot with that patch and repeat steps 1 and 2. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: REGRESSION WITH BISECT: v6.5-rc6 TPM patch breaks S3 on some Intel systems 2023-08-18 1:58 ` Limonciello, Mario @ 2023-08-18 17:00 ` Jarkko Sakkinen 2023-08-18 17:21 ` Mario Limonciello 0 siblings, 1 reply; 16+ messages in thread From: Jarkko Sakkinen @ 2023-08-18 17:00 UTC (permalink / raw) To: Limonciello, Mario, todd.e.brandt, linux-integrity Cc: linux-kernel, len.brown, charles.d.prestopine, rafael.j.wysocki On Fri Aug 18, 2023 at 4:58 AM EEST, Limonciello, Mario wrote: > > > On 8/17/2023 5:33 PM, Jarkko Sakkinen wrote: > > On Fri Aug 18, 2023 at 1:25 AM EEST, Todd Brandt wrote: > >> On Fri, 2023-08-18 at 00:47 +0300, Jarkko Sakkinen wrote: > >>> On Fri Aug 18, 2023 at 12:09 AM EEST, Todd Brandt wrote: > >>>> While testing S3 on 6.5.0-rc6 we've found that 5 systems are seeing > >>>> a > >>>> crash and reboot situation when S3 suspend is initiated. To > >>>> reproduce > >>>> it, this call is all that's required "sudo sleepgraph -m mem > >>>> -rtcwake > >>>> 15". > >>> > >>> 1. Are there logs available? > >>> 2. Is this the test case: https://pypi.org/project/sleepgraph/ (never > >>> used it before). > >> > >> There are no dmesg logs because the S3 crash wipes them out. Sleepgraph > >> isn't actually necessary to activate it, just an S3 suspend "echo mem > > >> /sys/power/state". > >> > >> So far it appears to only have affected test systems, not production > >> hardware, and none of them have TPM chips, so I'm beginning to wonder > >> if this patch just inadvertently activated a bug somewhere else in the > >> kernel that happens to affect test hardware. > >> > >> I'll continue to debug it, this isn't an emergency as so far I haven't > >> seen it in production hardware. > > > > OK, I'll still see if I could reproduce it just in case. > > > > BR, Jarkko > > I'd like to better understand what kind of TPM initialization path has > run. Does the machine have some sort of TPM that failed to fully > initialize perhaps? > > If you can't share a full bootup dmesg, can you at least share > > # dmesg | grep -i tpm It would be more useful perhaps to get full dmesg output after power on and before going into suspend. Also ftrace filter could be added to the kernel command-line: ftrace=function ftrace_filter=tpm* After bootup: mount -t tracefs nodev /sys/kernel/tracing cat /sys/kernel/tracing/trace BR, Jarkko ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: REGRESSION WITH BISECT: v6.5-rc6 TPM patch breaks S3 on some Intel systems 2023-08-18 17:00 ` Jarkko Sakkinen @ 2023-08-18 17:21 ` Mario Limonciello [not found] ` <CUVV2MQRCGET.2U22LFQPX1J3G@suppilovahvero> 0 siblings, 1 reply; 16+ messages in thread From: Mario Limonciello @ 2023-08-18 17:21 UTC (permalink / raw) To: Jarkko Sakkinen, todd.e.brandt, linux-integrity Cc: linux-kernel, len.brown, charles.d.prestopine, rafael.j.wysocki On 8/18/2023 12:00, Jarkko Sakkinen wrote: > On Fri Aug 18, 2023 at 4:58 AM EEST, Limonciello, Mario wrote: >> >> >> On 8/17/2023 5:33 PM, Jarkko Sakkinen wrote: >>> On Fri Aug 18, 2023 at 1:25 AM EEST, Todd Brandt wrote: >>>> On Fri, 2023-08-18 at 00:47 +0300, Jarkko Sakkinen wrote: >>>>> On Fri Aug 18, 2023 at 12:09 AM EEST, Todd Brandt wrote: >>>>>> While testing S3 on 6.5.0-rc6 we've found that 5 systems are seeing >>>>>> a >>>>>> crash and reboot situation when S3 suspend is initiated. To >>>>>> reproduce >>>>>> it, this call is all that's required "sudo sleepgraph -m mem >>>>>> -rtcwake >>>>>> 15". >>>>> >>>>> 1. Are there logs available? >>>>> 2. Is this the test case: https://pypi.org/project/sleepgraph/ (never >>>>> used it before). >>>> >>>> There are no dmesg logs because the S3 crash wipes them out. Sleepgraph >>>> isn't actually necessary to activate it, just an S3 suspend "echo mem > >>>> /sys/power/state". >>>> >>>> So far it appears to only have affected test systems, not production >>>> hardware, and none of them have TPM chips, so I'm beginning to wonder >>>> if this patch just inadvertently activated a bug somewhere else in the >>>> kernel that happens to affect test hardware. >>>> >>>> I'll continue to debug it, this isn't an emergency as so far I haven't >>>> seen it in production hardware. >>> >>> OK, I'll still see if I could reproduce it just in case. >>> >>> BR, Jarkko >> >> I'd like to better understand what kind of TPM initialization path has >> run. Does the machine have some sort of TPM that failed to fully >> initialize perhaps? >> >> If you can't share a full bootup dmesg, can you at least share >> >> # dmesg | grep -i tpm > > It would be more useful perhaps to get full dmesg output after power on > and before going into suspend. > > Also ftrace filter could be added to the kernel command-line: > > ftrace=function ftrace_filter=tpm* > > After bootup: > > mount -t tracefs nodev /sys/kernel/tracing > cat /sys/kernel/tracing/trace > > BR, Jarkko Todd and I have gone back and forth a little bit on the bugzilla (https://bugzilla.kernel.org/show_bug.cgi?id=217804), and it seems that this isn't an S3 problem - it's a probing problem. [ 1.132521] tpm_crb: probe of INTC6001:00 failed with error 378 That error 378 specifically matches TPM2_CC_GET_CAPABILITY, which is the same command that was being requested. This leads me to believe the TPM isn't ready at the time of probing. In this case one solution is we could potentially ignore failures for that tpm2_get_tpm_pt() call, but I think we should first understand why it doesn't work at probing time for this TPM to ensure the actual quirk isn't built on a house of cards. ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <CUVV2MQRCGET.2U22LFQPX1J3G@suppilovahvero>]
* Re: REGRESSION WITH BISECT: v6.5-rc6 TPM patch breaks S3 on some Intel systems [not found] ` <CUVV2MQRCGET.2U22LFQPX1J3G@suppilovahvero> @ 2023-08-18 17:57 ` Mario Limonciello 2023-08-18 18:07 ` Jarkko Sakkinen 0 siblings, 1 reply; 16+ messages in thread From: Mario Limonciello @ 2023-08-18 17:57 UTC (permalink / raw) To: Jarkko Sakkinen, todd.e.brandt, linux-integrity Cc: linux-kernel, len.brown, charles.d.prestopine, rafael.j.wysocki On 8/18/2023 12:53, Jarkko Sakkinen wrote: > On Fri Aug 18, 2023 at 8:21 PM EEST, Mario Limonciello wrote: >> On 8/18/2023 12:00, Jarkko Sakkinen wrote: >>> On Fri Aug 18, 2023 at 4:58 AM EEST, Limonciello, Mario wrote: >>>> >>>> >>>> On 8/17/2023 5:33 PM, Jarkko Sakkinen wrote: >>>>> On Fri Aug 18, 2023 at 1:25 AM EEST, Todd Brandt wrote: >>>>>> On Fri, 2023-08-18 at 00:47 +0300, Jarkko Sakkinen wrote: >>>>>>> On Fri Aug 18, 2023 at 12:09 AM EEST, Todd Brandt wrote: >>>>>>>> While testing S3 on 6.5.0-rc6 we've found that 5 systems are seeing >>>>>>>> a >>>>>>>> crash and reboot situation when S3 suspend is initiated. To >>>>>>>> reproduce >>>>>>>> it, this call is all that's required "sudo sleepgraph -m mem >>>>>>>> -rtcwake >>>>>>>> 15". >>>>>>> >>>>>>> 1. Are there logs available? >>>>>>> 2. Is this the test case: https://pypi.org/project/sleepgraph/ (never >>>>>>> used it before). >>>>>> >>>>>> There are no dmesg logs because the S3 crash wipes them out. Sleepgraph >>>>>> isn't actually necessary to activate it, just an S3 suspend "echo mem > >>>>>> /sys/power/state". >>>>>> >>>>>> So far it appears to only have affected test systems, not production >>>>>> hardware, and none of them have TPM chips, so I'm beginning to wonder >>>>>> if this patch just inadvertently activated a bug somewhere else in the >>>>>> kernel that happens to affect test hardware. >>>>>> >>>>>> I'll continue to debug it, this isn't an emergency as so far I haven't >>>>>> seen it in production hardware. >>>>> >>>>> OK, I'll still see if I could reproduce it just in case. >>>>> >>>>> BR, Jarkko >>>> >>>> I'd like to better understand what kind of TPM initialization path has >>>> run. Does the machine have some sort of TPM that failed to fully >>>> initialize perhaps? >>>> >>>> If you can't share a full bootup dmesg, can you at least share >>>> >>>> # dmesg | grep -i tpm >>> >>> It would be more useful perhaps to get full dmesg output after power on >>> and before going into suspend. >>> >>> Also ftrace filter could be added to the kernel command-line: >>> >>> ftrace=function ftrace_filter=tpm* >>> >>> After bootup: >>> >>> mount -t tracefs nodev /sys/kernel/tracing >>> cat /sys/kernel/tracing/trace >>> >>> BR, Jarkko >> >> Todd and I have gone back and forth a little bit on the bugzilla >> (https://bugzilla.kernel.org/show_bug.cgi?id=217804), and it seems that >> this isn't an S3 problem - it's a probing problem. >> >> [ 1.132521] tpm_crb: probe of INTC6001:00 failed with error 378 >> >> That error 378 specifically matches TPM2_CC_GET_CAPABILITY, which is the >> same command that was being requested. This leads me to believe the TPM >> isn't ready at the time of probing. >> >> In this case one solution is we could potentially ignore failures for >> that tpm2_get_tpm_pt() call, but I think we should first understand why >> it doesn't work at probing time for this TPM to ensure the actual quirk >> isn't built on a house of cards. > > Given that there is nothing known broken (at the moment) in production, > I think the following might be a reasonable change. > > BR, Jarkko > Yeah that would prevent it. Here's a simpler change that I think should work too though: diff --git a/drivers/char/tpm/tpm_crb.c b/drivers/char/tpm/tpm_crb.c index 9eb1a18590123..b0e9931fe436c 100644 --- a/drivers/char/tpm/tpm_crb.c +++ b/drivers/char/tpm/tpm_crb.c @@ -472,8 +472,7 @@ static int crb_check_flags(struct tpm_chip *chip) if (ret) return ret; - ret = tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, &val, NULL); - if (ret) + if (tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, &val, NULL)) goto release; if (val == 0x414D4400U /* AMD */) I think Todd needs to check whether TPM works with that or not though. ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: REGRESSION WITH BISECT: v6.5-rc6 TPM patch breaks S3 on some Intel systems 2023-08-18 17:57 ` Mario Limonciello @ 2023-08-18 18:07 ` Jarkko Sakkinen 2023-08-18 18:11 ` Mario Limonciello 0 siblings, 1 reply; 16+ messages in thread From: Jarkko Sakkinen @ 2023-08-18 18:07 UTC (permalink / raw) To: Mario Limonciello, todd.e.brandt, linux-integrity Cc: linux-kernel, len.brown, charles.d.prestopine, rafael.j.wysocki On Fri Aug 18, 2023 at 8:57 PM EEST, Mario Limonciello wrote: > On 8/18/2023 12:53, Jarkko Sakkinen wrote: > > On Fri Aug 18, 2023 at 8:21 PM EEST, Mario Limonciello wrote: > >> On 8/18/2023 12:00, Jarkko Sakkinen wrote: > >>> On Fri Aug 18, 2023 at 4:58 AM EEST, Limonciello, Mario wrote: > >>>> > >>>> > >>>> On 8/17/2023 5:33 PM, Jarkko Sakkinen wrote: > >>>>> On Fri Aug 18, 2023 at 1:25 AM EEST, Todd Brandt wrote: > >>>>>> On Fri, 2023-08-18 at 00:47 +0300, Jarkko Sakkinen wrote: > >>>>>>> On Fri Aug 18, 2023 at 12:09 AM EEST, Todd Brandt wrote: > >>>>>>>> While testing S3 on 6.5.0-rc6 we've found that 5 systems are seeing > >>>>>>>> a > >>>>>>>> crash and reboot situation when S3 suspend is initiated. To > >>>>>>>> reproduce > >>>>>>>> it, this call is all that's required "sudo sleepgraph -m mem > >>>>>>>> -rtcwake > >>>>>>>> 15". > >>>>>>> > >>>>>>> 1. Are there logs available? > >>>>>>> 2. Is this the test case: https://pypi.org/project/sleepgraph/ (never > >>>>>>> used it before). > >>>>>> > >>>>>> There are no dmesg logs because the S3 crash wipes them out. Sleepgraph > >>>>>> isn't actually necessary to activate it, just an S3 suspend "echo mem > > >>>>>> /sys/power/state". > >>>>>> > >>>>>> So far it appears to only have affected test systems, not production > >>>>>> hardware, and none of them have TPM chips, so I'm beginning to wonder > >>>>>> if this patch just inadvertently activated a bug somewhere else in the > >>>>>> kernel that happens to affect test hardware. > >>>>>> > >>>>>> I'll continue to debug it, this isn't an emergency as so far I haven't > >>>>>> seen it in production hardware. > >>>>> > >>>>> OK, I'll still see if I could reproduce it just in case. > >>>>> > >>>>> BR, Jarkko > >>>> > >>>> I'd like to better understand what kind of TPM initialization path has > >>>> run. Does the machine have some sort of TPM that failed to fully > >>>> initialize perhaps? > >>>> > >>>> If you can't share a full bootup dmesg, can you at least share > >>>> > >>>> # dmesg | grep -i tpm > >>> > >>> It would be more useful perhaps to get full dmesg output after power on > >>> and before going into suspend. > >>> > >>> Also ftrace filter could be added to the kernel command-line: > >>> > >>> ftrace=function ftrace_filter=tpm* > >>> > >>> After bootup: > >>> > >>> mount -t tracefs nodev /sys/kernel/tracing > >>> cat /sys/kernel/tracing/trace > >>> > >>> BR, Jarkko > >> > >> Todd and I have gone back and forth a little bit on the bugzilla > >> (https://bugzilla.kernel.org/show_bug.cgi?id=217804), and it seems that > >> this isn't an S3 problem - it's a probing problem. > >> > >> [ 1.132521] tpm_crb: probe of INTC6001:00 failed with error 378 > >> > >> That error 378 specifically matches TPM2_CC_GET_CAPABILITY, which is the > >> same command that was being requested. This leads me to believe the TPM > >> isn't ready at the time of probing. > >> > >> In this case one solution is we could potentially ignore failures for > >> that tpm2_get_tpm_pt() call, but I think we should first understand why > >> it doesn't work at probing time for this TPM to ensure the actual quirk > >> isn't built on a house of cards. > > > > Given that there is nothing known broken (at the moment) in production, > > I think the following might be a reasonable change. > > > > BR, Jarkko > > > > Yeah that would prevent it. > > Here's a simpler change that I think should work too though: > diff --git a/drivers/char/tpm/tpm_crb.c b/drivers/char/tpm/tpm_crb.c > index 9eb1a18590123..b0e9931fe436c 100644 > --- a/drivers/char/tpm/tpm_crb.c > +++ b/drivers/char/tpm/tpm_crb.c > @@ -472,8 +472,7 @@ static int crb_check_flags(struct tpm_chip *chip) > if (ret) > return ret; > > - ret = tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, &val, NULL); > - if (ret) > + if (tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, &val, NULL)) > goto release; > > if (val == 0x414D4400U /* AMD */) > > I think Todd needs to check whether TPM works with that or not though. Hmm... I'm sorry if I have a blind spot now but what is that changing? BR, Jarkko ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: REGRESSION WITH BISECT: v6.5-rc6 TPM patch breaks S3 on some Intel systems 2023-08-18 18:07 ` Jarkko Sakkinen @ 2023-08-18 18:11 ` Mario Limonciello 2023-08-18 20:08 ` Todd Brandt 2023-08-18 21:39 ` Todd Brandt 0 siblings, 2 replies; 16+ messages in thread From: Mario Limonciello @ 2023-08-18 18:11 UTC (permalink / raw) To: Jarkko Sakkinen, todd.e.brandt, linux-integrity Cc: linux-kernel, len.brown, charles.d.prestopine, rafael.j.wysocki On 8/18/2023 13:07, Jarkko Sakkinen wrote: > On Fri Aug 18, 2023 at 8:57 PM EEST, Mario Limonciello wrote: >> On 8/18/2023 12:53, Jarkko Sakkinen wrote: >>> On Fri Aug 18, 2023 at 8:21 PM EEST, Mario Limonciello wrote: >>>> On 8/18/2023 12:00, Jarkko Sakkinen wrote: >>>>> On Fri Aug 18, 2023 at 4:58 AM EEST, Limonciello, Mario wrote: >>>>>> >>>>>> >>>>>> On 8/17/2023 5:33 PM, Jarkko Sakkinen wrote: >>>>>>> On Fri Aug 18, 2023 at 1:25 AM EEST, Todd Brandt wrote: >>>>>>>> On Fri, 2023-08-18 at 00:47 +0300, Jarkko Sakkinen wrote: >>>>>>>>> On Fri Aug 18, 2023 at 12:09 AM EEST, Todd Brandt wrote: >>>>>>>>>> While testing S3 on 6.5.0-rc6 we've found that 5 systems are seeing >>>>>>>>>> a >>>>>>>>>> crash and reboot situation when S3 suspend is initiated. To >>>>>>>>>> reproduce >>>>>>>>>> it, this call is all that's required "sudo sleepgraph -m mem >>>>>>>>>> -rtcwake >>>>>>>>>> 15". >>>>>>>>> >>>>>>>>> 1. Are there logs available? >>>>>>>>> 2. Is this the test case: https://pypi.org/project/sleepgraph/ (never >>>>>>>>> used it before). >>>>>>>> >>>>>>>> There are no dmesg logs because the S3 crash wipes them out. Sleepgraph >>>>>>>> isn't actually necessary to activate it, just an S3 suspend "echo mem > >>>>>>>> /sys/power/state". >>>>>>>> >>>>>>>> So far it appears to only have affected test systems, not production >>>>>>>> hardware, and none of them have TPM chips, so I'm beginning to wonder >>>>>>>> if this patch just inadvertently activated a bug somewhere else in the >>>>>>>> kernel that happens to affect test hardware. >>>>>>>> >>>>>>>> I'll continue to debug it, this isn't an emergency as so far I haven't >>>>>>>> seen it in production hardware. >>>>>>> >>>>>>> OK, I'll still see if I could reproduce it just in case. >>>>>>> >>>>>>> BR, Jarkko >>>>>> >>>>>> I'd like to better understand what kind of TPM initialization path has >>>>>> run. Does the machine have some sort of TPM that failed to fully >>>>>> initialize perhaps? >>>>>> >>>>>> If you can't share a full bootup dmesg, can you at least share >>>>>> >>>>>> # dmesg | grep -i tpm >>>>> >>>>> It would be more useful perhaps to get full dmesg output after power on >>>>> and before going into suspend. >>>>> >>>>> Also ftrace filter could be added to the kernel command-line: >>>>> >>>>> ftrace=function ftrace_filter=tpm* >>>>> >>>>> After bootup: >>>>> >>>>> mount -t tracefs nodev /sys/kernel/tracing >>>>> cat /sys/kernel/tracing/trace >>>>> >>>>> BR, Jarkko >>>> >>>> Todd and I have gone back and forth a little bit on the bugzilla >>>> (https://bugzilla.kernel.org/show_bug.cgi?id=217804), and it seems that >>>> this isn't an S3 problem - it's a probing problem. >>>> >>>> [ 1.132521] tpm_crb: probe of INTC6001:00 failed with error 378 >>>> >>>> That error 378 specifically matches TPM2_CC_GET_CAPABILITY, which is the >>>> same command that was being requested. This leads me to believe the TPM >>>> isn't ready at the time of probing. >>>> >>>> In this case one solution is we could potentially ignore failures for >>>> that tpm2_get_tpm_pt() call, but I think we should first understand why >>>> it doesn't work at probing time for this TPM to ensure the actual quirk >>>> isn't built on a house of cards. >>> >>> Given that there is nothing known broken (at the moment) in production, >>> I think the following might be a reasonable change. >>> >>> BR, Jarkko >>> >> >> Yeah that would prevent it. >> >> Here's a simpler change that I think should work too though: >> diff --git a/drivers/char/tpm/tpm_crb.c b/drivers/char/tpm/tpm_crb.c >> index 9eb1a18590123..b0e9931fe436c 100644 >> --- a/drivers/char/tpm/tpm_crb.c >> +++ b/drivers/char/tpm/tpm_crb.c >> @@ -472,8 +472,7 @@ static int crb_check_flags(struct tpm_chip *chip) >> if (ret) >> return ret; >> >> - ret = tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, &val, NULL); >> - if (ret) >> + if (tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, &val, NULL)) >> goto release; >> >> if (val == 0x414D4400U /* AMD */) >> >> I think Todd needs to check whether TPM works with that or not though. > > Hmm... I'm sorry if I have a blind spot now but what is that changing? > > BR, Jarkko It throws away the error code if it fails for some reason. Todd just checked it works too. I'll drop it on the M/L for review. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: REGRESSION WITH BISECT: v6.5-rc6 TPM patch breaks S3 on some Intel systems 2023-08-18 18:11 ` Mario Limonciello @ 2023-08-18 20:08 ` Todd Brandt 2023-08-18 21:39 ` Todd Brandt 1 sibling, 0 replies; 16+ messages in thread From: Todd Brandt @ 2023-08-18 20:08 UTC (permalink / raw) To: Mario Limonciello, Jarkko Sakkinen, linux-integrity Cc: linux-kernel, len.brown, charles.d.prestopine, rafael.j.wysocki On Fri, 2023-08-18 at 13:11 -0500, Mario Limonciello wrote: > On 8/18/2023 13:07, Jarkko Sakkinen wrote: > > On Fri Aug 18, 2023 at 8:57 PM EEST, Mario Limonciello wrote: > > > On 8/18/2023 12:53, Jarkko Sakkinen wrote: > > > > On Fri Aug 18, 2023 at 8:21 PM EEST, Mario Limonciello wrote: > > > > > On 8/18/2023 12:00, Jarkko Sakkinen wrote: > > > > > > On Fri Aug 18, 2023 at 4:58 AM EEST, Limonciello, Mario > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > On 8/17/2023 5:33 PM, Jarkko Sakkinen wrote: > > > > > > > > On Fri Aug 18, 2023 at 1:25 AM EEST, Todd Brandt wrote: > > > > > > > > > On Fri, 2023-08-18 at 00:47 +0300, Jarkko Sakkinen > > > > > > > > > wrote: > > > > > > > > > > On Fri Aug 18, 2023 at 12:09 AM EEST, Todd Brandt > > > > > > > > > > wrote: > > > > > > > > > > > While testing S3 on 6.5.0-rc6 we've found that 5 > > > > > > > > > > > systems are seeing > > > > > > > > > > > a > > > > > > > > > > > crash and reboot situation when S3 suspend is > > > > > > > > > > > initiated. To > > > > > > > > > > > reproduce > > > > > > > > > > > it, this call is all that's required "sudo > > > > > > > > > > > sleepgraph -m mem > > > > > > > > > > > -rtcwake > > > > > > > > > > > 15". > > > > > > > > > > > > > > > > > > > > 1. Are there logs available? > > > > > > > > > > 2. Is this the test case: > > > > > > > > > > https://pypi.org/project/sleepgraph/ (never > > > > > > > > > > used it before). > > > > > > > > > > > > > > > > > > There are no dmesg logs because the S3 crash wipes > > > > > > > > > them out. Sleepgraph > > > > > > > > > isn't actually necessary to activate it, just an S3 > > > > > > > > > suspend "echo mem > > > > > > > > > > /sys/power/state". > > > > > > > > > > > > > > > > > > So far it appears to only have affected test systems, > > > > > > > > > not production > > > > > > > > > hardware, and none of them have TPM chips, so I'm > > > > > > > > > beginning to wonder > > > > > > > > > if this patch just inadvertently activated a bug > > > > > > > > > somewhere else in the > > > > > > > > > kernel that happens to affect test hardware. > > > > > > > > > > > > > > > > > > I'll continue to debug it, this isn't an emergency as > > > > > > > > > so far I haven't > > > > > > > > > seen it in production hardware. > > > > > > > > > > > > > > > > OK, I'll still see if I could reproduce it just in > > > > > > > > case. > > > > > > > > > > > > > > > > BR, Jarkko > > > > > > > > > > > > > > I'd like to better understand what kind of TPM > > > > > > > initialization path has > > > > > > > run. Does the machine have some sort of TPM that failed > > > > > > > to fully > > > > > > > initialize perhaps? > > > > > > > > > > > > > > If you can't share a full bootup dmesg, can you at least > > > > > > > share > > > > > > > > > > > > > > # dmesg | grep -i tpm > > > > > > > > > > > > It would be more useful perhaps to get full dmesg output > > > > > > after power on > > > > > > and before going into suspend. > > > > > > > > > > > > Also ftrace filter could be added to the kernel command- > > > > > > line: > > > > > > > > > > > > ftrace=function ftrace_filter=tpm* > > > > > > > > > > > > After bootup: > > > > > > > > > > > > mount -t tracefs nodev /sys/kernel/tracing > > > > > > cat /sys/kernel/tracing/trace > > > > > > > > > > > > BR, Jarkko > > > > > > > > > > Todd and I have gone back and forth a little bit on the > > > > > bugzilla > > > > > (https://bugzilla.kernel.org/show_bug.cgi?id=217804), and it > > > > > seems that > > > > > this isn't an S3 problem - it's a probing problem. > > > > > > > > > > [ 1.132521] tpm_crb: probe of INTC6001:00 failed with > > > > > error 378 > > > > > > > > > > That error 378 specifically matches TPM2_CC_GET_CAPABILITY, > > > > > which is the > > > > > same command that was being requested. This leads me to > > > > > believe the TPM > > > > > isn't ready at the time of probing. > > > > > > > > > > In this case one solution is we could potentially ignore > > > > > failures for > > > > > that tpm2_get_tpm_pt() call, but I think we should first > > > > > understand why > > > > > it doesn't work at probing time for this TPM to ensure the > > > > > actual quirk > > > > > isn't built on a house of cards. > > > > > > > > Given that there is nothing known broken (at the moment) in > > > > production, > > > > I think the following might be a reasonable change. > > > > > > > > BR, Jarkko > > > > > > > > > > Yeah that would prevent it. > > > > > > Here's a simpler change that I think should work too though: > > > diff --git a/drivers/char/tpm/tpm_crb.c > > > b/drivers/char/tpm/tpm_crb.c > > > index 9eb1a18590123..b0e9931fe436c 100644 > > > --- a/drivers/char/tpm/tpm_crb.c > > > +++ b/drivers/char/tpm/tpm_crb.c > > > @@ -472,8 +472,7 @@ static int crb_check_flags(struct tpm_chip > > > *chip) > > > if (ret) > > > return ret; > > > > > > - ret = tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, &val, > > > NULL); > > > - if (ret) > > > + if (tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, &val, > > > NULL)) > > > goto release; > > > > > > if (val == 0x414D4400U /* AMD */) > > > > > > I think Todd needs to check whether TPM works with that or not > > > though. > > > > Hmm... I'm sorry if I have a blind spot now but what is that > > changing? > > > > BR, Jarkko > > It throws away the error code if it fails for some reason. > Todd just checked it works too. I'll drop it on the M/L for review. Yea, instead of returning the error code, it just goes to "release" with the ret value uninitialized. So the issue is apparently that Intel TPM sometimes does not properly returning the manufacturer code in early boot. This patch just inadvertently triggered the real issue. We'll investigate this to see if there's a solution, but for now this patch works great. Thanks Mario and Jarkko! ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: REGRESSION WITH BISECT: v6.5-rc6 TPM patch breaks S3 on some Intel systems 2023-08-18 18:11 ` Mario Limonciello 2023-08-18 20:08 ` Todd Brandt @ 2023-08-18 21:39 ` Todd Brandt 2023-09-02 2:47 ` Tyler Stachecki 1 sibling, 1 reply; 16+ messages in thread From: Todd Brandt @ 2023-08-18 21:39 UTC (permalink / raw) To: Mario Limonciello, Jarkko Sakkinen, linux-integrity Cc: linux-kernel, len.brown, charles.d.prestopine, rafael.j.wysocki On Fri, 2023-08-18 at 13:11 -0500, Mario Limonciello wrote: > On 8/18/2023 13:07, Jarkko Sakkinen wrote: > > On Fri Aug 18, 2023 at 8:57 PM EEST, Mario Limonciello wrote: > > > On 8/18/2023 12:53, Jarkko Sakkinen wrote: > > > > On Fri Aug 18, 2023 at 8:21 PM EEST, Mario Limonciello wrote: > > > > > On 8/18/2023 12:00, Jarkko Sakkinen wrote: > > > > > > On Fri Aug 18, 2023 at 4:58 AM EEST, Limonciello, Mario > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > On 8/17/2023 5:33 PM, Jarkko Sakkinen wrote: > > > > > > > > On Fri Aug 18, 2023 at 1:25 AM EEST, Todd Brandt wrote: > > > > > > > > > On Fri, 2023-08-18 at 00:47 +0300, Jarkko Sakkinen > > > > > > > > > wrote: > > > > > > > > > > On Fri Aug 18, 2023 at 12:09 AM EEST, Todd Brandt > > > > > > > > > > wrote: > > > > > > > > > > > While testing S3 on 6.5.0-rc6 we've found that 5 > > > > > > > > > > > systems are seeing > > > > > > > > > > > a > > > > > > > > > > > crash and reboot situation when S3 suspend is > > > > > > > > > > > initiated. To > > > > > > > > > > > reproduce > > > > > > > > > > > it, this call is all that's required "sudo > > > > > > > > > > > sleepgraph -m mem > > > > > > > > > > > -rtcwake > > > > > > > > > > > 15". > > > > > > > > > > > > > > > > > > > > 1. Are there logs available? > > > > > > > > > > 2. Is this the test case: > > > > > > > > > > https://pypi.org/project/sleepgraph/ (never > > > > > > > > > > used it before). > > > > > > > > > > > > > > > > > > There are no dmesg logs because the S3 crash wipes > > > > > > > > > them out. Sleepgraph > > > > > > > > > isn't actually necessary to activate it, just an S3 > > > > > > > > > suspend "echo mem > > > > > > > > > > /sys/power/state". > > > > > > > > > > > > > > > > > > So far it appears to only have affected test systems, > > > > > > > > > not production > > > > > > > > > hardware, and none of them have TPM chips, so I'm > > > > > > > > > beginning to wonder > > > > > > > > > if this patch just inadvertently activated a bug > > > > > > > > > somewhere else in the > > > > > > > > > kernel that happens to affect test hardware. > > > > > > > > > > > > > > > > > > I'll continue to debug it, this isn't an emergency as > > > > > > > > > so far I haven't > > > > > > > > > seen it in production hardware. > > > > > > > > > > > > > > > > OK, I'll still see if I could reproduce it just in > > > > > > > > case. > > > > > > > > > > > > > > > > BR, Jarkko > > > > > > > > > > > > > > I'd like to better understand what kind of TPM > > > > > > > initialization path has > > > > > > > run. Does the machine have some sort of TPM that failed > > > > > > > to fully > > > > > > > initialize perhaps? > > > > > > > > > > > > > > If you can't share a full bootup dmesg, can you at least > > > > > > > share > > > > > > > > > > > > > > # dmesg | grep -i tpm > > > > > > > > > > > > It would be more useful perhaps to get full dmesg output > > > > > > after power on > > > > > > and before going into suspend. > > > > > > > > > > > > Also ftrace filter could be added to the kernel command- > > > > > > line: > > > > > > > > > > > > ftrace=function ftrace_filter=tpm* > > > > > > > > > > > > After bootup: > > > > > > > > > > > > mount -t tracefs nodev /sys/kernel/tracing > > > > > > cat /sys/kernel/tracing/trace > > > > > > > > > > > > BR, Jarkko > > > > > > > > > > Todd and I have gone back and forth a little bit on the > > > > > bugzilla > > > > > (https://bugzilla.kernel.org/show_bug.cgi?id=217804), and it > > > > > seems that > > > > > this isn't an S3 problem - it's a probing problem. > > > > > > > > > > [ 1.132521] tpm_crb: probe of INTC6001:00 failed with > > > > > error 378 > > > > > > > > > > That error 378 specifically matches TPM2_CC_GET_CAPABILITY, > > > > > which is the > > > > > same command that was being requested. This leads me to > > > > > believe the TPM > > > > > isn't ready at the time of probing. > > > > > > > > > > In this case one solution is we could potentially ignore > > > > > failures for > > > > > that tpm2_get_tpm_pt() call, but I think we should first > > > > > understand why > > > > > it doesn't work at probing time for this TPM to ensure the > > > > > actual quirk > > > > > isn't built on a house of cards. > > > > > > > > Given that there is nothing known broken (at the moment) in > > > > production, > > > > I think the following might be a reasonable change. > > > > > > > > BR, Jarkko > > > > > > > > > > Yeah that would prevent it. > > > > > > Here's a simpler change that I think should work too though: > > > diff --git a/drivers/char/tpm/tpm_crb.c > > > b/drivers/char/tpm/tpm_crb.c > > > index 9eb1a18590123..b0e9931fe436c 100644 > > > --- a/drivers/char/tpm/tpm_crb.c > > > +++ b/drivers/char/tpm/tpm_crb.c > > > @@ -472,8 +472,7 @@ static int crb_check_flags(struct tpm_chip > > > *chip) > > > if (ret) > > > return ret; > > > > > > - ret = tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, &val, > > > NULL); > > > - if (ret) > > > + if (tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, &val, > > > NULL)) > > > goto release; > > > > > > if (val == 0x414D4400U /* AMD */) > > > > > > I think Todd needs to check whether TPM works with that or not > > > though. > > > > Hmm... I'm sorry if I have a blind spot now but what is that > > changing? > > > > BR, Jarkko > > It throws away the error code if it fails for some reason. > Todd just checked it works too. I'll drop it on the M/L for review. I just ran 6.5.0-rc6 plus this patch on all 5 machines where the problem was detected and they work now. It looks good. Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: REGRESSION WITH BISECT: v6.5-rc6 TPM patch breaks S3 on some Intel systems 2023-08-18 21:39 ` Todd Brandt @ 2023-09-02 2:47 ` Tyler Stachecki 2023-09-04 22:30 ` Jarkko Sakkinen 0 siblings, 1 reply; 16+ messages in thread From: Tyler Stachecki @ 2023-09-02 2:47 UTC (permalink / raw) To: Todd Brandt Cc: Mario Limonciello, Jarkko Sakkinen, linux-integrity, linux-kernel, len.brown, charles.d.prestopine, rafael.j.wysocki On Fri, Aug 18, 2023 at 02:39:58PM -0700, Todd Brandt wrote: > > > > > > > > > > > > While testing S3 on 6.5.0-rc6 we've found that 5 > > > > > > > > > > > > systems are seeing > > > > > > > > > > > > a > > > > > > > > > > > > crash and reboot situation when S3 suspend is > > > > > > > > > > > > initiated. To > > > > > > > > > > > > reproduce > > > > > > > > > > > > it, this call is all that's required "sudo > > > > > > > > > > > > sleepgraph -m mem > > > > > > > > > > > > -rtcwake > > > > > > > > > > > > 15". > > > > > > > > > > > > > I just ran 6.5.0-rc6 plus this patch on all 5 machines where the > problem was detected and they work now. It looks good. > > Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com> I just wanted to report that at least on an SBC running a J4105, this prevented the kernel from booting *at all* ever since 6.1.46 was released. The fix which drops the ret assignment above allows the SBC to successfully boot a kernel again -- just mentioning for severity given that: 1) It also impacts the 6.1 LTS kernel that distros are tracking 2) In some cases, the kernel no longer boots anymore at all SBC this is seen on is a Seeed Studio ODYSSEY - X86J4105 for the curious. Tested-by: Tyler J Stachecki <stachecki.tyler@gmail.com> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: REGRESSION WITH BISECT: v6.5-rc6 TPM patch breaks S3 on some Intel systems 2023-09-02 2:47 ` Tyler Stachecki @ 2023-09-04 22:30 ` Jarkko Sakkinen 0 siblings, 0 replies; 16+ messages in thread From: Jarkko Sakkinen @ 2023-09-04 22:30 UTC (permalink / raw) To: Tyler Stachecki, Todd Brandt Cc: Mario Limonciello, linux-integrity, linux-kernel, len.brown, charles.d.prestopine, rafael.j.wysocki On Sat Sep 2, 2023 at 5:47 AM EEST, Tyler Stachecki wrote: > On Fri, Aug 18, 2023 at 02:39:58PM -0700, Todd Brandt wrote: > > > > > > > > > > > > > While testing S3 on 6.5.0-rc6 we've found that 5 > > > > > > > > > > > > > systems are seeing > > > > > > > > > > > > > a > > > > > > > > > > > > > crash and reboot situation when S3 suspend is > > > > > > > > > > > > > initiated. To > > > > > > > > > > > > > reproduce > > > > > > > > > > > > > it, this call is all that's required "sudo > > > > > > > > > > > > > sleepgraph -m mem > > > > > > > > > > > > > -rtcwake > > > > > > > > > > > > > 15". > > > > > > > > > > > > > > > > I just ran 6.5.0-rc6 plus this patch on all 5 machines where the > > problem was detected and they work now. It looks good. > > > > Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com> > > I just wanted to report that at least on an SBC running a J4105, this > prevented the kernel from booting *at all* ever since 6.1.46 was released. > > The fix which drops the ret assignment above allows the SBC to successfully > boot a kernel again -- just mentioning for severity given that: > 1) It also impacts the 6.1 LTS kernel that distros are tracking > 2) In some cases, the kernel no longer boots anymore at all > > SBC this is seen on is a Seeed Studio ODYSSEY - X86J4105 for the curious. > > Tested-by: Tyler J Stachecki <stachecki.tyler@gmail.com> Hi I submitted PR today: https://lore.kernel.org/linux-integrity/20230904202512.29825-1-jarkko@kernel.org/T/#u I hope this ends the story but please if does not. Unfortunately I missed this tested-by b4 sending the PR (sorry about that). BR, Jarkko ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: REGRESSION WITH BISECT: v6.5-rc6 TPM patch breaks S3 on some Intel systems 2023-08-17 22:33 ` Jarkko Sakkinen 2023-08-18 1:58 ` Limonciello, Mario @ 2023-08-18 16:51 ` Jarkko Sakkinen 1 sibling, 0 replies; 16+ messages in thread From: Jarkko Sakkinen @ 2023-08-18 16:51 UTC (permalink / raw) To: Jarkko Sakkinen, todd.e.brandt, mario.limonciello, linux-integrity Cc: linux-kernel, len.brown, charles.d.prestopine, rafael.j.wysocki On Fri Aug 18, 2023 at 1:33 AM EEST, Jarkko Sakkinen wrote: > On Fri Aug 18, 2023 at 1:25 AM EEST, Todd Brandt wrote: > > On Fri, 2023-08-18 at 00:47 +0300, Jarkko Sakkinen wrote: > > > On Fri Aug 18, 2023 at 12:09 AM EEST, Todd Brandt wrote: > > > > While testing S3 on 6.5.0-rc6 we've found that 5 systems are seeing > > > > a > > > > crash and reboot situation when S3 suspend is initiated. To > > > > reproduce > > > > it, this call is all that's required "sudo sleepgraph -m mem > > > > -rtcwake > > > > 15". > > > > > > 1. Are there logs available? > > > 2. Is this the test case: https://pypi.org/project/sleepgraph/ (never > > > used it before). > > > > There are no dmesg logs because the S3 crash wipes them out. Sleepgraph > > isn't actually necessary to activate it, just an S3 suspend "echo mem > > > /sys/power/state". > > > > So far it appears to only have affected test systems, not production > > hardware, and none of them have TPM chips, so I'm beginning to wonder > > if this patch just inadvertently activated a bug somewhere else in the > > kernel that happens to affect test hardware. > > > > I'll continue to debug it, this isn't an emergency as so far I haven't > > seen it in production hardware. > > OK, I'll still see if I could reproduce it just in case. I created a BR environment for QA: https://github.com/jarkkojs/buildroot-tpmdd/tree/linux-6.5.y Building: make qemu_x64_64_config && make Transcript: $ output/images/start-qemu.sh --use-system-swtpm --serial-only --rtc Enable SWTPM Enable RTC SeaBIOS (version rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org) iPXE (http://ipxe.org) 00:03.0 CA00 PCI2.10 PnP PMM+06FD0F20+06F30F20 CA00 Booting from ROM.. Linux version 6.5.0-rc6 (jarkko@suppilovahvero) (x86_64-buildroot-linux-gnu-gcc.br_real (Buildroot 2023.08-rc1-94-gc4bbd4b659) 12.3.0, GNU ld (GNU Binutils) 2.40) #2 SMP PREEMPT_DYNAMIC Fri Aug 18 19:35:50 EEST 2023 Command line: rootwait root=/dev/vda console=tty1 console=ttyS0 BIOS-provided physical RAM map: BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved BIOS-e820: [mem 0x0000000000100000-0x0000000007fccfff] usable BIOS-e820: [mem 0x0000000007fcd000-0x0000000007ffffff] reserved BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved BIOS-e820: [mem 0x000000fd00000000-0x000000ffffffffff] reserved NX (Execute Disable) protection: active SMBIOS 2.8 present. DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 tsc: Fast TSC calibration using PIT tsc: Detected 2995.194 MHz processor last_pfn = 0x7fcd max_arch_pfn = 0x400000000 MTRR map: 4 entries (3 fixed + 1 variable; max 19), built from 8 variable MTRRs x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT found SMP MP-table at [mem 0x000f5b90-0x000f5b9f] ACPI: Early table checksum verification disabled ACPI: RSDP 0x00000000000F59B0 000014 (v00 BOCHS ) ACPI: RSDT 0x0000000007FD1DA8 000038 (v01 BOCHS BXPC 00000001 BXPC 00000001) ACPI: FACP 0x0000000007FD1C10 000074 (v01 BOCHS BXPC 00000001 BXPC 00000001) ACPI: DSDT 0x0000000007FD0040 001BD0 (v01 BOCHS BXPC 00000001 BXPC 00000001) ACPI: FACS 0x0000000007FD0000 000040 ACPI: APIC 0x0000000007FD1C84 000078 (v01 BOCHS BXPC 00000001 BXPC 00000001) ACPI: HPET 0x0000000007FD1CFC 000038 (v01 BOCHS BXPC 00000001 BXPC 00000001) ACPI: TPM2 0x0000000007FD1D34 00004C (v04 BOCHS BXPC 00000001 BXPC 00000001) ACPI: WAET 0x0000000007FD1D80 000028 (v01 BOCHS BXPC 00000001 BXPC 00000001) ACPI: Reserving FACP table memory at [mem 0x7fd1c10-0x7fd1c83] ACPI: Reserving DSDT table memory at [mem 0x7fd0040-0x7fd1c0f] ACPI: Reserving FACS table memory at [mem 0x7fd0000-0x7fd003f] ACPI: Reserving APIC table memory at [mem 0x7fd1c84-0x7fd1cfb] ACPI: Reserving HPET table memory at [mem 0x7fd1cfc-0x7fd1d33] ACPI: Reserving TPM2 table memory at [mem 0x7fd1d34-0x7fd1d7f] ACPI: Reserving WAET table memory at [mem 0x7fd1d80-0x7fd1da7] Zone ranges: DMA [mem 0x0000000000001000-0x0000000000ffffff] DMA32 [mem 0x0000000001000000-0x0000000007fccfff] Normal empty Movable zone start for each node Early memory node ranges node 0: [mem 0x0000000000001000-0x000000000009efff] node 0: [mem 0x0000000000100000-0x0000000007fccfff] Initmem setup node 0 [mem 0x0000000000001000-0x0000000007fccfff] On node 0, zone DMA: 1 pages in unavailable ranges On node 0, zone DMA: 97 pages in unavailable ranges On node 0, zone DMA32: 51 pages in unavailable ranges ACPI: PM-Timer IO Port: 0x608 ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1]) IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI 0-23 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level) ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level) ACPI: Using ACPI (MADT) for SMP configuration information ACPI: HPET id: 0x8086a201 base: 0xfed00000 smpboot: Allowing 1 CPUs, 0 hotplug CPUs [mem 0x08000000-0xfffbffff] available for PCI devices Booting paravirtualized kernel on bare hardware clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns setup_percpu: NR_CPUS:64 nr_cpumask_bits:1 nr_cpu_ids:1 nr_node_ids:1 percpu: Embedded 44 pages/cpu s140520 r8192 d31512 u2097152 Kernel command line: rootwait root=/dev/vda console=tty1 console=ttyS0 Dentry cache hash table entries: 16384 (order: 5, 131072 bytes, linear) Inode-cache hash table entries: 8192 (order: 4, 65536 bytes, linear) Built 1 zonelists, mobility grouping on. Total pages: 31949 mem auto-init: stack:all(zero), heap alloc:off, heap free:off Memory: 104524K/130476K available (10240K kernel code, 943K rwdata, 6612K rodata, 1776K init, 1272K bss, 25692K reserved, 0K cma-reserved) SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1 Dynamic Preempt: none rcu: Preemptible hierarchical RCU implementation. rcu: RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=1. rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies. rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1 NR_IRQS: 4352, nr_irqs: 48, preallocated irqs: 16 rcu: srcu_init: Setting srcu_struct sizes based on contention. Console: colour VGA+ 80x25 printk: console [tty1] enabled printk: console [ttyS0] enabled ACPI: Core revision 20230331 clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604467 ns APIC: Switch to symmetric I/O mode setup ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x2b2c89527d5, max_idle_ns: 440795312054 ns Calibrating delay loop (skipped), value calculated using timer frequency.. 5990.38 BogoMIPS (lpj=11980776) process: using AMD E400 aware idle routine Last level iTLB entries: 4KB 512, 2MB 255, 4MB 127 Last level dTLB entries: 4KB 512, 2MB 255, 4MB 127, 1GB 0 Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization Spectre V2 : Mitigation: Retpolines Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch Spectre V2 : Spectre v2 / SpectreRSB : Filling RSB on VMEXIT x86/fpu: x87 FPU will use FXSAVE Freeing SMP alternatives memory: 28K pid_max: default: 32768 minimum: 301 Mount-cache hash table entries: 512 (order: 0, 4096 bytes, linear) Mountpoint-cache hash table entries: 512 (order: 0, 4096 bytes, linear) smpboot: CPU0: AMD QEMU Virtual CPU version 2.5+ (family: 0xf, model: 0x6b, stepping: 0x1) Performance Events: PMU not available due to virtualization, using software events only. signal: max sigframe size: 1040 rcu: Hierarchical SRCU implementation. rcu: Max phase no-delay instances is 1000. smp: Bringing up secondary CPUs ... smp: Brought up 1 node, 1 CPU smpboot: Max logical packages: 1 smpboot: Total of 1 processors activated (5990.38 BogoMIPS) devtmpfs: initialized clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns futex hash table entries: 256 (order: 2, 16384 bytes, linear) NET: Registered PF_NETLINK/PF_ROUTE protocol family thermal_sys: Registered thermal governor 'step_wise' thermal_sys: Registered thermal governor 'user_space' cpuidle: using governor ladder PCI: Using configuration type 1 for base access ACPI: Added _OSI(Module Device) ACPI: Added _OSI(Processor Device) ACPI: Added _OSI(3.0 _SCP Extensions) ACPI: Added _OSI(Processor Aggregator Device) ACPI: 1 ACPI AML tables successfully acquired and loaded ACPI: Interpreter enabled ACPI: PM: (supports S0 S3 S5) ACPI: Using IOAPIC for interrupt routing PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug PCI: Using E820 reservations for host bridge windows ACPI: Enabled 2 GPEs in block 00 to 0F ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff]) acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments HPX-Type3] acpi PNP0A03:00: PCIe port services disabled; not requesting _OSC control acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended configuration space under this bridge PCI host bridge to bus 0000:00 pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window] pci_bus 0000:00: root bus resource [mem 0x08000000-0xfebfffff window] pci_bus 0000:00: root bus resource [mem 0x100000000-0x17fffffff window] pci_bus 0000:00: root bus resource [bus 00-ff] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000 pci 0000:00:01.0: [8086:7000] type 00 class 0x060100 pci 0000:00:01.1: [8086:7010] type 00 class 0x010180 pci 0000:00:01.1: reg 0x20: [io 0xc0a0-0xc0af] pci 0000:00:01.1: legacy IDE quirk: reg 0x10: [io 0x01f0-0x01f7] pci 0000:00:01.1: legacy IDE quirk: reg 0x14: [io 0x03f6] pci 0000:00:01.1: legacy IDE quirk: reg 0x18: [io 0x0170-0x0177] pci 0000:00:01.1: legacy IDE quirk: reg 0x1c: [io 0x0376] pci 0000:00:01.3: [8086:7113] type 00 class 0x068000 pci 0000:00:01.3: quirk: [io 0x0600-0x063f] claimed by PIIX4 ACPI pci 0000:00:01.3: quirk: [io 0x0700-0x070f] claimed by PIIX4 SMB pci 0000:00:02.0: [1234:1111] type 00 class 0x030000 pci 0000:00:02.0: reg 0x10: [mem 0xfd000000-0xfdffffff pref] pci 0000:00:02.0: reg 0x18: [mem 0xfebd0000-0xfebd0fff] pci 0000:00:02.0: reg 0x30: [mem 0xfebc0000-0xfebcffff pref] pci 0000:00:02.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff] pci 0000:00:03.0: [1af4:1000] type 00 class 0x020000 pci 0000:00:03.0: reg 0x10: [io 0xc080-0xc09f] pci 0000:00:03.0: reg 0x14: [mem 0xfebd1000-0xfebd1fff] pci 0000:00:03.0: reg 0x20: [mem 0xfe000000-0xfe003fff 64bit pref] pci 0000:00:03.0: reg 0x30: [mem 0xfeb80000-0xfebbffff pref] pci 0000:00:04.0: [1af4:1001] type 00 class 0x010000 pci 0000:00:04.0: reg 0x10: [io 0xc000-0xc07f] pci 0000:00:04.0: reg 0x14: [mem 0xfebd2000-0xfebd2fff] pci 0000:00:04.0: reg 0x20: [mem 0xfe004000-0xfe007fff 64bit pref] ACPI: PCI: Interrupt link LNKA configured for IRQ 10 ACPI: PCI: Interrupt link LNKB configured for IRQ 10 ACPI: PCI: Interrupt link LNKC configured for IRQ 11 ACPI: PCI: Interrupt link LNKD configured for IRQ 11 ACPI: PCI: Interrupt link LNKS configured for IRQ 9 iommu: Default domain type: Translated iommu: DMA domain TLB invalidation policy: lazy mode SCSI subsystem initialized ACPI: bus type USB registered usbcore: registered new interface driver usbfs usbcore: registered new interface driver hub usbcore: registered new device driver usb pps_core: LinuxPPS API ver. 1 registered pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it> PTP clock support registered Advanced Linux Sound Architecture Driver Initialized. PCI: Using ACPI for IRQ routing pci 0000:00:02.0: vgaarb: setting as boot VGA device pci 0000:00:02.0: vgaarb: bridge control possible pci 0000:00:02.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none vgaarb: loaded clocksource: Switched to clocksource tsc-early pnp: PnP ACPI init pnp: PnP ACPI: found 6 devices clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns NET: Registered PF_INET protocol family IP idents hash table entries: 2048 (order: 2, 16384 bytes, linear) tcp_listen_portaddr_hash hash table entries: 256 (order: 0, 4096 bytes, linear) Table-perturb hash table entries: 65536 (order: 6, 262144 bytes, linear) TCP established hash table entries: 1024 (order: 1, 8192 bytes, linear) TCP bind hash table entries: 1024 (order: 3, 32768 bytes, linear) TCP: Hash tables configured (established 1024 bind 1024) UDP hash table entries: 256 (order: 1, 8192 bytes, linear) UDP-Lite hash table entries: 256 (order: 1, 8192 bytes, linear) NET: Registered PF_UNIX/PF_LOCAL protocol family pci_bus 0000:00: resource 4 [io 0x0000-0x0cf7 window] pci_bus 0000:00: resource 5 [io 0x0d00-0xffff window] pci_bus 0000:00: resource 6 [mem 0x000a0000-0x000bffff window] pci_bus 0000:00: resource 7 [mem 0x08000000-0xfebfffff window] pci_bus 0000:00: resource 8 [mem 0x100000000-0x17fffffff window] pci 0000:00:01.0: PIIX3: Enabling Passive Release pci 0000:00:00.0: Limiting direct PCI/PCI transfers PCI: CLS 0 bytes, default 64 workingset: timestamp_bits=62 max_order=15 bucket_order=0 Block layer SCSI generic (bsg) driver version 0.4 loaded (major 250) io scheduler mq-deadline registered io scheduler kyber registered input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0 ACPI: button: Power Button [PWRF] ACPI: \_SB_.LNKC: Enabled at IRQ 11 ACPI: \_SB_.LNKD: Enabled at IRQ 10 Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled 00:04: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A ACPI: bus type drm_connector registered bochs-drm 0000:00:02.0: vgaarb: deactivate vga console Console: switching to colour dummy device 80x25 [drm] Found bochs VGA, ID 0xb0c5. [drm] Framebuffer size 16384 kB @ 0xfd000000, mmio @ 0xfebd0000. [drm] Found EDID data blob. [drm] Initialized bochs-drm 1.0.0 20130925 for 0000:00:02.0 on minor 0 virtio_blk virtio1: 1/0/0 default/read/poll queues virtio_blk virtio1: [vda] 122880 512-byte logical blocks (62.9 MB/60.0 MiB) scsi host0: ata_piix scsi host1: ata_piix ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc0a0 irq 14 ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc0a8 irq 15 usbcore: registered new interface driver usb-storage i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12 serio: i8042 KBD port at 0x60,0x64 irq 1 serio: i8042 AUX port at 0x60,0x64 irq 12 rtc_cmos 00:05: RTC can wake from S4 input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input1 rtc_cmos 00:05: registered as rtc0 rtc_cmos 00:05: setting system clock to 2023-08-18T19:35:58 UTC (1692387358) rtc_cmos 00:05: alarms up to one day, y3k, 242 bytes nvram, hpet irqs fail to initialize ptp_kvm usbcore: registered new interface driver usbhid usbhid: USB HID core driver NET: Registered PF_INET6 protocol family Segment Routing with IPv6 In-situ OAM (IOAM) with IPv6 sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver NET: Registered PF_PACKET protocol family IPI shorthand broadcast: enabled sched_clock: Marking stable (654763761, 13648236)->(674898287, -6486290) ALSA device list: No soundcards found. ata2: found unknown device (class 0) ata2.00: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100 scsi 1:0:0:0: CD-ROM QEMU QEMU DVD-ROM 2.5+ PQ: 0 ANSI: 5 input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input3 EXT4-fs (vda): mounting ext2 file system using the ext4 subsystem EXT4-fs (vda): mounted filesystem dbde8608-8ccb-44fe-8e89-a32f7984e7a4 ro without journal. Quota mode: disabled. VFS: Mounted root (ext2 filesystem) readonly on device 254:0. devtmpfs: mounted Freeing unused kernel image (initmem) memory: 1776K Write protecting the kernel read-only data: 18432k Freeing unused kernel image (rodata/data gap) memory: 1580K Run /sbin/init as init process EXT4-fs (vda): re-mounted dbde8608-8ccb-44fe-8e89-a32f7984e7a4 r/w. Quota mode: disabled. Starting syslogd: OK Starting klogd: OK Running sysctl: tsc: Refined TSC clocksource calibration: 2995.170 MHz clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x2b2c72918ff, max_idle_ns: 440795304087 ns clocksource: Switched to clocksource tsc OK Saving 256 bits of non-creditable seed for next boot Starting network: udhcpc: started, v1.36.1 udhcpc: broadcasting discover udhcpc: broadcasting select for 10.0.2.15, server 10.0.2.2 udhcpc: lease of 10.0.2.15 obtained from 10.0.2.2, lease time 86400 deleting routers adding dns 10.0.2.3 OK Starting dropbear sshd: OK mount: mounting nodev on /sys/kernel/tracing failed: No such file or directory mount: mounting nodev on /sys/kernel/security failed: No such file or directory Welcome to Buildroot buildroot login: root # ls /dev/rtc0 ^C # ^C # rtcwake -s 15 -m mem rtcwake: assuming RTC uses UTC ... rtcwake: wakeup from "mem" using /dev/rtc0 at Fri Aug 18 19:36:29 2023 PM: suspend entry (deep) Filesystems sync: 0.000 seconds Freezing user space processes Freezing user space processes completed (elapsed 0.001 seconds) OOM killer disabled. Freezing remaining freezable tasks Freezing remaining freezable tasks completed (elapsed 0.003 seconds) printk: Suspending console(s) (use no_console_suspend to debug) ------------[ cut here ]------------ WARNING: CPU: 0 PID: 10 at kernel/workqueue.c:3382 __flush_work.isra.0+0x2d9/0x300 Modules linked in: CPU: 0 PID: 10 Comm: kworker/u2:0 Not tainted 6.5.0-rc6 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 Workqueue: events_unbound async_run_entry_fn RIP: 0010:__flush_work.isra.0+0x2d9/0x300 Code: 39 c2 0f 84 e3 fe ff ff 48 8b 46 40 48 85 c0 0f 84 d6 fe ff ff 48 8b 78 50 e8 63 95 01 00 e9 c8 fe ff ff 0f 0b e9 0d ff ff ff <0f> 0b e9 06 ff ff ff 0f 0b e9 a1 fe ff ff 0f 0b e9 74 fe ff ff e8 RSP: 0018:ffffb07c0005bc68 EFLAGS: 00000246 RAX: 0000000000000000 RBX: ffff9b12c125d330 RCX: 0000000000000000 RDX: ffffb07c0005bc70 RSI: 0000000000000001 RDI: ffffb07c0005bcb8 RBP: ffffb07c0005bce8 R08: 0000000000000000 R09: 0000000000000308 R10: ffff9b12c1144890 R11: ffff9b12c7a21370 R12: ffff9b12c125d330 R13: 0000000000000001 R14: 0000000000000001 R15: ffff9b12c105a005 FS: 0000000000000000(0000) GS:ffff9b12c7a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f56fdf201d6 CR3: 0000000001b1e000 CR4: 00000000000006f0 Call Trace: <TASK> ? show_regs+0x64/0x70 ? __warn+0x74/0xe0 ? __flush_work.isra.0+0x2d9/0x300 ? report_bug+0x147/0x1a0 ? handle_bug+0x40/0x70 ? exc_invalid_op+0x1c/0x70 ? asm_exc_invalid_op+0x1f/0x30 ? __flush_work.isra.0+0x2d9/0x300 ? set_next_entity+0x7b/0x80 __cancel_work_timer+0xfc/0x190 cancel_delayed_work_sync+0x12/0x20 drm_kms_helper_poll_disable+0x21/0x40 drm_mode_config_helper_suspend+0x1a/0x60 bochs_pm_suspend+0x11/0x20 pci_pm_suspend+0x80/0x230 ? __pfx_pci_pm_suspend+0x10/0x10 __device_suspend+0x112/0x3d0 async_suspend+0x1e/0xa0 async_run_entry_fn+0x24/0xb0 process_one_work+0x1e0/0x370 worker_thread+0x4f/0x3d0 ? __pfx_worker_thread+0x10/0x10 kthread+0xf1/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x3c/0x60 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> ---[ end trace 0000000000000000 ]--- ACPI: PM: Preparing to enter system sleep state S3 ACPI: PM: Saving platform NVS memory Disabling non-boot CPUs ... ACPI: PM: Low-level resume complete ACPI: PM: Restoring platform NVS memory ACPI: PM: Waking up from system sleep state S3 pci 0000:00:01.0: PIIX3: Enabling Passive Release virtio_blk virtio1: 1/0/0 default/read/poll queues OOM killer enabled. Restarting tasks ... done. PM: suspend exit # ata2: found unknown device (class 0) # dmesg|grep -i tpm [ 0.016261] ACPI: TPM2 0x0000000007FD1D34 00004C (v04 BOCHS BXPC 00000001 BXPC 00000001) [ 0.016351] ACPI: Reserving TPM2 table memory at [mem 0x7fd1d34-0x7fd1d7f] # /usr/lib/kselftests/run_kselftest.sh kselftest: Running tests in tpm2 TAP version 13 1..3 # timeout set to 600 # selftests: tpm2: test_smoke.sh random: crng init done # test_read_partial_overwrite (tpm2_tests.SmokeTest.test_read_partial_overwrite) ... ok # test_read_partial_resp (tpm2_tests.SmokeTest.test_read_partial_resp) ... ok # test_seal_with_auth (tpm2_tests.SmokeTest.test_seal_with_auth) ... ok # test_seal_with_policy (tpm2_tests.SmokeTest.test_seal_with_policy) ... ok # test_seal_with_too_long_auth (tpm2_tests.SmokeTest.test_seal_with_too_long_auth) ... ok # test_send_two_cmds (tpm2_tests.SmokeTest.test_send_two_cmds) ... ok # test_too_short_cmd (tpm2_tests.SmokeTest.test_too_short_cmd) ... ok # test_unseal_with_wrong_auth (tpm2_tests.SmokeTest.test_unseal_with_wrong_auth) ... ok # test_unseal_with_wrong_policy (tpm2_tests.SmokeTest.test_unseal_with_wrong_policy) ... ok # # ---------------------------------------------------------------------- # Ran 9 tests in 0.488s # # OK ok 1 selftests: tpm2: test_smoke.sh # timeout set to 600 # selftests: tpm2: test_space.sh # test_flush_context (tpm2_tests.SpaceTest.test_flush_context) ... ok # test_get_handles (tpm2_tests.SpaceTest.test_get_handles) ... ok # test_invalid_cc (tpm2_tests.SpaceTest.test_invalid_cc) ... ok # test_make_two_spaces (tpm2_tests.SpaceTest.test_make_two_spaces) ... ok # # ---------------------------------------------------------------------- # Ran 4 tests in 0.334s # # OK ok 2 selftests: tpm2: test_space.sh # timeout set to 600 # selftests: tpm2: test_async.sh # test_async (tpm2_tests.AsyncTest.test_async) ... ok # test_flush_invalid_context (tpm2_tests.AsyncTest.test_flush_invalid_context) ... ok # # ---------------------------------------------------------------------- # Ran 2 tests in 0.007s # # OK ok 3 selftests: tpm2: test_async.sh BR, Jarkko ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: REGRESSION WITH BISECT: v6.5-rc6 TPM patch breaks S3 on some Intel systems 2023-08-17 21:09 REGRESSION WITH BISECT: v6.5-rc6 TPM patch breaks S3 on some Intel systems Todd Brandt 2023-08-17 21:47 ` Jarkko Sakkinen @ 2023-08-18 0:10 ` Bagas Sanjaya 1 sibling, 0 replies; 16+ messages in thread From: Bagas Sanjaya @ 2023-08-18 0:10 UTC (permalink / raw) To: Todd Brandt, mario.limonciello, linux-integrity Cc: linux-kernel, len.brown, charles.d.prestopine, rafael.j.wysocki, Linux Regressions [-- Attachment #1: Type: text/plain, Size: 896 bytes --] On Thu, Aug 17, 2023 at 02:09:00PM -0700, Todd Brandt wrote: > While testing S3 on 6.5.0-rc6 we've found that 5 systems are seeing a > crash and reboot situation when S3 suspend is initiated. To reproduce > it, this call is all that's required "sudo sleepgraph -m mem -rtcwake > 15". > > Ive created a Bugzilla to track this issue here: > https://bugzilla.kernel.org/show_bug.cgi?id=217804 > > I've bisected the issue to this patch: > > commit 554b841d470338a3b1d6335b14ee1cd0c8f5d754 > Author: Mario Limonciello <mario.limonciello@amd.com> > Date: Wed Aug 2 07:25:33 2023 -0500 > > tpm: Disable RNG for all AMD fTPMs > Thanks for the regression report. I'm adding it to regzbot: #regzbot ^introduced: 554b841d470338 #regzbot title: Disabling RNG for AMD fTPMs breaks S3 on some Intel systems -- An old man doll... just what I always wanted! - Clara [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 228 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2023-09-04 22:30 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-17 21:09 REGRESSION WITH BISECT: v6.5-rc6 TPM patch breaks S3 on some Intel systems Todd Brandt
2023-08-17 21:47 ` Jarkko Sakkinen
2023-08-17 22:25 ` Todd Brandt
2023-08-17 22:33 ` Jarkko Sakkinen
2023-08-18 1:58 ` Limonciello, Mario
2023-08-18 17:00 ` Jarkko Sakkinen
2023-08-18 17:21 ` Mario Limonciello
[not found] ` <CUVV2MQRCGET.2U22LFQPX1J3G@suppilovahvero>
2023-08-18 17:57 ` Mario Limonciello
2023-08-18 18:07 ` Jarkko Sakkinen
2023-08-18 18:11 ` Mario Limonciello
2023-08-18 20:08 ` Todd Brandt
2023-08-18 21:39 ` Todd Brandt
2023-09-02 2:47 ` Tyler Stachecki
2023-09-04 22:30 ` Jarkko Sakkinen
2023-08-18 16:51 ` Jarkko Sakkinen
2023-08-18 0:10 ` Bagas Sanjaya
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox