* [regression] significant delays when secureboot is enabled since 6.10
@ 2024-09-10 9:01 Linux regression tracking (Thorsten Leemhuis)
2024-09-10 9:05 ` Roberto Sassu
2024-09-10 12:22 ` James Bottomley
0 siblings, 2 replies; 34+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2024-09-10 9:01 UTC (permalink / raw)
To: James Bottomley, Jarkko Sakkinen
Cc: keyrings, linux-integrity@vger.kernel.org, LKML,
Linux kernel regressions list, Pengyu Ma
Hi, Thorsten here, the Linux kernel's regression tracker.
James, Jarkko, I noticed a report about a regression in
bugzilla.kernel.org that appears to be caused by this change of yours:
6519fea6fd372b ("tpm: add hmac checks to tpm2_pcr_extend()") [v6.10-rc1]
As many (most?) kernel developers don't keep an eye on the bug tracker,
I decided to forward it by mail. To quote from
https://bugzilla.kernel.org/show_bug.cgi?id=219229 :
> When secureboot is enabled,
> the kernel boot time is ~20 seconds after 6.10 kernel.
> it's ~7 seconds on 6.8 kernel version.
>
> When secureboot is disabled,
> the boot time is ~7 seconds too.
>
> Reproduced on both AMD and Intel platform on ThinkPad X1 and T14.
>
> It probably caused autologin failure and micmute led not loaded on AMD platform.
It was later bisected to the change mentioned above. See the ticket for
more details.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
[1] because bugzilla.kernel.org tells users upon registration their
"email address will never be displayed to logged out users"
P.S.: let me use this mail to also add the report to the list of tracked
regressions to ensure it doesn't fall through the cracks:
#regzbot introduced: 6519fea6fd372b
#regzbot from: Pengyu Ma <mapengyu@gmail.com>
#regzbot duplicate: https://bugzilla.kernel.org/show_bug.cgi?id=219229
#regzbot title: tpm: significant delays when secureboot is enabled
#regzbot ignore-activity
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [regression] significant delays when secureboot is enabled since 6.10
2024-09-10 9:01 [regression] significant delays when secureboot is enabled since 6.10 Linux regression tracking (Thorsten Leemhuis)
@ 2024-09-10 9:05 ` Roberto Sassu
2024-09-10 12:39 ` Jarkko Sakkinen
2024-09-10 12:22 ` James Bottomley
1 sibling, 1 reply; 34+ messages in thread
From: Roberto Sassu @ 2024-09-10 9:05 UTC (permalink / raw)
To: Linux regressions mailing list, James Bottomley, Jarkko Sakkinen
Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma

On Tue, 2024-09-10 at 11:01 +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> It was later bisected to the change mentioned above. See the ticket for
> more details.

Hi

I suspect I encountered the same problem:

https://lore.kernel.org/linux-integrity/b8a7b3566e6014ba102ab98e10ede0d574d8930e.camel@huaweicloud.com/

Going to provide more info there.

Roberto
* Re: [regression] significant delays when secureboot is enabled since 6.10
2024-09-10 9:05 ` Roberto Sassu
@ 2024-09-10 12:39 ` Jarkko Sakkinen
2024-09-10 12:48 ` Jarkko Sakkinen
0 siblings, 1 reply; 34+ messages in thread
From: Jarkko Sakkinen @ 2024-09-10 12:39 UTC (permalink / raw)
To: Roberto Sassu, Linux regressions mailing list, James Bottomley
Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma

On Tue Sep 10, 2024 at 12:05 PM EEST, Roberto Sassu wrote:
> I suspect I encountered the same problem:
>
> https://lore.kernel.org/linux-integrity/b8a7b3566e6014ba102ab98e10ede0d574d8930e.camel@huaweicloud.com/
>
> Going to provide more info there.

I suppose you are going to try to acquire the tracing data I asked for?
That would be awesome, thanks for taking the trouble. Let's look
at the data and draw conclusions based on that.

The workaround is pretty simple: adding CONFIG_TCG_TPM2_HMAC=n to the
kernel configuration disables the feature.

For deciding what to do with this, we are talking about an estimated
~2 week window, given that the Vienna conference slows things down, so
I hope my workaround is good enough until then.

BR, Jarkko
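
As a rough illustration of the workaround above, a build or packaging script could check whether a given kernel configuration still has the feature enabled. This is only a sketch: the helper name `tpm2_hmac_state` and the sample configuration text are made up for the example, and the only thing taken from the thread is the `CONFIG_TCG_TPM2_HMAC` symbol. It relies on the usual Kconfig output convention that a disabled boolean option is written as `# CONFIG_... is not set`.

```python
import re


def tpm2_hmac_state(config_text):
    """Return 'y', 'n' or 'unset' for CONFIG_TCG_TPM2_HMAC in a .config dump.

    Hypothetical helper for illustration; it only inspects text and does not
    touch the running kernel.
    """
    for line in config_text.splitlines():
        line = line.strip()
        # Kconfig writes disabled boolean options as a comment line.
        if line == "# CONFIG_TCG_TPM2_HMAC is not set":
            return "n"
        match = re.fullmatch(r"CONFIG_TCG_TPM2_HMAC=(\w+)", line)
        if match:
            return match.group(1)
    return "unset"


# Made-up sample input, not taken from any real system.
sample_config = """\
CONFIG_TCG_TPM=y
CONFIG_TCG_TPM2_HMAC=y
"""

if tpm2_hmac_state(sample_config) == "y":
    print("TPM2 HMAC sessions are enabled; the workaround is not applied")
```

On many distributions the configuration of the running kernel can be read from a file under /boot or from /proc/config.gz, but where it lives depends on how the kernel was built and packaged.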
* Re: [regression] significant delays when secureboot is enabled since 6.10
2024-09-10 12:39 ` Jarkko Sakkinen
@ 2024-09-10 12:48 ` Jarkko Sakkinen
2024-09-10 12:57 ` James Bottomley
0 siblings, 1 reply; 34+ messages in thread
From: Jarkko Sakkinen @ 2024-09-10 12:48 UTC (permalink / raw)
To: Jarkko Sakkinen, Roberto Sassu, Linux regressions mailing list, James Bottomley
Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma

On Tue Sep 10, 2024 at 3:39 PM EEST, Jarkko Sakkinen wrote:
> The workaround is pretty simple: adding CONFIG_TCG_TPM2_HMAC=n to the
> kernel configuration disables the feature.
>
> For deciding what to do with this, we are talking about an estimated
> ~2 week window, given that the Vienna conference slows things down, so
> I hope my workaround is good enough until then.

I can enumerate the three most likely ways to address the issue:

1. Strongest: drop from defconfig.
2. Medium: leave in defconfig but add an opt-in kernel command-line
   parameter.
3. Lightest: if we can, based on tracing data, nail the regression on a
   sustainable schedule, fix it.

Without data it is impossible to point out the right choice (or some
unknown alternative that has not crossed my mind yet).

BR, Jarkko
* Re: [regression] significant delays when secureboot is enabled since 6.10
2024-09-10 12:48 ` Jarkko Sakkinen
@ 2024-09-10 12:57 ` James Bottomley
2024-09-10 13:28 ` Jarkko Sakkinen
0 siblings, 1 reply; 34+ messages in thread
From: James Bottomley @ 2024-09-10 12:57 UTC (permalink / raw)
To: Jarkko Sakkinen, Roberto Sassu, Linux regressions mailing list
Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma

On Tue, 2024-09-10 at 15:48 +0300, Jarkko Sakkinen wrote:
> I can enumerate the three most likely ways to address the issue:
>
> 1. Strongest: drop from defconfig.
> 2. Medium: leave in defconfig but add an opt-in kernel command-line
>    parameter.
> 3. Lightest: if we can, based on tracing data, nail the regression on a
>    sustainable schedule, fix it.

Actually, there's a fourth: don't use sessions for the PCR extend (if
we'd got the timings when I asked, this was going to be my suggestion
if they came back problematic). This seems only to be a problem for
IMA measured boot (because it does lots of extends). If necessary this
could even be wrapped in a separate config or boot option that only
disables HMAC on extend for IMA (so we still get security for things
like sd-boot).

The downside of doing this is that an interposer can drop any extend
it wants without being immediately detected, but as long as they don't
have control of the kernel they can't change the log entry, so the
mismatch would be detected on check (which has to be done by the
remote verifier).

The unavoidable increased threat is that if you get tricked into
booting a malicious kernel (so the attacker has control of the log)
and the interposer substitutes the boot measurements, it can actually
fake out a remote verification system into thinking you're actually a
good node.

James
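
The log-versus-PCR argument in the message above can be sketched in a few lines. The sketch makes several assumptions that are not from the thread: a SHA-256 PCR bank that starts at all zeroes, the extend operation modelled as new PCR = SHA-256(old PCR || digest), and a made-up three-entry measurement log. The function names are hypothetical and nothing here talks to a real TPM.

```python
import hashlib

PCR_SIZE = 32  # SHA-256 bank; an assumption for this sketch


def extend(pcr, digest):
    """Model of a PCR extend: hash the old value concatenated with the digest."""
    return hashlib.sha256(pcr + digest).digest()


def replay(log):
    """What a verifier does: recompute the expected PCR value from the log."""
    pcr = bytes(PCR_SIZE)
    for digest in log:
        pcr = extend(pcr, digest)
    return pcr


# Hypothetical measurement log kept by the kernel.
log = [hashlib.sha256(name).digest() for name in (b"file-a", b"file-b", b"file-c")]

# An interposer on the bus silently drops the second extend, so the TPM
# ends up with a PCR value that no longer matches the kernel's log.
tpm_pcr = bytes(PCR_SIZE)
for index, digest in enumerate(log):
    if index == 1:
        continue
    tpm_pcr = extend(tpm_pcr, digest)

# Case 1: the log is honest, so replaying it exposes the dropped extend.
honest_log_detects_drop = replay(log) != tpm_pcr

# Case 2: the attacker also controls the log (malicious kernel) and removes
# the same entry, so the replay matches and the verifier is fooled.
forged_log = [log[0], log[2]]
forged_log_passes = replay(forged_log) == tpm_pcr

print(honest_log_detects_drop, forged_log_passes)
```

Both cases follow from the same property: the verifier only compares a replayed log against the PCR value, so the check is as trustworthy as the log it is given.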
* Re: [regression] significant delays when secureboot is enabled since 6.10
2024-09-10 12:57 ` James Bottomley
@ 2024-09-10 13:28 ` Jarkko Sakkinen
2024-09-11 8:53 ` Roberto Sassu
0 siblings, 1 reply; 34+ messages in thread
From: Jarkko Sakkinen @ 2024-09-10 13:28 UTC (permalink / raw)
To: James Bottomley, Roberto Sassu, Linux regressions mailing list
Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma

On Tue Sep 10, 2024 at 3:57 PM EEST, James Bottomley wrote:
> Actually, there's a fourth: not use sessions for the PCR extend (if
> we'd got the timings when I asked, this was going to be my suggestion
> if they came back problematic). This seems only to be a problem for
> IMA measured boot (because it does lots of extends). If necessary this
> could even be wrapped in a separate config or boot option that only
> disables HMAC on extend if IMA (so we still get security for things
> like sd-boot)

I can buy that, but with a twist: make it an opt-in kernel command-line
option. We don't want to take already existing functionality away
from those who might want to use it (given e.g. hardening requirements),
and on that basis opt-in (disabled by default) would be a more balanced
way to address the issue.

Please do send a patch!

BR, Jarkko
* Re: [regression] significant delays when secureboot is enabled since 6.10
2024-09-10 13:28 ` Jarkko Sakkinen
@ 2024-09-11 8:53 ` Roberto Sassu
2024-09-11 12:21 ` James Bottomley
2024-09-11 15:14 ` Jarkko Sakkinen
0 siblings, 2 replies; 34+ messages in thread
From: Roberto Sassu @ 2024-09-11 8:53 UTC (permalink / raw)
To: Jarkko Sakkinen, James Bottomley, Linux regressions mailing list
Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma

On Tue, 2024-09-10 at 16:28 +0300, Jarkko Sakkinen wrote:
> I can buy that but with a twist that make it an opt-in kernel command
> line option. We don't want to take already existing functionality away
> from those who might want to use it (given e.g. hardening requirements),
> and with that basis opt-in (by default disabled) would be more balanced
> way to address the issue.
>
> Please do a send a patch!

I made a few measurements. I have a Fedora 38 VM with TPM passthrough.

Kernels: 6.11-rc2+ (guest), 6.5.0-45-generic (host)

QEMU:

rc  qemu-kvm         1:4.2-3ubuntu6.27
ii  qemu-system-x86  1:6.2+dfsg-2ubuntu6.22

TPM2_PT_MANUFACTURER:
  raw: 0x49465800
  value: "IFX"
TPM2_PT_VENDOR_STRING_1:
  raw: 0x534C4239
  value: "SLB9"
TPM2_PT_VENDOR_STRING_2:
  raw: 0x36373000
  value: "670"

No HMAC:

# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 0)               |  tpm2_pcr_extend() {
 0)   1.112 us    |    tpm_buf_append_hmac_session();
 0) # 6360.029 us |    tpm_transmit_cmd();
 0) # 6415.012 us |  }

HMAC:

# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 1)               |  tpm2_pcr_extend() {
 1)               |    tpm2_start_auth_session() {
 1) * 36976.99 us |      tpm_transmit_cmd();
 1) * 84746.51 us |      tpm_transmit_cmd();
 1) # 3195.083 us |      tpm_transmit_cmd();
 1) @ 126795.1 us |    }
 1)   2.254 us    |    tpm_buf_append_hmac_session();
 1)   3.546 us    |    tpm_buf_fill_hmac_session();
 1) * 24356.46 us |    tpm_transmit_cmd();
 1)   3.496 us    |    tpm_buf_check_hmac_response();
 1) @ 151171.0 us |  }

Roberto
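
To compare the two traces above numerically, the leaf-call durations can be summed per function. The sketch below assumes the function_graph text layout shown in the message (CPU number, an optional one-character duration marker, a duration in microseconds, then the call); the regular expression and the helper name `leaf_totals` are illustrative and not part of any kernel tool. Lines that open or close a nested call are skipped on purpose so that time is not counted twice.

```python
import re

# Matches leaf calls such as " 1) * 36976.99 us |  tpm_transmit_cmd();".
# The optional character before the number is the function_graph duration
# marker (one of + ! # * @ $).
LEAF = re.compile(r"^\s*\d+\)\s+(?:[+!#*@$]\s+)?([\d.]+) us\s+\|\s+(\w+)\(\);")


def leaf_totals(trace):
    """Sum the durations (in microseconds) of leaf calls, keyed by function."""
    totals = {}
    for line in trace.splitlines():
        match = LEAF.match(line)
        if match:
            name = match.group(2)
            totals[name] = totals.get(name, 0.0) + float(match.group(1))
    return totals


NO_HMAC = """\
 0)               |  tpm2_pcr_extend() {
 0)   1.112 us    |    tpm_buf_append_hmac_session();
 0) # 6360.029 us |    tpm_transmit_cmd();
 0) # 6415.012 us |  }
"""

HMAC = """\
 1)               |  tpm2_pcr_extend() {
 1)               |    tpm2_start_auth_session() {
 1) * 36976.99 us |      tpm_transmit_cmd();
 1) * 84746.51 us |      tpm_transmit_cmd();
 1) # 3195.083 us |      tpm_transmit_cmd();
 1) @ 126795.1 us |    }
 1)   2.254 us    |    tpm_buf_append_hmac_session();
 1)   3.546 us    |    tpm_buf_fill_hmac_session();
 1) * 24356.46 us |    tpm_transmit_cmd();
 1)   3.496 us    |    tpm_buf_check_hmac_response();
 1) @ 151171.0 us |  }
"""

plain_us = leaf_totals(NO_HMAC)["tpm_transmit_cmd"]
hmac_us = leaf_totals(HMAC)["tpm_transmit_cmd"]
print(f"time in tpm_transmit_cmd per extend: {plain_us:.0f} us vs {hmac_us:.0f} us")
```

For this single sample, almost all of the time in both traces is spent inside `tpm_transmit_cmd()`, roughly 6.4 ms without HMAC and roughly 149 ms with it; the kernel-side HMAC helpers account for only a few microseconds.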
* Re: [regression] significant delays when secureboot is enabled since 6.10
2024-09-11 8:53 ` Roberto Sassu
@ 2024-09-11 12:21 ` James Bottomley
2024-09-12 13:16 ` Jarkko Sakkinen
2024-09-14 10:42 ` Jarkko Sakkinen
2024-09-11 15:14 ` Jarkko Sakkinen
1 sibling, 2 replies; 34+ messages in thread
From: James Bottomley @ 2024-09-11 12:21 UTC (permalink / raw)
To: Roberto Sassu, Jarkko Sakkinen, Linux regressions mailing list
Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma

On Wed, 2024-09-11 at 10:53 +0200, Roberto Sassu wrote:
> No HMAC:
>
>  0)               |  tpm2_pcr_extend() {
>  0)   1.112 us    |    tpm_buf_append_hmac_session();
>  0) # 6360.029 us |    tpm_transmit_cmd();
>  0) # 6415.012 us |  }
>
> HMAC:
>
>  1)               |  tpm2_pcr_extend() {
>  1)               |    tpm2_start_auth_session() {
>  1) * 36976.99 us |      tpm_transmit_cmd();
>  1) * 84746.51 us |      tpm_transmit_cmd();
>  1) # 3195.083 us |      tpm_transmit_cmd();
>  1) @ 126795.1 us |    }
>  1)   2.254 us    |    tpm_buf_append_hmac_session();
>  1)   3.546 us    |    tpm_buf_fill_hmac_session();
>  1) * 24356.46 us |    tpm_transmit_cmd();
>  1)   3.496 us    |    tpm_buf_check_hmac_response();
>  1) @ 151171.0 us |  }

Well, unfortunately, that tells us that it's the TPM itself that's
taking the time processing the security overhead. The ordering of the
commands in tpm2_start_auth_session() shows

 37ms for context restore of null key
 85ms for start session with encrypted salt
  3ms to flush null key
-----
125ms

If we context save the session, we'd likely only bear a single 37ms
cost to restore it (replacing the total 125ms). However, there's
nothing we can do about the extend execution going from 6ms to 24ms, so
I could halve your current boot time with security enabled (it's
currently 149ms, it would go to 61ms, but it's still 10x slower than
the unsecured extend at 6ms)

James
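
The arithmetic in the message above can be written out as a short worked example. All figures are the rounded millisecond values quoted in the message; the variable names are made up, and the estimate for the context-saved case follows the message's own guess that restoring a saved session would cost about as much as the 37 ms null key restore, which has not been measured.

```python
# Rounded per-command costs in milliseconds, as quoted in the message.
restore_null_key_ms = 37   # context restore of the null key
start_session_ms = 85      # start session with encrypted salt
flush_null_key_ms = 3      # flush of the null key
hmac_extend_ms = 24        # PCR extend inside the HMAC session
plain_extend_ms = 6        # PCR extend without HMAC

# Cost of setting up a fresh session for every extend (current behaviour).
session_setup_ms = restore_null_key_ms + start_session_ms + flush_null_key_ms
current_total_ms = session_setup_ms + hmac_extend_ms

# Estimate if the session were context-saved and only restored per extend.
# Assumption from the message: one restore costs about as much as the
# null key restore.
saved_session_total_ms = restore_null_key_ms + hmac_extend_ms

print("session setup per extend:", session_setup_ms, "ms")
print("current cost per extend:", current_total_ms, "ms")
print("estimated cost with a saved session:", saved_session_total_ms, "ms")
print("slowdown vs plain extend:", round(saved_session_total_ms / plain_extend_ms, 1), "x")
```

The numbers describe the cost of a single extend on one TPM model, so the effect on total boot time depends on how many extends a given system performs.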
* Re: [regression] significant delays when secureboot is enabled since 6.10
2024-09-11 12:21 ` James Bottomley
@ 2024-09-12 13:16 ` Jarkko Sakkinen
2024-09-12 13:26 ` James Bottomley
2024-09-14 10:42 ` Jarkko Sakkinen
1 sibling, 1 reply; 34+ messages in thread
From: Jarkko Sakkinen @ 2024-09-12 13:16 UTC (permalink / raw)
To: James Bottomley, Roberto Sassu, Linux regressions mailing list
Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma

On Wed Sep 11, 2024 at 3:21 PM EEST, James Bottomley wrote:
> Well, unfortunately, that tells us that it's the TPM itself that's
> taking the time processing the security overhead. The ordering of the
> commands in tpm2_start_auth_session() shows
>
>  37ms for context restore of null key
>  85ms for start session with encrypted salt
>   3ms to flush null key
> -----
> 125ms
>
> If we context save the session, we'd likely only bear a single 37ms
> cost to restore it (replacing the total 125ms). However, there's
> nothing we can do about the extend execution going from 6ms to 24ms, so
> I could halve your current boot time with security enabled (it's
> currently 149ms, it would go to 61ms, but it's still 10x slower than
> the unsecured extend at 6ms)
>
> James

I'll hold for better benchmarks.

BR, Jarkko
* Re: [regression] significant delays when secureboot is enabled since 6.10 2024-09-12 13:16 ` Jarkko Sakkinen @ 2024-09-12 13:26 ` James Bottomley 2024-09-12 13:36 ` Roberto Sassu 2024-09-12 14:26 ` Jarkko Sakkinen 0 siblings, 2 replies; 34+ messages in thread From: James Bottomley @ 2024-09-12 13:26 UTC (permalink / raw) To: Jarkko Sakkinen, Roberto Sassu, Linux regressions mailing list Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma On Thu, 2024-09-12 at 16:16 +0300, Jarkko Sakkinen wrote: > On Wed Sep 11, 2024 at 3:21 PM EEST, James Bottomley wrote: > > On Wed, 2024-09-11 at 10:53 +0200, Roberto Sassu wrote: [...] > > > I made few measurements. I have a Fedora 38 VM with TPM > > > passthrough. > > > > > > Kernels: 6.11-rc2+ (guest), 6.5.0-45-generic (host) > > > > > > QEMU: > > > > > > rc qemu-kvm 1:4.2- > > > 3ubuntu6.27 > > > ii qemu-system-x86 1:6.2+dfsg- > > > 2ubuntu6.22 > > > > > > > > > TPM2_PT_MANUFACTURER: > > > raw: 0x49465800 > > > value: "IFX" > > > TPM2_PT_VENDOR_STRING_1: > > > raw: 0x534C4239 > > > value: "SLB9" > > > TPM2_PT_VENDOR_STRING_2: > > > raw: 0x36373000 > > > value: "670" > > > > > > > > > No HMAC: > > > > > > # tracer: function_graph > > > # > > > # CPU DURATION FUNCTION CALLS > > > # | | | | | | | > > > 0) | tpm2_pcr_extend() { > > > 0) 1.112 us | tpm_buf_append_hmac_session(); > > > 0) # 6360.029 us | tpm_transmit_cmd(); > > > 0) # 6415.012 us | } > > > > > > > > > HMAC: > > > > > > # tracer: function_graph > > > # > > > # CPU DURATION FUNCTION CALLS > > > # | | | | | | | > > > 1) | tpm2_pcr_extend() { > > > 1) | tpm2_start_auth_session() { > > > 1) * 36976.99 us | tpm_transmit_cmd(); > > > 1) * 84746.51 us | tpm_transmit_cmd(); > > > 1) # 3195.083 us | tpm_transmit_cmd(); > > > 1) @ 126795.1 us | } > > > 1) 2.254 us | tpm_buf_append_hmac_session(); > > > 1) 3.546 us | tpm_buf_fill_hmac_session(); > > > 1) * 24356.46 us | tpm_transmit_cmd(); > > > 1) 3.496 us | tpm_buf_check_hmac_response(); > > > 1) @ 151171.0 us | } 
> >
> > Well, unfortunately, that tells us that it's the TPM itself that's
> > taking the time processing the security overhead. The ordering of
> > the commands in tpm2_start_auth_session() shows
> >
> > 37ms for context restore of null key
> > 85ms for start session with encrypted salt
> > 3ms to flush null key
> > -----
> > 125ms
> >
> > If we context save the session, we'd likely only bear a single 37ms
> > cost to restore it (replacing the total 125ms). However, there's
> > nothing we can do about the extend execution going from 6ms to
> > 24ms, so I could halve your current boot time with security enabled
> > (it's currently 149ms, it would go to 61ms, but it's still 10x
> > slower than the unsecured extend at 6ms)
> >
> > James
>
> I'll hold for better benchmarks.

Well, yes, I'd like to see this for a variety of TPMs.

This one clearly shows it's the real time wait for the TPM (since it
dwarfs the CPU time calculation there's not much optimization we can do
on the kernel end). The one thing that's missing in all of this is
which TPM it was. But even if it's an outlier that's really bad at
crypto, what should we do? We could have a blacklist that turns off the
extend hmac (or a whitelist that turns it on), but we can't simply say
"too bad, you need a better TPM".

James
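The breakdown quoted above can be checked directly against the function_graph numbers. What follows is a minimal sketch, not something posted in the thread: the durations are copied from Roberto's trace, the variable names are made up for illustration, and the projection assumes (as James suggests) that a context-saved session would leave only the ~37 ms restore plus the ~24 ms HMAC extend.

```python
# Durations in microseconds, copied from the function_graph trace in this thread.
SESSION_SETUP_US = {
    "context restore of null key": 36976.99,
    "start session with encrypted salt": 84746.51,
    "flush null key": 3195.083,
}
HMAC_EXTEND_US = 24356.46    # tpm_transmit_cmd() for the extend, HMAC enabled
PLAIN_EXTEND_US = 6360.029   # tpm_transmit_cmd() for the extend, HMAC disabled
HMAC_TOTAL_US = 151171.0     # whole tpm2_pcr_extend() call, HMAC enabled


def to_ms(microseconds: float) -> float:
    """Convert microseconds to milliseconds."""
    return microseconds / 1000.0


# Time spent waiting on the TPM: session setup plus the extend itself.
setup_us = sum(SESSION_SETUP_US.values())
tpm_wait_us = setup_us + HMAC_EXTEND_US
tpm_share = tpm_wait_us / HMAC_TOTAL_US

# Assumption: with a context-saved session only the restore cost remains.
projected_us = SESSION_SETUP_US["context restore of null key"] + HMAC_EXTEND_US
slowdown = projected_us / PLAIN_EXTEND_US

print(f"session setup:          {to_ms(setup_us):6.1f} ms")
print(f"total TPM wait:         {to_ms(tpm_wait_us):6.1f} ms")
print(f"share of whole call:    {tpm_share:6.1%}")
print(f"projected (saved ctx):  {to_ms(projected_us):6.1f} ms")
print(f"vs. plain extend:       {slowdown:6.1f}x")
```

On these numbers the four tpm_transmit_cmd() calls account for nearly all of the 151 ms spent in tpm2_pcr_extend(), which is consistent with the point that kernel-side CPU optimization would not help much here.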
* Re: [regression] significant delays when secureboot is enabled since 6.10 2024-09-12 13:26 ` James Bottomley @ 2024-09-12 13:36 ` Roberto Sassu 2024-09-12 14:13 ` James Bottomley 2024-09-12 14:26 ` Jarkko Sakkinen 1 sibling, 1 reply; 34+ messages in thread From: Roberto Sassu @ 2024-09-12 13:36 UTC (permalink / raw) To: James Bottomley, Jarkko Sakkinen, Linux regressions mailing list Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma On Thu, 2024-09-12 at 09:26 -0400, James Bottomley wrote: > On Thu, 2024-09-12 at 16:16 +0300, Jarkko Sakkinen wrote: > > On Wed Sep 11, 2024 at 3:21 PM EEST, James Bottomley wrote: > > > On Wed, 2024-09-11 at 10:53 +0200, Roberto Sassu wrote: > [...] > > > > I made few measurements. I have a Fedora 38 VM with TPM > > > > passthrough. > > > > > > > > Kernels: 6.11-rc2+ (guest), 6.5.0-45-generic (host) > > > > > > > > QEMU: > > > > > > > > rc qemu-kvm 1:4.2- > > > > 3ubuntu6.27 > > > > ii qemu-system-x86 1:6.2+dfsg- > > > > 2ubuntu6.22 > > > > > > > > > > > > TPM2_PT_MANUFACTURER: > > > > raw: 0x49465800 > > > > value: "IFX" > > > > TPM2_PT_VENDOR_STRING_1: > > > > raw: 0x534C4239 > > > > value: "SLB9" > > > > TPM2_PT_VENDOR_STRING_2: > > > > raw: 0x36373000 > > > > value: "670" > > > > > > > > > > > > No HMAC: > > > > > > > > # tracer: function_graph > > > > # > > > > # CPU DURATION FUNCTION CALLS > > > > # | | | | | | | > > > > 0) | tpm2_pcr_extend() { > > > > 0) 1.112 us | tpm_buf_append_hmac_session(); > > > > 0) # 6360.029 us | tpm_transmit_cmd(); > > > > 0) # 6415.012 us | } > > > > > > > > > > > > HMAC: > > > > > > > > # tracer: function_graph > > > > # > > > > # CPU DURATION FUNCTION CALLS > > > > # | | | | | | | > > > > 1) | tpm2_pcr_extend() { > > > > 1) | tpm2_start_auth_session() { > > > > 1) * 36976.99 us | tpm_transmit_cmd(); > > > > 1) * 84746.51 us | tpm_transmit_cmd(); > > > > 1) # 3195.083 us | tpm_transmit_cmd(); > > > > 1) @ 126795.1 us | } > > > > 1) 2.254 us | tpm_buf_append_hmac_session(); > > > > 1) 
3.546 us | tpm_buf_fill_hmac_session();
> > > > 1) * 24356.46 us | tpm_transmit_cmd();
> > > > 1) 3.496 us | tpm_buf_check_hmac_response();
> > > > 1) @ 151171.0 us | }
> > >
> > > Well, unfortunately, that tells us that it's the TPM itself that's
> > > taking the time processing the security overhead. The ordering of
> > > the commands in tpm2_start_auth_session() shows
> > >
> > > 37ms for context restore of null key
> > > 85ms for start session with encrypted salt
> > > 3ms to flush null key
> > > -----
> > > 125ms
> > >
> > > If we context save the session, we'd likely only bear a single 37ms
> > > cost to restore it (replacing the total 125ms). However, there's
> > > nothing we can do about the extend execution going from 6ms to
> > > 24ms, so I could halve your current boot time with security enabled
> > > (it's currently 149ms, it would go to 61ms, but it's still 10x
> > > slower than the unsecured extend at 6ms)
> > >
> > > James
> >
> > I'll hold for better benchmarks.
>
> Well, yes, I'd like to see this for a variety of TPMs.
>
> This one clearly shows it's the real time wait for the TPM (since it
> dwarfs the CPU time calculation there's not much optimization we can do
> on the kernel end). The one thing that's missing in all of this is
> what was the TPM? but even if it's an outlier that's really bad at
> crypto what should we do? We could have a blacklist that turns off the
> extend hmac (or a whitelist that turns it on), but we can't simply say
> too bad you need a better TPM.

Oops, sorry. I pasted the TPM properties, but that was not very clear:

Infineon Optiga SLB9670 (interpreting the properties).

Roberto
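For readers wondering how the part number falls out of the pasted properties: each raw value is a 32-bit word holding up to four ASCII characters. The following is a small illustrative sketch (the helper name decode_tpm_property is hypothetical, not a kernel or tpm2-tools function) that decodes the three raw values quoted in the trace mail.

```python
def decode_tpm_property(raw: int) -> str:
    """Read a 32-bit property value as big-endian ASCII, dropping NUL padding."""
    return raw.to_bytes(4, "big").rstrip(b"\x00").decode("ascii")


# Raw values as pasted in the measurements earlier in this thread.
properties = {
    "TPM2_PT_MANUFACTURER": 0x49465800,
    "TPM2_PT_VENDOR_STRING_1": 0x534C4239,
    "TPM2_PT_VENDOR_STRING_2": 0x36373000,
}

decoded = {name: decode_tpm_property(raw) for name, raw in properties.items()}

# The two vendor strings concatenate into the part number.
part_number = decoded["TPM2_PT_VENDOR_STRING_1"] + decoded["TPM2_PT_VENDOR_STRING_2"]

print(decoded["TPM2_PT_MANUFACTURER"], part_number)  # prints "IFX SLB9670"
```

"IFX" is the manufacturer code reported by the chip, and the two vendor strings together give "SLB9670", matching Roberto's reading of the properties.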
* Re: [regression] significant delays when secureboot is enabled since 6.10 2024-09-12 13:36 ` Roberto Sassu @ 2024-09-12 14:13 ` James Bottomley 2024-09-12 14:52 ` Roberto Sassu 0 siblings, 1 reply; 34+ messages in thread From: James Bottomley @ 2024-09-12 14:13 UTC (permalink / raw) To: Roberto Sassu, Jarkko Sakkinen, Linux regressions mailing list Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma On Thu, 2024-09-12 at 15:36 +0200, Roberto Sassu wrote: > On Thu, 2024-09-12 at 09:26 -0400, James Bottomley wrote: > > On Thu, 2024-09-12 at 16:16 +0300, Jarkko Sakkinen wrote: > > > On Wed Sep 11, 2024 at 3:21 PM EEST, James Bottomley wrote: > > > > On Wed, 2024-09-11 at 10:53 +0200, Roberto Sassu wrote: > > [...] > > > > > I made few measurements. I have a Fedora 38 VM with TPM > > > > > passthrough. > > > > > > > > > > Kernels: 6.11-rc2+ (guest), 6.5.0-45-generic (host) > > > > > > > > > > QEMU: > > > > > > > > > > rc qemu-kvm 1:4.2- > > > > > 3ubuntu6.27 > > > > > ii qemu-system-x86 > > > > > 1:6.2+dfsg- > > > > > 2ubuntu6.22 > > > > > > > > > > > > > > > TPM2_PT_MANUFACTURER: > > > > > raw: 0x49465800 > > > > > value: "IFX" > > > > > TPM2_PT_VENDOR_STRING_1: > > > > > raw: 0x534C4239 > > > > > value: "SLB9" > > > > > TPM2_PT_VENDOR_STRING_2: > > > > > raw: 0x36373000 > > > > > value: "670" > > > > > > > > > > > > > > > No HMAC: > > > > > > > > > > # tracer: function_graph > > > > > # > > > > > # CPU DURATION FUNCTION CALLS > > > > > # | | | | | | | > > > > > 0) | tpm2_pcr_extend() { > > > > > 0) 1.112 us | tpm_buf_append_hmac_session(); > > > > > 0) # 6360.029 us | tpm_transmit_cmd(); > > > > > 0) # 6415.012 us | } > > > > > > > > > > > > > > > HMAC: > > > > > > > > > > # tracer: function_graph > > > > > # > > > > > # CPU DURATION FUNCTION CALLS > > > > > # | | | | | | | > > > > > 1) | tpm2_pcr_extend() { > > > > > 1) | tpm2_start_auth_session() { > > > > > 1) * 36976.99 us | tpm_transmit_cmd(); > > > > > 1) * 84746.51 us | tpm_transmit_cmd(); > > > > 
> 1) # 3195.083 us | tpm_transmit_cmd(); > > > > > 1) @ 126795.1 us | } > > > > > 1) 2.254 us | tpm_buf_append_hmac_session(); > > > > > 1) 3.546 us | tpm_buf_fill_hmac_session(); > > > > > 1) * 24356.46 us | tpm_transmit_cmd(); > > > > > 1) 3.496 us | tpm_buf_check_hmac_response(); > > > > > 1) @ 151171.0 us | } > > > > > > > > Well, unfortunately, that tells us that it's the TPM itself > > > > that's > > > > taking the time processing the security overhead. The ordering > > > > of > > > > the commands in tpm2_start_auth_session() shows > > > > > > > > 37ms for context restore of null key > > > > 85ms for start session with encrypted salt > > > > 3ms to flush null key > > > > ----- > > > > 125ms > > > > > > > > If we context save the session, we'd likely only bear a single > > > > 37ms > > > > cost to restore it (replacing the total 125ms). However, > > > > there's > > > > nothing we can do about the extend execution going from 6ms to > > > > 24ms, so I could halve your current boot time with security > > > > enabled > > > > (it's currently 149ms, it would go to 61ms, but it's still 10x > > > > slower than the unsecured extend at 6ms) > > > > > > > > James > > > > > > I'll hold for better benchmarks. > > > > Well, yes, I'd like to see this for a variety of TPMs. > > > > This one clearly shows it's the real time wait for the TPM (since > > it dwarfs the CPU time calculation there's not much optimization we > > can do on the kernel end). The one thing that's missing in all of > > this is what was the TPM? but even if it's an outlier that's > > really bad at crypto what should we do? We could have a blacklist > > that turns off the extend hmac (or a whitelist that turns it on), > > but we can't simply say too bad you need a better TPM. > > Ops, sorry. I pasted the TPM properties. Was not that clear: > > Infineon Optiga SLB9670 (interpreting the properties). 
OK, that's reasonably modern and common:

https://www.infineon.com/cms/en/product/security-smart-card-solutions/optiga-embedded-security-solutions/optiga-tpm/

I assume it's one of the Q20 (otherwise it would be a TPM 1.2), but
which firmware version is it running? (As in, could it be upgraded and
the tests re-run to see if that makes a difference?)

I also need the IMA community to start thinking about what they're
willing to accept in terms of performance for the added security HMAC
brings to TPM extends.

James
* Re: [regression] significant delays when secureboot is enabled since 6.10 2024-09-12 14:13 ` James Bottomley @ 2024-09-12 14:52 ` Roberto Sassu 0 siblings, 0 replies; 34+ messages in thread From: Roberto Sassu @ 2024-09-12 14:52 UTC (permalink / raw) To: James Bottomley, Jarkko Sakkinen, Linux regressions mailing list Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma On Thu, 2024-09-12 at 10:13 -0400, James Bottomley wrote: > On Thu, 2024-09-12 at 15:36 +0200, Roberto Sassu wrote: > > On Thu, 2024-09-12 at 09:26 -0400, James Bottomley wrote: > > > On Thu, 2024-09-12 at 16:16 +0300, Jarkko Sakkinen wrote: > > > > On Wed Sep 11, 2024 at 3:21 PM EEST, James Bottomley wrote: > > > > > On Wed, 2024-09-11 at 10:53 +0200, Roberto Sassu wrote: > > > [...] > > > > > > I made few measurements. I have a Fedora 38 VM with TPM > > > > > > passthrough. > > > > > > > > > > > > Kernels: 6.11-rc2+ (guest), 6.5.0-45-generic (host) > > > > > > > > > > > > QEMU: > > > > > > > > > > > > rc qemu-kvm 1:4.2- > > > > > > 3ubuntu6.27 > > > > > > ii qemu-system-x86 > > > > > > 1:6.2+dfsg- > > > > > > 2ubuntu6.22 > > > > > > > > > > > > > > > > > > TPM2_PT_MANUFACTURER: > > > > > > raw: 0x49465800 > > > > > > value: "IFX" > > > > > > TPM2_PT_VENDOR_STRING_1: > > > > > > raw: 0x534C4239 > > > > > > value: "SLB9" > > > > > > TPM2_PT_VENDOR_STRING_2: > > > > > > raw: 0x36373000 > > > > > > value: "670" > > > > > > > > > > > > > > > > > > No HMAC: > > > > > > > > > > > > # tracer: function_graph > > > > > > # > > > > > > # CPU DURATION FUNCTION CALLS > > > > > > # | | | | | | | > > > > > > 0) | tpm2_pcr_extend() { > > > > > > 0) 1.112 us | tpm_buf_append_hmac_session(); > > > > > > 0) # 6360.029 us | tpm_transmit_cmd(); > > > > > > 0) # 6415.012 us | } > > > > > > > > > > > > > > > > > > HMAC: > > > > > > > > > > > > # tracer: function_graph > > > > > > # > > > > > > # CPU DURATION FUNCTION CALLS > > > > > > # | | | | | | | > > > > > > 1) | tpm2_pcr_extend() { > > > > > > 1) | 
tpm2_start_auth_session() { > > > > > > 1) * 36976.99 us | tpm_transmit_cmd(); > > > > > > 1) * 84746.51 us | tpm_transmit_cmd(); > > > > > > 1) # 3195.083 us | tpm_transmit_cmd(); > > > > > > 1) @ 126795.1 us | } > > > > > > 1) 2.254 us | tpm_buf_append_hmac_session(); > > > > > > 1) 3.546 us | tpm_buf_fill_hmac_session(); > > > > > > 1) * 24356.46 us | tpm_transmit_cmd(); > > > > > > 1) 3.496 us | tpm_buf_check_hmac_response(); > > > > > > 1) @ 151171.0 us | } > > > > > > > > > > Well, unfortunately, that tells us that it's the TPM itself > > > > > that's > > > > > taking the time processing the security overhead. The ordering > > > > > of > > > > > the commands in tpm2_start_auth_session() shows > > > > > > > > > > 37ms for context restore of null key > > > > > 85ms for start session with encrypted salt > > > > > 3ms to flush null key > > > > > ----- > > > > > 125ms > > > > > > > > > > If we context save the session, we'd likely only bear a single > > > > > 37ms > > > > > cost to restore it (replacing the total 125ms). However, > > > > > there's > > > > > nothing we can do about the extend execution going from 6ms to > > > > > 24ms, so I could halve your current boot time with security > > > > > enabled > > > > > (it's currently 149ms, it would go to 61ms, but it's still 10x > > > > > slower than the unsecured extend at 6ms) > > > > > > > > > > James > > > > > > > > I'll hold for better benchmarks. > > > > > > Well, yes, I'd like to see this for a variety of TPMs. > > > > > > This one clearly shows it's the real time wait for the TPM (since > > > it dwarfs the CPU time calculation there's not much optimization we > > > can do on the kernel end). The one thing that's missing in all of > > > this is what was the TPM? but even if it's an outlier that's > > > really bad at crypto what should we do? We could have a blacklist > > > that turns off the extend hmac (or a whitelist that turns it on), > > > but we can't simply say too bad you need a better TPM. 
> >
> > Ops, sorry. I pasted the TPM properties. Was not that clear:
> >
> > Infineon Optiga SLB9670 (interpreting the properties).
>
> OK, that's reasonably modern and common:
>
> https://www.infineon.com/cms/en/product/security-smart-card-solutions/optiga-embedded-security-solutions/optiga-tpm/
>
> I assume it's one of the Q20 (otherwise it would be a TPM 1.2) but what
> firmware version (as in could it be upgraded and the tests re-run to
> see if that makes a difference).
>
> I also need the IMA community to start thinking about what they're
> willing to accept in terms of performance for the added security hmac
> brings to TPM extends.

Just out of curiosity, I compared the boot time of Fedora 38 (minimal
installation) without and with HMAC enabled, and without and with the
Integrity Digest Cache [1], which I originally designed exactly for this
purpose (one measurement per package):

Without HMAC:

Without Integrity Digest Cache:

[root@fedora ~]# systemd-analyze
Startup finished in 2.486s (kernel) + 3.594s (initrd) + 11.613s (userspace) = 17.694s
multi-user.target reached after 11.559s in userspace.
[root@fedora ~]# cat /sys/kernel/security/ima/ascii_runtime_measurements|wc -l
444

With Integrity Digest Cache:

[root@fedora ~]# systemd-analyze
Startup finished in 2.381s (kernel) + 3.469s (initrd) + 11.794s (userspace) = 17.644s
multi-user.target reached after 11.750s in userspace.
[root@fedora ~]# cat /sys/kernel/security/ima/ascii_runtime_measurements|wc -l
218

With HMAC:

Without Integrity Digest Cache:

[root@fedora ~]# systemd-analyze
Startup finished in 2.911s (kernel) + 3.453s (initrd) + 1min 5.754s (userspace) = 1min 12.119s
multi-user.target reached after 1min 5.707s in userspace.
[root@fedora ~]# cat /sys/kernel/security/ima/ascii_runtime_measurements|wc -l
444

With Integrity Digest Cache:

[root@fedora ~]# systemd-analyze
Startup finished in 2.990s (kernel) + 3.462s (initrd) + 37.038s (userspace) = 43.491s
multi-user.target reached after 36.997s in userspace.
[root@fedora ~]# cat /sys/kernel/security/ima/ascii_runtime_measurements|wc -l
218

[1]: https://lore.kernel.org/linux-integrity/20240905150543.3766895-1-roberto.sassu@huaweicloud.com/

Roberto
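The boot-time figures above can be turned into a rough per-measurement cost. This is only a back-of-the-envelope sketch under two assumptions that the mail does not state: that all of the extra userspace time with HMAC enabled comes from PCR extends, and that every entry in the IMA measurement list corresponds to one extend. The names used are illustrative.

```python
# (userspace boot time in seconds, IMA measurement-list entries), from the mail above.
# "1min 5.754s" is written here as 65.754 seconds.
RUNS = {
    "no digest cache": {"plain": (11.613, 444), "hmac": (65.754, 444)},
    "digest cache": {"plain": (11.794, 218), "hmac": (37.038, 218)},
}


def extra_ms_per_measurement(plain, hmac):
    """Extra userspace time per measurement entry when HMAC is enabled, in ms."""
    plain_seconds, _ = plain
    hmac_seconds, entries = hmac
    return (hmac_seconds - plain_seconds) * 1000.0 / entries


per_entry_ms = {name: extra_ms_per_measurement(**run) for name, run in RUNS.items()}

for name, cost in per_entry_ms.items():
    print(f"{name:16s} ~{cost:.1f} ms extra per measurement")
```

Both configurations come out at roughly 115 to 125 ms of extra time per measurement, the same order of magnitude as the ~145 ms difference between the HMAC and non-HMAC tpm2_pcr_extend() traces. On that reading the Integrity Digest Cache helps by halving the number of extends, not by making each one cheaper.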
* Re: [regression] significant delays when secureboot is enabled since 6.10 2024-09-12 13:26 ` James Bottomley 2024-09-12 13:36 ` Roberto Sassu @ 2024-09-12 14:26 ` Jarkko Sakkinen 1 sibling, 0 replies; 34+ messages in thread From: Jarkko Sakkinen @ 2024-09-12 14:26 UTC (permalink / raw) To: James Bottomley, Roberto Sassu, Linux regressions mailing list Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma On Thu Sep 12, 2024 at 4:26 PM EEST, James Bottomley wrote: > On Thu, 2024-09-12 at 16:16 +0300, Jarkko Sakkinen wrote: > > On Wed Sep 11, 2024 at 3:21 PM EEST, James Bottomley wrote: > > > On Wed, 2024-09-11 at 10:53 +0200, Roberto Sassu wrote: > [...] > > > > I made few measurements. I have a Fedora 38 VM with TPM > > > > passthrough. > > > > > > > > Kernels: 6.11-rc2+ (guest), 6.5.0-45-generic (host) > > > > > > > > QEMU: > > > > > > > > rc qemu-kvm 1:4.2- > > > > 3ubuntu6.27 > > > > ii qemu-system-x86 1:6.2+dfsg- > > > > 2ubuntu6.22 > > > > > > > > > > > > TPM2_PT_MANUFACTURER: > > > > raw: 0x49465800 > > > > value: "IFX" > > > > TPM2_PT_VENDOR_STRING_1: > > > > raw: 0x534C4239 > > > > value: "SLB9" > > > > TPM2_PT_VENDOR_STRING_2: > > > > raw: 0x36373000 > > > > value: "670" > > > > > > > > > > > > No HMAC: > > > > > > > > # tracer: function_graph > > > > # > > > > # CPU DURATION FUNCTION CALLS > > > > # | | | | | | | > > > > 0) | tpm2_pcr_extend() { > > > > 0) 1.112 us | tpm_buf_append_hmac_session(); > > > > 0) # 6360.029 us | tpm_transmit_cmd(); > > > > 0) # 6415.012 us | } > > > > > > > > > > > > HMAC: > > > > > > > > # tracer: function_graph > > > > # > > > > # CPU DURATION FUNCTION CALLS > > > > # | | | | | | | > > > > 1) | tpm2_pcr_extend() { > > > > 1) | tpm2_start_auth_session() { > > > > 1) * 36976.99 us | tpm_transmit_cmd(); > > > > 1) * 84746.51 us | tpm_transmit_cmd(); > > > > 1) # 3195.083 us | tpm_transmit_cmd(); > > > > 1) @ 126795.1 us | } > > > > 1) 2.254 us | tpm_buf_append_hmac_session(); > > > > 1) 3.546 us | 
tpm_buf_fill_hmac_session();
> > > > 1) * 24356.46 us | tpm_transmit_cmd();
> > > > 1) 3.496 us | tpm_buf_check_hmac_response();
> > > > 1) @ 151171.0 us | }
> > >
> > > Well, unfortunately, that tells us that it's the TPM itself that's
> > > taking the time processing the security overhead. The ordering of
> > > the commands in tpm2_start_auth_session() shows
> > >
> > > 37ms for context restore of null key
> > > 85ms for start session with encrypted salt
> > > 3ms to flush null key
> > > -----
> > > 125ms
> > >
> > > If we context save the session, we'd likely only bear a single 37ms
> > > cost to restore it (replacing the total 125ms). However, there's
> > > nothing we can do about the extend execution going from 6ms to
> > > 24ms, so I could halve your current boot time with security enabled
> > > (it's currently 149ms, it would go to 61ms, but it's still 10x
> > > slower than the unsecured extend at 6ms)
> > >
> > > James
> >
> > I'll hold for better benchmarks.
>
> Well, yes, I'd like to see this for a variety of TPMs.
>
> This one clearly shows it's the real time wait for the TPM (since it
> dwarfs the CPU time calculation there's not much optimization we can do
> on the kernel end). The one thing that's missing in all of this is
> what was the TPM? but even if it's an outlier that's really bad at
> crypto what should we do? We could have a blacklist that turns off the
> extend hmac (or a whitelist that turns it on), but we can't simply say
> too bad you need a better TPM.
>
> James

Here's my one-liner from yesterday ;-)

sudo bpftrace -e 'k:tpm_transmit { @start[tid] = nsecs; } kr:tpm_transmit { @[kstack, ustack, comm] = sum(nsecs - @start[tid]); delete(@start[tid]); } END { clear(@start); }'

If you have a fix candidate, a snippet of the output before and after
would work as rationale too.

I'll look into the data Roberto sent me tomorrow.

BR, Jarkko
* Re: [regression] significant delays when secureboot is enabled since 6.10 2024-09-11 12:21 ` James Bottomley 2024-09-12 13:16 ` Jarkko Sakkinen @ 2024-09-14 10:42 ` Jarkko Sakkinen 2024-09-14 10:51 ` Jarkko Sakkinen 1 sibling, 1 reply; 34+ messages in thread From: Jarkko Sakkinen @ 2024-09-14 10:42 UTC (permalink / raw) To: James Bottomley, Roberto Sassu, Linux regressions mailing list Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma On Wed Sep 11, 2024 at 3:21 PM EEST, James Bottomley wrote: > On Wed, 2024-09-11 at 10:53 +0200, Roberto Sassu wrote: > > On Tue, 2024-09-10 at 16:28 +0300, Jarkko Sakkinen wrote: > > > On Tue Sep 10, 2024 at 3:57 PM EEST, James Bottomley wrote: > > > > On Tue, 2024-09-10 at 15:48 +0300, Jarkko Sakkinen wrote: > > > > > On Tue Sep 10, 2024 at 3:39 PM EEST, Jarkko Sakkinen wrote: > > > > > > On Tue Sep 10, 2024 at 12:05 PM EEST, Roberto Sassu wrote: > > > > > > > On Tue, 2024-09-10 at 11:01 +0200, Linux regression > > > > > > > tracking > > > > > > > (Thorsten > > > > > > > Leemhuis) wrote: > > > > > > > > Hi, Thorsten here, the Linux kernel's regression tracker. > > > > > > > > > > > > > > > > James, Jarkoo, I noticed a report about a regression in > > > > > > > > bugzilla.kernel.org that appears to be caused by this > > > > > > > > change of > > > > > > > > yours: > > > > > > > > > > > > > > > > 6519fea6fd372b ("tpm: add hmac checks to > > > > > > > > tpm2_pcr_extend()") > > > > > > > > [v6.10-rc1] > > > > > > > > > > > > > > > > As many (most?) kernel developers don't keep an eye on > > > > > > > > the bug > > > > > > > > tracker, > > > > > > > > I decided to forward it by mail. To quote from > > > > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=219229 : > > > > > > > > > > > > > > > > > When secureboot is enabled, > > > > > > > > > the kernel boot time is ~20 seconds after 6.10 kernel. > > > > > > > > > it's ~7 seconds on 6.8 kernel version. 
> > > > > > > > > > > > > > > > > > When secureboot is disabled, > > > > > > > > > the boot time is ~7 seconds too. > > > > > > > > > > > > > > > > > > Reproduced on both AMD and Intel platform on ThinkPad > > > > > > > > > X1 and > > > > > > > > > T14. > > > > > > > > > > > > > > > > > > It probably caused autologin failure and micmute led > > > > > > > > > not > > > > > > > > > loaded on AMD platform. > > > > > > > > > > > > > > > > It was later bisected to the change mentioned above. See > > > > > > > > the > > > > > > > > ticket for > > > > > > > > more details. > > > > > > > > > > > > > > Hi > > > > > > > > > > > > > > I suspect I encountered the same problem: > > > > > > > > > > > > > > https://lore.kernel.org/linux-integrity/b8a7b3566e6014ba102ab98e10ede0d574d8930e.camel@huaweicloud.com/ > > > > > > > > > > > > > > Going to provide more info there. > > > > > > > > > > > > I suppose you are going try to acquire the tracing data I > > > > > > asked? > > > > > > That would be awesome, thanks for taking the troube. Let's > > > > > > look > > > > > > at the data and draw conclusions based on that. > > > > > > > > > > > > Workaround is pretty simple: CONFIG_TCG_TPM2_HMAC=n to the > > > > > > kernel > > > > > > configuration disables the feature. > > > > > > > > > > > > For making decisions what to do with the we are talking > > > > > > about ~2 > > > > > > week window estimated, given the Vienna conference slows > > > > > > things > > > > > > down, so I hope my workaround is good enough before that. > > > > > > > > > > I can enumerate three most likely ways to address the issue: > > > > > > > > > > 1. Strongest: drop from defconfig. > > > > > 2. Medium: leave to defconfig but add an opt-in kernel command- > > > > > line > > > > > parameter. > > > > > 3. Lightest: if we can based on tracing data nail the > > > > > regression in > > > > > sustainable schedule, fix it. 
> > > > > > > > Actually, there's a fourth: not use sessions for the PCR extend > > > > (if > > > > we'd got the timings when I asked, this was going to be my > > > > suggestion > > > > if they came back problematic). This seems only to be a problem > > > > for > > > > IMA measured boot (because it does lots of extends). If > > > > necessary this > > > > could even be wrapped in a separate config or boot option that > > > > only > > > > disables HMAC on extend if IMA (so we still get security for > > > > things > > > > like sd-boot) > > > > > > I can buy that but with a twist that make it an opt-in kernel > > > command > > > line option. We don't want to take already existing functionality > > > away > > > from those who might want to use it (given e.g. hardening > > > requirements), > > > and with that basis opt-in (by default disabled) would be more > > > balanced > > > way to address the issue. > > > > > > Please do a send a patch! > > > > I made few measurements. I have a Fedora 38 VM with TPM passthrough. 
> > > > Kernels: 6.11-rc2+ (guest), 6.5.0-45-generic (host) > > > > QEMU: > > > > rc qemu-kvm 1:4.2- > > 3ubuntu6.27 > > ii qemu-system-x86 1:6.2+dfsg- > > 2ubuntu6.22 > > > > > > TPM2_PT_MANUFACTURER: > > raw: 0x49465800 > > value: "IFX" > > TPM2_PT_VENDOR_STRING_1: > > raw: 0x534C4239 > > value: "SLB9" > > TPM2_PT_VENDOR_STRING_2: > > raw: 0x36373000 > > value: "670" > > > > > > No HMAC: > > > > # tracer: function_graph > > # > > # CPU DURATION FUNCTION CALLS > > # | | | | | | | > > 0) | tpm2_pcr_extend() { > > 0) 1.112 us | tpm_buf_append_hmac_session(); > > 0) # 6360.029 us | tpm_transmit_cmd(); > > 0) # 6415.012 us | } > > > > > > HMAC: > > > > # tracer: function_graph > > # > > # CPU DURATION FUNCTION CALLS > > # | | | | | | | > > 1) | tpm2_pcr_extend() { > > 1) | tpm2_start_auth_session() { > > 1) * 36976.99 us | tpm_transmit_cmd(); > > 1) * 84746.51 us | tpm_transmit_cmd(); > > 1) # 3195.083 us | tpm_transmit_cmd(); > > 1) @ 126795.1 us | } > > 1) 2.254 us | tpm_buf_append_hmac_session(); > > 1) 3.546 us | tpm_buf_fill_hmac_session(); > > 1) * 24356.46 us | tpm_transmit_cmd(); > > 1) 3.496 us | tpm_buf_check_hmac_response(); > > 1) @ 151171.0 us | } > > Well, unfortunately, that tells us that it's the TPM itself that's > taking the time processing the security overhead. The ordering of the > commands in tpm2_start_auth_session() shows > > 37ms for context restore of null key > 85ms for start session with encrypted salt > 3ms to flush null key > ----- > 125ms > > If we context save the session, we'd likely only bear a single 37ms > cost to restore it (replacing the total 125ms). 
However, there's
> nothing we can do about the extend execution going from 6ms to 24ms, so
> I could halve your current boot time with security enabled (it's
> currently 149ms, it would go to 61ms, but it's still 10x slower than
> the unsecured extend at 6ms)

Please address how this discussion is related to https://bugzilla.kernel.org/show_bug.cgi?id=219229

I just read the bug report: there is nothing in it about IMA or PCR extend.

There's now tons of spam about a performance issue in a patch set that is
not in the mainline, and barely anything about the original issue:

"
When secureboot is enabled,
the kernel boot time is ~20 seconds after 6.10 kernel.
it's ~7 seconds on 6.8 kernel version.

When secureboot is disabled,
the boot time is ~7 seconds too.

Reproduced on both AMD and Intel platform on ThinkPad X1 and T14.

It probably caused autologin failure and micmute led not loaded on AMD platform.

6.9 kernel version is not tested since not signed kernel found.
6.8, 6.10, 6.11 are tested, the first bad version is 6.10.
"

How is this going to help to fix this one?

I say this once and once only: I zero care fixing code that is in the
mainline.

BR, Jarkko
* Re: [regression] significant delays when secureboot is enabled since 6.10
2024-09-14 10:42 ` Jarkko Sakkinen
@ 2024-09-14 10:51 ` Jarkko Sakkinen
2024-09-14 10:58 ` Jarkko Sakkinen
0 siblings, 1 reply; 34+ messages in thread
From: Jarkko Sakkinen @ 2024-09-14 10:51 UTC (permalink / raw)
To: Jarkko Sakkinen, James Bottomley, Roberto Sassu, Linux regressions mailing list
Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma
On Sat Sep 14, 2024 at 1:42 PM EEST, Jarkko Sakkinen wrote:
> Please address how this discussion is related to https://bugzilla.kernel.org/show_bug.cgi?id=219229
>
> I just read the bug report nothing about IMA or PCR extend.
>
> There's now tons of spam about performance issue in a patch set that is
> not in the mainline and barely nothing about the original issue:
>
> "
> When secureboot is enabled,
> the kernel boot time is ~20 seconds after 6.10 kernel.
> it's ~7 seconds on 6.8 kernel version.
>
> When secureboot is disabled,
> the boot time is ~7 seconds too.
>
> Reproduced on both AMD and Intel platform on ThinkPad X1 and T14.
>
> It probably caused autologin failure and micmute led not loaded on AMD platform.
>
> 6.9 kernel version is not tested since not signed kernel found.
> 6.8, 6.10, 6.11 are tested, the first bad version is 6.10.
> "
>
> How is this going to help to fix this one?
>
> I say this once and one: I zero care fixing code that is in the
> mainline.

How do we know that the bug has anything to do with IMA? I'm having a weekend
now but on Monday I'll ask the reporter for the kconfig. I think the
important thing is then to revisit how many times the session is set up
during boot and draw conclusions from that.

It is plain wrong and immoral to conflate a regression with marketing
a new kernel feature. These topics should be brought up in the topic
(i.e. patch set comments), not here. It misleads everyone.

Please explain to me how this is going to help the reporter in any
possible way?

BR, Jarkko
* Re: [regression] significant delays when secureboot is enabled since 6.10
2024-09-14 10:51 ` Jarkko Sakkinen
@ 2024-09-14 10:58 ` Jarkko Sakkinen
0 siblings, 0 replies; 34+ messages in thread
From: Jarkko Sakkinen @ 2024-09-14 10:58 UTC (permalink / raw)
To: Jarkko Sakkinen, Jarkko Sakkinen, James Bottomley, Roberto Sassu, Linux regressions mailing list
Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma
On Sat Sep 14, 2024 at 1:51 PM EEST, Jarkko Sakkinen wrote:
> On Sat Sep 14, 2024 at 1:42 PM EEST, Jarkko Sakkinen wrote:
> > Please address how this discussion is related to https://bugzilla.kernel.org/show_bug.cgi?id=219229
> >
> > I just read the bug report nothing about IMA or PCR extend.
> >
> > There's now tons of spam about performance issue in a patch set that is
> > not in the mainline and barely nothing about the original issue:
> >
> > "
> > When secureboot is enabled,
> > the kernel boot time is ~20 seconds after 6.10 kernel.
> > it's ~7 seconds on 6.8 kernel version.
> >
> > When secureboot is disabled,
> > the boot time is ~7 seconds too.
> >
> > Reproduced on both AMD and Intel platform on ThinkPad X1 and T14.
> >
> > It probably caused autologin failure and micmute led not loaded on AMD platform.
> >
> > 6.9 kernel version is not tested since not signed kernel found.
> > 6.8, 6.10, 6.11 are tested, the first bad version is 6.10.
> > "
> >
> > How is this going to help to fix this one?
> >
> > I say this once and one: I zero care fixing code that is in the
> > mainline.

"not in the mainline" (oops)

> How do we know that the bug has anything to do with IMA? I'm having a weekend
> now but on Monday I'll ask the reporter for the kconfig. I think the
> important thing is then to revisit how many times the session is set up
> during boot and draw conclusions from that.
>
> It is plain wrong and immoral to conflate a regression with marketing
> a new kernel feature. These topics should be brought up in the topic
> (i.e. patch set comments), not here. It misleads everyone.
>
> Please explain to me how this is going to help the reporter in any
> possible way?

I will check the original reporter's kconfig once I get it. Based on that
I can reverse the TPM call sequences, and based on those I will check if
anything can be orchestrated.

If this leads to no results, I will just send a patch that makes the whole
feature an opt-in kernel command-line option and call it a day.

I think we can take the full next week as the timeline for this; I am not
going to hold it longer than that. Any comments related to Roberto's
unfinished patch set: take them elsewhere.

BR, Jarkko
* Re: [regression] significant delays when secureboot is enabled since 6.10
2024-09-11 8:53 ` Roberto Sassu
2024-09-11 12:21 ` James Bottomley
@ 2024-09-11 15:14 ` Jarkko Sakkinen
2024-09-12 8:13 ` Roberto Sassu
1 sibling, 1 reply; 34+ messages in thread
From: Jarkko Sakkinen @ 2024-09-11 15:14 UTC (permalink / raw)
To: Roberto Sassu, James Bottomley, Linux regressions mailing list
Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma
On Wed Sep 11, 2024 at 11:53 AM EEST, Roberto Sassu wrote:
> I made few measurements. I have a Fedora 38 VM with TPM passthrough.

I was thinking more like

sudo bpftrace -e 'k:tpm_transmit { @start[tid] = nsecs; } kr:tpm_transmit { @[kstack, ustack, comm] = sum(nsecs - @start[tid]); delete(@start[tid]); } END { clear(@start); }'

For example when running "tpm2_createprimary --hierarchy o -G rsa2048 -c owner.txt", I get:

Attaching 3 probes...
^C

@[
    tpm_transmit_cmd+46
    tpm2_flush_context+120
    tpm2_commit_space+197
    tpm_dev_transmit.constprop.0+137
    tpm_dev_async_work+102
    process_one_work+374
    worker_thread+614
    kthread+207
    ret_from_fork+49
    ret_from_fork_asm+26
,
, kworker/4:2]: 2860677
@[
    tpm_dev_transmit.constprop.0+111
    tpm_dev_async_work+102
    process_one_work+374
    worker_thread+614
    kthread+207
    ret_from_fork+49
    ret_from_fork_asm+26
,
, kworker/16:1]: 3890693
@[
    tpm_transmit_cmd+46
    tpm2_load_context+195
    tpm2_prepare_space+410
    tpm_dev_transmit.constprop.0+54
    tpm_dev_async_work+102
    process_one_work+374
    worker_thread+614
    kthread+207
    ret_from_fork+49
    ret_from_fork_asm+26
,
, kworker/4:2]: 9058524
@[
    tpm_transmit_cmd+46
    tpm2_save_context+179
    tpm2_commit_space+314
    tpm_dev_transmit.constprop.0+137
    tpm_dev_async_work+102
    process_one_work+374
    worker_thread+614
    kthread+207
    ret_from_fork+49
    ret_from_fork_asm+26
,
, kworker/4:2]: 11426260
@[
    tpm_transmit_cmd+46
    tpm2_load_context+195
    tpm2_prepare_space+318
    tpm_dev_transmit.constprop.0+54
    tpm_dev_async_work+102
    process_one_work+374
    worker_thread+614
    kthread+207
    ret_from_fork+49
    ret_from_fork_asm+26
,
, kworker/4:2]: 14182972
@[
    tpm_transmit_cmd+46
    tpm2_save_context+179
    tpm2_commit_space+155
    tpm_dev_transmit.constprop.0+137
    tpm_dev_async_work+102
    process_one_work+374
    worker_thread+614
    kthread+207
    ret_from_fork+49
    ret_from_fork_asm+26
,
, kworker/4:2]: 22597059
@[
    tpm_dev_transmit.constprop.0+111
    tpm_dev_async_work+102
    process_one_work+374
    worker_thread+614
    kthread+207
    ret_from_fork+49
    ret_from_fork_asm+26
,
, kworker/4:2]: 1958500581

This gives stacks to compare, with the total "real" time spent in each
stack (in nsecs). CPU time is a relevant measure for the problem we're
dealing with.

BR, Jarkko
* Re: [regression] significant delays when secureboot is enabled since 6.10
2024-09-11 15:14 ` Jarkko Sakkinen
@ 2024-09-12 8:13 ` Roberto Sassu
2024-09-12 14:23 ` Jarkko Sakkinen
` (2 more replies)
0 siblings, 3 replies; 34+ messages in thread
From: Roberto Sassu @ 2024-09-12 8:13 UTC (permalink / raw)
To: Jarkko Sakkinen, James Bottomley, Linux regressions mailing list
Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma
On Wed, 2024-09-11 at 18:14 +0300, Jarkko Sakkinen wrote:
> On Wed Sep 11, 2024 at 11:53 AM EEST, Roberto Sassu wrote:
> > I made few measurements. I have a Fedora 38 VM with TPM passthrough.
>
> I was thinking more like
>
> sudo bpftrace -e 'k:tpm_transmit { @start[tid] = nsecs; } kr:tpm_transmit { @[kstack, ustack, comm] = sum(nsecs - @start[tid]); delete(@start[tid]); } END { clear(@start); }'
>
> For example when running "tpm2_createprimary --hierarchy o -G rsa2048 -c owner.txt", I get:

Sure:

Without HMAC:

@[
    tpm_transmit_cmd+50
    tpm2_pcr_extend+295
    tpm_pcr_extend+221
    ima_add_template_entry+437
    ima_store_template+114
    ima_store_measurement+209
    process_measurement+2473
    ima_file_check+82
    security_file_post_open+92
    path_openat+550
    do_filp_open+171
    do_sys_openat2+186
    do_sys_open+76
    __x64_sys_openat+35
    x64_sys_call+9589
    do_syscall_64+96
    entry_SYSCALL_64_after_hwframe+118
,
    0x7f338ee7be55
    0x55bf24459ac2
    0x7f338eda2b8a
    0x7f338eda2c4b
    0x55bf2445a9b5
, cat]: 5273648


With HMAC:

@[
    tpm_transmit_cmd+50
    tpm2_flush_context+95
    tpm2_start_auth_session+676
    tpm2_pcr_extend+39
    tpm_pcr_extend+221
    ima_add_template_entry+437
    ima_store_template+114
    ima_store_measurement+209
    process_measurement+2473
    ima_file_check+82
    security_file_post_open+92
    path_openat+550
    do_filp_open+171
    do_sys_openat2+186
    do_sys_open+76
    __x64_sys_openat+35
    x64_sys_call+9589
    do_syscall_64+96
    entry_SYSCALL_64_after_hwframe+118
,
    0x7f03ea0ade55
    0x55f929b7dac2
    0x7f03e9fd4b8a
    0x7f03e9fd4c4b
    0x55f929b7e9b5
, cat]: 3128177
@[
    tpm_transmit_cmd+50
    tpm2_pcr_extend+338
    tpm_pcr_extend+221
    ima_add_template_entry+437
    ima_store_template+114
    ima_store_measurement+209
    process_measurement+2473
    ima_file_check+82
    security_file_post_open+92
    path_openat+550
    do_filp_open+171
    do_sys_openat2+186
    do_sys_open+76
    __x64_sys_openat+35
    x64_sys_call+9589
    do_syscall_64+96
    entry_SYSCALL_64_after_hwframe+118
,
    0x7f03ea0ade55
    0x55f929b7dac2
    0x7f03e9fd4b8a
    0x7f03e9fd4c4b
    0x55f929b7e9b5
, cat]: 25851638
@[
    tpm_transmit_cmd+50
    tpm2_load_context+161
    tpm2_start_auth_session+98
    tpm2_pcr_extend+39
    tpm_pcr_extend+221
    ima_add_template_entry+437
    ima_store_template+114
    ima_store_measurement+209
    process_measurement+2473
    ima_file_check+82
    security_file_post_open+92
    path_openat+550
    do_filp_open+171
    do_sys_openat2+186
    do_sys_open+76
    __x64_sys_openat+35
    x64_sys_call+9589
    do_syscall_64+96
    entry_SYSCALL_64_after_hwframe+118
,
    0x7f03ea0ade55
    0x55f929b7dac2
    0x7f03e9fd4b8a
    0x7f03e9fd4c4b
    0x55f929b7e9b5
, cat]: 35928108
@[
    tpm_transmit_cmd+50
    tpm2_start_auth_session+650
    tpm2_pcr_extend+39
    tpm_pcr_extend+221
    ima_add_template_entry+437
    ima_store_template+114
    ima_store_measurement+209
    process_measurement+2473
    ima_file_check+82
    security_file_post_open+92
    path_openat+550
    do_filp_open+171
    do_sys_openat2+186
    do_sys_open+76
    __x64_sys_openat+35
    x64_sys_call+9589
    do_syscall_64+96
    entry_SYSCALL_64_after_hwframe+118
,
    0x7f03ea0ade55
    0x55f929b7dac2
    0x7f03e9fd4b8a
    0x7f03e9fd4c4b
    0x55f929b7e9b5
, cat]: 84616611

Roberto
* Re: [regression] significant delays when secureboot is enabled since 6.10
2024-09-12 8:13 ` Roberto Sassu
@ 2024-09-12 14:23 ` Jarkko Sakkinen
2024-09-13 20:50 ` Jarkko Sakkinen
2024-09-15 9:43 ` Jarkko Sakkinen
2 siblings, 0 replies; 34+ messages in thread
From: Jarkko Sakkinen @ 2024-09-12 14:23 UTC (permalink / raw)
To: Roberto Sassu, James Bottomley, Linux regressions mailing list
Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma
On Thu Sep 12, 2024 at 11:13 AM EEST, Roberto Sassu wrote:
> On Wed, 2024-09-11 at 18:14 +0300, Jarkko Sakkinen wrote:
> > On Wed Sep 11, 2024 at 11:53 AM EEST, Roberto Sassu wrote:
> > > I made few measurements. I have a Fedora 38 VM with TPM passthrough.
> >
> > I was thinking more like
> >
> > sudo bpftrace -e 'k:tpm_transmit { @start[tid] = nsecs; } kr:tpm_transmit { @[kstack, ustack, comm] = sum(nsecs - @start[tid]); delete(@start[tid]); } END { clear(@start); }'
> >
> > For example when running "tpm2_createprimary --hierarchy o -G rsa2048 -c owner.txt", I get:
>
> Sure:
>
> Without HMAC:
>
> @[
>     tpm_transmit_cmd+50
>     tpm2_pcr_extend+295
>     tpm_pcr_extend+221
>     ima_add_template_entry+437
>     ima_store_template+114
>     ima_store_measurement+209
>     process_measurement+2473
>     ima_file_check+82
>     security_file_post_open+92
>     path_openat+550
>     do_filp_open+171
>     do_sys_openat2+186
>     do_sys_open+76
>     __x64_sys_openat+35
>     x64_sys_call+9589
>     do_syscall_64+96
>     entry_SYSCALL_64_after_hwframe+118
> ,
>     0x7f338ee7be55
>     0x55bf24459ac2
>     0x7f338eda2b8a
>     0x7f338eda2c4b
>     0x55bf2445a9b5
> , cat]: 5273648
>
>
> With HMAC:
>
> @[
>     tpm_transmit_cmd+50
>     tpm2_flush_context+95
>     tpm2_start_auth_session+676
>     tpm2_pcr_extend+39
>     tpm_pcr_extend+221
>     ima_add_template_entry+437
>     ima_store_template+114
>     ima_store_measurement+209
>     process_measurement+2473
>     ima_file_check+82
>     security_file_post_open+92
>     path_openat+550
>     do_filp_open+171
>     do_sys_openat2+186
>     do_sys_open+76
>     __x64_sys_openat+35
>     x64_sys_call+9589
>     do_syscall_64+96
>     entry_SYSCALL_64_after_hwframe+118
> ,
>     0x7f03ea0ade55
>     0x55f929b7dac2
>     0x7f03e9fd4b8a
>     0x7f03e9fd4c4b
>     0x55f929b7e9b5
> , cat]: 3128177
> @[
>     tpm_transmit_cmd+50
>     tpm2_pcr_extend+338
>     tpm_pcr_extend+221
>     ima_add_template_entry+437
>     ima_store_template+114
>     ima_store_measurement+209
>     process_measurement+2473
>     ima_file_check+82
>     security_file_post_open+92
>     path_openat+550
>     do_filp_open+171
>     do_sys_openat2+186
>     do_sys_open+76
>     __x64_sys_openat+35
>     x64_sys_call+9589
>     do_syscall_64+96
>     entry_SYSCALL_64_after_hwframe+118
> ,
>     0x7f03ea0ade55
>     0x55f929b7dac2
>     0x7f03e9fd4b8a
>     0x7f03e9fd4c4b
>     0x55f929b7e9b5
> , cat]: 25851638
> @[
>     tpm_transmit_cmd+50
>     tpm2_load_context+161
>     tpm2_start_auth_session+98
>     tpm2_pcr_extend+39
>     tpm_pcr_extend+221
>     ima_add_template_entry+437
>     ima_store_template+114
>     ima_store_measurement+209
>     process_measurement+2473
>     ima_file_check+82
>     security_file_post_open+92
>     path_openat+550
>     do_filp_open+171
>     do_sys_openat2+186
>     do_sys_open+76
>     __x64_sys_openat+35
>     x64_sys_call+9589
>     do_syscall_64+96
>     entry_SYSCALL_64_after_hwframe+118
> ,
>     0x7f03ea0ade55
>     0x55f929b7dac2
>     0x7f03e9fd4b8a
>     0x7f03e9fd4c4b
>     0x55f929b7e9b5
> , cat]: 35928108
> @[
>     tpm_transmit_cmd+50
>     tpm2_start_auth_session+650
>     tpm2_pcr_extend+39
>     tpm_pcr_extend+221
>     ima_add_template_entry+437
>     ima_store_template+114
>     ima_store_measurement+209
>     process_measurement+2473
>     ima_file_check+82
>     security_file_post_open+92
>     path_openat+550
>     do_filp_open+171
>     do_sys_openat2+186
>     do_sys_open+76
>     __x64_sys_openat+35
>     x64_sys_call+9589
>     do_syscall_64+96
>     entry_SYSCALL_64_after_hwframe+118
> ,
>     0x7f03ea0ade55
>     0x55f929b7dac2
>     0x7f03e9fd4b8a
>     0x7f03e9fd4c4b
>     0x55f929b7e9b5
> , cat]: 84616611
>
> Roberto

Looking into this tomorrow, thank you.

BR, Jarkko
* Re: [regression] significant delays when secureboot is enabled since 6.10
2024-09-12 8:13 ` Roberto Sassu
2024-09-12 14:23 ` Jarkko Sakkinen
@ 2024-09-13 20:50 ` Jarkko Sakkinen
2024-09-13 22:06 ` Jarkko Sakkinen
2024-09-15 9:43 ` Jarkko Sakkinen
2 siblings, 1 reply; 34+ messages in thread
From: Jarkko Sakkinen @ 2024-09-13 20:50 UTC (permalink / raw)
To: Roberto Sassu, Jarkko Sakkinen, James Bottomley, Linux regressions mailing list
Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma
On Thu Sep 12, 2024 at 11:13 AM EEST, Roberto Sassu wrote:
> On Wed, 2024-09-11 at 18:14 +0300, Jarkko Sakkinen wrote:
> > On Wed Sep 11, 2024 at 11:53 AM EEST, Roberto Sassu wrote:
> > > I made few measurements. I have a Fedora 38 VM with TPM passthrough.
> >
> > I was thinking more like
> >
> > sudo bpftrace -e 'k:tpm_transmit { @start[tid] = nsecs; } kr:tpm_transmit { @[kstack, ustack, comm] = sum(nsecs - @start[tid]); delete(@start[tid]); } END { clear(@start); }'
> >
> > For example when running "tpm2_createprimary --hierarchy o -G rsa2048 -c owner.txt", I get:
>
> Sure:

It took a couple of days to upgrade my BuildRoot environment to have bcc and
bpftrace [1] but I finally got similar figures (not the same test but doing
extends).

Summarizing your results, looking at the call before tpm_transmit:

- HMAC management: 124 ms
- extend with HMAC: 25 ms
- extend without HMAC: 5.2 ms

I'd say the only possible way to fix this would be to refactor the HMAC
implementation by making the caller always the orchestrator, thus
allowing the continueSession flag for TPM2_StartAuthSession to be
used.

For example, if you do multiple extends, there should be no good reason
to set up and roll back a session for each call separately, right?

[1] https://codeberg.org/jarkko/linux-tpmdd-test

BR, Jarkko
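The three summary figures can be reproduced from the accumulated bpftrace
totals in Roberto's message. A minimal Python sketch, assuming that "HMAC
management" means everything in the HMAC run except the extend itself
(flush + context load + start session). The dictionary keys are labels
invented here for illustration; the nanosecond values are copied from the
traces.

```python
# Accumulated tpm_transmit_cmd() time per call site, in nanoseconds,
# copied from the bpftrace output quoted in this thread.
# The key names are illustrative labels, not kernel identifiers.
NS_PER_MS = 1_000_000

with_hmac_ns = {
    "flush_context (null key)": 3_128_177,
    "load_context (null key)": 35_928_108,
    "start_auth_session": 84_616_611,
    "pcr_extend": 25_851_638,
}
without_hmac_ns = {"pcr_extend": 5_273_648}

# Assumption: "HMAC management" is everything except the extend itself.
hmac_management_ns = sum(
    ns for name, ns in with_hmac_ns.items() if name != "pcr_extend"
)

hmac_management_ms = hmac_management_ns / NS_PER_MS
extend_hmac_ms = with_hmac_ns["pcr_extend"] / NS_PER_MS
extend_plain_ms = without_hmac_ns["pcr_extend"] / NS_PER_MS

print(f"HMAC management:     {hmac_management_ms:.1f} ms")
print(f"extend with HMAC:    {extend_hmac_ms:.1f} ms")
print(f"extend without HMAC: {extend_plain_ms:.1f} ms")
```

Under that grouping the session handling alone is several times the cost
of the HMAC-protected extend, which is what the proposal to reuse a
session across extends is aimed at.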
* Re: [regression] significant delays when secureboot is enabled since 6.10
2024-09-13 20:50 ` Jarkko Sakkinen
@ 2024-09-13 22:06 ` Jarkko Sakkinen
0 siblings, 0 replies; 34+ messages in thread
From: Jarkko Sakkinen @ 2024-09-13 22:06 UTC (permalink / raw)
To: Jarkko Sakkinen, Roberto Sassu, James Bottomley, Linux regressions mailing list
Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma
On Fri Sep 13, 2024 at 11:50 PM EEST, Jarkko Sakkinen wrote:
> On Thu Sep 12, 2024 at 11:13 AM EEST, Roberto Sassu wrote:
> > On Wed, 2024-09-11 at 18:14 +0300, Jarkko Sakkinen wrote:
> > > On Wed Sep 11, 2024 at 11:53 AM EEST, Roberto Sassu wrote:
> > > > I made few measurements. I have a Fedora 38 VM with TPM passthrough.
> > >
> > > I was thinking more like
> > >
> > > sudo bpftrace -e 'k:tpm_transmit { @start[tid] = nsecs; } kr:tpm_transmit { @[kstack, ustack, comm] = sum(nsecs - @start[tid]); delete(@start[tid]); } END { clear(@start); }'
> > >
> > > For example when running "tpm2_createprimary --hierarchy o -G rsa2048 -c owner.txt", I get:
> >
> > Sure:
>
> It took a couple of days to upgrade my BuildRoot environment to have bcc and
> bpftrace [1] but I finally got similar figures (not the same test but doing
> extends).
>
> Summarizing your results, looking at the call before tpm_transmit:
>
> - HMAC management: 124 ms
> - extend with HMAC: 25 ms
> - extend without HMAC: 5.2 ms
>
> I'd say the only possible way to fix this would be to refactor the HMAC
> implementation by making the caller always the orchestrator, thus
> allowing the continueSession flag for TPM2_StartAuthSession to be
> used.
>
> For example, if you do multiple extends, there should be no good reason
> to set up and roll back a session for each call separately, right?
>
> [1] https://codeberg.org/jarkko/linux-tpmdd-test

Note that the timings are accumulated (not averaged). It would be easy
to fix this tho.

BR, Jarkko
* Re: [regression] significant delays when secureboot is enabled since 6.10
2024-09-12 8:13 ` Roberto Sassu
2024-09-12 14:23 ` Jarkko Sakkinen
2024-09-13 20:50 ` Jarkko Sakkinen
@ 2024-09-15 9:43 ` Jarkko Sakkinen
2024-09-15 10:07 ` Jarkko Sakkinen
2 siblings, 1 reply; 34+ messages in thread
From: Jarkko Sakkinen @ 2024-09-15 9:43 UTC (permalink / raw)
To: Roberto Sassu, James Bottomley, Linux regressions mailing list
Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma
On Thu Sep 12, 2024 at 11:13 AM EEST, Roberto Sassu wrote:
> @[
>     tpm_transmit_cmd+50
>     tpm2_load_context+161
>     tpm2_start_auth_session+98
>     tpm2_pcr_extend+39
>     tpm_pcr_extend+221
>     ima_add_template_entry+437
>     ima_store_template+114
>     ima_store_measurement+209
>     process_measurement+2473
>     ima_file_check+82
>     security_file_post_open+92
>     path_openat+550
>     do_filp_open+171
>     do_sys_openat2+186
>     do_sys_open+76
>     __x64_sys_openat+35
>     x64_sys_call+9589
>     do_syscall_64+96
>     entry_SYSCALL_64_after_hwframe+118
> ,
>     0x7f03ea0ade55
>     0x55f929b7dac2
>     0x7f03e9fd4b8a
>     0x7f03e9fd4c4b
>     0x55f929b7e9b5
> , cat]: 35928108
> @[
>     tpm_transmit_cmd+50
>     tpm2_start_auth_session+650
>     tpm2_pcr_extend+39
>     tpm_pcr_extend+221
>     ima_add_template_entry+437
>     ima_store_template+114
>     ima_store_measurement+209
>     process_measurement+2473
>     ima_file_check+82
>     security_file_post_open+92
>     path_openat+550
>     do_filp_open+171
>     do_sys_openat2+186
>     do_sys_open+76
>     __x64_sys_openat+35
>     x64_sys_call+9589
>     do_syscall_64+96
>     entry_SYSCALL_64_after_hwframe+118
> ,
>     0x7f03ea0ade55
>     0x55f929b7dac2
>     0x7f03e9fd4b8a
>     0x7f03e9fd4c4b
>     0x55f929b7e9b5
> , cat]: 84616611

These commands and TPM2_CreatePrimary are the ones that give overhead
to the AMD boot-up:

1. TPM2_LoadContext (35 ms)
2. TPM2_StartAuthSession (85 ms)

We can conclude that the implementation is too slow and making it
faster requires a whole set of small improvements.

From this basis the only right fix is to make it an opt-in kernel
command-line option. That will give space to make small performance
improvements over time, and not rush. How the session is orchestrated
is not production quality, and the bug gives direct evidence of that.

High-level improvements that could be done over time:

- Do not call start_auth_session() in extend and get_random().
  Orchestrate outside.
- Find places to not close and open session sequentially, e.g. with
  the help of SA_CONTINUE_SESSION.

When it comes to boot we should aim for one single start_auth_session
during boot, i.e. different phases would leave that session open so
that we don't have to load the context every single time. I think it
should be doable.

Making all this happen is not a "performance regression fix". It is
a set of gradual improvements to code that is not there yet.

On the plus side, the kernel command-line option allows enabling the
feature by default at compile time for all architectures.

I've made my decision on this and will submit a fix for it.

BR, Jarkko
* Re: [regression] significant delays when secureboot is enabled since 6.10
2024-09-15 9:43 ` Jarkko Sakkinen
@ 2024-09-15 10:07 ` Jarkko Sakkinen
2024-09-15 13:59 ` James Bottomley
0 siblings, 1 reply; 34+ messages in thread
From: Jarkko Sakkinen @ 2024-09-15 10:07 UTC (permalink / raw)
To: Jarkko Sakkinen, Roberto Sassu, James Bottomley, Linux regressions mailing list
Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma
On Sun Sep 15, 2024 at 12:43 PM EEST, Jarkko Sakkinen wrote:
> When it comes to boot we should aim for one single start_auth_session
> during boot, i.e. different phases would leave that session open so
> that we don't have to load the context every single time. I think it
> should be doable.

The best possible idea how to improve performance here would be to
transfer the cost from time to space. This can be achieved by keeping
null key permanently in the TPM memory during power cycle.

It would give about 80% increase given Roberto's benchmark to all
in-kernel callers. There's no really other possible solution for this
to make any major improvements. So after opt-in kernel command line
option I might look into this.

This is already done locally in tpm2_get_random(), which uses
continueSession to keep session open for all calls.

BR, Jarkko
* Re: [regression] significant delays when secureboot is enabled since 6.10
2024-09-15 10:07 ` Jarkko Sakkinen
@ 2024-09-15 13:59 ` James Bottomley
2024-09-15 14:50 ` Jarkko Sakkinen
0 siblings, 1 reply; 34+ messages in thread
From: James Bottomley @ 2024-09-15 13:59 UTC (permalink / raw)
To: Jarkko Sakkinen, Roberto Sassu, Linux regressions mailing list
Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma
On Sun, 2024-09-15 at 13:07 +0300, Jarkko Sakkinen wrote:
> On Sun Sep 15, 2024 at 12:43 PM EEST, Jarkko Sakkinen wrote:
> > When it comes to boot we should aim for one single
> > start_auth_session during boot, i.e. different phases would leave
> > that session open so that we don't have to load the context every
> > single time. I think it should be doable.
>
> The best possible idea how to improve performance here would be to
> transfer the cost from time to space. This can be achieved by keeping
> null key permanently in the TPM memory during power cycle.

No it's not at all. If you look at it, the NULL key is only used to
encrypt the salt for the start session and that's the operation taking
a lot of time. That's why the cleanest mitigation would be to save and
restore the session. Unfortunately the timings you already complain
about still show this would be about 10x longer than a no-hmac extend
so I'm still waiting to see if IMA people consider that an acceptable
tradeoff.

> It would give about 80% increase given Roberto's benchmark to all
> in-kernel callers. There's no really other possible solution for this
> to make any major improvements. So after opt-in kernel command line
> option I might look into this.
>
> This is already done locally in tpm2_get_random(), which uses
> continueSession to keep session open for all calls.

The other problem if the session is context saved, as I already said,
is that it becomes long lived and requires degapping the session
manager.

James
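To put numbers on the two proposals, the per-extend cost of each can be
estimated from the rounded figures James quoted earlier from the
function_graph trace (37 ms null-key context restore, 85 ms start session,
3 ms flush, 24 ms HMAC extend, 6 ms plain extend). This is a rough Python
sketch, not a measurement. It assumes that keeping the null key resident
removes only the restore and the flush, and that restoring a context-saved
session costs about the same 37 ms as restoring the null key (James's own
estimate). The scenario names are made up for illustration.

```python
# Rounded per-command costs in milliseconds, from the function_graph
# trace discussed in this thread.
NULL_KEY_RESTORE = 37
START_SESSION = 85
NULL_KEY_FLUSH = 3
EXTEND_HMAC = 24
EXTEND_PLAIN = 6

# Assumption: a context-saved session restores in about the same time
# as the null key context does.
SESSION_RESTORE = NULL_KEY_RESTORE

scenarios = {
    "current (session per extend)": (
        NULL_KEY_RESTORE + START_SESSION + NULL_KEY_FLUSH + EXTEND_HMAC
    ),
    # Assumption: only the null key restore and flush disappear.
    "null key kept resident": START_SESSION + EXTEND_HMAC,
    "session context-saved": SESSION_RESTORE + EXTEND_HMAC,
    "no HMAC": EXTEND_PLAIN,
}

baseline = scenarios["current (session per extend)"]
for name, cost_ms in scenarios.items():
    saved_pct = 100 * (baseline - cost_ms) / baseline
    ratio = cost_ms / EXTEND_PLAIN
    print(
        f"{name:30s} {cost_ms:4d} ms  "
        f"saves {saved_pct:5.1f}%  {ratio:4.1f}x plain extend"
    )
```

The sketch ignores the session manager gap handling James mentions, which
is a correctness cost rather than a latency one.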
* Re: [regression] significant delays when secureboot is enabled since 6.10
2024-09-15 13:59 ` James Bottomley
@ 2024-09-15 14:50 ` Jarkko Sakkinen
2024-09-15 14:55 ` Jarkko Sakkinen
2024-09-15 15:00 ` James Bottomley
0 siblings, 2 replies; 34+ messages in thread
From: Jarkko Sakkinen @ 2024-09-15 14:50 UTC (permalink / raw)
To: James Bottomley, Roberto Sassu, Linux regressions mailing list
Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma
On Sun Sep 15, 2024 at 4:59 PM EEST, James Bottomley wrote:
> On Sun, 2024-09-15 at 13:07 +0300, Jarkko Sakkinen wrote:
> > On Sun Sep 15, 2024 at 12:43 PM EEST, Jarkko Sakkinen wrote:
> > > When it comes to boot we should aim for one single
> > > start_auth_session during boot, i.e. different phases would leave
> > > that session open so that we don't have to load the context every
> > > single time. I think it should be doable.
> >
> > The best possible idea how to improve performance here would be to
> > transfer the cost from time to space. This can be achieved by keeping
> > null key permanently in the TPM memory during power cycle.
>
> No it's not at all. If you look at it, the NULL key is only used to
> encrypt the salt for the start session and that's the operation taking
> a lot of time. That's why the cleanest mitigation would be to save and
> restore the session. Unfortunately the timings you already complain
> about still show this would be about 10x longer than a no-hmac extend
> so I'm still waiting to see if IMA people consider that an acceptable
> tradeoff.

The bug report does not say anything about IMA issues. Please read the
bug reports before commenting ;-) I will ignore your comment because it
is plain misleading information.

https://bugzilla.kernel.org/show_bug.cgi?id=219229

>
> > It would give about 80% increase given Roberto's benchmark to all
> > in-kernel callers. There's no really other possible solution for this
> > to make any major improvements. So after opt-in kernel command line
> > option I might look into this.
> >
> > This is already done locally in tpm2_get_random(), which uses
> > continueSession to keep session open for all calls.
>
> The other problem if the session is context saved, as I already said,
> is that it becomes long lived and requires degapping the session
> manager.

I don't really care what you claim, I care what you code only at most.
Especially when the topic shifts to IMA like it did now, which feels to
me like misguided communication tbh.

I don't think a round trip in kernel would qualify in that but there is
more low-hanging fruit too.

One low-hanging fruit improvement in the startup code is the handling
of the null key. If it was flushed only on need, which means in practice
access to /dev/tpm0 or /dev/tpmrm0

I'm already working on a patch set which adds chip->null_key, which will
be flushed on an on-need basis only. I can measure with qemu how it
affects boot time.

BR, Jarkko
* Re: [regression] significant delays when secureboot is enabled since 6.10
2024-09-15 14:50 ` Jarkko Sakkinen
@ 2024-09-15 14:55 ` Jarkko Sakkinen
2024-09-15 15:00 ` James Bottomley
1 sibling, 0 replies; 34+ messages in thread
From: Jarkko Sakkinen @ 2024-09-15 14:55 UTC (permalink / raw)
To: Jarkko Sakkinen, James Bottomley, Roberto Sassu, Linux regressions mailing list
Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma
On Sun Sep 15, 2024 at 5:50 PM EEST, Jarkko Sakkinen wrote:
> One low-hanging fruit improvement in the startup code is the handling
> of the null key. If it was flushed only on need, which means in practice
> access to /dev/tpm0 or /dev/tpmrm0
>
> I'm already working on a patch set which adds chip->null_key, which will
> be flushed on an on-need basis only. I can measure with qemu how it
> affects boot time.

I can agree that playing with continueSession is not the first thing to
try out, but keeping the null key in memory as long as possible does not
affect the context gap, so I will start experimenting with that.

BR, Jarkko
* Re: [regression] significant delays when secureboot is enabled since 6.10
2024-09-15 14:50 ` Jarkko Sakkinen
2024-09-15 14:55 ` Jarkko Sakkinen
@ 2024-09-15 15:00 ` James Bottomley
2024-09-15 16:22 ` Jarkko Sakkinen
1 sibling, 1 reply; 34+ messages in thread
From: James Bottomley @ 2024-09-15 15:00 UTC (permalink / raw)
To: Jarkko Sakkinen, Roberto Sassu, Linux regressions mailing list
Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma
On Sun, 2024-09-15 at 17:50 +0300, Jarkko Sakkinen wrote:
> On Sun Sep 15, 2024 at 4:59 PM EEST, James Bottomley wrote:
> > On Sun, 2024-09-15 at 13:07 +0300, Jarkko Sakkinen wrote:
> > > On Sun Sep 15, 2024 at 12:43 PM EEST, Jarkko Sakkinen wrote:
> > > > When it comes to boot we should aim for one single
> > > > start_auth_session during boot, i.e. different phases would
> > > > leave that session open so that we don't have to load the
> > > > context every single time. I think it should be doable.
> > >
> > > The best possible idea how to improve performance here would be
> > > to transfer the cost from time to space. This can be achieved by
> > > keeping null key permanently in the TPM memory during power
> > > cycle.
> >
> > No it's not at all. If you look at it, the NULL key is only used
> > to encrypt the salt for the start session and that's the operation
> > taking a lot of time. That's why the cleanest mitigation would be
> > to save and restore the session. Unfortunately the timings you
> > already complain about still show this would be about 10x longer
> > than a no-hmac extend so I'm still waiting to see if IMA people
> > consider that an acceptable tradeoff.
>
> The bug report does not say anything about IMA issues. Please read
> the bug reports before commenting ;-) I will ignore your comment
> because it is plain misleading information.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=219229

Well, given that the kernel does no measured boot extends after the EFI
boot stub (which isn't session protected) finishes, what's your theory
for the root cause?

James
* Re: [regression] significant delays when secureboot is enabled since 6.10
2024-09-15 15:00 ` James Bottomley
@ 2024-09-15 16:22 ` Jarkko Sakkinen
2024-09-21 15:40 ` Jarkko Sakkinen
0 siblings, 1 reply; 34+ messages in thread
From: Jarkko Sakkinen @ 2024-09-15 16:22 UTC (permalink / raw)
To: James Bottomley, Roberto Sassu, Linux regressions mailing list
Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma
On Sun Sep 15, 2024 at 6:00 PM EEST, James Bottomley wrote:
> On Sun, 2024-09-15 at 17:50 +0300, Jarkko Sakkinen wrote:
> > On Sun Sep 15, 2024 at 4:59 PM EEST, James Bottomley wrote:
> > > On Sun, 2024-09-15 at 13:07 +0300, Jarkko Sakkinen wrote:
> > > > On Sun Sep 15, 2024 at 12:43 PM EEST, Jarkko Sakkinen wrote:
> > > > > When it comes to boot we should aim for one single
> > > > > start_auth_session during boot, i.e. different phases would
> > > > > leave that session open so that we don't have to load the
> > > > > context every single time. I think it should be doable.
> > > >
> > > > The best possible idea how to improve performance here would be
> > > > to transfer the cost from time to space. This can be achieved by
> > > > keeping null key permanently in the TPM memory during power
> > > > cycle.
> > >
> > > No it's not at all. If you look at it, the NULL key is only used
> > > to encrypt the salt for the start session and that's the operation
> > > taking a lot of time. That's why the cleanest mitigation would be
> > > to save and restore the session. Unfortunately the timings you
> > > already complain about still show this would be about 10x longer
> > > than a no-hmac extend so I'm still waiting to see if IMA people
> > > consider that an acceptable tradeoff.
> >
> > The bug report does not say anything about IMA issues. Please read
> > the bug reports before commenting ;-) I will ignore your comment
> > because it is plain misleading information.
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=219229
>
> Well, given that the kernel does no measured boot extends after the EFI
> boot stub (which isn't session protected) finishes, what's your theory
> for the root cause?

I don't think there is a silver bullet. Based on the benchmark, which
showed 80% overhead from throttling the context, reducing the number of
loads and saves will cut a slice of the fat.

Since it is the low-hanging fruit I'll start with that. In other words,
I'm not going to touch session loading and saving. I'll start with null
key loading and saving.

BR, Jarkko
* Re: [regression] significant delays when secureboot is enabled since 6.10
2024-09-15 16:22 ` Jarkko Sakkinen
@ 2024-09-21 15:40 ` Jarkko Sakkinen
2024-09-22 14:11 ` Jarkko Sakkinen
0 siblings, 1 reply; 34+ messages in thread
From: Jarkko Sakkinen @ 2024-09-21 15:40 UTC (permalink / raw)
To: Jarkko Sakkinen, James Bottomley, Roberto Sassu, Linux regressions mailing list
Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma

On Sun Sep 15, 2024 at 7:22 PM EEST, Jarkko Sakkinen wrote:
> On Sun Sep 15, 2024 at 6:00 PM EEST, James Bottomley wrote:
> > On Sun, 2024-09-15 at 17:50 +0300, Jarkko Sakkinen wrote:
> > > On Sun Sep 15, 2024 at 4:59 PM EEST, James Bottomley wrote:
> > > > On Sun, 2024-09-15 at 13:07 +0300, Jarkko Sakkinen wrote:
> > > > > On Sun Sep 15, 2024 at 12:43 PM EEST, Jarkko Sakkinen wrote:
> > > > > > When it comes to boot we should aim for one single
> > > > > > start_auth_session during boot, i.e. different phases would
> > > > > > leave that session open so that we don't have to load the
> > > > > > context every single time. I think it should be doable.
> > > > >
> > > > > The best possible idea how to improve performance here would be
> > > > > to transfer the cost from time to space. This can be achieved by
> > > > > keeping null key permanently in the TPM memory during power
> > > > > cycle.
> > > >
> > > > No it's not at all. If you look at it, the NULL key is only used
> > > > to encrypt the salt for the start session and that's the operation
> > > > taking a lot of time. That's why the cleanest mitigation would be
> > > > to save and restore the session. Unfortunately the timings you
> > > > already complain about still show this would be about 10x longer
> > > > than a no-hmac extend so I'm still waiting to see if IMA people
> > > > consider that an acceptable tradeoff.
> > >
> > > The bug report does not say anything about IMA issues. Please read
> > > the bug reports before commenting ;-) I will ignore your comment
> > > because it is plain misleading information.
> > >
> > > https://bugzilla.kernel.org/show_bug.cgi?id=219229
> >
> > Well, given that the kernel does no measured boot extends after the EFI
> > boot stub (which isn't session protected) finishes, what's your theory
> > for the root cause?
>
> I don't think there is a silver bullet. Based on the benchmark, which
> showed 80% overhead from throttling the context, reducing the number of
> loads and saves will cut a slice of the fat.
>
> Since it is the low-hanging fruit I'll start with that. In other words,
> I'm not going to touch session loading and saving. I'll start with null
> key loading and saving.

"my theory" worked pretty well. It brought the boot time back to 8.7s,
which can be explained with encryption overhead pretty well.

I'd suggest reading the bug report next time before solving a problem
that did not exist. We care about users, not unfinished patch sets.

BR, Jarkko

* Re: [regression] significant delays when secureboot is enabled since 6.10
2024-09-21 15:40 ` Jarkko Sakkinen
@ 2024-09-22 14:11 ` Jarkko Sakkinen
0 siblings, 0 replies; 34+ messages in thread
From: Jarkko Sakkinen @ 2024-09-22 14:11 UTC (permalink / raw)
To: Jarkko Sakkinen, James Bottomley, Roberto Sassu, Linux regressions mailing list
Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma

On Sat Sep 21, 2024 at 6:40 PM EEST, Jarkko Sakkinen wrote:
> On Sun Sep 15, 2024 at 7:22 PM EEST, Jarkko Sakkinen wrote:
> > On Sun Sep 15, 2024 at 6:00 PM EEST, James Bottomley wrote:
> > > On Sun, 2024-09-15 at 17:50 +0300, Jarkko Sakkinen wrote:
> > > > On Sun Sep 15, 2024 at 4:59 PM EEST, James Bottomley wrote:
> > > > > On Sun, 2024-09-15 at 13:07 +0300, Jarkko Sakkinen wrote:
> > > > > > On Sun Sep 15, 2024 at 12:43 PM EEST, Jarkko Sakkinen wrote:
> > > > > > > When it comes to boot we should aim for one single
> > > > > > > start_auth_session during boot, i.e. different phases would
> > > > > > > leave that session open so that we don't have to load the
> > > > > > > context every single time. I think it should be doable.
> > > > > >
> > > > > > The best possible idea how to improve performance here would be
> > > > > > to transfer the cost from time to space. This can be achieved by
> > > > > > keeping null key permanently in the TPM memory during power
> > > > > > cycle.
> > > > >
> > > > > No it's not at all. If you look at it, the NULL key is only used
> > > > > to encrypt the salt for the start session and that's the operation
> > > > > taking a lot of time. That's why the cleanest mitigation would be
> > > > > to save and restore the session. Unfortunately the timings you
> > > > > already complain about still show this would be about 10x longer
> > > > > than a no-hmac extend so I'm still waiting to see if IMA people
> > > > > consider that an acceptable tradeoff.
> > > >
> > > > The bug report does not say anything about IMA issues. Please read
> > > > the bug reports before commenting ;-) I will ignore your comment
> > > > because it is plain misleading information.
> > > >
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=219229
> > >
> > > Well, given that the kernel does no measured boot extends after the EFI
> > > boot stub (which isn't session protected) finishes, what's your theory
> > > for the root cause?
> >
> > I don't think there is a silver bullet. Based on the benchmark, which
> > showed 80% overhead from throttling the context, reducing the number of
> > loads and saves will cut a slice of the fat.
> >
> > Since it is the low-hanging fruit I'll start with that. In other words,
> > I'm not going to touch session loading and saving. I'll start with null
> > key loading and saving.
>
> "my theory" worked pretty well. It brought the boot time back to 8.7s,
> which can be explained with encryption overhead pretty well.
>
> I'd suggest reading the bug report next time before solving a problem
> that did not exist. We care about users, not unfinished patch sets.

I'd also expect to review a patch set that fixes a performance issue
caused by a feature that you implemented within less than a week. One
that doubles the boot time on AMD CPUs. This is ridiculous tbh.

BR, Jarkko

* Re: [regression] significant delays when secureboot is enabled since 6.10
2024-09-10 9:01 [regression] significant delays when secureboot is enabled since 6.10 Linux regression tracking (Thorsten Leemhuis)
2024-09-10 9:05 ` Roberto Sassu
@ 2024-09-10 12:22 ` James Bottomley
2024-09-10 12:41 ` Linux regression tracking (Thorsten Leemhuis)
1 sibling, 1 reply; 34+ messages in thread
From: James Bottomley @ 2024-09-10 12:22 UTC (permalink / raw)
To: Linux regressions mailing list, Jarkko Sakkinen
Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma

On Tue, 2024-09-10 at 11:01 +0200, Linux regression tracking (Thorsten
Leemhuis) wrote:
> Hi, Thorsten here, the Linux kernel's regression tracker.
>
> James, Jarkko, I noticed a report about a regression in
> bugzilla.kernel.org that appears to be caused by this change of
> yours:
>
> 6519fea6fd372b ("tpm: add hmac checks to tpm2_pcr_extend()")
> [v6.10-rc1]
>
> As many (most?) kernel developers don't keep an eye on the bug
> tracker, I decided to forward it by mail. To quote from
> https://bugzilla.kernel.org/show_bug.cgi?id=219229 :
>
> > When secureboot is enabled,
> > the kernel boot time is ~20 seconds after 6.10 kernel.
> > it's ~7 seconds on 6.8 kernel version.
> >
> > When secureboot is disabled,
> > the boot time is ~7 seconds too.
> >
> > Reproduced on both AMD and Intel platform on ThinkPad X1 and T14.
> >
> > It probably caused autologin failure and micmute led not loaded on
> > AMD platform.
>
> It was later bisected to the change mentioned above. See the ticket
> for more details.

We always suspected encryption and hmac would add overheads which is
why it's gated by a config option. The way to fix this is to set

CONFIG_TCG_TPM_HMAC to N

of course, TPM transactions are then insecure, but it's the same state
as you were in before.

James

* Re: [regression] significant delays when secureboot is enabled since 6.10
2024-09-10 12:22 ` James Bottomley
@ 2024-09-10 12:41 ` Linux regression tracking (Thorsten Leemhuis)
2024-09-10 22:40 ` Jarkko Sakkinen
0 siblings, 1 reply; 34+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2024-09-10 12:41 UTC (permalink / raw)
To: James Bottomley, Linux regressions mailing list, Jarkko Sakkinen
Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma,
Roberto Sassu

On 10.09.24 14:22, James Bottomley wrote:
> On Tue, 2024-09-10 at 11:01 +0200, Linux regression tracking (Thorsten
> Leemhuis) wrote:
>>
>> 6519fea6fd372b ("tpm: add hmac checks to tpm2_pcr_extend()")
>> [v6.10-rc1]
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=219229 :
>>
>>> When secureboot is enabled,
>>> the kernel boot time is ~20 seconds after 6.10 kernel.
>>> it's ~7 seconds on 6.8 kernel version.
>>>
>>> When secureboot is disabled,
>>> the boot time is ~7 seconds too.
>>>
>>> Reproduced on both AMD and Intel platform on ThinkPad X1 and T14.
>
> We always suspected encryption and hmac would add overheads which is
> why it's gated by a config option. The way to fix this is to set
>
> CONFIG_TCG_TPM_HMAC to N

FWIW (mainly for others that later find this thread on lore), I'm pretty
sure James meant CONFIG_TCG_TPM2_HMAC.

> of course, TPM transactions are then insecure, but it's the same state
> as you were in before.

Hmmm. But it's on by default on X86_64. Hmmm. If this would cause
serious trouble, I'd say this is a regression that must be fixed, as we
can't expect people to know that they need to turn this off. But delays
during boot? Hmmm. Makes me wonder what Linus' stance would be here. I
suspect it might be "why was this enabled by default for x86_64 anyway,
new features almost always should be off by default", but might be wrong
there. And given that this was introduced in 6.10 I assume a lot of
users have CONFIG_TCG_TPM2_HMAC=y in their .config files already
anyway. :-/ Hmmm. :-|

Ciao, Thorsten

* Re: [regression] significant delays when secureboot is enabled since 6.10
2024-09-10 12:41 ` Linux regression tracking (Thorsten Leemhuis)
@ 2024-09-10 22:40 ` Jarkko Sakkinen
0 siblings, 0 replies; 34+ messages in thread
From: Jarkko Sakkinen @ 2024-09-10 22:40 UTC (permalink / raw)
To: Linux regressions mailing list, James Bottomley
Cc: keyrings, linux-integrity@vger.kernel.org, LKML, Pengyu Ma,
Roberto Sassu

On Tue Sep 10, 2024 at 3:41 PM EEST, Linux regression tracking (Thorsten Leemhuis) wrote:
> FWIW (mainly for others that later find this thread on lore), I'm pretty
> sure James meant CONFIG_TCG_TPM2_HMAC.

Yeah, exactly.

BR, Jarkko

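For readers who land on this thread and want to know whether their own kernel was built with the option discussed above, the following is a minimal sketch, not something posted in the thread. It assumes the usual Kconfig text layout (`CONFIG_FOO=y` for enabled options, `# CONFIG_FOO is not set` for disabled ones); the function name `option_state` is hypothetical and chosen only for illustration. The option name `CONFIG_TCG_TPM2_HMAC` is the one confirmed in the messages above.

```python
import re

# Option name as corrected in the thread (TPM2, not TPM).
OPTION = "CONFIG_TCG_TPM2_HMAC"


def option_state(config_text: str, option: str = OPTION) -> str:
    """Report the state of a Kconfig option found in .config-style text.

    Returns the value after '=' (for example 'y' or 'm'), 'n' when the
    option is explicitly marked as not set, or 'absent' when the option
    does not appear at all. Hypothetical helper, assuming the standard
    .config layout.
    """
    set_re = re.compile(rf"^{re.escape(option)}=(.*)$")
    unset_line = f"# {option} is not set"
    state = "absent"
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == unset_line:
            state = "n"
            continue
        match = set_re.match(line)
        if match:
            # A later assignment overrides an earlier one.
            state = match.group(1)
    return state
```

The text to pass in would typically come from the kernel build's `.config`; where a distribution ships the running kernel's configuration (commonly under `/boot`, though that location is an assumption here) the same function applies. A result of `y` means HMAC and encryption sessions are compiled in, which is the configuration the regression report concerns.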
end of thread, other threads:[~2024-09-22 14:11 UTC | newest]

Thread overview: 34+ messages
2024-09-10 9:01 [regression] significant delays when secureboot is enabled since 6.10 Linux regression tracking (Thorsten Leemhuis)
2024-09-10 9:05 ` Roberto Sassu
2024-09-10 12:39 ` Jarkko Sakkinen
2024-09-10 12:48 ` Jarkko Sakkinen
2024-09-10 12:57 ` James Bottomley
2024-09-10 13:28 ` Jarkko Sakkinen
2024-09-11 8:53 ` Roberto Sassu
2024-09-11 12:21 ` James Bottomley
2024-09-12 13:16 ` Jarkko Sakkinen
2024-09-12 13:26 ` James Bottomley
2024-09-12 13:36 ` Roberto Sassu
2024-09-12 14:13 ` James Bottomley
2024-09-12 14:52 ` Roberto Sassu
2024-09-12 14:26 ` Jarkko Sakkinen
2024-09-14 10:42 ` Jarkko Sakkinen
2024-09-14 10:51 ` Jarkko Sakkinen
2024-09-14 10:58 ` Jarkko Sakkinen
2024-09-11 15:14 ` Jarkko Sakkinen
2024-09-12 8:13 ` Roberto Sassu
2024-09-12 14:23 ` Jarkko Sakkinen
2024-09-13 20:50 ` Jarkko Sakkinen
2024-09-13 22:06 ` Jarkko Sakkinen
2024-09-15 9:43 ` Jarkko Sakkinen
2024-09-15 10:07 ` Jarkko Sakkinen
2024-09-15 13:59 ` James Bottomley
2024-09-15 14:50 ` Jarkko Sakkinen
2024-09-15 14:55 ` Jarkko Sakkinen
2024-09-15 15:00 ` James Bottomley
2024-09-15 16:22 ` Jarkko Sakkinen
2024-09-21 15:40 ` Jarkko Sakkinen
2024-09-22 14:11 ` Jarkko Sakkinen
2024-09-10 12:22 ` James Bottomley
2024-09-10 12:41 ` Linux regression tracking (Thorsten Leemhuis)
2024-09-10 22:40 ` Jarkko Sakkinen