[PATCH AUTOSEL 6.16 07/35] tpm: Check for completion after timeout

Linux kernel -stable discussions
 help / color / mirror / Atom feed

From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Jarkko Sakkinen <jarkko@kernel.org>,
	Ivan Orlov <ivan.orlov0322@gmail.com>,
	Jonathan McDowell <noodles@meta.com>,
	Sasha Levin <sashal@kernel.org>,
	peterhuewe@gmx.de, linux-integrity@vger.kernel.org
Subject: [PATCH AUTOSEL 6.16 07/35] tpm: Check for completion after timeout
Date: Sun,  3 Aug 2025 17:17:07 -0400	[thread overview]
Message-ID: <20250803211736.3545028-7-sashal@kernel.org> (raw)
In-Reply-To: <20250803211736.3545028-1-sashal@kernel.org>

From: Jarkko Sakkinen <jarkko@kernel.org>

[ Upstream commit d4640c394f23b202a89512346cf28f6622a49031 ]

The current implementation of timeout detection works in the following
way:

1. Read completion status. If completed, return the data
2. Sleep for some time (usleep_range)
3. Check for timeout using current jiffies value. Return an error if
   timed out
4. Goto 1

usleep_range doesn't guarantee it's always going to wake up strictly in
(min, max) range, so such a situation is possible:

1. Driver reads completion status. No completion yet
2. Process sleeps indefinitely. In the meantime, TPM responds
3. We check for timeout without checking for the completion again.
   Result is lost.

Such a situation also happens for the guest VMs: if vCPU goes to sleep
and doesn't get scheduled for some time, the guest TPM driver will
timeout instantly after waking up without checking for the completion
(which may already be in place).

Perform the completion check once again after exiting the busy loop in
order to give the device the last chance to send us some data.

Since now we check for completion in two places, extract this check into
a separate function.

Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Reviewed-by: Jonathan McDowell <noodles@meta.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now let me analyze the code changes in detail to understand the race
condition being fixed.

**Backport Status: YES**

This commit should be backported to stable kernel trees. Here's my
extensive analysis:

## Bug Analysis

The commit fixes a critical race condition in the TPM driver's timeout
handling mechanism that can cause data loss. The bug occurs in the
following scenario:

1. **The Race Window**: Between the last completion status check in the
   busy loop (line 149-151 in the original code) and the timeout check
   (line 160), there's a window where:
   - The driver checks status and finds no completion
   - The process sleeps via `tpm_msleep(TPM_TIMEOUT_POLL)`
   - During sleep, the TPM device completes the operation
   - When the process wakes up, `time_before(jiffies, stop)` returns
     false
   - The driver immediately returns `-ETIME` without checking completion
     again

2. **VM-Specific Impact**: The commit message specifically mentions this
   affects virtual machines where vCPUs can be descheduled for extended
   periods. When the vCPU resumes, the timeout may have expired while
   the TPM response is actually ready.

## Code Changes Analysis

The fix involves:

1. **New Helper Function** (lines 85-90): `tpm_transmit_completed()`
   extracts the completion check logic into a reusable function:
  ```c
  static bool tpm_transmit_completed(u8 status, struct tpm_chip *chip)
  {
  u8 status_masked = status & chip->ops->req_complete_mask;
  return status_masked == chip->ops->req_complete_val;
  }
  ```

2. **Additional Completion Check** (lines 162-167): After the timeout
   loop exits, the code now performs one final completion check:
  ```c
  /*
    - Check for completion one more time, just in case the device
      reported
    - it while the driver was sleeping in the busy loop above.
    */
   if (tpm_transmit_completed(tpm_chip_status(chip), chip))
       goto out_recv;
   ```

## Stable Backport Criteria Analysis

1. **Fixes a real bug**: ✓ - This fixes a race condition that causes TPM
   operations to fail spuriously
2. **User impact**: ✓ - TPM operations failing due to this race can
   affect:
   - System boot (TPM measurements)
   - Cryptographic operations
   - Attestation services
   - Secure boot processes
3. **Small and contained**: ✓ - The fix is minimal:
   - Adds one helper function
   - Adds one additional check after the loop
   - No architectural changes
4. **Low regression risk**: ✓ - The additional check only runs when
   timeout occurs, adding minimal overhead
5. **No new features**: ✓ - Only fixes existing functionality
6. **Critical subsystem**: ✓ - TPM is security-critical infrastructure

## Additional Considerations

- The fix is particularly important for virtualized environments where
  scheduling delays are common
- The race condition can cause intermittent failures that are difficult
  to debug
- The fix follows the principle of "check one more time" which is a
  common pattern for avoiding race conditions
- No follow-up fixes were needed for this commit, indicating it's stable

This commit meets all the criteria for stable backporting and addresses
a real-world bug that affects system reliability, especially in
virtualized environments.

 drivers/char/tpm/tpm-interface.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
index 8d7e4da6ed53..8d18b33aa62d 100644
--- a/drivers/char/tpm/tpm-interface.c
+++ b/drivers/char/tpm/tpm-interface.c
@@ -82,6 +82,13 @@ static bool tpm_chip_req_canceled(struct tpm_chip *chip, u8 status)
 	return chip->ops->req_canceled(chip, status);
 }

+static bool tpm_transmit_completed(u8 status, struct tpm_chip *chip)
+{
+	u8 status_masked = status & chip->ops->req_complete_mask;
+
+	return status_masked == chip->ops->req_complete_val;
+}
+
 static ssize_t tpm_try_transmit(struct tpm_chip *chip, void *buf, size_t bufsiz)
 {
 	struct tpm_header *header = buf;
@@ -129,8 +136,7 @@ static ssize_t tpm_try_transmit(struct tpm_chip *chip, void *buf, size_t bufsiz)
 	stop = jiffies + tpm_calc_ordinal_duration(chip, ordinal);
 	do {
 		u8 status = tpm_chip_status(chip);
-		if ((status & chip->ops->req_complete_mask) ==
-		    chip->ops->req_complete_val)
+		if (tpm_transmit_completed(status, chip))
 			goto out_recv;

 		if (tpm_chip_req_canceled(chip, status)) {
@@ -142,6 +148,13 @@ static ssize_t tpm_try_transmit(struct tpm_chip *chip, void *buf, size_t bufsiz)
 		rmb();
 	} while (time_before(jiffies, stop));

+	/*
+	 * Check for completion one more time, just in case the device reported
+	 * it while the driver was sleeping in the busy loop above.
+	 */
+	if (tpm_transmit_completed(tpm_chip_status(chip), chip))
+		goto out_recv;
+
 	tpm_chip_cancel(chip);
 	dev_err(&chip->dev, "Operation Timed out\n");
 	return -ETIME;
-- 
2.39.5

next prev parent reply	other threads:[~2025-08-03 21:17 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-03 21:17 [PATCH AUTOSEL 6.16 01/35] hfs: fix general protection fault in hfs_find_init() Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 02/35] hfs: fix slab-out-of-bounds in hfs_bnode_read() Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 03/35] hfsplus: fix slab-out-of-bounds in hfsplus_bnode_read() Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 04/35] hfsplus: fix slab-out-of-bounds read in hfsplus_uni2asc() Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 05/35] hfsplus: don't use BUG_ON() in hfsplus_create_attributes_file() Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 06/35] arm64: Handle KCOV __init vs inline mismatches Sasha Levin
2025-08-03 21:17 ` Sasha Levin [this message]
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 08/35] tpm: tpm_crb_ffa: try to probe tpm_crb_ffa when it's built-in Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 09/35] firmware: arm_ffa: Change initcall level of ffa_init() to rootfs_initcall Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 10/35] btrfs: fix -ENOSPC mmap write failure on NOCOW files/extents Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 11/35] smb/server: avoid deadlock when linking with ReplaceIfExists Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 12/35] nvme-pci: try function level reset on init failure Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 13/35] dm-stripe: limit chunk_sectors to the stripe size Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 14/35] md/raid10: set chunk_sectors limit Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 15/35] nvme-tcp: log TLS handshake failures at error level Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 16/35] gfs2: Validate i_depth for exhash directories Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 17/35] gfs2: Set .migrate_folio in gfs2_{rgrp,meta}_aops Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 18/35] md: call del_gendisk in control path Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 19/35] loop: Avoid updating block size under exclusive owner Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 20/35] udf: Verify partition map count Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 21/35] drbd: add missing kref_get in handle_write_conflicts Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 22/35] hfs: fix not erasing deleted b-tree node issue Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 23/35] better lockdep annotations for simple_recursive_removal() Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 24/35] ata: ahci: Disallow LPM policy control if not supported Sasha Levin

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:8d7e4da6ed5 dfblob:8d18b33aa62 )
 OR (
bs:"[PATCH AUTOSEL 6.16 07/35] tpm: Check for completion after timeout" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250803211736.3545028-7-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=ivan.orlov0322@gmail.com \
    --cc=jarkko@kernel.org \
    --cc=linux-integrity@vger.kernel.org \
    --cc=noodles@meta.com \
    --cc=patches@lists.linux.dev \
    --cc=peterhuewe@gmx.de \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox