[PATCH] tpm: Fix the timeout & use ktime

linux-integrity.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] tpm: Fix the timeout & use ktime
@ 2025-06-11 16:25 Orlov, Ivan
  2025-06-11 17:02 ` Jarkko Sakkinen
  2025-06-20 13:24 ` Jonathan McDowell
  0 siblings, 2 replies; 5+ messages in thread
From: Orlov, Ivan @ 2025-06-11 16:25 UTC (permalink / raw)
  To: peterhuewe@gmx.de, jarkko@kernel.org
  Cc: Orlov, Ivan, jgg@ziepe.ca, linux-integrity@vger.kernel.org,
	linux-kernel@vger.kernel.org, Woodhouse, David

The current implementation of timeout detection works in the following
way:

1. Read completion status. If completed, return the data
2. Sleep for some time (usleep_range)
3. Check for timeout using current jiffies value. Return an error if
   timed out
4. Goto 1

usleep_range doesn't guarantee it's always going to wake up strictly in
(min, max) range, so such a situation is possible:

1. Driver reads completion status. No completion yet
2. Process sleeps indefinitely. In the meantime, TPM responds
3. We check for timeout without checking for the completion again.
   Result is lost.

Such a situation also happens for the guest VMs: if vCPU goes to sleep
and doesn't get scheduled for some time, the guest TPM driver will
timeout instantly after waking up without checking for the completion
(which may already be in place).

Instead, perform the check in the following way:

1. Read the current timestamp
2. Read the completion status. If completed, return the result
3. Sleep
4. Check if the timestamp read at step 1 exceeds the timeout. Return
   an error if it does
5. Goto 1

Also, use ktime instead of jiffes as a more reliable and precise timing
source.

Signed-off-by: Ivan Orlov <iorlov@amazon.com>
---
 drivers/char/tpm/tpm-interface.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
index 8d7e4da6ed53..959330212a16 100644
--- a/drivers/char/tpm/tpm-interface.c
+++ b/drivers/char/tpm/tpm-interface.c
@@ -88,7 +88,8 @@ static ssize_t tpm_try_transmit(struct tpm_chip *chip, void *buf, size_t bufsiz)
 	int rc;
 	ssize_t len = 0;
 	u32 count, ordinal;
-	unsigned long stop;
+	ktime_t timeout, curr_time;
+	unsigned int ord_dur_us;

 	if (bufsiz < TPM_HEADER_SIZE)
 		return -EINVAL;
@@ -126,8 +127,16 @@ static ssize_t tpm_try_transmit(struct tpm_chip *chip, void *buf, size_t bufsiz)
 	if (chip->flags & TPM_CHIP_FLAG_IRQ)
 		goto out_recv;

-	stop = jiffies + tpm_calc_ordinal_duration(chip, ordinal);
+	ord_dur_us = jiffies_to_usecs(tpm_calc_ordinal_duration(chip, ordinal));
+	timeout = ktime_add_us(ktime_get(), ord_dur_us);
 	do {
+		/*
+		 * Save the time of the completion check. This way even if CPU
+		 * goes to sleep indefinitely on tpm_sleep, the driver will
+		 * check for completion one more time instead of timing out
+		 * instantly after waking up.
+		 */
+		curr_time = ktime_get();
 		u8 status = tpm_chip_status(chip);
 		if ((status & chip->ops->req_complete_mask) ==
 		    chip->ops->req_complete_val)
@@ -140,7 +149,7 @@ static ssize_t tpm_try_transmit(struct tpm_chip *chip, void *buf, size_t bufsiz)

 		tpm_msleep(TPM_TIMEOUT_POLL);
 		rmb();
-	} while (time_before(jiffies, stop));
+	} while (ktime_before(curr_time, timeout));

 	tpm_chip_cancel(chip);
 	dev_err(&chip->dev, "Operation Timed out\n");
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] tpm: Fix the timeout & use ktime
  2025-06-11 16:25 [PATCH] tpm: Fix the timeout & use ktime Orlov, Ivan
@ 2025-06-11 17:02 ` Jarkko Sakkinen
  2025-06-20 17:19   ` Orlov, Ivan
  2025-06-20 13:24 ` Jonathan McDowell
  1 sibling, 1 reply; 5+ messages in thread
From: Jarkko Sakkinen @ 2025-06-11 17:02 UTC (permalink / raw)
  To: Orlov, Ivan
  Cc: peterhuewe@gmx.de, jgg@ziepe.ca, linux-integrity@vger.kernel.org,
	linux-kernel@vger.kernel.org, Woodhouse, David

On Wed, Jun 11, 2025 at 04:25:24PM +0000, Orlov, Ivan wrote:
> The current implementation of timeout detection works in the following
> way:
> 
> 1. Read completion status. If completed, return the data
> 2. Sleep for some time (usleep_range)
> 3. Check for timeout using current jiffies value. Return an error if
>    timed out
> 4. Goto 1
> 
> usleep_range doesn't guarantee it's always going to wake up strictly in
> (min, max) range, so such a situation is possible:
> 
> 1. Driver reads completion status. No completion yet
> 2. Process sleeps indefinitely. In the meantime, TPM responds
> 3. We check for timeout without checking for the completion again.
>    Result is lost.
> 
> Such a situation also happens for the guest VMs: if vCPU goes to sleep
> and doesn't get scheduled for some time, the guest TPM driver will
> timeout instantly after waking up without checking for the completion
> (which may already be in place).

Got it.

> 
> Instead, perform the check in the following way:
> 
> 1. Read the current timestamp
> 2. Read the completion status. If completed, return the result
> 3. Sleep
> 4. Check if the timestamp read at step 1 exceeds the timeout. Return
>    an error if it does
> 5. Goto 1
> 
> Also, use ktime instead of jiffes as a more reliable and precise timing
> source.

"also", i.e. a logically separate change which should be split up to
a separate patch.

> 
> Signed-off-by: Ivan Orlov <iorlov@amazon.com>
> ---
>  drivers/char/tpm/tpm-interface.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
> index 8d7e4da6ed53..959330212a16 100644
> --- a/drivers/char/tpm/tpm-interface.c
> +++ b/drivers/char/tpm/tpm-interface.c
> @@ -88,7 +88,8 @@ static ssize_t tpm_try_transmit(struct tpm_chip *chip, void *buf, size_t bufsiz)
>  	int rc;
>  	ssize_t len = 0;
>  	u32 count, ordinal;
> -	unsigned long stop;
> +	ktime_t timeout, curr_time;
> +	unsigned int ord_dur_us;
>  
>  	if (bufsiz < TPM_HEADER_SIZE)
>  		return -EINVAL;
> @@ -126,8 +127,16 @@ static ssize_t tpm_try_transmit(struct tpm_chip *chip, void *buf, size_t bufsiz)
>  	if (chip->flags & TPM_CHIP_FLAG_IRQ)
>  		goto out_recv;
>  
> -	stop = jiffies + tpm_calc_ordinal_duration(chip, ordinal);
> +	ord_dur_us = jiffies_to_usecs(tpm_calc_ordinal_duration(chip, ordinal));
> +	timeout = ktime_add_us(ktime_get(), ord_dur_us);
>  	do {
> +		/*
> +		 * Save the time of the completion check. This way even if CPU
> +		 * goes to sleep indefinitely on tpm_sleep, the driver will
> +		 * check for completion one more time instead of timing out
> +		 * instantly after waking up.
> +		 */
> +		curr_time = ktime_get();
>  		u8 status = tpm_chip_status(chip);
>  		if ((status & chip->ops->req_complete_mask) ==
>  		    chip->ops->req_complete_val)
> @@ -140,7 +149,7 @@ static ssize_t tpm_try_transmit(struct tpm_chip *chip, void *buf, size_t bufsiz)
>  
>  		tpm_msleep(TPM_TIMEOUT_POLL);
>  		rmb();
> -	} while (time_before(jiffies, stop));
> +	} while (ktime_before(curr_time, timeout));


Wouldn't it be simpler fix to just check completion after dropping out
of the loop?

	} while (time_before(jiffies, stop));

	if (tpm_transmit_completed(chip))
		goto out_recv;

And declare this before tpm_try_transmit():

static bool tpm_transmit_completed(struct tpm_chip *chip)
{
	u8 status = tpm_chip_status(chip);
	
	return (status & chip->ops->req_complete_mask) == chip->ops->req_complete_val;
}

>  
>  	tpm_chip_cancel(chip);
>  	dev_err(&chip->dev, "Operation Timed out\n");
> -- 
> 2.43.0
> 

BR, Jarkko

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] tpm: Fix the timeout & use ktime
  2025-06-11 16:25 [PATCH] tpm: Fix the timeout & use ktime Orlov, Ivan
  2025-06-11 17:02 ` Jarkko Sakkinen
@ 2025-06-20 13:24 ` Jonathan McDowell
  2025-06-20 17:23   ` Orlov, Ivan
  1 sibling, 1 reply; 5+ messages in thread
From: Jonathan McDowell @ 2025-06-20 13:24 UTC (permalink / raw)
  To: Orlov, Ivan
  Cc: peterhuewe@gmx.de, jarkko@kernel.org, jgg@ziepe.ca,
	linux-integrity@vger.kernel.org, linux-kernel@vger.kernel.org,
	Woodhouse, David

On Wed, Jun 11, 2025 at 04:25:24PM +0000, Orlov, Ivan wrote:
>The current implementation of timeout detection works in the following
>way:
>
>1. Read completion status. If completed, return the data
>2. Sleep for some time (usleep_range)
>3. Check for timeout using current jiffies value. Return an error if
>   timed out
>4. Goto 1
>
>usleep_range doesn't guarantee it's always going to wake up strictly in
>(min, max) range, so such a situation is possible:
>
>1. Driver reads completion status. No completion yet
>2. Process sleeps indefinitely. In the meantime, TPM responds
>3. We check for timeout without checking for the completion again.
>   Result is lost.

This looks similar to the issue I fixed in 7146dffa875c ('Fix timeout
handling when waiting for TPM status'), I assume you're actually seeing
it in your systems? I think we're starting to see it (rarely) now the
other issues are fixed in our builds. As a similar approach does the
following work?

diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
index 8d7e4da6ed53..18ae0767fa60 100644
--- a/drivers/char/tpm/tpm-interface.c
+++ b/drivers/char/tpm/tpm-interface.c
@@ -127,7 +127,7 @@ static ssize_t tpm_try_transmit(struct tpm_chip *chip, void *buf, size_t bufsiz)
  		goto out_recv;
  
  	stop = jiffies + tpm_calc_ordinal_duration(chip, ordinal);
-	do {
+	while (true) {
  		u8 status = tpm_chip_status(chip);
  		if ((status & chip->ops->req_complete_mask) ==
  		    chip->ops->req_complete_val)
@@ -138,9 +138,12 @@ static ssize_t tpm_try_transmit(struct tpm_chip *chip, void *buf, size_t bufsiz)
  			return -ECANCELED;
  		}
  
+		if (time_after(jiffies, stop))
+			break;
+
  		tpm_msleep(TPM_TIMEOUT_POLL);
  		rmb();
-	} while (time_before(jiffies, stop));
+	}
  
  	tpm_chip_cancel(chip);
  	dev_err(&chip->dev, "Operation Timed out\n");

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] tpm: Fix the timeout & use ktime
  2025-06-11 17:02 ` Jarkko Sakkinen
@ 2025-06-20 17:19   ` Orlov, Ivan
  0 siblings, 0 replies; 5+ messages in thread
From: Orlov, Ivan @ 2025-06-20 17:19 UTC (permalink / raw)
  To: Jarkko Sakkinen, Orlov, Ivan
  Cc: peterhuewe@gmx.de, jgg@ziepe.ca, linux-integrity@vger.kernel.org,
	linux-kernel@vger.kernel.org, Woodhouse, David

On 11/06/2025 18:02, Jarkko Sakkinen wrote:
>> Instead, perform the check in the following way:
>>
>> 1. Read the current timestamp
>> 2. Read the completion status. If completed, return the result
>> 3. Sleep
>> 4. Check if the timestamp read at step 1 exceeds the timeout. Return
>>     an error if it does
>> 5. Goto 1
>>
>> Also, use ktime instead of jiffes as a more reliable and precise timing
>> source.
> 
> "also", i.e. a logically separate change which should be split up to
> a separate patch.
> 

Got it, will send this enhancement as a separate patch.

>> +		curr_time = ktime_get();
>>   		u8 status = tpm_chip_status(chip);
>>   		if ((status & chip->ops->req_complete_mask) ==
>>   		    chip->ops->req_complete_val)
>> @@ -140,7 +149,7 @@ static ssize_t tpm_try_transmit(struct tpm_chip *chip, void *buf, size_t bufsiz)
>>   
>>   		tpm_msleep(TPM_TIMEOUT_POLL);
>>   		rmb();
>> -	} while (time_before(jiffies, stop));
>> +	} while (ktime_before(curr_time, timeout));
> 
> 
> Wouldn't it be simpler fix to just check completion after dropping out
> of the loop?
> 

yes, this should also solve the problem without taking additional space 
on the stack with new vars. Will update in V2, thanks!

> And declare this before tpm_try_transmit():
> 
> static bool tpm_transmit_completed(struct tpm_chip *chip)
> {
> 	u8 status = tpm_chip_status(chip);
> 	
> 	return (status & chip->ops->req_complete_mask) == chip->ops->req_complete_val;
> }
> 

Cool, will do.

Thanks for the review and sorry for the late reply.

--
Kind regards,
Ivan Orlov


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] tpm: Fix the timeout & use ktime
  2025-06-20 13:24 ` Jonathan McDowell
@ 2025-06-20 17:23   ` Orlov, Ivan
  0 siblings, 0 replies; 5+ messages in thread
From: Orlov, Ivan @ 2025-06-20 17:23 UTC (permalink / raw)
  To: Jonathan McDowell, Orlov, Ivan
  Cc: peterhuewe@gmx.de, jarkko@kernel.org, jgg@ziepe.ca,
	linux-integrity@vger.kernel.org, linux-kernel@vger.kernel.org,
	Woodhouse, David

On 20/06/2025 14:24, Jonathan McDowell wrote:
> This looks similar to the issue I fixed in 7146dffa875c ('Fix timeout
> handling when waiting for TPM status'), I assume you're actually seeing
> it in your systems? I think we're starting to see it (rarely) now the
> other issues are fixed in our builds. As a similar approach does the
> following work?
> 
> diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm- 
> interface.c
> index 8d7e4da6ed53..18ae0767fa60 100644
> --- a/drivers/char/tpm/tpm-interface.c
> +++ b/drivers/char/tpm/tpm-interface.c
> @@ -127,7 +127,7 @@ static ssize_t tpm_try_transmit(struct tpm_chip 
> *chip, void *buf, size_t bufsiz)
>           goto out_recv;
> 
>       stop = jiffies + tpm_calc_ordinal_duration(chip, ordinal);
> -    do {
> +    while (true) {
>           u8 status = tpm_chip_status(chip);
>           if ((status & chip->ops->req_complete_mask) ==
>               chip->ops->req_complete_val)
> @@ -138,9 +138,12 @@ static ssize_t tpm_try_transmit(struct tpm_chip 
> *chip, void *buf, size_t bufsiz)
>               return -ECANCELED;
>           }
> 

<-- This would solve the problem with usleep_range taking arbitrary 
time, but unfortunately won't solve it for the guest VM scenario: if 
vCPU gets interrupted here, then it still will account the steal time 
and time out when it's woken up before checking for completion again

> +        if (time_after(jiffies, stop))
> +            break;
> +
>           tpm_msleep(TPM_TIMEOUT_POLL);
>           rmb();
> -    } while (time_before(jiffies, stop));
> +    }
> 
>       tpm_chip_cancel(chip);
>       dev_err(&chip->dev, "Operation Timed out\n");

--
Kind regards,
Ivan Orlov


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-06-20 17:23 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-11 16:25 [PATCH] tpm: Fix the timeout & use ktime Orlov, Ivan
2025-06-11 17:02 ` Jarkko Sakkinen
2025-06-20 17:19   ` Orlov, Ivan
2025-06-20 13:24 ` Jonathan McDowell
2025-06-20 17:23   ` Orlov, Ivan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).