All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Richard W.M. Jones" <rjones@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: qemu-devel@nongnu.org, Thomas Huth <thuth@redhat.com>
Subject: Re: [PATCH] crypto: fix bogus error benchmarking pbkdf on fast machines
Date: Wed, 8 Jan 2025 18:59:04 +0000	[thread overview]
Message-ID: <20250108185904.GH1450@redhat.com> (raw)
In-Reply-To: <20250108184354.997818-1-berrange@redhat.com>

On Wed, Jan 08, 2025 at 06:43:54PM +0000, Daniel P. Berrangé wrote:
> We're seeing periodic reports of errors like:
> 
> $ qemu-img create -f luks --object secret,data=123456,id=sec0 \
>                   -o key-secret=sec0 luks-info.img 1M
>   Formatting 'luks-info.img', fmt=luks size=1048576 key-secret=sec0
>   qemu-img: luks-info.img: Unable to get accurate CPU usage
> 
> This error message comes from a recent attempt to workaround a
> kernel bug with measuring rusage in long running processes:
> 
>   commit c72cab5ad9f849bbcfcf4be7952b8b8946cc626e
>   Author: Tiago Pasqualini <tiago.pasqualini@canonical.com>
>   Date:   Wed Sep 4 20:52:30 2024 -0300
> 
>     crypto: run qcrypto_pbkdf2_count_iters in a new thread
> 
> Unfortunately this has a subtle bug on machines which are very fast.
> 
> On the first time around the loop, the 'iterations' value is quite
> small (1 << 15), and so will run quite fast. Testing has shown that
> some machines can complete this benchmarking task in as little as
> 7 milliseconds.
> 
> Unfortunately the 'getrusage' data is not updated at the time of
> the 'getrusage' call, it is done asynchronously by the schedular.

scheduler (https://en.wiktionary.org/wiki/scheduler)

> The 7 millisecond completion time for the benchmark is short
> enough that 'getrusage' sometimes reports 0 accumulated execution
> time.
> 
> As a result the 'delay_ms == 0' sanity check in the above commit
> is triggering non-deterministically on such machines.
> 
> The benchmarking loop intended to run multiple times, increasing
> the 'iterations' value until the benchmark ran for > 500 ms, but
> the sanity check doesn't allow this to happen.
> 
> To fix it, we keep a loop counter and only run the sanity check
> after we've been around the loop more than 5 times. At that point
> the 'iterations' value is high enough that even with infrequent
> updates of 'getrusage' accounting data on fast machines, we should
> see a non-zero value.
> 
> Reported-by: Thomas Huth <thuth@redhat.com>
> Reported-by: Richard W.M. Jones <rjones@redhat.com>

I think it was originally reported by a Debian automated tool in
https://lore.kernel.org/qemu-devel/ffe542bb-310c-4616-b0ca-13182f849fd1@redhat.com/

> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  crypto/pbkdf.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/crypto/pbkdf.c b/crypto/pbkdf.c
> index 0dd7c3aeaa..b285958319 100644
> --- a/crypto/pbkdf.c
> +++ b/crypto/pbkdf.c
> @@ -107,7 +107,7 @@ static void *threaded_qcrypto_pbkdf2_count_iters(void *data)
>      size_t nsalt = iters_data->nsalt;
>      size_t nout = iters_data->nout;
>      Error **errp = iters_data->errp;
> -
> +    size_t scaled = 0;
>      uint64_t ret = -1;
>      g_autofree uint8_t *out = g_new(uint8_t, nout);
>      uint64_t iterations = (1 << 15);
> @@ -131,7 +131,17 @@ static void *threaded_qcrypto_pbkdf2_count_iters(void *data)
>  
>          delta_ms = end_ms - start_ms;
>  
> -        if (delta_ms == 0) { /* sanity check */
> +        /*
> +         * For very small 'iterations' values, CPU (or crypto
> +         * accelerator) might be fast enough that the schedular

scheduler as above

> +         * hasn't incremented getrusage() data, or incremented
> +         * it by a very small amount, resulting in delta_ms == 0.
> +         * Once we've scaled 'iterations' x10, 5 times, we really
> +         * should be seeing delta_ms != 0, so sanity check at
> +         * that point.
> +         */
> +        if (scaled > 5 &&
> +            delta_ms == 0) { /* sanity check */
>              error_setg(errp, "Unable to get accurate CPU usage");
>              goto cleanup;
>          } else if (delta_ms > 500) {
> @@ -141,6 +151,7 @@ static void *threaded_qcrypto_pbkdf2_count_iters(void *data)
>          } else {
>              iterations = (iterations * 1000 / delta_ms);
>          }
> +        scaled++;
>      }
>  
>      iterations = iterations * 1000 / delta_ms;
> -- 
> 2.47.1

I don't have a good way to test this unfortunately as we only saw it
once, on s390x, in a build system.

You might want to add links to the bug reports and earlier threads
about this:

https://lore.kernel.org/qemu-devel/ffe542bb-310c-4616-b0ca-13182f849fd1@redhat.com/
https://lore.kernel.org/qemu-devel/ffe542bb-310c-4616-b0ca-13182f849fd1@redhat.com/
https://bugzilla.redhat.com/show_bug.cgi?id=2336437

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html



  reply	other threads:[~2025-01-08 18:59 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-08 18:43 [PATCH] crypto: fix bogus error benchmarking pbkdf on fast machines Daniel P. Berrangé
2025-01-08 18:59 ` Richard W.M. Jones [this message]
2025-01-08 19:41 ` Thomas Huth
2025-01-22 19:19 ` Michael Tokarev
2025-01-22 20:16   ` Daniel P. Berrangé

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250108185904.GH1450@redhat.com \
    --to=rjones@redhat.com \
    --cc=berrange@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=thuth@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.