Date: Tue, 10 Dec 2024 18:45:46 +0000
From: Daniel P. Berrangé
To: Thomas Huth
Cc: Tiago Pasqualini, qemu-devel@nongnu.org
Subject: Re: [PATCH v2] crypto: run qcrypto_pbkdf2_count_iters in a new thread
References: <20240904235230.199672-1-tiago.pasqualini@canonical.com>

On Tue, Dec 10, 2024 at 06:29:49PM +0100, Thomas Huth wrote:
> On 05/09/2024 01.52, Tiago Pasqualini wrote:
> > CPU time accounting in the kernel has been demonstrated to have a
> > sawtooth pattern[1][2]. This can cause the getrusage system call to
> > not be as accurate as we are expecting, which can cause this calculation
> > to stall.
> >
> > The kernel discussions shows that this inaccuracy happens when CPU time
> > gets big enough, so this patch changes qcrypto_pbkdf2_count_iters to run
> > in a fresh thread to avoid this inaccuracy. It also adds a sanity check
> > to fail the process if CPU time is not accounted.
> >
> > [1] https://lore.kernel.org/lkml/159231011694.16989.16351419333851309713.tip-bot2@tip-bot2/
> > [2] https://lore.kernel.org/lkml/20221226031010.4079885-1-maxing.lan@bytedance.com/t/#m1c7f2fdc0ea742776a70fd1aa2a2e414c437f534
> >
> > Resolves: #2398
> > Signed-off-by: Tiago Pasqualini
> > ---
> >  crypto/pbkdf.c | 53 +++++++++++++++++++++++++++++++++++++++++++-------
> >  1 file changed, 46 insertions(+), 7 deletions(-)
> > +static void *threaded_qcrypto_pbkdf2_count_iters(void *data)
> >  {
> > +    CountItersData *iters_data = (CountItersData *) data;
> > +    QCryptoHashAlgorithm hash = iters_data->hash;
> > +    const uint8_t *key = iters_data->key;
> > +    size_t nkey = iters_data->nkey;
> > +    const uint8_t *salt = iters_data->salt;
> > +    size_t nsalt = iters_data->nsalt;
> > +    size_t nout = iters_data->nout;
> > +    Error **errp = iters_data->errp;
> > +
> >      uint64_t ret = -1;
> >      g_autofree uint8_t *out = g_new(uint8_t, nout);
> >      uint64_t iterations = (1 << 15);
> > @@ -114,7 +131,10 @@ uint64_t qcrypto_pbkdf2_count_iters(QCryptoHashAlgorithm hash,
> >          delta_ms = end_ms - start_ms;
> > -        if (delta_ms > 500) {
> > +        if (delta_ms == 0) { /* sanity check */
> > +            error_setg(errp, "Unable to get accurate CPU usage");
>
> Hi!
>
> While running "make check -j12 SPEED=slow" on a s390x host, I got:
>
> 3/657 qemu:unit / test-crypto-block ERROR 0.27s
> killed by signal 6 SIGABRT
> >>> G_TEST_SLOW=1 UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1
> MALLOC_PERTURB_=150 G_TEST_BUILDDIR=/home/thuth/s390x/qemu/tests/unit
> MESON_TEST_ITERATION=1 MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1
> ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1
> G_TEST_SRCDIR=/home/thuth/devel/qemu/tests/unit
> /home/thuth/s390x/qemu/tests/unit/test-crypto-block --tap -k
> ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
> ✀ ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
> stderr:
> Unexpected error in threaded_qcrypto_pbkdf2_count_iters() at
> ../../devel/qemu/crypto/pbkdf.c:135:
> /home/thuth/s390x/qemu/tests/unit/test-crypto-block: Unable to get accurate
> CPU usage
>
> (test program exited with status code -6)
>
> TAP parsing error: Too few tests run (expected 27, got 1)
> ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
>
> Could we handle this more gracefully, please? I don't think that this should
> fail the unit tests, should it?

This is a scenario that is not supposed to ever happen. It means we've
had two calls to getrusage that returned exactly the same CPU time.
This should be impossible, as we've definitely burnt lots of CPU
between these two calls.

Is this reliably reproducible only on s390x?

With regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
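
For anyone wanting to check a host outside the QEMU test suite, the kind of
measurement discussed above can be sketched as a small standalone program.
This is only an illustration, not the code from crypto/pbkdf.c: the helper
names process_cpu_ms() and burn_and_measure() are made up here, and the
sketch queries the whole process with RUSAGE_SELF, which matches a per-thread
figure only in a single-threaded program.

```c
#include <stdint.h>
#include <sys/resource.h>
#include <sys/time.h>

/*
 * Hypothetical helper: CPU time (user + system) accounted to this
 * process so far, in milliseconds. Returns -1 if getrusage() fails.
 */
static int64_t process_cpu_ms(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) < 0) {
        return -1;
    }
    return (int64_t)ru.ru_utime.tv_sec * 1000 + ru.ru_utime.tv_usec / 1000 +
           (int64_t)ru.ru_stime.tv_sec * 1000 + ru.ru_stime.tv_usec / 1000;
}

/*
 * Hypothetical helper: spin for `rounds` loop iterations and return how
 * many milliseconds of CPU time were accounted across the loop, or -1
 * if either getrusage() call failed.
 */
static int64_t burn_and_measure(uint64_t rounds)
{
    volatile uint64_t sink = 0;   /* volatile so the loop is not optimised away */
    int64_t start_ms = process_cpu_ms();
    int64_t end_ms;

    for (uint64_t i = 0; i < rounds; i++) {
        sink += i * i;
    }

    end_ms = process_cpu_ms();
    if (start_ms < 0 || end_ms < 0) {
        return -1;
    }
    return end_ms - start_ms;
}
```

Calling burn_and_measure() from main() with a round count large enough to run
for a second or more of wall-clock time and printing the result would show
whether the host accounts any CPU time for the loop; a result of 0 there
would correspond to the delta_ms == 0 case that the new sanity check rejects.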