Date: Thu, 09 Mar 2023 09:01:29 +0000
Message-ID: <87a60m9u3a.wl-maz@kernel.org>
From: Marc Zyngier
To: Colton Lewis
Cc: kvmarm@lists.linux.dev, kvm@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, james.morse@arm.com,
    suzuki.poulose@arm.com, oliver.upton@linux.dev, yuzenghui@huawei.com,
    ricarkol@google.com, sveith@amazon.de, dwmw2@infradead.org
Subject: Re: [PATCH 15/16] KVM: arm64: selftests: Augment existing timer test to handle variable offsets
In-Reply-To:
References: <20230216142123.2638675-16-maz@kernel.org>

On Mon, 06 Mar 2023 22:08:04 +0000,
Colton Lewis wrote:
>
> Hi Marc,
>
> First of all, thanks for your previous responses to my comments. Many of
> them clarified things I did not fully understand on my own.
>
> As I stated in another email, I've been testing this series on ECV
> capable hardware. Things look good but I have been able to reproduce a
> consistent assertion failure in this selftest when setting a
> sufficiently large physical offset. I have so far not been able to
> determine the cause of the failure and wonder if you have any insight as
> to what might be causing this and how to debug.
>
> The following example reproduces the error every time I have tried:
>
> mvbbq9:/data/coltonlewis/ecv/arm64-obj/kselftest/kvm#
> ./aarch64/arch_timer -O 0xffff
> ==== Test Assertion Failure ====
>   aarch64/arch_timer.c:239: false
>   pid=48094 tid=48095 errno=4 - Interrupted system call
>      1  0x4010fb: test_vcpu_run at arch_timer.c:239
>      2  0x42a5bf: start_thread at pthread_create.o:0
>      3  0x46845b: thread_start at clone.o:0
>   Failed guest assert: xcnt >= cval at aarch64/arch_timer.c:151
> values: 2500645901305, 2500645961845; 9939, vcpu 0; stage; 3; iter: 2

The fun part is that you can see similar things without the series:

==== Test Assertion Failure ====
  aarch64/arch_timer.c:239: false
  pid=647 tid=651 errno=4 - Interrupted system call
     1  0x00000000004026db: test_vcpu_run at arch_timer.c:239
     2  0x00007fffb13cedd7: ?? ??:0
     3  0x00007fffb1437e9b: ?? ??:0
  Failed guest assert: config_iter + 1 == irq_iter at aarch64/arch_timer.c:188
values: 2, 3; 0, vcpu 3; stage; 4; iter: 3

That's on a vanilla kernel (6.2-rc4) on an M1 with the test run
without any argument in a loop. After a few iterations, it blows.

>
> Observations:
>
> - Failure always occurs at stage 3 or 4 (physical timer stages)
> - xcnt_diff_us is always slightly less than 10000, or 10 ms
> - Reducing offset size reduces the probability of failure linearly (for
>   example, -O 0x8000 will fail close to half the time)
> - Failure occurs with a wide range of different period values and
>   whether or not migrations happen

The problem is that I don't understand enough of the test to make a
judgement call. I hardly get *what* it is testing. Do you?

Thanks,

        M.

-- 
Without deviation from the norm, progress is not possible.
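
As a quick sanity check on the numbers in the quoted "xcnt >= cval" report, the arithmetic can be sketched as below. This is only an illustration: it assumes the first two numbers after "values:" are xcnt and cval in that order, and that the -O offset is expressed in the same counter units. The helper name `shortfall` is hypothetical and not part of the selftest. It restates the reported values and does not identify a cause.

```python
# Values copied from the quoted "Failed guest assert: xcnt >= cval" report.
# Assumption: the first two numbers after "values:" are xcnt and cval, in that order.
xcnt = 2500645901305
cval = 2500645961845

# The -O argument from the quoted command line.
# Assumption: it is in the same units as the counter values above.
offset = 0xffff


def shortfall(xcnt: int, cval: int) -> int:
    """Return how far the counter reading falls short of the compare value."""
    return cval - xcnt


gap = shortfall(xcnt, cval)
print(f"cval - xcnt = {gap}")
print(f"offset      = {offset}")
print(f"gap smaller than offset: {gap < offset}")
```

Under those assumptions, the counter reading in this one report falls short of the compare value by 60540, which is less than the 65535 passed with -O.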