Date: Sun, 12 Mar 2023 15:53:53 +0000
Message-ID: <87ttyq3qzy.wl-maz@kernel.org>
From: Marc Zyngier
To: Colton Lewis
Cc: kvmarm@lists.linux.dev, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, james.morse@arm.com,
	suzuki.poulose@arm.com, oliver.upton@linux.dev, yuzenghui@huawei.com,
	ricarkol@google.com, sveith@amazon.de, dwmw2@infradead.org
Subject: Re: [PATCH 15/16] KVM: arm64: selftests: Augment existing timer test to handle variable offsets
References: <87a60m9u3a.wl-maz@kernel.org>

On Fri, 10 Mar 2023 19:26:47 +0000,
Colton Lewis wrote:
> 
> Marc Zyngier writes:
> 
> >> mvbbq9:/data/coltonlewis/ecv/arm64-obj/kselftest/kvm#
> >> ./aarch64/arch_timer -O 0xffff
> >> ==== Test Assertion Failure ====
> >>   aarch64/arch_timer.c:239: false
> >>   pid=48094 tid=48095 errno=4 - Interrupted system call
> >>      1  0x4010fb: test_vcpu_run at arch_timer.c:239
> >>      2  0x42a5bf: start_thread at pthread_create.o:0
> >>      3  0x46845b: thread_start at clone.o:0
> >>   Failed guest assert: xcnt >= cval at aarch64/arch_timer.c:151
> >> values: 2500645901305, 2500645961845; 9939, vcpu 0; stage; 3; iter: 2
> 
> > The fun part is that you can see similar things without the series:
> 
> > ==== Test Assertion Failure ====
> >   aarch64/arch_timer.c:239: false
> >   pid=647 tid=651 errno=4 - Interrupted system call
> >      1  0x00000000004026db: test_vcpu_run at arch_timer.c:239
> >      2  0x00007fffb13cedd7: ?? ??:0
> >      3  0x00007fffb1437e9b: ?? ??:0
> >   Failed guest assert: config_iter + 1 == irq_iter at
> > aarch64/arch_timer.c:188
> > values: 2, 3; 0, vcpu 3; stage; 4; iter: 3
> 
> > That's on a vanilla kernel (6.2-rc4) on an M1 with the test run
> > without any argument in a loop. After a few iterations, it blows.
> 
> These things are different failures. The first I've only ever found when
> setting the -O option.
> What command did you use to trigger the second if
> there were any non-default options?

As I already said: "without any argument".

maz@babette:~$ ./arch_timer
==== Test Assertion Failure ====
  aarch64/arch_timer.c:239: false
  pid=1110 tid=1113 errno=4 - Interrupted system call
     1  0x000000000040268b: test_vcpu_run at arch_timer.c:239
     2  0x00007fff9c48edd7: ?? ??:0
     3  0x00007fff9c4f7e9b: ?? ??:0
  Failed guest assert: config_iter + 1 == irq_iter at aarch64/arch_timer.c:188
 values: 3, 4; 0, vcpu 1; stage; 4; iter: 4

As simple as it gets. So either KVM is terminally buggy (quite
possible), or this test is. My money is on the second one.

> Another interesting finding is that I can't reproduce any problems using
> ARM's emulated platform. There is a possibility these errors are
> ultimately down to individual hardware quirks, but that's still worth
> understanding since everyone uses hardware and not emulators.
> 
> > The problem is that I don't understand enough of the test to make a
> > judgement call. I hardly get *what* it is testing. Do you?
> 
> My understanding is the test validates timer interrupts are occurring
> when the ARM manual says they should. It sets a comparison value (cval)
> at some point a few milliseconds into the future and waits for the
> counter (xcnt) to be greater than or equal to the comparison value, at
> which point an interrupt should fire.
> 
> The failure I posted occurs at a line that says
> 
> GUEST_ASSERT_3(xcnt >= cval, xcnt, cval, xcnt_diff_us);
> 
> The counter was less than the comparison value, which implies the
> interrupt fired early. Do we care? I don't know. I think it's weird that
> this occurs when I set a physical offset with -O and no other time.

The thing is, you say nothing about your hardware. What is it? Does it
have ECV? Does it have CNTPOFF? If it has any of those, does it help
if you disable this support?

> I've also noticed that the greater the offset I set, the greater the
> difference between xcnt and cval.
> I think the physical offset is not
> being accounted for every place it should. At the very least, that
> indicates change is required in the test.
> 
> The failure you posted occurs at a line that says
> 
> GUEST_ASSERT_2(config_iter + 1 == irq_iter,
> 	config_iter + 1, irq_iter);
> 
> I gather from context that the values were unequal because an expected
> interrupt never fired or was not counted. Do we care? I don't know. I
> think someone should.

What is the point of a test that fails randomly without anyone
understanding what it is supposed to do? If that's the state of the
selftests, maybe I should just go and remove the aarch64 directory.

	M.

-- 
Without deviation from the norm, progress is not possible.
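
For reference, the condition behind the quoted "xcnt >= cval" assert, and
the suspicion that an offset is not accounted for somewhere, can be
modelled in a few lines of C. This is only a sketch under stated
assumptions, not code from arch_timer.c or KVM: guest_count(),
timer_fired() and shortfall_when_offset_ignored() are hypothetical
helpers, and the "emulation compares the un-offset counter against cval"
behaviour is an assumed failure mode used for illustration. It is not a
diagnosis of either failure in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: the guest-visible count is the host count minus an offset. */
static uint64_t guest_count(uint64_t host_cnt, uint64_t offset)
{
	assert(host_cnt >= offset);	/* keep the model free of wrap-around */
	return host_cnt - offset;
}

/* Timer condition as described in the thread: the counter has reached cval. */
static bool timer_fired(uint64_t cnt, uint64_t cval)
{
	return cnt >= cval;
}

/*
 * Assumed failure mode (illustration only): the timer is programmed from the
 * guest's view of the counter, but the comparison is done against the host
 * counter with the offset ignored. Returns how many ticks xcnt is short of
 * cval when the timer fires, or 0 if "xcnt >= cval" would hold.
 */
static uint64_t shortfall_when_offset_ignored(uint64_t host_now,
					      uint64_t offset, uint64_t delta)
{
	uint64_t cval = guest_count(host_now, offset) + delta;
	/* First host count at which the offset-blind comparison is true. */
	uint64_t host_fire = timer_fired(host_now, cval) ? host_now : cval;
	uint64_t xcnt = guest_count(host_fire, offset);

	return timer_fired(xcnt, cval) ? 0 : cval - xcnt;
}
```

In this model a zero offset always gives a shortfall of 0, while a
non-zero offset gives a shortfall that grows with the offset (capped by
delta), which is the shape of behaviour the quoted text describes.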