From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.3 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F3336C4320E for ; Thu, 26 Aug 2021 00:52:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CF29F6108E for ; Thu, 26 Aug 2021 00:52:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235554AbhHZAwq (ORCPT ); Wed, 25 Aug 2021 20:52:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42726 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235218AbhHZAwp (ORCPT ); Wed, 25 Aug 2021 20:52:45 -0400 Received: from mail-pf1-x433.google.com (mail-pf1-x433.google.com [IPv6:2607:f8b0:4864:20::433]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 711DCC0613D9 for ; Wed, 25 Aug 2021 17:51:59 -0700 (PDT) Received: by mail-pf1-x433.google.com with SMTP id a21so1178385pfh.5 for ; Wed, 25 Aug 2021 17:51:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=f3PaRj0+UK3KiHBEQ/mOLHR7LbXNTGksF7paqU+Kuls=; b=A6asxkjO+IhVOQFmhI2tBLQ24vNU4qbyC65KkjHK6YZEdFzhqDTuNOl/jUY0GY8BZk sO21CWd5euGfV+luP8orfSSY/6R8H/lyogJvRGslTcBehiRZLI6fp4wTJHAZTkWW0Luj 236HN6iRo5IlditmnFRnNMm8X/Z+T2vuG5T/fkhA/TK8hHnA0ENvDZ7NczgztwdHw795 FBtREj4Nfnw/RApEMCNmyqeYph9H+5+VTpFwKCsOwANy6FK0VwyT/yA9+TXDqcqo4zbp CBNvsaomjk9PSQ4KnFbCwLS1WZ5WBZeuC6vG5p684Rv340MRHtj/gV3n4AGPaCXS6GPY fWYg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=f3PaRj0+UK3KiHBEQ/mOLHR7LbXNTGksF7paqU+Kuls=; b=mVbwJaE3RXZtqYhjWnIhE2qHZcko80OO2+OnOWoDcjglXWsXEIGK5vz5q9i1Pqh8bH jZt0PFzapCAhtYAKIalcbdxa89x3MiiDQKYY/6KQfvVKBz1hyVh6bxQSuKaf3pJBx9Qa qTwolY7/nFUSkd4RsOeldcK6dH4cS/G8t163vFVM6zdwmbQuOLNZTVm8PzT2dr5gd2CA 6e+kNfeHxSYfnJwSJgYufUkRtw5KXYMDVY1JI6ph0y7jzVJ+iXGob1tCm3VstNTjxJ+A aq1LyVAkg8lLDa63w9QoAV7vqkv7D45yHI05WX9bBvGVP6/FsLdRgeb6oVuEqMgs5JeH zZHg== X-Gm-Message-State: AOAM532g6HSNBNg/5ufId9KEZYDsA4WaQiIaeB4cDUZ8+kG+DtSdBxjG 1aKzTlckQpcw6+iy+XSec7L1ew== X-Google-Smtp-Source: ABdhPJz66WwJdot9pOGh7Q3Z/ZxgXoc0GykrLFcpNH6VM8KC5/c4zJUQuHeGGSWtTQv/EwVGtnL95A== X-Received: by 2002:a63:401:: with SMTP id 1mr926843pge.166.1629939118422; Wed, 25 Aug 2021 17:51:58 -0700 (PDT) Received: from google.com (157.214.185.35.bc.googleusercontent.com. [35.185.214.157]) by smtp.gmail.com with ESMTPSA id j21sm756334pfj.66.2021.08.25.17.51.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 Aug 2021 17:51:57 -0700 (PDT) Date: Thu, 26 Aug 2021 00:51:54 +0000 From: Sean Christopherson To: Mathieu Desnoyers Cc: Darren Hart , "Russell King, ARM Linux" , Catalin Marinas , Will Deacon , Guo Ren , Thomas Bogendoerfer , Michael Ellerman , Heiko Carstens , gor , Christian Borntraeger , rostedt , Ingo Molnar , Oleg Nesterov , Thomas Gleixner , Peter Zijlstra , Andy Lutomirski , paulmck , Boqun Feng , Paolo Bonzini , shuah , Benjamin Herrenschmidt , Paul Mackerras , linux-arm-kernel , linux-kernel , linux-csky , linux-mips , linuxppc-dev , linux-s390 , KVM list , linux-kselftest , Peter Foley , Shakeel Butt , Ben Gardon Subject: Re: [PATCH v2 4/5] KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs Message-ID: References: <20210820225002.310652-1-seanjc@google.com> <20210820225002.310652-5-seanjc@google.com> <766990430.21713.1629731934069.JavaMail.zimbra@efficios.com> <282257549.21721.1629732017655.JavaMail.zimbra@efficios.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <282257549.21721.1629732017655.JavaMail.zimbra@efficios.com> Precedence: bulk List-ID: X-Mailing-List: linux-csky@vger.kernel.org On Mon, Aug 23, 2021, Mathieu Desnoyers wrote: > [ re-send to Darren Hart ] > > ----- On Aug 23, 2021, at 11:18 AM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote: > > > ----- On Aug 20, 2021, at 6:50 PM, Sean Christopherson seanjc@google.com wrote: > > > >> Add a test to verify an rseq's CPU ID is updated correctly if the task is > >> migrated while the kernel is handling KVM_RUN. This is a regression test > >> for a bug introduced by commit 72c3c0fe54a3 ("x86/kvm: Use generic xfer > >> to guest work function"), where TIF_NOTIFY_RESUME would be cleared by KVM > >> without updating rseq, leading to a stale CPU ID and other badness. > >> > > > > [...] > > > > +#define RSEQ_SIG 0xdeadbeef > > > > Is there any reason for defining a custom signature rather than including > > tools/testing/selftests/rseq/rseq.h ? This should take care of including > > the proper architecture header which will define the appropriate signature. > > > > Arguably you don't define rseq critical sections in this test per se, but > > I'm wondering why the custom signature here. Partly to avoid taking a dependency on rseq.h, and partly to try to call out that the test doesn't actually do any rseq critical sections. > > [...] > > > >> + > >> +static void *migration_worker(void *ign) > >> +{ > >> + cpu_set_t allowed_mask; > >> + int r, i, nr_cpus, cpu; > >> + > >> + CPU_ZERO(&allowed_mask); > >> + > >> + nr_cpus = CPU_COUNT(&possible_mask); > >> + > >> + for (i = 0; i < 20000; i++) { > >> + cpu = i % nr_cpus; > >> + if (!CPU_ISSET(cpu, &possible_mask)) > >> + continue; > >> + > >> + CPU_SET(cpu, &allowed_mask); > >> + > >> + /* > >> + * Bump the sequence count twice to allow the reader to detect > >> + * that a migration may have occurred in between rseq and sched > >> + * CPU ID reads. An odd sequence count indicates a migration > >> + * is in-progress, while a completely different count indicates > >> + * a migration occurred since the count was last read. > >> + */ > >> + atomic_inc(&seq_cnt); > > > > So technically this atomic_inc contains the required barriers because the > > selftests implementation uses "__sync_add_and_fetch(&addr->val, 1)". But > > it's rather odd that the semantic differs from the kernel implementation in > > terms of memory barriers: the kernel implementation of atomic_inc > > guarantees no memory barriers, but this one happens to provide full > > barriers pretty much by accident (selftests futex/include/atomic.h > > documents no such guarantee). Yeah, I got quite lost trying to figure out what atomics the test would actually end up with. > > If this full barrier guarantee is indeed provided by the selftests atomic.h > > header, I would really like a comment stating that in the atomic.h header > > so the carpet is not pulled from under our feet by a future optimization. > > > > > >> + r = sched_setaffinity(0, sizeof(allowed_mask), &allowed_mask); > >> + TEST_ASSERT(!r, "sched_setaffinity failed, errno = %d (%s)", > >> + errno, strerror(errno)); > >> + atomic_inc(&seq_cnt); > >> + > >> + CPU_CLR(cpu, &allowed_mask); > >> + > >> + /* > >> + * Let the read-side get back into KVM_RUN to improve the odds > >> + * of task migration coinciding with KVM's run loop. > > > > This comment should be about increasing the odds of letting the seqlock > > read-side complete. Otherwise, the delay between the two back-to-back > > atomic_inc is so small that the seqlock read-side may never have time to > > complete the reading the rseq cpu id and the sched_getcpu() call, and can > > retry forever. Hmm, but that's not why there's a delay. I'm not arguing that a livelock isn't possible (though that syscall would have to be screaming fast), but the primary motivation is very much to allow the read-side enough time to get back into KVM proper. To encounter the bug, TIF_NOTIFY_RESUME has to be recognized by KVM in its run loop, i.e. sched_setaffinity() must induce task migration after the read-side has invoked ioctl(KVM_RUN). > > I'm wondering if 1 microsecond is sufficient on other architectures as > > well. I'm definitely wondering that as well :-) > > One alternative way to make this depend less on the architecture's > > implementation of sched_getcpu (whether it's a vDSO, or goes through a > > syscall) would be to read the rseq cpu id and call sched_getcpu a few times > > (e.g. 3 times) in the migration thread rather than use usleep, and throw > > away the value read. This would ensure the delay is appropriate on all > > architectures. As above, I think an arbitrary delay is required regardless of how fast sched_getcpu() can execute. One thought would be to do sched_getcpu() _and_ usleep() to account for sched_getcpu() overhead and to satisfy the KVM_RUN part, but I don't know that that adds meaningful value. The real test is if someone could see if the bug repros on non-x86 hardware... From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.5 required=3.0 tests=BAYES_00,DKIM_ADSP_CUSTOM_MED, DKIM_INVALID,DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0FB2CC432BE for ; Thu, 26 Aug 2021 00:52:47 +0000 (UTC) Received: from lists.ozlabs.org (lists.ozlabs.org [112.213.38.117]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1E84A60E90 for ; Thu, 26 Aug 2021 00:52:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 1E84A60E90 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.ozlabs.org Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4Gw47w3X4wz2ync for ; Thu, 26 Aug 2021 10:52:44 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20161025 header.b=A6asxkjO; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=google.com (client-ip=2607:f8b0:4864:20::52c; helo=mail-pg1-x52c.google.com; envelope-from=seanjc@google.com; receiver=) Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20161025 header.b=A6asxkjO; dkim-atps=neutral Received: from mail-pg1-x52c.google.com (mail-pg1-x52c.google.com [IPv6:2607:f8b0:4864:20::52c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4Gw4782CZ6z2yHR for ; Thu, 26 Aug 2021 10:52:02 +1000 (AEST) Received: by mail-pg1-x52c.google.com with SMTP id g184so1474845pgc.6 for ; Wed, 25 Aug 2021 17:52:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=f3PaRj0+UK3KiHBEQ/mOLHR7LbXNTGksF7paqU+Kuls=; b=A6asxkjO+IhVOQFmhI2tBLQ24vNU4qbyC65KkjHK6YZEdFzhqDTuNOl/jUY0GY8BZk sO21CWd5euGfV+luP8orfSSY/6R8H/lyogJvRGslTcBehiRZLI6fp4wTJHAZTkWW0Luj 236HN6iRo5IlditmnFRnNMm8X/Z+T2vuG5T/fkhA/TK8hHnA0ENvDZ7NczgztwdHw795 FBtREj4Nfnw/RApEMCNmyqeYph9H+5+VTpFwKCsOwANy6FK0VwyT/yA9+TXDqcqo4zbp CBNvsaomjk9PSQ4KnFbCwLS1WZ5WBZeuC6vG5p684Rv340MRHtj/gV3n4AGPaCXS6GPY fWYg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=f3PaRj0+UK3KiHBEQ/mOLHR7LbXNTGksF7paqU+Kuls=; b=FbspXBFHR+s1b7IDB2jftksuzJtYIydhRQBimH7reWpD5a80FLIfpUCIexF0Dd2V9i W7fBUhPQnC/vFNu3t1DwBztAI4ETZntMVlbxScijyrjNV5KcaRQLc0V1FR5vvfFV0Mcj JmJzF9XhjaAwQPkflJDach+slGqXjvsYJulGCDqRrdqKXemI3lTMTZWoWxhHMmRJ4eNu J98F0S2Yh0sEMp5yoKem/CR2mTdRBPLMADWemOKvabmEK0lsDOw/AfPywEIS1utcToaB n87Z0ILLrdFakefmeTY+urvYZkhzD/z9FHsfKFPirxIP0a0BkmBAREqWU/CGDI+ZLAFn Xlzw== X-Gm-Message-State: AOAM532OOpUZySDYm7fky44jmiWLnOEP8x5cvS96jh3t50HWV0e/oBW5 mchbls/laYf9GpXQqyKDUDam7A== X-Google-Smtp-Source: ABdhPJz66WwJdot9pOGh7Q3Z/ZxgXoc0GykrLFcpNH6VM8KC5/c4zJUQuHeGGSWtTQv/EwVGtnL95A== X-Received: by 2002:a63:401:: with SMTP id 1mr926843pge.166.1629939118422; Wed, 25 Aug 2021 17:51:58 -0700 (PDT) Received: from google.com (157.214.185.35.bc.googleusercontent.com. [35.185.214.157]) by smtp.gmail.com with ESMTPSA id j21sm756334pfj.66.2021.08.25.17.51.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 Aug 2021 17:51:57 -0700 (PDT) Date: Thu, 26 Aug 2021 00:51:54 +0000 From: Sean Christopherson To: Mathieu Desnoyers Subject: Re: [PATCH v2 4/5] KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs Message-ID: References: <20210820225002.310652-1-seanjc@google.com> <20210820225002.310652-5-seanjc@google.com> <766990430.21713.1629731934069.JavaMail.zimbra@efficios.com> <282257549.21721.1629732017655.JavaMail.zimbra@efficios.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <282257549.21721.1629732017655.JavaMail.zimbra@efficios.com> X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: KVM list , Peter Zijlstra , Catalin Marinas , linux-kernel , Will Deacon , Guo Ren , linux-kselftest , Ben Gardon , shuah , Paul Mackerras , linux-s390 , gor , "Russell King, ARM Linux" , linux-csky , Christian Borntraeger , Ingo Molnar , Darren Hart , linux-mips , Boqun Feng , paulmck , Heiko Carstens , rostedt , Shakeel Butt , Andy Lutomirski , Thomas Gleixner , Peter Foley , linux-arm-kernel , Thomas Bogendoerfer , Oleg Nesterov , Paolo Bonzini , linuxppc-dev Errors-To: linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Sender: "Linuxppc-dev" On Mon, Aug 23, 2021, Mathieu Desnoyers wrote: > [ re-send to Darren Hart ] > > ----- On Aug 23, 2021, at 11:18 AM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote: > > > ----- On Aug 20, 2021, at 6:50 PM, Sean Christopherson seanjc@google.com wrote: > > > >> Add a test to verify an rseq's CPU ID is updated correctly if the task is > >> migrated while the kernel is handling KVM_RUN. This is a regression test > >> for a bug introduced by commit 72c3c0fe54a3 ("x86/kvm: Use generic xfer > >> to guest work function"), where TIF_NOTIFY_RESUME would be cleared by KVM > >> without updating rseq, leading to a stale CPU ID and other badness. > >> > > > > [...] > > > > +#define RSEQ_SIG 0xdeadbeef > > > > Is there any reason for defining a custom signature rather than including > > tools/testing/selftests/rseq/rseq.h ? This should take care of including > > the proper architecture header which will define the appropriate signature. > > > > Arguably you don't define rseq critical sections in this test per se, but > > I'm wondering why the custom signature here. Partly to avoid taking a dependency on rseq.h, and partly to try to call out that the test doesn't actually do any rseq critical sections. > > [...] > > > >> + > >> +static void *migration_worker(void *ign) > >> +{ > >> + cpu_set_t allowed_mask; > >> + int r, i, nr_cpus, cpu; > >> + > >> + CPU_ZERO(&allowed_mask); > >> + > >> + nr_cpus = CPU_COUNT(&possible_mask); > >> + > >> + for (i = 0; i < 20000; i++) { > >> + cpu = i % nr_cpus; > >> + if (!CPU_ISSET(cpu, &possible_mask)) > >> + continue; > >> + > >> + CPU_SET(cpu, &allowed_mask); > >> + > >> + /* > >> + * Bump the sequence count twice to allow the reader to detect > >> + * that a migration may have occurred in between rseq and sched > >> + * CPU ID reads. An odd sequence count indicates a migration > >> + * is in-progress, while a completely different count indicates > >> + * a migration occurred since the count was last read. > >> + */ > >> + atomic_inc(&seq_cnt); > > > > So technically this atomic_inc contains the required barriers because the > > selftests implementation uses "__sync_add_and_fetch(&addr->val, 1)". But > > it's rather odd that the semantic differs from the kernel implementation in > > terms of memory barriers: the kernel implementation of atomic_inc > > guarantees no memory barriers, but this one happens to provide full > > barriers pretty much by accident (selftests futex/include/atomic.h > > documents no such guarantee). Yeah, I got quite lost trying to figure out what atomics the test would actually end up with. > > If this full barrier guarantee is indeed provided by the selftests atomic.h > > header, I would really like a comment stating that in the atomic.h header > > so the carpet is not pulled from under our feet by a future optimization. > > > > > >> + r = sched_setaffinity(0, sizeof(allowed_mask), &allowed_mask); > >> + TEST_ASSERT(!r, "sched_setaffinity failed, errno = %d (%s)", > >> + errno, strerror(errno)); > >> + atomic_inc(&seq_cnt); > >> + > >> + CPU_CLR(cpu, &allowed_mask); > >> + > >> + /* > >> + * Let the read-side get back into KVM_RUN to improve the odds > >> + * of task migration coinciding with KVM's run loop. > > > > This comment should be about increasing the odds of letting the seqlock > > read-side complete. Otherwise, the delay between the two back-to-back > > atomic_inc is so small that the seqlock read-side may never have time to > > complete the reading the rseq cpu id and the sched_getcpu() call, and can > > retry forever. Hmm, but that's not why there's a delay. I'm not arguing that a livelock isn't possible (though that syscall would have to be screaming fast), but the primary motivation is very much to allow the read-side enough time to get back into KVM proper. To encounter the bug, TIF_NOTIFY_RESUME has to be recognized by KVM in its run loop, i.e. sched_setaffinity() must induce task migration after the read-side has invoked ioctl(KVM_RUN). > > I'm wondering if 1 microsecond is sufficient on other architectures as > > well. I'm definitely wondering that as well :-) > > One alternative way to make this depend less on the architecture's > > implementation of sched_getcpu (whether it's a vDSO, or goes through a > > syscall) would be to read the rseq cpu id and call sched_getcpu a few times > > (e.g. 3 times) in the migration thread rather than use usleep, and throw > > away the value read. This would ensure the delay is appropriate on all > > architectures. As above, I think an arbitrary delay is required regardless of how fast sched_getcpu() can execute. One thought would be to do sched_getcpu() _and_ usleep() to account for sched_getcpu() overhead and to satisfy the KVM_RUN part, but I don't know that that adds meaningful value. The real test is if someone could see if the bug repros on non-x86 hardware... From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.5 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_ADSP_CUSTOM_MED,DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EF499C432BE for ; Thu, 26 Aug 2021 00:53:29 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id ADA1860E90 for ; Thu, 26 Aug 2021 00:53:29 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org ADA1860E90 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References: Message-ID:Subject:Cc:To:From:Date:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=j4JGVqcmVJNJmI+eYBHc0KZVGVFjmNlboLdUPtmbF3A=; b=TeCdKECiqi7qvT UI20CpRdnx1ekDBkgZ4qkJxJk5NKFQ8O51ivWwNw6rF81kNSntJuyz8bl6tZmmYLYvZboVJV5dN9u xMqy211/GT/lNvJHrk/goOdOBgmgK6UfFWpL+S9FQZY3gNReIvtedJ5gytMlTuywFqMB1wu8Nl4C7 qoQ6sHa1lzk5z8bTf+mJZa8ySXX8hagGkPhETY9qvpQv2mjYpcm2qZuwMg3FggJaZvFJjGhbJEruH DCoHLOP5WZEEXXhySQZdzqjp/gCNfWKfz6SPdwcAG5v38i3VlphHoQOaw1IC10nb6GVXHrqTHrBog 2XS0MR4E5rMUhyDm2FPA==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1mJ3ch-008oPu-JT; Thu, 26 Aug 2021 00:52:03 +0000 Received: from mail-pf1-x434.google.com ([2607:f8b0:4864:20::434]) by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux)) id 1mJ3cd-008oOt-Ts for linux-arm-kernel@lists.infradead.org; Thu, 26 Aug 2021 00:52:01 +0000 Received: by mail-pf1-x434.google.com with SMTP id 2so1164580pfo.8 for ; Wed, 25 Aug 2021 17:51:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=f3PaRj0+UK3KiHBEQ/mOLHR7LbXNTGksF7paqU+Kuls=; b=A6asxkjO+IhVOQFmhI2tBLQ24vNU4qbyC65KkjHK6YZEdFzhqDTuNOl/jUY0GY8BZk sO21CWd5euGfV+luP8orfSSY/6R8H/lyogJvRGslTcBehiRZLI6fp4wTJHAZTkWW0Luj 236HN6iRo5IlditmnFRnNMm8X/Z+T2vuG5T/fkhA/TK8hHnA0ENvDZ7NczgztwdHw795 FBtREj4Nfnw/RApEMCNmyqeYph9H+5+VTpFwKCsOwANy6FK0VwyT/yA9+TXDqcqo4zbp CBNvsaomjk9PSQ4KnFbCwLS1WZ5WBZeuC6vG5p684Rv340MRHtj/gV3n4AGPaCXS6GPY fWYg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=f3PaRj0+UK3KiHBEQ/mOLHR7LbXNTGksF7paqU+Kuls=; b=FHDeawsgEAGM8+gshJEWobUi53cWckDjWlKG+ZzoWBwCyNLYBrwGdeztOx2DIbeSyn ItHeYd0XAQCrsOndJMywfzmNPlpWebUGG3oi/mkerPe/Ow/ZZQrQz41NYIdce71KQRQi gLG+pZCSS3XgQwB4gyBuySGKKOT6KT+7mBcqsSmnH/s6VRKhyuvRmloWp9gGmquz4Opg o4rsXaY3uOfKc2L+Lg1FgahmJFZac/Z1IGu+VFBKMdYFjB78s2JVGQ0Vlp22iKz3RTSe B9D/io/eHoybDkDWv5rKSMQytspK78pqwH/Tyn8V1eaQTEz72MXixFRpG3RbbkHTvA9n lPlQ== X-Gm-Message-State: AOAM531L1F9RTjIvdnmqHhbWIczJxhMm61CyV9RvlfD35N8XkyUCD4Sf l9YCVorllk5dAML9vZoq+3Unzw== X-Google-Smtp-Source: ABdhPJz66WwJdot9pOGh7Q3Z/ZxgXoc0GykrLFcpNH6VM8KC5/c4zJUQuHeGGSWtTQv/EwVGtnL95A== X-Received: by 2002:a63:401:: with SMTP id 1mr926843pge.166.1629939118422; Wed, 25 Aug 2021 17:51:58 -0700 (PDT) Received: from google.com (157.214.185.35.bc.googleusercontent.com. [35.185.214.157]) by smtp.gmail.com with ESMTPSA id j21sm756334pfj.66.2021.08.25.17.51.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 Aug 2021 17:51:57 -0700 (PDT) Date: Thu, 26 Aug 2021 00:51:54 +0000 From: Sean Christopherson To: Mathieu Desnoyers Cc: Darren Hart , "Russell King, ARM Linux" , Catalin Marinas , Will Deacon , Guo Ren , Thomas Bogendoerfer , Michael Ellerman , Heiko Carstens , gor , Christian Borntraeger , rostedt , Ingo Molnar , Oleg Nesterov , Thomas Gleixner , Peter Zijlstra , Andy Lutomirski , paulmck , Boqun Feng , Paolo Bonzini , shuah , Benjamin Herrenschmidt , Paul Mackerras , linux-arm-kernel , linux-kernel , linux-csky , linux-mips , linuxppc-dev , linux-s390 , KVM list , linux-kselftest , Peter Foley , Shakeel Butt , Ben Gardon Subject: Re: [PATCH v2 4/5] KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs Message-ID: References: <20210820225002.310652-1-seanjc@google.com> <20210820225002.310652-5-seanjc@google.com> <766990430.21713.1629731934069.JavaMail.zimbra@efficios.com> <282257549.21721.1629732017655.JavaMail.zimbra@efficios.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <282257549.21721.1629732017655.JavaMail.zimbra@efficios.com> X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20210825_175200_036368_774CEB1F X-CRM114-Status: GOOD ( 45.86 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Mon, Aug 23, 2021, Mathieu Desnoyers wrote: > [ re-send to Darren Hart ] > > ----- On Aug 23, 2021, at 11:18 AM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote: > > > ----- On Aug 20, 2021, at 6:50 PM, Sean Christopherson seanjc@google.com wrote: > > > >> Add a test to verify an rseq's CPU ID is updated correctly if the task is > >> migrated while the kernel is handling KVM_RUN. This is a regression test > >> for a bug introduced by commit 72c3c0fe54a3 ("x86/kvm: Use generic xfer > >> to guest work function"), where TIF_NOTIFY_RESUME would be cleared by KVM > >> without updating rseq, leading to a stale CPU ID and other badness. > >> > > > > [...] > > > > +#define RSEQ_SIG 0xdeadbeef > > > > Is there any reason for defining a custom signature rather than including > > tools/testing/selftests/rseq/rseq.h ? This should take care of including > > the proper architecture header which will define the appropriate signature. > > > > Arguably you don't define rseq critical sections in this test per se, but > > I'm wondering why the custom signature here. Partly to avoid taking a dependency on rseq.h, and partly to try to call out that the test doesn't actually do any rseq critical sections. > > [...] > > > >> + > >> +static void *migration_worker(void *ign) > >> +{ > >> + cpu_set_t allowed_mask; > >> + int r, i, nr_cpus, cpu; > >> + > >> + CPU_ZERO(&allowed_mask); > >> + > >> + nr_cpus = CPU_COUNT(&possible_mask); > >> + > >> + for (i = 0; i < 20000; i++) { > >> + cpu = i % nr_cpus; > >> + if (!CPU_ISSET(cpu, &possible_mask)) > >> + continue; > >> + > >> + CPU_SET(cpu, &allowed_mask); > >> + > >> + /* > >> + * Bump the sequence count twice to allow the reader to detect > >> + * that a migration may have occurred in between rseq and sched > >> + * CPU ID reads. An odd sequence count indicates a migration > >> + * is in-progress, while a completely different count indicates > >> + * a migration occurred since the count was last read. > >> + */ > >> + atomic_inc(&seq_cnt); > > > > So technically this atomic_inc contains the required barriers because the > > selftests implementation uses "__sync_add_and_fetch(&addr->val, 1)". But > > it's rather odd that the semantic differs from the kernel implementation in > > terms of memory barriers: the kernel implementation of atomic_inc > > guarantees no memory barriers, but this one happens to provide full > > barriers pretty much by accident (selftests futex/include/atomic.h > > documents no such guarantee). Yeah, I got quite lost trying to figure out what atomics the test would actually end up with. > > If this full barrier guarantee is indeed provided by the selftests atomic.h > > header, I would really like a comment stating that in the atomic.h header > > so the carpet is not pulled from under our feet by a future optimization. > > > > > >> + r = sched_setaffinity(0, sizeof(allowed_mask), &allowed_mask); > >> + TEST_ASSERT(!r, "sched_setaffinity failed, errno = %d (%s)", > >> + errno, strerror(errno)); > >> + atomic_inc(&seq_cnt); > >> + > >> + CPU_CLR(cpu, &allowed_mask); > >> + > >> + /* > >> + * Let the read-side get back into KVM_RUN to improve the odds > >> + * of task migration coinciding with KVM's run loop. > > > > This comment should be about increasing the odds of letting the seqlock > > read-side complete. Otherwise, the delay between the two back-to-back > > atomic_inc is so small that the seqlock read-side may never have time to > > complete the reading the rseq cpu id and the sched_getcpu() call, and can > > retry forever. Hmm, but that's not why there's a delay. I'm not arguing that a livelock isn't possible (though that syscall would have to be screaming fast), but the primary motivation is very much to allow the read-side enough time to get back into KVM proper. To encounter the bug, TIF_NOTIFY_RESUME has to be recognized by KVM in its run loop, i.e. sched_setaffinity() must induce task migration after the read-side has invoked ioctl(KVM_RUN). > > I'm wondering if 1 microsecond is sufficient on other architectures as > > well. I'm definitely wondering that as well :-) > > One alternative way to make this depend less on the architecture's > > implementation of sched_getcpu (whether it's a vDSO, or goes through a > > syscall) would be to read the rseq cpu id and call sched_getcpu a few times > > (e.g. 3 times) in the migration thread rather than use usleep, and throw > > away the value read. This would ensure the delay is appropriate on all > > architectures. As above, I think an arbitrary delay is required regardless of how fast sched_getcpu() can execute. One thought would be to do sched_getcpu() _and_ usleep() to account for sched_getcpu() overhead and to satisfy the KVM_RUN part, but I don't know that that adds meaningful value. The real test is if someone could see if the bug repros on non-x86 hardware... _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel