From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 19 Aug 2021 23:33:50 +0000
From: Sean Christopherson
To: Mathieu Desnoyers
Cc: "Russell King, ARM Linux", Catalin Marinas, Will Deacon, Guo Ren,
 Thomas Bogendoerfer, Michael Ellerman, Heiko Carstens, gor,
 Christian Borntraeger, Oleg Nesterov, rostedt, Ingo Molnar,
 Thomas Gleixner, Peter Zijlstra, Andy Lutomirski, paulmck, Boqun Feng,
 Paolo Bonzini, shuah, Benjamin Herrenschmidt, Paul Mackerras,
 linux-arm-kernel, linux-kernel, linux-csky, linux-mips@vger.kernel.org,
 linuxppc-dev, linux-s390@vger.kernel.org, KVM list, linux-kselftest,
 Peter Foley, Shakeel Butt, Ben Gardon
Subject: Re: [PATCH 4/5] KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs
References: <20210818001210.4073390-1-seanjc@google.com>
 <20210818001210.4073390-5-seanjc@google.com>
 <1540548616.19739.1629409956315.JavaMail.zimbra@efficios.com>
In-Reply-To: <1540548616.19739.1629409956315.JavaMail.zimbra@efficios.com>
X-Mailing-List: linux-csky@vger.kernel.org

On Thu, Aug 19, 2021, Mathieu Desnoyers wrote:
> ----- On Aug 17, 2021, at 8:12 PM, Sean Christopherson seanjc@google.com wrote:
>
> > Add a test to verify an rseq's CPU ID is updated correctly if the task is
> > migrated while the kernel is handling KVM_RUN. This is a regression test
> > for a bug introduced by commit 72c3c0fe54a3 ("x86/kvm: Use generic xfer
> > to guest work function"), where TIF_NOTIFY_RESUME would be cleared by KVM
> > without updating rseq, leading to a stale CPU ID and other badness.
> >
> > Signed-off-by: Sean Christopherson
> > ---
>
> [...]
>
> > +	while (!done) {
> > +		vcpu_run(vm, VCPU_ID);
> > +		TEST_ASSERT(get_ucall(vm, VCPU_ID, NULL) == UCALL_SYNC,
> > +			    "Guest failed?");
> > +
> > +		cpu = sched_getcpu();
> > +		rseq_cpu = READ_ONCE(__rseq.cpu_id);
> > +
> > +		/*
> > +		 * Verify rseq's CPU matches sched's CPU, and that sched's CPU
> > +		 * is stable. This doesn't handle the case where the task is
> > +		 * migrated between sched_getcpu() and reading rseq, and again
> > +		 * between reading rseq and sched_getcpu(), but in practice no
> > +		 * false positives have been observed, while on the other hand
> > +		 * blocking migration while this thread reads CPUs messes with
> > +		 * the timing and prevents hitting failures on a buggy kernel.
> > +		 */
>
> I think you could get a stable cpu id between sched_getcpu and __rseq_abi.cpu_id
> if you add a pthread mutex to protect:
>
> sched_getcpu and __rseq_abi.cpu_id reads
>
> vs
>
> sched_setaffinity calls within the migration thread.
>
> Thoughts ?

I tried that and couldn't reproduce the bug. That's what I attempted to call
out in the blurb "blocking migration while this thread reads CPUs ... prevents
hitting failures on a buggy kernel".

I considered adding arbitrary delays around the mutex to try to hit the bug,
but I was worried that even if I got it "working" for this bug, the test would
be too tailored to this bug and could miss future regressions. Letting the two
threads run wild seemed like it would provide the best coverage, at the cost of
potentially causing false failures.
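
For readers following the thread, the "rseq matches sched, or sched's CPU is
not stable" rule described in the quoted comment can be sketched as a pure
decision function. This is only an illustration under stated assumptions, not
the code from the patch: the name rseq_cpu_is_stale() and its three parameters
are hypothetical, and the caller is assumed to sample sched_getcpu(), then the
rseq CPU ID, then sched_getcpu() again.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not part of the patch): classify one sample.
 *
 * cpu_before: sched_getcpu() result taken before reading rseq
 * rseq_cpu:   CPU ID read from the rseq area
 * cpu_after:  sched_getcpu() result taken after reading rseq
 *
 * Returns true only when the sample should be reported as a failure.
 */
static inline bool rseq_cpu_is_stale(int cpu_before, int rseq_cpu,
				     int cpu_after)
{
	/* rseq agrees with the scheduler's view: nothing to report. */
	if (rseq_cpu == cpu_before)
		return false;

	/*
	 * The scheduler's CPU changed while sampling, so a migration raced
	 * with the reads and the mismatch is inconclusive.
	 */
	if (cpu_before != cpu_after)
		return false;

	/* Stable scheduler CPU, different rseq CPU: report it. */
	return true;
}
```

Note that a task migrated away and back again between the two sched_getcpu()
samples still yields cpu_before == cpu_after, so this sketch would flag it.
That is the unhandled double-migration window the quoted comment mentions, and
it is the price of not serializing the reader against the migration thread.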