From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 18 Nov 2020 14:33:33 +0100
From: Marco Elver
To: Peter Zijlstra
Cc: Davidlohr Bueso, paulmck@kernel.org, Will Deacon,
 linux-kernel@vger.kernel.org, Mel Gorman,
 linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH] sched: Fix data-race in wakeup
Message-ID: <20201118133333.GA1506553@elver.google.com>
References: <20201116091054.GL3371@techsingularity.net>
 <20201116131102.GA29992@willie-the-truck>
 <20201116133721.GQ3371@techsingularity.net>
 <20201116142005.GE3121392@hirez.programming.kicks-ass.net>
 <20201116193149.GW3371@techsingularity.net>
 <20201117083016.GK3121392@hirez.programming.kicks-ass.net>
 <20201117091545.GA31837@willie-the-truck>
 <20201117092936.GA3121406@hirez.programming.kicks-ass.net>
In-Reply-To: <20201117092936.GA3121406@hirez.programming.kicks-ass.net>

On Tue, Nov 17, 2020 at 10:29AM +0100, Peter Zijlstra wrote:
[...]
> > Now the million dollar question is why KCSAN hasn't run into this. Hrmph.
>
> kernel/sched/Makefile:KCSAN_SANITIZE := n
>
> might have something to do with that, I suppose.

For the record, I tried to reproduce this data race. I found a
read/write race on this bitfield, but not yet that write/write race
(perhaps I wasn't running the right workload).

| read to 0xffff8d4e2ce39aac of 1 bytes by task 5269 on cpu 3:
|  __sched_setscheduler+0x4a9/0x1070 kernel/sched/core.c:5297
|  sched_setattr kernel/sched/core.c:5512 [inline]
|  ...
|
| write to 0xffff8d4e2ce39aac of 1 bytes by task 5268 on cpu 1:
|  __schedule+0x296/0xab0 kernel/sched/core.c:4462     prev->sched_contributes_to_load =
|  schedule+0xd1/0x130 kernel/sched/core.c:4601
|  ...
|
| Full report: https://paste.debian.net/hidden/07a50732/

Getting to the above race also required some effort as 1) I kept hitting
other unrelated data races in the scheduler and had to silence those
first to be able to make progress, and 2) I had to enable KCSAN only for
scheduler code so that all other data races are ignored. Then I let
syzkaller run for a few minutes.

Also note our default KCSAN config is suboptimal. For serious debugging,
I'd recommend the same config that rcutorture uses with the --kcsan
flag, specifically:

	CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n,
	CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n

to get the full picture.

However, as a first step, it'd be nice to eventually remove the
KCSAN_SANITIZE := n from kernel/sched/Makefile when things are less
noisy (so that syzbot and default builds can start finding more serious
issues, too).
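As a general illustration (not taken from the report above, and not the
scheduler's actual code) of why a write/write race on flags that share a
byte is worth chasing: a plain update of one flag is typically a
read-modify-write of the whole byte, so a concurrent update of a
neighbouring flag can be lost. The sketch below replays one such
interleaving in a single thread. The names FLAG_A, FLAG_B and
simulate_lost_update() are hypothetical, and the two "CPUs" are only
labels in the comments.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flags packed into one byte, standing in for adjacent bitfields. */
#define FLAG_A 0x01u	/* updated by "CPU 1" */
#define FLAG_B 0x02u	/* updated by "CPU 3" */

/*
 * Replays, single-threaded, one bad interleaving of two plain
 * read-modify-write updates to the same byte. Returns the final byte.
 */
static uint8_t simulate_lost_update(void)
{
	uint8_t flags = 0;

	/* CPU 1 begins setting FLAG_A: it first loads the whole byte. */
	uint8_t cpu1_copy = flags;

	/* CPU 3 sets FLAG_B and completes its update in between. */
	flags = (uint8_t)(flags | FLAG_B);
	assert(flags == FLAG_B);

	/* CPU 1 finishes: sets its bit in the stale copy and stores it back. */
	cpu1_copy = (uint8_t)(cpu1_copy | FLAG_A);
	flags = cpu1_copy;

	/* Only FLAG_A survives; CPU 3's update to FLAG_B was overwritten. */
	return flags;
}
```

Calling simulate_lost_update() from a test harness returns a byte with
only FLAG_A set. In real concurrent code the interleaving is
nondeterministic, which is why a tool such as KCSAN is needed to observe
it.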

Thanks,
-- Marco