Date: Wed, 14 Oct 2020 17:31:27 +0200
From: Ingo Molnar
To: Ankur Arora
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kirill@shutemov.name,
	mhocko@kernel.org, boris.ostrovsky@oracle.com, konrad.wilk@oracle.com,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86@kernel.org,
	"H. Peter Anvin", Tony Luck, Sean Christopherson, Mike Rapoport,
	Xiaoyao Li, Fenghua Yu, "Peter Zijlstra (Intel)", Dave Hansen
Subject: Re: [PATCH 7/8] x86/cpu/intel: enable X86_FEATURE_NT_GOOD on Intel Broadwellx
Message-ID: <20201014153127.GB1424414@gmail.com>
References: <20201014083300.19077-1-ankur.a.arora@oracle.com>
	<20201014083300.19077-8-ankur.a.arora@oracle.com>
In-Reply-To: <20201014083300.19077-8-ankur.a.arora@oracle.com>

* Ankur Arora wrote:

> System:           Oracle X6-2
> CPU:              2 nodes * 10 cores/node * 2 threads/core
>                   Intel Xeon E5-2630 v4 (Broadwellx, 6:79:1)
> Memory:           256 GB evenly split between nodes
> Microcode:        0xb00002e
> scaling_governor: performance
> L3 size:          25MB
> intel_pstate/no_turbo: 1
>
> Performance comparison of 'perf bench mem memset -l 1' for x86-64-stosb
> (X86_FEATURE_ERMS) and x86-64-movnt (X86_FEATURE_NT_GOOD):
>
>            x86-64-stosb (5 runs)     x86-64-movnt (5 runs)     speedup
>            -----------------------   -----------------------   -------
>   size       BW      (   pstdev)       BW      (   pstdev)
>
>   16MB     17.35 GB/s ( +- 9.27%)    11.83 GB/s ( +- 0.19%)    -31.81%
>  128MB      5.31 GB/s ( +- 0.13%)    11.72 GB/s ( +- 0.44%)   +121.84%
> 1024MB      5.42 GB/s ( +- 0.13%)    11.78 GB/s ( +- 0.03%)   +117.34%
> 4096MB      5.41 GB/s ( +- 0.41%)    11.76 GB/s ( +- 0.07%)   +117.37%

> +	if (c->x86 == 6 && c->x86_model == INTEL_FAM6_BROADWELL_X)
> +		set_cpu_cap(c, X86_FEATURE_NT_GOOD);

So while I agree with how you've done careful measurements to isolate bad
microarchitectures where non-temporal stores are slow, I do think this
approach of opt-in doesn't scale and is hard to maintain.

Instead I'd suggest enabling this by default everywhere, and creating a
X86_FEATURE_NT_BAD quirk table for the bad microarchitectures.

This means that with new microarchitectures we'd get automatic enablement,
and hopefully chip testing would identify cases where performance isn't as
good.

I.e. the 'trust but verify' method.

Thanks,

	Ingo