Date: Thu, 2 Sep 2021 10:08:06 +0000
From: Huang Shijie
To: Linus Torvalds
Cc: Al Viro, Shijie Huang, Andrew Morton, Linux-MM, "Song Bao Hua (Barry Song)", Linux Kernel Mailing List, Frank Wang
Subject: Re: Is it possible to implement the per-node page cache for programs/libraries?

Hi Linus,

On Wed, Sep 01, 2021 at 10:29:01AM -0700, Linus Torvalds wrote:
> On Wed, Sep 1, 2021 at 10:24 AM Linus Torvalds wrote:
> >
> > But what you could do, if you wanted to, would be to catch the
> > situation where you have lots of expensive NUMA accesses either using
> > our VM infrastructure or performance counters, and when the mapping is
> > a MAP_PRIVATE you just do a COW fault on them.
> >
> > Sounds entirely doable, and has absolutely nothing to do with the page
> > cache. It would literally just be an "over-eager COW fault triggered
> > by NUMA access counters".

Yes. You are right, we can use COW. :)

Actually, we have _TWO_ levels at which to optimize NUMA remote access:
   1.) the page cache, which is independent of any process.
   2.) the process address space (page tables).

For 2.), we can use the over-eager COW:
   2.1) I have finished a user-space patch for glibc which uses
        "over-eager COW" to do the text replication in NUMA.
   2.2) There is also a kernel patch which uses "over-eager COW" to do
        the replication for the program itself in NUMA.
        (We may take that up as a separate topic.)

>
> Note how it would work perfectly fine for anonymous mappings too. Just
> to reinforce the point that this has nothing to do with any page cache
> issues.
>
> Of course, if you want to actually then *share* pages within a node
> (rather than replicate them for each process), that gets more
> exciting.

Do we really need to change the page cache?

The 2.1) above may produce a separate copy of the shared-library pages
for each process, e.g. for glibc.so. Even on the same NUMA node 0, we may
run two identical processes, so we now get two copies of glibc.so. If we
run 5 identical processes on NUMA node 0, we get five copies of glibc.so.

But if we have a per-node page cache for glibc.so, we can do it like this:
   (1) disable the "over-eager COW" in the process.
   (2) map the pages of the per-node page cache into the different
       processes on the _SAME_ NUMA node, so that all processes on that
       node share a single copy of each page.
   (3) processes on other NUMA nodes use the pages that belong to their
       own node.

This way, we can save many pages and still get faster access in NUMA.

Thanks
Huang Shijie
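
P.S. To make the page counts above concrete, here is a minimal sketch in
Python. It is only a toy model and is not code from either patch: the
function names and the "list of NUMA node ids, one per process" input are
assumptions made up for this illustration. It counts how many copies of
one library exist under each scheme.

```python
def copies_with_eager_cow(process_nodes):
    """Over-eager COW: every process ends up with its own private copy."""
    return len(process_nodes)


def copies_with_per_node_cache(process_nodes):
    """Per-node page cache: one shared copy per node that runs a process."""
    return len(set(process_nodes))


if __name__ == "__main__":
    # Hypothetical placement: the NUMA node id of each process using glibc.so.
    five_on_node0 = [0, 0, 0, 0, 0]
    print(copies_with_eager_cow(five_on_node0))       # 5
    print(copies_with_per_node_cache(five_on_node0))  # 1
```

In this model the saving grows with the number of processes per node,
while the per-node scheme never needs more copies than there are nodes
in use.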