Message-ID: <0122005439ffb7895efda7a1a67992cbe41392fe.camel@redhat.com>
Subject: Re: [PATCH v2 0/5] Introduce memcg_stock_pcp remote draining
From: Leonardo Brás <leobras@redhat.com>
To: Michal Hocko, Marcelo Tosatti
Cc: Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song,
    Andrew Morton, cgroups@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
Date: Fri, 27 Jan 2023 03:55:39 -0300
References: <20230125073502.743446-1-leobras@redhat.com>
    <9e61ab53e1419a144f774b95230b789244895424.camel@redhat.com>
User-Agent: Evolution 3.46.2

On Thu, 2023-01-26 at 20:13 +0100, Michal Hocko wrote:
> On Thu 26-01-23 15:14:25, Marcelo Tosatti wrote:
> > On Thu, Jan 26, 2023 at 08:45:36AM +0100, Michal Hocko wrote:
> > > On Wed 25-01-23 15:22:00, Marcelo Tosatti wrote:
> > > [...]
> > > > Remote draining reduces interruptions whether the CPU
> > > > is marked as isolated or not:
> > > >
> > > > - Allows isolated CPUs to benefit from pcp caching.
> > > > - Removes the interruption to non-isolated CPUs. See for example
> > > >
> > > >   https://lkml.org/lkml/2022/6/13/2769
> > >
> > > This is talking about page allocator per cpu caches, right? In this
> > > patch we are talking about memcg pcp caches. Are you sure the same
> > > applies here?
> >
> > Both can stall the users of the drain operation.
>
> Yes. But it is important to consider who those users are. We are
> draining when
> 	- we are charging and the limit is hit so that memory reclaim
> 	  has to be triggered.
> 	- hard, high limits are set and require memory reclaim.
> 	- force_empty - full memory reclaim for a memcg
> 	- memcg offlining - cgroup removal - quite a heavy operation as
> 	  well.
> All of those can be really costly kernel operations, and they affect an
> isolated cpu only if the same memcg is used by both isolated and
> non-isolated cpus. In other words, those costly operations would have to
> be triggered from non-isolated cpus, and those are expected to be
> stalled. It is the side effect of the scheduled local cpu draining that
> affects the isolated cpu as well.
>
> Is that more clear?

I think so, please help me check:

IIUC, we can approach this by dividing the problem into two working modes:
1 - Normal, meaning no drain_all_stock() running.
2 - Draining, grouping together pre-OOM and userspace 'config' operations:
    changing, destroying, or reconfiguring a memcg.

For (1), we will (ideally) have only the local cpu working on the percpu
struct. This mode will not have any kind of contention, because each CPU
will hold only its own spinlock.

For (2), we will have a lot of drain_all_stock() running. This means a lot
of schedule_work_on() calls (on upstream), or possibly some contention, i.e.
local cpus having to wait for a lock to get their cache, with the patch
proposal.

Ok, given the above is correct:

# Some arguments point that (1) becomes slower with this patch.

This is partially true: while test 2.2 showed that the local cpu functions'
running time became slower by a few cycles, test 2.4 shows that the
userspace perception of it was that syscalls and pagefaults actually became
faster:

During some debugging tests before getting the performance numbers for test
2.4, I noticed that the 'syscall + write' test would call all those
functions that became slower in test 2.2. Those functions were called
multiple millions of times during a single test, and still the patched
version performed faster than the upstream version in test 2.4. Maybe the
functions became slower, but their overall usage in the usual context
became faster.

Isn't that an improvement?

# Regarding (2), I notice that we fear contention.

While this seems to be the harder part of the discussion, I think we have
enough data to deal with it.

In which case would contention be a big problem here? IIUC, it would be
when a lot of drain_all_stock() calls run because the memory limit is
getting near. I mean, having the user create / modify a memcg multiple
times a second for a while is not something to be expected, IMHO.

Now, if I assumed correctly, and the case where contention could be a
problem is a memcg under high memory pressure, then we have the argument
that Marcelo Tosatti brought to the discussion[P1]: using spinlocks on
percpu caches for page allocation brought better results than local_locks
+ schedule_work_on().
I mean, while contention would cause the cpu to wait for a while before
getting the lock to allocate a page from the cache, something similar would
happen with schedule_work_on(), which forces the current task to wait while
the draining happens locally.

What I am able to see is that, for each drain_all_stock(), each cpu getting
drained has the option to (a) (sometimes) wait for a lock to be freed, or
(b) wait for a whole context switch to happen. And IIUC, (b) is much slower
than (a) on average, which is what causes the improved performance seen in
[P1].

(I mean, waiting while drain_local_stock() runs on the local CPU vs waiting
for it to run on the remote CPU may not be that different, since the
cacheline is already written to by the remote cpu on upstream.)

Also, according to test 2.2, for the patched version drain_local_stock()
has gotten faster (much faster for 128 cpus), even though it does all the
draining itself instead of just scheduling it on the other cpus.

I mean, adding that to the brief nature of local cpu functions, we may not
hit contention as much as expected.

##

Sorry for the long text.
I may be missing some point; please let me know if that's the case.

Thanks a lot for reviewing!
Leo

[P1]: https://lkml.org/lkml/2022/6/13/2769