Message-ID: <4b232f47e038ab6fcaa0114f73c28d4bf8799f84.camel@redhat.com>
Subject: Re: [PATCH v2 0/5] Introduce memcg_stock_pcp remote draining
From: Leonardo Brás <leobras@redhat.com>
To: Roman Gushchin
Cc: Michal Hocko, Marcelo Tosatti, Johannes Weiner, Shakeel Butt,
 Muchun Song, Andrew Morton, cgroups@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Date: Tue, 07 Feb 2023 00:18:01 -0300
References: <9e61ab53e1419a144f774b95230b789244895424.camel@redhat.com>
 <0122005439ffb7895efda7a1a67992cbe41392fe.camel@redhat.com>
 <28e08669302ad1e7a41bdf8b9988de6a352b5fe1.camel@redhat.com>

On Sun, 2023-02-05 at 11:49 -0800, Roman Gushchin wrote:
> Hi Leonardo!

Hello Roman,
Thanks a lot for replying!

> 
> > Yes, but we are exchanging an "always schedule_work_on()", which is a kind of
> > contention, for a "sometimes we hit spinlock contention".
> > 
> > For the spinlock proposal, on the local cpu side, the *worst case* contention
> > is:
> > 1 - wait the spin_unlock() for a complete ,
> > 2 - wait a cache hit for local per-cpu cacheline
> > 
> > What is current implemented (schedule_work_on() approach), for the local
> > cpu side there is *always* this contention:
> > 1 - wait for a context switch,
> > 2 - wait a cache hit from it's local per-cpu cacheline,
> > 3 - wait a complete ,
> > 4 - then for a new context switch to the current thread.
> 
> I think both Michal and me are thinking of a more generic case in which the cpu
> is not exclusively consumed by 1 special process, so that the draining work can
> be executed during an idle time. In this case the work is basically free.

Oh, that makes sense.
But in such scenarios, wouldn't the same happen to spinlocks?

I mean, most of the contention with spinlocks only happens if the remote cpu is
trying to drain the cache while the local cpu happens to be draining/charging,
which is quite rare due to how fast the local cpu operations are.

Also, if the cpu has some idle time, spending a little of it on a possible
spinlock contention should not be a problem. Right?

> 
> And the introduction of a spin_lock() on the hot path is what we're are concerned
> about. I agree, that on some hardware platforms it won't be that expensive,
> 

IIRC most multicore hardware platforms supported by the kernel should have the
same behavior, since it's better to rely on cache coherence than locking the
memory bus.
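To make the comparison concrete, here is a very rough sketch of the two paths
being discussed (illustrative only, not the code from the patch series: the
*_sketch names are made up, and irq/preemption handling plus most details are
omitted):

	#include <linux/percpu.h>
	#include <linux/spinlock.h>
	#include <linux/memcontrol.h>

	struct memcg_stock_sketch {
		spinlock_t lock;	/* shares the cacheline with the data below */
		struct mem_cgroup *cached;
		unsigned int nr_pages;
	};

	static DEFINE_PER_CPU(struct memcg_stock_sketch, memcg_stock_sketch) = {
		.lock = __SPIN_LOCK_UNLOCKED(memcg_stock_sketch.lock),
	};

	/* Local fast path: the lock and the stock live in this cpu's cacheline,
	 * so an uncontended spin_lock() is an atomic op on an already-exclusive
	 * line -- roughly the "wait a cache hit" cost quoted above. */
	static bool consume_stock_sketch(struct mem_cgroup *memcg, unsigned int nr_pages)
	{
		struct memcg_stock_sketch *stock = this_cpu_ptr(&memcg_stock_sketch);
		bool ret = false;

		spin_lock(&stock->lock);
		if (memcg == stock->cached && stock->nr_pages >= nr_pages) {
			stock->nr_pages -= nr_pages;
			ret = true;
		}
		spin_unlock(&stock->lock);

		return ret;
	}

	/* Remote draining: the draining cpu takes the same lock directly instead
	 * of schedule_work_on(cpu, ...), so the target cpu's workload is never
	 * interrupted; contention only happens if that cpu is charging/draining
	 * at this exact moment. */
	static void drain_remote_stock_sketch(int cpu)
	{
		struct memcg_stock_sketch *stock = per_cpu_ptr(&memcg_stock_sketch, cpu);

		spin_lock(&stock->lock);
		/* ... return stock->nr_pages to stock->cached's page counters ... */
		stock->nr_pages = 0;
		stock->cached = NULL;
		spin_unlock(&stock->lock);
	}

The only point of the sketch is that the local fast path takes a lock living in
its own per-cpu cacheline, so the uncontended case stays cheap, while remote
draining can only contend in the rare window described above.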
For instance, on the cache-coherence side, the other popular architectures
supported by Linux use the LR/SC strategy for atomic operations (tested on ARM,
POWER, RISCV), and IIRC the LoadReserve slow part waits for cacheline
exclusivity, which is already exclusive in this perCPU structure.

> but in general not having any spinlocks is so much better.

I agree that spinlocks may bring contention, which is not ideal in many cases.
In this case, though, it may not be a big issue, due to the very rare remote
access to the structure in the usual case (non-pre-OOM CG).

> 
> > 
> > So moving from schedule_work_on() to spinlocks will save 2 context switches per
> > cpu every time drain_all_stock() is called.
> > 
> > On the remote cpu side, my tests point that doing the remote draining is faster
> > than scheduling a local draining, so it's also a gain.
> > 
> > Also, IIUC the possible contention in the spinlock approach happens only on
> > page-faulting and syscalls, versus the schedule_work_on() approach that can
> > interrupt user workload at any time.
> > 
> > In fact, not interrupting the user workload in isolated cpus is just a bonus of
> > using spinlocks.
> 
> I believe it significantly depends on the preemption model: you're right regarding
> fully preemptive kernels, but with voluntary/none preemption it's exactly opposite:
> the draining work will be executed at some point later (probably with 0 cost),

So this is the case of voluntary/none preemption with some free cpu time.

> while the remote access from another cpu will potentially cause delays on the
> spin lock as well as a need to refill the stock.

But if there is some free CPU time, what is the issue with some (potential)
delays due to spinlock contention?

I am probably missing the whole picture, but when I think of performance
improvement, I think of doing more with the same cputime. If we can use free
cputime to do stuff later, it's only fair to also use it in case of contention,
right?

I know there are some cases that may need to be more predictable (mostly RT),
but when I think of memory allocation, I don't expect it to always take the
same time (as there are caches, pre-OOM, and so on).

Also, as previously discussed, in case of a busy cpu, the spinlock approach will
probably allow more work to be done.

> 
> Overall I'd expect a noticeable performance regression from an introduction of
> spin locks and remote draining. Maybe not on all platforms, but at least on some.
> That's my main concern.
> 

I see.

For the platform I have tested (x86) I noticed better overall performance with
spinlocks than with the upstream solution. For other popular platforms, I have
briefly read some documentation on locking/atomicity and I think we may keep
the performance gains.

But to be sure, I could rerun the tests on other platforms, such as ARM, POWER,
RISCV, and so on, or even perform extra suggested tests.

With that info, would you feel less concerned about a possible change in the
memcg pcp cache locking scheme?

> And I don't think the problem we're aiming to solve here
> justifies this potential regression.
> 

Strictly speaking, the isolated cpu scheduling problem is already fixed by the
housekeeping patch (with some limitations).

At this point, I am trying to bring focus to a (possible) performance
improvement in the memcg pcp cache locking scheme.

> Thanks!
> 

Thank you for helping me better understand your arguments and concerns.
I really appreciate it!

Best regards,
Leo