Message-ID: <4b232f47e038ab6fcaa0114f73c28d4bf8799f84.camel@redhat.com>
Subject: Re: [PATCH v2 0/5] Introduce memcg_stock_pcp remote draining
From: Leonardo Brás <leobras@redhat.com>
To: Roman Gushchin
Cc: Michal Hocko, Marcelo Tosatti, Johannes Weiner, Shakeel Butt,
 Muchun Song, Andrew Morton, cgroups@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Date: Tue, 07 Feb 2023 00:18:01 -0300
References: <9e61ab53e1419a144f774b95230b789244895424.camel@redhat.com>
 <0122005439ffb7895efda7a1a67992cbe41392fe.camel@redhat.com>
 <28e08669302ad1e7a41bdf8b9988de6a352b5fe1.camel@redhat.com>

On Sun, 2023-02-05 at 11:49 -0800, Roman Gushchin wrote:
> Hi Leonardo!

Hello Roman,
Thanks a lot for replying!

> 
> > Yes, but we are exchanging an "always schedule_work_on()", which is a kind of
> > contention, for a "sometimes we hit spinlock contention".
> > 
> > For the spinlock proposal, on the local cpu side, the *worst case* contention
> > is:
> > 1 - wait the spin_unlock() for a complete ,
> > 2 - wait a cache hit for local per-cpu cacheline
> > 
> > What is current implemented (schedule_work_on() approach), for the local
> > cpu side there is *always* this contention:
> > 1 - wait for a context switch,
> > 2 - wait a cache hit from it's local per-cpu cacheline,
> > 3 - wait a complete ,
> > 4 - then for a new context switch to the current thread.
> 
> I think both Michal and me are thinking of a more generic case in which the cpu
> is not exclusively consumed by 1 special process, so that the draining work can
> be executed during an idle time. In this case the work is basically free.

Oh, that makes sense.
But in such scenarios, wouldn't the same happen to spinlocks?

I mean, most of the contention with spinlocks only happens if the remote cpu is
trying to drain the cache while the local cpu happens to be draining/charging,
which is quite rare due to how fast the local cpu operations are.

Also, if the cpu has some idle time, spending a little of it on a possible
spinlock contention should not be a problem. Right?

> 
> And the introduction of a spin_lock() on the hot path is what we're are concerned
> about. I agree, that on some hardware platforms it won't be that expensive,
> 

IIRC most multicore hardware platforms supported by the kernel should have the
same behavior, since it's better to rely on cache coherence than locking the
memory bus.
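To make the comparison concrete, here is a very rough sketch of the two paths
being discussed (illustrative only, not the code from the patch series: the
*_sketch names are made up, and irq/preemption handling plus most details are
omitted):

	#include <linux/percpu.h>
	#include <linux/spinlock.h>
	#include <linux/memcontrol.h>

	struct memcg_stock_sketch {
		spinlock_t lock;	/* shares the cacheline with the data below */
		struct mem_cgroup *cached;
		unsigned int nr_pages;
	};

	static DEFINE_PER_CPU(struct memcg_stock_sketch, memcg_stock_sketch) = {
		.lock = __SPIN_LOCK_UNLOCKED(memcg_stock_sketch.lock),
	};

	/* Local fast path: the lock and the stock live in this cpu's cacheline,
	 * so an uncontended spin_lock() is an atomic op on an already-exclusive
	 * line -- roughly the "wait a cache hit" cost quoted above. */
	static bool consume_stock_sketch(struct mem_cgroup *memcg, unsigned int nr_pages)
	{
		struct memcg_stock_sketch *stock = this_cpu_ptr(&memcg_stock_sketch);
		bool ret = false;

		spin_lock(&stock->lock);
		if (memcg == stock->cached && stock->nr_pages >= nr_pages) {
			stock->nr_pages -= nr_pages;
			ret = true;
		}
		spin_unlock(&stock->lock);

		return ret;
	}

	/* Remote draining: the draining cpu takes the same lock directly instead
	 * of schedule_work_on(cpu, ...), so the target cpu's workload is never
	 * interrupted; contention only happens if that cpu is charging/draining
	 * at this exact moment. */
	static void drain_remote_stock_sketch(int cpu)
	{
		struct memcg_stock_sketch *stock = per_cpu_ptr(&memcg_stock_sketch, cpu);

		spin_lock(&stock->lock);
		/* ... return stock->nr_pages to stock->cached's page counters ... */
		stock->nr_pages = 0;
		stock->cached = NULL;
		spin_unlock(&stock->lock);
	}

The only point of the sketch is that the local fast path takes a lock living in
its own per-cpu cacheline, so the uncontended case stays cheap, while remote
draining can only contend in the rare window described above.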
For instance, on the cache-coherence side, the other popular architectures
supported by Linux use the LR/SC strategy for atomic operations (tested on ARM,
POWER, RISCV), and IIRC the LoadReserve slow part waits for cacheline
exclusivity, which is already exclusive in this perCPU structure.

> but in general not having any spinlocks is so much better.

I agree that spinlocks may bring contention, which is not ideal in many cases.
In this case, though, it may not be a big issue, due to the very rare remote
access to the structure in the usual case (non-pre-OOM CG).

> 
> > 
> > So moving from schedule_work_on() to spinlocks will save 2 context switches per
> > cpu every time drain_all_stock() is called.
> > 
> > On the remote cpu side, my tests point that doing the remote draining is faster
> > than scheduling a local draining, so it's also a gain.
> > 
> > Also, IIUC the possible contention in the spinlock approach happens only on
> > page-faulting and syscalls, versus the schedule_work_on() approach that can
> > interrupt user workload at any time.
> > 
> > In fact, not interrupting the user workload in isolated cpus is just a bonus of
> > using spinlocks.
> 
> I believe it significantly depends on the preemption model: you're right regarding
> fully preemptive kernels, but with voluntary/none preemption it's exactly opposite:
> the draining work will be executed at some point later (probably with 0 cost),

So this is the case of voluntary/none preemption with some free cpu time.

> while the remote access from another cpu will potentially cause delays on the
> spin lock as well as a need to refill the stock.

But if there is some free CPU time, what is the issue with some (potential)
delays due to spinlock contention?

I am probably missing the whole picture, but when I think of performance
improvement, I think of doing more with the same cputime. If we can use free
cputime to do stuff later, it's only fair to also use it in case of contention,
right?

I know there are some cases that may need to be more predictable (mostly RT),
but when I think of memory allocation, I don't expect it to always take the
same time (as there are caches, pre-OOM, and so on).

Also, as previously discussed, in case of a busy cpu, the spinlock approach will
probably allow more work to be done.

> 
> Overall I'd expect a noticeable performance regression from an introduction of
> spin locks and remote draining. Maybe not on all platforms, but at least on some.
> That's my main concern.
> 

I see.

For the platform I have tested (x86) I noticed better overall performance with
spinlocks than with the upstream solution. For other popular platforms, I have
briefly read some documentation on locking/atomicity and I think we may keep
the performance gains.

But to be sure, I could rerun the tests on other platforms, such as ARM, POWER,
RISCV, and so on, or even perform extra suggested tests.

With that info, would you feel less concerned about a possible change in the
memcg pcp cache locking scheme?

> And I don't think the problem we're aiming to solve here
> justifies this potential regression.
> 

Strictly speaking, the isolated cpu scheduling problem is already fixed by the
housekeeping patch (with some limitations).

At this point, I am trying to bring focus to a (possible) performance
improvement in the memcg pcp cache locking scheme.

> Thanks!
> 

Thank you for helping me better understand your arguments and concerns.
I really appreciate it!

Best regards,
Leo