From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yosry Ahmed <yosryahmed@google.com>
Date: Thu, 24 Aug 2023 11:50:51 -0700
Subject: Re: [PATCH 3/3] mm: memcg: use non-unified stats flushing for userspace reads
To: Michal Hocko
Cc: Andrew Morton, Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song, Ivan Babrou, Tejun Heo, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20230821205458.1764662-1-yosryahmed@google.com> <20230821205458.1764662-4-yosryahmed@google.com>
Content-Type: text/plain; charset="UTF-8"

On Thu, Aug 24, 2023 at 11:15 AM Yosry Ahmed wrote:
>
> On Thu, Aug 24, 2023 at 12:13 AM Michal Hocko wrote:
> >
> > On Wed 23-08-23 07:55:40, Yosry Ahmed wrote:
> > > On Wed, Aug 23, 2023 at 12:33 AM Michal Hocko wrote:
> > > >
> > > > On Tue 22-08-23 08:30:05, Yosry Ahmed wrote:
> > > > > On Tue, Aug 22, 2023 at 2:06 AM Michal Hocko wrote:
> > > > > >
> > > > > > On Mon 21-08-23 20:54:58, Yosry Ahmed wrote:
> > > > [...]
> > > > > So to answer your question, I don't think a random user can really
> > > > > affect the system in a significant way by constantly flushing. In
> > > > > fact, in the test script (which I am now attaching, in case you're
> > > > > interested), there are hundreds of threads that are reading stats of
> > > > > different cgroups every 1s, and I don't see any negative effects on
> > > > > in-kernel flushers in this case (reclaimers).
> > > >
> > > > I suspect you have missed my point.
> > >
> > > I suspect you are right :)
> > >
> > > > Maybe I am just misunderstanding
> > > > the code but it seems to me that the lock dropping inside
> > > > cgroup_rstat_flush_locked effectively allows an unbounded number of
> > > > contenders which is really dangerous when it is triggerable from
> > > > userspace. The number of spinners at a moment is always bound by the
> > > > number of CPUs but depending on timing many potential spinners might be
> > > > cond_resched'ed and the worst-case latency to complete can be really
> > > > high. Makes more sense?
> > >
> > > I think I understand better now. So basically because we might drop
> > > the lock and resched, there can be nr_cpus spinners + other spinners
> > > that are currently scheduled away, so these will need to wait to be
> > > scheduled and then start spinning on the lock. This may happen for one
> > > reader multiple times during its read, which is what can cause a high
> > > worst-case latency.
> > >
> > > I hope I understood you correctly this time. Did I?
> >
> > Yes. I would just add that this could also influence the worst-case
> > latency for a different reader - so an adversary user can stall others.
>
> I can add that for v2 to the commit log, thanks.
>
> > Exposing a shared global lock in an uncontrollable way over a generally
> > available user interface is not really a great idea IMHO.
>
> I think that's how it was always meant to be when it was designed.
> The global rstat lock has always existed and was always available to
> userspace readers. The memory controller took a different path at some
> point with unified flushing, but that was mainly because of high
> concurrency from in-kernel flushers, not because userspace readers
> caused a problem. Outside of memcg, the core cgroup code has always
> exercised this global lock when reading cpu.stat since rstat's
> introduction. I assume there haven't been any problems since it's still
> there. I was hoping Tejun would confirm/deny this.

One thing we can do to remedy this situation is to replace the global
rstat spinlock with a mutex and drop the resched/lock-dropping
condition. Tejun suggested this in the previous thread. This
effectively reverts 0fa294fb1985 ("cgroup: Replace cgroup_rstat_mutex
with a spinlock"), since now all the flushing contexts are sleepable.
My synthetic stress test does not show any regressions with a mutex,
and there is a small boost to read latency (probably because we stop
dropping the lock and rescheduling). I am not sure whether we may start
seeing need_resched warnings on big flushes, though.

One other concern that Shakeel pointed out to me is preemption: if
someone holding the mutex gets preempted, this may starve other
waiters. We could disable preemption while we hold the mutex, though I
am not sure that is a common pattern.