From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D516DC369D9 for ; Wed, 30 Apr 2025 23:43:47 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D01FF6B00A5; Wed, 30 Apr 2025 19:43:45 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id CAF786B00A6; Wed, 30 Apr 2025 19:43:45 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B9E7D6B00A7; Wed, 30 Apr 2025 19:43:45 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 9B2B66B00A5 for ; Wed, 30 Apr 2025 19:43:45 -0400 (EDT) Received: from smtpin22.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 7BC86C0918 for ; Wed, 30 Apr 2025 23:43:46 +0000 (UTC) X-FDA: 83392340052.22.C896E0E Received: from mail-pl1-f180.google.com (mail-pl1-f180.google.com [209.85.214.180]) by imf05.hostedemail.com (Postfix) with ESMTP id 732E8100007 for ; Wed, 30 Apr 2025 23:43:44 +0000 (UTC) Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=SwznbCPU; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf05.hostedemail.com: domain of inwardvessel@gmail.com designates 209.85.214.180 as permitted sender) smtp.mailfrom=inwardvessel@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1746056624; a=rsa-sha256; cv=none; b=JmeBXxce3UfvRZGRZX/2Bz/EObsrW7Qz4jjxshNxxtdVRqAOgTJPSOQ+xoIw2+quwRVJsR 6CMSSfdjXYvuC30Yr3qe+dIh+qBZIQrguL9ol7IfA/HzhpbqkvB5IpH+isw07o0a9qr3u3 3qyaAg8vz+he55ldzNQY7Y47NwOUq5c= ARC-Authentication-Results: i=1; imf05.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=SwznbCPU; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf05.hostedemail.com: domain of inwardvessel@gmail.com designates 209.85.214.180 as permitted sender) smtp.mailfrom=inwardvessel@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1746056624; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=trrg9W1nidtTS4DlfIahIOyZJBnc1zJhZ4QyQKNercU=; b=hexfLKYVxtDWY9SxQR3Wzxwe87ovpbft+e50vFHBwf5oVkvOyUXCXglXD5y1OeMUNByHDw rygmtwD0c85Wd4feXdQvqvKqMdb0oIFVgkZHRclS+YnRncVzYclxySoNtg+EsOu6E/20CP TgbB+GCorJ2nQRfY2dSu2H8t03bxMw4= Received: by mail-pl1-f180.google.com with SMTP id d9443c01a7336-22c33677183so4184875ad.2 for ; Wed, 30 Apr 2025 16:43:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1746056623; x=1746661423; darn=kvack.org; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=trrg9W1nidtTS4DlfIahIOyZJBnc1zJhZ4QyQKNercU=; b=SwznbCPUh1DE44aAvlw3JjkLeZsQuHM6TqEJ67+FVk/o3PLZbzdKjcmjb2MZZg6d76 Hz6rDvmzriC76rdtFRykvqNk/cTVBM2+dP8nIjIvNfwObBegotIXLQMzEv5SURN6Szx4 KEIhT5QGWR6VcUaIrji7uF0CjL+bNA9aG89T16cB91CidcTf/bnEaW3/svLVlcJ/qmk/ FCqSgAZTUa0tAX4HMbrgf+IKFX9N0pIKhOf3U8MYOiKU4oIVuXtHfE+rqGS4Nz/chiv+ 3BTsjxl7jwPSiP9YdrDbtChaRrQe6cDw+e4OhH8we8d9mPfCt8SBkWQPSrEYNUa+8Llr RumQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1746056623; x=1746661423; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=trrg9W1nidtTS4DlfIahIOyZJBnc1zJhZ4QyQKNercU=; b=d33RmC2GKLswrvw3WdTr98j8Uvqtv7XHNWM0xqQkONma4CffaLrk3GDxcMfGbWSFoa yX5g4602MItWA1y8xaOxlsSXHR3RAw1SRgn545oy4wZAxB+QPeyWfHjdKXvz9LnHaK/y V2ClpJ232hd1sNrBQ497aCWNoII8kbRdr02uBFwsWVHYHHplmYyHLEfwjmLs7/0nNY7W 5pjDaQm2SulonmA2O4tGeALaX3s55ehyR4aSiT1/5V/snWrUi/LJUsCRB8ULeZIQXUYW /gw5Tjh+Ee/PYUeCqFjzCLpBQrZyZBPojUgx51P3mhfvgksLxdO7r2nmZ59kdHm472YJ CtnA== X-Forwarded-Encrypted: i=1; AJvYcCWgmdTgt4Rr5Y5EogSKKF1trkPeE+Qm2VY9/4593g+c+se7ueFGexMAK+J/SLi7FMCuJGWoV+cowg==@kvack.org X-Gm-Message-State: AOJu0Yw2Xr1MEAzcJgpwUmPtAwU7BxzLKZgT713dJPFEaDbU3yDT8lUn D1doCQYT6tYIl+mfILub96vz5I23yxM+urclgwswLTgpu/REpQ8sMSOCUw== X-Gm-Gg: ASbGncsWh05iTC2Wb//TmFpXxt0Be/ddyXsuIRYiX+zQ89Ju8ifggCaYjq1ORdkfPa+ G5tmfBLXEYkN5YQP6kyf1K2lYooVyEk1nWQQWUgklGToUgFF3k0iLlCg1mOUGDXx+xEkkdXrH+m Q2mIfBdLrId3r9TFzITkI6O5AQPmJidqxCKX/4iufCbS5IJ8zG5u/kfK718XlO8/ZOgDhYGLgVA yadRN15gYJIVc/T6VXa2/JtRsg7y/TynnA6FxoU4pH3qoEPcWEdQdq+oZtdmdjBHkVQjwm208oo WRz5teourz8VaCUfJYTsJ6gYkTQTa9rH2NcAb30uPFJOCm7Oiy4r2/M5o70WaXYtnGjfE6ucGQ= = X-Google-Smtp-Source: AGHT+IHuiRYYQi9vmblq0Qh1SJ7ySSma5yx29+dgQfBF4xpvGJlIP7kbk/Z9xi99THgw1a8tjzZ+SQ== X-Received: by 2002:a17:902:ecc3:b0:224:76f:9e59 with SMTP id d9443c01a7336-22e040b0136mr15620515ad.10.1746056623178; Wed, 30 Apr 2025 16:43:43 -0700 (PDT) Received: from ?IPV6:2a03:83e0:1151:15:661a:2fad:77ee:b1ad? ([2620:10d:c090:500::5:69e5]) by smtp.gmail.com with ESMTPSA id 41be03b00d2f7-b15f7ec1faasm9350159a12.29.2025.04.30.16.43.41 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 30 Apr 2025 16:43:42 -0700 (PDT) Message-ID: Date: Wed, 30 Apr 2025 16:43:41 -0700 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v4 4/5] cgroup: use separate rstat trees for each subsystem To: Yosry Ahmed Cc: tj@kernel.org, shakeel.butt@linux.dev, yosryahmed@google.com, mkoutny@suse.com, hannes@cmpxchg.org, akpm@linux-foundation.org, linux-mm@kvack.org, cgroups@vger.kernel.org, kernel-team@meta.com References: <20250404011050.121777-1-inwardvessel@gmail.com> <20250404011050.121777-5-inwardvessel@gmail.com> <68079aa7.df0a0220.30a1a0.cbb2SMTPIN_ADDED_BROKEN@mx.google.com> Content-Language: en-US From: JP Kobryn In-Reply-To: <68079aa7.df0a0220.30a1a0.cbb2SMTPIN_ADDED_BROKEN@mx.google.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: 732E8100007 X-Stat-Signature: x9poc1poug1no6sb51wizmijs1jrdd1i X-Rspam-User: X-Rspamd-Server: rspam08 X-HE-Tag: 1746056624-501442 X-HE-Meta: U2FsdGVkX1/+3dw/XNmka5gMqi5Wlth8Yn0Q/EaIOSoSw3qxm18XB5uCKeq/PAb0yKihqM38dFS5zN+AF4V/px0Ad6jOrJClwz2EnVdQP6XKN2zy9QTrlrwdsWGapZxlJBJo1W6pQRvMJLvPfvG+ugCbweiFcb8OjeZHa6ySBJiyN+NhgLmnaZIMzbgoquXBLSZchQe8hai9xk3a3hGcQrT3Sr1BEuEK1zB3Sd+5UevNQATWV5swwwOZqoKnIsoc60EIHSoQOq0w8RbGBGO4vDcfoH09ZJjhxm2/8oLcemA+kGv6JxoTYRSrca6NkB+XTsLG6cJs6wJAnaFa7o+hO/4DUxbtyIwJ1z1nZzWpGfVMBv20kW8Qz8CL+tRmVQDs5SRfKyQTpnOqb1zuJ9CnBz+RYb7c+9NDJsfEv4YjOAeYdzcpphOIdPtducoacJi/1MKrhsxKDupME7NNEkLdHG19afDnnYpxCx22F9B4Ns7GRbgKRchYiPqYFzIfOFVnlDgUn1p7+wjz934mWK0mculnIzmK5+Jt9tnnJwJrcuLG2eGR/YGBvtevR84FxH6ZIh/bDZCsfY093MLVsknKwrD1B3zurHpr3VPQXQWF+AdUH2nhWJXH7RrKnYRJPRrjvJRVd7Z1W3CiFEClEz8a1k9Ipco5ED++RE6rpHcIrCy7uR6RQg7uWqONneansRI+kSylVXPFG1/2NzZtRBTI0OAGW3IqVmgpnXYvU3fdI51m/bvjar8hqXhOnVdRuY573DjrhVGYPTvgXBaCHwZ0X7lFf6s+20Az/0EVjkJWU1PKtp6tq74IbMTF0nWOSrGWvndN7xPaTBXPno4FQ9GhyG6w1yE6WxYDSdv7ETz5LblGWrnc2Yd+4bKZaVxjBc2EEMx/Z8wsqB4gFkVL+R6WnrezqHI5hXO5lgGRJdOO96JTzzuFwHGCOt3gNYpxcLn7PD4J144C2S0zard0oGG 00fAznIo 7oV5GvCsWNJQ0Ek5AFIU8wJ55/5texsRpEa2Vne4Yy0TbDlI1ZYMXS6Pc7YYGBLTQ2r3oswU23Zvg699XUf9D4wwctApsLXEDjJSeK5y8dPPFHcPfOp1QlJY+GO8OwLzaHXkqZRk7VXofSsYfOlD/1ACXszzWFgfzRyEvjA1OD5ioCd4K2E/8NmTxAxzgwILI5+EG/x69lJbxXBMVU/DJeY69oDwil46uvFYQkJaw0huCuH3QpZV7//B90HXYpNrfsr08RZvsIaZTCj33E/FugvgZSZJjXlHyK03D562XLPdvJ3tHBLLd4FjzuLUv5w0aVxb4k8R+bse2euN2FksTQmoOaocfGIhpvhP+3oBo52F4gv293dKP7EafQPV9oF4yswDoTRyVwgAD7Sfq/YmK/lH8rBtUJ+xXtNyBw1nvprbQwR0aKuYxJL4wThghySrTBblA X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 4/22/25 6:33 AM, Yosry Ahmed wrote: > On Thu, Apr 03, 2025 at 06:10:49PM -0700, JP Kobryn wrote: [..] >> @@ -5425,6 +5417,9 @@ static void css_free_rwork_fn(struct work_struct *work) >> struct cgroup_subsys_state *parent = css->parent; >> int id = css->id; >> >> + if (ss->css_rstat_flush) >> + css_rstat_exit(css); >> + > > This call now exists in both branches (self css or not), so it's > probably best to pull it outside. We should probably also pull the call > in cgroup_destroy_root() outside into css_free_rwork_fn() so that we end > up with a single call to css_rstat_exit() (apart from failure paths). This can be done if css_rstat_exit() is modified to guard against invalid css's like being a subsystem css and not having implemented css_rstat_flush. > > We can probably also use css_is_cgroup() here instead of 'if (ss)' for > consistency. > >> ss->css_free(css); >> cgroup_idr_remove(&ss->css_idr, id); >> cgroup_put(cgrp); > [..] >> @@ -5659,6 +5647,12 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp, >> goto err_free_css; >> css->id = err; >> >> + if (ss->css_rstat_flush) { >> + err = css_rstat_init(css); >> + if (err) >> + goto err_free_css; >> + } >> + >> /* @css is ready to be brought online now, make it visible */ >> list_add_tail_rcu(&css->sibling, &parent_css->children); >> cgroup_idr_replace(&ss->css_idr, css, css->id); >> @@ -5672,7 +5666,6 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp, >> err_list_del: >> list_del_rcu(&css->sibling); >> err_free_css: >> - list_del_rcu(&css->rstat_css_node); >> INIT_RCU_WORK(&css->destroy_rwork, css_free_rwork_fn); >> queue_rcu_work(cgroup_destroy_wq, &css->destroy_rwork); >> return ERR_PTR(err); >> @@ -6104,11 +6097,17 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early) >> css->flags |= CSS_NO_REF; >> >> if (early) { >> - /* allocation can't be done safely during early init */ >> + /* >> + * Allocation can't be done safely during early init. >> + * Defer IDR and rstat allocations until cgroup_init(). >> + */ >> css->id = 1; >> } else { >> css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2, GFP_KERNEL); >> BUG_ON(css->id < 0); >> + >> + if (ss->css_rstat_flush) >> + BUG_ON(css_rstat_init(css)); >> } >> >> /* Update the init_css_set to contain a subsys >> @@ -6207,9 +6206,17 @@ int __init cgroup_init(void) >> struct cgroup_subsys_state *css = >> init_css_set.subsys[ss->id]; >> >> + /* >> + * It is now safe to perform allocations. >> + * Finish setting up subsystems that previously >> + * deferred IDR and rstat allocations. >> + */ >> css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2, >> GFP_KERNEL); >> BUG_ON(css->id < 0); >> + >> + if (ss->css_rstat_flush) >> + BUG_ON(css_rstat_init(css)); > > The calls to css_rstat_init() are really difficult to track. Let's > recap, before this change we had two calls: > - In cgroup_setup_root(), for root cgroups. > - In cgroup_create(), for non-root cgroups. > > This patch adds 3 more, so we end up with 5 calls as follows: > - In cgroup_setup_root(), for root self css's. > - In cgroup_create(), for non-root self css's. > - In cgroup_subsys_init(), for root subsys css's without early > initialization. > - In cgroup_init(), for root subsys css's with early > initialization, as we cannot call it from cgroup_subsys_init() early > as allocations are not allowed during early init. > - In css_create(), for non-root non-self css's. > > We should try to consolidate as much as possible. For example: > - Can we always make the call for root subsys css's in cgroup_init(), > regardless of early initialization status? Is there a need to make the > call early for subsystems that use early in cgroup_subsys_init() > initialization? > > - Can we always make the call for root css's in cgroup_init(), > regardless of whether the css is a self css or a subsys css? I imagine > we'd still need two separate calls, one outside the loop for the self > css's, and one in the loop for subsys css's, but having them in the > same place should make things easier. The answer might be the same for the two questions above. I think at best, we can eliminate the single call below to css_rstat_init(): cgroup_init() for_each_subsys(ss, ssid) if (ss->early_init) css_rstat_init(css) /* remove */ I'm not sure if it's by design and also an undocumented constraint but I find that it is not possible to have a cgroup subsystem that is designated for "early init" participate in rstat at the same time. The necessary ordering of actions should be: init_and_link_css(css, ss, ...) css_rstat_init(css) css_online(css) This sequence occurs within cgroup_init_subsys() when ss->early_init is false. However, when early_init is true, the last two calls are effectively inverted: css_online(css) css_rstat_init(css) /* too late */ This needs to be avoided or else failures will occur during boot. Note that even before this series, this constraint seems to have existed. Using branch for-6.16 as a base, I experimented with a minimal custom test cgroup in which I implement css_rstat_flush while early_init is on. The system fails during boot because online_css() is called before cgroup_rstat_init(). cgroup_init_early() for_each_subsys(ss, ssid) if (ss->early) cgroup_init_subsys(ss, true) css = ss->css_alloc() online_css(css) cgroup_init() cgroup_setup_root() cgroup_rstat_init(root_cgrp) /* too late */ Unless I"m missing something, do you think this constraint is worthy of a separate patch? Something like this that prevents the combination of rstat with early_init: --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -6130,7 +6130,8 @@ int __init cgroup_init_early(void) for_each_subsys(ss, i) { - WARN(!ss->css_alloc || !ss->css_free || ss->name || ss->id, + WARN(!ss->css_alloc || !ss->css_free || ss->name || ss->id || + (ss->early_init && ss->css_rstat_flush), > > Ideally if we can do both the above, we'd end up with 3 calling > functions only: > - cgroup_init() -> for all root css's. > - cgroup_create() -> for non-root self css's. > - css_create() -> for non-root subsys css's. > > Also, we should probably document all the different call paths for > css_rstat_init() and css_rstat_exit() somewhere. Will do.