Date: Tue, 29 Aug 2023 08:44:23 -1000
From: Tejun Heo
To: Michal Hocko
Cc: Yosry Ahmed, Andrew Morton, Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song, Ivan Babrou, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/3] mm: memcg: use non-unified stats flushing for userspace reads
References: <20230821205458.1764662-1-yosryahmed@google.com> <20230821205458.1764662-4-yosryahmed@google.com>

Hello,

On Fri, Aug 25, 2023 at 09:05:46AM +0200, Michal Hocko wrote:
> > > I think that's how it was always meant to be when it was designed. The
> > > global rstat lock has always existed and was always available to
> > > userspace readers.
> > > The memory controller took a different path at some
> > > point with unified flushing, but that was mainly because of high
> > > concurrency from in-kernel flushers, not because userspace readers
> > > caused a problem. Outside of memcg, the core cgroup code has always
> > > exercised this global lock when reading cpu.stat since rstat's
> > > introduction. I assume there hasn't been any problems since it's still
> > > there.
>
> I suspect nobody has just considered a malfunctioning or adversary
> workloads so far.
>
> > > I was hoping Tejun would confirm/deny this.
>
> Yes, that would be interesting to hear.

So, the assumptions in the original design were:

* Writers are high freq but readers are lower freq and can block.

* The global lock is a mutex.

* Back-to-back reads won't have too much to do because each one only has
  to flush what's been accumulated since the last flush, which took place
  just before.

It's likely that the userspace side is gonna be just fine if we restore the
global lock to be a mutex and let them be. Most of the problems are caused
by trying to allow flushing from non-sleepable and kernel contexts. Would it
make sense to distinguish what can and can't wait and make the latter group
always use the cached value? e.g. even in kernel, during oom kill, waiting
doesn't really matter and it can just wait to obtain the up-to-date numbers.

Thanks.

-- 
tejun
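
The split suggested above, where callers that can wait flush under a global mutex and callers that cannot wait use the cached value, could look roughly like the userspace C sketch below. This is only an illustration under stated assumptions: `struct stat_tree`, `stat_add()`, `stat_read()`, and `NR_CPUS_SKETCH` are hypothetical names and are not the kernel's rstat or memcg API. A pthread mutex stands in for a sleepable kernel mutex, and a plain array stands in for per-CPU counters.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 4	/* hypothetical fixed CPU count for the sketch */

struct stat_tree {
	pthread_mutex_t flush_lock;		/* global lock: waiters sleep */
	atomic_long pending[NR_CPUS_SKETCH];	/* deltas since the last flush */
	atomic_long cached_total;		/* value as of the last flush */
};

#define STAT_TREE_INIT { PTHREAD_MUTEX_INITIALIZER, { 0 }, 0 }

/* Writer fast path: high frequency, never takes the global lock. */
static void stat_add(struct stat_tree *t, int cpu, long delta)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	atomic_fetch_add_explicit(&t->pending[cpu], delta,
				  memory_order_relaxed);
}

/*
 * Reader: callers that can wait block on the mutex and flush, so they
 * see up-to-date numbers. Callers that cannot wait never touch the lock
 * and return the possibly stale value left by the last flush.
 */
static long stat_read(struct stat_tree *t, bool can_wait)
{
	long total;
	int cpu;

	if (!can_wait)
		return atomic_load_explicit(&t->cached_total,
					    memory_order_relaxed);

	pthread_mutex_lock(&t->flush_lock);
	total = atomic_load_explicit(&t->cached_total, memory_order_relaxed);
	/* Only what accumulated since the previous flush is folded in. */
	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		total += atomic_exchange_explicit(&t->pending[cpu], 0,
						  memory_order_relaxed);
	atomic_store_explicit(&t->cached_total, total, memory_order_relaxed);
	pthread_mutex_unlock(&t->flush_lock);

	return total;
}
```

In this sketch a second `stat_read(t, true)` issued right after the first finds the pending counters mostly empty, which mirrors the back-to-back reads assumption in the list above. A caller such as the oom kill path from the example would pass `can_wait = true`, while a non-sleepable context would pass `false` and accept staleness.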