From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <2527d5a4-de1f-4c93-b7ee-fdd6fbe2a6f0@kernel.org>
Date: Thu, 1 Aug 2024 12:35:27 +0200
Subject: Re: [BUG] mm/cgroupv2: memory.min may lead to an OOM error
To: Lance Yang, akpm@linux-foundation.org
Cc: 21cnbao@gmail.com, ryan.roberts@arm.com, david@redhat.com,
 shy828301@gmail.com, ziy@nvidia.com, libang.li@antgroup.com,
 baolin.wang@linux.alibaba.com, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, Johannes Weiner, Michal Hocko, Roman Gushchin,
 Shakeel Butt, Muchun Song, Cgroups
References: <20240801045430.48694-1-ioworker0@gmail.com>
From: "Vlastimil Babka (SUSE)"
In-Reply-To: <20240801045430.48694-1-ioworker0@gmail.com>

On 8/1/24 06:54, Lance Yang wrote:
> Hi all,
>
> It's possible to encounter an OOM error if both parent and child cgroups are
> configured such that memory.min and memory.max are set to the same values, as
> is practice in Kubernetes.

Has this been the practice in Kubernetes all along, or is it a recent one? Did
it work differently before?

> Hmm... I'm not sure whether this behavior is a bug or an expected aspect of
> the kernel design.

I'm not a memcg expert, so I've Cc'd some.

> To reproduce the bug, we can follow these command-based steps:
>
> 1. Check Kernel Version and OS release:
>
> ```
> $ uname -r
> 6.10.0-rc5+

Were older kernels behaving the same?

Anyway, the memory.min documentation says:

  "Hard memory protection. If the memory usage of a cgroup is within its
  effective min boundary, the cgroup’s memory won’t be reclaimed under any
  conditions. If there is no unprotected reclaimable memory available, OOM
  killer is invoked."

So in my non-expert opinion this behavior seems valid. If you set min to the
same value as max and then reach the max, you effectively don't allow any
reclaim, so the memcg OOM kill is the only option AFAICS?

> $ cat /etc/os-release
> PRETTY_NAME="Ubuntu 24.04 LTS"
> NAME="Ubuntu"
> VERSION_ID="24.04"
> VERSION="24.04 LTS (Noble Numbat)"
> VERSION_CODENAME=noble
> ID=ubuntu
> ID_LIKE=debian
> HOME_URL=""
> SUPPORT_URL=""
> BUG_REPORT_URL=""
> PRIVACY_POLICY_URL=""
> UBUNTU_CODENAME=noble
> LOGO=ubuntu-logo
>
> ```
>
> 2. Navigate to the cgroup v2 filesystem, create a test cgroup, and set memory settings:
>
> ```
> $ cd /sys/fs/cgroup/
> $ stat -fc %T /sys/fs/cgroup
> cgroup2fs
> $ mkdir test
> $ echo "+memory" > cgroup.subtree_control
> $ mkdir test/test-child
> $ echo 1073741824 > memory.max
> $ echo 1073741824 > memory.min
> $ cat memory.max
> 1073741824
> $ cat memory.min
> 1073741824
> $ cat memory.low
> 0
> $ cat memory.high
> max
> ```
>
> 3. Set up and check memory settings in the child cgroup:
>
> ```
> $ cd test-child
> $ echo 1073741824 > memory.max
> $ echo 1073741824 > memory.min
> $ cat memory.max
> 1073741824
> $ cat memory.min
> 1073741824
> $ cat memory.low
> 0
> $ cat memory.high
> max
> ```
>
> 4. Add process to the child cgroup and verify:
>
> ```
> $ echo $$ > cgroup.procs
> $ cat cgroup.procs
> 1131
> 1320
> $ ps -ef|grep 1131
> root        1131    1014  0 10:45 pts/0    00:00:00 -bash
> root        1321    1131 99 11:06 pts/0    00:00:00 ps -ef
> root        1322    1131  0 11:06 pts/0    00:00:00 grep --color=auto 1131
> ```
>
> 5. Attempt to create a large file using dd and observe the process being killed:
>
> ```
> $ dd if=/dev/zero of=/tmp/2gbfile bs=10M count=200
> Killed
> ```
>
> 6. Check kernel messages related to the OOM event:
>
> ```
> $ dmesg
> ...
> [ 1341.112388] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/test,task_memcg=/test/test-child,task=dd,pid=1324,uid=0
> [ 1341.112418] Memory cgroup out of memory: Killed process 1324 (dd) total-vm:15548kB, anon-rss:10240kB, file-rss:1764kB, shmem-rss:0kB, UID:0 pgtables:76kB oom_score_adj:0
> ```
>
> 7. Reduce the `memory.min` setting in the child cgroup and attempt the same large file creation, and then this issue is resolved.
>
> ```
> # echo 107374182 > memory.min
> # dd if=/dev/zero of=/tmp/2gbfile bs=10M count=200
> 200+0 records in
> 200+0 records out
> 2097152000 bytes (2.1 GB, 2.0 GiB) copied, 1.8713 s, 1.1 GB/s
> ```
>
> Thanks,
> Lance
>
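
To put numbers on the min == max reasoning in the reply above, here is a
minimal shell sketch. It is not part of the original report: the helper name
reclaimable_at_max is hypothetical, and it assumes the whole memory.min value
counts as effective protection, which ignores how the kernel distributes
protection between parent and child cgroups. It only estimates how many bytes
are still open to reclaim once usage reaches memory.max.

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not kernel logic): bytes that stay
# reclaimable once a cgroup's usage reaches memory.max.
# Assumption: the full memory.min value is effective protection.
reclaimable_at_max() {
    max=$1   # contents of memory.max (a byte count or the string "max")
    min=$2   # contents of memory.min (a byte count or the string "max")

    # memory.min = "max" protects everything, so nothing can be reclaimed.
    if [ "$min" = "max" ]; then
        echo 0
        return 0
    fi
    # No hard limit configured: the limit itself never forces reclaim.
    if [ "$max" = "max" ]; then
        echo "unlimited"
        return 0
    fi
    # Protection covers the whole limit: nothing left to reclaim at the limit.
    if [ "$min" -ge "$max" ]; then
        echo 0
    else
        echo $((max - min))
    fi
}

reclaimable_at_max 1073741824 1073741824   # values from steps 2 and 3
reclaimable_at_max 1073741824 107374182    # child memory.min after step 7
```

Under that assumption the first call prints 0 (no reclaim possible, so a memcg
OOM kill is what remains) and the second prints 966367642, which would be
consistent with dd completing in step 7. On a live system the arguments could
be taken from "$(cat memory.max)" and "$(cat memory.min)" inside the cgroup
directory.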