Date: Tue, 11 Apr 2023 21:04:18 +0800
Subject: Re: Re: [PATCH v4] mm: oom: introduce cpuset oom
To: Michal Koutný
Cc: Waiman Long, Michal Hocko, cgroups@vger.kernel.org, linux-mm@kvack.org, rientjes@google.com, Zefan Li, linux-kernel@vger.kernel.org
References: <20230411065816.9798-1-ligang.bdlg@bytedance.com> <3myr57cw3qepul7igpifypxx4xd2buo2y453xlqhdw4xgjokc4@vi3odjfo3ahc>
From: Gang Li
In-Reply-To: <3myr57cw3qepul7igpifypxx4xd2buo2y453xlqhdw4xgjokc4@vi3odjfo3ahc>

On 2023/4/11 20:23, Michal Koutný wrote:
> Hello.
>
> On Tue, Apr 11, 2023 at 02:58:15PM +0800, Gang Li wrote:
>> +	cpuset_for_each_descendant_pre(cs, pos_css, &top_cpuset) {
>> +		if (nodes_equal(cs->mems_allowed, task_cs(current)->mems_allowed)) {
>> +			css_task_iter_start(&(cs->css), CSS_TASK_ITER_PROCS, &it);
>> +			while (!ret && (task = css_task_iter_next(&it)))
>> +				ret = fn(task, arg);
>> +			css_task_iter_end(&it);
>> +		}
>> +	}
>> +	rcu_read_unlock();
>> +	cpuset_read_unlock();
>> +	return ret;
>> +}
>
> I see this traverses all cpusets without the hierarchy actually
> mattering that much. Wouldn't the CONSTRAINT_CPUSET better achieved by
> globally (or per-memcg) scanning all processes and filtering with:

Oh, I see. You mean that scanning all processes in all cpusets is
equivalent to scanning all processes globally.

> nodes_intersect(current->mems_allowed, p->mems_allowed

Perhaps it would be better to use nodes_equal first and, if no suitable
victim is found, fall back to nodes_intersect?

The NUMA balancing mechanism tends to keep a task's memory on the same
NUMA node. If the selected victim's memory happens to sit on a node that
does not intersect with the current process's allowed nodes, killing it
still won't free any memory that the current process can use.

In this example:

A->mems_allowed: 0,1
B->mems_allowed: 1,2
nodes_intersect(A->mems_allowed, B->mems_allowed) == true

Memory Distribution:
+=======+=======+=======+
| Node0 | Node1 | Node2 |
+=======+=======+=======+
| A     |       |       |
+-------+-------+-------+
|       |       | B     |
+-------+-------+-------+

Process A invokes the OOM killer, and B is killed. But A still can't get
any free memory on Node0 or Node1.

> (`current` triggers the OOM, `p` is the iterated task)
> ?
>
> Thanks,
> Michal
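
To make the fallback idea above concrete, here is a rough userspace sketch, not kernel code and not the patch itself. It models a nodemask as a plain bitmask and picks a victim in two passes: an exact mems_allowed match first, then any overlap. Everything in it (struct toy_task, the mems_used and pages fields, scan(), select_victim(), kill_helps(), masks_equal(), masks_intersect()) is hypothetical and exists only for illustration; in the kernel the corresponding helpers are nodes_equal() and nodes_intersects(). Skipping the task that triggered the OOM is also a simplification of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the kernel's nodemask_t: bit i set means node i. */
typedef unsigned int nodemask;

/* Hypothetical task model, for illustration only. */
struct toy_task {
	const char *name;
	nodemask mems_allowed;	/* nodes the task may allocate from */
	nodemask mems_used;	/* nodes where its memory actually resides */
	unsigned long pages;	/* stand-in for an OOM badness score */
};

static bool masks_equal(nodemask a, nodemask b)
{
	return a == b;
}

static bool masks_intersect(nodemask a, nodemask b)
{
	return (a & b) != 0;
}

/* Return the largest task, other than cur, whose mask is accepted by match. */
static const struct toy_task *
scan(const struct toy_task *tasks, size_t n, const struct toy_task *cur,
     bool (*match)(nodemask, nodemask))
{
	const struct toy_task *victim = NULL;

	for (size_t i = 0; i < n; i++) {
		const struct toy_task *p = &tasks[i];

		if (p == cur || !match(cur->mems_allowed, p->mems_allowed))
			continue;
		if (!victim || p->pages > victim->pages)
			victim = p;
	}
	return victim;
}

/* Two-pass selection: exact mask match first, then any overlap. */
static const struct toy_task *
select_victim(const struct toy_task *tasks, size_t n,
	      const struct toy_task *cur)
{
	const struct toy_task *victim;

	assert(cur != NULL && (n == 0 || tasks != NULL));
	victim = scan(tasks, n, cur, masks_equal);
	return victim ? victim : scan(tasks, n, cur, masks_intersect);
}

/* Would killing victim free memory on a node that cur may allocate from? */
static bool kill_helps(const struct toy_task *cur,
		       const struct toy_task *victim)
{
	return masks_intersect(cur->mems_allowed, victim->mems_used);
}
```

With only A (allowed 0,1; memory on node 0) and B (allowed 1,2; memory on node 2) in the task array, the first pass finds nothing, the fallback selects B, and kill_helps() reports that killing B frees nothing A can use, which is the situation in the table. If a third task with the same mems_allowed as A exists, the first pass picks it instead.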