Date: Tue, 17 Dec 2024 10:55:51 -0500
From: Johannes Weiner
To: yangge1116@126.com
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org, 21cnbao@gmail.com, david@redhat.com, baolin.wang@linux.alibaba.com, vbabka@suse.cz, liuzixing@hygon.cn
Subject: Re: [PATCH V7] mm, compaction: don't use ALLOC_CMA for unmovable allocations
Message-ID: <20241217155551.GA37530@cmpxchg.org>
References: <1734436004-1212-1-git-send-email-yangge1116@126.com>
In-Reply-To: <1734436004-1212-1-git-send-email-yangge1116@126.com>

Hello Yangge,

On Tue, Dec 17, 2024 at 07:46:44PM +0800, yangge1116@126.com wrote:
> From: yangge
>
> Since commit 984fdba6a32e ("mm, compaction: use proper alloc_flags
> in __compaction_suitable()") allowed compaction to proceed when the free
> pages required for compaction reside in CMA pageblocks, it's
> possible that __compaction_suitable() always returns true, and in
> some cases that is not acceptable.
>
> There are 4 NUMA nodes on my machine, and each NUMA node has 32GB
> of memory. I have configured 16GB of CMA memory on each NUMA node,
> and starting a 32GB virtual machine with device passthrough is
> extremely slow, taking almost an hour.
>
> During the start-up of the virtual machine, it will call
> pin_user_pages_remote(..., FOLL_LONGTERM, ...) to allocate memory.
> Long-term GUP cannot allocate memory from the CMA area, so at most
> 16GB of non-CMA memory on a NUMA node can be used as virtual
> machine memory. Since there is 16GB of free CMA memory on the NUMA
> node, the order-0 watermark for compaction is always met, so
> __compaction_suitable() always returns true, even if the node is
> unable to allocate non-CMA memory for the virtual machine.
>
> For costly allocations, because __compaction_suitable() always
> returns true, __alloc_pages_slowpath() can't exit at the appropriate
> place, resulting in excessively long virtual machine startup times.
> Call trace:
> __alloc_pages_slowpath
>     if (compact_result == COMPACT_SKIPPED ||
>         compact_result == COMPACT_DEFERRED)
>         goto nopage; // should exit __alloc_pages_slowpath() from here
>
> Other unmovable allocations, like dma_buf, which can be large in a
> Linux system, are also unable to allocate memory from CMA, and these
> allocations suffer from the same problems described above. In order
> to quickly fall back to a remote node, we should remove ALLOC_CMA both
> in __compaction_suitable() and __isolate_free_page() for unmovable
> allocations.
> After this fix, starting a 32GB virtual machine with
> device passthrough takes only a few seconds.

The symptom is obviously bad, but I don't understand this fix.

The reason we do ALLOC_CMA is that, even for unmovable allocations,
you can free up non-CMA space by migrating movable pages over to CMA
space. This is not a property we want to lose.

But I also don't see how it would interfere with your scenario. There
is the compaction_suitable() check in should_compact_retry(), but that
only applies when the result is COMPACT_SKIPPED. IOW, it should only
matter when compaction_suitable() has just returned false, so a true
result there would mean a race condition. That is also why it's not
subject to the limited retries.

What's the exact condition that traps the allocator inside the loop?
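To make the disputed arithmetic concrete, here is a minimal C sketch under a simplified single-zone model. The names struct zone_model, MODEL_ALLOC_CMA, enum model_compact_result and the model_* helpers are hypothetical illustrations, not kernel APIs, and the watermark value in the usage is made up. The two pieces borrowed from the kernel, as I understand it, are compact_gap(order) == 2UL << order and the rule that free CMA pages only count toward the order-0 watermark when ALLOC_CMA is passed. The model always uses the low watermark, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: names and the flag value are illustrative,
 * not the kernel's definitions. Sizes are in 4KiB pages. */
#define MODEL_ALLOC_CMA 0x1u

struct zone_model {
	unsigned long free_pages;     /* all free pages, CMA included */
	unsigned long free_cma_pages; /* free pages in CMA pageblocks */
	unsigned long low_wmark;      /* order-0 low watermark */
};

enum model_compact_result {
	MODEL_COMPACT_SKIPPED,  /* not enough free migration targets */
	MODEL_COMPACT_DEFERRED, /* recently failed at this order */
	MODEL_COMPACT_CONTINUE, /* compaction would run */
};

/* Headroom compaction wants for an order-N request; mirrors the
 * kernel's compact_gap(order) == 2UL << order. */
static unsigned long model_compact_gap(unsigned int order)
{
	return 2UL << order;
}

/* Order-0 watermark check: free CMA pages only count when the
 * caller passes MODEL_ALLOC_CMA. */
static bool model_wmark_ok(const struct zone_model *z, unsigned long mark,
			   unsigned int alloc_flags)
{
	unsigned long usable;

	assert(z->free_cma_pages <= z->free_pages);
	usable = z->free_pages;
	if (!(alloc_flags & MODEL_ALLOC_CMA))
		usable -= z->free_cma_pages; /* CMA pages are off limits */
	return usable > mark;
}

/* Rough analogue of __compaction_suitable(): are there enough free
 * order-0 pages to act as migration targets for this order? */
static enum model_compact_result
model_compaction_result(const struct zone_model *z, unsigned int order,
			unsigned int alloc_flags)
{
	unsigned long mark = z->low_wmark + model_compact_gap(order);

	if (!model_wmark_ok(z, mark, alloc_flags))
		return MODEL_COMPACT_SKIPPED;
	return MODEL_COMPACT_CONTINUE;
}

/* The early exit quoted from __alloc_pages_slowpath(): a costly
 * request gives up only when compaction was skipped or deferred. */
static bool model_costly_bail(bool costly, enum model_compact_result r)
{
	return costly && (r == MODEL_COMPACT_SKIPPED ||
			  r == MODEL_COMPACT_DEFERRED);
}
```

For a node whose only free memory is 16GB of CMA (4194304 pages) and an order-9 request, counting CMA turns the verdict from SKIPPED into CONTINUE, so the quoted early exit is never taken. The model cannot say whether that verdict is wrong: per the point above, free CMA pages are legitimate targets for migrating movable pages out of non-CMA blocks, and whether compaction then yields a usable non-CMA page depends on what occupies those blocks.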