From: Joshua Hahn <joshua.hahnjy@gmail.com>
To: Bijan Tabatabai <bijan311@gmail.com>
Cc: damon@lists.linux.com, linux-mm@kvack.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, sj@kernel.org, akpm@linux-foundation.org,
	corbet@lwn.net, david@redhat.com, ziy@nvidia.com, matthew.brost@intel.com,
	joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
	gourry@gourry.net, ying.huang@linux.alibaba.com, apopple@nvidia.com,
	bijantabatab@micron.com, venkataravis@micron.com, emirakhur@micron.com,
	ajayjoshi@micron.com, vtavarespetr@micron.com
Subject: Re: [RFC PATCH 0/4] mm/damon: Add DAMOS action to interleave data across nodes
Date: Fri, 13 Jun 2025 08:25:09 -0700
Message-ID: <20250613152517.225529-1-joshua.hahnjy@gmail.com>
In-Reply-To: <20250612181330.31236-1-bijan311@gmail.com>

On Thu, 12 Jun 2025 13:13:26 -0500 Bijan Tabatabai <bijan311@gmail.com> wrote:

> From: Bijan Tabatabai
>
> A recent patch set automatically set the interleave weight for each node
> according to the node's maximum bandwidth [1]. In another thread, the
> patch set's author, Joshua Hahn, wondered if/how these weights should be
> changed if the bandwidth utilization of the system changes [2].

Hi Bijan,

Thank you for this patchset, and thank you for finding interest in my
question!

> This patch set adds the mechanism for dynamically changing how application
> data is interleaved across nodes while leaving the policy of what the
> interleave weights should be to userspace. It does this by adding a new
> DAMOS action: DAMOS_INTERLEAVE. We implement DAMOS_INTERLEAVE with both
> paddr and vaddr operations sets. Using the paddr version is useful for
> managing page placement globally. Using the vaddr version limits tracking
> to one process per kdamond instance, but the va based tracking better
> captures spacial locality.
>
> DAMOS_INTERLEAVE interleaves pages within a region across nodes using the
> interleave weights at /sys/kernel/mm/mempolicy/weighted_interleave/node
> and the page placement algorithm in weighted_interleave_nid via
> policy_nodemask. We chose to reuse the mempolicy weighted interleave
> infrastructure to avoid reimplementing code. However, this has the awkward
> side effect that only pages that are mapped to processes using
> MPOL_WEIGHTED_INTERLEAVE will be migrated according to new interleave
> weights. This might be fine because workloads that want their data to be
> dynamically interleaved will want their newly allocated data to be
> interleaved at the same ratio.

I think this is generally true.
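Just to make sure I understand the opt-in correctly, here is a rough
userspace sketch of what it looks like from the workload's side (my own
illustration, not something from the patch set -- the two-node mask, the
raw syscall, and the build details are assumptions for the example): the
workload selects MPOL_WEIGHTED_INTERLEAVE itself, so its new allocations
and, with DAMOS_INTERLEAVE, its already-mapped pages follow the same
per-node weights under /sys/kernel/mm/mempolicy/weighted_interleave/.

/*
 * Hypothetical demo, not part of this patch set: a workload that opts in
 * to MPOL_WEIGHTED_INTERLEAVE so its pages are allocated (and, with
 * DAMOS_INTERLEAVE, re-placed) according to the weighted_interleave
 * sysfs weights. Assumes a kernel that defines MPOL_WEIGHTED_INTERLEAVE
 * and a system with nodes 0 and 1. Build: gcc -O2 -o wi-demo wi-demo.c
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef MPOL_WEIGHTED_INTERLEAVE
#define MPOL_WEIGHTED_INTERLEAVE 6	/* uapi value; older headers may lack it */
#endif

int main(void)
{
	/*
	 * Interleave across nodes 0 and 1; the per-node ratio comes from
	 * the weighted_interleave sysfs files, not from this mask.
	 */
	unsigned long nodemask = (1UL << 0) | (1UL << 1);

	if (syscall(SYS_set_mempolicy, MPOL_WEIGHTED_INTERLEAVE,
		    &nodemask, sizeof(nodemask) * 8)) {
		perror("set_mempolicy(MPOL_WEIGHTED_INTERLEAVE)");
		return 1;
	}

	/*
	 * New allocations now follow the node weights, and (per the cover
	 * letter) pages mapped by this task are eligible for
	 * DAMOS_INTERLEAVE migration as well.
	 */
	size_t sz = 1UL << 30;	/* 1 GiB of demand-faulted memory */
	char *buf = malloc(sz);
	if (!buf)
		return 1;
	memset(buf, 0, sz);

	pause();	/* keep the mapping around for DAMON to observe */
	return 0;
}

If I understand correctly, once something like this is running, updating
the weighted_interleave sysfs files changes the target ratio for both new
allocations and the re-placement this patch set adds.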
Until a user reports a use case where they want to allocate pages with a
non-weighted-interleave policy, but would still like to place them
according to a set weight, I think we can leave support for other
mempolicies out for now.

> If exposing policy_nodemask is undesirable, we have two alternative methods
> for having DAMON access the interleave weights it should use. We would
> appreciate feedback on which method is preferred.
>
> 1. Use mpol_misplaced instead
>    pros: mpol_misplaced is already exposed publically
>    cons: Would require refactoring mpol_misplaced to take a struct vm_area
>    instead of a struct vm_fault, and require refactoring mpol_misplaced and
>    get_vma_policy to take in a struct task_struct rather than just using
>    current. Also requires processes to use MPOL_WEIGHTED_INTERLEAVE.
>
> 2. Add a new field to struct damos, similar to target_nid for the
>    MIGRATE_HOT/COLD schemes.
>    pros: Keeps changes contained inside DAMON. Would not require processes
>    to use MPOL_WEIGHTED_INTERLEAVE.
>    cons: Duplicates page placement code. Requires discussion on the sysfs
>    interface to use for users to pass in the interleave weights.

Here I agree with SJ's sentiment -- mpol_misplaced is built around current
and fault contexts, like you pointed out. Perhaps it is best to keep the
scope of the changes as local as possible : -)

As for duplicating page placement code, I think that is something we can
refine over iterations of this patchset, and maybe SJ will have some great
ideas on how this can best be done as well.

> This patchset was tested on an AMD machine with a NUMA node with CPUs
> attached to DDR memory and a cpu-less NUMA node attached to CXL memory.
> However, this patch set should generalize to other architectures and
> number of NUMA nodes.

I think moving the test results to the cover letter will help reviewers
better understand the intent of the work. It would also be very helpful to
include some potential use cases here. That is, what workloads would
benefit from placing pages according to a set ratio, rather than using
existing migration policies that adjust placement based on hotness /
coldness?

One use case I can think of is combining this patchset with weighted
interleave auto-tuning: past the allocation stage, pages would keep being
placed in a way that maximizes the bandwidth usage of the system, which
would help alleviate bandwidth limitations (at a possible cost in latency,
depending on how bandwidth-bound the workload is).

Thank you again for the amazing patchset! Have a great day : -)

Joshua

Sent using hkml (https://github.com/sjp38/hackermail)