Date: Sun, 26 Jan 2025 22:55:48 -0800 (PST)
From: David Rientjes
To: Shivank Garg
Cc: akpm@linux-foundation.org, lsf-pc@lists.linux-foundation.org,
    linux-mm@kvack.org, ziy@nvidia.com, AneeshKumar.KizhakeVeetil@arm.com,
    baolin.wang@linux.alibaba.com, bharata@amd.com, david@redhat.com,
    gregory.price@memverge.com, honggyu.kim@sk.com, jane.chu@oracle.com,
    jhubbard@nvidia.com, jon.grimm@amd.com, k.shutemov@gmail.com,
    leesuyeon0506@gmail.com, leillc@google.com, liam.howlett@oracle.com,
    linux-kernel@vger.kernel.org, mel.gorman@gmail.com, Michael.Day@amd.com,
    Raghavendra.KodsaraThimmappa@amd.com, riel@surriel.com,
    santosh.shukla@amd.com, shy828301@gmail.com, sj@kernel.org,
    wangkefeng.wang@huawei.com, weixugc@google.com, willy@infradead.org,
    ying.huang@linux.alibaba.com
Subject: Re: [LSF/MM/BPF TOPIC] Enhancements to Page Migration with
 Multi-threading and Batch Offloading to DMA
Message-ID: <3b59ea3e-04db-ad38-97b1-20cff0f8f17c@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII

On Thu, 23 Jan 2025, Shivank Garg wrote:

> Hi all,
>
> Zi Yan and I would like to propose the topic: Enhancements to Page
> Migration with Multi-threading and Batch Offloading to DMA.
>

I think this would be a very useful topic to discuss, thanks for
proposing it.

> Page migration is a critical operation in NUMA systems that can incur
> significant overheads, affecting memory management performance across
> various workloads.
> For example, copying folios between DRAM NUMA nodes can take ~25% of
> the total migration cost for migrating 256MB of data.
>
> Modern systems are equipped with powerful DMA engines for bulk data
> copying, GPUs, and high CPU core counts. Leveraging these hardware
> capabilities becomes essential for systems where frequent page
> promotion and demotion occur - from large-scale tiered-memory systems
> with CXL nodes to CPU-GPU coherent systems with GPU memory exposed as
> NUMA nodes.
>

Indeed, there are multiple use cases for optimizations in this area.
With the ramp of memory tiered systems, I think there will be an even
greater reliance on memory migration going forward.

Do you have numbers to share on how offloading, even as a proof of
concept, moves the needle compared to traditional and sequential memory
migration?

> Existing page migration performs sequential page copying,
> underutilizing modern CPU architectures and high-bandwidth memory
> subsystems.
>
> We have proposed and posted RFCs to enhance page migration through
> three key techniques:
> 1. Batching migration operations for bulk copying data [1]
> 2. Multi-threaded folio copying [2]
> 3. DMA offloading to hardware accelerators [1]
>

Curious: does memory migration of pages that are actively undergoing
DMA with hardware assist fit into any of these?

> By employing batching and multi-threaded folio copying, we are able to
> achieve significant improvements in page migration throughput for
> large pages.
>
> Discussion points:
> 1. Performance:
>    a. Policy decision for DMA and CPU selection
>    b. Platform-specific scheduling of folio-copy worker threads for
>       better bandwidth utilization

Why platform specific? I *assume* this means a generic framework that
can optimize for scheduling based on the underlying hardware and not
specific implementations that can only be used on AMD, for example. Is
that the case?

>    c. Using non-temporal instructions for CPU-based memcpy
>    d. Upscaling/downscaling worker threads based on migration size,
>       CPU availability (system load), bandwidth saturation, etc.
> 2. Interface requirements with DMA hardware:
>    a. Standardizing APIs for DMA drivers and support for different DMA
>       drivers
>    b. Enhancing DMA drivers for bulk copying (e.g., SDXi Engine)
> 3. Resource accounting:
>    a. CPU cgroups accounting and fairness [3]
>    b. Who bears the migration cost? (migration cost attribution)
>
> References:
> [1] https://lore.kernel.org/all/20240614221525.19170-1-shivankg@amd.com
> [2] https://lore.kernel.org/all/20250103172419.4148674-1-ziy@nvidia.com
> [3] https://lore.kernel.org/all/CAHbLzkpoKP0fVZP5b10wdzAMDLWysDy7oH0qaUssiUXj80R6bw@mail.gmail.com
>
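For anyone following along who has not read the RFCs, the batched,
multi-threaded copy idea (techniques 1 and 2 above) can be sketched in
userspace with plain pthreads and memcpy. This is illustrative only and
not code from either RFC: PAGE_SIZE, copy_worker, and
parallel_copy_batch are made-up names, and the real kernel side has to
handle folio locking, mapping, and error recovery that are elided here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096
#define MAX_WORKERS 64

struct copy_job {
	unsigned char *dst;
	const unsigned char *src;
	size_t npages;	/* contiguous pages this worker copies */
};

static void *copy_worker(void *arg)
{
	struct copy_job *job = arg;
	size_t i;

	/* One memcpy per page, standing in for per-folio copying. */
	for (i = 0; i < job->npages; i++)
		memcpy(job->dst + i * PAGE_SIZE,
		       job->src + i * PAGE_SIZE, PAGE_SIZE);
	return NULL;
}

/*
 * Split a batch of npages page copies across nthreads workers.
 * Returns 0 on success, -1 on bad arguments or thread failure.
 */
int parallel_copy_batch(unsigned char *dst, const unsigned char *src,
			size_t npages, int nthreads)
{
	pthread_t tids[MAX_WORKERS];
	struct copy_job jobs[MAX_WORKERS];
	size_t per, extra, off = 0;
	int t;

	if (nthreads < 1 || nthreads > MAX_WORKERS)
		return -1;
	per = npages / nthreads;
	extra = npages % nthreads;

	for (t = 0; t < nthreads; t++) {
		/* First `extra` workers take one leftover page each. */
		size_t n = per + ((size_t)t < extra ? 1 : 0);

		jobs[t].dst = dst + off * PAGE_SIZE;
		jobs[t].src = src + off * PAGE_SIZE;
		jobs[t].npages = n;
		off += n;
		if (pthread_create(&tids[t], NULL, copy_worker, &jobs[t]))
			return -1;
	}
	for (t = 0; t < nthreads; t++)
		pthread_join(tids[t], NULL);
	return 0;
}
```

Roughly, batching corresponds to gathering the src/dst pairs up front
instead of copying one folio at a time, and the worker fan-out is where
the upscaling/downscaling policy question in point 1.d would plug in.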
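On point 1.c, the appeal of non-temporal instructions is that a one-shot
bulk copy writes around the cache instead of evicting the workload's hot
lines. A hedged userspace illustration (nt_copy_page is a made-up name;
the SSE2 path only exists on x86, so it falls back to plain memcpy
elsewhere):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

#define PAGE_SIZE 4096

/*
 * Copy one page with cache-bypassing stores where the ISA supports it.
 * dst must be 16-byte aligned on the SSE2 path.
 */
void nt_copy_page(void *dst, const void *src)
{
#ifdef __SSE2__
	__m128i *d = dst;
	const __m128i *s = src;
	size_t i;

	/* 16-byte non-temporal stores bypass the cache hierarchy. */
	for (i = 0; i < PAGE_SIZE / sizeof(__m128i); i++)
		_mm_stream_si128(&d[i], _mm_loadu_si128(&s[i]));
	_mm_sfence();	/* order NT stores before the page is used */
#else
	memcpy(dst, src, PAGE_SIZE);	/* portable fallback */
#endif
}
```

Whether NT stores actually win depends on copy size and on whether the
destination is read soon after (e.g., promoted pages that are
immediately hot), which seems like exactly the policy question worth
discussing at the session.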