Date: Mon, 18 May 2026 15:20:38 +0100
From: David Laight
To: "David Hildenbrand (Arm)"
Cc: "Garg, Shivank", Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, Lorenzo Stoakes, "Liam R . Howlett", Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, "H . Peter Anvin", Ankur Arora, Bharata B Rao, Hrushikesh Salunke, David Rientjes, sandipan.das@amd.com
Subject: Re: [RFC PATCH 1/1] mm: batch page copies in folio_copy() and folio_mc_copy()
Message-ID: <20260518152038.7d959f39@pumpkin>
In-Reply-To: <60fb497d-a35a-42b9-8bf1-7f9ccbde8ade@kernel.org>
References: <20260427142036.111940-2-shivankg@amd.com> <20260427142036.111940-4-shivankg@amd.com> <073e5e2c-7102-4141-b0d7-fa5635f811f5@kernel.org> <6a5e794a-a608-4126-9abe-0d512a57dd67@amd.com> <60fb497d-a35a-42b9-8bf1-7f9ccbde8ade@kernel.org>

On Mon, 18 May 2026 10:43:22 +0200
"David Hildenbrand (Arm)" wrote:

...
> > Another option is to leave memcpy() untouched for this series and add
> > a new copy_pages() helper that the folio copy path can use.
> > It would use ALTERNATIVE_2 that picks rep movsb on ERMS/FSRM and rep movsq on
> > REP_GOOD and per-page copy_page() loop as the final fallback.
>
> That would fit the clear_pages() design we have. But if that's avoidable, that
> would be nice.
>

For full pages 'rep movsq' is likely to be 'almost the best' on all CPUs.
The fixed overhead is amortised over a lot of copies, so it has little impact.
(My brain suggests a value of 30 clocks, ignoring P4 NetBurst.)
For Intel CPUs an aligned destination will double the throughput.

I did a load of benchmarking of 'rep movsb' on my Zen-5.
(I should be able to find the results again.)
The real oddity was copies where (something like)
    0 < ((dst - src) & 4095) < 128
for which the startup time was a lot longer and the copy ran massively slower.

I need to run those tests on some other CPUs.
However I don't have any older AMD ones (except a Piledriver) or Intel
ones newer than an i7-7 (Kaby Lake?).
(I need to get my Apollo Lake N3350 into the test set for comparison.)

From what I remember of some earlier benchmarking (which failed to
measure the fixed cost properly), even Sandy Bridge handles 'rep movsb'
and 'rep movsq' the same way.

The problem with memcpy() is that you want a hint from the call site
about the likely length and any alignment assumptions.
Otherwise the costs of the conditionals become significant.

-- 
David
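The Zen-5 aliasing case described in the mail can be written as a small predicate on the two addresses. This is a minimal sketch, assuming the approximate 4 KiB period and 128-byte window recalled in the mail ("something like"); the name rep_movs_slow_alias() and the SLOW_ALIAS_* constants are hypothetical, not kernel APIs, and the code only classifies address pairs rather than measuring anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API. The 4 KiB period and the
 * 128-byte window are the approximate values recalled in the mail
 * ("something like"), not measured constants.
 */
#define SLOW_ALIAS_PAGE_SIZE 4096u
#define SLOW_ALIAS_WINDOW    128u

/* The mask trick below only works for a power-of-two period. */
static_assert((SLOW_ALIAS_PAGE_SIZE & (SLOW_ALIAS_PAGE_SIZE - 1)) == 0,
	      "period must be a power of two");

/*
 * True when dst lies a little *after* src modulo 4 KiB, i.e.
 * 0 < ((dst - src) & 4095) < 128. Unsigned wrap-around maps a dst
 * just *before* src to a large value, so that case is excluded.
 */
static inline bool rep_movs_slow_alias(uintptr_t dst, uintptr_t src)
{
	uintptr_t delta = (dst - src) & (SLOW_ALIAS_PAGE_SIZE - 1);

	return delta != 0 && delta < SLOW_ALIAS_WINDOW;
}
```

Under that assumed condition, a destination 0x40 bytes past the source (in the same or a later 4 KiB period) is flagged, while a destination exactly 128 bytes past, page-aligned relative to the source, or just before it is not. A whole-page copy between page-aligned folios always gives ((dst - src) & 4095) == 0, so the case would mainly matter for general memcpy() calls on nearby, unaligned buffers.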