Subject: Re: [PATCH 1/2] MM: handle THP in swap_*page_fs()
From: "ying.huang@intel.com"
To: NeilBrown, Yang Shi
Cc: Andrew Morton, Geert Uytterhoeven, Christoph Hellwig, Miaohe Lin,
 linux-nfs@vger.kernel.org, Linux MM, Linux Kernel Mailing List
Date: Fri, 06 May 2022 10:56:40 +0800
In-Reply-To: <165170771676.24672.16520001373464213119@noble.neil.brown.name>

On Thu, 2022-05-05 at 09:41 +1000, NeilBrown wrote:
> On Tue, 03 May 2022, Yang Shi wrote:
> > On Sun, May 1, 2022 at 9:23 PM NeilBrown wrote:
> > >
> > > On
Sat, 30 Apr 2022, Yang Shi wrote:
> > > > On Thu, Apr 28, 2022 at 5:44 PM NeilBrown wrote:
> > > > >
> > > > > Pages passed to swap_readpage()/swap_writepage() are not necessarily all
> > > > > the same size - there may be transparent-huge-pages involved.
> > > > >
> > > > > The BIO paths of swap_*page() handle this correctly, but the SWP_FS_OPS
> > > > > path does not.
> > > > >
> > > > > So we need to use thp_size() to find the size, not just assume
> > > > > PAGE_SIZE, and we need to track the total length of the request, not
> > > > > just assume it is "page * PAGE_SIZE".
> > > >
> > > > Swap-over-nfs doesn't support THP swap IIUC. So SWP_FS_OPS should not
> > > > see THP at all. But I agree to remove the assumption about page size
> > > > in this path.
> > >
> > > Can you help me understand this, please?  How would the swap code know
> > > that swap-over-NFS doesn't support THP swap?  There is no reason that
> > > NFS wouldn't be able to handle 2MB writes.  Even 1GB should work though
> > > NFS would have to split into several smaller WRITE requests.
> >
> > AFAICT, THP swap is only supported on non-rotational block devices, for
> > example, SSD, PMEM, etc. IIRC, the swap device has to support the
> > cluster in order to swap THP. The cluster is only supported by
> > non-rotational block devices.
> >
> > Looped Ying in, who is the author of THP swap.
>
> I hunted around the code and found that THP swap only happens if a
> 'cluster_info' is allocated, and that only happens if
> 	if (p->bdev && bdev_nonrot(p->bdev)) {
> in the swapon syscall.
>

And in get_swap_pages(), the cluster is only allocated for block devices.

		if (size == SWAPFILE_CLUSTER) {
			if (si->flags & SWP_BLKDEV)
				n_ret = swap_alloc_cluster(si, swp_entries);
		} else
			n_ret = scan_swap_map_slots(si, SWAP_HAS_CACHE,
						    n_goal, swp_entries);

We may remove this restriction in the future if someone can show the
benefit.

Best Regards,
Huang, Ying

> I guess "nonrot" is being used as a synonym for "low latency"...
> So even if NFS was low-latency it couldn't benefit from THP swap.
>
> So as you say it is not currently possible for THP pages to be sent to
> NFS for swapout.  It makes sense to prepare for it though I think - if
> only so that the code is more consistent and less confusing.
>
> Thanks,
> NeilBrown
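
To make the two checks discussed in this thread easier to follow, here is a
minimal user-space sketch in C. It is not kernel code: the struct, the
MODEL_* flag values and the model_* function names are all hypothetical
stand-ins, and it only models the two conditions quoted above (cluster_info
is set up at swapon time only for a non-rotational block device, and a
cluster is handed out only when SWP_BLKDEV is set).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this model; the kernel's own values differ. */
#define MODEL_SWP_BLKDEV (1u << 0) /* swap area is a block device */
#define MODEL_SWP_FS_OPS (1u << 1) /* swap goes through filesystem ops, e.g. NFS */

/* Hypothetical stand-in for the parts of the swap area state discussed above. */
struct model_swap_info {
	unsigned int flags;
	bool has_bdev;         /* models "p->bdev" being non-NULL */
	bool nonrot;           /* models "bdev_nonrot(p->bdev)" */
	bool has_cluster_info; /* models 'cluster_info' having been allocated */
};

/* Models the swapon-time check: cluster_info only for non-rotational bdevs. */
void model_swapon(struct model_swap_info *si)
{
	assert(si != NULL);
	si->has_cluster_info = si->has_bdev && si->nonrot;
}

/* Models the allocation-time check: a THP-sized cluster needs both conditions. */
bool model_can_alloc_thp_cluster(const struct model_swap_info *si)
{
	assert(si != NULL);
	return (si->flags & MODEL_SWP_BLKDEV) && si->has_cluster_info;
}
```

Under this model, a swap area with MODEL_SWP_FS_OPS set and no block device
never gets a cluster, which matches the conclusion above that THP pages
cannot currently reach the SWP_FS_OPS path.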