From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6B1B8CD4F4A for ; Wed, 4 Sep 2024 23:23:54 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id CF3706B025A; Wed, 4 Sep 2024 19:23:53 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C9D9E6B025C; Wed, 4 Sep 2024 19:23:53 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B16E86B025E; Wed, 4 Sep 2024 19:23:53 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 92BD86B025A for ; Wed, 4 Sep 2024 19:23:53 -0400 (EDT) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 2D0091A071B for ; Wed, 4 Sep 2024 23:23:53 +0000 (UTC) X-FDA: 82528635546.20.10FE403 Received: from mail-wr1-f54.google.com (mail-wr1-f54.google.com [209.85.221.54]) by imf30.hostedemail.com (Postfix) with ESMTP id 306BA8000F for ; Wed, 4 Sep 2024 23:23:50 +0000 (UTC) Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=Bdd4zaXe; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf30.hostedemail.com: domain of usamaarif642@gmail.com designates 209.85.221.54 as permitted sender) smtp.mailfrom=usamaarif642@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1725492153; a=rsa-sha256; cv=none; b=qL1BsMAE44bJ7kAZTGdqnwIOKc2AgHRixLgMncngQAPKUZ4ckxZKJ3bDpqsV/QFWXjkbyK 3ErRmkzYf3JqwoDdlr+NSa4zbKblVJ0l8Iazli8JLkjsQYnMedJYs/7DuWNhHSeYP0Q4H7 byO+PBfM2TAd58Gqdf/Tt8E+DKx8WdI= ARC-Authentication-Results: i=1; imf30.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=Bdd4zaXe; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf30.hostedemail.com: domain of usamaarif642@gmail.com designates 209.85.221.54 as permitted sender) smtp.mailfrom=usamaarif642@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1725492153; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=BiTb7LH4fqeIoEg+GobGmbCm8qwvMMZGYKK0A6xILq0=; b=A9jzx/Qrxp/2oFY7/efowmrNj53/WPOEN2XwLyzf2qbM51a6sg2XabiSNf/LWSuErbOBuL EyovMcS2+bwbsHjBtjt5ht4nu5avXLwdxE9gyAMn5wiLIfdGNVba3ZZRYqq/0rHmae9e54 aBzWQMQt8QvazyfBwcxsY6VlPUwuGPk= Received: by mail-wr1-f54.google.com with SMTP id ffacd0b85a97d-374ba78f192so33750f8f.3 for ; Wed, 04 Sep 2024 16:23:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725492229; x=1726097029; darn=kvack.org; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=BiTb7LH4fqeIoEg+GobGmbCm8qwvMMZGYKK0A6xILq0=; b=Bdd4zaXeDNDFp0yIZCxNsWavvdbdXv+UWmUsYOhZrBWRWZGfvDYyX+Rh5S/qPfvZea PD/5reOqTtMTdjO2+tXBjbS29Xc1TkbhZxm4XyY/pwDf2syTsY+LMGP7jZy7eInD0HtS LeqKYpA9vF0KJxciviaxQZBSZyDKz8b7ZZELnxJRPsxwxjuKh0uZR1jCKIIDgxKm4g3L egWAlAwZYD89MY7dgmybYCal9xFolAu9Ii8IP50kwpkbJA0HeFH3Zy/ZHHP8+s9Q/Rvo u2NmLzTcM5vTa3DzvV6yRaNuO+Yu90Y552eHlXqTmybsMPDFZEpq/Rv8XpkNPtg0Z8LF APcg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725492229; x=1726097029; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=BiTb7LH4fqeIoEg+GobGmbCm8qwvMMZGYKK0A6xILq0=; b=k32HePklR7HNSj1qIZ1gAtRuZlnt25VeiCRlDvab6eouCF4Fc/Zcj15XN2tV38Zmqp 0XfstcdmNxE7CmDkfT6wIo927vR5/WWSRrKvzLB6oOHGsd7m4usIqhowMPbyKDttrwXh bJL/iPke1O2/hi5XDsiCJCtdZWcCt8sqyi+OIpHEJfe0BPu68WZLFvwP1SSbNVDTWIJG nMdYPP+5LtYRu8PIrEaYMQeJdEalq1FNP9c6NLv9Xa/oWdWQ/gz1UHRiRRVkUIQ3ZDvM 0zxwW7t2BTi/R673DxTic7Ylqs8tAxMKQ0Qu/wDVHoWcnUhXrQElig5c7ls+bxY5thrM lbRA== X-Forwarded-Encrypted: i=1; AJvYcCVnH3dePVDcECT+bm7hnPgMRJ61gju93jeKpHaUXMQUsr3/bihYuOcrhxHTzECNl6JTUQtcxDDM6A==@kvack.org X-Gm-Message-State: AOJu0YwfjrgOpZgeVsgaMoYZNXelAv2+LMJpLbfg4UknNNecMUqDzq/x 0wmQlgxSMZ2pZZmiYGDSFQ4R+GQTT3nYOksSAmIds8PUmqfyNxpZ X-Google-Smtp-Source: AGHT+IFzoYwnFMj2Xh2bpT8OPwF50goTtMqOi3vdctTDCW4lq4WU/KntXi0cjXhqDZ8qjeoAaBzEIQ== X-Received: by 2002:adf:f747:0:b0:376:27b7:da7d with SMTP id ffacd0b85a97d-376dd71ab0fmr4158311f8f.32.1725492228534; Wed, 04 Sep 2024 16:23:48 -0700 (PDT) Received: from ?IPV6:2a02:6b6f:e750:7600:c5:51ce:2b5:970b? ([2a02:6b6f:e750:7600:c5:51ce:2b5:970b]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-374b67ff88dsm15059306f8f.26.2024.09.04.16.23.47 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 04 Sep 2024 16:23:48 -0700 (PDT) Message-ID: Date: Thu, 5 Sep 2024 00:23:47 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v7 2/2] mm: support large folios swap-in for sync io devices To: Barry Song <21cnbao@gmail.com> Cc: Yosry Ahmed , Andrew Morton , Kairui Song , hanchuanhua@oppo.com, linux-mm@kvack.org, baolin.wang@linux.alibaba.com, chrisl@kernel.org, david@redhat.com, hannes@cmpxchg.org, hughd@google.com, kaleshsingh@google.com, linux-kernel@vger.kernel.org, mhocko@suse.com, minchan@kernel.org, nphamcs@gmail.com, ryan.roberts@arm.com, senozhatsky@chromium.org, shakeel.butt@linux.dev, shy828301@gmail.com, surenb@google.com, v-songbaohua@oppo.com, willy@infradead.org, xiang@kernel.org, ying.huang@intel.com, hch@infradead.org References: <20240821074541.516249-1-hanchuanhua@oppo.com> <20240821074541.516249-3-hanchuanhua@oppo.com> <20240903130757.f584c73f356c03617a2c8804@linux-foundation.org> <94eb70cd-b508-42ef-b5d2-acc29e22eb0e@gmail.com> Content-Language: en-US From: Usama Arif In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 306BA8000F X-Stat-Signature: bzsuxbqkc4azx4ejmqkn67fq67qeeubq X-Rspam-User: X-HE-Tag: 1725492230-932566 X-HE-Meta: U2FsdGVkX186IYOXOc/bfe8BqLEhUGv5AyWKhQ2gp+A5megO+ORiayDMmyZHyruGLFnqlRRoZCtnKibXCB5i8og5CYHgfevzneDSd5N1XbXa5SD9QjTvqtzXoGtP1HDtuKPWULpstgR6emqI7UmhZR3OASygvJB1c5xLDN/rWmyPazWtf4sFK3kST7xt4jFviwZWhRHoZyDrex7Odgsq8vVH4aKI24a6/hGmanqzh6pavhWmrGnh6w7XWdSpAjRG3pJBz5/F3wrpbamOBVe0vgAA6sZDRtxtG7D88qH3rFSiuygIorBsY+34Pj4RqhYrYfKvhJik/YnbMgNYzascFc8txECAH9cSFB2ceruPwJnQ5FB665i1PvHfOcjEryayKbdWPYB90BkIM8FsKloWaPJ44pFqJKJXnBbTOs5txg0b2QyJNNMSqCH45S6kzyQR10yAMaj7SYhGUiz4xDQ4xVmmwE4vLSESL0crt6+Wp0iBUgT5DUoWBew9rBcoQ//j63M5WMd7Xj7tXvz+0mitrWxWlPZQyz614uT0VBAqNwB8ECGdjIlnNsN2m7AndLVl5GaZMoeA0aUDEQjJ/vsveiqosl/hZGJx3I+2XJJotNzT7UXDggHaXsTr9tq5tYRO2AAbme6ayQY3DIBMCrqFctgXnrssk79c4Qj0tNuxV/zNbULLT6Arb7qSoM6lVhwInpKHNPihuh3qqzgS4xXZXUkciAIrKE2/i4xnGHjmnbmyqCFIa/rMQ0D9Pk2s4owZDqcGxqbZKrVifue4YEF+81YmliiHOsykc+DeuHQ52LJ8kiFNw1zuM7VUr8HjrknvkkorV1t/7XCOEDEbFlo+N6ErKlI/O6KPkkBfSHpnA06yaPWAQ4dt9Rz6d/AliUD6mjCloMMENvcwuP/vhpREZ5VHUo2o+NAnOkE5Y23CPfTul9vQ7y5IoBTISX7B+/X+VwlQrbNFExw8MdonjmQ 5jbXEmbb PLFTgX+uQ/TX1QSwy6d3KIonqXYhHj8PshqVRP32w0EIg3zkrgcLA+f+svAfxhe9y4f+2J2z53f/OiTmVRG5XCniuiutt4dZ0BZp0/H5nkqcFWRqN4IoST+Tnh9nNwNBgMfDXskPzVvY4a6pjOTqs9QjICaJfl0FQeCoTFip86nkiPVSJ83eA6/H5w1v4eZ/WWxBfIX7twRdw11FcSa3z9DyxuY7CV1yWDVVlUaSgwaesO5wE0Ttxq4VyFuLXphpQkVc7FLlvNJg1OF8BGPARUZdwfjgdqnlBu2xNY4qXBdK1trEXdogD5GLIjIrTiHdSDYubTU53WTXwT6Go6gQDmh2rPZOu10KOGQAx6wOAKd0yqJdbfPbdmo3D6y2ktbqGtMGjSvuU6MP4+HSsOTUBYCWH0aOnSf+hzDlJhDqTCYWOxpXAnXzBzeST5iA98UER31kyiB2TPeppJPyQ5Jbq3IIAMiFXMHs9K4QBIGqSlFPaBCxyySIZXdphLHw0fd30Ovr86vfuQww9g5DWFVQSPwxQfTdIOnNran05Ft4owLxKDJ0= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 05/09/2024 00:10, Barry Song wrote: > On Thu, Sep 5, 2024 at 9:30 AM Usama Arif wrote: >> >> >> >> On 03/09/2024 23:05, Yosry Ahmed wrote: >>> On Tue, Sep 3, 2024 at 2:36 PM Barry Song <21cnbao@gmail.com> wrote: >>>> >>>> On Wed, Sep 4, 2024 at 8:08 AM Andrew Morton wrote: >>>>> >>>>> On Tue, 3 Sep 2024 11:38:37 -0700 Yosry Ahmed wrote: >>>>> >>>>>>> [ 39.157954] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000007 >>>>>>> [ 39.158288] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001 >>>>>>> [ 39.158634] R13: 0000000000002b9a R14: 0000000000000000 R15: 00007ffd619d5518 >>>>>>> [ 39.158998] >>>>>>> [ 39.159226] ---[ end trace 0000000000000000 ]--- >>>>>>> >>>>>>> After reverting this or Usama's "mm: store zero pages to be swapped >>>>>>> out in a bitmap", the problem is gone. I think these two patches may >>>>>>> have some conflict that needs to be resolved. >>>>>> >>>>>> Yup. I saw this conflict coming and specifically asked for this >>>>>> warning to be added in Usama's patch to catch it [1]. It served its >>>>>> purpose. >>>>>> >>>>>> Usama's patch does not handle large folio swapin, because at the time >>>>>> it was written we didn't have it. We expected Usama's series to land >>>>>> sooner than this one, so the warning was to make sure that this series >>>>>> handles large folio swapin in the zeromap code. Now that they are both >>>>>> in mm-unstable, we are gonna have to figure this out. >>>>>> >>>>>> I suspect Usama's patches are closer to land so it's better to handle >>>>>> this in this series, but I will leave it up to Usama and >>>>>> Chuanhua/Barry to figure this out :) >>>> >>>> I believe handling this in swap-in might violate layer separation. >>>> `swap_read_folio()` should be a reliable API to call, regardless of >>>> whether `zeromap` is present. Therefore, the fix should likely be >>>> within `zeromap` but not this `swap-in`. I’ll take a look at this with >>>> Usama :-) >>> >>> I meant handling it within this series to avoid blocking Usama >>> patches, not within this code. Thanks for taking a look, I am sure you >>> and Usama will figure out the best way forward :) >> >> Hi Barry and Yosry, >> >> Is the best (and quickest) way forward to have a v8 of this with >> https://lore.kernel.org/all/20240904055522.2376-1-21cnbao@gmail.com/ >> as the first patch, and using swap_zeromap_entries_count in alloc_swap_folio >> in this support large folios swap-in patch? > > Yes, Usama. i can actually do a check: > > zeromap_cnt = swap_zeromap_entries_count(entry, nr); > > /* swap_read_folio() can handle inconsistent zeromap in multiple entries */ > if (zeromap_cnt > 0 && zeromap_cnt < nr) > try next order; > > On the other hand, if you read the code of zRAM, you will find zRAM has > exactly the same mechanism as zeromap but zRAM can even do more > by same_pages filled. since zRAM does the job in swapfile layer, there > is no this kind of consistency issue like zeromap. > > So I feel for zRAM case, we don't need zeromap at all as there are duplicated > efforts while I really appreciate your job which can benefit all swapfiles. > i mean, zRAM has the ability to check "zero"(and also non-zero but same > content). after zeromap checks zeromap, zRAM will check again: > Yes, so there is a reason for having the zeromap patches, which I have outlined in the coverletter. https://lore.kernel.org/all/20240627105730.3110705-1-usamaarif642@gmail.com/ There are usecases where zswap/zram might not be used in production. We can reduce I/O and flash wear in those cases by a large amount. Also running in Meta production, we found that the number of non-zero filled complete pages were less than 1%, so essentially its only the zero-filled pages that matter. I believe after zeromap, it might be a good idea to remove the page_same_filled check from zram code? Its not really a problem if its kept as well as I dont believe any zero-filled pages should reach zram_write_page? > static int zram_write_page(struct zram *zram, struct page *page, u32 index) > { > ... > > if (page_same_filled(mem, &element)) { > kunmap_local(mem); > /* Free memory associated with this sector now. */ > flags = ZRAM_SAME; > atomic64_inc(&zram->stats.same_pages); > goto out; > } > ... > } > > So it seems that zeromap might slightly impact my zRAM use case. I'm not > blaming you, just pointing out that there might be some overlap in effort > here :-) > >> >> Thanks, >> Usama > > Thanks > Barry