From: "Zach O'Keefe"
Date: Thu, 28 Sep 2023 12:43:57 -0700
Subject: Re: BUG: MADV_COLLAPSE doesn't work for XFS files
To: Ryan Roberts
Cc: Bagas Sanjaya, Hugh Dickins, David Hildenbrand, Matthew Wilcox,
    Chandan Babu R, "Darrick J. Wong", Linux Memory Management List,
    Linux XFS, Linux Kernel Mailing List, Yu Zhao
References: <4d6c9b19-cdbb-4a00-9a40-5ed5c36332e5@arm.com>
    <54e5accf-1a56-495a-a4f5-d57504bc2fc8@arm.com>
In-Reply-To: <54e5accf-1a56-495a-a4f5-d57504bc2fc8@arm.com>

Hey Ryan,

Thanks for bringing this up.

On Thu, Sep 28, 2023 at 4:59 AM Ryan Roberts wrote:
>
> On 28/09/2023 11:54, Bagas Sanjaya wrote:
> > On Thu, Sep 28, 2023 at 10:55:17AM +0100, Ryan Roberts wrote:
> >> Hi all,
> >>
> >> I've just noticed that when applied to a file mapping for a file on xfs, MADV_COLLAPSE returns EINVAL. The same test case works fine if the file is on ext4.
> >>
> >> I think the root cause is that the implementation bails out if it finds a (non-PMD-sized) large folio in the page cache for any part of the file covered by the region. XFS does readahead into large folios so we hit this issue. See khugepaged.c:collapse_file():
> >>
> >>         if (PageTransCompound(page)) {
> >>                 struct page *head = compound_head(page);
> >>
> >>                 result = compound_order(head) == HPAGE_PMD_ORDER &&
> >>                                 head->index == start
> >>                                 /* Maybe PMD-mapped */
> >>                                 ? SCAN_PTE_MAPPED_HUGEPAGE
> >>                                 : SCAN_PAGE_COMPOUND;
> >>                 goto out_unlock;
> >>         }
> >

Ya, non-PMD-sized THPs were just barely visible in my peripherals when
writing this, and I'm still woefully behind on your work on them now
(sorry!).

I'd like to eventually make collapse (not just MADV_COLLAPSE, but
khugepaged too) support arbitrary-sized large folios in general, but
I'm very pressed for time right now. I think M. Wilcox is also
interested in this, given he left the TODO to support it :P

Thank you for the reproducer though! I haven't run it, but I'll
probably come back here to steal it when the time comes.

> > I don't see any hint to -EINVAL above. Am I missing something?
>
> The SCAN_PAGE_COMPOUND result ends up back at madvise_collapse() where it
> eventually gets converted to -EINVAL by madvise_collapse_errno().
>
> >
> >>
> >> I'm not sure if this is already a known issue? I don't have time to work on a fix for this right now, so thought I would highlight it at least. I might get around to it at some point in the future if nobody else tackles it.

My guess is Q1 2024 is when I'd be able to look into this, at the
current level of urgency. It doesn't sound like it's blocking anything
for your work right now -- lmk if that changes though!

Thanks,
Zach

> >>
> >> Thanks,
> >> Ryan
> >>
> >>
> >> Test case I've been using:
> >>
> >> -->8--
> >>
> >> #include
> >> #include
> >> #include
> >> #include
> >> #include
> >> #include
> >> #include
> >>
> >> #ifndef MADV_COLLAPSE
> >> #define MADV_COLLAPSE 25
> >> #endif
> >>
> >> #define handle_error(msg)       do { perror(msg); exit(EXIT_FAILURE); } while (0)
> >>
> >> #define SZ_1K                   1024
> >> #define SZ_1M                   (SZ_1K * SZ_1K)
> >> #define ALIGN(val, align)       (((val) + ((align) - 1)) & ~((align) - 1))
> >>
> >> #if 1
> >> // ext4
> >> #define DATA_FILE               "/home/ubuntu/data.txt"
> >> #else
> >> // xfs
> >> #define DATA_FILE               "/boot/data.txt"
> >> #endif
> >>
> >> int main(void)
> >> {
> >>         int fd;
> >>         char *mem;
> >>         int ret;
> >>
> >>         fd = open(DATA_FILE, O_RDONLY);
> >>         if (fd == -1)
> >>                 handle_error("open");
> >>
> >>         mem = mmap(NULL, SZ_1M * 4, PROT_READ | PROT_EXEC, MAP_PRIVATE, fd, 0);
> >>         close(fd);
> >>         if (mem == MAP_FAILED)
> >>                 handle_error("mmap");
> >>
> >>         printf("1: pid=%d, mem=%p\n", getpid(), mem);
> >>         getchar();
> >>
> >>         mem = (char *)ALIGN((unsigned long)mem, SZ_1M * 2);
> >>         ret = madvise(mem, SZ_1M * 2, MADV_COLLAPSE);
> >>         if (ret)
> >>                 handle_error("madvise");
> >>
> >>         printf("2: pid=%d, mem=%p\n", getpid(), mem);
> >>         getchar();
> >>
> >>         return 0;
> >> }
> >>
> >> -->8--
> >>
> >
> > Confused...
>
> This is a user space test case that shows the problem; data.txt needs to be at
> least 4MB and on a mounted ext4 and xfs filesystem. By toggling the '#if 1' to
> 0, you can see the different behaviours for ext4 and xfs -
> handle_error("madvise") fires with EINVAL in the xfs case. The getchar()s are
> leftovers from me looking at the smaps file.
>
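
For anyone who wants to compare filesystems without editing the '#if'
toggle, the same steps can be condensed into a helper that takes the
file path as an argument. This is only a sketch, not the reproducer as
posted: align_up() and try_collapse() are hypothetical names, the header
list is simply what these calls need, and SZ_2M assumes a 2 MiB PMD
size. It returns a negative errno instead of exiting, so the EINVAL
case can be reported per file.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25        /* same fallback value as the test case above */
#endif

#define SZ_2M (2UL * 1024 * 1024)       /* assumption: PMD size is 2 MiB */

/* Round addr up to the next multiple of align (must be a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
        assert(align && (align & (align - 1)) == 0);
        return (addr + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper: map the first 4 MiB of path privately, then ask
 * for a collapse of the 2 MiB-aligned window inside that mapping.
 * Returns 0 on success or a negative errno from the failing call.
 */
static int try_collapse(const char *path)
{
        char *map, *aligned;
        int fd, err;

        fd = open(path, O_RDONLY);
        if (fd == -1)
                return -errno;

        map = mmap(NULL, 2 * SZ_2M, PROT_READ | PROT_EXEC, MAP_PRIVATE, fd, 0);
        /* Capture errno before close() has a chance to overwrite it. */
        err = (map == MAP_FAILED) ? -errno : 0;
        close(fd);
        if (err)
                return err;

        aligned = (char *)align_up((uintptr_t)map, SZ_2M);
        if (madvise(aligned, SZ_2M, MADV_COLLAPSE))
                err = -errno;

        munmap(map, 2 * SZ_2M);
        return err;
}
```

Going by the report above, try_collapse() on a file that lives on xfs
would be expected to return -EINVAL, and 0 for the same file on ext4.
The file still needs to be at least 4 MiB, and the result depends on
the running kernel supporting MADV_COLLAPSE.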