All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ryan Roberts <ryan.roberts@arm.com>
To: Bagas Sanjaya <bagasdotme@gmail.com>,
	Hugh Dickins <hughd@google.com>,
	David Hildenbrand <david@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	zokeefe@google.com, Chandan Babu R <chandan.babu@oracle.com>,
	"Darrick J. Wong" <djwong@kernel.org>
Cc: Linux Memory Management List <linux-mm@kvack.org>,
	Linux XFS <linux-xfs@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: BUG: MADV_COLLAPSE doesn't work for XFS files
Date: Thu, 28 Sep 2023 12:59:43 +0100	[thread overview]
Message-ID: <54e5accf-1a56-495a-a4f5-d57504bc2fc8@arm.com> (raw)
In-Reply-To: <ZRVbV6yJ-zFzRoas@debian.me>

On 28/09/2023 11:54, Bagas Sanjaya wrote:
> On Thu, Sep 28, 2023 at 10:55:17AM +0100, Ryan Roberts wrote:
>> Hi all,
>>
>> I've just noticed that when applied to a file mapping for a file on xfs, MADV_COLLAPSE returns EINVAL. The same test case works fine if the file is on ext4. 
>>
>> I think the root cause is that the implementation bails out if it finds a (non-PMD-sized) large folio in the page cache for any part of the file covered by the region. XFS does readahead into large folios so we hit this issue. See khugepaged.h:collapse_file():
>>
>> 		if (PageTransCompound(page)) {
>> 			struct page *head = compound_head(page);
>>
>> 			result = compound_order(head) == HPAGE_PMD_ORDER &&
>> 					head->index == start
>> 					/* Maybe PMD-mapped */
>> 					? SCAN_PTE_MAPPED_HUGEPAGE
>> 					: SCAN_PAGE_COMPOUND;
>> 			goto out_unlock;
>> 		}
> 
> I don't see any hint to -EINVAL above. Am I missing something?

The SCAN_PAGE_COMPOUND result ends up back at madvise_collapse() where it
eventually gets converted to -EINVAL by madvise_collapse_errno().

> 
>>
>> I'm not sure if this is already a known issue? I don't have time to work on a fix for this right now, so thought I would highlight it at least. I might get around to it at some point in the future if nobody else tackles it.
>>
>> Thanks,
>> Ryan
>>
>>
>> Test case I've been using:
>>
>> -->8--
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <sys/mman.h>
>> #include <sys/types.h>
>> #include <sys/stat.h>
>> #include <fcntl.h>
>> #include <unistd.h>
>>
>> #ifndef MADV_COLLAPSE
>> #define MADV_COLLAPSE		25
>> #endif
>>
>> #define handle_error(msg) 	do { perror(msg); exit(EXIT_FAILURE); } while (0)
>>
>> #define SZ_1K			1024
>> #define SZ_1M			(SZ_1K * SZ_1K)
>> #define ALIGN(val, align)	(((val) + ((align) - 1)) & ~((align) - 1))
>>
>> #if 1
>> // ext4
>> #define DATA_FILE		"/home/ubuntu/data.txt"
>> #else
>> // xfs
>> #define DATA_FILE		"/boot/data.txt"
>> #endif
>>
>> int main(void)
>> {
>> 	int fd;
>> 	char *mem;
>> 	int ret;
>>
>> 	fd = open(DATA_FILE, O_RDONLY);
>> 	if (fd == -1)
>>         	handle_error("open");
>>
>> 	mem = mmap(NULL, SZ_1M * 4, PROT_READ | PROT_EXEC, MAP_PRIVATE, fd, 0);
>> 	close(fd);
>> 	if (mem == MAP_FAILED)
>>         	handle_error("mmap");
>>
>> 	printf("1: pid=%d, mem=%p\n", getpid(), mem);
>> 	getchar();
>>
>> 	mem = (char *)ALIGN((unsigned long)mem, SZ_1M * 2);
>> 	ret = madvise(mem, SZ_1M * 2, MADV_COLLAPSE);
>> 	if (ret)
>> 		handle_error("madvise");
>>
>> 	printf("2: pid=%d, mem=%p\n", getpid(), mem);
>> 	getchar();
>>
>> 	return 0;
>> }
>>
>> -->8--
>>
> 
> Confused...

This is a user space test case that shows the problem; data.txt needs to be at
least 4MB and on a mounted ext4 and xfs filesystem. By toggling the '#if 1' to
0, you can see the different behaviours for ext4 and xfs -
handle_error("madvise") fires with EINVAL in the xfs case. The getchar()s are
leftovers from me looking at the smaps file.

> 


  reply	other threads:[~2023-09-28 11:59 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-28  9:55 BUG: MADV_COLLAPSE doesn't work for XFS files Ryan Roberts
2023-09-28 10:54 ` Bagas Sanjaya
2023-09-28 11:59   ` Ryan Roberts [this message]
2023-09-28 19:43     ` Zach O'Keefe
2023-09-28 21:04       ` BUG: MADV_COLLAPSE doesn't work for XFS files] Darrick J. Wong
2023-09-28 22:48         ` Zach O'Keefe
2023-09-29 12:33       ` BUG: MADV_COLLAPSE doesn't work for XFS files Ryan Roberts

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54e5accf-1a56-495a-a4f5-d57504bc2fc8@arm.com \
    --to=ryan.roberts@arm.com \
    --cc=bagasdotme@gmail.com \
    --cc=chandan.babu@oracle.com \
    --cc=david@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=willy@infradead.org \
    --cc=zokeefe@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.