linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 00/25] Large pages in the page cache
@ 2020-04-29 13:36 Matthew Wilcox
  2020-04-29 13:36 ` [PATCH v3 01/25] mm: Allow hpages to be arbitrary order Matthew Wilcox
                   ` (26 more replies)
  0 siblings, 27 replies; 36+ messages in thread
From: Matthew Wilcox @ 2020-04-29 13:36 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

This patch set does not pass xfstests.  Test at your own risk.  It is
based on the readahead rewrite which is in Andrew's tree.  The large
pages somehow manage to fall off the LRU, so the test VM quickly runs
out of memory and freezes.  To reproduce:

# mkfs.xfs /dev/sdb && mount /dev/sdb /mnt && dd if=/dev/zero bs=1M count=2048 of=/mnt/bigfile && sync && sleep 2 && sync && echo 1 >/proc/sys/vm/drop_caches 
# /host/home/willy/kernel/xarray-2/tools/vm/page-types | grep thp
0x0000000000401800	       511        1  ___________Ma_________t____________________	mmap,anonymous,thp
0x0000000000405868	         1        0  ___U_lA____Ma_b_______t____________________	uptodate,lru,active,mmap,anonymous,swapbacked,thp
# dd if=/mnt/bigfile of=/dev/null bs=2M count=5
# /host/home/willy/kernel/xarray-2/tools/vm/page-types | grep thp
0x0000000000400000	      2516        9  ______________________t____________________	thp
0x0000000000400028	         1        0  ___U_l________________t____________________	uptodate,lru,thp
0x000000000040006c	       106        0  __RU_lA_______________t____________________	referenced,uptodate,lru,active,thp
0x0000000000400228	         1        0  ___U_l___I____________t____________________	uptodate,lru,reclaim,thp
0x0000000000401800	       511        1  ___________Ma_________t____________________	mmap,anonymous,thp
0x0000000000405868	         1        0  ___U_lA____Ma_b_______t____________________	uptodate,lru,active,mmap,anonymous,swapbacked,thp


The principal idea here is that a large part of the overhead in dealing
with individual pages is that there's just so darned many of them.  We
would be better off dealing with fewer, larger pages, even if they don't
get to be the size necessary for the CPU to use a larger TLB entry.

Matthew Wilcox (Oracle) (24):
  mm: Allow hpages to be arbitrary order
  mm: Introduce thp_size
  mm: Introduce thp_order
  mm: Introduce offset_in_thp
  fs: Add a filesystem flag for large pages
  fs: Introduce i_blocks_per_page
  fs: Make page_mkwrite_check_truncate thp-aware
  fs: Support THPs in zero_user_segments
  bio: Add bio_for_each_thp_segment_all
  iomap: Support arbitrarily many blocks per page
  iomap: Support large pages in iomap_adjust_read_range
  iomap: Support large pages in read paths
  iomap: Support large pages in write paths
  iomap: Inline data shouldn't see large pages
  xfs: Support large pages
  mm: Make prep_transhuge_page return its argument
  mm: Add __page_cache_alloc_order
  mm: Allow large pages to be added to the page cache
  mm: Allow large pages to be removed from the page cache
  mm: Remove page fault assumption of compound page size
  mm: Add DEFINE_READAHEAD
  mm: Make page_cache_readahead_unbounded take a readahead_control
  mm: Make __do_page_cache_readahead take a readahead_control
  mm: Add large page readahead

William Kucharski (1):
  mm: Align THP mappings for non-DAX

 drivers/nvdimm/btt.c    |   4 +-
 drivers/nvdimm/pmem.c   |   6 +-
 fs/ext4/verity.c        |   4 +-
 fs/f2fs/verity.c        |   4 +-
 fs/iomap/buffered-io.c  | 110 ++++++++++++++++--------------
 fs/jfs/jfs_metapage.c   |   2 +-
 fs/xfs/xfs_aops.c       |   4 +-
 fs/xfs/xfs_super.c      |   2 +-
 include/linux/bio.h     |  13 ++++
 include/linux/bvec.h    |  23 +++++++
 include/linux/fs.h      |   1 +
 include/linux/highmem.h |  15 +++--
 include/linux/huge_mm.h |  25 +++++--
 include/linux/mm.h      |  97 ++++++++++++++-------------
 include/linux/pagemap.h |  62 ++++++++++++++---
 mm/filemap.c            |  60 ++++++++++++-----
 mm/highmem.c            |  62 ++++++++++++++++-
 mm/huge_memory.c        |  49 ++++++--------
 mm/internal.h           |  13 ++--
 mm/memory.c             |   7 +-
 mm/page_io.c            |   2 +-
 mm/page_vma_mapped.c    |   4 +-
 mm/readahead.c          | 145 ++++++++++++++++++++++++++++++----------
 23 files changed, 485 insertions(+), 229 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 01/25] mm: Allow hpages to be arbitrary order
  2020-04-29 13:36 [PATCH v3 00/25] Large pages in the page cache Matthew Wilcox
@ 2020-04-29 13:36 ` Matthew Wilcox
  2020-04-29 13:36 ` [PATCH v3 02/25] mm: Introduce thp_size Matthew Wilcox
                   ` (25 subsequent siblings)
  26 siblings, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2020-04-29 13:36 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Remove the assumption in hpage_nr_pages() that compound pages are
necessarily PMD sized.  Move the relevant parts of mm.h to before the
include of huge_mm.h so we can use an inline function rather than a macro.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/huge_mm.h |  5 +--
 include/linux/mm.h      | 96 ++++++++++++++++++++---------------------
 2 files changed, 50 insertions(+), 51 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index cfbb0a87c5f0..6bec4b5b61e1 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -265,11 +265,10 @@ static inline spinlock_t *pud_trans_huge_lock(pud_t *pud,
 	else
 		return NULL;
 }
+
 static inline int hpage_nr_pages(struct page *page)
 {
-	if (unlikely(PageTransHuge(page)))
-		return HPAGE_PMD_NR;
-	return 1;
+	return compound_nr(page);
 }
 
 struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 581e56275bc4..088acbda722d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -671,6 +671,54 @@ int vma_is_stack_for_current(struct vm_area_struct *vma);
 struct mmu_gather;
 struct inode;
 
+static inline unsigned int compound_order(struct page *page)
+{
+	if (!PageHead(page))
+		return 0;
+	return page[1].compound_order;
+}
+
+static inline bool hpage_pincount_available(struct page *page)
+{
+	/*
+	 * Can the page->hpage_pinned_refcount field be used? That field is in
+	 * the 3rd page of the compound page, so the smallest (2-page) compound
+	 * pages cannot support it.
+	 */
+	page = compound_head(page);
+	return PageCompound(page) && compound_order(page) > 1;
+}
+
+static inline int compound_pincount(struct page *page)
+{
+	VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
+	page = compound_head(page);
+	return atomic_read(compound_pincount_ptr(page));
+}
+
+static inline void set_compound_order(struct page *page, unsigned int order)
+{
+	page[1].compound_order = order;
+}
+
+/* Returns the number of pages in this potentially compound page. */
+static inline unsigned long compound_nr(struct page *page)
+{
+	return 1UL << compound_order(page);
+}
+
+/* Returns the number of bytes in this potentially compound page. */
+static inline unsigned long page_size(struct page *page)
+{
+	return PAGE_SIZE << compound_order(page);
+}
+
+/* Returns the number of bits needed for the number of bytes in a page */
+static inline unsigned int page_shift(struct page *page)
+{
+	return PAGE_SHIFT + compound_order(page);
+}
+
 /*
  * FIXME: take this include out, include page-flags.h in
  * files which need it (119 of them)
@@ -875,54 +923,6 @@ static inline compound_page_dtor *get_compound_page_dtor(struct page *page)
 	return compound_page_dtors[page[1].compound_dtor];
 }
 
-static inline unsigned int compound_order(struct page *page)
-{
-	if (!PageHead(page))
-		return 0;
-	return page[1].compound_order;
-}
-
-static inline bool hpage_pincount_available(struct page *page)
-{
-	/*
-	 * Can the page->hpage_pinned_refcount field be used? That field is in
-	 * the 3rd page of the compound page, so the smallest (2-page) compound
-	 * pages cannot support it.
-	 */
-	page = compound_head(page);
-	return PageCompound(page) && compound_order(page) > 1;
-}
-
-static inline int compound_pincount(struct page *page)
-{
-	VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
-	page = compound_head(page);
-	return atomic_read(compound_pincount_ptr(page));
-}
-
-static inline void set_compound_order(struct page *page, unsigned int order)
-{
-	page[1].compound_order = order;
-}
-
-/* Returns the number of pages in this potentially compound page. */
-static inline unsigned long compound_nr(struct page *page)
-{
-	return 1UL << compound_order(page);
-}
-
-/* Returns the number of bytes in this potentially compound page. */
-static inline unsigned long page_size(struct page *page)
-{
-	return PAGE_SIZE << compound_order(page);
-}
-
-/* Returns the number of bits needed for the number of bytes in a page */
-static inline unsigned int page_shift(struct page *page)
-{
-	return PAGE_SHIFT + compound_order(page);
-}
-
 void free_compound_page(struct page *page);
 
 #ifdef CONFIG_MMU
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 02/25] mm: Introduce thp_size
  2020-04-29 13:36 [PATCH v3 00/25] Large pages in the page cache Matthew Wilcox
  2020-04-29 13:36 ` [PATCH v3 01/25] mm: Allow hpages to be arbitrary order Matthew Wilcox
@ 2020-04-29 13:36 ` Matthew Wilcox
  2020-05-06 17:59   ` Yang Shi
  2020-04-29 13:36 ` [PATCH v3 03/25] mm: Introduce thp_order Matthew Wilcox
                   ` (24 subsequent siblings)
  26 siblings, 1 reply; 36+ messages in thread
From: Matthew Wilcox @ 2020-04-29 13:36 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

This is like page_size(), but compiles down to just PAGE_SIZE if THP
are disabled.  Convert the users of hpage_nr_pages() which would prefer
this interface.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 drivers/nvdimm/btt.c    | 4 +---
 drivers/nvdimm/pmem.c   | 6 ++----
 include/linux/huge_mm.h | 7 +++++++
 mm/internal.h           | 2 +-
 mm/page_io.c            | 2 +-
 mm/page_vma_mapped.c    | 4 ++--
 6 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 3b09419218d6..78e8d972d45a 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -1488,10 +1488,8 @@ static int btt_rw_page(struct block_device *bdev, sector_t sector,
 {
 	struct btt *btt = bdev->bd_disk->private_data;
 	int rc;
-	unsigned int len;
 
-	len = hpage_nr_pages(page) * PAGE_SIZE;
-	rc = btt_do_bvec(btt, NULL, page, len, 0, op, sector);
+	rc = btt_do_bvec(btt, NULL, page, thp_size(page), 0, op, sector);
 	if (rc == 0)
 		page_endio(page, op_is_write(op), 0);
 
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 2df6994acf83..184c8b516543 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -235,11 +235,9 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
 	blk_status_t rc;
 
 	if (op_is_write(op))
-		rc = pmem_do_write(pmem, page, 0, sector,
-				   hpage_nr_pages(page) * PAGE_SIZE);
+		rc = pmem_do_write(pmem, page, 0, sector, tmp_size(page));
 	else
-		rc = pmem_do_read(pmem, page, 0, sector,
-				   hpage_nr_pages(page) * PAGE_SIZE);
+		rc = pmem_do_read(pmem, page, 0, sector, thp_size(page));
 	/*
 	 * The ->rw_page interface is subtle and tricky.  The core
 	 * retries on any error, so we can only invoke page_endio() in
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 6bec4b5b61e1..e944f9757349 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -271,6 +271,11 @@ static inline int hpage_nr_pages(struct page *page)
 	return compound_nr(page);
 }
 
+static inline unsigned long thp_size(struct page *page)
+{
+	return page_size(page);
+}
+
 struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
 		pmd_t *pmd, int flags, struct dev_pagemap **pgmap);
 struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr,
@@ -329,6 +334,8 @@ static inline int hpage_nr_pages(struct page *page)
 	return 1;
 }
 
+#define thp_size(x)		PAGE_SIZE
+
 static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
 {
 	return false;
diff --git a/mm/internal.h b/mm/internal.h
index f762a34b0c57..5efb13d5c226 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -386,7 +386,7 @@ vma_address(struct page *page, struct vm_area_struct *vma)
 	unsigned long start, end;
 
 	start = __vma_address(page, vma);
-	end = start + PAGE_SIZE * (hpage_nr_pages(page) - 1);
+	end = start + thp_size(page) - PAGE_SIZE;
 
 	/* page should be within @vma mapping range */
 	VM_BUG_ON_VMA(end < vma->vm_start || start >= vma->vm_end, vma);
diff --git a/mm/page_io.c b/mm/page_io.c
index 76965be1d40e..dd935129e3cb 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -41,7 +41,7 @@ static struct bio *get_swap_bio(gfp_t gfp_flags,
 		bio->bi_iter.bi_sector <<= PAGE_SHIFT - 9;
 		bio->bi_end_io = end_io;
 
-		bio_add_page(bio, page, PAGE_SIZE * hpage_nr_pages(page), 0);
+		bio_add_page(bio, page, thp_size(page), 0);
 	}
 	return bio;
 }
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 719c35246cfa..e65629c056e8 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -227,7 +227,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 			if (pvmw->address >= pvmw->vma->vm_end ||
 			    pvmw->address >=
 					__vma_address(pvmw->page, pvmw->vma) +
-					hpage_nr_pages(pvmw->page) * PAGE_SIZE)
+					thp_size(pvmw->page))
 				return not_found(pvmw);
 			/* Did we cross page table boundary? */
 			if (pvmw->address % PMD_SIZE == 0) {
@@ -268,7 +268,7 @@ int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma)
 	unsigned long start, end;
 
 	start = __vma_address(page, vma);
-	end = start + PAGE_SIZE * (hpage_nr_pages(page) - 1);
+	end = start + thp_size(page) - PAGE_SIZE;
 
 	if (unlikely(end < vma->vm_start || start >= vma->vm_end))
 		return 0;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 03/25] mm: Introduce thp_order
  2020-04-29 13:36 [PATCH v3 00/25] Large pages in the page cache Matthew Wilcox
  2020-04-29 13:36 ` [PATCH v3 01/25] mm: Allow hpages to be arbitrary order Matthew Wilcox
  2020-04-29 13:36 ` [PATCH v3 02/25] mm: Introduce thp_size Matthew Wilcox
@ 2020-04-29 13:36 ` Matthew Wilcox
  2020-04-29 13:36 ` [PATCH v3 04/25] mm: Introduce offset_in_thp Matthew Wilcox
                   ` (23 subsequent siblings)
  26 siblings, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2020-04-29 13:36 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Like compound_order() except 0 when THP is disabled.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/huge_mm.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index e944f9757349..1f6245091917 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -276,6 +276,11 @@ static inline unsigned long thp_size(struct page *page)
 	return page_size(page);
 }
 
+static inline unsigned int thp_order(struct page *page)
+{
+	return compound_order(page);
+}
+
 struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
 		pmd_t *pmd, int flags, struct dev_pagemap **pgmap);
 struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr,
@@ -335,6 +340,7 @@ static inline int hpage_nr_pages(struct page *page)
 }
 
 #define thp_size(x)		PAGE_SIZE
+#define thp_order(x)		0U
 
 static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
 {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 04/25] mm: Introduce offset_in_thp
  2020-04-29 13:36 [PATCH v3 00/25] Large pages in the page cache Matthew Wilcox
                   ` (2 preceding siblings ...)
  2020-04-29 13:36 ` [PATCH v3 03/25] mm: Introduce thp_order Matthew Wilcox
@ 2020-04-29 13:36 ` Matthew Wilcox
  2020-04-29 13:36 ` [PATCH v3 05/25] fs: Add a filesystem flag for large pages Matthew Wilcox
                   ` (22 subsequent siblings)
  26 siblings, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2020-04-29 13:36 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Mirroring offset_in_page(), this gives you the offset within this
particular page, no matter what size page it is.  It optimises down
to offset_in_page() if CONFIG_TRANSPARENT_HUGEPAGE is not set.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 088acbda722d..9a55dce6a535 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1577,6 +1577,7 @@ static inline void clear_page_pfmemalloc(struct page *page)
 extern void pagefault_out_of_memory(void);
 
 #define offset_in_page(p)	((unsigned long)(p) & ~PAGE_MASK)
+#define offset_in_thp(page, p)	((unsigned long)(p) & (thp_size(page) - 1))
 
 /*
  * Flags passed to show_mem() and show_free_areas() to suppress output in
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 05/25] fs: Add a filesystem flag for large pages
  2020-04-29 13:36 [PATCH v3 00/25] Large pages in the page cache Matthew Wilcox
                   ` (3 preceding siblings ...)
  2020-04-29 13:36 ` [PATCH v3 04/25] mm: Introduce offset_in_thp Matthew Wilcox
@ 2020-04-29 13:36 ` Matthew Wilcox
  2020-04-29 13:36 ` [PATCH v3 06/25] fs: Introduce i_blocks_per_page Matthew Wilcox
                   ` (21 subsequent siblings)
  26 siblings, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2020-04-29 13:36 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

The page cache needs to know whether the filesystem supports pages >
PAGE_SIZE.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/fs.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 55c743925c40..777783c8760b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2241,6 +2241,7 @@ struct file_system_type {
 #define FS_HAS_SUBTYPE		4
 #define FS_USERNS_MOUNT		8	/* Can be mounted by userns root */
 #define FS_DISALLOW_NOTIFY_PERM	16	/* Disable fanotify permission events */
+#define FS_LARGE_PAGES		8192	/* Remove once all fs converted */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
 	int (*init_fs_context)(struct fs_context *);
 	const struct fs_parameter_spec *parameters;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 06/25] fs: Introduce i_blocks_per_page
  2020-04-29 13:36 [PATCH v3 00/25] Large pages in the page cache Matthew Wilcox
                   ` (4 preceding siblings ...)
  2020-04-29 13:36 ` [PATCH v3 05/25] fs: Add a filesystem flag for large pages Matthew Wilcox
@ 2020-04-29 13:36 ` Matthew Wilcox
  2020-04-29 13:36 ` [PATCH v3 07/25] fs: Make page_mkwrite_check_truncate thp-aware Matthew Wilcox
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2020-04-29 13:36 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel,
	Christoph Hellwig

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

This helper is useful for both large pages in the page cache and for
supporting block size larger than page size.  Convert some example
users (we have a few different ways of writing this idiom).

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c  |  8 ++++----
 fs/jfs/jfs_metapage.c   |  2 +-
 fs/xfs/xfs_aops.c       |  2 +-
 include/linux/pagemap.h | 16 ++++++++++++++++
 4 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 890c8fcda4f3..4bc37bf8d057 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -46,7 +46,7 @@ iomap_page_create(struct inode *inode, struct page *page)
 {
 	struct iomap_page *iop = to_iomap_page(page);
 
-	if (iop || i_blocksize(inode) == PAGE_SIZE)
+	if (iop || i_blocks_per_page(inode, page) <= 1)
 		return iop;
 
 	iop = kmalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
@@ -152,7 +152,7 @@ iomap_iop_set_range_uptodate(struct page *page, unsigned off, unsigned len)
 	unsigned int i;
 
 	spin_lock_irqsave(&iop->uptodate_lock, flags);
-	for (i = 0; i < PAGE_SIZE / i_blocksize(inode); i++) {
+	for (i = 0; i < i_blocks_per_page(inode, page); i++) {
 		if (i >= first && i <= last)
 			set_bit(i, iop->uptodate);
 		else if (!test_bit(i, iop->uptodate))
@@ -1090,7 +1090,7 @@ iomap_finish_page_writeback(struct inode *inode, struct page *page,
 		mapping_set_error(inode->i_mapping, -EIO);
 	}
 
-	WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE && !iop);
+	WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
 	WARN_ON_ONCE(iop && atomic_read(&iop->write_count) <= 0);
 
 	if (!iop || atomic_dec_and_test(&iop->write_count))
@@ -1386,7 +1386,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	int error = 0, count = 0, i;
 	LIST_HEAD(submit_list);
 
-	WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE && !iop);
+	WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
 	WARN_ON_ONCE(iop && atomic_read(&iop->write_count) != 0);
 
 	/*
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index a2f5338a5ea1..176580f54af9 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -473,7 +473,7 @@ static int metapage_readpage(struct file *fp, struct page *page)
 	struct inode *inode = page->mapping->host;
 	struct bio *bio = NULL;
 	int block_offset;
-	int blocks_per_page = PAGE_SIZE >> inode->i_blkbits;
+	int blocks_per_page = i_blocks_per_page(inode, page);
 	sector_t page_start;	/* address of page in fs blocks */
 	sector_t pblock;
 	int xlen;
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 1fd4fb7a607c..5b25f5ee84dc 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -544,7 +544,7 @@ xfs_discard_page(
 			page, ip->i_ino, offset);
 
 	error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
-			PAGE_SIZE / i_blocksize(inode));
+			i_blocks_per_page(inode, page));
 	if (error && !XFS_FORCED_SHUTDOWN(mp))
 		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
 out_invalidate:
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 36bfc9d855bb..ba16f7bf676b 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -816,4 +816,20 @@ static inline int page_mkwrite_check_truncate(struct page *page,
 	return offset;
 }
 
+/**
+ * i_blocks_per_page - How many blocks fit in this page.
+ * @inode: The inode which contains the blocks.
+ * @page: The (potentially large) page.
+ *
+ * If the block size is larger than the size of this page, will return
+ * zero,
+ *
+ * Context: Any context.
+ * Return: The number of filesystem blocks covered by this page.
+ */
+static inline
+unsigned int i_blocks_per_page(struct inode *inode, struct page *page)
+{
+	return thp_size(page) >> inode->i_blkbits;
+}
 #endif /* _LINUX_PAGEMAP_H */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 07/25] fs: Make page_mkwrite_check_truncate thp-aware
  2020-04-29 13:36 [PATCH v3 00/25] Large pages in the page cache Matthew Wilcox
                   ` (5 preceding siblings ...)
  2020-04-29 13:36 ` [PATCH v3 06/25] fs: Introduce i_blocks_per_page Matthew Wilcox
@ 2020-04-29 13:36 ` Matthew Wilcox
  2020-04-29 13:36 ` [PATCH v3 08/25] fs: Support THPs in zero_user_segments Matthew Wilcox
                   ` (19 subsequent siblings)
  26 siblings, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2020-04-29 13:36 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

If the page is compound, check the last index in the page and return
the appropriate size.  Change the return type to ssize_t in case we ever
support pages larger than 2GB.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index ba16f7bf676b..55199cb5bd66 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -793,22 +793,22 @@ static inline unsigned long dir_pages(struct inode *inode)
  * @page: the page to check
  * @inode: the inode to check the page against
  *
- * Returns the number of bytes in the page up to EOF,
+ * Return: The number of bytes in the page up to EOF,
  * or -EFAULT if the page was truncated.
  */
-static inline int page_mkwrite_check_truncate(struct page *page,
+static inline ssize_t page_mkwrite_check_truncate(struct page *page,
 					      struct inode *inode)
 {
 	loff_t size = i_size_read(inode);
 	pgoff_t index = size >> PAGE_SHIFT;
-	int offset = offset_in_page(size);
+	unsigned long offset = offset_in_thp(page, size);
 
 	if (page->mapping != inode->i_mapping)
 		return -EFAULT;
 
 	/* page is wholly inside EOF */
-	if (page->index < index)
-		return PAGE_SIZE;
+	if (page->index + hpage_nr_pages(page) - 1 < index)
+		return thp_size(page);
 	/* page is wholly past EOF */
 	if (page->index > index || !offset)
 		return -EFAULT;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 08/25] fs: Support THPs in zero_user_segments
  2020-04-29 13:36 [PATCH v3 00/25] Large pages in the page cache Matthew Wilcox
                   ` (6 preceding siblings ...)
  2020-04-29 13:36 ` [PATCH v3 07/25] fs: Make page_mkwrite_check_truncate thp-aware Matthew Wilcox
@ 2020-04-29 13:36 ` Matthew Wilcox
  2020-04-29 13:36 ` [PATCH v3 09/25] bio: Add bio_for_each_thp_segment_all Matthew Wilcox
                   ` (18 subsequent siblings)
  26 siblings, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2020-04-29 13:36 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

We can only kmap() one subpage of a THP at a time, so loop over all
relevant subpages, skipping ones which don't need to be zeroed.  This is
too large to inline when THPs are enabled and we actually need highmem,
so put it in highmem.c.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/highmem.h | 15 +++++++---
 mm/highmem.c            | 62 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 71 insertions(+), 6 deletions(-)

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index ea5cdbd8c2c3..74614903619d 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -215,13 +215,18 @@ static inline void clear_highpage(struct page *page)
 	kunmap_atomic(kaddr);
 }
 
+#if defined(CONFIG_HIGHMEM) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
+void zero_user_segments(struct page *page, unsigned start1, unsigned end1,
+		unsigned start2, unsigned end2);
+#else /* !HIGHMEM || !TRANSPARENT_HUGEPAGE */
 static inline void zero_user_segments(struct page *page,
-	unsigned start1, unsigned end1,
-	unsigned start2, unsigned end2)
+		unsigned start1, unsigned end1,
+		unsigned start2, unsigned end2)
 {
+	unsigned long i;
 	void *kaddr = kmap_atomic(page);
 
-	BUG_ON(end1 > PAGE_SIZE || end2 > PAGE_SIZE);
+	BUG_ON(end1 > thp_size(page) || end2 > thp_size(page));
 
 	if (end1 > start1)
 		memset(kaddr + start1, 0, end1 - start1);
@@ -230,8 +235,10 @@ static inline void zero_user_segments(struct page *page,
 		memset(kaddr + start2, 0, end2 - start2);
 
 	kunmap_atomic(kaddr);
-	flush_dcache_page(page);
+	for (i = 0; i < hpage_nr_pages(page); i++)
+		flush_dcache_page(page + i);
 }
+#endif /* !HIGHMEM || !TRANSPARENT_HUGEPAGE */
 
 static inline void zero_user_segment(struct page *page,
 	unsigned start, unsigned end)
diff --git a/mm/highmem.c b/mm/highmem.c
index 64d8dea47dd1..3a85c66ef532 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -367,9 +367,67 @@ void kunmap_high(struct page *page)
 	if (need_wakeup)
 		wake_up(pkmap_map_wait);
 }
-
 EXPORT_SYMBOL(kunmap_high);
-#endif
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+void zero_user_segments(struct page *page, unsigned start1, unsigned end1,
+		unsigned start2, unsigned end2)
+{
+	unsigned int i;
+
+	BUG_ON(end1 > thp_size(page) || end2 > thp_size(page));
+
+	for (i = 0; i < hpage_nr_pages(page); i++) {
+		void *kaddr;
+		unsigned this_end;
+
+		if (end1 == 0 && start2 >= PAGE_SIZE) {
+			start2 -= PAGE_SIZE;
+			end2 -= PAGE_SIZE;
+			continue;
+		}
+
+		if (start1 >= PAGE_SIZE) {
+			start1 -= PAGE_SIZE;
+			end1 -= PAGE_SIZE;
+			if (start2) {
+				start2 -= PAGE_SIZE;
+				end2 -= PAGE_SIZE;
+			}
+			continue;
+		}
+
+		kaddr = kmap_atomic(page + i);
+
+		this_end = min_t(unsigned, end1, PAGE_SIZE);
+		if (end1 > start1)
+			memset(kaddr + start1, 0, this_end - start1);
+		end1 -= this_end;
+		start1 = 0;
+
+		if (start2 >= PAGE_SIZE) {
+			start2 -= PAGE_SIZE;
+			end2 -= PAGE_SIZE;
+		} else {
+			this_end = min_t(unsigned, end2, PAGE_SIZE);
+			if (end2 > start2)
+				memset(kaddr + start2, 0, this_end - start2);
+			end2 -= this_end;
+			start2 = 0;
+		}
+
+		kunmap_atomic(kaddr);
+		flush_dcache_page(page + i);
+
+		if (!end1 && !end2)
+			break;
+	}
+
+	BUG_ON((start1 | start2 | end1 | end2) != 0);
+}
+EXPORT_SYMBOL(zero_user_segments);
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+#endif /* CONFIG_HIGHMEM */
 
 #if defined(HASHED_PAGE_VIRTUAL)
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 09/25] bio: Add bio_for_each_thp_segment_all
  2020-04-29 13:36 [PATCH v3 00/25] Large pages in the page cache Matthew Wilcox
                   ` (7 preceding siblings ...)
  2020-04-29 13:36 ` [PATCH v3 08/25] fs: Support THPs in zero_user_segments Matthew Wilcox
@ 2020-04-29 13:36 ` Matthew Wilcox
  2020-04-29 13:36 ` [PATCH v3 10/25] iomap: Support arbitrarily many blocks per page Matthew Wilcox
                   ` (17 subsequent siblings)
  26 siblings, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2020-04-29 13:36 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Iterate once for each THP page instead of once for each base page.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/bio.h  | 13 +++++++++++++
 include/linux/bvec.h | 23 +++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index c1c0f9ea4e63..4cc883fd8d63 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -131,12 +131,25 @@ static inline bool bio_next_segment(const struct bio *bio,
 	return true;
 }
 
+static inline bool bio_next_thp_segment(const struct bio *bio,
+				    struct bvec_iter_all *iter)
+{
+	if (iter->idx >= bio->bi_vcnt)
+		return false;
+
+	bvec_thp_advance(&bio->bi_io_vec[iter->idx], iter);
+	return true;
+}
+
 /*
  * drivers should _never_ use the all version - the bio may have been split
  * before it got to the driver and the driver won't own all of it
  */
 #define bio_for_each_segment_all(bvl, bio, iter) \
 	for (bvl = bvec_init_iter_all(&iter); bio_next_segment((bio), &iter); )
+#define bio_for_each_thp_segment_all(bvl, bio, iter) \
+	for (bvl = bvec_init_iter_all(&iter); \
+	     bio_next_thp_segment((bio), &iter); )
 
 static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
 				    unsigned bytes)
diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index a81c13ac1972..e08bd192e0ed 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -153,4 +153,27 @@ static inline void bvec_advance(const struct bio_vec *bvec,
 	}
 }
 
+static inline void bvec_thp_advance(const struct bio_vec *bvec,
+				struct bvec_iter_all *iter_all)
+{
+	struct bio_vec *bv = &iter_all->bv;
+	unsigned int page_size = thp_size(bvec->bv_page);
+
+	if (iter_all->done) {
+		bv->bv_page += hpage_nr_pages(bv->bv_page);
+		bv->bv_offset = 0;
+	} else {
+		BUG_ON(bvec->bv_offset >= page_size);
+		bv->bv_page = bvec->bv_page;
+		bv->bv_offset = bvec->bv_offset & (page_size - 1);
+	}
+	bv->bv_len = min(page_size - bv->bv_offset,
+			 bvec->bv_len - iter_all->done);
+	iter_all->done += bv->bv_len;
+
+	if (iter_all->done == bvec->bv_len) {
+		iter_all->idx++;
+		iter_all->done = 0;
+	}
+}
 #endif /* __LINUX_BVEC_ITER_H */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 10/25] iomap: Support arbitrarily many blocks per page
  2020-04-29 13:36 [PATCH v3 00/25] Large pages in the page cache Matthew Wilcox
                   ` (8 preceding siblings ...)
  2020-04-29 13:36 ` [PATCH v3 09/25] bio: Add bio_for_each_thp_segment_all Matthew Wilcox
@ 2020-04-29 13:36 ` Matthew Wilcox
  2020-04-29 13:36 ` [PATCH v3 11/25] iomap: Support large pages in iomap_adjust_read_range Matthew Wilcox
                   ` (16 subsequent siblings)
  26 siblings, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2020-04-29 13:36 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Size the uptodate array dynamically.  Now that this array is protected
by a spinlock, we can use bitmap functions to set the bits in this array
instead of a loop around set_bit().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 27 +++++++++------------------
 1 file changed, 9 insertions(+), 18 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 4bc37bf8d057..4a79061073eb 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -22,14 +22,14 @@
 #include "../internal.h"
 
 /*
- * Structure allocated for each page when block size < PAGE_SIZE to track
+ * Structure allocated for each page when block size < page size to track
  * sub-page uptodate status and I/O completions.
  */
 struct iomap_page {
 	atomic_t		read_count;
 	atomic_t		write_count;
 	spinlock_t		uptodate_lock;
-	DECLARE_BITMAP(uptodate, PAGE_SIZE / 512);
+	unsigned long		uptodate[];
 };
 
 static inline struct iomap_page *to_iomap_page(struct page *page)
@@ -45,15 +45,14 @@ static struct iomap_page *
 iomap_page_create(struct inode *inode, struct page *page)
 {
 	struct iomap_page *iop = to_iomap_page(page);
+	unsigned int nr_blocks = i_blocks_per_page(inode, page);
 
-	if (iop || i_blocks_per_page(inode, page) <= 1)
+	if (iop || nr_blocks <= 1)
 		return iop;
 
-	iop = kmalloc(sizeof(*iop), GFP_NOFS | __GFP_NOFAIL);
-	atomic_set(&iop->read_count, 0);
-	atomic_set(&iop->write_count, 0);
+	iop = kzalloc(struct_size(iop, uptodate, BITS_TO_LONGS(nr_blocks)),
+				GFP_NOFS | __GFP_NOFAIL);
 	spin_lock_init(&iop->uptodate_lock);
-	bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
 
 	/*
 	 * migrate_page_move_mapping() assumes that pages with private data have
@@ -146,20 +145,12 @@ iomap_iop_set_range_uptodate(struct page *page, unsigned off, unsigned len)
 	struct iomap_page *iop = to_iomap_page(page);
 	struct inode *inode = page->mapping->host;
 	unsigned first = off >> inode->i_blkbits;
-	unsigned last = (off + len - 1) >> inode->i_blkbits;
-	bool uptodate = true;
+	unsigned count = len >> inode->i_blkbits;
 	unsigned long flags;
-	unsigned int i;
 
 	spin_lock_irqsave(&iop->uptodate_lock, flags);
-	for (i = 0; i < i_blocks_per_page(inode, page); i++) {
-		if (i >= first && i <= last)
-			set_bit(i, iop->uptodate);
-		else if (!test_bit(i, iop->uptodate))
-			uptodate = false;
-	}
-
-	if (uptodate)
+	bitmap_set(iop->uptodate, first, count);
+	if (bitmap_full(iop->uptodate, i_blocks_per_page(inode, page)))
 		SetPageUptodate(page);
 	spin_unlock_irqrestore(&iop->uptodate_lock, flags);
 }
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 11/25] iomap: Support large pages in iomap_adjust_read_range
  2020-04-29 13:36 [PATCH v3 00/25] Large pages in the page cache Matthew Wilcox
                   ` (9 preceding siblings ...)
  2020-04-29 13:36 ` [PATCH v3 10/25] iomap: Support arbitrarily many blocks per page Matthew Wilcox
@ 2020-04-29 13:36 ` Matthew Wilcox
  2020-04-29 13:36 ` [PATCH v3 12/25] iomap: Support large pages in read paths Matthew Wilcox
                   ` (15 subsequent siblings)
  26 siblings, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2020-04-29 13:36 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Pass the struct page instead of the iomap_page so we can determine the
size of the page.  Use offset_in_thp() instead of offset_in_page() and use
thp_size() instead of PAGE_SIZE.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 4a79061073eb..423ffc9d4a97 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -83,15 +83,16 @@ iomap_page_release(struct page *page)
  * Calculate the range inside the page that we actually need to read.
  */
 static void
-iomap_adjust_read_range(struct inode *inode, struct iomap_page *iop,
+iomap_adjust_read_range(struct inode *inode, struct page *page,
 		loff_t *pos, loff_t length, unsigned *offp, unsigned *lenp)
 {
+	struct iomap_page *iop = to_iomap_page(page);
 	loff_t orig_pos = *pos;
 	loff_t isize = i_size_read(inode);
 	unsigned block_bits = inode->i_blkbits;
 	unsigned block_size = (1 << block_bits);
-	unsigned poff = offset_in_page(*pos);
-	unsigned plen = min_t(loff_t, PAGE_SIZE - poff, length);
+	unsigned poff = offset_in_thp(page, *pos);
+	unsigned plen = min_t(loff_t, thp_size(page) - poff, length);
 	unsigned first = poff >> block_bits;
 	unsigned last = (poff + plen - 1) >> block_bits;
 
@@ -129,7 +130,7 @@ iomap_adjust_read_range(struct inode *inode, struct iomap_page *iop,
 	 * page cache for blocks that are entirely outside of i_size.
 	 */
 	if (orig_pos <= isize && orig_pos + length > isize) {
-		unsigned end = offset_in_page(isize - 1) >> block_bits;
+		unsigned end = offset_in_thp(page, isize - 1) >> block_bits;
 
 		if (first <= end && last > end)
 			plen -= (last - end) * block_size;
@@ -256,7 +257,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	}
 
 	/* zero post-eof blocks as the page may be mapped */
-	iomap_adjust_read_range(inode, iop, &pos, length, &poff, &plen);
+	iomap_adjust_read_range(inode, page, &pos, length, &poff, &plen);
 	if (plen == 0)
 		goto done;
 
@@ -571,7 +572,6 @@ static int
 __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
 		struct page *page, struct iomap *srcmap)
 {
-	struct iomap_page *iop = iomap_page_create(inode, page);
 	loff_t block_size = i_blocksize(inode);
 	loff_t block_start = pos & ~(block_size - 1);
 	loff_t block_end = (pos + len + block_size - 1) & ~(block_size - 1);
@@ -580,9 +580,10 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
 
 	if (PageUptodate(page))
 		return 0;
+	iomap_page_create(inode, page);
 
 	do {
-		iomap_adjust_read_range(inode, iop, &block_start,
+		iomap_adjust_read_range(inode, page, &block_start,
 				block_end - block_start, &poff, &plen);
 		if (plen == 0)
 			break;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 12/25] iomap: Support large pages in read paths
  2020-04-29 13:36 [PATCH v3 00/25] Large pages in the page cache Matthew Wilcox
                   ` (10 preceding siblings ...)
  2020-04-29 13:36 ` [PATCH v3 11/25] iomap: Support large pages in iomap_adjust_read_range Matthew Wilcox
@ 2020-04-29 13:36 ` Matthew Wilcox
  2020-04-29 13:36 ` [PATCH v3 13/25] iomap: Support large pages in write paths Matthew Wilcox
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2020-04-29 13:36 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Use thp_size() instead of PAGE_SIZE, offset_in_thp() instead of
offset_in_page() and bio_for_each_thp_segment_all().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 423ffc9d4a97..75f42c0d4cd9 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -198,7 +198,7 @@ iomap_read_end_io(struct bio *bio)
 	struct bio_vec *bvec;
 	struct bvec_iter_all iter_all;
 
-	bio_for_each_segment_all(bvec, bio, iter_all)
+	bio_for_each_thp_segment_all(bvec, bio, iter_all)
 		iomap_read_page_end_io(bvec, error);
 	bio_put(bio);
 }
@@ -238,6 +238,16 @@ static inline bool iomap_block_needs_zeroing(struct inode *inode,
 		pos >= i_size_read(inode);
 }
 
+/*
+ * Estimate the number of vectors we need based on the current page size;
+ * if we're wrong we'll end up doing an overly large allocation or needing
+ * to do a second allocation, neither of which is a big deal.
+ */
+static unsigned int iomap_nr_vecs(struct page *page, loff_t length)
+{
+	return (length + thp_size(page) - 1) >> page_shift(page);
+}
+
 static loff_t
 iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		struct iomap *iomap, struct iomap *srcmap)
@@ -294,7 +304,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	if (!ctx->bio || !is_contig || bio_full(ctx->bio, plen)) {
 		gfp_t gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL);
 		gfp_t orig_gfp = gfp;
-		int nr_vecs = (length + PAGE_SIZE - 1) >> PAGE_SHIFT;
+		int nr_vecs = iomap_nr_vecs(page, length);
 
 		if (ctx->bio)
 			submit_bio(ctx->bio);
@@ -338,9 +348,9 @@ iomap_readpage(struct page *page, const struct iomap_ops *ops)
 
 	trace_iomap_readpage(page->mapping->host, 1);
 
-	for (poff = 0; poff < PAGE_SIZE; poff += ret) {
+	for (poff = 0; poff < thp_size(page); poff += ret) {
 		ret = iomap_apply(inode, page_offset(page) + poff,
-				PAGE_SIZE - poff, 0, ops, &ctx,
+				thp_size(page) - poff, 0, ops, &ctx,
 				iomap_readpage_actor);
 		if (ret <= 0) {
 			WARN_ON_ONCE(ret == 0);
@@ -374,7 +384,8 @@ iomap_readahead_actor(struct inode *inode, loff_t pos, loff_t length,
 	loff_t done, ret;
 
 	for (done = 0; done < length; done += ret) {
-		if (ctx->cur_page && offset_in_page(pos + done) == 0) {
+		if (ctx->cur_page &&
+		    offset_in_thp(ctx->cur_page, pos + done) == 0) {
 			if (!ctx->cur_page_in_bio)
 				unlock_page(ctx->cur_page);
 			put_page(ctx->cur_page);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 13/25] iomap: Support large pages in write paths
  2020-04-29 13:36 [PATCH v3 00/25] Large pages in the page cache Matthew Wilcox
                   ` (11 preceding siblings ...)
  2020-04-29 13:36 ` [PATCH v3 12/25] iomap: Support large pages in read paths Matthew Wilcox
@ 2020-04-29 13:36 ` Matthew Wilcox
  2020-04-29 13:36 ` [PATCH v3 14/25] iomap: Inline data shouldn't see large pages Matthew Wilcox
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2020-04-29 13:36 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Use thp_size() instead of PAGE_SIZE and offset_in_thp() instead of
offset_in_page().  Also simplify the logic in iomap_do_writepage()
for determining end of file.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 43 +++++++++++++++++++++++-------------------
 1 file changed, 24 insertions(+), 19 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 75f42c0d4cd9..709be90a1997 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -466,7 +466,7 @@ iomap_is_partially_uptodate(struct page *page, unsigned long from,
 	unsigned i;
 
 	/* Limit range to one page */
-	len = min_t(unsigned, PAGE_SIZE - from, count);
+	len = min_t(unsigned, thp_size(page) - from, count);
 
 	/* First and last blocks in range within page */
 	first = from >> inode->i_blkbits;
@@ -510,7 +510,7 @@ iomap_invalidatepage(struct page *page, unsigned int offset, unsigned int len)
 	 * If we are invalidating the entire page, clear the dirty state from it
 	 * and release it to avoid unnecessary buildup of the LRU.
 	 */
-	if (offset == 0 && len == PAGE_SIZE) {
+	if (offset == 0 && len == thp_size(page)) {
 		WARN_ON_ONCE(PageWriteback(page));
 		cancel_dirty_page(page);
 		iomap_page_release(page);
@@ -586,7 +586,9 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
 	loff_t block_size = i_blocksize(inode);
 	loff_t block_start = pos & ~(block_size - 1);
 	loff_t block_end = (pos + len + block_size - 1) & ~(block_size - 1);
-	unsigned from = offset_in_page(pos), to = from + len, poff, plen;
+	unsigned from = offset_in_thp(page, pos);
+	unsigned to = from + len;
+	unsigned poff, plen;
 	int status;
 
 	if (PageUptodate(page))
@@ -718,7 +720,7 @@ __iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
 	 */
 	if (unlikely(copied < len && !PageUptodate(page)))
 		return 0;
-	iomap_set_range_uptodate(page, offset_in_page(pos), len);
+	iomap_set_range_uptodate(page, offset_in_thp(page, pos), len);
 	iomap_set_page_dirty(page);
 	return copied;
 }
@@ -793,6 +795,10 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		unsigned long bytes;	/* Bytes to write to page */
 		size_t copied;		/* Bytes copied from user */
 
+		/*
+		 * XXX: We don't know what size page we'll find in the
+		 * page cache, so only copy up to a regular page boundary.
+		 */
 		offset = offset_in_page(pos);
 		bytes = min_t(unsigned long, PAGE_SIZE - offset,
 						iov_iter_count(i));
@@ -1335,7 +1341,7 @@ iomap_add_to_ioend(struct inode *inode, loff_t offset, struct page *page,
 {
 	sector_t sector = iomap_sector(&wpc->iomap, offset);
 	unsigned len = i_blocksize(inode);
-	unsigned poff = offset & (PAGE_SIZE - 1);
+	unsigned poff = offset & (thp_size(page) - 1);
 	bool merged, same_page = false;
 
 	if (!wpc->ioend || !iomap_can_add_to_ioend(wpc, offset, sector)) {
@@ -1385,11 +1391,12 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	struct iomap_page *iop = to_iomap_page(page);
 	struct iomap_ioend *ioend, *next;
 	unsigned len = i_blocksize(inode);
-	u64 file_offset; /* file offset of page */
+	loff_t pos;
 	int error = 0, count = 0, i;
+	int nr_blocks = i_blocks_per_page(inode, page);
 	LIST_HEAD(submit_list);
 
-	WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
+	WARN_ON_ONCE(nr_blocks > 1 && !iop);
 	WARN_ON_ONCE(iop && atomic_read(&iop->write_count) != 0);
 
 	/*
@@ -1397,20 +1404,20 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	 * end of the current map or find the current map invalid, grab a new
 	 * one.
 	 */
-	for (i = 0, file_offset = page_offset(page);
-	     i < (PAGE_SIZE >> inode->i_blkbits) && file_offset < end_offset;
-	     i++, file_offset += len) {
+	for (i = 0, pos = page_offset(page);
+	     i < nr_blocks && pos < end_offset;
+	     i++, pos += len) {
 		if (iop && !test_bit(i, iop->uptodate))
 			continue;
 
-		error = wpc->ops->map_blocks(wpc, inode, file_offset);
+		error = wpc->ops->map_blocks(wpc, inode, pos);
 		if (error)
 			break;
 		if (WARN_ON_ONCE(wpc->iomap.type == IOMAP_INLINE))
 			continue;
 		if (wpc->iomap.type == IOMAP_HOLE)
 			continue;
-		iomap_add_to_ioend(inode, file_offset, page, iop, wpc, wbc,
+		iomap_add_to_ioend(inode, pos, page, iop, wpc, wbc,
 				 &submit_list);
 		count++;
 	}
@@ -1492,7 +1499,6 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
 {
 	struct iomap_writepage_ctx *wpc = data;
 	struct inode *inode = page->mapping->host;
-	pgoff_t end_index;
 	u64 end_offset;
 	loff_t offset;
 
@@ -1533,10 +1539,8 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
 	 * ---------------------------------^------------------|
 	 */
 	offset = i_size_read(inode);
-	end_index = offset >> PAGE_SHIFT;
-	if (page->index < end_index)
-		end_offset = (loff_t)(page->index + 1) << PAGE_SHIFT;
-	else {
+	end_offset = page_offset(page) + thp_size(page);
+	if (end_offset > offset) {
 		/*
 		 * Check whether the page to write out is beyond or straddles
 		 * i_size or not.
@@ -1548,7 +1552,8 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
 		 * |				    |      Straddles     |
 		 * ---------------------------------^-----------|--------|
 		 */
-		unsigned offset_into_page = offset & (PAGE_SIZE - 1);
+		unsigned offset_into_page = offset_in_thp(page, offset);
+		pgoff_t end_index = offset >> PAGE_SHIFT;
 
 		/*
 		 * Skip the page if it is fully outside i_size, e.g. due to a
@@ -1579,7 +1584,7 @@ iomap_do_writepage(struct page *page, struct writeback_control *wbc, void *data)
 		 * memory is zeroed when mapped, and writes to that region are
 		 * not written out to the file."
 		 */
-		zero_user_segment(page, offset_into_page, PAGE_SIZE);
+		zero_user_segment(page, offset_into_page, thp_size(page));
 
 		/* Adjust the end_offset to the end of file */
 		end_offset = offset;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 14/25] iomap: Inline data shouldn't see large pages
  2020-04-29 13:36 [PATCH v3 00/25] Large pages in the page cache Matthew Wilcox
                   ` (12 preceding siblings ...)
  2020-04-29 13:36 ` [PATCH v3 13/25] iomap: Support large pages in write paths Matthew Wilcox
@ 2020-04-29 13:36 ` Matthew Wilcox
  2020-04-29 13:36 ` [PATCH v3 15/25] xfs: Support " Matthew Wilcox
                   ` (12 subsequent siblings)
  26 siblings, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2020-04-29 13:36 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel,
	Christoph Hellwig

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Assert that we're not seeing large pages in functions that read/write
inline data, rather than zeroing out the tail.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 709be90a1997..e489b8769fcb 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -221,6 +221,7 @@ iomap_read_inline_data(struct inode *inode, struct page *page,
 		return;
 
 	BUG_ON(page->index);
+	BUG_ON(PageCompound(page));
 	BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
 
 	addr = kmap_atomic(page);
@@ -732,6 +733,7 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
 	void *addr;
 
 	WARN_ON_ONCE(!PageUptodate(page));
+	BUG_ON(PageCompound(page));
 	BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data));
 
 	addr = kmap_atomic(page);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 15/25] xfs: Support large pages
  2020-04-29 13:36 [PATCH v3 00/25] Large pages in the page cache Matthew Wilcox
                   ` (13 preceding siblings ...)
  2020-04-29 13:36 ` [PATCH v3 14/25] iomap: Inline data shouldn't see large pages Matthew Wilcox
@ 2020-04-29 13:36 ` Matthew Wilcox
  2020-04-29 13:36 ` [PATCH v3 16/25] mm: Make prep_transhuge_page return its argument Matthew Wilcox
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2020-04-29 13:36 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

There is one place which assumes the size of a page; fix it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/xfs/xfs_aops.c  | 2 +-
 fs/xfs/xfs_super.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 5b25f5ee84dc..bb677ecbdf32 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -548,7 +548,7 @@ xfs_discard_page(
 	if (error && !XFS_FORCED_SHUTDOWN(mp))
 		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
 out_invalidate:
-	iomap_invalidatepage(page, 0, PAGE_SIZE);
+	iomap_invalidatepage(page, 0, thp_size(page));
 }
 
 static const struct iomap_writeback_ops xfs_writeback_ops = {
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index abf06bf9c3f3..0c7c4afa5afd 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1793,7 +1793,7 @@ static struct file_system_type xfs_fs_type = {
 	.init_fs_context	= xfs_init_fs_context,
 	.parameters		= xfs_fs_parameters,
 	.kill_sb		= kill_block_super,
-	.fs_flags		= FS_REQUIRES_DEV,
+	.fs_flags		= FS_REQUIRES_DEV | FS_LARGE_PAGES,
 };
 MODULE_ALIAS_FS("xfs");
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 16/25] mm: Make prep_transhuge_page return its argument
  2020-04-29 13:36 [PATCH v3 00/25] Large pages in the page cache Matthew Wilcox
                   ` (14 preceding siblings ...)
  2020-04-29 13:36 ` [PATCH v3 15/25] xfs: Support " Matthew Wilcox
@ 2020-04-29 13:36 ` Matthew Wilcox
  2020-04-29 13:36 ` [PATCH v3 17/25] mm: Add __page_cache_alloc_order Matthew Wilcox
                   ` (10 subsequent siblings)
  26 siblings, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2020-04-29 13:36 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel,
	Kirill A . Shutemov

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

By permitting NULL or order-0 pages as an argument, and returning the
argument, callers can write:

	return prep_transhuge_page(alloc_pages(...));

instead of assigning the result to a temporary variable and conditionally
passing that to prep_transhuge_page().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/huge_mm.h | 7 +++++--
 mm/huge_memory.c        | 9 +++++++--
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 1f6245091917..6a8502278f41 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -193,7 +193,7 @@ extern unsigned long thp_get_unmapped_area(struct file *filp,
 		unsigned long addr, unsigned long len, unsigned long pgoff,
 		unsigned long flags);
 
-extern void prep_transhuge_page(struct page *page);
+extern struct page *prep_transhuge_page(struct page *page);
 extern void free_transhuge_page(struct page *page);
 bool is_transparent_hugepage(struct page *page);
 
@@ -358,7 +358,10 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
 	return false;
 }
 
-static inline void prep_transhuge_page(struct page *page) {}
+static inline struct page *prep_transhuge_page(struct page *page)
+{
+	return page;
+}
 
 static inline bool is_transparent_hugepage(struct page *page)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 6ecd1045113b..7a5e2b470bc7 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -508,15 +508,20 @@ static inline struct deferred_split *get_deferred_split_queue(struct page *page)
 }
 #endif
 
-void prep_transhuge_page(struct page *page)
+struct page *prep_transhuge_page(struct page *page)
 {
+	if (!page || compound_order(page) == 0)
+		return page;
 	/*
-	 * we use page->mapping and page->indexlru in second tail page
+	 * we use page->mapping and page->index in second tail page
 	 * as list_head: assuming THP order >= 2
 	 */
+	BUG_ON(compound_order(page) == 1);
 
 	INIT_LIST_HEAD(page_deferred_list(page));
 	set_compound_page_dtor(page, TRANSHUGE_PAGE_DTOR);
+
+	return page;
 }
 
 bool is_transparent_hugepage(struct page *page)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 17/25] mm: Add __page_cache_alloc_order
  2020-04-29 13:36 [PATCH v3 00/25] Large pages in the page cache Matthew Wilcox
                   ` (15 preceding siblings ...)
  2020-04-29 13:36 ` [PATCH v3 16/25] mm: Make prep_transhuge_page return its argument Matthew Wilcox
@ 2020-04-29 13:36 ` Matthew Wilcox
  2020-05-06 18:03   ` Yang Shi
  2020-04-29 13:36 ` [PATCH v3 18/25] mm: Allow large pages to be added to the page cache Matthew Wilcox
                   ` (9 subsequent siblings)
  26 siblings, 1 reply; 36+ messages in thread
From: Matthew Wilcox @ 2020-04-29 13:36 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel,
	Kirill A . Shutemov

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

This new function allows page cache pages to be allocated that are
larger than an order-0 page.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/pagemap.h | 24 +++++++++++++++++++++---
 mm/filemap.c            | 12 ++++++++----
 2 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 55199cb5bd66..1169e2428dd7 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -205,15 +205,33 @@ static inline int page_cache_add_speculative(struct page *page, int count)
 	return __page_cache_add_speculative(page, count);
 }
 
+static inline gfp_t thp_gfpmask(gfp_t gfp)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	/* We'd rather allocate smaller pages than stall a page fault */
+	gfp |= GFP_TRANSHUGE_LIGHT;
+	gfp &= ~__GFP_DIRECT_RECLAIM;
+#endif
+	return gfp;
+}
+
 #ifdef CONFIG_NUMA
-extern struct page *__page_cache_alloc(gfp_t gfp);
+extern struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order);
 #else
-static inline struct page *__page_cache_alloc(gfp_t gfp)
+static inline
+struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order)
 {
-	return alloc_pages(gfp, 0);
+	if (order == 0)
+		return alloc_pages(gfp, 0);
+	return prep_transhuge_page(alloc_pages(thp_gfpmask(gfp), order));
 }
 #endif
 
+static inline struct page *__page_cache_alloc(gfp_t gfp)
+{
+	return __page_cache_alloc_order(gfp, 0);
+}
+
 static inline struct page *page_cache_alloc(struct address_space *x)
 {
 	return __page_cache_alloc(mapping_gfp_mask(x));
diff --git a/mm/filemap.c b/mm/filemap.c
index 23a051a7ef0f..9abba062973a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -941,24 +941,28 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
 EXPORT_SYMBOL_GPL(add_to_page_cache_lru);
 
 #ifdef CONFIG_NUMA
-struct page *__page_cache_alloc(gfp_t gfp)
+struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order)
 {
 	int n;
 	struct page *page;
 
+	if (order > 0)
+		gfp = thp_gfpmask(gfp);
+
 	if (cpuset_do_page_mem_spread()) {
 		unsigned int cpuset_mems_cookie;
 		do {
 			cpuset_mems_cookie = read_mems_allowed_begin();
 			n = cpuset_mem_spread_node();
-			page = __alloc_pages_node(n, gfp, 0);
+			page = __alloc_pages_node(n, gfp, order);
+			prep_transhuge_page(page);
 		} while (!page && read_mems_allowed_retry(cpuset_mems_cookie));
 
 		return page;
 	}
-	return alloc_pages(gfp, 0);
+	return prep_transhuge_page(alloc_pages(gfp, order));
 }
-EXPORT_SYMBOL(__page_cache_alloc);
+EXPORT_SYMBOL(__page_cache_alloc_order);
 #endif
 
 /*
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 18/25] mm: Allow large pages to be added to the page cache
  2020-04-29 13:36 [PATCH v3 00/25] Large pages in the page cache Matthew Wilcox
                   ` (16 preceding siblings ...)
  2020-04-29 13:36 ` [PATCH v3 17/25] mm: Add __page_cache_alloc_order Matthew Wilcox
@ 2020-04-29 13:36 ` Matthew Wilcox
  2020-05-04  3:10   ` Matthew Wilcox
  2020-04-29 13:36 ` [PATCH v3 19/25] mm: Allow large pages to be removed from " Matthew Wilcox
                   ` (8 subsequent siblings)
  26 siblings, 1 reply; 36+ messages in thread
From: Matthew Wilcox @ 2020-04-29 13:36 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

We return -EEXIST if there are any non-shadow entries in the page
cache in the range covered by the large page.  If there are multiple
shadow entries in the range, we set *shadowp to one of them (currently
the one at the highest index).  If that turns out to be the wrong
answer, we can implement something more complex.  This is mostly
modelled after the equivalent function in the shmem code.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/filemap.c | 46 +++++++++++++++++++++++++++++++++-------------
 1 file changed, 33 insertions(+), 13 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 9abba062973a..842afee3d066 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -834,6 +834,7 @@ static int __add_to_page_cache_locked(struct page *page,
 	int huge = PageHuge(page);
 	struct mem_cgroup *memcg;
 	int error;
+	unsigned int nr = 1;
 	void *old;
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
@@ -845,31 +846,50 @@ static int __add_to_page_cache_locked(struct page *page,
 					      gfp_mask, &memcg, false);
 		if (error)
 			return error;
+		xas_set_order(&xas, offset, thp_order(page));
+		nr = hpage_nr_pages(page);
 	}
 
-	get_page(page);
+	page_ref_add(page, nr);
 	page->mapping = mapping;
 	page->index = offset;
 
 	do {
+		unsigned long exceptional = 0;
+		unsigned int i = 0;
+
 		xas_lock_irq(&xas);
-		old = xas_load(&xas);
-		if (old && !xa_is_value(old))
-			xas_set_err(&xas, -EEXIST);
-		xas_store(&xas, page);
+		xas_for_each_conflict(&xas, old) {
+			if (!xa_is_value(old)) {
+				xas_set_err(&xas, -EEXIST);
+				break;
+			}
+			exceptional++;
+			if (shadowp)
+				*shadowp = old;
+		}
+		xas_create_range(&xas);
 		if (xas_error(&xas))
 			goto unlock;
 
-		if (xa_is_value(old)) {
-			mapping->nrexceptional--;
-			if (shadowp)
-				*shadowp = old;
+next:
+		xas_store(&xas, page);
+		if (++i < nr) {
+			xas_next(&xas);
+			goto next;
 		}
-		mapping->nrpages++;
+		mapping->nrexceptional -= exceptional;
+		mapping->nrpages += nr;
 
 		/* hugetlb pages do not participate in page cache accounting */
-		if (!huge)
-			__inc_node_page_state(page, NR_FILE_PAGES);
+		if (!huge) {
+			__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES,
+						nr);
+			if (nr > 1) {
+				__inc_node_page_state(page, NR_FILE_THPS);
+				filemap_nr_thps_inc(mapping);
+			}
+		}
 unlock:
 		xas_unlock_irq(&xas);
 	} while (xas_nomem(&xas, gfp_mask & GFP_RECLAIM_MASK));
@@ -886,7 +906,7 @@ static int __add_to_page_cache_locked(struct page *page,
 	/* Leave page->index set: truncation relies upon it */
 	if (!huge)
 		mem_cgroup_cancel_charge(page, memcg, false);
-	put_page(page);
+	page_ref_sub(page, nr);
 	return xas_error(&xas);
 }
 ALLOW_ERROR_INJECTION(__add_to_page_cache_locked, ERRNO);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 19/25] mm: Allow large pages to be removed from the page cache
  2020-04-29 13:36 [PATCH v3 00/25] Large pages in the page cache Matthew Wilcox
                   ` (17 preceding siblings ...)
  2020-04-29 13:36 ` [PATCH v3 18/25] mm: Allow large pages to be added to the page cache Matthew Wilcox
@ 2020-04-29 13:36 ` Matthew Wilcox
  2020-04-29 13:36 ` [PATCH v3 20/25] mm: Remove page fault assumption of compound page size Matthew Wilcox
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2020-04-29 13:36 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

page_cache_free_page() assumes compound pages are PMD_SIZE; fix
that assumption.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/filemap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 842afee3d066..8c174e6064d4 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -248,7 +248,7 @@ static void page_cache_free_page(struct address_space *mapping,
 		freepage(page);
 
 	if (PageTransHuge(page) && !PageHuge(page)) {
-		page_ref_sub(page, HPAGE_PMD_NR);
+		page_ref_sub(page, hpage_nr_pages(page));
 		VM_BUG_ON_PAGE(page_count(page) <= 0, page);
 	} else {
 		put_page(page);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 20/25] mm: Remove page fault assumption of compound page size
  2020-04-29 13:36 [PATCH v3 00/25] Large pages in the page cache Matthew Wilcox
                   ` (18 preceding siblings ...)
  2020-04-29 13:36 ` [PATCH v3 19/25] mm: Allow large pages to be removed from " Matthew Wilcox
@ 2020-04-29 13:36 ` Matthew Wilcox
  2020-04-29 13:36 ` [PATCH v3 21/25] mm: Add DEFINE_READAHEAD Matthew Wilcox
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2020-04-29 13:36 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

A compound page in the page cache will not necessarily be of PMD size,
so check explicitly.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/memory.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index f703fe8c8346..d68ce428ddd2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3549,13 +3549,14 @@ static vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
 	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
 	pmd_t entry;
 	int i;
-	vm_fault_t ret;
+	vm_fault_t ret = VM_FAULT_FALLBACK;
 
 	if (!transhuge_vma_suitable(vma, haddr))
-		return VM_FAULT_FALLBACK;
+		return ret;
 
-	ret = VM_FAULT_FALLBACK;
 	page = compound_head(page);
+	if (page_order(page) != HPAGE_PMD_ORDER)
+		return ret;
 
 	/*
 	 * Archs like ppc64 need additonal space to store information
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 21/25] mm: Add DEFINE_READAHEAD
  2020-04-29 13:36 [PATCH v3 00/25] Large pages in the page cache Matthew Wilcox
                   ` (19 preceding siblings ...)
  2020-04-29 13:36 ` [PATCH v3 20/25] mm: Remove page fault assumption of compound page size Matthew Wilcox
@ 2020-04-29 13:36 ` Matthew Wilcox
  2020-04-29 13:36 ` [PATCH v3 22/25] mm: Make page_cache_readahead_unbounded take a readahead_control Matthew Wilcox
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2020-04-29 13:36 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Allow for a more concise definition of a struct readahead_control.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h | 7 +++++++
 mm/readahead.c          | 6 +-----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 1169e2428dd7..ff5bf10829a6 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -684,6 +684,13 @@ struct readahead_control {
 	unsigned int _batch_count;
 };
 
+#define DEFINE_READAHEAD(rac, f, m, i)					\
+	struct readahead_control rac = {				\
+		.file = f,						\
+		.mapping = m,						\
+		._index = i,						\
+	}
+
 /**
  * readahead_page - Get the next page to read.
  * @rac: The current readahead request.
diff --git a/mm/readahead.c b/mm/readahead.c
index 3c9a8dd7c56c..2126a2754e22 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -179,11 +179,7 @@ void page_cache_readahead_unbounded(struct address_space *mapping,
 {
 	LIST_HEAD(page_pool);
 	gfp_t gfp_mask = readahead_gfp_mask(mapping);
-	struct readahead_control rac = {
-		.mapping = mapping,
-		.file = file,
-		._index = index,
-	};
+	DEFINE_READAHEAD(rac, file, mapping, index);
 	unsigned long i;
 
 	/*
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 22/25] mm: Make page_cache_readahead_unbounded take a readahead_control
  2020-04-29 13:36 [PATCH v3 00/25] Large pages in the page cache Matthew Wilcox
                   ` (20 preceding siblings ...)
  2020-04-29 13:36 ` [PATCH v3 21/25] mm: Add DEFINE_READAHEAD Matthew Wilcox
@ 2020-04-29 13:36 ` Matthew Wilcox
  2020-04-29 13:36 ` [PATCH v3 23/25] mm: Make __do_page_cache_readahead " Matthew Wilcox
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2020-04-29 13:36 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Define it in the callers instead of in page_cache_readahead_unbounded().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/ext4/verity.c        |  4 ++--
 fs/f2fs/verity.c        |  4 ++--
 include/linux/pagemap.h |  5 ++---
 mm/readahead.c          | 26 ++++++++++++--------------
 4 files changed, 18 insertions(+), 21 deletions(-)

diff --git a/fs/ext4/verity.c b/fs/ext4/verity.c
index dec1244dd062..fe2e541543da 100644
--- a/fs/ext4/verity.c
+++ b/fs/ext4/verity.c
@@ -346,6 +346,7 @@ static struct page *ext4_read_merkle_tree_page(struct inode *inode,
 					       pgoff_t index,
 					       unsigned long num_ra_pages)
 {
+	DEFINE_READAHEAD(rac, NULL, inode->i_mapping, index);
 	struct page *page;
 
 	index += ext4_verity_metadata_pos(inode) >> PAGE_SHIFT;
@@ -355,8 +356,7 @@ static struct page *ext4_read_merkle_tree_page(struct inode *inode,
 		if (page)
 			put_page(page);
 		else if (num_ra_pages > 1)
-			page_cache_readahead_unbounded(inode->i_mapping, NULL,
-					index, num_ra_pages, 0);
+			page_cache_readahead_unbounded(&rac, num_ra_pages, 0);
 		page = read_mapping_page(inode->i_mapping, index, NULL);
 	}
 	return page;
diff --git a/fs/f2fs/verity.c b/fs/f2fs/verity.c
index 865c9fb774fb..707a94745472 100644
--- a/fs/f2fs/verity.c
+++ b/fs/f2fs/verity.c
@@ -226,6 +226,7 @@ static struct page *f2fs_read_merkle_tree_page(struct inode *inode,
 					       pgoff_t index,
 					       unsigned long num_ra_pages)
 {
+	DEFINE_READAHEAD(rac, NULL, inode->i_mapping, index);
 	struct page *page;
 
 	index += f2fs_verity_metadata_pos(inode) >> PAGE_SHIFT;
@@ -235,8 +236,7 @@ static struct page *f2fs_read_merkle_tree_page(struct inode *inode,
 		if (page)
 			put_page(page);
 		else if (num_ra_pages > 1)
-			page_cache_readahead_unbounded(inode->i_mapping, NULL,
-					index, num_ra_pages, 0);
+			page_cache_readahead_unbounded(&rac, num_ra_pages, 0);
 		page = read_mapping_page(inode->i_mapping, index, NULL);
 	}
 	return page;
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index ff5bf10829a6..7eb54f5c403b 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -640,9 +640,8 @@ void page_cache_sync_readahead(struct address_space *, struct file_ra_state *,
 void page_cache_async_readahead(struct address_space *, struct file_ra_state *,
 		struct file *, struct page *, pgoff_t index,
 		unsigned long req_count);
-void page_cache_readahead_unbounded(struct address_space *, struct file *,
-		pgoff_t index, unsigned long nr_to_read,
-		unsigned long lookahead_count);
+void page_cache_readahead_unbounded(struct readahead_control *,
+		unsigned long nr_to_read, unsigned long lookahead_count);
 
 /*
  * Like add_to_page_cache_locked, but used to add newly allocated pages:
diff --git a/mm/readahead.c b/mm/readahead.c
index 2126a2754e22..62da2d4beed1 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -159,9 +159,7 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages,
 
 /**
  * page_cache_readahead_unbounded - Start unchecked readahead.
- * @mapping: File address space.
- * @file: This instance of the open file; used for authentication.
- * @index: First page index to read.
+ * @rac: Readahead control.
  * @nr_to_read: The number of pages to read.
  * @lookahead_size: Where to start the next readahead.
  *
@@ -173,13 +171,13 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages,
  * Context: File is referenced by caller.  Mutexes may be held by caller.
  * May sleep, but will not reenter filesystem to reclaim memory.
  */
-void page_cache_readahead_unbounded(struct address_space *mapping,
-		struct file *file, pgoff_t index, unsigned long nr_to_read,
-		unsigned long lookahead_size)
+void page_cache_readahead_unbounded(struct readahead_control *rac,
+		unsigned long nr_to_read, unsigned long lookahead_size)
 {
+	struct address_space *mapping = rac->mapping;
+	unsigned long index = readahead_index(rac);
 	LIST_HEAD(page_pool);
 	gfp_t gfp_mask = readahead_gfp_mask(mapping);
-	DEFINE_READAHEAD(rac, file, mapping, index);
 	unsigned long i;
 
 	/*
@@ -200,7 +198,7 @@ void page_cache_readahead_unbounded(struct address_space *mapping,
 	for (i = 0; i < nr_to_read; i++) {
 		struct page *page = xa_load(&mapping->i_pages, index + i);
 
-		BUG_ON(index + i != rac._index + rac._nr_pages);
+		BUG_ON(index + i != rac->_index + rac->_nr_pages);
 
 		if (page && !xa_is_value(page)) {
 			/*
@@ -211,7 +209,7 @@ void page_cache_readahead_unbounded(struct address_space *mapping,
 			 * have a stable reference to this page, and it's
 			 * not worth getting one just for that.
 			 */
-			read_pages(&rac, &page_pool, true);
+			read_pages(rac, &page_pool, true);
 			continue;
 		}
 
@@ -224,12 +222,12 @@ void page_cache_readahead_unbounded(struct address_space *mapping,
 		} else if (add_to_page_cache_lru(page, mapping, index + i,
 					gfp_mask) < 0) {
 			put_page(page);
-			read_pages(&rac, &page_pool, true);
+			read_pages(rac, &page_pool, true);
 			continue;
 		}
 		if (i == nr_to_read - lookahead_size)
 			SetPageReadahead(page);
-		rac._nr_pages++;
+		rac->_nr_pages++;
 	}
 
 	/*
@@ -237,7 +235,7 @@ void page_cache_readahead_unbounded(struct address_space *mapping,
 	 * uptodate then the caller will launch readpage again, and
 	 * will then handle the error.
 	 */
-	read_pages(&rac, &page_pool, false);
+	read_pages(rac, &page_pool, false);
 	memalloc_nofs_restore(nofs);
 }
 EXPORT_SYMBOL_GPL(page_cache_readahead_unbounded);
@@ -252,6 +250,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
 		struct file *file, pgoff_t index, unsigned long nr_to_read,
 		unsigned long lookahead_size)
 {
+	DEFINE_READAHEAD(rac, file, mapping, index);
 	struct inode *inode = mapping->host;
 	loff_t isize = i_size_read(inode);
 	pgoff_t end_index;	/* The last page we want to read */
@@ -266,8 +265,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
 	if (nr_to_read > end_index - index)
 		nr_to_read = end_index - index + 1;
 
-	page_cache_readahead_unbounded(mapping, file, index, nr_to_read,
-			lookahead_size);
+	page_cache_readahead_unbounded(&rac, nr_to_read, lookahead_size);
 }
 
 /*
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 23/25] mm: Make __do_page_cache_readahead take a readahead_control
  2020-04-29 13:36 [PATCH v3 00/25] Large pages in the page cache Matthew Wilcox
                   ` (21 preceding siblings ...)
  2020-04-29 13:36 ` [PATCH v3 22/25] mm: Make page_cache_readahead_unbounded take a readahead_control Matthew Wilcox
@ 2020-04-29 13:36 ` Matthew Wilcox
  2020-04-29 13:36 ` [PATCH v3 24/25] mm: Add large page readahead Matthew Wilcox
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2020-04-29 13:36 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Also call __do_page_cache_readahead() directly from ondemand_readahead()
instead of indirecting via ra_submit().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/internal.h  | 11 +++++------
 mm/readahead.c | 26 ++++++++++++++------------
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 5efb13d5c226..fd3eaff7acdc 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -51,18 +51,17 @@ void unmap_page_range(struct mmu_gather *tlb,
 
 void force_page_cache_readahead(struct address_space *, struct file *,
 		pgoff_t index, unsigned long nr_to_read);
-void __do_page_cache_readahead(struct address_space *, struct file *,
-		pgoff_t index, unsigned long nr_to_read,
-		unsigned long lookahead_size);
+void __do_page_cache_readahead(struct readahead_control *,
+		unsigned long nr_to_read, unsigned long lookahead_size);
 
 /*
  * Submit IO for the read-ahead request in file_ra_state.
  */
 static inline void ra_submit(struct file_ra_state *ra,
-		struct address_space *mapping, struct file *filp)
+		struct address_space *mapping, struct file *file)
 {
-	__do_page_cache_readahead(mapping, filp,
-			ra->start, ra->size, ra->async_size);
+	DEFINE_READAHEAD(rac, file, mapping, ra->start);
+	__do_page_cache_readahead(&rac, ra->size, ra->async_size);
 }
 
 /**
diff --git a/mm/readahead.c b/mm/readahead.c
index 62da2d4beed1..74c7e1eff540 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -246,12 +246,11 @@ EXPORT_SYMBOL_GPL(page_cache_readahead_unbounded);
  * behaviour which would occur if page allocations are causing VM writeback.
  * We really don't want to intermingle reads and writes like that.
  */
-void __do_page_cache_readahead(struct address_space *mapping,
-		struct file *file, pgoff_t index, unsigned long nr_to_read,
-		unsigned long lookahead_size)
+void __do_page_cache_readahead(struct readahead_control *rac,
+		unsigned long nr_to_read, unsigned long lookahead_size)
 {
-	DEFINE_READAHEAD(rac, file, mapping, index);
-	struct inode *inode = mapping->host;
+	struct inode *inode = rac->mapping->host;
+	unsigned long index = readahead_index(rac);
 	loff_t isize = i_size_read(inode);
 	pgoff_t end_index;	/* The last page we want to read */
 
@@ -265,7 +264,7 @@ void __do_page_cache_readahead(struct address_space *mapping,
 	if (nr_to_read > end_index - index)
 		nr_to_read = end_index - index + 1;
 
-	page_cache_readahead_unbounded(&rac, nr_to_read, lookahead_size);
+	page_cache_readahead_unbounded(rac, nr_to_read, lookahead_size);
 }
 
 /*
@@ -273,10 +272,11 @@ void __do_page_cache_readahead(struct address_space *mapping,
  * memory at once.
  */
 void force_page_cache_readahead(struct address_space *mapping,
-		struct file *filp, pgoff_t index, unsigned long nr_to_read)
+		struct file *file, pgoff_t index, unsigned long nr_to_read)
 {
+	DEFINE_READAHEAD(rac, file, mapping, index);
 	struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
-	struct file_ra_state *ra = &filp->f_ra;
+	struct file_ra_state *ra = &file->f_ra;
 	unsigned long max_pages;
 
 	if (unlikely(!mapping->a_ops->readpage && !mapping->a_ops->readpages &&
@@ -294,7 +294,7 @@ void force_page_cache_readahead(struct address_space *mapping,
 
 		if (this_chunk > nr_to_read)
 			this_chunk = nr_to_read;
-		__do_page_cache_readahead(mapping, filp, index, this_chunk, 0);
+		__do_page_cache_readahead(&rac, this_chunk, 0);
 
 		index += this_chunk;
 		nr_to_read -= this_chunk;
@@ -432,10 +432,11 @@ static int try_context_readahead(struct address_space *mapping,
  * A minimal readahead algorithm for trivial sequential/random reads.
  */
 static void ondemand_readahead(struct address_space *mapping,
-		struct file_ra_state *ra, struct file *filp,
+		struct file_ra_state *ra, struct file *file,
 		bool hit_readahead_marker, pgoff_t index,
 		unsigned long req_size)
 {
+	DEFINE_READAHEAD(rac, file, mapping, index);
 	struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
 	unsigned long max_pages = ra->ra_pages;
 	unsigned long add_pages;
@@ -516,7 +517,7 @@ static void ondemand_readahead(struct address_space *mapping,
 	 * standalone, small random read
 	 * Read as is, and do not pollute the readahead state.
 	 */
-	__do_page_cache_readahead(mapping, filp, index, req_size, 0);
+	__do_page_cache_readahead(&rac, req_size, 0);
 	return;
 
 initial_readahead:
@@ -542,7 +543,8 @@ static void ondemand_readahead(struct address_space *mapping,
 		}
 	}
 
-	ra_submit(ra, mapping, filp);
+	rac._index = ra->start;
+	__do_page_cache_readahead(&rac, ra->size, ra->async_size);
 }
 
 /**
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 24/25] mm: Add large page readahead
  2020-04-29 13:36 [PATCH v3 00/25] Large pages in the page cache Matthew Wilcox
                   ` (22 preceding siblings ...)
  2020-04-29 13:36 ` [PATCH v3 23/25] mm: Make __do_page_cache_readahead " Matthew Wilcox
@ 2020-04-29 13:36 ` Matthew Wilcox
  2020-04-29 13:36 ` [PATCH v3 25/25] mm: Align THP mappings for non-DAX Matthew Wilcox
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2020-04-29 13:36 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

If the filesystem supports large pages, allocate larger pages in the
readahead code when it seems worth doing.  The heuristic for choosing
larger page sizes will surely need some tuning, but this aggressive
ramp-up seems good for testing.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/readahead.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 87 insertions(+), 6 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 74c7e1eff540..e2493189e832 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -149,7 +149,7 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages,
 
 	blk_finish_plug(&plug);
 
-	BUG_ON(!list_empty(pages));
+	BUG_ON(pages && !list_empty(pages));
 	BUG_ON(readahead_count(rac));
 
 out:
@@ -428,13 +428,92 @@ static int try_context_readahead(struct address_space *mapping,
 	return 1;
 }
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static inline int ra_alloc_page(struct readahead_control *rac, pgoff_t index,
+		pgoff_t mark, unsigned int order, gfp_t gfp)
+{
+	int err;
+	struct page *page = __page_cache_alloc_order(gfp, order);
+
+	if (!page)
+		return -ENOMEM;
+	if (mark - index < (1UL << order))
+		SetPageReadahead(page);
+	err = add_to_page_cache_lru(page, rac->mapping, index, gfp);
+	if (err)
+		put_page(page);
+	else
+		rac->_nr_pages += 1UL << order;
+	return err;
+}
+
+static bool page_cache_readahead_order(struct readahead_control *rac,
+		struct file_ra_state *ra, unsigned int order)
+{
+	struct address_space *mapping = rac->mapping;
+	unsigned int old_order = order;
+	pgoff_t index = readahead_index(rac);
+	pgoff_t limit = (i_size_read(mapping->host) - 1) >> PAGE_SHIFT;
+	pgoff_t mark = index + ra->size - ra->async_size;
+	int err = 0;
+	gfp_t gfp = readahead_gfp_mask(mapping);
+
+	if (!(mapping->host->i_sb->s_type->fs_flags & FS_LARGE_PAGES))
+		return false;
+
+	limit = min(limit, index + ra->size - 1);
+
+	/* Grow page size up to PMD size */
+	if (order < HPAGE_PMD_ORDER) {
+		order += 2;
+		if (order > HPAGE_PMD_ORDER)
+			order = HPAGE_PMD_ORDER;
+		while ((1 << order) > ra->size)
+			order--;
+	}
+
+	/* If size is somehow misaligned, fill with order-0 pages */
+	while (!err && index & ((1UL << old_order) - 1))
+		err = ra_alloc_page(rac, index++, mark, 0, gfp);
+
+	while (!err && index & ((1UL << order) - 1)) {
+		err = ra_alloc_page(rac, index, mark, old_order, gfp);
+		index += 1UL << old_order;
+	}
+
+	while (!err && index <= limit) {
+		err = ra_alloc_page(rac, index, mark, order, gfp);
+		index += 1UL << order;
+	}
+
+	if (index > limit) {
+		ra->size += index - limit - 1;
+		ra->async_size += index - limit - 1;
+	}
+
+	read_pages(rac, NULL, false);
+
+	/*
+	 * If there were already pages in the page cache, then we may have
+	 * left some gaps.  Let the regular readahead code take care of this
+	 * situation.
+	 */
+	return !err;
+}
+#else
+static bool page_cache_readahead_order(struct readahead_control *rac,
+		struct file_ra_state *ra, unsigned int order)
+{
+	return false;
+}
+#endif
+
 /*
  * A minimal readahead algorithm for trivial sequential/random reads.
  */
 static void ondemand_readahead(struct address_space *mapping,
 		struct file_ra_state *ra, struct file *file,
-		bool hit_readahead_marker, pgoff_t index,
-		unsigned long req_size)
+		struct page *page, pgoff_t index, unsigned long req_size)
 {
 	DEFINE_READAHEAD(rac, file, mapping, index);
 	struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
@@ -473,7 +552,7 @@ static void ondemand_readahead(struct address_space *mapping,
 	 * Query the pagecache for async_size, which normally equals to
 	 * readahead size. Ramp it up and use it as the new readahead size.
 	 */
-	if (hit_readahead_marker) {
+	if (page) {
 		pgoff_t start;
 
 		rcu_read_lock();
@@ -544,6 +623,8 @@ static void ondemand_readahead(struct address_space *mapping,
 	}
 
 	rac._index = ra->start;
+	if (page && page_cache_readahead_order(&rac, ra, compound_order(page)))
+		return;
 	__do_page_cache_readahead(&rac, ra->size, ra->async_size);
 }
 
@@ -578,7 +659,7 @@ void page_cache_sync_readahead(struct address_space *mapping,
 	}
 
 	/* do read-ahead */
-	ondemand_readahead(mapping, ra, filp, false, index, req_count);
+	ondemand_readahead(mapping, ra, filp, NULL, index, req_count);
 }
 EXPORT_SYMBOL_GPL(page_cache_sync_readahead);
 
@@ -624,7 +705,7 @@ page_cache_async_readahead(struct address_space *mapping,
 		return;
 
 	/* do read-ahead */
-	ondemand_readahead(mapping, ra, filp, true, index, req_count);
+	ondemand_readahead(mapping, ra, filp, page, index, req_count);
 }
 EXPORT_SYMBOL_GPL(page_cache_async_readahead);
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v3 25/25] mm: Align THP mappings for non-DAX
  2020-04-29 13:36 [PATCH v3 00/25] Large pages in the page cache Matthew Wilcox
                   ` (23 preceding siblings ...)
  2020-04-29 13:36 ` [PATCH v3 24/25] mm: Add large page readahead Matthew Wilcox
@ 2020-04-29 13:36 ` Matthew Wilcox
  2020-04-29 15:40 ` [PATCH v3 00/25] Large pages in the page cache Kirill A. Shutemov
  2020-04-30 11:34 ` Matthew Wilcox
  26 siblings, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2020-04-29 13:36 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: William Kucharski, linux-mm, linux-kernel, Matthew Wilcox

From: William Kucharski <william.kucharski@oracle.com>

When we have the opportunity to use transparent huge pages to map a
file, we want to follow the same rules as DAX.

Signed-off-by: William Kucharski <william.kucharski@oracle.com>
[Inline __thp_get_unmapped_area() into thp_get_unmapped_area()]
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/huge_memory.c | 40 +++++++++++++---------------------------
 1 file changed, 13 insertions(+), 27 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 7a5e2b470bc7..ebaf649aa28d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -535,30 +535,30 @@ bool is_transparent_hugepage(struct page *page)
 }
 EXPORT_SYMBOL_GPL(is_transparent_hugepage);
 
-static unsigned long __thp_get_unmapped_area(struct file *filp,
-		unsigned long addr, unsigned long len,
-		loff_t off, unsigned long flags, unsigned long size)
+unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
+		unsigned long len, unsigned long pgoff, unsigned long flags)
 {
+	loff_t off = (loff_t)pgoff << PAGE_SHIFT;
 	loff_t off_end = off + len;
-	loff_t off_align = round_up(off, size);
+	loff_t off_align = round_up(off, PMD_SIZE);
 	unsigned long len_pad, ret;
 
-	if (off_end <= off_align || (off_end - off_align) < size)
-		return 0;
+	if (off_end <= off_align || (off_end - off_align) < PMD_SIZE)
+		goto regular;
 
-	len_pad = len + size;
+	len_pad = len + PMD_SIZE;
 	if (len_pad < len || (off + len_pad) < off)
-		return 0;
+		goto regular;
 
 	ret = current->mm->get_unmapped_area(filp, addr, len_pad,
 					      off >> PAGE_SHIFT, flags);
 
 	/*
-	 * The failure might be due to length padding. The caller will retry
-	 * without the padding.
+	 * The failure might be due to length padding.  Retry without
+	 * the padding.
 	 */
 	if (IS_ERR_VALUE(ret))
-		return 0;
+		goto regular;
 
 	/*
 	 * Do not try to align to THP boundary if allocation at the address
@@ -567,23 +567,9 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
 	if (ret == addr)
 		return addr;
 
-	ret += (off - ret) & (size - 1);
+	ret += (off - ret) & (PMD_SIZE - 1);
 	return ret;
-}
-
-unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
-		unsigned long len, unsigned long pgoff, unsigned long flags)
-{
-	unsigned long ret;
-	loff_t off = (loff_t)pgoff << PAGE_SHIFT;
-
-	if (!IS_DAX(filp->f_mapping->host) || !IS_ENABLED(CONFIG_FS_DAX_PMD))
-		goto out;
-
-	ret = __thp_get_unmapped_area(filp, addr, len, off, flags, PMD_SIZE);
-	if (ret)
-		return ret;
-out:
+regular:
 	return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
 }
 EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 00/25] Large pages in the page cache
  2020-04-29 13:36 [PATCH v3 00/25] Large pages in the page cache Matthew Wilcox
                   ` (24 preceding siblings ...)
  2020-04-29 13:36 ` [PATCH v3 25/25] mm: Align THP mappings for non-DAX Matthew Wilcox
@ 2020-04-29 15:40 ` Kirill A. Shutemov
  2020-04-29 15:45   ` Kirill A. Shutemov
  2020-04-30 11:34 ` Matthew Wilcox
  26 siblings, 1 reply; 36+ messages in thread
From: Kirill A. Shutemov @ 2020-04-29 15:40 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm, linux-kernel

On Wed, Apr 29, 2020 at 06:36:32AM -0700, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> 
> This patch set does not pass xfstests.  Test at your own risk.  It is
> based on the readahead rewrite which is in Andrew's tree.  The large
> pages somehow manage to fall off the LRU, so the test VM quickly runs
> out of memory and freezes.  To reproduce:
> 
> # mkfs.xfs /dev/sdb && mount /dev/sdb /mnt && dd if=/dev/zero bs=1M count=2048 of=/mnt/bigfile && sync && sleep 2 && sync && echo 1 >/proc/sys/vm/drop_caches 
> # /host/home/willy/kernel/xarray-2/tools/vm/page-types | grep thp
> 0x0000000000401800	       511        1  ___________Ma_________t____________________	mmap,anonymous,thp
> 0x0000000000405868	         1        0  ___U_lA____Ma_b_______t____________________	uptodate,lru,active,mmap,anonymous,swapbacked,thp
> # dd if=/mnt/bigfile of=/dev/null bs=2M count=5
> # /host/home/willy/kernel/xarray-2/tools/vm/page-types | grep thp
> 0x0000000000400000	      2516        9  ______________________t____________________	thp
> 0x0000000000400028	         1        0  ___U_l________________t____________________	uptodate,lru,thp
> 0x000000000040006c	       106        0  __RU_lA_______________t____________________	referenced,uptodate,lru,active,thp

Note that you have 107 pages on LRU. It is only head pages. With order-5
pages it is over 13MiB.

Looks like everything is fine.

> 0x0000000000400228	         1        0  ___U_l___I____________t____________________	uptodate,lru,reclaim,thp
> 0x0000000000401800	       511        1  ___________Ma_________t____________________	mmap,anonymous,thp
> 0x0000000000405868	         1        0  ___U_lA____Ma_b_______t____________________	uptodate,lru,active,mmap,anonymous,swapbacked,thp
> 
> 
> The principal idea here is that a large part of the overhead in dealing
> with individual pages is that there's just so darned many of them.  We
> would be better off dealing with fewer, larger pages, even if they don't
> get to be the size necessary for the CPU to use a larger TLB entry.
> 
> Matthew Wilcox (Oracle) (24):
>   mm: Allow hpages to be arbitrary order
>   mm: Introduce thp_size
>   mm: Introduce thp_order
>   mm: Introduce offset_in_thp
>   fs: Add a filesystem flag for large pages
>   fs: Introduce i_blocks_per_page
>   fs: Make page_mkwrite_check_truncate thp-aware
>   fs: Support THPs in zero_user_segments
>   bio: Add bio_for_each_thp_segment_all
>   iomap: Support arbitrarily many blocks per page
>   iomap: Support large pages in iomap_adjust_read_range
>   iomap: Support large pages in read paths
>   iomap: Support large pages in write paths
>   iomap: Inline data shouldn't see large pages
>   xfs: Support large pages
>   mm: Make prep_transhuge_page return its argument
>   mm: Add __page_cache_alloc_order
>   mm: Allow large pages to be added to the page cache
>   mm: Allow large pages to be removed from the page cache
>   mm: Remove page fault assumption of compound page size
>   mm: Add DEFINE_READAHEAD
>   mm: Make page_cache_readahead_unbounded take a readahead_control
>   mm: Make __do_page_cache_readahead take a readahead_control
>   mm: Add large page readahead
> 
> William Kucharski (1):
>   mm: Align THP mappings for non-DAX
> 
>  drivers/nvdimm/btt.c    |   4 +-
>  drivers/nvdimm/pmem.c   |   6 +-
>  fs/ext4/verity.c        |   4 +-
>  fs/f2fs/verity.c        |   4 +-
>  fs/iomap/buffered-io.c  | 110 ++++++++++++++++--------------
>  fs/jfs/jfs_metapage.c   |   2 +-
>  fs/xfs/xfs_aops.c       |   4 +-
>  fs/xfs/xfs_super.c      |   2 +-
>  include/linux/bio.h     |  13 ++++
>  include/linux/bvec.h    |  23 +++++++
>  include/linux/fs.h      |   1 +
>  include/linux/highmem.h |  15 +++--
>  include/linux/huge_mm.h |  25 +++++--
>  include/linux/mm.h      |  97 ++++++++++++++-------------
>  include/linux/pagemap.h |  62 ++++++++++++++---
>  mm/filemap.c            |  60 ++++++++++++-----
>  mm/highmem.c            |  62 ++++++++++++++++-
>  mm/huge_memory.c        |  49 ++++++--------
>  mm/internal.h           |  13 ++--
>  mm/memory.c             |   7 +-
>  mm/page_io.c            |   2 +-
>  mm/page_vma_mapped.c    |   4 +-
>  mm/readahead.c          | 145 ++++++++++++++++++++++++++++++----------
>  23 files changed, 485 insertions(+), 229 deletions(-)
> 
> -- 
> 2.26.2
> 

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 00/25] Large pages in the page cache
  2020-04-29 15:40 ` [PATCH v3 00/25] Large pages in the page cache Kirill A. Shutemov
@ 2020-04-29 15:45   ` Kirill A. Shutemov
  0 siblings, 0 replies; 36+ messages in thread
From: Kirill A. Shutemov @ 2020-04-29 15:45 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm, linux-kernel

On Wed, Apr 29, 2020 at 06:40:02PM +0300, Kirill A. Shutemov wrote:
> On Wed, Apr 29, 2020 at 06:36:32AM -0700, Matthew Wilcox wrote:
> > From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> > 
> > This patch set does not pass xfstests.  Test at your own risk.  It is
> > based on the readahead rewrite which is in Andrew's tree.  The large
> > pages somehow manage to fall off the LRU, so the test VM quickly runs
> > out of memory and freezes.  To reproduce:
> > 
> > # mkfs.xfs /dev/sdb && mount /dev/sdb /mnt && dd if=/dev/zero bs=1M count=2048 of=/mnt/bigfile && sync && sleep 2 && sync && echo 1 >/proc/sys/vm/drop_caches 
> > # /host/home/willy/kernel/xarray-2/tools/vm/page-types | grep thp
> > 0x0000000000401800	       511        1  ___________Ma_________t____________________	mmap,anonymous,thp
> > 0x0000000000405868	         1        0  ___U_lA____Ma_b_______t____________________	uptodate,lru,active,mmap,anonymous,swapbacked,thp
> > # dd if=/mnt/bigfile of=/dev/null bs=2M count=5
> > # /host/home/willy/kernel/xarray-2/tools/vm/page-types | grep thp
> > 0x0000000000400000	      2516        9  ______________________t____________________	thp
> > 0x0000000000400028	         1        0  ___U_l________________t____________________	uptodate,lru,thp
> > 0x000000000040006c	       106        0  __RU_lA_______________t____________________	referenced,uptodate,lru,active,thp
> 
> Note that you have 107 pages on LRU. It is only head pages. With order-5
> pages it is over 13MiB.
> 
> Looks like everything is fine.

/proc/kpageflags reads page's flag bit directly instead of relying on
PageLRU:

	u |= kpf_copy_bit(k, KPF_LRU,		PG_lru);

Tail pages don't have this bit set. They rely on head page's flag for
PageLRU().

It would be nice to get it fixed, but I guess it is too late. Somebody may
rely on the current behaviour by now.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 00/25] Large pages in the page cache
  2020-04-29 13:36 [PATCH v3 00/25] Large pages in the page cache Matthew Wilcox
                   ` (25 preceding siblings ...)
  2020-04-29 15:40 ` [PATCH v3 00/25] Large pages in the page cache Kirill A. Shutemov
@ 2020-04-30 11:34 ` Matthew Wilcox
  26 siblings, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2020-04-30 11:34 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-mm, linux-kernel

On Wed, Apr 29, 2020 at 06:36:32AM -0700, Matthew Wilcox wrote:
> This patch set does not pass xfstests.  Test at your own risk.  It is
> based on the readahead rewrite which is in Andrew's tree.  The large
> pages somehow manage to fall off the LRU, so the test VM quickly runs
> out of memory and freezes.

Kirill was right; that was not exactly the bug.  It did lead to the
realisation that this could be avoided by simply not splitting the page
if the filesystem knows how to write back large pages.  So this is
the current diff.

I just hit the bug in clear_inode() that i_data.nrpages is not 0, so
there's clearly at least one remaining problem to fix (and I suspect
about half a dozen).

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 184c8b516543..d511504d07af 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -235,7 +235,7 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
 	blk_status_t rc;
 
 	if (op_is_write(op))
-		rc = pmem_do_write(pmem, page, 0, sector, tmp_size(page));
+		rc = pmem_do_write(pmem, page, 0, sector, thp_size(page));
 	else
 		rc = pmem_do_read(pmem, page, 0, sector, thp_size(page));
 	/*
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 7eb54f5c403b..ce978ed4f0cd 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -116,6 +116,11 @@ static inline void mapping_set_gfp_mask(struct address_space *m, gfp_t mask)
 	m->gfp_mask = mask;
 }
 
+static inline bool mapping_large_pages(struct address_space *mapping)
+{
+	return mapping->host->i_sb->s_type->fs_flags & FS_LARGE_PAGES;
+}
+
 void release_pages(struct page **pages, int nr);
 
 /*
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index ebaf649aa28d..e78686b628ae 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2654,7 +2654,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 
 int total_mapcount(struct page *page)
 {
-	int i, compound, ret;
+	int i, compound, nr, ret;
 
 	VM_BUG_ON_PAGE(PageTail(page), page);
 
@@ -2662,16 +2662,17 @@ int total_mapcount(struct page *page)
 		return atomic_read(&page->_mapcount) + 1;
 
 	compound = compound_mapcount(page);
+	nr = compound_nr(page);
 	if (PageHuge(page))
 		return compound;
 	ret = compound;
-	for (i = 0; i < HPAGE_PMD_NR; i++)
+	for (i = 0; i < nr; i++)
 		ret += atomic_read(&page[i]._mapcount) + 1;
 	/* File pages has compound_mapcount included in _mapcount */
 	if (!PageAnon(page))
-		return ret - compound * HPAGE_PMD_NR;
+		return ret - compound * nr;
 	if (PageDoubleMap(page))
-		ret -= HPAGE_PMD_NR;
+		ret -= nr;
 	return ret;
 }
 
diff --git a/mm/readahead.c b/mm/readahead.c
index e2493189e832..ac16e96a8828 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -458,7 +458,7 @@ static bool page_cache_readahead_order(struct readahead_control *rac,
 	int err = 0;
 	gfp_t gfp = readahead_gfp_mask(mapping);
 
-	if (!(mapping->host->i_sb->s_type->fs_flags & FS_LARGE_PAGES))
+	if (!mapping_large_pages(mapping))
 		return false;
 
 	limit = min(limit, index + ra->size - 1);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b06868fc4926..51e6c135575d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1271,9 +1271,10 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				/* Adding to swap updated mapping */
 				mapping = page_mapping(page);
 			}
-		} else if (unlikely(PageTransHuge(page))) {
+		} else if (PageTransHuge(page)) {
 			/* Split file THP */
-			if (split_huge_page_to_list(page, page_list))
+			if (!mapping_large_pages(mapping) &&
+			    split_huge_page_to_list(page, page_list))
 				goto keep_locked;
 		}
 


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 18/25] mm: Allow large pages to be added to the page cache
  2020-04-29 13:36 ` [PATCH v3 18/25] mm: Allow large pages to be added to the page cache Matthew Wilcox
@ 2020-05-04  3:10   ` Matthew Wilcox
  2020-05-06 18:32     ` Yang Shi
  2020-06-07  3:04     ` Matthew Wilcox
  0 siblings, 2 replies; 36+ messages in thread
From: Matthew Wilcox @ 2020-05-04  3:10 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-mm, linux-kernel

On Wed, Apr 29, 2020 at 06:36:50AM -0700, Matthew Wilcox wrote:
> @@ -886,7 +906,7 @@ static int __add_to_page_cache_locked(struct page *page,
>  	/* Leave page->index set: truncation relies upon it */
>  	if (!huge)
>  		mem_cgroup_cancel_charge(page, memcg, false);
> -	put_page(page);
> +	page_ref_sub(page, nr);
>  	return xas_error(&xas);
>  }
>  ALLOW_ERROR_INJECTION(__add_to_page_cache_locked, ERRNO);

This is wrong.  page_ref_sub() will not call __put_page() if the refcount
gets to zero.  What do people prefer?

-	put_page(page);

(a)
+	put_thp(page);

(b)
+	put_page_nr(page, nr);

(c)
+	if (page_ref_sub_return(page, nr) == 0)
+		__put_page(page);

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 02/25] mm: Introduce thp_size
  2020-04-29 13:36 ` [PATCH v3 02/25] mm: Introduce thp_size Matthew Wilcox
@ 2020-05-06 17:59   ` Yang Shi
  0 siblings, 0 replies; 36+ messages in thread
From: Yang Shi @ 2020-05-06 17:59 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Linux FS-devel Mailing List, Linux MM, Linux Kernel Mailing List

On Wed, Apr 29, 2020 at 6:37 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
>
> This is like page_size(), but compiles down to just PAGE_SIZE if THP
> are disabled.  Convert the users of hpage_nr_pages() which would prefer
> this interface.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  drivers/nvdimm/btt.c    | 4 +---
>  drivers/nvdimm/pmem.c   | 6 ++----
>  include/linux/huge_mm.h | 7 +++++++
>  mm/internal.h           | 2 +-
>  mm/page_io.c            | 2 +-
>  mm/page_vma_mapped.c    | 4 ++--
>  6 files changed, 14 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
> index 3b09419218d6..78e8d972d45a 100644
> --- a/drivers/nvdimm/btt.c
> +++ b/drivers/nvdimm/btt.c
> @@ -1488,10 +1488,8 @@ static int btt_rw_page(struct block_device *bdev, sector_t sector,
>  {
>         struct btt *btt = bdev->bd_disk->private_data;
>         int rc;
> -       unsigned int len;
>
> -       len = hpage_nr_pages(page) * PAGE_SIZE;
> -       rc = btt_do_bvec(btt, NULL, page, len, 0, op, sector);
> +       rc = btt_do_bvec(btt, NULL, page, thp_size(page), 0, op, sector);
>         if (rc == 0)
>                 page_endio(page, op_is_write(op), 0);
>
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index 2df6994acf83..184c8b516543 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -235,11 +235,9 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
>         blk_status_t rc;
>
>         if (op_is_write(op))
> -               rc = pmem_do_write(pmem, page, 0, sector,
> -                                  hpage_nr_pages(page) * PAGE_SIZE);
> +               rc = pmem_do_write(pmem, page, 0, sector, tmp_size(page));

s/tmp_size/thp_size

>         else
> -               rc = pmem_do_read(pmem, page, 0, sector,
> -                                  hpage_nr_pages(page) * PAGE_SIZE);
> +               rc = pmem_do_read(pmem, page, 0, sector, thp_size(page));
>         /*
>          * The ->rw_page interface is subtle and tricky.  The core
>          * retries on any error, so we can only invoke page_endio() in
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 6bec4b5b61e1..e944f9757349 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -271,6 +271,11 @@ static inline int hpage_nr_pages(struct page *page)
>         return compound_nr(page);
>  }
>
> +static inline unsigned long thp_size(struct page *page)
> +{
> +       return page_size(page);
> +}
> +
>  struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
>                 pmd_t *pmd, int flags, struct dev_pagemap **pgmap);
>  struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr,
> @@ -329,6 +334,8 @@ static inline int hpage_nr_pages(struct page *page)
>         return 1;
>  }
>
> +#define thp_size(x)            PAGE_SIZE
> +
>  static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
>  {
>         return false;
> diff --git a/mm/internal.h b/mm/internal.h
> index f762a34b0c57..5efb13d5c226 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -386,7 +386,7 @@ vma_address(struct page *page, struct vm_area_struct *vma)
>         unsigned long start, end;
>
>         start = __vma_address(page, vma);
> -       end = start + PAGE_SIZE * (hpage_nr_pages(page) - 1);
> +       end = start + thp_size(page) - PAGE_SIZE;
>
>         /* page should be within @vma mapping range */
>         VM_BUG_ON_VMA(end < vma->vm_start || start >= vma->vm_end, vma);
> diff --git a/mm/page_io.c b/mm/page_io.c
> index 76965be1d40e..dd935129e3cb 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -41,7 +41,7 @@ static struct bio *get_swap_bio(gfp_t gfp_flags,
>                 bio->bi_iter.bi_sector <<= PAGE_SHIFT - 9;
>                 bio->bi_end_io = end_io;
>
> -               bio_add_page(bio, page, PAGE_SIZE * hpage_nr_pages(page), 0);
> +               bio_add_page(bio, page, thp_size(page), 0);
>         }
>         return bio;
>  }
> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> index 719c35246cfa..e65629c056e8 100644
> --- a/mm/page_vma_mapped.c
> +++ b/mm/page_vma_mapped.c
> @@ -227,7 +227,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>                         if (pvmw->address >= pvmw->vma->vm_end ||
>                             pvmw->address >=
>                                         __vma_address(pvmw->page, pvmw->vma) +
> -                                       hpage_nr_pages(pvmw->page) * PAGE_SIZE)
> +                                       thp_size(pvmw->page))
>                                 return not_found(pvmw);
>                         /* Did we cross page table boundary? */
>                         if (pvmw->address % PMD_SIZE == 0) {
> @@ -268,7 +268,7 @@ int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma)
>         unsigned long start, end;
>
>         start = __vma_address(page, vma);
> -       end = start + PAGE_SIZE * (hpage_nr_pages(page) - 1);
> +       end = start + thp_size(page) - PAGE_SIZE;
>
>         if (unlikely(end < vma->vm_start || start >= vma->vm_end))
>                 return 0;
> --
> 2.26.2
>
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 17/25] mm: Add __page_cache_alloc_order
  2020-04-29 13:36 ` [PATCH v3 17/25] mm: Add __page_cache_alloc_order Matthew Wilcox
@ 2020-05-06 18:03   ` Yang Shi
  2020-06-07  3:08     ` Matthew Wilcox
  0 siblings, 1 reply; 36+ messages in thread
From: Yang Shi @ 2020-05-06 18:03 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Linux FS-devel Mailing List, Linux MM, Linux Kernel Mailing List,
	Kirill A . Shutemov

On Wed, Apr 29, 2020 at 6:37 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
>
> This new function allows page cache pages to be allocated that are
> larger than an order-0 page.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  include/linux/pagemap.h | 24 +++++++++++++++++++++---
>  mm/filemap.c            | 12 ++++++++----
>  2 files changed, 29 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 55199cb5bd66..1169e2428dd7 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -205,15 +205,33 @@ static inline int page_cache_add_speculative(struct page *page, int count)
>         return __page_cache_add_speculative(page, count);
>  }
>
> +static inline gfp_t thp_gfpmask(gfp_t gfp)
> +{
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +       /* We'd rather allocate smaller pages than stall a page fault */
> +       gfp |= GFP_TRANSHUGE_LIGHT;

This looks not correct. GFP_TRANSHUGE_LIGHT may set GFP_FS, but some
filesystem may expect GFP_NOFS, i.e. in readahead path.

> +       gfp &= ~__GFP_DIRECT_RECLAIM;
> +#endif
> +       return gfp;
> +}
> +
>  #ifdef CONFIG_NUMA
> -extern struct page *__page_cache_alloc(gfp_t gfp);
> +extern struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order);
>  #else
> -static inline struct page *__page_cache_alloc(gfp_t gfp)
> +static inline
> +struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order)
>  {
> -       return alloc_pages(gfp, 0);
> +       if (order == 0)
> +               return alloc_pages(gfp, 0);
> +       return prep_transhuge_page(alloc_pages(thp_gfpmask(gfp), order));
>  }
>  #endif
>
> +static inline struct page *__page_cache_alloc(gfp_t gfp)
> +{
> +       return __page_cache_alloc_order(gfp, 0);
> +}
> +
>  static inline struct page *page_cache_alloc(struct address_space *x)
>  {
>         return __page_cache_alloc(mapping_gfp_mask(x));
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 23a051a7ef0f..9abba062973a 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -941,24 +941,28 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
>  EXPORT_SYMBOL_GPL(add_to_page_cache_lru);
>
>  #ifdef CONFIG_NUMA
> -struct page *__page_cache_alloc(gfp_t gfp)
> +struct page *__page_cache_alloc_order(gfp_t gfp, unsigned int order)
>  {
>         int n;
>         struct page *page;
>
> +       if (order > 0)
> +               gfp = thp_gfpmask(gfp);
> +
>         if (cpuset_do_page_mem_spread()) {
>                 unsigned int cpuset_mems_cookie;
>                 do {
>                         cpuset_mems_cookie = read_mems_allowed_begin();
>                         n = cpuset_mem_spread_node();
> -                       page = __alloc_pages_node(n, gfp, 0);
> +                       page = __alloc_pages_node(n, gfp, order);
> +                       prep_transhuge_page(page);
>                 } while (!page && read_mems_allowed_retry(cpuset_mems_cookie));
>
>                 return page;
>         }
> -       return alloc_pages(gfp, 0);
> +       return prep_transhuge_page(alloc_pages(gfp, order));
>  }
> -EXPORT_SYMBOL(__page_cache_alloc);
> +EXPORT_SYMBOL(__page_cache_alloc_order);
>  #endif
>
>  /*
> --
> 2.26.2
>
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 18/25] mm: Allow large pages to be added to the page cache
  2020-05-04  3:10   ` Matthew Wilcox
@ 2020-05-06 18:32     ` Yang Shi
  2020-06-07  3:04     ` Matthew Wilcox
  1 sibling, 0 replies; 36+ messages in thread
From: Yang Shi @ 2020-05-06 18:32 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Linux FS-devel Mailing List, Linux MM, Linux Kernel Mailing List

On Sun, May 3, 2020 at 8:10 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Apr 29, 2020 at 06:36:50AM -0700, Matthew Wilcox wrote:
> > @@ -886,7 +906,7 @@ static int __add_to_page_cache_locked(struct page *page,
> >       /* Leave page->index set: truncation relies upon it */
> >       if (!huge)
> >               mem_cgroup_cancel_charge(page, memcg, false);
> > -     put_page(page);
> > +     page_ref_sub(page, nr);
> >       return xas_error(&xas);
> >  }
> >  ALLOW_ERROR_INJECTION(__add_to_page_cache_locked, ERRNO);
>
> This is wrong.  page_ref_sub() will not call __put_page() if the refcount
> gets to zero.  What do people prefer?
>
> -       put_page(page);
>
> (a)
> +       put_thp(page);
>
> (b)
> +       put_page_nr(page, nr);
>
> (c)
> +       if (page_ref_sub_return(page, nr) == 0)
> +               __put_page(page);

b or c IMHO. The shmem uses page_ref_add/page_ref_sub so we'd better
follow it. If go with b, it sounds better to add get_page_nr() as
well.

>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 18/25] mm: Allow large pages to be added to the page cache
  2020-05-04  3:10   ` Matthew Wilcox
  2020-05-06 18:32     ` Yang Shi
@ 2020-06-07  3:04     ` Matthew Wilcox
  1 sibling, 0 replies; 36+ messages in thread
From: Matthew Wilcox @ 2020-06-07  3:04 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-mm, linux-kernel

On Sun, May 03, 2020 at 08:10:36PM -0700, Matthew Wilcox wrote:
> On Wed, Apr 29, 2020 at 06:36:50AM -0700, Matthew Wilcox wrote:
> > @@ -886,7 +906,7 @@ static int __add_to_page_cache_locked(struct page *page,
> >  	/* Leave page->index set: truncation relies upon it */
> >  	if (!huge)
> >  		mem_cgroup_cancel_charge(page, memcg, false);
> > -	put_page(page);
> > +	page_ref_sub(page, nr);
> >  	return xas_error(&xas);
> >  }
> >  ALLOW_ERROR_INJECTION(__add_to_page_cache_locked, ERRNO);
> 
> This is wrong.  page_ref_sub() will not call __put_page() if the refcount
> gets to zero.  What do people prefer?

*sigh*.  It's not wrong.  The caller holds a reference on the page
already, so calling page_ref_sub() will never reduce the refcount to 0.
The latest version looks like this:

+       page_ref_sub(page, nr);
+       VM_BUG_ON_PAGE(page_count(page) <= 0, page);


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 17/25] mm: Add __page_cache_alloc_order
  2020-05-06 18:03   ` Yang Shi
@ 2020-06-07  3:08     ` Matthew Wilcox
  2020-06-09 17:38       ` Yang Shi
  0 siblings, 1 reply; 36+ messages in thread
From: Matthew Wilcox @ 2020-06-07  3:08 UTC (permalink / raw)
  To: Yang Shi
  Cc: Linux FS-devel Mailing List, Linux MM, Linux Kernel Mailing List,
	Kirill A . Shutemov

On Wed, May 06, 2020 at 11:03:06AM -0700, Yang Shi wrote:
> > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> > index 55199cb5bd66..1169e2428dd7 100644
> > --- a/include/linux/pagemap.h
> > +++ b/include/linux/pagemap.h
> > @@ -205,15 +205,33 @@ static inline int page_cache_add_speculative(struct page *page, int count)
> >         return __page_cache_add_speculative(page, count);
> >  }
> >
> > +static inline gfp_t thp_gfpmask(gfp_t gfp)
> > +{
> > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > +       /* We'd rather allocate smaller pages than stall a page fault */
> > +       gfp |= GFP_TRANSHUGE_LIGHT;
> 
> This looks not correct. GFP_TRANSHUGE_LIGHT may set GFP_FS, but some
> filesystem may expect GFP_NOFS, i.e. in readahead path.

Apologies, I overlooked this mail.

In one of the prerequisite patches for this patch set (which is now merged
as f2c817bed58d9be2051fad1d18e167e173c0c227), we call memalloc_nofs_save()
in the readahead path.  That ensures all allocations will have GFP_NOFS
set by the time the page allocator sees them.

Thanks for checking on this.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 17/25] mm: Add __page_cache_alloc_order
  2020-06-07  3:08     ` Matthew Wilcox
@ 2020-06-09 17:38       ` Yang Shi
  0 siblings, 0 replies; 36+ messages in thread
From: Yang Shi @ 2020-06-09 17:38 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Linux FS-devel Mailing List, Linux MM, Linux Kernel Mailing List,
	Kirill A . Shutemov

On Sat, Jun 6, 2020 at 8:08 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, May 06, 2020 at 11:03:06AM -0700, Yang Shi wrote:
> > > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> > > index 55199cb5bd66..1169e2428dd7 100644
> > > --- a/include/linux/pagemap.h
> > > +++ b/include/linux/pagemap.h
> > > @@ -205,15 +205,33 @@ static inline int page_cache_add_speculative(struct page *page, int count)
> > >         return __page_cache_add_speculative(page, count);
> > >  }
> > >
> > > +static inline gfp_t thp_gfpmask(gfp_t gfp)
> > > +{
> > > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > > +       /* We'd rather allocate smaller pages than stall a page fault */
> > > +       gfp |= GFP_TRANSHUGE_LIGHT;
> >
> > This looks not correct. GFP_TRANSHUGE_LIGHT may set GFP_FS, but some
> > filesystem may expect GFP_NOFS, i.e. in readahead path.
>
> Apologies, I overlooked this mail.
>
> In one of the prerequisite patches for this patch set (which is now merged
> as f2c817bed58d9be2051fad1d18e167e173c0c227), we call memalloc_nofs_save()
> in the readahead path.  That ensures all allocations will have GFP_NOFS
> set by the time the page allocator sees them.
>
> Thanks for checking on this.

Aha, yes, correct. I missed that. Thanks for finding that commit.

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2020-06-09 17:38 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-04-29 13:36 [PATCH v3 00/25] Large pages in the page cache Matthew Wilcox
2020-04-29 13:36 ` [PATCH v3 01/25] mm: Allow hpages to be arbitrary order Matthew Wilcox
2020-04-29 13:36 ` [PATCH v3 02/25] mm: Introduce thp_size Matthew Wilcox
2020-05-06 17:59   ` Yang Shi
2020-04-29 13:36 ` [PATCH v3 03/25] mm: Introduce thp_order Matthew Wilcox
2020-04-29 13:36 ` [PATCH v3 04/25] mm: Introduce offset_in_thp Matthew Wilcox
2020-04-29 13:36 ` [PATCH v3 05/25] fs: Add a filesystem flag for large pages Matthew Wilcox
2020-04-29 13:36 ` [PATCH v3 06/25] fs: Introduce i_blocks_per_page Matthew Wilcox
2020-04-29 13:36 ` [PATCH v3 07/25] fs: Make page_mkwrite_check_truncate thp-aware Matthew Wilcox
2020-04-29 13:36 ` [PATCH v3 08/25] fs: Support THPs in zero_user_segments Matthew Wilcox
2020-04-29 13:36 ` [PATCH v3 09/25] bio: Add bio_for_each_thp_segment_all Matthew Wilcox
2020-04-29 13:36 ` [PATCH v3 10/25] iomap: Support arbitrarily many blocks per page Matthew Wilcox
2020-04-29 13:36 ` [PATCH v3 11/25] iomap: Support large pages in iomap_adjust_read_range Matthew Wilcox
2020-04-29 13:36 ` [PATCH v3 12/25] iomap: Support large pages in read paths Matthew Wilcox
2020-04-29 13:36 ` [PATCH v3 13/25] iomap: Support large pages in write paths Matthew Wilcox
2020-04-29 13:36 ` [PATCH v3 14/25] iomap: Inline data shouldn't see large pages Matthew Wilcox
2020-04-29 13:36 ` [PATCH v3 15/25] xfs: Support " Matthew Wilcox
2020-04-29 13:36 ` [PATCH v3 16/25] mm: Make prep_transhuge_page return its argument Matthew Wilcox
2020-04-29 13:36 ` [PATCH v3 17/25] mm: Add __page_cache_alloc_order Matthew Wilcox
2020-05-06 18:03   ` Yang Shi
2020-06-07  3:08     ` Matthew Wilcox
2020-06-09 17:38       ` Yang Shi
2020-04-29 13:36 ` [PATCH v3 18/25] mm: Allow large pages to be added to the page cache Matthew Wilcox
2020-05-04  3:10   ` Matthew Wilcox
2020-05-06 18:32     ` Yang Shi
2020-06-07  3:04     ` Matthew Wilcox
2020-04-29 13:36 ` [PATCH v3 19/25] mm: Allow large pages to be removed from " Matthew Wilcox
2020-04-29 13:36 ` [PATCH v3 20/25] mm: Remove page fault assumption of compound page size Matthew Wilcox
2020-04-29 13:36 ` [PATCH v3 21/25] mm: Add DEFINE_READAHEAD Matthew Wilcox
2020-04-29 13:36 ` [PATCH v3 22/25] mm: Make page_cache_readahead_unbounded take a readahead_control Matthew Wilcox
2020-04-29 13:36 ` [PATCH v3 23/25] mm: Make __do_page_cache_readahead " Matthew Wilcox
2020-04-29 13:36 ` [PATCH v3 24/25] mm: Add large page readahead Matthew Wilcox
2020-04-29 13:36 ` [PATCH v3 25/25] mm: Align THP mappings for non-DAX Matthew Wilcox
2020-04-29 15:40 ` [PATCH v3 00/25] Large pages in the page cache Kirill A. Shutemov
2020-04-29 15:45   ` Kirill A. Shutemov
2020-04-30 11:34 ` Matthew Wilcox

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).