* [PATCH 0/3] Start moving write_begin/write_end out of aops
@ 2024-01-30  5:54 Matthew Wilcox (Oracle)
  2024-01-30  5:54 ` [PATCH 1/3] fs: Introduce buffered_write_operations Matthew Wilcox (Oracle)
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Matthew Wilcox (Oracle) @ 2024-01-30  5:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Matthew Wilcox (Oracle), linux-fsdevel, linux-ext4
Christoph wants to remove write_begin/write_end from aops and pass them
to filemap as callback functions.  Hre's one possible route to do this.
I combined it with the folio conversion (because why touch the same code
twice?) and tweaked some of the other things (support for ridiculously
large folios with size_t lengths, remove the need to initialise fsdata
by passing only a pointer to the fsdata pointer).  And then I converted
ext4, which is probably the worst filesystem to convert because it needs
three different bwops.  Most fs will only need one.
Not written yet: convert all the other fs, remove wrappers.
Matthew Wilcox (Oracle) (3):
  fs: Introduce buffered_write_operations
  fs: Supply optional buffered_write_operations in buffer.c
  ext4: Convert to buffered_write_operations
 fs/buffer.c                 | 62 +++++++++++++++++++++++--------
 fs/ext4/ext4.h              |  8 +++-
 fs/ext4/file.c              | 10 ++++-
 fs/ext4/inline.c            | 15 +++-----
 fs/ext4/inode.c             | 73 +++++++++++++++++++------------------
 fs/jfs/file.c               |  3 +-
 fs/ramfs/file-mmu.c         |  3 +-
 fs/ufs/file.c               |  2 +-
 include/linux/buffer_head.h | 22 +++++++++--
 include/linux/fs.h          |  3 --
 include/linux/pagemap.h     | 22 +++++++++++
 mm/filemap.c                | 70 +++++++++++++++++++++++------------
 12 files changed, 193 insertions(+), 100 deletions(-)
-- 
2.43.0
^ permalink raw reply	[flat|nested] 12+ messages in thread
* [PATCH 1/3] fs: Introduce buffered_write_operations
  2024-01-30  5:54 [PATCH 0/3] Start moving write_begin/write_end out of aops Matthew Wilcox (Oracle)
@ 2024-01-30  5:54 ` Matthew Wilcox (Oracle)
  2024-01-30  8:12   ` Christoph Hellwig
  2024-01-31  3:14   ` kernel test robot
  2024-01-30  5:54 ` [PATCH 2/3] fs: Supply optional buffered_write_operations in buffer.c Matthew Wilcox (Oracle)
  2024-01-30  5:54 ` [PATCH 3/3] ext4: Convert to buffered_write_operations Matthew Wilcox (Oracle)
  2 siblings, 2 replies; 12+ messages in thread
From: Matthew Wilcox (Oracle) @ 2024-01-30  5:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Matthew Wilcox (Oracle), linux-fsdevel, linux-ext4
Start the process of moving write_begin and write_end out from the
address_space_operations to their own struct.  Make them take a folio
instead of a page, and pass a pointer to the fsdata to write_end
(so that we don't need to initialise it).
Pass an optional buffered_write_operations pointer to various functions
in filemap.c.  The old names are available as macros for now, except
for generic_file_write_iter() which is used as a function pointer by
many filesystems.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/jfs/file.c           |  3 +-
 fs/ramfs/file-mmu.c     |  3 +-
 fs/ufs/file.c           |  2 +-
 include/linux/fs.h      |  3 --
 include/linux/pagemap.h | 22 +++++++++++++
 mm/filemap.c            | 70 +++++++++++++++++++++++++++--------------
 6 files changed, 71 insertions(+), 32 deletions(-)
diff --git a/fs/jfs/file.c b/fs/jfs/file.c
index 01b6912e60f8..9c62445ea6be 100644
--- a/fs/jfs/file.c
+++ b/fs/jfs/file.c
@@ -4,8 +4,7 @@
  *   Portions Copyright (C) Christoph Hellwig, 2001-2002
  */
 
-#include <linux/mm.h>
-#include <linux/fs.h>
+#include <linux/pagemap.h>
 #include <linux/posix_acl.h>
 #include <linux/quotaops.h>
 #include "jfs_incore.h"
diff --git a/fs/ramfs/file-mmu.c b/fs/ramfs/file-mmu.c
index c7a1aa3c882b..a621b08b0235 100644
--- a/fs/ramfs/file-mmu.c
+++ b/fs/ramfs/file-mmu.c
@@ -24,8 +24,7 @@
  * caches is sufficient.
  */
 
-#include <linux/fs.h>
-#include <linux/mm.h>
+#include <linux/pagemap.h>
 #include <linux/ramfs.h>
 #include <linux/sched.h>
 
diff --git a/fs/ufs/file.c b/fs/ufs/file.c
index 6558882a89ef..b557d4a14143 100644
--- a/fs/ufs/file.c
+++ b/fs/ufs/file.c
@@ -24,7 +24,7 @@
  *  ext2 fs regular file handling primitives
  */
 
-#include <linux/fs.h>
+#include <linux/pagemap.h>
 
 #include "ufs_fs.h"
 #include "ufs.h"
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ed5966a70495..92dc0cf08b1f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3048,10 +3048,7 @@ extern int generic_file_rw_checks(struct file *file_in, struct file *file_out);
 ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *to,
 		ssize_t already_read);
 extern ssize_t generic_file_read_iter(struct kiocb *, struct iov_iter *);
-extern ssize_t __generic_file_write_iter(struct kiocb *, struct iov_iter *);
-extern ssize_t generic_file_write_iter(struct kiocb *, struct iov_iter *);
 extern ssize_t generic_file_direct_write(struct kiocb *, struct iov_iter *);
-ssize_t generic_perform_write(struct kiocb *, struct iov_iter *);
 ssize_t direct_write_fallback(struct kiocb *iocb, struct iov_iter *iter,
 		ssize_t direct_written, ssize_t buffered_written);
 
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 2df35e65557d..a5c474ad230e 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -18,6 +18,28 @@
 
 struct folio_batch;
 
+struct buffered_write_operations {
+	int (*write_begin)(struct file *, struct address_space *mapping,
+			loff_t pos, size_t len, struct folio **foliop,
+			void **fsdata);
+	int (*write_end)(struct file *, struct address_space *mapping,
+			loff_t pos, size_t len, size_t copied,
+			struct folio *folio, void **fsdata);
+};
+
+ssize_t filemap_write_iter(struct kiocb *, struct iov_iter *,
+		const struct buffered_write_operations *);
+ssize_t __filemap_write_iter(struct kiocb *, struct iov_iter *,
+		const struct buffered_write_operations *);
+ssize_t filemap_perform_write(struct kiocb *, struct iov_iter *,
+		const struct buffered_write_operations *);
+
+ssize_t generic_file_write_iter(struct kiocb *, struct iov_iter *);
+#define generic_perform_write(kiocb, iter)		\
+	filemap_perform_write(kiocb, iter, NULL)
+#define __generic_file_write_iter(kiocb, iter)		\
+	__filemap_write_iter(kiocb, iter, NULL)
+
 unsigned long invalidate_mapping_pages(struct address_space *mapping,
 					pgoff_t start, pgoff_t end);
 
diff --git a/mm/filemap.c b/mm/filemap.c
index 750e779c23db..66d779b787c8 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -95,7 +95,7 @@
  *    ->invalidate_lock		(filemap_fault)
  *      ->lock_page		(filemap_fault, access_process_vm)
  *
- *  ->i_rwsem			(generic_perform_write)
+ *  ->i_rwsem			(filemap_perform_write)
  *    ->mmap_lock		(fault_in_readable->do_page_fault)
  *
  *  bdi->wb.list_lock
@@ -3890,7 +3890,8 @@ generic_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
 }
 EXPORT_SYMBOL(generic_file_direct_write);
 
-ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
+ssize_t filemap_perform_write(struct kiocb *iocb, struct iov_iter *i,
+		const struct buffered_write_operations *ops)
 {
 	struct file *file = iocb->ki_filp;
 	loff_t pos = iocb->ki_pos;
@@ -3900,9 +3901,9 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
 	ssize_t written = 0;
 
 	do {
-		struct page *page;
-		unsigned long offset;	/* Offset into pagecache page */
-		unsigned long bytes;	/* Bytes to write to page */
+		struct folio *folio;
+		size_t offset;		/* Offset into pagecache folio */
+		size_t bytes;		/* Bytes to write to page */
 		size_t copied;		/* Bytes copied from user */
 		void *fsdata = NULL;
 
@@ -3927,19 +3928,31 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
 			break;
 		}
 
-		status = a_ops->write_begin(file, mapping, pos, bytes,
+		if (ops)
+			status = ops->write_begin(file, mapping, pos, bytes,
+						&folio, &fsdata);
+		else {
+			struct page *page;
+			status = a_ops->write_begin(file, mapping, pos, bytes,
 						&page, &fsdata);
+			if (status >= 0)
+				folio = page_folio(page);
+		}
 		if (unlikely(status < 0))
 			break;
 
 		if (mapping_writably_mapped(mapping))
-			flush_dcache_page(page);
+			flush_dcache_folio(folio);
 
-		copied = copy_page_from_iter_atomic(page, offset, bytes, i);
-		flush_dcache_page(page);
+		copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
+		flush_dcache_folio(folio);
 
-		status = a_ops->write_end(file, mapping, pos, bytes, copied,
-						page, fsdata);
+		if (ops)
+			status = ops->write_end(file, mapping, pos, bytes,
+					copied, folio, &fsdata);
+		else
+			status = a_ops->write_end(file, mapping, pos, bytes,
+					copied, &folio->page, fsdata);
 		if (unlikely(status != copied)) {
 			iov_iter_revert(i, copied - max(status, 0L));
 			if (unlikely(status < 0))
@@ -3969,12 +3982,13 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
 	iocb->ki_pos += written;
 	return written;
 }
-EXPORT_SYMBOL(generic_perform_write);
+EXPORT_SYMBOL(filemap_perform_write);
 
 /**
- * __generic_file_write_iter - write data to a file
- * @iocb:	IO state structure (file, offset, etc.)
- * @from:	iov_iter with data to write
+ * __filemap_write_iter - write data to a file
+ * @iocb: IO state structure (file, offset, etc.)
+ * @from: iov_iter with data to write
+ * @ops: How to inform the filesystem that a write is starting/finishing.
  *
  * This function does all the work needed for actually writing data to a
  * file. It does all basic checks, removes SUID from the file, updates
@@ -3992,7 +4006,8 @@ EXPORT_SYMBOL(generic_perform_write);
  * * number of bytes written, even for truncated writes
  * * negative error code if no data has been written at all
  */
-ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
+ssize_t __filemap_write_iter(struct kiocb *iocb, struct iov_iter *from,
+		const struct buffered_write_operations *ops)
 {
 	struct file *file = iocb->ki_filp;
 	struct address_space *mapping = file->f_mapping;
@@ -4019,27 +4034,29 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		if (ret < 0 || !iov_iter_count(from) || IS_DAX(inode))
 			return ret;
 		return direct_write_fallback(iocb, from, ret,
-				generic_perform_write(iocb, from));
+				filemap_perform_write(iocb, from, ops));
 	}
 
-	return generic_perform_write(iocb, from);
+	return filemap_perform_write(iocb, from, ops);
 }
-EXPORT_SYMBOL(__generic_file_write_iter);
+EXPORT_SYMBOL(__filemap_write_iter);
 
 /**
- * generic_file_write_iter - write data to a file
+ * filemap_write_iter - write data to a file
  * @iocb:	IO state structure
  * @from:	iov_iter with data to write
  *
- * This is a wrapper around __generic_file_write_iter() to be used by most
+ * This is a wrapper around __filemap_write_iter() to be used by most
  * filesystems. It takes care of syncing the file in case of O_SYNC file
  * and acquires i_rwsem as needed.
+ *
  * Return:
- * * negative error code if no data has been written at all of
+ * * negative error code if no data has been written at all or if
  *   vfs_fsync_range() failed for a synchronous write
  * * number of bytes written, even for truncated writes
  */
-ssize_t generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
+ssize_t filemap_write_iter(struct kiocb *iocb, struct iov_iter *from,
+		const struct buffered_write_operations *ops)
 {
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file->f_mapping->host;
@@ -4048,13 +4065,18 @@ ssize_t generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	inode_lock(inode);
 	ret = generic_write_checks(iocb, from);
 	if (ret > 0)
-		ret = __generic_file_write_iter(iocb, from);
+		ret = __filemap_write_iter(iocb, from, ops);
 	inode_unlock(inode);
 
 	if (ret > 0)
 		ret = generic_write_sync(iocb, ret);
 	return ret;
 }
+
+ssize_t generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
+{
+	return filemap_write_iter(iocb, from, NULL);
+}
 EXPORT_SYMBOL(generic_file_write_iter);
 
 /**
-- 
2.43.0
^ permalink raw reply related	[flat|nested] 12+ messages in thread
* [PATCH 2/3] fs: Supply optional buffered_write_operations in buffer.c
  2024-01-30  5:54 [PATCH 0/3] Start moving write_begin/write_end out of aops Matthew Wilcox (Oracle)
  2024-01-30  5:54 ` [PATCH 1/3] fs: Introduce buffered_write_operations Matthew Wilcox (Oracle)
@ 2024-01-30  5:54 ` Matthew Wilcox (Oracle)
  2024-01-30  5:54 ` [PATCH 3/3] ext4: Convert to buffered_write_operations Matthew Wilcox (Oracle)
  2 siblings, 0 replies; 12+ messages in thread
From: Matthew Wilcox (Oracle) @ 2024-01-30  5:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Matthew Wilcox (Oracle), linux-fsdevel, linux-ext4
generic_cont_expand_simple() and cont_expand_zero() currently call
->write_begin and ->write_end, so will also need to be converted to use
buffered_write_operations.  Use macro magic again to pass in optional
buffered_write_operations.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/buffer.c                 | 62 +++++++++++++++++++++++++++----------
 include/linux/buffer_head.h | 22 ++++++++++---
 2 files changed, 64 insertions(+), 20 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index d3bcf601d3e5..8ed76fc6cff0 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2441,11 +2441,13 @@ EXPORT_SYMBOL(block_read_full_folio);
  * truncates.  Uses filesystem pagecache writes to allow the filesystem to
  * deal with the hole.  
  */
-int generic_cont_expand_simple(struct inode *inode, loff_t size)
+int generic_cont_expand_simple(struct inode *inode, loff_t size,
+		const struct buffered_write_operations *ops)
 {
 	struct address_space *mapping = inode->i_mapping;
 	const struct address_space_operations *aops = mapping->a_ops;
 	struct page *page;
+	struct folio *folio;
 	void *fsdata = NULL;
 	int err;
 
@@ -2453,11 +2455,17 @@ int generic_cont_expand_simple(struct inode *inode, loff_t size)
 	if (err)
 		goto out;
 
-	err = aops->write_begin(NULL, mapping, size, 0, &page, &fsdata);
+	if (ops)
+		err = ops->write_begin(NULL, mapping, size, 0, &folio, &fsdata);
+	else
+		err = aops->write_begin(NULL, mapping, size, 0, &page, &fsdata);
 	if (err)
 		goto out;
 
-	err = aops->write_end(NULL, mapping, size, 0, 0, page, fsdata);
+	if (ops)
+		err = ops->write_end(NULL, mapping, size, 0, 0, folio, &fsdata);
+	else
+		err = aops->write_end(NULL, mapping, size, 0, 0, page, fsdata);
 	BUG_ON(err > 0);
 
 out:
@@ -2466,12 +2474,14 @@ int generic_cont_expand_simple(struct inode *inode, loff_t size)
 EXPORT_SYMBOL(generic_cont_expand_simple);
 
 static int cont_expand_zero(struct file *file, struct address_space *mapping,
-			    loff_t pos, loff_t *bytes)
+		loff_t pos, loff_t *bytes,
+		const struct buffered_write_operations *ops)
 {
 	struct inode *inode = mapping->host;
 	const struct address_space_operations *aops = mapping->a_ops;
 	unsigned int blocksize = i_blocksize(inode);
 	struct page *page;
+	struct folio *folio;
 	void *fsdata = NULL;
 	pgoff_t index, curidx;
 	loff_t curpos;
@@ -2489,13 +2499,23 @@ static int cont_expand_zero(struct file *file, struct address_space *mapping,
 		}
 		len = PAGE_SIZE - zerofrom;
 
-		err = aops->write_begin(file, mapping, curpos, len,
-					    &page, &fsdata);
+		if (ops) {
+			err = ops->write_begin(file, mapping, curpos, len,
+					&folio, &fsdata);
+			page = &folio->page;
+		} else {
+			err = aops->write_begin(file, mapping, curpos, len,
+					&page, &fsdata);
+		}
 		if (err)
 			goto out;
 		zero_user(page, zerofrom, len);
-		err = aops->write_end(file, mapping, curpos, len, len,
-						page, fsdata);
+		if (ops)
+			err = ops->write_end(file, mapping, curpos, len, len,
+					folio, &fsdata);
+		else
+			err = aops->write_end(file, mapping, curpos, len, len,
+					page, fsdata);
 		if (err < 0)
 			goto out;
 		BUG_ON(err != len);
@@ -2522,13 +2542,23 @@ static int cont_expand_zero(struct file *file, struct address_space *mapping,
 		}
 		len = offset - zerofrom;
 
-		err = aops->write_begin(file, mapping, curpos, len,
-					    &page, &fsdata);
+		if (ops) {
+			err = ops->write_begin(file, mapping, curpos, len,
+					&folio, &fsdata);
+			page = &folio->page;
+		} else {
+			err = aops->write_begin(file, mapping, curpos, len,
+					&page, &fsdata);
+		}
 		if (err)
 			goto out;
 		zero_user(page, zerofrom, len);
-		err = aops->write_end(file, mapping, curpos, len, len,
-						page, fsdata);
+		if (ops)
+			err = ops->write_end(file, mapping, curpos, len, len,
+					folio, &fsdata);
+		else
+			err = aops->write_end(file, mapping, curpos, len, len,
+					page, fsdata);
 		if (err < 0)
 			goto out;
 		BUG_ON(err != len);
@@ -2543,16 +2573,16 @@ static int cont_expand_zero(struct file *file, struct address_space *mapping,
  * We may have to extend the file.
  */
 int cont_write_begin(struct file *file, struct address_space *mapping,
-			loff_t pos, unsigned len,
-			struct page **pagep, void **fsdata,
-			get_block_t *get_block, loff_t *bytes)
+		loff_t pos, unsigned len, struct page **pagep, void **fsdata,
+		get_block_t *get_block, loff_t *bytes,
+		const struct buffered_write_operations *ops)
 {
 	struct inode *inode = mapping->host;
 	unsigned int blocksize = i_blocksize(inode);
 	unsigned int zerofrom;
 	int err;
 
-	err = cont_expand_zero(file, mapping, pos, bytes);
+	err = cont_expand_zero(file, mapping, pos, bytes, ops);
 	if (err)
 		return err;
 
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index d78454a4dd1f..80de88c12d23 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -268,16 +268,30 @@ int generic_write_end(struct file *, struct address_space *,
 				loff_t, unsigned, unsigned,
 				struct page *, void *);
 void folio_zero_new_buffers(struct folio *folio, size_t from, size_t to);
-int cont_write_begin(struct file *, struct address_space *, loff_t,
-			unsigned, struct page **, void **,
-			get_block_t *, loff_t *);
-int generic_cont_expand_simple(struct inode *inode, loff_t size);
+int cont_write_begin(struct file *, struct address_space *, loff_t pos,
+		unsigned len, struct page **, void **fsdata, get_block_t *,
+		loff_t *bytes, const struct buffered_write_operations *);
+int generic_cont_expand_simple(struct inode *inode, loff_t size,
+		const struct buffered_write_operations *ops);
 void block_commit_write(struct page *page, unsigned int from, unsigned int to);
 int block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf,
 				get_block_t get_block);
 sector_t generic_block_bmap(struct address_space *, sector_t, get_block_t *);
 int block_truncate_page(struct address_space *, loff_t, get_block_t *);
 
+#define _cont_write_begin(file, mapping, pos, len, pagep, fsdata,	\
+		getblk, bytes, ops, extra...)				\
+	cont_write_begin(file, mapping, pos, len, pagep, fsdata,	\
+		getblk,bytes, ops)
+#define cont_write_begin(file, mapping, pos, len, pagep, fsdata,	\
+		getblk, bytes, ops...)					\
+	_cont_write_begin(file, mapping, pos, len, pagep, fsdata,	\
+		getblk, bytes, ## ops, NULL)
+#define _generic_cont_expand_simple(inode, size, ops, extra...)		\
+	generic_cont_expand_simple(inode, size, ops)
+#define generic_cont_expand_simple(inode, size, ops...)			\
+	_generic_cont_expand_simple(inode, size, ## ops, NULL)
+
 #ifdef CONFIG_MIGRATION
 extern int buffer_migrate_folio(struct address_space *,
 		struct folio *dst, struct folio *src, enum migrate_mode);
-- 
2.43.0
^ permalink raw reply related	[flat|nested] 12+ messages in thread
* [PATCH 3/3] ext4: Convert to buffered_write_operations
  2024-01-30  5:54 [PATCH 0/3] Start moving write_begin/write_end out of aops Matthew Wilcox (Oracle)
  2024-01-30  5:54 ` [PATCH 1/3] fs: Introduce buffered_write_operations Matthew Wilcox (Oracle)
  2024-01-30  5:54 ` [PATCH 2/3] fs: Supply optional buffered_write_operations in buffer.c Matthew Wilcox (Oracle)
@ 2024-01-30  5:54 ` Matthew Wilcox (Oracle)
  2024-01-30  6:13   ` Matthew Wilcox
                     ` (2 more replies)
  2 siblings, 3 replies; 12+ messages in thread
From: Matthew Wilcox (Oracle) @ 2024-01-30  5:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Matthew Wilcox (Oracle), linux-fsdevel, linux-ext4
Pass the appropriate buffered_write_operations to filemap_perform_write().
Saves a lot of page<->folio conversions.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/ext4/ext4.h   |  8 ++++--
 fs/ext4/file.c   | 10 ++++++-
 fs/ext4/inline.c | 15 ++++------
 fs/ext4/inode.c  | 73 ++++++++++++++++++++++++------------------------
 4 files changed, 58 insertions(+), 48 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index a5d784872303..3491d5c279c7 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3021,6 +3021,10 @@ extern void ext4_da_update_reserve_space(struct inode *inode,
 extern int ext4_issue_zeroout(struct inode *inode, ext4_lblk_t lblk,
 			      ext4_fsblk_t pblk, ext4_lblk_t len);
 
+extern const struct buffered_write_operations ext4_bw_ops;
+extern const struct buffered_write_operations ext4_journalled_bw_ops;
+extern const struct buffered_write_operations ext4_da_bw_ops;
+
 /* indirect.c */
 extern int ext4_ind_map_blocks(handle_t *handle, struct inode *inode,
 				struct ext4_map_blocks *map, int flags);
@@ -3549,13 +3553,13 @@ int ext4_readpage_inline(struct inode *inode, struct folio *folio);
 extern int ext4_try_to_write_inline_data(struct address_space *mapping,
 					 struct inode *inode,
 					 loff_t pos, unsigned len,
-					 struct page **pagep);
+					 struct folio **foliop);
 int ext4_write_inline_data_end(struct inode *inode, loff_t pos, unsigned len,
 			       unsigned copied, struct folio *folio);
 extern int ext4_da_write_inline_data_begin(struct address_space *mapping,
 					   struct inode *inode,
 					   loff_t pos, unsigned len,
-					   struct page **pagep,
+					   struct folio **foliop,
 					   void **fsdata);
 extern int ext4_try_add_inline_entry(handle_t *handle,
 				     struct ext4_filename *fname,
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 6aa15dafc677..1bca5f47cb5f 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -287,16 +287,24 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
 {
 	ssize_t ret;
 	struct inode *inode = file_inode(iocb->ki_filp);
+	const struct buffered_write_operations *ops;
 
 	if (iocb->ki_flags & IOCB_NOWAIT)
 		return -EOPNOTSUPP;
 
+	if (ext4_inode_journal_mode(inode))
+		ops = &ext4_journalled_bw_ops;
+	else if (test_opt(inode->i_sb, DELALLOC))
+		ops = &ext4_da_bw_ops;
+	else
+		ops = &ext4_bw_ops;
+
 	inode_lock(inode);
 	ret = ext4_write_checks(iocb, from);
 	if (ret <= 0)
 		goto out;
 
-	ret = generic_perform_write(iocb, from);
+	ret = filemap_perform_write(iocb, from, ops);
 
 out:
 	inode_unlock(inode);
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index d5bd1e3a5d36..0d3fc5c14ad5 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -658,9 +658,8 @@ static int ext4_convert_inline_data_to_extent(struct address_space *mapping,
  * to the page make it update and let the later codes create extent for it.
  */
 int ext4_try_to_write_inline_data(struct address_space *mapping,
-				  struct inode *inode,
-				  loff_t pos, unsigned len,
-				  struct page **pagep)
+		struct inode *inode, loff_t pos, unsigned len,
+		struct folio **foliop)
 {
 	int ret;
 	handle_t *handle;
@@ -708,7 +707,7 @@ int ext4_try_to_write_inline_data(struct address_space *mapping,
 		goto out;
 	}
 
-	*pagep = &folio->page;
+	*foliop = folio;
 	down_read(&EXT4_I(inode)->xattr_sem);
 	if (!ext4_has_inline_data(inode)) {
 		ret = 0;
@@ -889,10 +888,8 @@ static int ext4_da_convert_inline_data_to_extent(struct address_space *mapping,
  * normal ext4_da_write_end).
  */
 int ext4_da_write_inline_data_begin(struct address_space *mapping,
-				    struct inode *inode,
-				    loff_t pos, unsigned len,
-				    struct page **pagep,
-				    void **fsdata)
+		struct inode *inode, loff_t pos, unsigned len,
+		struct folio **foliop, void **fsdata)
 {
 	int ret;
 	handle_t *handle;
@@ -954,7 +951,7 @@ int ext4_da_write_inline_data_begin(struct address_space *mapping,
 		goto out_release_page;
 
 	up_read(&EXT4_I(inode)->xattr_sem);
-	*pagep = &folio->page;
+	*foliop = folio;
 	brelse(iloc.bh);
 	return 1;
 out_release_page:
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5af1b0b8680e..4f532870c388 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1113,8 +1113,7 @@ static int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
  * ext4_write_begin() is the right place.
  */
 static int ext4_write_begin(struct file *file, struct address_space *mapping,
-			    loff_t pos, unsigned len,
-			    struct page **pagep, void **fsdata)
+		loff_t pos, size_t len, struct folio **foliop, void **fsdata)
 {
 	struct inode *inode = mapping->host;
 	int ret, needed_blocks;
@@ -1139,7 +1138,7 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
 
 	if (ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)) {
 		ret = ext4_try_to_write_inline_data(mapping, inode, pos, len,
-						    pagep);
+						    foliop);
 		if (ret < 0)
 			return ret;
 		if (ret == 1)
@@ -1239,7 +1238,7 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
 		folio_put(folio);
 		return ret;
 	}
-	*pagep = &folio->page;
+	*foliop = folio;
 	return ret;
 }
 
@@ -1264,12 +1263,10 @@ static int write_end_fn(handle_t *handle, struct inode *inode,
  * ext4 never places buffers on inode->i_mapping->i_private_list.  metadata
  * buffers are managed internally.
  */
-static int ext4_write_end(struct file *file,
-			  struct address_space *mapping,
-			  loff_t pos, unsigned len, unsigned copied,
-			  struct page *page, void *fsdata)
+static int ext4_write_end(struct file *file, struct address_space *mapping,
+		loff_t pos, size_t len, size_t copied, struct folio *folio,
+		void **fsdata)
 {
-	struct folio *folio = page_folio(page);
 	handle_t *handle = ext4_journal_current_handle();
 	struct inode *inode = mapping->host;
 	loff_t old_size = inode->i_size;
@@ -1284,7 +1281,7 @@ static int ext4_write_end(struct file *file,
 		return ext4_write_inline_data_end(inode, pos, len, copied,
 						  folio);
 
-	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
+	copied = block_write_end(file, mapping, pos, len, copied, &folio->page, *fsdata);
 	/*
 	 * it's important to update i_size while still holding folio lock:
 	 * page writeout could otherwise come in and zero beyond i_size.
@@ -1369,11 +1366,9 @@ static void ext4_journalled_zero_new_buffers(handle_t *handle,
 }
 
 static int ext4_journalled_write_end(struct file *file,
-				     struct address_space *mapping,
-				     loff_t pos, unsigned len, unsigned copied,
-				     struct page *page, void *fsdata)
+		struct address_space *mapping, loff_t pos, size_t len,
+		size_t copied, struct folio *folio, void **fsdata)
 {
-	struct folio *folio = page_folio(page);
 	handle_t *handle = ext4_journal_current_handle();
 	struct inode *inode = mapping->host;
 	loff_t old_size = inode->i_size;
@@ -2856,9 +2851,9 @@ static int ext4_nonda_switch(struct super_block *sb)
 	return 0;
 }
 
-static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
-			       loff_t pos, unsigned len,
-			       struct page **pagep, void **fsdata)
+static int ext4_da_write_begin(struct file *file,
+		struct address_space *mapping, loff_t pos, size_t len,
+		struct folio **foliop, void **fsdata)
 {
 	int ret, retries = 0;
 	struct folio *folio;
@@ -2873,14 +2868,14 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
 	if (ext4_nonda_switch(inode->i_sb) || ext4_verity_in_progress(inode)) {
 		*fsdata = (void *)FALL_BACK_TO_NONDELALLOC;
 		return ext4_write_begin(file, mapping, pos,
-					len, pagep, fsdata);
+					len, foliop, fsdata);
 	}
 	*fsdata = (void *)0;
 	trace_ext4_da_write_begin(inode, pos, len);
 
 	if (ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)) {
 		ret = ext4_da_write_inline_data_begin(mapping, inode, pos, len,
-						      pagep, fsdata);
+						      foliop, fsdata);
 		if (ret < 0)
 			return ret;
 		if (ret == 1)
@@ -2918,7 +2913,7 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
 		return ret;
 	}
 
-	*pagep = &folio->page;
+	*foliop = folio;
 	return ret;
 }
 
@@ -2945,9 +2940,8 @@ static int ext4_da_should_update_i_disksize(struct folio *folio,
 	return 1;
 }
 
-static int ext4_da_do_write_end(struct address_space *mapping,
-			loff_t pos, unsigned len, unsigned copied,
-			struct folio *folio)
+static int ext4_da_do_write_end(struct address_space *mapping, loff_t pos,
+		size_t len, size_t copied, struct folio *folio)
 {
 	struct inode *inode = mapping->host;
 	loff_t old_size = inode->i_size;
@@ -3007,18 +3001,16 @@ static int ext4_da_do_write_end(struct address_space *mapping,
 	return copied;
 }
 
-static int ext4_da_write_end(struct file *file,
-			     struct address_space *mapping,
-			     loff_t pos, unsigned len, unsigned copied,
-			     struct page *page, void *fsdata)
+static int ext4_da_write_end(struct file *file, struct address_space *mapping,
+		loff_t pos, size_t len, size_t copied, struct folio *folio,
+		void **fsdata)
 {
 	struct inode *inode = mapping->host;
-	int write_mode = (int)(unsigned long)fsdata;
-	struct folio *folio = page_folio(page);
+	int write_mode = (int)(unsigned long)*fsdata;
 
 	if (write_mode == FALL_BACK_TO_NONDELALLOC)
 		return ext4_write_end(file, mapping, pos,
-				      len, copied, &folio->page, fsdata);
+				      len, copied, folio, fsdata);
 
 	trace_ext4_da_write_end(inode, pos, len, copied);
 
@@ -3556,8 +3548,6 @@ static const struct address_space_operations ext4_aops = {
 	.read_folio		= ext4_read_folio,
 	.readahead		= ext4_readahead,
 	.writepages		= ext4_writepages,
-	.write_begin		= ext4_write_begin,
-	.write_end		= ext4_write_end,
 	.dirty_folio		= ext4_dirty_folio,
 	.bmap			= ext4_bmap,
 	.invalidate_folio	= ext4_invalidate_folio,
@@ -3573,8 +3563,6 @@ static const struct address_space_operations ext4_journalled_aops = {
 	.read_folio		= ext4_read_folio,
 	.readahead		= ext4_readahead,
 	.writepages		= ext4_writepages,
-	.write_begin		= ext4_write_begin,
-	.write_end		= ext4_journalled_write_end,
 	.dirty_folio		= ext4_journalled_dirty_folio,
 	.bmap			= ext4_bmap,
 	.invalidate_folio	= ext4_journalled_invalidate_folio,
@@ -3590,8 +3578,6 @@ static const struct address_space_operations ext4_da_aops = {
 	.read_folio		= ext4_read_folio,
 	.readahead		= ext4_readahead,
 	.writepages		= ext4_writepages,
-	.write_begin		= ext4_da_write_begin,
-	.write_end		= ext4_da_write_end,
 	.dirty_folio		= ext4_dirty_folio,
 	.bmap			= ext4_bmap,
 	.invalidate_folio	= ext4_invalidate_folio,
@@ -3611,6 +3597,21 @@ static const struct address_space_operations ext4_dax_aops = {
 	.swap_activate		= ext4_iomap_swap_activate,
 };
 
+const struct buffered_write_operations ext4_bw_ops = {
+	.write_begin		= ext4_write_begin,
+	.write_end		= ext4_write_end,
+};
+
+const struct buffered_write_operations ext4_journalled_bw_ops = {
+	.write_begin		= ext4_write_begin,
+	.write_end		= ext4_journalled_write_end,
+};
+
+const struct buffered_write_operations ext4_da_bw_ops = {
+	.write_begin		= ext4_da_write_begin,
+	.write_end		= ext4_da_write_end,
+};
+
 void ext4_set_aops(struct inode *inode)
 {
 	switch (ext4_inode_journal_mode(inode)) {
-- 
2.43.0
^ permalink raw reply related	[flat|nested] 12+ messages in thread
* Re: [PATCH 3/3] ext4: Convert to buffered_write_operations
  2024-01-30  5:54 ` [PATCH 3/3] ext4: Convert to buffered_write_operations Matthew Wilcox (Oracle)
@ 2024-01-30  6:13   ` Matthew Wilcox
  2024-01-30  8:13   ` Christoph Hellwig
  2024-02-01 15:12   ` kernel test robot
  2 siblings, 0 replies; 12+ messages in thread
From: Matthew Wilcox @ 2024-01-30  6:13 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-ext4
On Tue, Jan 30, 2024 at 05:54:13AM +0000, Matthew Wilcox (Oracle) wrote:
> +++ b/fs/ext4/file.c
> @@ -287,16 +287,24 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
>  {
>  	ssize_t ret;
>  	struct inode *inode = file_inode(iocb->ki_filp);
> +	const struct buffered_write_operations *ops;
>  
>  	if (iocb->ki_flags & IOCB_NOWAIT)
>  		return -EOPNOTSUPP;
>  
> +	if (ext4_inode_journal_mode(inode))
I forgot to commit --amend.  This line should have been:
+       if (ext4_should_journal_data(inode))
> +		ops = &ext4_journalled_bw_ops;
> +	else if (test_opt(inode->i_sb, DELALLOC))
> +		ops = &ext4_da_bw_ops;
> +	else
> +		ops = &ext4_bw_ops;
> +
>  	inode_lock(inode);
>  	ret = ext4_write_checks(iocb, from);
>  	if (ret <= 0)
>  		goto out;
>  
> -	ret = generic_perform_write(iocb, from);
> +	ret = filemap_perform_write(iocb, from, ops);
>  
>  out:
>  	inode_unlock(inode);
^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] fs: Introduce buffered_write_operations
  2024-01-30  5:54 ` [PATCH 1/3] fs: Introduce buffered_write_operations Matthew Wilcox (Oracle)
@ 2024-01-30  8:12   ` Christoph Hellwig
  2024-02-01  4:36     ` Matthew Wilcox
  2024-01-31  3:14   ` kernel test robot
  1 sibling, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2024-01-30  8:12 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: Christoph Hellwig, linux-fsdevel, linux-ext4
> +struct buffered_write_operations {
> +	int (*write_begin)(struct file *, struct address_space *mapping,
> +			loff_t pos, size_t len, struct folio **foliop,
> +			void **fsdata);
> +	int (*write_end)(struct file *, struct address_space *mapping,
> +			loff_t pos, size_t len, size_t copied,
> +			struct folio *folio, void **fsdata);
> +};
Should write_begin simply return the folio or an ERR_PTR instead of
the return by reference?
I also wonder if the fsdata paramter should go away - if a fs needs
to pass forth and back fsdata, generic/filemap_perform_write is
probably the wrong abstraction for it.
Otherwise this looks sane to me.
^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH 3/3] ext4: Convert to buffered_write_operations
  2024-01-30  5:54 ` [PATCH 3/3] ext4: Convert to buffered_write_operations Matthew Wilcox (Oracle)
  2024-01-30  6:13   ` Matthew Wilcox
@ 2024-01-30  8:13   ` Christoph Hellwig
  2024-02-01 15:12   ` kernel test robot
  2 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2024-01-30  8:13 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: Christoph Hellwig, linux-fsdevel, linux-ext4
Given that an ext4 conversion to iomap is pending on the list I wonder
if this is the best example.  But the conversion itself looks sensible
to me with the fixup.
^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] fs: Introduce buffered_write_operations
  2024-01-30  5:54 ` [PATCH 1/3] fs: Introduce buffered_write_operations Matthew Wilcox (Oracle)
  2024-01-30  8:12   ` Christoph Hellwig
@ 2024-01-31  3:14   ` kernel test robot
  1 sibling, 0 replies; 12+ messages in thread
From: kernel test robot @ 2024-01-31  3:14 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Christoph Hellwig
  Cc: oe-kbuild-all, Matthew Wilcox (Oracle), linux-fsdevel, linux-ext4
Hi Matthew,
kernel test robot noticed the following build warnings:
[auto build test WARNING on kleikamp-shaggy/jfs-next]
[also build test WARNING on akpm-mm/mm-everything linus/master v6.8-rc2 next-20240130]
[cannot apply to tytso-ext4/dev]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url:    https://github.com/intel-lab-lkp/linux/commits/Matthew-Wilcox-Oracle/fs-Introduce-buffered_write_operations/20240130-135555
base:   https://github.com/kleikamp/linux-shaggy jfs-next
patch link:    https://lore.kernel.org/r/20240130055414.2143959-2-willy%40infradead.org
patch subject: [PATCH 1/3] fs: Introduce buffered_write_operations
config: x86_64-allnoconfig (https://download.01.org/0day-ci/archive/20240131/202401311058.81kC2rCv-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240131/202401311058.81kC2rCv-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202401311058.81kC2rCv-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> mm/filemap.c:4060: warning: Function parameter or struct member 'ops' not described in 'filemap_write_iter'
vim +4060 mm/filemap.c
^1da177e4c3f41 Linus Torvalds          2005-04-16  4043  
e4dd9de3c66bc7 Jan Kara                2009-08-17  4044  /**
21878deb477823 Matthew Wilcox (Oracle  2024-01-30  4045)  * filemap_write_iter - write data to a file
e4dd9de3c66bc7 Jan Kara                2009-08-17  4046   * @iocb:	IO state structure
8174202b34c30e Al Viro                 2014-04-03  4047   * @from:	iov_iter with data to write
e4dd9de3c66bc7 Jan Kara                2009-08-17  4048   *
21878deb477823 Matthew Wilcox (Oracle  2024-01-30  4049)  * This is a wrapper around __filemap_write_iter() to be used by most
e4dd9de3c66bc7 Jan Kara                2009-08-17  4050   * filesystems. It takes care of syncing the file in case of O_SYNC file
9608703e488cf7 Jan Kara                2021-04-12  4051   * and acquires i_rwsem as needed.
21878deb477823 Matthew Wilcox (Oracle  2024-01-30  4052)  *
a862f68a8b3600 Mike Rapoport           2019-03-05  4053   * Return:
21878deb477823 Matthew Wilcox (Oracle  2024-01-30  4054)  * * negative error code if no data has been written at all or if
a862f68a8b3600 Mike Rapoport           2019-03-05  4055   *   vfs_fsync_range() failed for a synchronous write
a862f68a8b3600 Mike Rapoport           2019-03-05  4056   * * number of bytes written, even for truncated writes
e4dd9de3c66bc7 Jan Kara                2009-08-17  4057   */
21878deb477823 Matthew Wilcox (Oracle  2024-01-30  4058) ssize_t filemap_write_iter(struct kiocb *iocb, struct iov_iter *from,
21878deb477823 Matthew Wilcox (Oracle  2024-01-30  4059) 		const struct buffered_write_operations *ops)
^1da177e4c3f41 Linus Torvalds          2005-04-16 @4060  {
^1da177e4c3f41 Linus Torvalds          2005-04-16  4061  	struct file *file = iocb->ki_filp;
148f948ba877f4 Jan Kara                2009-08-17  4062  	struct inode *inode = file->f_mapping->host;
^1da177e4c3f41 Linus Torvalds          2005-04-16  4063  	ssize_t ret;
^1da177e4c3f41 Linus Torvalds          2005-04-16  4064  
5955102c9984fa Al Viro                 2016-01-22  4065  	inode_lock(inode);
3309dd04cbcd2c Al Viro                 2015-04-09  4066  	ret = generic_write_checks(iocb, from);
3309dd04cbcd2c Al Viro                 2015-04-09  4067  	if (ret > 0)
21878deb477823 Matthew Wilcox (Oracle  2024-01-30  4068) 		ret = __filemap_write_iter(iocb, from, ops);
5955102c9984fa Al Viro                 2016-01-22  4069  	inode_unlock(inode);
^1da177e4c3f41 Linus Torvalds          2005-04-16  4070  
e259221763a404 Christoph Hellwig       2016-04-07  4071  	if (ret > 0)
e259221763a404 Christoph Hellwig       2016-04-07  4072  		ret = generic_write_sync(iocb, ret);
^1da177e4c3f41 Linus Torvalds          2005-04-16  4073  	return ret;
^1da177e4c3f41 Linus Torvalds          2005-04-16  4074  }
21878deb477823 Matthew Wilcox (Oracle  2024-01-30  4075) 
-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] fs: Introduce buffered_write_operations
  2024-01-30  8:12   ` Christoph Hellwig
@ 2024-02-01  4:36     ` Matthew Wilcox
  2024-02-01  4:42       ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2024-02-01  4:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-ext4
On Tue, Jan 30, 2024 at 09:12:52AM +0100, Christoph Hellwig wrote:
> > +struct buffered_write_operations {
> > +	int (*write_begin)(struct file *, struct address_space *mapping,
> > +			loff_t pos, size_t len, struct folio **foliop,
> > +			void **fsdata);
> > +	int (*write_end)(struct file *, struct address_space *mapping,
> > +			loff_t pos, size_t len, size_t copied,
> > +			struct folio *folio, void **fsdata);
> > +};
> 
> Should write_begin simply return the folio or an ERR_PTR instead of
> the return by reference?
OK, I've done that.  It's a _lot_ more intrusive for the ext4
conversion.  There's a higher risk of bugs.  BUT I think it does end up
looking a bit cleaner.  I also did the same conversion to iomap; ie
-static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
-               size_t len, struct folio **foliop)
+static struct folio *iomap_write_begin(struct iomap_iter *iter, loff_t pos,
+               size_t len)
with corresponding changes.  Again, ends up looking slightly cleaner.
> I also wonder if the fsdata paramter should go away - if a fs needs
> to pass forth and back fsdata, generic/filemap_perform_write is
> probably the wrong abstraction for it.
So I could get rid of fsdata in ext4; that works out fine.
We have three filesystems actually using fsdata (unless they call fsdata
something else ...)
fs/bcachefs/fs-io-buffered.c:   *fsdata = res;
fs/f2fs/compress.c:             *fsdata = cc->rpages;
fs/ocfs2/aops.c:        *fsdata = wc;
bcachefs seems to actually be using it for its intended purpose --
passing a reservation between write_begin and write_end.
f2fs also doesn't seem terribly objectional; passing rpages between
begin & end.
ocfs2 is passing a ocfs2_write_ctxt between the two.
I don't know that it's a win to remove fsdata from these callbacks,
only to duplicate __generic_file_write_iter() into ocfs2,
generic_perform_write() into f2fs and ... er, I don't think bcachefs
uses any of the functions which would call back through
write_begin/write_end.  So maybe that one's dead code?
Anyway, I'm most inclined to leave fsdata andling the way I had it
in v1, unless you have a better suggestion.
^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] fs: Introduce buffered_write_operations
  2024-02-01  4:36     ` Matthew Wilcox
@ 2024-02-01  4:42       ` Christoph Hellwig
  2024-02-02 19:54         ` Matthew Wilcox
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2024-02-01  4:42 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Christoph Hellwig, linux-fsdevel, linux-ext4
On Thu, Feb 01, 2024 at 04:36:09AM +0000, Matthew Wilcox wrote:
> +static struct folio *iomap_write_begin(struct iomap_iter *iter, loff_t pos,
> +               size_t len)
> 
> with corresponding changes.  Again, ends up looking slightly cleaner.
iomap also really needs some tweaks to the naming in the area as a
__foo function calling foo as the default is horrible.  I'll take a
look at your patches and can add that on top.
> f2fs also doesn't seem terribly objectional; passing rpages between
> begin & end.
> 
> ocfs2 is passing a ocfs2_write_ctxt between the two.
Well, it might be the intended purpose, but as-is it is horribly
inefficient - they need to do a dynamic allocation for every
page they iterate over.  So all of them are candidates for an
iterator model that does this allocation once per write.
But maybe this isn't the time to deal with that and we should just leave
it in place.
^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH 3/3] ext4: Convert to buffered_write_operations
  2024-01-30  5:54 ` [PATCH 3/3] ext4: Convert to buffered_write_operations Matthew Wilcox (Oracle)
  2024-01-30  6:13   ` Matthew Wilcox
  2024-01-30  8:13   ` Christoph Hellwig
@ 2024-02-01 15:12   ` kernel test robot
  2 siblings, 0 replies; 12+ messages in thread
From: kernel test robot @ 2024-02-01 15:12 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: oe-lkp, lkp, linux-ext4, Christoph Hellwig,
	Matthew Wilcox (Oracle), linux-fsdevel, oliver.sang
Hello,
kernel test robot noticed "kernel_BUG_at_fs/ext4/inode.c" on:
commit: 06094684ce207c4f2ce3f0d0f541361158579818 ("[PATCH 3/3] ext4: Convert to buffered_write_operations")
url: https://github.com/intel-lab-lkp/linux/commits/Matthew-Wilcox-Oracle/fs-Introduce-buffered_write_operations/20240130-135555
base: https://github.com/kleikamp/linux-shaggy jfs-next
patch link: https://lore.kernel.org/all/20240130055414.2143959-4-willy@infradead.org/
patch subject: [PATCH 3/3] ext4: Convert to buffered_write_operations
in testcase: fxmark
version: fxmark-x86_64-0ce9491-1_20220601
with following parameters:
	disk: 1SSD
	media: ssd
	test: DWAL
	fstype: ext4_no_jnl
	directio: bufferedio
	cpufreq_governor: performance
compiler: gcc-12
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202402012226.578a5c20-oliver.sang@intel.com
[   68.790992][ T6736] ------------[ cut here ]------------
[   68.790993][ T6738] ------------[ cut here ]------------
[   68.790995][ T6738] kernel BUG at fs/ext4/inode.c:1385!
[   68.790998][ T6824] ------------[ cut here ]------------
[   68.791001][ T6680] ------------[ cut here ]------------
[   68.791001][ T6738] invalid opcode: 0000 [#1] SMP NOPTI
[   68.791003][ T6680] kernel BUG at fs/ext4/inode.c:1385!
[   68.791004][ T6738] CPU: 31 PID: 6738 Comm: fxmark Tainted: G S                 6.8.0-rc2-00004-g06094684ce20 #1
[   68.791006][ T6738] Hardware name: Intel Corporation M50CYP2SB1U/M50CYP2SB1U, BIOS SE5C620.86B.01.01.0003.2104260124 04/26/2021
[ 68.791007][ T6738] RIP: 0010:ext4_journalled_write_end (fs/ext4/inode.c:1385) 
[ 68.791014][ T6738] Code: 48 8b 05 c1 5e bf 01 48 85 c0 74 11 48 8b 78 08 48 8b 54 24 10 48 89 de e8 d3 36 02 00 48 81 fd ff 0f 00 00 0f 87 25 fe ff ff <0f> 0b 66 83 bb f2 02 00 00 00 0f 84 27 fe ff ff 4c 8b 44 24 18 48
All code
========
   0:	48 8b 05 c1 5e bf 01 	mov    0x1bf5ec1(%rip),%rax        # 0x1bf5ec8
   7:	48 85 c0             	test   %rax,%rax
   a:	74 11                	je     0x1d
   c:	48 8b 78 08          	mov    0x8(%rax),%rdi
  10:	48 8b 54 24 10       	mov    0x10(%rsp),%rdx
  15:	48 89 de             	mov    %rbx,%rsi
  18:	e8 d3 36 02 00       	callq  0x236f0
  1d:	48 81 fd ff 0f 00 00 	cmp    $0xfff,%rbp
  24:	0f 87 25 fe ff ff    	ja     0xfffffffffffffe4f
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	66 83 bb f2 02 00 00 	cmpw   $0x0,0x2f2(%rbx)
  33:	00 
  34:	0f 84 27 fe ff ff    	je     0xfffffffffffffe61
  3a:	4c 8b 44 24 18       	mov    0x18(%rsp),%r8
  3f:	48                   	rex.W
Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2    
   2:	66 83 bb f2 02 00 00 	cmpw   $0x0,0x2f2(%rbx)
   9:	00 
   a:	0f 84 27 fe ff ff    	je     0xfffffffffffffe37
  10:	4c 8b 44 24 18       	mov    0x18(%rsp),%r8
  15:	48                   	rex.W
[   68.791016][ T6738] RSP: 0018:ffa000000d65fd28 EFLAGS: 00010293
[   68.791019][ T6738] RAX: 0000000000000000 RBX: ff1100011001d870 RCX: 0000000000001000
[   68.791020][ T6738] RDX: 0000000000000000 RSI: ff1100011001d9e8 RDI: ff110001088edf00
[   68.791021][ T6738] RBP: 0000000000000001 R08: 0000000000001000 R09: ffd4000006a4ac00
[   68.791022][ T6738] R10: ff110001097e0548 R11: ffd4000006a4ac00 R12: 0000000000001000
[   68.791023][ T6738] R13: 0000000000001000 R14: ffffffff8245b350 R15: ffa000000d65fe68
[   68.791023][ T6738] FS:  00007f540d431600(0000) GS:ff1100103f9c0000(0000) knlGS:0000000000000000
[   68.791025][ T6738] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   68.791026][ T6738] CR2: 000055b84bf3f000 CR3: 00000001e7bd0002 CR4: 0000000000771ef0
[   68.791027][ T6738] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   68.791027][ T6738] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   68.791028][ T6738] PKRU: 55555554
[   68.791029][ T6738] Call Trace:
[   68.791029][ T6853] ------------[ cut here ]------------
[   68.791031][ T6738]  <TASK>
[   68.791031][ T6853] kernel BUG at fs/ext4/inode.c:1385!
[ 68.791033][ T6738] ? die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434 arch/x86/kernel/dumpstack.c:447) 
[ 68.791038][ T6738] ? do_trap (arch/x86/kernel/traps.c:113 arch/x86/kernel/traps.c:154) 
[   68.791040][ T6820] ------------[ cut here ]------------
[ 68.791040][ T6738] ? ext4_journalled_write_end (fs/ext4/inode.c:1385) 
[   68.791041][ T6820] kernel BUG at fs/ext4/inode.c:1385!
[ 68.791043][ T6738] ? do_error_trap (arch/x86/include/asm/traps.h:58 arch/x86/kernel/traps.c:175) 
[   68.791044][ T6823] ------------[ cut here ]------------
[   68.791045][ T6674] ------------[ cut here ]------------
[ 68.791045][ T6738] ? ext4_journalled_write_end (fs/ext4/inode.c:1385) 
[   68.791047][ T6755] ------------[ cut here ]------------
[   68.791047][ T6823] kernel BUG at fs/ext4/inode.c:1385!
[   68.791048][ T6674] kernel BUG at fs/ext4/inode.c:1385!
[   68.791049][ T6755] kernel BUG at fs/ext4/inode.c:1385!
[ 68.791048][ T6738] ? exc_invalid_op (arch/x86/kernel/traps.c:266) 
[ 68.791053][ T6738] ? ext4_journalled_write_end (fs/ext4/inode.c:1385) 
[   68.791054][ T6793] ------------[ cut here ]------------
[   68.791056][ T6763] ------------[ cut here ]------------
[   68.791056][ T6793] kernel BUG at fs/ext4/inode.c:1385!
[   68.791057][ T6763] kernel BUG at fs/ext4/inode.c:1385!
[   68.791059][ T6669] ------------[ cut here ]------------
[ 68.791055][ T6738] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:568) 
[   68.791061][ T6669] kernel BUG at fs/ext4/inode.c:1385!
[ 68.791062][ T6738] ? ext4_journalled_write_end (fs/ext4/inode.c:1385) 
[   68.791063][ T6774] ------------[ cut here ]------------
[   68.791064][ T6774] kernel BUG at fs/ext4/inode.c:1385!
[ 68.791065][ T6738] filemap_perform_write (mm/filemap.c:3951) 
[   68.791068][ T6765] ------------[ cut here ]------------
[   68.791070][ T6765] kernel BUG at fs/ext4/inode.c:1385!
[   68.791072][ T6777] ------------[ cut here ]------------
[   68.791072][ T6809] ------------[ cut here ]------------
[   68.791073][ T6777] kernel BUG at fs/ext4/inode.c:1385!
[ 68.791070][ T6738] ? generic_update_time (fs/inode.c:1907) 
[   68.791073][ T6809] kernel BUG at fs/ext4/inode.c:1385!
[   68.791074][ T6828] ------------[ cut here ]------------
[   68.791076][ T6841] ------------[ cut here ]------------
[   68.791076][ T6671] ------------[ cut here ]------------
[   68.791077][ T6828] kernel BUG at fs/ext4/inode.c:1385!
[   68.791077][ T6841] kernel BUG at fs/ext4/inode.c:1385!
[   68.791078][ T6786] ------------[ cut here ]------------
[ 68.791078][ T6738] ext4_buffered_write_iter (include/linux/fs.h:807 fs/ext4/file.c:310) 
[   68.791081][ T6786] kernel BUG at fs/ext4/inode.c:1385!
[   68.791081][ T6671] kernel BUG at fs/ext4/inode.c:1385!
[ 68.791082][ T6738] vfs_write (include/linux/fs.h:2085 fs/read_write.c:497 fs/read_write.c:590) 
[ 68.791086][ T6738] ksys_write (fs/read_write.c:643) 
[   68.791087][ T6807] ------------[ cut here ]------------
[   68.791088][ T6807] kernel BUG at fs/ext4/inode.c:1385!
[ 68.791089][ T6738] do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83) 
[   68.791093][ T6737] ------------[ cut here ]------------
[   68.791094][ T6804] ------------[ cut here ]------------
[   68.791094][ T6737] kernel BUG at fs/ext4/inode.c:1385!
[   68.791094][ T6705] ------------[ cut here ]------------
[ 68.791094][ T6738] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) 
[   68.791098][ T6738] RIP: 0033:0x7f540d33a473
[   68.791100][ T6804] kernel BUG at fs/ext4/inode.c:1385!
[ 68.791100][ T6738] Code: 8b 15 21 2a 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
All code
========
   0:	8b 15 21 2a 0e 00    	mov    0xe2a21(%rip),%edx        # 0xe2a27
   6:	f7 d8                	neg    %eax
   8:	64 89 02             	mov    %eax,%fs:(%rdx)
   b:	48 c7 c0 ff ff ff ff 	mov    $0xffffffffffffffff,%rax
  12:	eb b7                	jmp    0xffffffffffffffcb
  14:	0f 1f 00             	nopl   (%rax)
  17:	64 8b 04 25 18 00 00 	mov    %fs:0x18,%eax
  1e:	00 
  1f:	85 c0                	test   %eax,%eax
  21:	75 14                	jne    0x37
  23:	b8 01 00 00 00       	mov    $0x1,%eax
  28:	0f 05                	syscall 
  2a:*	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax		<-- trapping instruction
  30:	77 55                	ja     0x87
  32:	c3                   	retq   
  33:	0f 1f 40 00          	nopl   0x0(%rax)
  37:	48 83 ec 28          	sub    $0x28,%rsp
  3b:	48 89 54 24 18       	mov    %rdx,0x18(%rsp)
Code starting with the faulting instruction
===========================================
   0:	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax
   6:	77 55                	ja     0x5d
   8:	c3                   	retq   
   9:	0f 1f 40 00          	nopl   0x0(%rax)
   d:	48 83 ec 28          	sub    $0x28,%rsp
  11:	48 89 54 24 18       	mov    %rdx,0x18(%rsp)
[   68.791103][ T6738] RSP: 002b:00007ffe5f9adbb8 EFLAGS: 00000246
[   68.791103][ T6684] ------------[ cut here ]------------
[   68.791105][ T6738]  ORIG_RAX: 0000000000000001
[   68.791105][ T6684] kernel BUG at fs/ext4/inode.c:1385!
[   68.791106][ T6738] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f540d33a473
[   68.791106][ T6705] kernel BUG at fs/ext4/inode.c:1385!
[   68.791107][ T6738] RDX: 0000000000001000 RSI: 000055b84bf3f000 RDI: 0000000000000003
[   68.791109][ T6738] RBP: 00007f540d424000 R08: 000055b84bf3f000 R09: 00007f540d41dbe0
[   68.791109][ T6700] ------------[ cut here ]------------
[   68.791110][ T6738] R10: fffffffffffffa2e R11: 0000000000000246 R12: 000055b84bf3f000
[   68.791110][ T6700] kernel BUG at fs/ext4/inode.c:1385!
[   68.791111][ T6738] R13: 0000000000000003 R14: 00007f540d42dfc0 R15: 0000000000000000
[   68.791113][ T6738]  </TASK>
[   68.791114][ T6738] Modules linked in: loop btrfs blake2b_generic xor
[   68.791117][ T6718] ------------[ cut here ]------------
[   68.791118][ T6673] ------------[ cut here ]------------
[   68.791118][ T6688] ------------[ cut here ]------------
[   68.791118][ T6738]  raid6_pq
[   68.791120][ T6718] kernel BUG at fs/ext4/inode.c:1385!
[   68.791121][ T6673] kernel BUG at fs/ext4/inode.c:1385!
[   68.791122][ T6688] kernel BUG at fs/ext4/inode.c:1385!
[   68.791122][ T6738]  libcrc32c sd_mod t10_pi
[   68.791126][ T6715] ------------[ cut here ]------------
[   68.791126][ T6850] ------------[ cut here ]------------
[   68.791126][ T6738]  crc64_rocksoft_generic
[   68.791127][ T6715] kernel BUG at fs/ext4/inode.c:1385!
[   68.791129][ T6738]  crc64_rocksoft
[   68.791129][ T6850] kernel BUG at fs/ext4/inode.c:1385!
[   68.791131][ T6822] ------------[ cut here ]------------
[   68.791131][ T6738]  crc64
[   68.791133][ T6799] ------------[ cut here ]------------
[   68.791134][ T6738]  sg
[   68.791134][ T6822] kernel BUG at fs/ext4/inode.c:1385!
[   68.791136][ T6799] kernel BUG at fs/ext4/inode.c:1385!
[   68.791137][ T6738]  intel_rapl_msr
[   68.791138][ T6805] ------------[ cut here ]------------
[   68.791140][ T6738]  intel_rapl_common
[   68.791140][ T6805] kernel BUG at fs/ext4/inode.c:1385!
[   68.791142][ T6738]  x86_pkg_temp_thermal
[   68.791143][ T6761] ------------[ cut here ]------------
[   68.791144][ T6738]  intel_powerclamp
[   68.791145][ T6761] kernel BUG at fs/ext4/inode.c:1385!
[   68.791147][ T6738]  coretemp
[   68.791147][ T6692] ------------[ cut here ]------------
[   68.791149][ T6738]  kvm_intel kvm
[   68.791150][ T6692] kernel BUG at fs/ext4/inode.c:1385!
[   68.791151][ T6710] ------------[ cut here ]------------
[   68.791152][ T6722] ------------[ cut here ]------------
[   68.791151][ T6738]  irqbypass
[   68.791154][ T6710] kernel BUG at fs/ext4/inode.c:1385!
[   68.791155][ T6738]  crct10dif_pclmul
[   68.791155][ T6722] kernel BUG at fs/ext4/inode.c:1385!
[   68.791158][ T6738]  crc32_pclmul crc32c_intel
[   68.791160][ T6780] ------------[ cut here ]------------
[   68.791161][ T6716] ------------[ cut here ]------------
[   68.791161][ T6738]  ghash_clmulni_intel
[   68.791162][ T6685] ------------[ cut here ]------------
[   68.791163][ T6767] ------------[ cut here ]------------
[   68.791163][ T6780] kernel BUG at fs/ext4/inode.c:1385!
[   68.791164][ T6732] ------------[ cut here ]------------
[   68.791165][ T6738]  ipmi_ssif
[   68.791165][ T6857] ------------[ cut here ]------------
[   68.791165][ T6672] ------------[ cut here ]------------
[   68.791165][ T6716] kernel BUG at fs/ext4/inode.c:1385!
[   68.791166][ T6811] ------------[ cut here ]------------
[   68.791166][ T6689] ------------[ cut here ]------------
[   68.791165][ T6795] ------------[ cut here ]------------
[   68.791167][ T6816] ------------[ cut here ]------------
[   68.791165][ T6677] ------------[ cut here ]------------
[   68.791168][ T6667] ------------[ cut here ]------------
[   68.791169][ T6732] kernel BUG at fs/ext4/inode.c:1385!
[   68.791170][ T6746] ------------[ cut here ]------------
[   68.791170][ T6757] ------------[ cut here ]------------
[   68.791171][ T6845] ------------[ cut here ]------------
[   68.791170][ T6848] ------------[ cut here ]------------
[   68.791172][ T6749] ------------[ cut here ]------------
[   68.791173][ T6670] ------------[ cut here ]------------
[   68.791172][ T6816] kernel BUG at fs/ext4/inode.c:1385!
[   68.791173][ T6790] ------------[ cut here ]------------
[   68.791174][ T6751] ------------[ cut here ]------------
[   68.791172][ T6753] ------------[ cut here ]------------
[   68.791172][ T6758] ------------[ cut here ]------------
[   68.791174][ T6682] ------------[ cut here ]------------
[   68.791174][ T6760] ------------[ cut here ]------------
[   68.791175][ T6746] kernel BUG at fs/ext4/inode.c:1385!
[   68.791171][ T6725] ------------[ cut here ]------------
[   68.791172][ T6839] ------------[ cut here ]------------
[   68.791173][ T6696] ------------[ cut here ]------------
[   68.791174][ T6723] ------------[ cut here ]------------
[   68.791172][ T6668] ------------[ cut here ]------------
[   68.791174][ T6833] ------------[ cut here ]------------
[   68.791177][ T6689] kernel BUG at fs/ext4/inode.c:1385!
[   68.791178][ T6848] kernel BUG at fs/ext4/inode.c:1385!
[   68.791179][ T6723] kernel BUG at fs/ext4/inode.c:1385!
[   68.791182][ T6685] kernel BUG at fs/ext4/inode.c:1385!
[   68.791181][ T6725] kernel BUG at fs/ext4/inode.c:1385!
[   68.791182][ T6672] kernel BUG at fs/ext4/inode.c:1385!
[   68.791183][ T6790] kernel BUG at fs/ext4/inode.c:1385!
[   68.791184][ T6749] kernel BUG at fs/ext4/inode.c:1385!
[   68.791186][ T6758] kernel BUG at fs/ext4/inode.c:1385!
[   68.791186][ T6767] kernel BUG at fs/ext4/inode.c:1385!
[   68.791187][ T6795] kernel BUG at fs/ext4/inode.c:1385!
[   68.791189][ T6682] kernel BUG at fs/ext4/inode.c:1385!
[   68.791187][ T6757] kernel BUG at fs/ext4/inode.c:1385!
[   68.791190][ T6670] kernel BUG at fs/ext4/inode.c:1385!
[   68.791191][ T6839] kernel BUG at fs/ext4/inode.c:1385!
[   68.791190][ T6668] kernel BUG at fs/ext4/inode.c:1385!
[   68.791194][ T6751] kernel BUG at fs/ext4/inode.c:1385!
[   68.791195][ T6811] kernel BUG at fs/ext4/inode.c:1385!
[   68.791196][ T6760] kernel BUG at fs/ext4/inode.c:1385!
[   68.791196][ T6753] kernel BUG at fs/ext4/inode.c:1385!
[   68.791197][ T6857] kernel BUG at fs/ext4/inode.c:1385!
[   68.791200][ T6833] kernel BUG at fs/ext4/inode.c:1385!
[   68.791200][ T6667] kernel BUG at fs/ext4/inode.c:1385!
[   68.791201][ T6677] kernel BUG at fs/ext4/inode.c:1385!
[   68.791203][ T6696] kernel BUG at fs/ext4/inode.c:1385!
[   68.791204][ T6738]  sha512_ssse3
[   68.791205][ T6845] kernel BUG at fs/ext4/inode.c:1385!
[   68.791206][ T6738]  rapl
[   68.791210][ T6703] ------------[ cut here ]------------
[   68.791214][ T6738]  ahci
[   68.791214][ T6703] kernel BUG at fs/ext4/inode.c:1385!
[   68.791219][ T6738]  ast intel_cstate acpi_ipmi mei_me libahci drm_shmem_helper ioatdma intel_uncore i2c_i801 ipmi_si dax_hmem libata drm_kms_helper mei joydev i2c_smbus intel_pch_thermal dca wmi ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter drm fuse ip_tables
[   68.791234][ T6680] invalid opcode: 0000 [#2] SMP NOPTI
[   68.791237][ T6680] CPU: 9 PID: 6680 Comm: fxmark Tainted: G S    D            6.8.0-rc2-00004-g06094684ce20 #1
[   68.791239][ T6680] Hardware name: Intel Corporation M50CYP2SB1U/M50CYP2SB1U, BIOS SE5C620.86B.01.01.0003.2104260124 04/26/2021
[   68.791242][ T6738] ---[ end trace 0000000000000000 ]---
[ 68.791240][ T6680] RIP: 0010:ext4_journalled_write_end (fs/ext4/inode.c:1385) 
[ 68.791244][ T6680] Code: 48 8b 05 c1 5e bf 01 48 85 c0 74 11 48 8b 78 08 48 8b 54 24 10 48 89 de e8 d3 36 02 00 48 81 fd ff 0f 00 00 0f 87 25 fe ff ff <0f> 0b 66 83 bb f2 02 00 00 00 0f 84 27 fe ff ff 4c 8b 44 24 18 48
All code
========
   0:	48 8b 05 c1 5e bf 01 	mov    0x1bf5ec1(%rip),%rax        # 0x1bf5ec8
   7:	48 85 c0             	test   %rax,%rax
   a:	74 11                	je     0x1d
   c:	48 8b 78 08          	mov    0x8(%rax),%rdi
  10:	48 8b 54 24 10       	mov    0x10(%rsp),%rdx
  15:	48 89 de             	mov    %rbx,%rsi
  18:	e8 d3 36 02 00       	callq  0x236f0
  1d:	48 81 fd ff 0f 00 00 	cmp    $0xfff,%rbp
  24:	0f 87 25 fe ff ff    	ja     0xfffffffffffffe4f
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	66 83 bb f2 02 00 00 	cmpw   $0x0,0x2f2(%rbx)
  33:	00 
  34:	0f 84 27 fe ff ff    	je     0xfffffffffffffe61
  3a:	4c 8b 44 24 18       	mov    0x18(%rsp),%r8
  3f:	48                   	rex.W
Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2    
   2:	66 83 bb f2 02 00 00 	cmpw   $0x0,0x2f2(%rbx)
   9:	00 
   a:	0f 84 27 fe ff ff    	je     0xfffffffffffffe37
  10:	4c 8b 44 24 18       	mov    0x18(%rsp),%r8
  15:	48                   	rex.W
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240201/202402012226.578a5c20-oliver.sang@intel.com
-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] fs: Introduce buffered_write_operations
  2024-02-01  4:42       ` Christoph Hellwig
@ 2024-02-02 19:54         ` Matthew Wilcox
  0 siblings, 0 replies; 12+ messages in thread
From: Matthew Wilcox @ 2024-02-02 19:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, linux-ext4
On Thu, Feb 01, 2024 at 05:42:46AM +0100, Christoph Hellwig wrote:
> On Thu, Feb 01, 2024 at 04:36:09AM +0000, Matthew Wilcox wrote:
> > +static struct folio *iomap_write_begin(struct iomap_iter *iter, loff_t pos,
> > +               size_t len)
> > 
> > with corresponding changes.  Again, ends up looking slightly cleaner.
> 
> iomap also really needs some tweaks to the naming in the area as a
> __foo function calling foo as the default is horrible.  I'll take a
> look at your patches and can add that on top.
> 
> > f2fs also doesn't seem terribly objectional; passing rpages between
> > begin & end.
> > 
> > ocfs2 is passing a ocfs2_write_ctxt between the two.
> 
> Well, it might be the intended purpose, but as-is it is horribly
> inefficient - they need to do a dynamic allocation for every
> page they iterate over.  So all of them are candidates for an
> iterator model that does this allocation once per write.
Oh, I see what you mean.  What I could do is pass in the fsdata,
like this.
commit 753c7d2d62e1
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Mon Jan 29 23:20:34 2024 -0500
    fs: Introduce buffered_write_operations
    
    Start the process of moving write_begin and write_end out from the
    address_space_operations to their own struct.
    
    The new write_begin returns the folio or an ERR_PTR instead of returning
    the folio by reference.  It also accepts len as a size_t and (as
    documented) the len may be larger than PAGE_SIZE.
    
    Pass an optional buffered_write_operations pointer to various functions
    in filemap.c.  The old names are available as macros for now, except
    for generic_file_write_iter() which is used as a function pointer by
    many filesystems.  If using the new functions, the filesystem can have
    per-operation fsdata instead of per-page fsdata.
    
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 2df35e65557d..a79c7f15ca9f 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -18,6 +18,27 @@
 
 struct folio_batch;
 
+struct buffered_write_operations {
+	struct folio *(*write_begin)(struct file *, struct address_space *,
+			loff_t pos, size_t len, void **fsdata);
+	size_t (*write_end)(struct file *, struct address_space *,
+			loff_t pos, size_t len, size_t copied,
+			struct folio *folio, void **fsdata);
+};
+
+ssize_t filemap_write_iter(struct kiocb *, struct iov_iter *,
+		const struct buffered_write_operations *, void *fsdata);
+ssize_t __filemap_write_iter(struct kiocb *, struct iov_iter *,
+		const struct buffered_write_operations *, void *fsdata);
+ssize_t filemap_perform_write(struct kiocb *, struct iov_iter *,
+		const struct buffered_write_operations *, void *fsdata);
+
+ssize_t generic_file_write_iter(struct kiocb *, struct iov_iter *);
+#define generic_perform_write(kiocb, iter)		\
+	filemap_perform_write(kiocb, iter, NULL, NULL)
+#define __generic_file_write_iter(kiocb, iter)		\
+	__filemap_write_iter(kiocb, iter, NULL, NULL)
+
 unsigned long invalidate_mapping_pages(struct address_space *mapping,
 					pgoff_t start, pgoff_t end);
 
diff --git a/mm/filemap.c b/mm/filemap.c
index 750e779c23db..214266aeaca5 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -95,7 +95,7 @@
  *    ->invalidate_lock		(filemap_fault)
  *      ->lock_page		(filemap_fault, access_process_vm)
  *
- *  ->i_rwsem			(generic_perform_write)
+ *  ->i_rwsem			(filemap_perform_write)
  *    ->mmap_lock		(fault_in_readable->do_page_fault)
  *
  *  bdi->wb.list_lock
@@ -3890,7 +3890,8 @@ generic_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
 }
 EXPORT_SYMBOL(generic_file_direct_write);
 
-ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
+ssize_t filemap_perform_write(struct kiocb *iocb, struct iov_iter *i,
+		const struct buffered_write_operations *ops, void *fsdata)
 {
 	struct file *file = iocb->ki_filp;
 	loff_t pos = iocb->ki_pos;
@@ -3900,11 +3901,10 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
 	ssize_t written = 0;
 
 	do {
-		struct page *page;
-		unsigned long offset;	/* Offset into pagecache page */
-		unsigned long bytes;	/* Bytes to write to page */
+		struct folio *folio;
+		size_t offset;		/* Offset into pagecache folio */
+		size_t bytes;		/* Bytes to write to page */
 		size_t copied;		/* Bytes copied from user */
-		void *fsdata = NULL;
 
 		offset = (pos & (PAGE_SIZE - 1));
 		bytes = min_t(unsigned long, PAGE_SIZE - offset,
@@ -3927,19 +3927,33 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
 			break;
 		}
 
-		status = a_ops->write_begin(file, mapping, pos, bytes,
+		if (ops) {
+			folio = ops->write_begin(file, mapping, pos, bytes, &fsdata);
+			if (IS_ERR(folio)) {
+				status = PTR_ERR(folio);
+				break;
+			}
+		} else {
+			struct page *page;
+			status = a_ops->write_begin(file, mapping, pos, bytes,
 						&page, &fsdata);
-		if (unlikely(status < 0))
-			break;
+			if (unlikely(status < 0))
+				break;
+			folio = page_folio(page);
+		}
 
 		if (mapping_writably_mapped(mapping))
-			flush_dcache_page(page);
+			flush_dcache_folio(folio);
 
-		copied = copy_page_from_iter_atomic(page, offset, bytes, i);
-		flush_dcache_page(page);
+		copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
+		flush_dcache_folio(folio);
 
-		status = a_ops->write_end(file, mapping, pos, bytes, copied,
-						page, fsdata);
+		if (ops)
+			status = ops->write_end(file, mapping, pos, bytes,
+					copied, folio, &fsdata);
+		else
+			status = a_ops->write_end(file, mapping, pos, bytes,
+					copied, &folio->page, fsdata);
 		if (unlikely(status != copied)) {
 			iov_iter_revert(i, copied - max(status, 0L));
 			if (unlikely(status < 0))
@@ -3969,12 +3983,13 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
 	iocb->ki_pos += written;
 	return written;
 }
-EXPORT_SYMBOL(generic_perform_write);
+EXPORT_SYMBOL(filemap_perform_write);
 
 /**
- * __generic_file_write_iter - write data to a file
- * @iocb:	IO state structure (file, offset, etc.)
- * @from:	iov_iter with data to write
+ * __filemap_write_iter - write data to a file
+ * @iocb: IO state structure (file, offset, etc.)
+ * @from: iov_iter with data to write
+ * @ops: How to inform the filesystem that a write is starting/finishing.
  *
  * This function does all the work needed for actually writing data to a
  * file. It does all basic checks, removes SUID from the file, updates
@@ -3992,7 +4007,8 @@ EXPORT_SYMBOL(generic_perform_write);
  * * number of bytes written, even for truncated writes
  * * negative error code if no data has been written at all
  */
-ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
+ssize_t __filemap_write_iter(struct kiocb *iocb, struct iov_iter *from,
+		const struct buffered_write_operations *ops, void *fsdata)
 {
 	struct file *file = iocb->ki_filp;
 	struct address_space *mapping = file->f_mapping;
@@ -4019,27 +4035,29 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		if (ret < 0 || !iov_iter_count(from) || IS_DAX(inode))
 			return ret;
 		return direct_write_fallback(iocb, from, ret,
-				generic_perform_write(iocb, from));
+				filemap_perform_write(iocb, from, ops, fsdata));
 	}
 
-	return generic_perform_write(iocb, from);
+	return filemap_perform_write(iocb, from, ops, fsdata);
 }
-EXPORT_SYMBOL(__generic_file_write_iter);
+EXPORT_SYMBOL(__filemap_write_iter);
 
 /**
- * generic_file_write_iter - write data to a file
+ * filemap_write_iter - write data to a file
  * @iocb:	IO state structure
  * @from:	iov_iter with data to write
  *
- * This is a wrapper around __generic_file_write_iter() to be used by most
+ * This is a wrapper around __filemap_write_iter() to be used by most
  * filesystems. It takes care of syncing the file in case of O_SYNC file
  * and acquires i_rwsem as needed.
+ *
  * Return:
- * * negative error code if no data has been written at all of
+ * * negative error code if no data has been written at all or if
  *   vfs_fsync_range() failed for a synchronous write
  * * number of bytes written, even for truncated writes
  */
-ssize_t generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
+ssize_t filemap_write_iter(struct kiocb *iocb, struct iov_iter *from,
+		const struct buffered_write_operations *ops, void *fsdata)
 {
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file->f_mapping->host;
@@ -4048,13 +4066,18 @@ ssize_t generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	inode_lock(inode);
 	ret = generic_write_checks(iocb, from);
 	if (ret > 0)
-		ret = __generic_file_write_iter(iocb, from);
+		ret = __filemap_write_iter(iocb, from, ops, fsdata);
 	inode_unlock(inode);
 
 	if (ret > 0)
 		ret = generic_write_sync(iocb, ret);
 	return ret;
 }
+
+ssize_t generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
+{
+	return filemap_write_iter(iocb, from, NULL, NULL);
+}
 EXPORT_SYMBOL(generic_file_write_iter);
 
 /**
^ permalink raw reply related	[flat|nested] 12+ messages in thread
end of thread, other threads:[~2024-02-02 19:54 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-30  5:54 [PATCH 0/3] Start moving write_begin/write_end out of aops Matthew Wilcox (Oracle)
2024-01-30  5:54 ` [PATCH 1/3] fs: Introduce buffered_write_operations Matthew Wilcox (Oracle)
2024-01-30  8:12   ` Christoph Hellwig
2024-02-01  4:36     ` Matthew Wilcox
2024-02-01  4:42       ` Christoph Hellwig
2024-02-02 19:54         ` Matthew Wilcox
2024-01-31  3:14   ` kernel test robot
2024-01-30  5:54 ` [PATCH 2/3] fs: Supply optional buffered_write_operations in buffer.c Matthew Wilcox (Oracle)
2024-01-30  5:54 ` [PATCH 3/3] ext4: Convert to buffered_write_operations Matthew Wilcox (Oracle)
2024-01-30  6:13   ` Matthew Wilcox
2024-01-30  8:13   ` Christoph Hellwig
2024-02-01 15:12   ` kernel test robot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).