linux-fsdevel.vger.kernel.org archive mirror
* [PATCH v3 0/7] Page I/O
From: Matthew Wilcox @ 2014-04-13 22:59 UTC
  To: linux-fsdevel, linux-mm, linux-kernel, Andrew Morton
  Cc: Matthew Wilcox, willy

Hi Andrew,

Now that 3.15-rc1 is out, could you queue these patches for 3.16 please?
Patches 1-3 & 7 are, IMO, worthwhile cleanups / bug fixes, regardless
of the rest of the patch set.

If this patch series gets in, I'll take care of including the NVMe
driver piece.  It'll be a bit more tricky than the proof of concept that
I've been flashing around because we have to make sure that the device
responds better to page sized I/Os than accumulating larger I/Os.

It's indisputably a win for brd and for other NVM technology devices
that are accessed synchronously rather than through DMA.

Matthew Wilcox (7):
  Remove block_write_full_page_endio()
  Factor clean_buffers() out of __mpage_writepage()
  Factor page_endio() out of mpage_end_io()
  Add bdev_read_page() and bdev_write_page()
  swap: Use bdev_read_page() / bdev_write_page()
  brd: Add support for rw_page
  brd: Return -ENOSPC rather than -ENOMEM on page allocation failure

 drivers/block/brd.c         | 16 +++++++--
 fs/block_dev.c              | 63 ++++++++++++++++++++++++++++++++++
 fs/buffer.c                 | 21 +++---------
 fs/ext4/page-io.c           |  2 +-
 fs/mpage.c                  | 84 +++++++++++++++++++++++----------------------
 fs/ocfs2/file.c             |  2 +-
 include/linux/blkdev.h      |  4 +++
 include/linux/buffer_head.h |  2 --
 include/linux/pagemap.h     |  2 ++
 mm/filemap.c                | 25 ++++++++++++++
 mm/page_io.c                | 23 +++++++++++--
 11 files changed, 178 insertions(+), 66 deletions(-)

-- 
1.9.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* [PATCH v3 1/7] Remove block_write_full_page_endio()
From: Matthew Wilcox @ 2014-04-13 22:59 UTC
  To: linux-fsdevel, linux-mm, linux-kernel, Andrew Morton
  Cc: Matthew Wilcox, willy

The last in-tree caller of block_write_full_page_endio() was
removed in January 2013.  It's time to remove the EXPORT_SYMBOL,
which leaves block_write_full_page() as the only caller of
block_write_full_page_endio(), so inline block_write_full_page_endio()
into block_write_full_page().

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
---
 fs/buffer.c                 | 21 +++++----------------
 fs/ext4/page-io.c           |  2 +-
 fs/ocfs2/file.c             |  2 +-
 include/linux/buffer_head.h |  2 --
 4 files changed, 7 insertions(+), 20 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 9ddb9fc..7b5bb90 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2879,10 +2879,9 @@ EXPORT_SYMBOL(block_truncate_page);
 
 /*
  * The generic ->writepage function for buffer-backed address_spaces
- * this form passes in the end_io handler used to finish the IO.
  */
-int block_write_full_page_endio(struct page *page, get_block_t *get_block,
-			struct writeback_control *wbc, bh_end_io_t *handler)
+int block_write_full_page(struct page *page, get_block_t *get_block,
+			struct writeback_control *wbc)
 {
 	struct inode * const inode = page->mapping->host;
 	loff_t i_size = i_size_read(inode);
@@ -2892,7 +2891,7 @@ int block_write_full_page_endio(struct page *page, get_block_t *get_block,
 	/* Is the page fully inside i_size? */
 	if (page->index < end_index)
 		return __block_write_full_page(inode, page, get_block, wbc,
-					       handler);
+					       end_buffer_async_write);
 
 	/* Is the page fully outside i_size? (truncate in progress) */
 	offset = i_size & (PAGE_CACHE_SIZE-1);
@@ -2915,18 +2914,8 @@ int block_write_full_page_endio(struct page *page, get_block_t *get_block,
 	 * writes to that region are not written out to the file."
 	 */
 	zero_user_segment(page, offset, PAGE_CACHE_SIZE);
-	return __block_write_full_page(inode, page, get_block, wbc, handler);
-}
-EXPORT_SYMBOL(block_write_full_page_endio);
-
-/*
- * The generic ->writepage function for buffer-backed address_spaces
- */
-int block_write_full_page(struct page *page, get_block_t *get_block,
-			struct writeback_control *wbc)
-{
-	return block_write_full_page_endio(page, get_block, wbc,
-					   end_buffer_async_write);
+	return __block_write_full_page(inode, page, get_block, wbc,
+							end_buffer_async_write);
 }
 EXPORT_SYMBOL(block_write_full_page);
 
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index ab95508..11c2ba5 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -428,7 +428,7 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 		block_start = bh_offset(bh);
 		if (block_start >= len) {
 			/*
-			 * Comments copied from block_write_full_page_endio:
+			 * Comments copied from block_write_full_page:
 			 *
 			 * The page straddles i_size.  It must be zeroed out on
 			 * each and every writepage invocation because it may
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 8970dcf..8eb6e57 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -828,7 +828,7 @@ static int ocfs2_write_zero_page(struct inode *inode, u64 abs_from,
 		/*
 		 * fs-writeback will release the dirty pages without page lock
 		 * whose offset are over inode size, the release happens at
-		 * block_write_full_page_endio().
+		 * block_write_full_page().
 		 */
 		i_size_write(inode, abs_to);
 		inode->i_blocks = ocfs2_inode_sector_count(inode);
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index c40302f..e05c7ec 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -207,8 +207,6 @@ void block_invalidatepage(struct page *page, unsigned int offset,
 			  unsigned int length);
 int block_write_full_page(struct page *page, get_block_t *get_block,
 				struct writeback_control *wbc);
-int block_write_full_page_endio(struct page *page, get_block_t *get_block,
-			struct writeback_control *wbc, bh_end_io_t *handler);
 int block_read_full_page(struct page*, get_block_t*);
 int block_is_partially_uptodate(struct page *page, unsigned long from,
 				unsigned long count);
-- 
1.9.1

* [PATCH v3 2/7] Factor clean_buffers() out of __mpage_writepage()
From: Matthew Wilcox @ 2014-04-13 22:59 UTC
  To: linux-fsdevel, linux-mm, linux-kernel, Andrew Morton
  Cc: Matthew Wilcox, willy

__mpage_writepage() is over 200 lines long, has 20 local variables,
four goto labels and could desperately use simplification.  Splitting
clean_buffers() into a helper function improves matters a little,
removing 20+ lines from it.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
---
 fs/mpage.c | 54 ++++++++++++++++++++++++++++++------------------------
 1 file changed, 30 insertions(+), 24 deletions(-)

diff --git a/fs/mpage.c b/fs/mpage.c
index 4979ffa..4cc9c5d 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -439,6 +439,35 @@ struct mpage_data {
 	unsigned use_writepage;
 };
 
+/*
+ * We have our BIO, so we can now mark the buffers clean.  Make
+ * sure to only clean buffers which we know we'll be writing.
+ */
+static void clean_buffers(struct page *page, unsigned first_unmapped)
+{
+	unsigned buffer_counter = 0;
+	struct buffer_head *bh, *head;
+	if (!page_has_buffers(page))
+		return;
+	head = page_buffers(page);
+	bh = head;
+
+	do {
+		if (buffer_counter++ == first_unmapped)
+			break;
+		clear_buffer_dirty(bh);
+		bh = bh->b_this_page;
+	} while (bh != head);
+
+	/*
+	 * we cannot drop the bh if the page is not uptodate or a concurrent
+	 * readpage would fail to serialize with the bh and it would read from
+	 * disk before we reach the platter.
+	 */
+	if (buffer_heads_over_limit && PageUptodate(page))
+		try_to_free_buffers(page);
+}
+
 static int __mpage_writepage(struct page *page, struct writeback_control *wbc,
 		      void *data)
 {
@@ -591,30 +620,7 @@ alloc_new:
 		goto alloc_new;
 	}
 
-	/*
-	 * OK, we have our BIO, so we can now mark the buffers clean.  Make
-	 * sure to only clean buffers which we know we'll be writing.
-	 */
-	if (page_has_buffers(page)) {
-		struct buffer_head *head = page_buffers(page);
-		struct buffer_head *bh = head;
-		unsigned buffer_counter = 0;
-
-		do {
-			if (buffer_counter++ == first_unmapped)
-				break;
-			clear_buffer_dirty(bh);
-			bh = bh->b_this_page;
-		} while (bh != head);
-
-		/*
-		 * we cannot drop the bh if the page is not uptodate
-		 * or a concurrent readpage would fail to serialize with the bh
-		 * and it would read from disk before we reach the platter.
-		 */
-		if (buffer_heads_over_limit && PageUptodate(page))
-			try_to_free_buffers(page);
-	}
+	clean_buffers(page, first_unmapped);
 
 	BUG_ON(PageWriteback(page));
 	set_page_writeback(page);
-- 
1.9.1
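
For readers who want to poke at the loop outside the kernel, here is a
minimal userspace sketch of the walk that clean_buffers() performs.
struct toy_bh and toy_clean_buffers() are hypothetical stand-ins, not
kernel API; the sketch models only the circular b_this_page list and the
first_unmapped cut-off, and leaves out the try_to_free_buffers() step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct buffer_head: just what the walk needs. */
struct toy_bh {
	bool dirty;
	struct toy_bh *b_this_page;	/* next buffer on the page; the list is circular */
};

/*
 * Clear the dirty flag on the first `first_unmapped` buffers of the ring
 * that starts at `head`.  Returns how many buffers were cleaned.
 */
static unsigned toy_clean_buffers(struct toy_bh *head, unsigned first_unmapped)
{
	unsigned cleaned = 0;
	struct toy_bh *bh = head;

	if (!head)		/* models !page_has_buffers(page) */
		return 0;

	do {
		if (cleaned == first_unmapped)
			break;
		bh->dirty = false;
		cleaned++;
		bh = bh->b_this_page;
		assert(bh != NULL);	/* the ring must be closed */
	} while (bh != head);

	return cleaned;
}
```

With a ring of four dirty buffers and first_unmapped == 2, only the
first two are cleaned; the remaining two stay dirty because they are not
part of the I/O being submitted.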

* [PATCH v3 3/7] Factor page_endio() out of mpage_end_io()
From: Matthew Wilcox @ 2014-04-13 22:59 UTC
  To: linux-fsdevel, linux-mm, linux-kernel, Andrew Morton
  Cc: Matthew Wilcox, willy

page_endio() takes care of updating all the appropriate page flags
once I/O has finished to a page.  Switch to using mapping_set_error()
instead of setting AS_EIO directly; this will handle thin-provisioned
devices correctly.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
---
 fs/mpage.c              | 18 +-----------------
 include/linux/pagemap.h |  2 ++
 mm/filemap.c            | 25 +++++++++++++++++++++++++
 3 files changed, 28 insertions(+), 17 deletions(-)

diff --git a/fs/mpage.c b/fs/mpage.c
index 4cc9c5d..10da0da 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -48,23 +48,7 @@ static void mpage_end_io(struct bio *bio, int err)
 
 	bio_for_each_segment_all(bv, bio, i) {
 		struct page *page = bv->bv_page;
-
-		if (bio_data_dir(bio) == READ) {
-			if (!err) {
-				SetPageUptodate(page);
-			} else {
-				ClearPageUptodate(page);
-				SetPageError(page);
-			}
-			unlock_page(page);
-		} else { /* bio_data_dir(bio) == WRITE */
-			if (err) {
-				SetPageError(page);
-				if (page->mapping)
-					set_bit(AS_EIO, &page->mapping->flags);
-			}
-			end_page_writeback(page);
-		}
+		page_endio(page, bio_data_dir(bio), err);
 	}
 
 	bio_put(bio);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 45598f1..718214c 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -425,6 +425,8 @@ static inline void wait_on_page_writeback(struct page *page)
 extern void end_page_writeback(struct page *page);
 void wait_for_stable_page(struct page *page);
 
+void page_endio(struct page *page, int rw, int err);
+
 /*
  * Add an arbitrary waiter to a page's wait queue
  */
diff --git a/mm/filemap.c b/mm/filemap.c
index a82fbe4..ee6a3ce 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -762,6 +762,31 @@ void end_page_writeback(struct page *page)
 }
 EXPORT_SYMBOL(end_page_writeback);
 
+/*
+ * After completing I/O on a page, call this routine to update the page
+ * flags appropriately
+ */
+void page_endio(struct page *page, int rw, int err)
+{
+	if (rw == READ) {
+		if (!err) {
+			SetPageUptodate(page);
+		} else {
+			ClearPageUptodate(page);
+			SetPageError(page);
+		}
+		unlock_page(page);
+	} else { /* rw == WRITE */
+		if (err) {
+			SetPageError(page);
+			if (page->mapping)
+				mapping_set_error(page->mapping, err);
+		}
+		end_page_writeback(page);
+	}
+}
+EXPORT_SYMBOL_GPL(page_endio);
+
 /**
  * __lock_page - get a lock on the page, assuming we need to sleep to get it
  * @page: the page to lock
-- 
1.9.1
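
The flag transitions in page_endio() can be modelled in userspace.  The
sketch below is an illustration under assumptions: the toy_* types are
hypothetical, and toy_mapping_set_error() encodes my understanding of
what mapping_set_error() did at the time (record -ENOSPC separately from
every other error), which is why the changelog says the switch helps
thin-provisioned devices.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_dir { TOY_READ = 0, TOY_WRITE = 1 };

/* Hypothetical stand-in for the AS_EIO / AS_ENOSPC bits of an address_space. */
struct toy_mapping {
	bool eio;
	bool enospc;
};

struct toy_page {
	bool uptodate, error, locked, writeback;
	struct toy_mapping *mapping;	/* may be NULL */
};

/* Assumed behaviour of mapping_set_error(): -ENOSPC gets its own flag. */
static void toy_mapping_set_error(struct toy_mapping *mapping, int err)
{
	if (!err)
		return;
	if (err == -ENOSPC)
		mapping->enospc = true;
	else
		mapping->eio = true;
}

/* Mirrors the structure of page_endio() from the patch above. */
static void toy_page_endio(struct toy_page *page, enum toy_dir rw, int err)
{
	assert(page != NULL);

	if (rw == TOY_READ) {
		if (!err) {
			page->uptodate = true;
		} else {
			page->uptodate = false;
			page->error = true;
		}
		page->locked = false;		/* unlock_page() */
	} else {
		if (err) {
			page->error = true;
			if (page->mapping)
				toy_mapping_set_error(page->mapping, err);
		}
		page->writeback = false;	/* end_page_writeback() */
	}
}
```

A failed write with -ENOSPC therefore marks the page in error and sets
the mapping's "no space" flag rather than the generic I/O error flag,
which the old set_bit(AS_EIO, ...) code could not express.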

* [PATCH v3 4/7] Add bdev_read_page() and bdev_write_page()
From: Matthew Wilcox @ 2014-04-13 22:59 UTC
  To: linux-fsdevel, linux-mm, linux-kernel, Andrew Morton
  Cc: Matthew Wilcox, willy

A block device driver may choose to provide a rw_page operation.
These will be called when the filesystem is attempting to do page sized
I/O to page cache pages (ie not for direct I/O).  This does preclude
I/Os that are larger than page size, so this may only be a performance
gain for some devices.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Tested-by: Dheeraj Reddy <dheeraj.reddy@intel.com>
---
 fs/block_dev.c         | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/mpage.c             | 12 ++++++++++
 include/linux/blkdev.h |  4 ++++
 3 files changed, 79 insertions(+)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 552a8d1..83fba15 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -363,6 +363,69 @@ int blkdev_fsync(struct file *filp, loff_t start, loff_t end, int datasync)
 }
 EXPORT_SYMBOL(blkdev_fsync);
 
+/**
+ * bdev_read_page() - Start reading a page from a block device
+ * @bdev: The device to read the page from
+ * @sector: The offset on the device to read the page from (need not be aligned)
+ * @page: The page to read
+ *
+ * On entry, the page should be locked.  It will be unlocked when the page
+ * has been read.  If the block driver implements rw_page synchronously,
+ * that will be true on exit from this function, but it need not be.
+ *
+ * Errors returned by this function are usually "soft", eg out of memory, or
+ * queue full; callers should try a different route to read this page rather
+ * than propagate an error back up the stack.
+ *
+ * Return: negative errno if an error occurs, 0 if submission was successful.
+ */
+int bdev_read_page(struct block_device *bdev, sector_t sector,
+			struct page *page)
+{
+	const struct block_device_operations *ops = bdev->bd_disk->fops;
+	if (!ops->rw_page)
+		return -EOPNOTSUPP;
+	return ops->rw_page(bdev, sector + get_start_sect(bdev), page, READ);
+}
+EXPORT_SYMBOL_GPL(bdev_read_page);
+
+/**
+ * bdev_write_page() - Start writing a page to a block device
+ * @bdev: The device to write the page to
+ * @sector: The offset on the device to write the page to (need not be aligned)
+ * @page: The page to write
+ * @wbc: The writeback_control for the write
+ *
+ * On entry, the page should be locked and not currently under writeback.
+ * On exit, if the write started successfully, the page will be unlocked and
+ * under writeback.  If the write failed already (eg the driver failed to
+ * queue the page to the device), the page will still be locked.  If the
+ * caller is a ->writepage implementation, it will need to unlock the page.
+ *
+ * Errors returned by this function are usually "soft", eg out of memory, or
+ * queue full; callers should try a different route to write this page rather
+ * than propagate an error back up the stack.
+ *
+ * Return: negative errno if an error occurs, 0 if submission was successful.
+ */
+int bdev_write_page(struct block_device *bdev, sector_t sector,
+			struct page *page, struct writeback_control *wbc)
+{
+	int result;
+	int rw = (wbc->sync_mode == WB_SYNC_ALL) ? WRITE_SYNC : WRITE;
+	const struct block_device_operations *ops = bdev->bd_disk->fops;
+	if (!ops->rw_page)
+		return -EOPNOTSUPP;
+	set_page_writeback(page);
+	result = ops->rw_page(bdev, sector + get_start_sect(bdev), page, rw);
+	if (result)
+		end_page_writeback(page);
+	else
+		unlock_page(page);
+	return result;
+}
+EXPORT_SYMBOL_GPL(bdev_write_page);
+
 /*
  * pseudo-fs
  */
diff --git a/fs/mpage.c b/fs/mpage.c
index 10da0da..5f9ed62 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -269,6 +269,11 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages,
 
 alloc_new:
 	if (bio == NULL) {
+		if (first_hole == blocks_per_page) {
+			if (!bdev_read_page(bdev, blocks[0] << (blkbits - 9),
+								page))
+				goto out;
+		}
 		bio = mpage_alloc(bdev, blocks[0] << (blkbits - 9),
 			  	min_t(int, nr_pages, bio_get_nr_vecs(bdev)),
 				GFP_KERNEL);
@@ -587,6 +592,13 @@ page_is_mapped:
 
 alloc_new:
 	if (bio == NULL) {
+		if (first_unmapped == blocks_per_page) {
+			if (!bdev_write_page(bdev, blocks[0] << (blkbits - 9),
+								page, wbc)) {
+				clean_buffers(page, first_unmapped);
+				goto out;
+			}
+		}
 		bio = mpage_alloc(bdev, blocks[0] << (blkbits - 9),
 				bio_get_nr_vecs(bdev), GFP_NOFS|__GFP_HIGH);
 		if (bio == NULL)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 0d84981..6d2de38 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1570,6 +1570,7 @@ static inline bool blk_integrity_is_initialized(struct gendisk *g)
 struct block_device_operations {
 	int (*open) (struct block_device *, fmode_t);
 	void (*release) (struct gendisk *, fmode_t);
+	int (*rw_page)(struct block_device *, sector_t, struct page *, int rw);
 	int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
 	int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
 	int (*direct_access) (struct block_device *, sector_t,
@@ -1588,6 +1589,9 @@ struct block_device_operations {
 
 extern int __blkdev_driver_ioctl(struct block_device *, fmode_t, unsigned int,
 				 unsigned long);
+extern int bdev_read_page(struct block_device *, sector_t, struct page *);
+extern int bdev_write_page(struct block_device *, sector_t, struct page *,
+						struct writeback_control *);
 #else /* CONFIG_BLOCK */
 /*
  * stubs for when the block layer is configured out
-- 
1.9.1
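
The page-state contract documented for bdev_write_page() is easy to get
wrong, so here is a small userspace model of it.  All toy_* names are
hypothetical, and the two sample hooks (toy_accept, toy_busy) are
invented for illustration; only the control flow follows the patch:
return -EOPNOTSUPP when the driver has no rw_page hook, add the
partition offset to the sector, and leave the page locked if the driver
refuses it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long long toy_sector_t;

struct toy_page { bool locked, writeback; };
struct toy_bdev;

struct toy_ops {
	/* Optional hook; NULL when the driver has no page-sized fast path. */
	int (*rw_page)(struct toy_bdev *bdev, toy_sector_t sector,
		       struct toy_page *page);
};

struct toy_bdev {
	const struct toy_ops *ops;
	toy_sector_t start_sect;	/* partition offset, in 512-byte sectors */
	toy_sector_t last_sector;	/* where the last accepted I/O landed */
};

/* Sample hook that accepts the page (completion would happen later). */
static int toy_accept(struct toy_bdev *bdev, toy_sector_t sector,
		      struct toy_page *page)
{
	(void)page;
	bdev->last_sector = sector;
	return 0;
}

/* Sample hook that reports a "soft" failure, such as a full queue. */
static int toy_busy(struct toy_bdev *bdev, toy_sector_t sector,
		    struct toy_page *page)
{
	(void)bdev; (void)sector; (void)page;
	return -EBUSY;
}

static int toy_bdev_write_page(struct toy_bdev *bdev, toy_sector_t sector,
			       struct toy_page *page)
{
	int result;

	assert(page->locked && !page->writeback);
	if (!bdev->ops->rw_page)
		return -EOPNOTSUPP;
	page->writeback = true;
	result = bdev->ops->rw_page(bdev, sector + bdev->start_sect, page);
	if (result)
		page->writeback = false;	/* refused: page stays locked */
	else
		page->locked = false;		/* submitted: unlocked, under writeback */
	return result;
}
```

A caller treats any non-zero return as "use the bio path instead", which
is what the mpage.c hunks do: they only skip mpage_alloc() when the call
returns 0.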

* [PATCH v3 5/7] swap: Use bdev_read_page() / bdev_write_page()
From: Matthew Wilcox @ 2014-04-13 22:59 UTC
  To: linux-fsdevel, linux-mm, linux-kernel, Andrew Morton
  Cc: Matthew Wilcox, willy

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
---
 mm/page_io.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/mm/page_io.c b/mm/page_io.c
index 7c59ef6..43d7220 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -248,11 +248,16 @@ out:
 	return ret;
 }
 
+static sector_t swap_page_sector(struct page *page)
+{
+	return (sector_t)__page_file_index(page) << (PAGE_CACHE_SHIFT - 9);
+}
+
 int __swap_writepage(struct page *page, struct writeback_control *wbc,
 	void (*end_write_func)(struct bio *, int))
 {
 	struct bio *bio;
-	int ret = 0, rw = WRITE;
+	int ret, rw = WRITE;
 	struct swap_info_struct *sis = page_swap_info(page);
 
 	if (sis->flags & SWP_FILE) {
@@ -297,6 +302,13 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
 		return ret;
 	}
 
+	ret = bdev_write_page(sis->bdev, swap_page_sector(page), page, wbc);
+	if (!ret) {
+		count_vm_event(PSWPOUT);
+		return 0;
+	}
+
+	ret = 0;
 	bio = get_swap_bio(GFP_NOIO, page, end_write_func);
 	if (bio == NULL) {
 		set_page_dirty(page);
@@ -317,7 +329,7 @@ out:
 int swap_readpage(struct page *page)
 {
 	struct bio *bio;
-	int ret = 0;
+	int ret;
 	struct swap_info_struct *sis = page_swap_info(page);
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
@@ -338,6 +350,13 @@ int swap_readpage(struct page *page)
 		return ret;
 	}
 
+	ret = bdev_read_page(sis->bdev, swap_page_sector(page), page);
+	if (!ret) {
+		count_vm_event(PSWPIN);
+		return 0;
+	}
+
+	ret = 0;
 	bio = get_swap_bio(GFP_KERNEL, page, end_swap_bio_read);
 	if (bio == NULL) {
 		unlock_page(page);
-- 
1.9.1
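
The arithmetic in swap_page_sector() is a page-index-to-sector
conversion.  A worked userspace sketch follows, assuming 4 KiB pages
(shift of 12) and a 32-bit page index; both are assumptions for the
example, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT   12	/* assumption: 4 KiB pages */
#define TOY_SECTOR_SHIFT  9	/* 512-byte sectors */

static uint64_t toy_swap_page_sector(uint32_t page_index)
{
	/* The shift amount must not be negative. */
	assert(TOY_PAGE_SHIFT >= TOY_SECTOR_SHIFT);
	/* Widen before shifting, as the (sector_t) cast does in the patch. */
	return (uint64_t)page_index << (TOY_PAGE_SHIFT - TOY_SECTOR_SHIFT);
}
```

With these constants each page covers eight sectors, so page 3 starts at
sector 24.  The cast matters for large indices: page 0x20000000 maps to
sector 0x100000000, which would wrap to zero if the shift were done in
32 bits.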

* [PATCH v3 6/7] brd: Add support for rw_page
From: Matthew Wilcox @ 2014-04-13 22:59 UTC
  To: linux-fsdevel, linux-mm, linux-kernel, Andrew Morton
  Cc: Matthew Wilcox, willy

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
---
 drivers/block/brd.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index e73b85c..807d3d5 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -360,6 +360,15 @@ out:
 	bio_endio(bio, err);
 }
 
+static int brd_rw_page(struct block_device *bdev, sector_t sector,
+		       struct page *page, int rw)
+{
+	struct brd_device *brd = bdev->bd_disk->private_data;
+	int err = brd_do_bvec(brd, page, PAGE_CACHE_SIZE, 0, rw, sector);
+	page_endio(page, rw & WRITE, err);
+	return err;
+}
+
 #ifdef CONFIG_BLK_DEV_XIP
 static int brd_direct_access(struct block_device *bdev, sector_t sector,
 			void **kaddr, unsigned long *pfn)
@@ -419,6 +428,7 @@ static int brd_ioctl(struct block_device *bdev, fmode_t mode,
 
 static const struct block_device_operations brd_fops = {
 	.owner =		THIS_MODULE,
+	.rw_page =		brd_rw_page,
 	.ioctl =		brd_ioctl,
 #ifdef CONFIG_BLK_DEV_XIP
 	.direct_access =	brd_direct_access,
-- 
1.9.1
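
One detail in brd_rw_page() worth spelling out: bdev_write_page() may
pass WRITE_SYNC rather than plain WRITE, so the driver masks with WRITE
before handing a direction to page_endio().  The sketch below shows the
idea with made-up bit values; TOY_SYNC in particular is a hypothetical
hint bit, not the kernel's definition.

```c
#include <assert.h>

#define TOY_READ	0
#define TOY_WRITE	1		/* direction lives in bit 0 */
#define TOY_SYNC	(1 << 4)	/* hypothetical hint bit */
#define TOY_WRITE_SYNC	(TOY_WRITE | TOY_SYNC)

/* Strip hint bits and keep only the data direction. */
static int toy_data_dir(int rw)
{
	assert(rw >= 0);
	return rw & TOY_WRITE;
}
```

Passing the unmasked value to a function that compares against READ
would still work for reads, but a comparison against WRITE would fail
for WRITE_SYNC, so masking keeps both branches honest.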

* [PATCH v3 7/7] brd: Return -ENOSPC rather than -ENOMEM on page allocation failure
From: Matthew Wilcox @ 2014-04-13 22:59 UTC
  To: linux-fsdevel, linux-mm, linux-kernel, Andrew Morton
  Cc: Matthew Wilcox, willy

brd is effectively a thinly provisioned device.  Thinly provisioned
devices return -ENOSPC when they can't write a new block.  -ENOMEM is
an implementation detail that callers shouldn't know.

Acked-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
---
 drivers/block/brd.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 807d3d5..c7d138e 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -200,11 +200,11 @@ static int copy_to_brd_setup(struct brd_device *brd, sector_t sector, size_t n)
 
 	copy = min_t(size_t, n, PAGE_SIZE - offset);
 	if (!brd_insert_page(brd, sector))
-		return -ENOMEM;
+		return -ENOSPC;
 	if (copy < n) {
 		sector += copy >> SECTOR_SHIFT;
 		if (!brd_insert_page(brd, sector))
-			return -ENOMEM;
+			return -ENOSPC;
 	}
 	return 0;
 }
@@ -384,7 +384,7 @@ static int brd_direct_access(struct block_device *bdev, sector_t sector,
 		return -ERANGE;
 	page = brd_insert_page(brd, sector);
 	if (!page)
-		return -ENOMEM;
+		return -ENOSPC;
 	*kaddr = page_address(page);
 	*pfn = page_to_pfn(page);
 
-- 
1.9.1
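
The idea of translating an allocation failure into -ENOSPC at the device
boundary can be sketched in userspace.  Everything here is hypothetical:
struct toy_ramdisk, its page quota, and both function names are invented
for the example and only mimic the shape of brd_insert_page() and its
callers.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096

/* Hypothetical ramdisk with a page quota so failure is easy to trigger. */
struct toy_ramdisk {
	size_t pages_used;
	size_t pages_max;
};

/* Returns a zeroed page, or NULL if no backing memory is available. */
static void *toy_insert_page(struct toy_ramdisk *rd)
{
	void *page;

	assert(rd != NULL);
	if (rd->pages_used >= rd->pages_max)
		return NULL;
	page = calloc(1, TOY_PAGE_SIZE);
	if (page)
		rd->pages_used++;
	return page;
}

/*
 * Callers see "the device is out of space", not "the driver could not
 * allocate memory": the backing store is an implementation detail.
 * On success the caller owns *out and must free() it.
 */
static int toy_write_setup(struct toy_ramdisk *rd, void **out)
{
	*out = toy_insert_page(rd);
	if (!*out)
		return -ENOSPC;
	return 0;
}
```

With a quota of one page, the first call succeeds and the second returns
-ENOSPC, which is the same errno a thinly provisioned block device would
report for a write it cannot back.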


* Re: [PATCH v3 0/7] Page I/O
From: Minchan Kim @ 2014-04-14  0:08 UTC
  To: Matthew Wilcox
  Cc: linux-fsdevel, linux-mm, linux-kernel, Andrew Morton, willy

On Sun, Apr 13, 2014 at 06:59:49PM -0400, Matthew Wilcox wrote:
> Hi Andrew,
> 
> Now that 3.15-rc1 is out, could you queue these patches for 3.16 please?
> Patches 1-3 & 7 are, IMO, worthwhile cleanups / bug fixes, regardless
> of the rest of the patch set.
> 
> If this patch series gets in, I'll take care of including the NVMe
> driver piece.  It'll be a bit more tricky than the proof of concept that
> I've been flashing around because we have to make sure that the device
> responds better to page sized I/Os than accumulating larger I/Os.
> 
> It's indisputably a win for brd and for other NVM technology devices
> that are accessed synchronously rather than through DMA.

FYI, it would be good for zram, too.
I support this patchset.

>-- 
Kind regards,
Minchan Kim

* Re: [PATCH v3 5/7] swap: Use bdev_read_page() / bdev_write_page()
From: Andrew Morton @ 2014-04-24 18:18 UTC
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm, linux-kernel, willy, Hugh Dickins

On Sun, 13 Apr 2014 18:59:54 -0400 Matthew Wilcox <matthew.r.wilcox@intel.com> wrote:

>  mm/page_io.c | 23 +++++++++++++++++++++--
>  1 file changed, 21 insertions(+), 2 deletions(-)

Some changelog here would be nice.  What were the reasons for the
change?  Any observable performance changes?

* Re: [PATCH v3 5/7] swap: Use bdev_read_page() / bdev_write_page()
From: Matthew Wilcox @ 2014-04-24 18:57 UTC
  To: Andrew Morton
  Cc: Matthew Wilcox, linux-fsdevel, linux-mm, linux-kernel,
	Hugh Dickins

On Thu, Apr 24, 2014 at 11:18:17AM -0700, Andrew Morton wrote:
> On Sun, 13 Apr 2014 18:59:54 -0400 Matthew Wilcox <matthew.r.wilcox@intel.com> wrote:
> 
> >  mm/page_io.c | 23 +++++++++++++++++++++--
> >  1 file changed, 21 insertions(+), 2 deletions(-)
> 
> Some changelog here would be nice.  What were the reasons for the
> change?  Any observable performance changes?

Whoops ... I could swear I wrote one.  Wonder what happened to it.  Here
was all I had:

We can avoid allocating a BIO if we use the writepage path instead of
the Direct I/O path.

But that's kind of lame.  I don't have any performance numbers right now,
so how about we go with:

By calling the device driver to write the page directly, we avoid
allocating a BIO, which allows us to free memory without allocating
memory.

* Re: [PATCH v3 5/7] swap: Use bdev_read_page() / bdev_write_page()
From: Matthew Wilcox @ 2014-04-25 14:01 UTC
  To: Andrew Morton
  Cc: Matthew Wilcox, linux-fsdevel, linux-mm, linux-kernel,
	Hugh Dickins

On Thu, Apr 24, 2014 at 02:57:40PM -0400, Matthew Wilcox wrote:
> By calling the device driver to write the page directly, we avoid
> allocating a BIO, which allows us to free memory without allocating
> memory.

I got handed some performance numbers last night!  Next time you're updating
the patch description, please use:

By calling the device driver to write the page directly, we avoid
allocating a BIO, which allows us to free memory without allocating
memory.  When running a swap-heavy benchmark, system time is reduced by
about 20%.

Tested-by: Dheeraj Reddy <dheeraj.reddy@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
