* [RFC][PATCH 0/2] 9p: v9fs read and write speedup
@ 2016-10-10 17:24 Edward Shishkin
  2016-10-10 17:24 ` [PATCH 1/2] 9p: v9fs add writepages Edward Shishkin
  2016-12-12 18:13 ` [RFC][PATCH 0/7] 9p: v9fs read and write speedup - V2 Edward Shishkin
  0 siblings, 2 replies; 15+ messages in thread
From: Edward Shishkin @ 2016-10-10 17:24 UTC (permalink / raw)
  To: Eric Van Hensbergen, V9FS Developers Mailing List,
      Linux Filesystem Development List
  Cc: QEMU Developers Mailing List, ZhangWei, Claudio Fontana,
      Edward Shishkin

Hello everyone,

Progress in virtualization and cloud technologies has made it popular to
share file sets on host machines via the Plan 9 File Protocol (a sharing
setup also known as VirtFS). Another sharing setup, based on the NFS
protocol, is less popular for a number of reasons.

Unfortunately, the performance of the default VirtFS setup is poor. We
analyzed the reasons in our labs at Huawei Technologies and found that the
typical bottleneck is that transferring a given amount of data in many
small 9P messages is slower than transferring the same amount of data in a
single 9P message.

The number of 9P messages can be reduced (and performance thereby
improved) in several ways (*); however, some "hardcoded" bottlenecks
remain in the v9fs driver of the guest kernel. Specifically, the
read-ahead and write-behind paths of v9fs are poorly implemented: with the
current code there is no chance that more than PAGE_SIZE bytes of data
will be transmitted at a time.

To improve the situation we have introduced a special layer which
coalesces adjacent pages (when there are any) into a single 9P message.
This layer is implemented as private ->readpages() and ->writepages()
address-space operations for v9fs. To merge adjacent pages we use a buffer
whose size depends on the msize specified per mount session. On the
read-ahead path we allocate such buffers on demand. On the writeback path
we use a single buffer pre-allocated at mount time: writeback usually runs
in response to memory-pressure notifications, so we cannot afford
on-demand allocation there. All pages to be merged are copied into the
buffer at their respective offsets; then a single long 9P read (write)
message is constructed and transmitted (**).

As a consequence, only one writeback thread at a time is sped up; other
concurrent threads, which fail to obtain the buffer, fall back to the
usual (slow) path. If there is interest, I'll implement a solution with N
pre-allocated buffers, where N is the number of CPUs (see the sketch after
the first patch below).

This approach increases VirtFS bandwidth up to 3 times, bringing it close
to the bandwidth of VirtIO-blk (see the numbers in Appendix A below).

Comment. Our patches improve only asynchronous operations, i.e. those that
go through the page cache. Direct reads and writes are unaffected, for
obvious reasons. NOTE that by default v9fs works in direct mode, so in
order to see an effect you should specify an appropriate v9fs cache mount
option (e.g. "fscache").

----
(*) Specifying a larger msize (the maximal possible size of a 9P message)
reduces the number of 9P messages for direct operations performed in large
chunks. Disabling v9fs ACLs and security labels in the guest kernel (when
they are not needed) avoids extra messages.

(**) 9P, Plan 9 File Protocol specifications
     https://swtch.com/plan9port/man/man9/intro.html

Appendix A.
iozone -e -r chunk_size -s 6G -w -f

Throughput in MBytes/sec:

  operation      chunk_size    (1)    (2)    (3)
  ---------      ----------    ---    ---    ---
  write              1M        391    127    330
  read               1M        469    221    432
  write              4K        410    128    297
  read               4K        465    192    327
  random write       1M        403    145    313
  random read        1M        347    161    195
  random write       4K        344    119    131
  random read        4K         44     41     64

Legend:
(1): VirtIO-blk
(2): VirtIO-9p
(3): VirtIO-9p, guest kernel patched with our stuff

Hardware & Software:

Host:  8 CPU Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz, 16G RAM,
       SSD: Noname (throughput: write 410 M/sec, read 470 M/sec).
       Fedora 24, Kernel-4.7.4-200.fc24.x86_64, kvm+qemu-2.7.0, fs: ext4
Guest: 2 CPU: GenuineIntel 3.4GHz, 2G RAM, Network model: VirtIO
       Fedora 21, Kernel: 4.7.6

Settings:

VirtIO-blk: Guest FS: ext4;
VirtIO-9p:  mount option: "trans=virtio,version=9p2000.L,msize=131096,fscache"

Caches of the host and guest were dropped before every iozone phase.

CC-ed the QEMU developers list for possible comments and ACKs.
Please consider for inclusion.

Thanks,
Edward.

^ permalink raw reply	[flat|nested] 15+ messages in thread
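For reference, a guest-side mount that enables the cached paths could look
like the following command line (the mount tag "hostshare" and the mount
point are illustrative placeholders; the option string is the one used for
the benchmarks above):

    mount -t 9p -o trans=virtio,version=9p2000.L,msize=131096,fscache hostshare /mnt/virtfs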
* [PATCH 1/2] 9p: v9fs add writepages.
  2016-10-10 17:24 [RFC][PATCH 0/2] 9p: v9fs read and write speedup Edward Shishkin
@ 2016-10-10 17:24 ` Edward Shishkin
  2016-10-10 17:24   ` [PATCH 2/2] 9p: v9fs new readpages Edward Shishkin
  2016-10-25 14:01   ` [Qemu-devel] [PATCH 1/2] 9p: v9fs add writepages Alexander Graf
  2016-12-12 18:13 ` [RFC][PATCH 0/7] 9p: v9fs read and write speedup - V2 Edward Shishkin
  1 sibling, 2 replies; 15+ messages in thread
From: Edward Shishkin @ 2016-10-10 17:24 UTC (permalink / raw)
  To: Eric Van Hensbergen, V9FS Developers Mailing List,
      Linux Filesystem Development List
  Cc: QEMU Developers Mailing List, ZhangWei, Claudio Fontana,
      Edward Shishkin, Edward Shishkin

Add a v9fs private ->writepages() method of address_space
operations for merging pages into long 9p messages.

Signed-off-by: Edward Shishkin <edward.shishkin@gmail.com>
---
 fs/9p/v9fs.c      |  46 +++++++
 fs/9p/v9fs.h      |  22 +++-
 fs/9p/vfs_addr.c  | 357 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/9p/vfs_super.c |   8 +-
 4 files changed, 431 insertions(+), 2 deletions(-)

diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c
index 072e759..3b49daf 100644
--- a/fs/9p/v9fs.c
+++ b/fs/9p/v9fs.c
@@ -32,6 +32,7 @@
 #include <linux/parser.h>
 #include <linux/idr.h>
 #include <linux/slab.h>
+#include <linux/pagemap.h>
 #include <net/9p/9p.h>
 #include <net/9p/client.h>
 #include <net/9p/transport.h>
@@ -309,6 +310,49 @@ static int v9fs_parse_options(struct v9fs_session_info *v9ses, char *opts)
 	return ret;
 }
 
+void put_flush_set(struct v9fs_flush_set *fset)
+{
+	if (!fset)
+		return;
+	if (fset->pages)
+		kfree(fset->pages);
+	if (fset->buf)
+		kfree(fset->buf);
+	kfree(fset);
+}
+
+/**
+ * Allocate and initialize flush set
+ * Pre-conditions: valid msize is set
+ */
+int alloc_init_flush_set(struct v9fs_session_info *v9ses)
+{
+	int ret = -ENOMEM;
+	int num_pages;
+	struct v9fs_flush_set *fset = NULL;
+
+	num_pages = v9ses->clnt->msize >> PAGE_SHIFT;
+	if (num_pages < 2)
+		/* speedup impossible */
+		return 0;
+	fset = kzalloc(sizeof(*fset), GFP_KERNEL);
+	if (!fset)
+		goto error;
+	fset->num_pages = num_pages;
+	fset->pages = kzalloc(num_pages * sizeof(*fset->pages), GFP_KERNEL);
+	if (!fset->pages)
+		goto error;
+	fset->buf = kzalloc(num_pages << PAGE_SHIFT, GFP_USER);
+	if (!fset->buf)
+		goto error;
+	spin_lock_init(&(fset->lock));
+	v9ses->flush = fset;
+	return 0;
+ error:
+	put_flush_set(fset);
+	return ret;
+}
+
 /**
  * v9fs_session_init - initialize session
  * @v9ses: session information structure
@@ -444,6 +488,8 @@ void v9fs_session_close(struct v9fs_session_info *v9ses)
 	kfree(v9ses->uname);
 	kfree(v9ses->aname);
 
+	put_flush_set(v9ses->flush);
+
 	bdi_destroy(&v9ses->bdi);
 
 	spin_lock(&v9fs_sessionlist_lock);
diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index 6877050..d1092e4 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -23,6 +23,7 @@
 #ifndef FS_9P_V9FS_H
 #define FS_9P_V9FS_H
 
+#include <linux/kconfig.h>
 #include <linux/backing-dev.h>
 
 /**
@@ -69,6 +70,13 @@ enum p9_cache_modes {
 	CACHE_FSCACHE,
 };
 
+struct v9fs_flush_set {
+	struct page **pages;
+	int num_pages;
+	char *buf;
+	spinlock_t lock;
+};
+
 /**
  * struct v9fs_session_info - per-instance session information
 * @flags: session options of type &p9_session_flags
@@ -105,7 +113,7 @@ struct v9fs_session_info {
 	char *cachetag;
 	struct fscache_cookie *fscache;
 #endif
-
+	struct v9fs_flush_set *flush;	/* flush set for writepages */
 	char *uname;		/* user name to mount as */
 	char *aname;		/* name of remote hierarchy being mounted */
 	unsigned int maxdata;	/* max data for client interface */
@@ -158,6 +166,8 @@ extern const struct inode_operations v9fs_symlink_inode_operations_dotl;
 extern struct inode *v9fs_inode_from_fid_dotl(struct v9fs_session_info *v9ses,
 					      struct p9_fid *fid,
 					      struct super_block *sb, int new);
+extern int alloc_init_flush_set(struct v9fs_session_info *v9ses);
+extern void put_flush_set(struct v9fs_flush_set *fset);
 
 /* other default globals */
 #define V9FS_PORT	564
@@ -222,4 +232,14 @@ v9fs_get_new_inode_from_fid(struct v9fs_session_info *v9ses, struct p9_fid *fid,
 	return v9fs_inode_from_fid(v9ses, fid, sb, 1);
 }
 
+static inline int spin_trylock_flush_set(struct v9fs_flush_set *fset)
+{
+	return spin_trylock(&(fset->lock));
+}
+
+static inline void spin_unlock_flush_set(struct v9fs_flush_set *fset)
+{
+	spin_unlock(&(fset->lock));
+}
+
 #endif
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index 6181ad7..e871886 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -36,6 +36,7 @@
 #include <linux/uio.h>
 #include <net/9p/9p.h>
 #include <net/9p/client.h>
+#include <trace/events/writeback.h>
 
 #include "v9fs.h"
 #include "v9fs_vfs.h"
@@ -209,6 +210,361 @@ static int v9fs_vfs_writepage(struct page *page, struct writeback_control *wbc)
 	return retval;
 }
 
+static void redirty_pages_for_writeback(struct page **pages, int nr,
+					struct writeback_control *wbc)
+{
+	int i;
+	for (i = 0; i < nr; i++) {
+		lock_page(pages[i]);
+		redirty_page_for_writepage(wbc, pages[i]);
+		unlock_page(pages[i]);
+	}
+}
+
+static void set_pages_error(struct page **pages, int nr, int error)
+{
+	int i;
+	for (i = 0; i < nr; i++) {
+		lock_page(pages[i]);
+		SetPageError(pages[i]);
+		mapping_set_error(pages[i]->mapping, error);
+		unlock_page(pages[i]);
+	}
+}
+
+#define V9FS_WRITEPAGES_DEBUG   (0)
+
+struct flush_context {
+	struct writeback_control *wbc;
+	struct address_space *mapping;
+	struct v9fs_flush_set *fset;
+	pgoff_t done_index;
+	pgoff_t writeback_index;
+	pgoff_t index;
+	pgoff_t end; /* Inclusive */
+	const char *msg;
+	int cycled;
+	int range_whole;
+	int done;
+};
+
+/**
+ * Copy a page with file's data to buffer.
+ * Handle races with truncate, etc.
+ * Return number of copied bytes
+ *
+ * @page: page to copy data from;
+ * @page_nr: serial number of the page
+ */
+static int flush_page(struct page *page, int page_nr, struct flush_context *ctx)
+{
+	char *kdata;
+	loff_t isize;
+	int copied = 0;
+	struct writeback_control *wbc = ctx->wbc;
+	/*
+	 * At this point, the page may be truncated or
+	 * invalidated (changing page->mapping to NULL), or
+	 * even swizzled back from swapper_space to tmpfs file
+	 * mapping. However, page->index will not change
+	 * because we have a reference on the page.
+	 */
+	if (page->index > ctx->end) {
+		/*
+		 * can't be range_cyclic (1st pass) because
+		 * end == -1 in that case.
+		 */
+		ctx->done = 1;
+		ctx->msg = "page out of range";
+		goto exit;
+	}
+	ctx->done_index = page->index;
+	lock_page(page);
+	/*
+	 * Page truncated or invalidated. We can freely skip it
+	 * then, even for data integrity operations: the page
+	 * has disappeared concurrently, so there could be no
+	 * real expectation of this data integrity operation
+	 * even if there is now a new, dirty page at the same
+	 * pagecache address.
+	 */
+	if (unlikely(page->mapping != ctx->mapping)) {
+		unlock_page(page);
+		ctx->msg = "page truncated or invalidated";
+		goto exit;
+	}
+	if (!PageDirty(page)) {
+		/*
+		 * someone wrote it for us
+		 */
+		unlock_page(page);
+		ctx->msg = "page not dirty";
+		goto exit;
+	}
+	if (PageWriteback(page)) {
+		if (wbc->sync_mode != WB_SYNC_NONE)
+			wait_on_page_writeback(page);
+		else {
+			unlock_page(page);
+			ctx->msg = "page is writeback";
+			goto exit;
+		}
+	}
+	BUG_ON(PageWriteback(page));
+	if (!clear_page_dirty_for_io(page)) {
+		unlock_page(page);
+		ctx->msg = "failed to clear page dirty";
+		goto exit;
+	}
+	trace_wbc_writepage(wbc, inode_to_bdi(ctx->mapping->host));
+
+	set_page_writeback(page);
+	isize = i_size_read(ctx->mapping->host);
+	if (page->index == isize >> PAGE_SHIFT)
+		copied = isize & ~PAGE_MASK;
+	else
+		copied = PAGE_SIZE;
+	kdata = kmap_atomic(page);
+	memcpy(ctx->fset->buf + (page_nr << PAGE_SHIFT), kdata, copied);
+	kunmap_atomic(kdata);
+	end_page_writeback(page);
+
+	unlock_page(page);
+	/*
+	 * We stop writing back only if we are not doing
+	 * integrity sync. In case of integrity sync we have to
+	 * keep going until we have written all the pages
+	 * we tagged for writeback prior to entering this loop.
+	 */
+	if (--wbc->nr_to_write <= 0 && wbc->sync_mode == WB_SYNC_NONE)
+		ctx->done = 1;
+ exit:
+	return copied;
+}
+
+static int send_buffer(off_t offset, int len, struct flush_context *ctx)
+{
+	int ret = 0;
+	struct kvec kvec;
+	struct iov_iter iter;
+	struct v9fs_inode *v9inode = V9FS_I(ctx->mapping->host);
+
+	kvec.iov_base = ctx->fset->buf;
+	kvec.iov_len = len;
+	iov_iter_kvec(&iter, WRITE | ITER_KVEC, &kvec, 1, len);
+	BUG_ON(!v9inode->writeback_fid);
+
+	p9_client_write(v9inode->writeback_fid, offset, &iter, &ret);
+	return ret;
+}
+
+/**
+ * Helper function for managing 9pFS write requests.
+ * The main purpose of this function is to provide support for
+ * the coalescing of several pages into a single 9p message.
+ * This is similar to NFS's pagelist.
+ *
+ * Copy pages with adjacent indices to a buffer and send it to
+ * the server.
+ *
+ * @pages: array of pages with ascending indices;
+ * @nr_pages: number of pages in the array;
+ */
+static int flush_pages(struct page **pages, int nr_pages,
+		       struct flush_context *ctx)
+{
+	int ret;
+	int pos = 0;
+	int iter_pos;
+	int iter_nrpages;
+	pgoff_t iter_page_idx;
+
+	while (pos < nr_pages) {
+
+		int i;
+		int iter_len = 0;
+		struct page *page;
+
+		iter_pos = pos;
+		iter_nrpages = 0;
+		iter_page_idx = pages[pos]->index;
+
+		for (i = 0; pos < nr_pages; i++) {
+			int from_page;
+
+			page = pages[pos];
+			if (page->index != iter_page_idx + i) {
+				/*
+				 * Hole in the indices,
+				 * further coalescing is impossible.
+				 * Try to send what we have accumulated.
+				 * This page will be processed in the next
+				 * iteration
+				 */
+				goto iter_send;
+			}
+			from_page = flush_page(page, i, ctx);
+
+			iter_len += from_page;
+			iter_nrpages++;
+			pos++;
+
+			if (from_page != PAGE_SIZE) {
+				/*
+				 * Less than a full page was flushed,
+				 * further coalescing is impossible.
+				 * Try to send what we have accumulated.
+				 */
+#if V9FS_WRITEPAGES_DEBUG
+				if (from_page == 0)
+					printk("9p: page %lu is not flushed (%s)\n",
+					       page->index, ctx->msg);
+#endif
+				goto iter_send;
+			}
+		}
+	iter_send:
+		if (iter_len == 0)
+			/*
+			 * Nothing to send
+			 */
+			goto next_iter;
+		ret = send_buffer(iter_page_idx << PAGE_SHIFT,
+				  iter_len, ctx);
+		if (ret == -EAGAIN) {
+			redirty_pages_for_writeback(pages + iter_pos,
+						    iter_nrpages, ctx->wbc);
+			ret = 0;
+		} else if (ret < 0) {
+			/*
+			 * Something bad happened.
+			 * done_index is set past this chunk,
+			 * so media errors will not choke
+			 * background writeout for the entire
+			 * file.
+			 */
+			printk("9p: send_buffer failed (%d)\n", ret);
+
+			set_pages_error(pages + iter_pos, iter_nrpages, ret);
+			ctx->done_index =
+				pages[iter_pos + iter_nrpages - 1]->index + 1;
+			ctx->done = 1;
+			return ret;
+		} else
+			ret = 0;
+	next_iter:
+		if (ctx->done)
+			return ret;
+	}
+	return 0;
+}
+
+static void init_flush_context(struct flush_context *ctx,
+			       struct address_space *mapping,
+			       struct writeback_control *wbc,
+			       struct v9fs_flush_set *fset)
+{
+	ctx->wbc = wbc;
+	ctx->mapping = mapping;
+	ctx->fset = fset;
+	ctx->done = 0;
+	ctx->range_whole = 0;
+
+	if (wbc->range_cyclic) {
+		ctx->writeback_index = mapping->writeback_index;
+		ctx->index = ctx->writeback_index;
+		if (ctx->index == 0)
+			ctx->cycled = 1;
+		else
+			ctx->cycled = 0;
+		ctx->end = -1;
+	} else {
+		ctx->index = wbc->range_start >> PAGE_SHIFT;
+		ctx->end = wbc->range_end >> PAGE_SHIFT;
+		if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
+			ctx->range_whole = 1;
+		ctx->cycled = 1; /* ignore range_cyclic tests */
+	}
+}
+
+/**
+ * Pre-condition: flush set is locked
+ */
+static int v9fs_writepages_fastpath(struct address_space *mapping,
+				    struct writeback_control *wbc,
+				    struct v9fs_flush_set *fset)
+{
+	int ret = 0;
+	int tag;
+	int nr_pages;
+	struct page **pages = fset->pages;
+	struct flush_context ctx;
+
+	init_flush_context(&ctx, mapping, wbc, fset);
+
+	if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
+		tag = PAGECACHE_TAG_TOWRITE;
+	else
+		tag = PAGECACHE_TAG_DIRTY;
+retry:
+	if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
+		tag_pages_for_writeback(mapping, ctx.index, ctx.end);
+
+	ctx.done_index = ctx.index;
+
+	while (!ctx.done && (ctx.index <= ctx.end)) {
+		int i;
+		nr_pages = find_get_pages_tag(mapping, &ctx.index, tag,
+					1 + min(ctx.end - ctx.index,
+						(pgoff_t)(fset->num_pages - 1)),
+					pages);
+		if (nr_pages == 0)
+			break;
+
+		ret = flush_pages(pages, nr_pages, &ctx);
+		/*
+		 * unpin pages
+		 */
+		for (i = 0; i < nr_pages; i++)
+			put_page(pages[i]);
+		if (ret < 0)
+			break;
+		cond_resched();
+	}
+	if (!ctx.cycled && !ctx.done) {
+		/*
+		 * range_cyclic:
+		 * We hit the last page and there is more work
+		 * to be done: wrap back to the start of the file
+		 */
+		ctx.cycled = 1;
+		ctx.index = 0;
+		ctx.end = ctx.writeback_index - 1;
+		goto retry;
+	}
+	if (wbc->range_cyclic || (ctx.range_whole && wbc->nr_to_write > 0))
+		mapping->writeback_index = ctx.done_index;
+	return ret;
+}
+
+static int v9fs_writepages(struct address_space *mapping,
+			   struct writeback_control *wbc)
+{
+	int ret;
+	struct v9fs_flush_set *fset;
+
+	fset = v9fs_inode2v9ses(mapping->host)->flush;
+	if (!fset || !spin_trylock_flush_set(fset))
+		/*
+		 * fall back to the slow path
+		 */
+		return generic_writepages(mapping, wbc);
+
+	ret = v9fs_writepages_fastpath(mapping, wbc, fset);
+	spin_unlock_flush_set(fset);
+	return ret;
+}
+
 /**
  * v9fs_launder_page - Writeback a dirty page
  * Returns 0 on success.
@@ -342,6 +698,7 @@ const struct address_space_operations v9fs_addr_operations = {
 	.readpages = v9fs_vfs_readpages,
 	.set_page_dirty = __set_page_dirty_nobuffers,
 	.writepage = v9fs_vfs_writepage,
+	.writepages = v9fs_writepages,
 	.write_begin = v9fs_write_begin,
 	.write_end = v9fs_write_end,
 	.releasepage = v9fs_release_page,
diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
index de3ed86..c1f9af1 100644
--- a/fs/9p/vfs_super.c
+++ b/fs/9p/vfs_super.c
@@ -140,8 +140,11 @@ static struct dentry *v9fs_mount(struct file_system_type *fs_type, int flags,
 	}
 	v9fs_fill_super(sb, v9ses, flags, data);
 
-	if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE)
+	if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) {
+		retval = alloc_init_flush_set(v9ses);
+		if (retval)
+			goto release_sb;
 		sb->s_d_op = &v9fs_cached_dentry_operations;
-	else
+	} else
 		sb->s_d_op = &v9fs_dentry_operations;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread
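The cover letter floats the idea of N pre-allocated buffers (one per CPU)
so that concurrent writeback threads need not contend for the single flush
set. A rough sketch of what that could look like, reusing the helpers from
this patch; the flush_sets/nr_flush fields and alloc_init_flush_set_one()
are hypothetical and not part of the posted code:

    /* Hypothetical extension: one flush set per possible CPU. */
    static int alloc_init_flush_sets(struct v9fs_session_info *v9ses)
    {
    	int i;
    	int nr = num_possible_cpus();

    	v9ses->flush_sets = kcalloc(nr, sizeof(struct v9fs_flush_set *),
    				    GFP_KERNEL);
    	if (!v9ses->flush_sets)
    		return -ENOMEM;
    	for (i = 0; i < nr; i++)
    		/* assumed per-set variant of alloc_init_flush_set() */
    		if (alloc_init_flush_set_one(v9ses, &v9ses->flush_sets[i]))
    			goto error;
    	v9ses->nr_flush = nr;
    	return 0;
     error:
    	while (i--)
    		put_flush_set(v9ses->flush_sets[i]);
    	kfree(v9ses->flush_sets);
    	v9ses->flush_sets = NULL;
    	return -ENOMEM;
    }

    /* A writer would then try each set in turn instead of one trylock: */
    static struct v9fs_flush_set *get_flush_set(struct v9fs_session_info *v9ses)
    {
    	int i;

    	for (i = 0; i < v9ses->nr_flush; i++)
    		if (spin_trylock_flush_set(v9ses->flush_sets[i]))
    			return v9ses->flush_sets[i];
    	return NULL; /* all busy: fall back to generic_writepages() */
    }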
* [PATCH 2/2] 9p: v9fs new readpages.
  2016-10-10 17:24 ` [PATCH 1/2] 9p: v9fs add writepages Edward Shishkin
@ 2016-10-10 17:24   ` Edward Shishkin
  2016-10-25 14:13     ` [Qemu-devel] " Alexander Graf
  2016-10-25 14:01   ` [Qemu-devel] [PATCH 1/2] 9p: v9fs add writepages Alexander Graf
  1 sibling, 1 reply; 15+ messages in thread
From: Edward Shishkin @ 2016-10-10 17:24 UTC (permalink / raw)
  To: Eric Van Hensbergen, V9FS Developers Mailing List,
      Linux Filesystem Development List
  Cc: QEMU Developers Mailing List, ZhangWei, Claudio Fontana,
      Edward Shishkin, Edward Shishkin

Modify v9fs private ->readpages() method of address_space
operations for merging pages into long 9p messages.

Signed-off-by: Edward Shishkin <edward.shishkin@gmail.com>
---
 fs/9p/vfs_addr.c | 156 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 155 insertions(+), 1 deletion(-)

diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index e871886..4ad248e 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -34,6 +34,7 @@
 #include <linux/idr.h>
 #include <linux/sched.h>
 #include <linux/uio.h>
+#include <linux/slab.h>
 #include <net/9p/9p.h>
 #include <net/9p/client.h>
 #include <trace/events/writeback.h>
@@ -99,6 +100,148 @@ static int v9fs_vfs_readpage(struct file *filp, struct page *page)
 	return v9fs_fid_readpage(filp->private_data, page);
 }
 
+/*
+ * Context for "fast readpages"
+ */
+struct v9fs_readpages_ctx {
+	struct file *filp;
+	struct address_space *mapping;
+	pgoff_t start_index; /* index of the first page with actual data */
+	char *buf; /* buffer with actual data */
+	int len; /* length of the actual data */
+	int num_pages; /* maximal data chunk (in pages) that can be
+			  passed per transmission */
+};
+
+static int init_readpages_ctx(struct v9fs_readpages_ctx *ctx,
+			      struct file *filp,
+			      struct address_space *mapping,
+			      int num_pages)
+{
+	memset(ctx, 0, sizeof(*ctx));
+	ctx->buf = kmalloc(num_pages << PAGE_SHIFT, GFP_USER);
+	if (!ctx->buf)
+		return -ENOMEM;
+	ctx->filp = filp;
+	ctx->mapping = mapping;
+	ctx->num_pages = num_pages;
+	return 0;
+}
+
+static void done_readpages_ctx(struct v9fs_readpages_ctx *ctx)
+{
+	kfree(ctx->buf);
+}
+
+static int receive_buffer(struct file *filp,
+			  char *buf,
+			  off_t offset, /* offset in the file */
+			  int len,
+			  int *err)
+{
+	struct kvec kvec;
+	struct iov_iter iter;
+
+	kvec.iov_base = buf;
+	kvec.iov_len = len;
+	iov_iter_kvec(&iter, READ | ITER_KVEC, &kvec, 1, len);
+
+	return p9_client_read(filp->private_data, offset, &iter, err);
+}
+
+static int fast_filler(struct v9fs_readpages_ctx *ctx, struct page *page)
+{
+	int err;
+	int ret = 0;
+	char *kdata;
+	int to_page;
+	off_t off_in_buf;
+	struct inode *inode = page->mapping->host;
+
+	BUG_ON(!PageLocked(page));
+	/*
+	 * first, validate the buffer
+	 */
+	if (page->index < ctx->start_index ||
+	    ctx->start_index + ctx->num_pages < page->index) {
+		/*
+		 * No actual data in the buffer,
+		 * so actualize it
+		 */
+		ret = receive_buffer(ctx->filp,
+				     ctx->buf,
+				     page_offset(page),
+				     ctx->num_pages << PAGE_SHIFT,
+				     &err);
+		if (err) {
+			printk("failed to receive buffer off=%llu (%d)\n",
+			       (unsigned long long)page_offset(page),
+			       err);
+			ret = err;
+			goto done;
+		}
+		ctx->start_index = page->index;
+		ctx->len = ret;
+		ret = 0;
+	}
+	/*
+	 * fill the page with buffer's data
+	 */
+	off_in_buf = (page->index - ctx->start_index) << PAGE_SHIFT;
+	if (off_in_buf >= ctx->len) {
+		/*
+		 * No actual data to fill the page with
+		 */
+		ret = -1;
+		goto done;
+	}
+	to_page = ctx->len - off_in_buf;
+	if (to_page >= PAGE_SIZE)
+		to_page = PAGE_SIZE;
+
+	kdata = kmap_atomic(page);
+	memcpy(kdata, ctx->buf + off_in_buf, to_page);
+	memset(kdata + to_page, 0, PAGE_SIZE - to_page);
+	kunmap_atomic(kdata);
+
+	flush_dcache_page(page);
+	SetPageUptodate(page);
+	v9fs_readpage_to_fscache(inode, page);
+ done:
+	unlock_page(page);
+	return ret;
+}
+
+/**
+ * Try to read pages by groups. For every such group we issue only one
+ * read request to the server.
+ * @num_pages: maximal chunk of data (in pages) that can be passed per
+ * such request
+ */
+static int v9fs_readpages_tryfast(struct file *filp,
+				  struct address_space *mapping,
+				  struct list_head *pages,
+				  int num_pages)
+{
+	int ret;
+	struct v9fs_readpages_ctx ctx;
+
+	ret = init_readpages_ctx(&ctx, filp, mapping, num_pages);
+	if (ret)
+		/*
+		 * Cannot allocate resources for the fast path,
+		 * so do it the slow way
+		 */
+		return read_cache_pages(mapping, pages,
+					(void *)v9fs_vfs_readpage, filp);
+	else
+		ret = read_cache_pages(mapping, pages,
+				       (void *)fast_filler, &ctx);
+	done_readpages_ctx(&ctx);
+	return ret;
+}
+
 /**
  * v9fs_vfs_readpages - read a set of pages from 9P
  *
@@ -114,6 +257,7 @@ static int v9fs_vfs_readpages(struct file *filp, struct address_space *mapping,
 {
 	int ret = 0;
 	struct inode *inode;
+	struct v9fs_flush_set *fset;
 
 	inode = mapping->host;
 	p9_debug(P9_DEBUG_VFS, "inode: %p file: %p\n", inode, filp);
@@ -122,7 +266,17 @@ static int v9fs_vfs_readpages(struct file *filp, struct address_space *mapping,
 	if (ret == 0)
 		return ret;
 
-	ret = read_cache_pages(mapping, pages, (void *)v9fs_vfs_readpage, filp);
+	fset = v9fs_inode2v9ses(mapping->host)->flush;
+	if (!fset)
+		/*
+		 * Do it the slow way
+		 */
+		ret = read_cache_pages(mapping, pages,
+				       (void *)v9fs_vfs_readpage, filp);
+	else
+		ret = v9fs_readpages_tryfast(filp, mapping,
+					     pages, fset->num_pages);
+
 	p9_debug(P9_DEBUG_VFS, " = %d\n", ret);
 	return ret;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread
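To make the message sizing concrete under the settings from the cover
letter (msize=131096): num_pages = msize >> PAGE_SHIFT = 131096 / 4096
= 32, so each read-ahead buffer spans 32 pages and a single 9P request can
carry up to 128 KiB of file data instead of 4 KiB. The 24 bytes by which
this msize exceeds 128 KiB were presumably chosen to leave room for the 9P
I/O header, so that a full 128 KiB payload fits in one message.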
* Re: [Qemu-devel] [PATCH 2/2] 9p: v9fs new readpages. 2016-10-10 17:24 ` [PATCH 2/2] 9p: v9fs new readpages Edward Shishkin @ 2016-10-25 14:13 ` Alexander Graf 2016-12-09 18:42 ` Edward Shishkin 0 siblings, 1 reply; 15+ messages in thread From: Alexander Graf @ 2016-10-25 14:13 UTC (permalink / raw) To: Edward Shishkin, Eric Van Hensbergen, V9FS Developers Mailing List, Linux Filesystem Development List Cc: Edward Shishkin, Claudio Fontana, QEMU Developers Mailing List, ZhangWei On 10/10/2016 07:24 PM, Edward Shishkin wrote: > Modify v9fs private ->readpages() method of address_space > operations for merging pages into long 9p messages. > > Signed-off-by: Edward Shishkin <edward.shishkin@gmail.com> > --- > fs/9p/vfs_addr.c | 156 ++++++++++++++++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 155 insertions(+), 1 deletion(-) > > diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c > index e871886..4ad248e 100644 > --- a/fs/9p/vfs_addr.c > +++ b/fs/9p/vfs_addr.c > @@ -34,6 +34,7 @@ > #include <linux/idr.h> > #include <linux/sched.h> > #include <linux/uio.h> > +#include <linux/slab.h> > #include <net/9p/9p.h> > #include <net/9p/client.h> > #include <trace/events/writeback.h> > @@ -99,6 +100,148 @@ static int v9fs_vfs_readpage(struct file *filp, struct page *page) > return v9fs_fid_readpage(filp->private_data, page); > } > > +/* > + * Context for "fast readpages" > + */ > +struct v9fs_readpages_ctx { > + struct file *filp; > + struct address_space *mapping; > + pgoff_t start_index; /* index of the first page with actual data */ > + char *buf; /* buffer with actual data */ > + int len; /* length of the actual data */ > + int num_pages; /* maximal data chunk (in pages) that can be > + passed per transmission */ > +}; > + > +static int init_readpages_ctx(struct v9fs_readpages_ctx *ctx, > + struct file *filp, > + struct address_space *mapping, > + int num_pages) > +{ > + memset(ctx, 0, sizeof(*ctx)); > + ctx->buf = kmalloc(num_pages << PAGE_SHIFT, GFP_USER); Doesn't this a) have potential information leak to user space and b) allow user space to allocate big amounts of kernel memory? A limited page pool would probably be better here. I also don't quite grasp yet what pattern you're actually optimizing. Basically you're doing implicit read-ahead on behalf of the reader, right? So why would that be faster in a random 4k read scenario? Also, have you compared tmpfs hosted files to only benchmark the transmission path? 
> + if (!ctx->buf) > + return -ENOMEM; > + ctx->filp = filp; > + ctx->mapping = mapping; > + ctx->num_pages = num_pages; > + return 0; > +} > + > +static void done_readpages_ctx(struct v9fs_readpages_ctx *ctx) > +{ > + kfree(ctx->buf); > +} > + > +static int receive_buffer(struct file *filp, > + char *buf, > + off_t offset, /* offset in the file */ > + int len, > + int *err) > +{ > + struct kvec kvec; > + struct iov_iter iter; > + > + kvec.iov_base = buf; > + kvec.iov_len = len; > + iov_iter_kvec(&iter, READ | ITER_KVEC, &kvec, 1, len); > + > + return p9_client_read(filp->private_data, offset, &iter, err); > +} > + > +static int fast_filler(struct v9fs_readpages_ctx *ctx, struct page *page) > +{ > + int err; > + int ret = 0; > + char *kdata; > + int to_page; > + off_t off_in_buf; > + struct inode *inode = page->mapping->host; > + > + BUG_ON(!PageLocked(page)); > + /* > + * first, validate the buffer > + */ > + if (page->index < ctx->start_index || > + ctx->start_index + ctx->num_pages < page->index) { > + /* > + * No actual data in the buffer, > + * so actualize it > + */ > + ret = receive_buffer(ctx->filp, > + ctx->buf, > + page_offset(page), > + ctx->num_pages << PAGE_SHIFT, Doesn't this potentially read beyond the end of the file? Alex > + &err); > + if (err) { > + printk("failed to receive buffer off=%llu (%d)\n", > + (unsigned long long)page_offset(page), > + err); > + ret = err; > + goto done; > + } > + ctx->start_index = page->index; > + ctx->len = ret; > + ret = 0; > + } > + /* > + * fill the page with buffer's data > + */ > + off_in_buf = (page->index - ctx->start_index) << PAGE_SHIFT; > + if (off_in_buf >= ctx->len) { > + /* > + * No actual data to fill the page with > + */ > + ret = -1; > + goto done; > + } > + to_page = ctx->len - off_in_buf; > + if (to_page >= PAGE_SIZE) > + to_page = PAGE_SIZE; > + > + kdata = kmap_atomic(page); > + memcpy(kdata, ctx->buf + off_in_buf, to_page); > + memset(kdata + to_page, 0, PAGE_SIZE - to_page); > + kunmap_atomic(kdata); > + > + flush_dcache_page(page); > + SetPageUptodate(page); > + v9fs_readpage_to_fscache(inode, page); > + done: > + unlock_page(page); > + return ret; > +} > + > +/** > + * Try to read pages by groups. For every such group we issue only one > + * read request to the server. 
> + * @num_pages: maximal chunk of data (in pages) that can be passed per > + * such request > + */ > +static int v9fs_readpages_tryfast(struct file *filp, > + struct address_space *mapping, > + struct list_head *pages, > + int num_pages) > +{ > + int ret; > + struct v9fs_readpages_ctx ctx; > + > + ret = init_readpages_ctx(&ctx, filp, mapping, num_pages); > + if (ret) > + /* > + * Can not allocate resources for the fast path, > + * so do it by slow way > + */ > + return read_cache_pages(mapping, pages, > + (void *)v9fs_vfs_readpage, filp); > + > + else > + ret = read_cache_pages(mapping, pages, > + (void *)fast_filler, &ctx); > + done_readpages_ctx(&ctx); > + return ret; > +} > + > /** > * v9fs_vfs_readpages - read a set of pages from 9P > * > @@ -114,6 +257,7 @@ static int v9fs_vfs_readpages(struct file *filp, struct address_space *mapping, > { > int ret = 0; > struct inode *inode; > + struct v9fs_flush_set *fset; > > inode = mapping->host; > p9_debug(P9_DEBUG_VFS, "inode: %p file: %p\n", inode, filp); > @@ -122,7 +266,17 @@ static int v9fs_vfs_readpages(struct file *filp, struct address_space *mapping, > if (ret == 0) > return ret; > > - ret = read_cache_pages(mapping, pages, (void *)v9fs_vfs_readpage, filp); > + fset = v9fs_inode2v9ses(mapping->host)->flush; > + if (!fset) > + /* > + * Do it by slow way > + */ > + ret = read_cache_pages(mapping, pages, > + (void *)v9fs_vfs_readpage, filp); > + else > + ret = v9fs_readpages_tryfast(filp, mapping, > + pages, fset->num_pages); > + > p9_debug(P9_DEBUG_VFS, " = %d\n", ret); > return ret; > } ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 2/2] 9p: v9fs new readpages. 2016-10-25 14:13 ` [Qemu-devel] " Alexander Graf @ 2016-12-09 18:42 ` Edward Shishkin 0 siblings, 0 replies; 15+ messages in thread From: Edward Shishkin @ 2016-12-09 18:42 UTC (permalink / raw) To: Alexander Graf, Eric Van Hensbergen, V9FS Developers Mailing List, Linux Filesystem Development List Cc: Edward Shishkin, Claudio Fontana, QEMU Developers Mailing List, ZhangWei Hello Alexander, Thank you for the comments. Please, find my answers below. On 10/25/2016 04:13 PM, Alexander Graf wrote: > On 10/10/2016 07:24 PM, Edward Shishkin wrote: >> Modify v9fs private ->readpages() method of address_space >> operations for merging pages into long 9p messages. >> >> Signed-off-by: Edward Shishkin <edward.shishkin@gmail.com> >> --- >> fs/9p/vfs_addr.c | 156 >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++- >> 1 file changed, 155 insertions(+), 1 deletion(-) >> >> diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c >> index e871886..4ad248e 100644 >> --- a/fs/9p/vfs_addr.c >> +++ b/fs/9p/vfs_addr.c >> @@ -34,6 +34,7 @@ >> #include <linux/idr.h> >> #include <linux/sched.h> >> #include <linux/uio.h> >> +#include <linux/slab.h> >> #include <net/9p/9p.h> >> #include <net/9p/client.h> >> #include <trace/events/writeback.h> >> @@ -99,6 +100,148 @@ static int v9fs_vfs_readpage(struct file *filp, >> struct page *page) >> return v9fs_fid_readpage(filp->private_data, page); >> } >> +/* >> + * Context for "fast readpages" >> + */ >> +struct v9fs_readpages_ctx { >> + struct file *filp; >> + struct address_space *mapping; >> + pgoff_t start_index; /* index of the first page with actual data */ >> + char *buf; /* buffer with actual data */ >> + int len; /* length of the actual data */ >> + int num_pages; /* maximal data chunk (in pages) that can be >> + passed per transmission */ >> +}; >> + >> +static int init_readpages_ctx(struct v9fs_readpages_ctx *ctx, >> + struct file *filp, >> + struct address_space *mapping, >> + int num_pages) >> +{ >> + memset(ctx, 0, sizeof(*ctx)); >> + ctx->buf = kmalloc(num_pages << PAGE_SHIFT, GFP_USER); > > Doesn't this a) have potential information leak to user space Yes, allocation with such flag is a mistake. Will be fixed in v2. > and b) allow user space to allocate big amounts of kernel memory? Yes, I definitely missed a sanity check for the num_pages. Will be fixed in v2. > A limited page pool would probably be better here. > > I also don't quite grasp yet what pattern you're actually optimizing. reading/writing a large file in sequential/random manner. > Basically you're doing implicit read-ahead on behalf of the reader, right? Yes. As you can see, we always read (ctx->num_pages) number of pages. > So why would that be faster in a random 4k read scenario? You mean 41 MB/s vs 64 MB/s? With such read-ahead it's more likely that an up-to-date page will be present in the cache. So, why not? After all, our speedup of random 4K reads is not dramatic. > Also, have you compared tmpfs hosted files to only benchmark the > transmission path? No, I haven't. Only I/O speedup was a concern. I can obtain such numbers, if interesting. 
> >> + if (!ctx->buf) >> + return -ENOMEM; >> + ctx->filp = filp; >> + ctx->mapping = mapping; >> + ctx->num_pages = num_pages; >> + return 0; >> +} >> + >> +static void done_readpages_ctx(struct v9fs_readpages_ctx *ctx) >> +{ >> + kfree(ctx->buf); >> +} >> + >> +static int receive_buffer(struct file *filp, >> + char *buf, >> + off_t offset, /* offset in the file */ >> + int len, >> + int *err) >> +{ >> + struct kvec kvec; >> + struct iov_iter iter; >> + >> + kvec.iov_base = buf; >> + kvec.iov_len = len; >> + iov_iter_kvec(&iter, READ | ITER_KVEC, &kvec, 1, len); >> + >> + return p9_client_read(filp->private_data, offset, &iter, err); >> +} >> + >> +static int fast_filler(struct v9fs_readpages_ctx *ctx, struct page >> *page) >> +{ >> + int err; >> + int ret = 0; >> + char *kdata; >> + int to_page; >> + off_t off_in_buf; >> + struct inode *inode = page->mapping->host; >> + >> + BUG_ON(!PageLocked(page)); >> + /* >> + * first, validate the buffer >> + */ >> + if (page->index < ctx->start_index || >> + ctx->start_index + ctx->num_pages < page->index) { >> + /* >> + * No actual data in the buffer, >> + * so actualize it >> + */ >> + ret = receive_buffer(ctx->filp, >> + ctx->buf, >> + page_offset(page), >> + ctx->num_pages << PAGE_SHIFT, > > Doesn't this potentially read beyond the end of the file? POSIX doesn't prohibit to read beyond the end of a file. The only requirement is that the supplement should be filled with zeros. Full pages of such supplement are filled with zeros from the buffer: here we rely on the local file system on the host. Partial pages are filled with zeros by us at the place I have marked with (*) below. Thanks, Edward. > > Alex > >> + &err); >> + if (err) { >> + printk("failed to receive buffer off=%llu (%d)\n", >> + (unsigned long long)page_offset(page), >> + err); >> + ret = err; >> + goto done; >> + } >> + ctx->start_index = page->index; >> + ctx->len = ret; >> + ret = 0; >> + } >> + /* >> + * fill the page with buffer's data >> + */ >> + off_in_buf = (page->index - ctx->start_index) << PAGE_SHIFT; >> + if (off_in_buf >= ctx->len) { >> + /* >> + * No actual data to fill the page with >> + */ >> + ret = -1; >> + goto done; >> + } >> + to_page = ctx->len - off_in_buf; >> + if (to_page >= PAGE_SIZE) >> + to_page = PAGE_SIZE; >> + >> + kdata = kmap_atomic(page); >> + memcpy(kdata, ctx->buf + off_in_buf, to_page); >> + memset(kdata + to_page, 0, PAGE_SIZE - to_page); (*) >> + kunmap_atomic(kdata); >> + >> + flush_dcache_page(page); >> + SetPageUptodate(page); >> + v9fs_readpage_to_fscache(inode, page); >> + done: >> + unlock_page(page); >> + return ret; >> +} >> + >> +/** >> + * Try to read pages by groups. For every such group we issue only one >> + * read request to the server. 
>> + * @num_pages: maximal chunk of data (in pages) that can be passed per >> + * such request >> + */ >> +static int v9fs_readpages_tryfast(struct file *filp, >> + struct address_space *mapping, >> + struct list_head *pages, >> + int num_pages) >> +{ >> + int ret; >> + struct v9fs_readpages_ctx ctx; >> + >> + ret = init_readpages_ctx(&ctx, filp, mapping, num_pages); >> + if (ret) >> + /* >> + * Can not allocate resources for the fast path, >> + * so do it by slow way >> + */ >> + return read_cache_pages(mapping, pages, >> + (void *)v9fs_vfs_readpage, filp); >> + >> + else >> + ret = read_cache_pages(mapping, pages, >> + (void *)fast_filler, &ctx); >> + done_readpages_ctx(&ctx); >> + return ret; >> +} >> + >> /** >> * v9fs_vfs_readpages - read a set of pages from 9P >> * >> @@ -114,6 +257,7 @@ static int v9fs_vfs_readpages(struct file *filp, >> struct address_space *mapping, >> { >> int ret = 0; >> struct inode *inode; >> + struct v9fs_flush_set *fset; >> inode = mapping->host; >> p9_debug(P9_DEBUG_VFS, "inode: %p file: %p\n", inode, filp); >> @@ -122,7 +266,17 @@ static int v9fs_vfs_readpages(struct file *filp, >> struct address_space *mapping, >> if (ret == 0) >> return ret; >> - ret = read_cache_pages(mapping, pages, (void >> *)v9fs_vfs_readpage, filp); >> + fset = v9fs_inode2v9ses(mapping->host)->flush; >> + if (!fset) >> + /* >> + * Do it by slow way >> + */ >> + ret = read_cache_pages(mapping, pages, >> + (void *)v9fs_vfs_readpage, filp); >> + else >> + ret = v9fs_readpages_tryfast(filp, mapping, >> + pages, fset->num_pages); >> + >> p9_debug(P9_DEBUG_VFS, " = %d\n", ret); >> return ret; >> } > ^ permalink raw reply [flat|nested] 15+ messages in thread
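For readers following along, a minimal sketch of the two fixes promised
above (GFP_KERNEL instead of GFP_USER, plus a sanity check on num_pages),
written against the init_readpages_ctx() from this patch; illustrative
only, not the actual V2 change:

    static int init_readpages_ctx(struct v9fs_readpages_ctx *ctx,
    			      struct file *filp,
    			      struct address_space *mapping,
    			      int num_pages)
    {
    	memset(ctx, 0, sizeof(*ctx));
    	/*
    	 * Sanity check: num_pages derives from the mount-time msize,
    	 * so bound the allocation by it instead of trusting the caller.
    	 */
    	if (num_pages < 1 ||
    	    num_pages > (v9fs_inode2v9ses(mapping->host)->clnt->msize
    			 >> PAGE_SHIFT))
    		return -EINVAL;
    	/* GFP_KERNEL: this is a kernel-internal bounce buffer */
    	ctx->buf = kmalloc(num_pages << PAGE_SHIFT, GFP_KERNEL);
    	if (!ctx->buf)
    		return -ENOMEM;
    	ctx->filp = filp;
    	ctx->mapping = mapping;
    	ctx->num_pages = num_pages;
    	return 0;
    }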
* Re: [Qemu-devel] [PATCH 1/2] 9p: v9fs add writepages. 2016-10-10 17:24 ` [PATCH 1/2] 9p: v9fs add writepages Edward Shishkin 2016-10-10 17:24 ` [PATCH 2/2] 9p: v9fs new readpages Edward Shishkin @ 2016-10-25 14:01 ` Alexander Graf 2016-12-09 18:43 ` Edward Shishkin 1 sibling, 1 reply; 15+ messages in thread From: Alexander Graf @ 2016-10-25 14:01 UTC (permalink / raw) To: Edward Shishkin, Eric Van Hensbergen, V9FS Developers Mailing List, Linux Filesystem Development List Cc: Edward Shishkin, Claudio Fontana, QEMU Developers Mailing List, ZhangWei On 10/10/2016 07:24 PM, Edward Shishkin wrote: > Add a v9fs private ->writepages() method of address_space > operations for merging pages into long 9p messages. > > Signed-off-by: Edward Shishkin <edward.shishkin@gmail.com> > --- > fs/9p/v9fs.c | 46 +++++++ > fs/9p/v9fs.h | 22 +++- > fs/9p/vfs_addr.c | 357 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ > fs/9p/vfs_super.c | 8 +- > 4 files changed, 431 insertions(+), 2 deletions(-) > > diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c > index 072e759..3b49daf 100644 > --- a/fs/9p/v9fs.c > +++ b/fs/9p/v9fs.c > @@ -32,6 +32,7 @@ > #include <linux/parser.h> > #include <linux/idr.h> > #include <linux/slab.h> > +#include <linux/pagemap.h> > #include <net/9p/9p.h> > #include <net/9p/client.h> > #include <net/9p/transport.h> > @@ -309,6 +310,49 @@ static int v9fs_parse_options(struct v9fs_session_info *v9ses, char *opts) > return ret; > } > > +void put_flush_set(struct v9fs_flush_set *fset) > +{ > + if (!fset) > + return; > + if (fset->pages) > + kfree(fset->pages); > + if (fset->buf) > + kfree(fset->buf); > + kfree(fset); > +} > + > +/** > + * Allocate and initalize flush set > + * Pre-conditions: valid msize is set > + */ > +int alloc_init_flush_set(struct v9fs_session_info *v9ses) > +{ > + int ret = -ENOMEM; > + int num_pages; > + struct v9fs_flush_set *fset = NULL; > + > + num_pages = v9ses->clnt->msize >> PAGE_SHIFT; > + if (num_pages < 2) > + /* speedup impossible */ > + return 0; > + fset = kzalloc(sizeof(*fset), GFP_KERNEL); > + if (!fset) > + goto error; > + fset->num_pages = num_pages; > + fset->pages = kzalloc(num_pages * sizeof(*fset->pages), GFP_KERNEL); > + if (!fset->pages) > + goto error; > + fset->buf = kzalloc(num_pages << PAGE_SHIFT, GFP_USER); > + if (!fset->buf) > + goto error; > + spin_lock_init(&(fset->lock)); > + v9ses->flush = fset; > + return 0; > + error: > + put_flush_set(fset); > + return ret; > +} > + > /** > * v9fs_session_init - initialize session > * @v9ses: session information structure > @@ -444,6 +488,8 @@ void v9fs_session_close(struct v9fs_session_info *v9ses) > kfree(v9ses->uname); > kfree(v9ses->aname); > > + put_flush_set(v9ses->flush); > + > bdi_destroy(&v9ses->bdi); > > spin_lock(&v9fs_sessionlist_lock); > diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h > index 6877050..d1092e4 100644 > --- a/fs/9p/v9fs.h > +++ b/fs/9p/v9fs.h > @@ -23,6 +23,7 @@ > #ifndef FS_9P_V9FS_H > #define FS_9P_V9FS_H > > +#include <linux/kconfig.h> > #include <linux/backing-dev.h> > > /** > @@ -69,6 +70,13 @@ enum p9_cache_modes { > CACHE_FSCACHE, > }; > > +struct v9fs_flush_set { > + struct page **pages; > + int num_pages; > + char *buf; > + spinlock_t lock; > +}; > + > /** > * struct v9fs_session_info - per-instance session information > * @flags: session options of type &p9_session_flags > @@ -105,7 +113,7 @@ struct v9fs_session_info { > char *cachetag; > struct fscache_cookie *fscache; > #endif > - > + struct v9fs_flush_set *flush; /* flush set for writepages */ > char *uname; /* user name 
to mount as */ > char *aname; /* name of remote hierarchy being mounted */ > unsigned int maxdata; /* max data for client interface */ > @@ -158,6 +166,8 @@ extern const struct inode_operations v9fs_symlink_inode_operations_dotl; > extern struct inode *v9fs_inode_from_fid_dotl(struct v9fs_session_info *v9ses, > struct p9_fid *fid, > struct super_block *sb, int new); > +extern int alloc_init_flush_set(struct v9fs_session_info *v9ses); > +extern void put_flush_set(struct v9fs_flush_set *fset); > > /* other default globals */ > #define V9FS_PORT 564 > @@ -222,4 +232,14 @@ v9fs_get_new_inode_from_fid(struct v9fs_session_info *v9ses, struct p9_fid *fid, > return v9fs_inode_from_fid(v9ses, fid, sb, 1); > } > > +static inline int spin_trylock_flush_set(struct v9fs_flush_set *fset) > +{ > + return spin_trylock(&(fset->lock)); > +} > + > +static inline void spin_unlock_flush_set(struct v9fs_flush_set *fset) > +{ > + spin_unlock(&(fset->lock)); > +} > + > #endif > diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c > index 6181ad7..e871886 100644 > --- a/fs/9p/vfs_addr.c > +++ b/fs/9p/vfs_addr.c > @@ -36,6 +36,7 @@ > #include <linux/uio.h> > #include <net/9p/9p.h> > #include <net/9p/client.h> > +#include <trace/events/writeback.h> > > #include "v9fs.h" > #include "v9fs_vfs.h" > @@ -209,6 +210,361 @@ static int v9fs_vfs_writepage(struct page *page, struct writeback_control *wbc) > return retval; > } > > +static void redirty_pages_for_writeback(struct page **pages, int nr, > + struct writeback_control *wbc) > +{ > + int i; > + for (i = 0; i < nr; i++) { > + lock_page(pages[i]); > + redirty_page_for_writepage(wbc, pages[i]); > + unlock_page(pages[i]); > + } > +} > + > +static void set_pages_error(struct page **pages, int nr, int error) > +{ > + int i; > + for (i = 0; i < nr; i++) { > + lock_page(pages[i]); > + SetPageError(pages[i]); > + mapping_set_error(pages[i]->mapping, error); > + unlock_page(pages[i]); > + } > +} > + > +#define V9FS_WRITEPAGES_DEBUG (0) > + > +struct flush_context { > + struct writeback_control *wbc; > + struct address_space *mapping; > + struct v9fs_flush_set *fset; > + pgoff_t done_index; > + pgoff_t writeback_index; > + pgoff_t index; > + pgoff_t end; /* Inclusive */ > + const char *msg; > + int cycled; > + int range_whole; > + int done; > +}; > + > +/** > + * Copy a page with file's data to buffer. > + * Handle races with truncate, etc. > + * Return number of copied bytes > + * > + * @page: page to copy data from; > + * @page_nr: serial number of the page > + */ > +static int flush_page(struct page *page, int page_nr, struct flush_context *ctx) > +{ > + char *kdata; > + loff_t isize; > + int copied = 0; > + struct writeback_control *wbc = ctx->wbc; > + /* > + * At this point, the page may be truncated or > + * invalidated (changing page->mapping to NULL), or > + * even swizzled back from swapper_space to tmpfs file > + * mapping. However, page->index will not change > + * because we have a reference on the page. > + */ > + if (page->index > ctx->end) { > + /* > + * can't be range_cyclic (1st pass) because > + * end == -1 in that case. > + */ > + ctx->done = 1; > + ctx->msg = "page out of range"; > + goto exit; > + } > + ctx->done_index = page->index; > + lock_page(page); > + /* > + * Page truncated or invalidated. 
We can freely skip it > + * then, even for data integrity operations: the page > + * has disappeared concurrently, so there could be no > + * real expectation of this data interity operation > + * even if there is now a new, dirty page at the same > + * pagecache address. > + */ > + if (unlikely(page->mapping != ctx->mapping)) { > + unlock_page(page); > + ctx->msg = "page truncated or invalidated"; > + goto exit; > + } > + if (!PageDirty(page)) { > + /* > + * someone wrote it for us > + */ > + unlock_page(page); > + ctx->msg = "page not dirty"; > + goto exit; > + } > + if (PageWriteback(page)) { > + if (wbc->sync_mode != WB_SYNC_NONE) > + wait_on_page_writeback(page); > + else { > + unlock_page(page); > + ctx->msg = "page is writeback"; > + goto exit; > + } > + } > + BUG_ON(PageWriteback(page)); > + if (!clear_page_dirty_for_io(page)) { > + unlock_page(page); > + ctx->msg = "failed to clear page dirty"; > + goto exit; > + } > + trace_wbc_writepage(wbc, inode_to_bdi(ctx->mapping->host)); > + > + set_page_writeback(page); > + isize = i_size_read(ctx->mapping->host); > + if (page->index == isize >> PAGE_SHIFT) > + copied = isize & ~PAGE_MASK; > + else > + copied = PAGE_SIZE; > + kdata = kmap_atomic(page); > + memcpy(ctx->fset->buf + (page_nr << PAGE_SHIFT), kdata, copied); > + kunmap_atomic(kdata); > + end_page_writeback(page); > + > + unlock_page(page); > + /* > + * We stop writing back only if we are not doing > + * integrity sync. In case of integrity sync we have to > + * keep going until we have written all the pages > + * we tagged for writeback prior to entering this loop. > + */ > + if (--wbc->nr_to_write <= 0 && wbc->sync_mode == WB_SYNC_NONE) > + ctx->done = 1; > + exit: > + return copied; > +} > + > +static int send_buffer(off_t offset, int len, struct flush_context *ctx) > +{ > + int ret = 0; > + struct kvec kvec; > + struct iov_iter iter; > + struct v9fs_inode *v9inode = V9FS_I(ctx->mapping->host); > + > + kvec.iov_base = ctx->fset->buf; > + kvec.iov_len = len; > + iov_iter_kvec(&iter, WRITE | ITER_KVEC, &kvec, 1, len); > + BUG_ON(!v9inode->writeback_fid); > + > + p9_client_write(v9inode->writeback_fid, offset, &iter, &ret); > + return ret; > +} > + > +/** > + * Helper function for managing 9pFS write requests. > + * The main purpose of this function is to provide support for > + * the coalescing of several pages into a single 9p message. > + * This is similarly to NFS's pagelist. > + * > + * Copy pages with adjusent indices to a buffer and send it to > + * the server. Why do you need to copy the pages? The transport below 9p - virtio in your case - has native scatter-gather support, so you don't need to copy anything, no? Alex ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 1/2] 9p: v9fs add writepages. 2016-10-25 14:01 ` [Qemu-devel] [PATCH 1/2] 9p: v9fs add writepages Alexander Graf @ 2016-12-09 18:43 ` Edward Shishkin 0 siblings, 0 replies; 15+ messages in thread From: Edward Shishkin @ 2016-12-09 18:43 UTC (permalink / raw) To: Alexander Graf, Eric Van Hensbergen, V9FS Developers Mailing List, Linux Filesystem Development List Cc: Edward Shishkin, Claudio Fontana, QEMU Developers Mailing List, ZhangWei On 10/25/2016 04:01 PM, Alexander Graf wrote: > On 10/10/2016 07:24 PM, Edward Shishkin wrote: >> Add a v9fs private ->writepages() method of address_space >> operations for merging pages into long 9p messages. >> >> Signed-off-by: Edward Shishkin <edward.shishkin@gmail.com> >> --- >> fs/9p/v9fs.c | 46 +++++++ >> fs/9p/v9fs.h | 22 +++- >> fs/9p/vfs_addr.c | 357 >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++ >> fs/9p/vfs_super.c | 8 +- >> 4 files changed, 431 insertions(+), 2 deletions(-) >> >> diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c >> index 072e759..3b49daf 100644 >> --- a/fs/9p/v9fs.c >> +++ b/fs/9p/v9fs.c >> @@ -32,6 +32,7 @@ >> #include <linux/parser.h> >> #include <linux/idr.h> >> #include <linux/slab.h> >> +#include <linux/pagemap.h> >> #include <net/9p/9p.h> >> #include <net/9p/client.h> >> #include <net/9p/transport.h> >> @@ -309,6 +310,49 @@ static int v9fs_parse_options(struct >> v9fs_session_info *v9ses, char *opts) >> return ret; >> } >> +void put_flush_set(struct v9fs_flush_set *fset) >> +{ >> + if (!fset) >> + return; >> + if (fset->pages) >> + kfree(fset->pages); >> + if (fset->buf) >> + kfree(fset->buf); >> + kfree(fset); >> +} >> + >> +/** >> + * Allocate and initalize flush set >> + * Pre-conditions: valid msize is set >> + */ >> +int alloc_init_flush_set(struct v9fs_session_info *v9ses) >> +{ >> + int ret = -ENOMEM; >> + int num_pages; >> + struct v9fs_flush_set *fset = NULL; >> + >> + num_pages = v9ses->clnt->msize >> PAGE_SHIFT; >> + if (num_pages < 2) >> + /* speedup impossible */ >> + return 0; >> + fset = kzalloc(sizeof(*fset), GFP_KERNEL); >> + if (!fset) >> + goto error; >> + fset->num_pages = num_pages; >> + fset->pages = kzalloc(num_pages * sizeof(*fset->pages), >> GFP_KERNEL); >> + if (!fset->pages) >> + goto error; >> + fset->buf = kzalloc(num_pages << PAGE_SHIFT, GFP_USER); >> + if (!fset->buf) >> + goto error; >> + spin_lock_init(&(fset->lock)); >> + v9ses->flush = fset; >> + return 0; >> + error: >> + put_flush_set(fset); >> + return ret; >> +} >> + >> /** >> * v9fs_session_init - initialize session >> * @v9ses: session information structure >> @@ -444,6 +488,8 @@ void v9fs_session_close(struct v9fs_session_info >> *v9ses) >> kfree(v9ses->uname); >> kfree(v9ses->aname); >> + put_flush_set(v9ses->flush); >> + >> bdi_destroy(&v9ses->bdi); >> spin_lock(&v9fs_sessionlist_lock); >> diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h >> index 6877050..d1092e4 100644 >> --- a/fs/9p/v9fs.h >> +++ b/fs/9p/v9fs.h >> @@ -23,6 +23,7 @@ >> #ifndef FS_9P_V9FS_H >> #define FS_9P_V9FS_H >> +#include <linux/kconfig.h> >> #include <linux/backing-dev.h> >> /** >> @@ -69,6 +70,13 @@ enum p9_cache_modes { >> CACHE_FSCACHE, >> }; >> +struct v9fs_flush_set { >> + struct page **pages; >> + int num_pages; >> + char *buf; >> + spinlock_t lock; >> +}; >> + >> /** >> * struct v9fs_session_info - per-instance session information >> * @flags: session options of type &p9_session_flags >> @@ -105,7 +113,7 @@ struct v9fs_session_info { >> char *cachetag; >> struct fscache_cookie *fscache; >> #endif >> - >> + struct v9fs_flush_set 
*flush; /* flush set for writepages */ >> char *uname; /* user name to mount as */ >> char *aname; /* name of remote hierarchy being mounted */ >> unsigned int maxdata; /* max data for client interface */ >> @@ -158,6 +166,8 @@ extern const struct inode_operations >> v9fs_symlink_inode_operations_dotl; >> extern struct inode *v9fs_inode_from_fid_dotl(struct >> v9fs_session_info *v9ses, >> struct p9_fid *fid, >> struct super_block *sb, int new); >> +extern int alloc_init_flush_set(struct v9fs_session_info *v9ses); >> +extern void put_flush_set(struct v9fs_flush_set *fset); >> /* other default globals */ >> #define V9FS_PORT 564 >> @@ -222,4 +232,14 @@ v9fs_get_new_inode_from_fid(struct >> v9fs_session_info *v9ses, struct p9_fid *fid, >> return v9fs_inode_from_fid(v9ses, fid, sb, 1); >> } >> +static inline int spin_trylock_flush_set(struct v9fs_flush_set *fset) >> +{ >> + return spin_trylock(&(fset->lock)); >> +} >> + >> +static inline void spin_unlock_flush_set(struct v9fs_flush_set *fset) >> +{ >> + spin_unlock(&(fset->lock)); >> +} >> + >> #endif >> diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c >> index 6181ad7..e871886 100644 >> --- a/fs/9p/vfs_addr.c >> +++ b/fs/9p/vfs_addr.c >> @@ -36,6 +36,7 @@ >> #include <linux/uio.h> >> #include <net/9p/9p.h> >> #include <net/9p/client.h> >> +#include <trace/events/writeback.h> >> #include "v9fs.h" >> #include "v9fs_vfs.h" >> @@ -209,6 +210,361 @@ static int v9fs_vfs_writepage(struct page >> *page, struct writeback_control *wbc) >> return retval; >> } >> +static void redirty_pages_for_writeback(struct page **pages, int nr, >> + struct writeback_control *wbc) >> +{ >> + int i; >> + for (i = 0; i < nr; i++) { >> + lock_page(pages[i]); >> + redirty_page_for_writepage(wbc, pages[i]); >> + unlock_page(pages[i]); >> + } >> +} >> + >> +static void set_pages_error(struct page **pages, int nr, int error) >> +{ >> + int i; >> + for (i = 0; i < nr; i++) { >> + lock_page(pages[i]); >> + SetPageError(pages[i]); >> + mapping_set_error(pages[i]->mapping, error); >> + unlock_page(pages[i]); >> + } >> +} >> + >> +#define V9FS_WRITEPAGES_DEBUG (0) >> + >> +struct flush_context { >> + struct writeback_control *wbc; >> + struct address_space *mapping; >> + struct v9fs_flush_set *fset; >> + pgoff_t done_index; >> + pgoff_t writeback_index; >> + pgoff_t index; >> + pgoff_t end; /* Inclusive */ >> + const char *msg; >> + int cycled; >> + int range_whole; >> + int done; >> +}; >> + >> +/** >> + * Copy a page with file's data to buffer. >> + * Handle races with truncate, etc. >> + * Return number of copied bytes >> + * >> + * @page: page to copy data from; >> + * @page_nr: serial number of the page >> + */ >> +static int flush_page(struct page *page, int page_nr, struct >> flush_context *ctx) >> +{ >> + char *kdata; >> + loff_t isize; >> + int copied = 0; >> + struct writeback_control *wbc = ctx->wbc; >> + /* >> + * At this point, the page may be truncated or >> + * invalidated (changing page->mapping to NULL), or >> + * even swizzled back from swapper_space to tmpfs file >> + * mapping. However, page->index will not change >> + * because we have a reference on the page. >> + */ >> + if (page->index > ctx->end) { >> + /* >> + * can't be range_cyclic (1st pass) because >> + * end == -1 in that case. >> + */ >> + ctx->done = 1; >> + ctx->msg = "page out of range"; >> + goto exit; >> + } >> + ctx->done_index = page->index; >> + lock_page(page); >> + /* >> + * Page truncated or invalidated. 
We can freely skip it >> + * then, even for data integrity operations: the page >> + * has disappeared concurrently, so there could be no >> + * real expectation of this data interity operation >> + * even if there is now a new, dirty page at the same >> + * pagecache address. >> + */ >> + if (unlikely(page->mapping != ctx->mapping)) { >> + unlock_page(page); >> + ctx->msg = "page truncated or invalidated"; >> + goto exit; >> + } >> + if (!PageDirty(page)) { >> + /* >> + * someone wrote it for us >> + */ >> + unlock_page(page); >> + ctx->msg = "page not dirty"; >> + goto exit; >> + } >> + if (PageWriteback(page)) { >> + if (wbc->sync_mode != WB_SYNC_NONE) >> + wait_on_page_writeback(page); >> + else { >> + unlock_page(page); >> + ctx->msg = "page is writeback"; >> + goto exit; >> + } >> + } >> + BUG_ON(PageWriteback(page)); >> + if (!clear_page_dirty_for_io(page)) { >> + unlock_page(page); >> + ctx->msg = "failed to clear page dirty"; >> + goto exit; >> + } >> + trace_wbc_writepage(wbc, inode_to_bdi(ctx->mapping->host)); >> + >> + set_page_writeback(page); >> + isize = i_size_read(ctx->mapping->host); >> + if (page->index == isize >> PAGE_SHIFT) >> + copied = isize & ~PAGE_MASK; >> + else >> + copied = PAGE_SIZE; >> + kdata = kmap_atomic(page); >> + memcpy(ctx->fset->buf + (page_nr << PAGE_SHIFT), kdata, copied); >> + kunmap_atomic(kdata); >> + end_page_writeback(page); >> + >> + unlock_page(page); >> + /* >> + * We stop writing back only if we are not doing >> + * integrity sync. In case of integrity sync we have to >> + * keep going until we have written all the pages >> + * we tagged for writeback prior to entering this loop. >> + */ >> + if (--wbc->nr_to_write <= 0 && wbc->sync_mode == WB_SYNC_NONE) >> + ctx->done = 1; >> + exit: >> + return copied; >> +} >> + >> +static int send_buffer(off_t offset, int len, struct flush_context >> *ctx) >> +{ >> + int ret = 0; >> + struct kvec kvec; >> + struct iov_iter iter; >> + struct v9fs_inode *v9inode = V9FS_I(ctx->mapping->host); >> + >> + kvec.iov_base = ctx->fset->buf; >> + kvec.iov_len = len; >> + iov_iter_kvec(&iter, WRITE | ITER_KVEC, &kvec, 1, len); >> + BUG_ON(!v9inode->writeback_fid); >> + >> + p9_client_write(v9inode->writeback_fid, offset, &iter, &ret); >> + return ret; >> +} >> + >> +/** >> + * Helper function for managing 9pFS write requests. >> + * The main purpose of this function is to provide support for >> + * the coalescing of several pages into a single 9p message. >> + * This is similarly to NFS's pagelist. >> + * >> + * Copy pages with adjusent indices to a buffer and send it to >> + * the server. > > Why do you need to copy the pages? The transport below 9p - virtio in > your case - has native scatter-gather support, so you don't need to > copy anything, no? > Perhaps we can avoid copying pages. However it would mean modification of the 9P file transfer protocol, which is not aware of pages. I suspect that such activity requires substantial sponsorship, and I am not sure that it will be interesting for Huawei. Thanks, Edward. ^ permalink raw reply [flat|nested] 15+ messages in thread
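For context, Graf's suggestion would look roughly like the following on
the client side: p9_client_write() already takes an iov_iter, and the
kernel can describe pages directly with a bio_vec iterator, avoiding the
memcpy into the bounce buffer. A sketch only, assuming the flush_context
from patch 1 and the ITER_BVEC interface of kernels of this era; whether
the transport then handles such iterators efficiently, and whether any
protocol-level work remains, is exactly the open question in this
exchange:

    static int send_pages_nocopy(struct page **pages, int nr, size_t len,
    			     off_t offset, struct flush_context *ctx)
    {
    	int err = 0;
    	int i;
    	struct bio_vec *bvec;
    	struct iov_iter iter;
    	struct v9fs_inode *v9inode = V9FS_I(ctx->mapping->host);

    	bvec = kcalloc(nr, sizeof(*bvec), GFP_NOFS);
    	if (!bvec)
    		return -ENOMEM;
    	for (i = 0; i < nr; i++) {
    		/* describe the pages in place instead of copying them */
    		bvec[i].bv_page = pages[i];
    		bvec[i].bv_offset = 0;
    		bvec[i].bv_len = PAGE_SIZE;
    	}
    	/* the iterator is capped at len, covering a short final page */
    	iov_iter_bvec(&iter, ITER_BVEC | WRITE, bvec, nr, len);
    	p9_client_write(v9inode->writeback_fid, offset, &iter, &err);
    	kfree(bvec);
    	return err;
    }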
* [RFC][PATCH 0/7] 9p: v9fs read and write speedup - V2 2016-10-10 17:24 [RFC][PATCH 0/2] 9p: v9fs read and write speedup Edward Shishkin 2016-10-10 17:24 ` [PATCH 1/2] 9p: v9fs add writepages Edward Shishkin @ 2016-12-12 18:13 ` Edward Shishkin 2016-12-12 18:15 ` [PATCH 1/7] 9p: v9fs add writepages edward.shishkin 1 sibling, 1 reply; 15+ messages in thread From: Edward Shishkin @ 2016-12-12 18:13 UTC (permalink / raw) To: Eric Van Hensbergen, V9FS Developers Mailing List, Linux Filesystem Development List Cc: QEMU Developers Mailing List, ZhangWei, Claudio Fontana, Edward Shishkin, Greg Kurz, Alexander Graf Hello everyone, Version 2 of the patch series contains cleanups and bug fixes. Patches 1 and 2 remain unchanged. Patches 3, 4, and 5 are cleanups suggested by Fengguang Wu. Patches 6 and 7 fix bugs found by Alexander Graf. Comments and suggestions are welcome as usual. Thanks, Edward. ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/7] 9p: v9fs add writepages. 2016-12-12 18:13 ` [RFC][PATCH 0/7] 9p: v9fs read and write speedup - V2 Edward Shishkin @ 2016-12-12 18:15 ` edward.shishkin 2016-12-12 18:15 ` [PATCH 2/7] 9p: v9fs new readpages edward.shishkin ` (5 more replies) 0 siblings, 6 replies; 15+ messages in thread From: edward.shishkin @ 2016-12-12 18:15 UTC (permalink / raw) To: Eric Van Hensbergen, V9FS Developers Mailing List, Linux Filesystem Development List Cc: QEMU Developers Mailing List, ZhangWei, Claudio Fontana, Greg Kurz, Alexander Graf, Eduard Shishkin From: Eduard Shishkin <eduard.shishkin@huawei.com> Add a v9fs private ->writepages() method of address_space operations for merging pages into long 9p messages. Signed-off-by: Eduard Shishkin <eduard.shishkin@huawei.com> --- fs/9p/v9fs.c | 46 +++++++ fs/9p/v9fs.h | 22 +++- fs/9p/vfs_addr.c | 357 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ fs/9p/vfs_super.c | 8 +- 4 files changed, 431 insertions(+), 2 deletions(-) diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c index 072e759..3b49daf 100644 --- a/fs/9p/v9fs.c +++ b/fs/9p/v9fs.c @@ -32,6 +32,7 @@ #include <linux/parser.h> #include <linux/idr.h> #include <linux/slab.h> +#include <linux/pagemap.h> #include <net/9p/9p.h> #include <net/9p/client.h> #include <net/9p/transport.h> @@ -309,6 +310,49 @@ static int v9fs_parse_options(struct v9fs_session_info *v9ses, char *opts) return ret; } +void put_flush_set(struct v9fs_flush_set *fset) +{ + if (!fset) + return; + if (fset->pages) + kfree(fset->pages); + if (fset->buf) + kfree(fset->buf); + kfree(fset); +} + +/** + * Allocate and initialize flush set + * Pre-conditions: valid msize is set + */ +int alloc_init_flush_set(struct v9fs_session_info *v9ses) +{ + int ret = -ENOMEM; + int num_pages; + struct v9fs_flush_set *fset = NULL; + + num_pages = v9ses->clnt->msize >> PAGE_SHIFT; + if (num_pages < 2) + /* speedup impossible */ + return 0; + fset = kzalloc(sizeof(*fset), GFP_KERNEL); + if (!fset) + goto error; + fset->num_pages = num_pages; + fset->pages = kzalloc(num_pages * sizeof(*fset->pages), GFP_KERNEL); + if (!fset->pages) + goto error; + fset->buf = kzalloc(num_pages << PAGE_SHIFT, GFP_USER); + if (!fset->buf) + goto error; + spin_lock_init(&(fset->lock)); + v9ses->flush = fset; + return 0; + error: + put_flush_set(fset); + return ret; +} + /** * v9fs_session_init - initialize session * @v9ses: session information structure @@ -444,6 +488,8 @@ void v9fs_session_close(struct v9fs_session_info *v9ses) kfree(v9ses->uname); kfree(v9ses->aname); + put_flush_set(v9ses->flush); + bdi_destroy(&v9ses->bdi); spin_lock(&v9fs_sessionlist_lock); diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h index 443d12e..13a9db1 100644 --- a/fs/9p/v9fs.h +++ b/fs/9p/v9fs.h @@ -23,6 +23,7 @@ #ifndef FS_9P_V9FS_H #define FS_9P_V9FS_H +#include <linux/kconfig.h> #include <linux/backing-dev.h> /** @@ -69,6 +70,13 @@ enum p9_cache_modes { CACHE_FSCACHE, }; +struct v9fs_flush_set { + struct page **pages; + int num_pages; + char *buf; + spinlock_t lock; +}; + /** * struct v9fs_session_info - per-instance session information * @flags: session options of type &p9_session_flags @@ -105,7 +113,7 @@ struct v9fs_session_info { char *cachetag; struct fscache_cookie *fscache; #endif - + struct v9fs_flush_set *flush; /* flush set for writepages */ char *uname; /* user name to mount as */ char *aname; /* name of remote hierarchy being mounted */ unsigned int maxdata; /* max data for client interface */ @@ -159,6 +167,8 @@ extern const struct inode_operations v9fs_symlink_inode_operations_dotl;
extern struct inode *v9fs_inode_from_fid_dotl(struct v9fs_session_info *v9ses, struct p9_fid *fid, struct super_block *sb, int new); +extern int alloc_init_flush_set(struct v9fs_session_info *v9ses); +extern void put_flush_set(struct v9fs_flush_set *fset); /* other default globals */ #define V9FS_PORT 564 @@ -223,4 +233,14 @@ v9fs_get_new_inode_from_fid(struct v9fs_session_info *v9ses, struct p9_fid *fid, return v9fs_inode_from_fid(v9ses, fid, sb, 1); } +static inline int spin_trylock_flush_set(struct v9fs_flush_set *fset) +{ + return spin_trylock(&(fset->lock)); +} + +static inline void spin_unlock_flush_set(struct v9fs_flush_set *fset) +{ + spin_unlock(&(fset->lock)); +} + #endif diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c index 6181ad7..e871886 100644 --- a/fs/9p/vfs_addr.c +++ b/fs/9p/vfs_addr.c @@ -36,6 +36,7 @@ #include <linux/uio.h> #include <net/9p/9p.h> #include <net/9p/client.h> +#include <trace/events/writeback.h> #include "v9fs.h" #include "v9fs_vfs.h" @@ -209,6 +210,361 @@ static int v9fs_vfs_writepage(struct page *page, struct writeback_control *wbc) return retval; } +static void redirty_pages_for_writeback(struct page **pages, int nr, + struct writeback_control *wbc) +{ + int i; + for (i = 0; i < nr; i++) { + lock_page(pages[i]); + redirty_page_for_writepage(wbc, pages[i]); + unlock_page(pages[i]); + } +} + +static void set_pages_error(struct page **pages, int nr, int error) +{ + int i; + for (i = 0; i < nr; i++) { + lock_page(pages[i]); + SetPageError(pages[i]); + mapping_set_error(pages[i]->mapping, error); + unlock_page(pages[i]); + } +} + +#define V9FS_WRITEPAGES_DEBUG (0) + +struct flush_context { + struct writeback_control *wbc; + struct address_space *mapping; + struct v9fs_flush_set *fset; + pgoff_t done_index; + pgoff_t writeback_index; + pgoff_t index; + pgoff_t end; /* Inclusive */ + const char *msg; + int cycled; + int range_whole; + int done; +}; + +/** + * Copy a page with file's data to buffer. + * Handle races with truncate, etc. + * Return number of copied bytes + * + * @page: page to copy data from; + * @page_nr: serial number of the page + */ +static int flush_page(struct page *page, int page_nr, struct flush_context *ctx) +{ + char *kdata; + loff_t isize; + int copied = 0; + struct writeback_control *wbc = ctx->wbc; + /* + * At this point, the page may be truncated or + * invalidated (changing page->mapping to NULL), or + * even swizzled back from swapper_space to tmpfs file + * mapping. However, page->index will not change + * because we have a reference on the page. + */ + if (page->index > ctx->end) { + /* + * can't be range_cyclic (1st pass) because + * end == -1 in that case. + */ + ctx->done = 1; + ctx->msg = "page out of range"; + goto exit; + } + ctx->done_index = page->index; + lock_page(page); + /* + * Page truncated or invalidated. We can freely skip it + * then, even for data integrity operations: the page + * has disappeared concurrently, so there could be no + * real expectation of this data integrity operation + * even if there is now a new, dirty page at the same + * pagecache address.
+ */ + if (unlikely(page->mapping != ctx->mapping)) { + unlock_page(page); + ctx->msg = "page truncated or invalidated"; + goto exit; + } + if (!PageDirty(page)) { + /* + * someone wrote it for us + */ + unlock_page(page); + ctx->msg = "page not dirty"; + goto exit; + } + if (PageWriteback(page)) { + if (wbc->sync_mode != WB_SYNC_NONE) + wait_on_page_writeback(page); + else { + unlock_page(page); + ctx->msg = "page is writeback"; + goto exit; + } + } + BUG_ON(PageWriteback(page)); + if (!clear_page_dirty_for_io(page)) { + unlock_page(page); + ctx->msg = "failed to clear page dirty"; + goto exit; + } + trace_wbc_writepage(wbc, inode_to_bdi(ctx->mapping->host)); + + set_page_writeback(page); + isize = i_size_read(ctx->mapping->host); + if (page->index == isize >> PAGE_SHIFT) + copied = isize & ~PAGE_MASK; + else + copied = PAGE_SIZE; + kdata = kmap_atomic(page); + memcpy(ctx->fset->buf + (page_nr << PAGE_SHIFT), kdata, copied); + kunmap_atomic(kdata); + end_page_writeback(page); + + unlock_page(page); + /* + * We stop writing back only if we are not doing + * integrity sync. In case of integrity sync we have to + * keep going until we have written all the pages + * we tagged for writeback prior to entering this loop. + */ + if (--wbc->nr_to_write <= 0 && wbc->sync_mode == WB_SYNC_NONE) + ctx->done = 1; + exit: + return copied; +} + +static int send_buffer(off_t offset, int len, struct flush_context *ctx) +{ + int ret = 0; + struct kvec kvec; + struct iov_iter iter; + struct v9fs_inode *v9inode = V9FS_I(ctx->mapping->host); + + kvec.iov_base = ctx->fset->buf; + kvec.iov_len = len; + iov_iter_kvec(&iter, WRITE | ITER_KVEC, &kvec, 1, len); + BUG_ON(!v9inode->writeback_fid); + + p9_client_write(v9inode->writeback_fid, offset, &iter, &ret); + return ret; +} + +/** + * Helper function for managing 9pFS write requests. + * The main purpose of this function is to provide support for + * the coalescing of several pages into a single 9p message. + * This is similar to NFS's pagelist. + * + * Copy pages with adjacent indices to a buffer and send it to + * the server. + * + * @pages: array of pages with ascending indices; + * @nr_pages: number of pages in the array; + */ +static int flush_pages(struct page **pages, int nr_pages, + struct flush_context *ctx) +{ + int ret; + int pos = 0; + int iter_pos; + int iter_nrpages; + pgoff_t iter_page_idx; + + while (pos < nr_pages) { + + int i; + int iter_len = 0; + struct page *page; + + iter_pos = pos; + iter_nrpages = 0; + iter_page_idx = pages[pos]->index; + + for (i = 0; pos < nr_pages; i++) { + int from_page; + + page = pages[pos]; + if (page->index != iter_page_idx + i) { + /* + * Hole in the indices, + * further coalescing impossible. + * Try to send what we have accumulated. + * This page will be processed in the next + * iteration. + */ + goto iter_send; + } + from_page = flush_page(page, i, ctx); + + iter_len += from_page; + iter_nrpages++; + pos++; + + if (from_page != PAGE_SIZE) { + /* + * Not a full page was flushed, + * further coalescing impossible. + * Try to send what we have accumulated.
+ */ +#if V9FS_WRITEPAGES_DEBUG + if (from_page == 0) + printk("9p: page %lu is not flushed (%s)\n", + page->index, ctx->msg); +#endif + goto iter_send; + } + }; + iter_send: + if (iter_len == 0) + /* + * Nothing to send + */ + goto next_iter; + ret = send_buffer(iter_page_idx << PAGE_SHIFT, + iter_len, ctx); + if (ret == -EAGAIN) { + redirty_pages_for_writeback(pages + iter_pos, + iter_nrpages, ctx->wbc); + ret = 0; + } else if (ret < 0) { + /* + * Something bad happened. + * done_index is set past this chunk, + * so media errors will not choke + * background writeout for the entire + * file. + */ + printk("9p: send_buffer failed (%d)\n", ret); + + set_pages_error(pages + iter_pos, iter_nrpages, ret); + ctx->done_index = + pages[iter_pos + iter_nrpages - 1]->index + 1; + ctx->done = 1; + return ret; + } else + ret = 0; + next_iter: + if (ctx->done) + return ret; + }; + return 0; +} + +static void init_flush_context(struct flush_context *ctx, + struct address_space *mapping, + struct writeback_control *wbc, + struct v9fs_flush_set *fset) +{ + ctx->wbc = wbc; + ctx->mapping = mapping; + ctx->fset = fset; + ctx->done = 0; + ctx->range_whole = 0; + + if (wbc->range_cyclic) { + ctx->writeback_index = mapping->writeback_index; + ctx->index = ctx->writeback_index; + if (ctx->index == 0) + ctx->cycled = 1; + else + ctx->cycled = 0; + ctx->end = -1; + } else { + ctx->index = wbc->range_start >> PAGE_SHIFT; + ctx->end = wbc->range_end >> PAGE_SHIFT; + if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) + ctx->range_whole = 1; + ctx->cycled = 1; /* ignore range_cyclic tests */ + } +} + +/** + * Pre-condition: flush set is locked + */ +static int v9fs_writepages_fastpath(struct address_space *mapping, + struct writeback_control *wbc, + struct v9fs_flush_set *fset) +{ + int ret = 0; + int tag; + int nr_pages; + struct page **pages = fset->pages; + struct flush_context ctx; + + init_flush_context(&ctx, mapping, wbc, fset); + + if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages) + tag = PAGECACHE_TAG_TOWRITE; + else + tag = PAGECACHE_TAG_DIRTY; +retry: + if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages) + tag_pages_for_writeback(mapping, ctx.index, ctx.end); + + ctx.done_index = ctx.index; + + while (!ctx.done && (ctx.index <= ctx.end)) { + int i; + nr_pages = find_get_pages_tag(mapping, &ctx.index, tag, + 1 + min(ctx.end - ctx.index, + (pgoff_t)(fset->num_pages - 1)), + pages); + if (nr_pages == 0) + break; + + ret = flush_pages(pages, nr_pages, &ctx); + /* + * unpin pages + */ + for (i = 0; i < nr_pages; i++) + put_page(pages[i]); + if (ret < 0) + break; + cond_resched(); + } + if (!ctx.cycled && !ctx.done) { + /* + * range_cyclic: + * We hit the last page and there is more work + * to be done: wrap back to the start of the file + */ + ctx.cycled = 1; + ctx.index = 0; + ctx.end = ctx.writeback_index - 1; + goto retry; + } + if (wbc->range_cyclic || (ctx.range_whole && wbc->nr_to_write > 0)) + mapping->writeback_index = ctx.done_index; + return ret; +} + +static int v9fs_writepages(struct address_space *mapping, + struct writeback_control *wbc) +{ + int ret; + struct v9fs_flush_set *fset; + + fset = v9fs_inode2v9ses(mapping->host)->flush; + if (!fset || !spin_trylock_flush_set(fset)) + /* + * fall back to the slow path + */ + return generic_writepages(mapping, wbc); + + ret = v9fs_writepages_fastpath(mapping, wbc, fset); + spin_unlock_flush_set(fset); + return ret; +} + /** * v9fs_launder_page - Writeback a dirty page * Returns 0 on success.
@@ -342,6 +698,7 @@ const struct address_space_operations v9fs_addr_operations = { .readpages = v9fs_vfs_readpages, .set_page_dirty = __set_page_dirty_nobuffers, .writepage = v9fs_vfs_writepage, + .writepages = v9fs_writepages, .write_begin = v9fs_write_begin, .write_end = v9fs_write_end, .releasepage = v9fs_release_page, diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c index de3ed86..c1f9af1 100644 --- a/fs/9p/vfs_super.c +++ b/fs/9p/vfs_super.c @@ -140,8 +140,14 @@ static struct dentry *v9fs_mount(struct file_system_type *fs_type, int flags, } v9fs_fill_super(sb, v9ses, flags, data); - if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) + if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) { + retval = alloc_init_flush_set(v9ses); + if (IS_ERR(v9ses->flush)) { + retval = PTR_ERR(fid); + goto release_sb; + } sb->s_d_op = &v9fs_cached_dentry_operations; + } else sb->s_d_op = &v9fs_dentry_operations; -- 2.7.4 ^ permalink raw reply related [flat|nested] 15+ messages in thread
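As a side note for reviewers: the control flow of flush_pages() above reduces to grouping ascending page indices into maximal runs of adjacent indices and issuing one 9P write per run; a hole in the index sequence (or a partial page) ends a run. The standalone userspace model below shows just that grouping logic; all names are invented and it is not part of the posted series.

#include <stdio.h>

int main(void)
{
	/* dirty page indices as find_get_pages_tag() might return them */
	unsigned long idx[] = { 7, 8, 9, 12, 13, 20 };
	int n = sizeof(idx) / sizeof(idx[0]);
	int pos = 0;

	while (pos < n) {
		int start = pos++;
		/* extend the run while indices remain adjacent */
		while (pos < n && idx[pos] == idx[pos - 1] + 1)
			pos++;
		/* one 9P message would cover pages idx[start]..idx[pos - 1] */
		printf("send: first index %lu, %d page(s)\n",
		       idx[start], pos - start);
	}
	return 0;
}

With the indices above it prints three runs (7..9, 12..13, 20), i.e. six dirty pages go out in three messages instead of six.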
* [PATCH 2/7] 9p: v9fs new readpages. 2016-12-12 18:15 ` [PATCH 1/7] 9p: v9fs add writepages edward.shishkin @ 2016-12-12 18:15 ` edward.shishkin 2016-12-12 18:15 ` [PATCH 3/7] 9p: v9fs fix ifnullfree.cocci warnings edward.shishkin ` (4 subsequent siblings) 5 siblings, 0 replies; 15+ messages in thread From: edward.shishkin @ 2016-12-12 18:15 UTC (permalink / raw) To: Eric Van Hensbergen, V9FS Developers Mailing List, Linux Filesystem Development List Cc: QEMU Developers Mailing List, ZhangWei, Claudio Fontana, Greg Kurz, Alexander Graf, Eduard Shishkin From: Eduard Shishkin <eduard.shishkin@huawei.com> Modify v9fs private ->readpages() method of address_space operations for merging pages into long 9p messages. Signed-off-by: Eduard Shishkin <eduard.shishkin@huawei.com> --- fs/9p/vfs_addr.c | 156 ++++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 155 insertions(+), 1 deletion(-) diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c index e871886..4ad248e 100644 --- a/fs/9p/vfs_addr.c +++ b/fs/9p/vfs_addr.c @@ -34,6 +34,7 @@ #include <linux/idr.h> #include <linux/sched.h> #include <linux/uio.h> +#include <linux/slab.h> #include <net/9p/9p.h> #include <net/9p/client.h> #include <trace/events/writeback.h> @@ -99,6 +100,148 @@ static int v9fs_vfs_readpage(struct file *filp, struct page *page) return v9fs_fid_readpage(filp->private_data, page); } +/* + * Context for "fast readpages" + */ +struct v9fs_readpages_ctx { + struct file *filp; + struct address_space *mapping; + pgoff_t start_index; /* index of the first page with actual data */ + char *buf; /* buffer with actual data */ + int len; /* length of the actual data */ + int num_pages; /* maximal data chunk (in pages) that can be + passed per transmission */ +}; + +static int init_readpages_ctx(struct v9fs_readpages_ctx *ctx, + struct file *filp, + struct address_space *mapping, + int num_pages) +{ + memset(ctx, 0, sizeof(*ctx)); + ctx->buf = kmalloc(num_pages << PAGE_SHIFT, GFP_USER); + if (!ctx->buf) + return -ENOMEM; + ctx->filp = filp; + ctx->mapping = mapping; + ctx->num_pages = num_pages; + return 0; +} + +static void done_readpages_ctx(struct v9fs_readpages_ctx *ctx) +{ + kfree(ctx->buf); +} + +static int receive_buffer(struct file *filp, + char *buf, + off_t offset, /* offset in the file */ + int len, + int *err) +{ + struct kvec kvec; + struct iov_iter iter; + + kvec.iov_base = buf; + kvec.iov_len = len; + iov_iter_kvec(&iter, READ | ITER_KVEC, &kvec, 1, len); + + return p9_client_read(filp->private_data, offset, &iter, err); +} + +static int fast_filler(struct v9fs_readpages_ctx *ctx, struct page *page) +{ + int err; + int ret = 0; + char *kdata; + int to_page; + off_t off_in_buf; + struct inode *inode = page->mapping->host; + + BUG_ON(!PageLocked(page)); + /* + * first, validate the buffer + */ + if (page->index < ctx->start_index || + ctx->start_index + ctx->num_pages < page->index) { + /* + * No actual data in the buffer, + * so refresh it + */ + ret = receive_buffer(ctx->filp, + ctx->buf, + page_offset(page), + ctx->num_pages << PAGE_SHIFT, + &err); + if (err) { + printk("failed to receive buffer off=%llu (%d)\n", + (unsigned long long)page_offset(page), + err); + ret = err; + goto done; + } + ctx->start_index = page->index; + ctx->len = ret; + ret = 0; + } + /* + * fill the page with buffer's data + */ + off_in_buf = (page->index - ctx->start_index) << PAGE_SHIFT; + if (off_in_buf >= ctx->len) { + /* + * No actual data to fill the page with + */ + ret = -1; + goto done; + } + to_page = ctx->len - off_in_buf;
+ if (to_page >= PAGE_SIZE) + to_page = PAGE_SIZE; + + kdata = kmap_atomic(page); + memcpy(kdata, ctx->buf + off_in_buf, to_page); + memset(kdata + to_page, 0, PAGE_SIZE - to_page); + kunmap_atomic(kdata); + + flush_dcache_page(page); + SetPageUptodate(page); + v9fs_readpage_to_fscache(inode, page); + done: + unlock_page(page); + return ret; +} + +/** + * Try to read pages by groups. For every such group we issue only one + * read request to the server. + * @num_pages: maximal chunk of data (in pages) that can be passed per + * such request + */ +static int v9fs_readpages_tryfast(struct file *filp, + struct address_space *mapping, + struct list_head *pages, + int num_pages) +{ + int ret; + struct v9fs_readpages_ctx ctx; + + ret = init_readpages_ctx(&ctx, filp, mapping, num_pages); + if (ret) + /* + * Cannot allocate resources for the fast path, + * so fall back to the slow path + */ + return read_cache_pages(mapping, pages, + (void *)v9fs_vfs_readpage, filp); + + else + ret = read_cache_pages(mapping, pages, + (void *)fast_filler, &ctx); + done_readpages_ctx(&ctx); + return ret; +} + /** * v9fs_vfs_readpages - read a set of pages from 9P * @@ -114,6 +257,7 @@ static int v9fs_vfs_readpages(struct file *filp, struct address_space *mapping, { int ret = 0; struct inode *inode; + struct v9fs_flush_set *fset; inode = mapping->host; p9_debug(P9_DEBUG_VFS, "inode: %p file: %p\n", inode, filp); @@ -122,7 +266,17 @@ static int v9fs_vfs_readpages(struct file *filp, struct address_space *mapping, if (ret == 0) return ret; - ret = read_cache_pages(mapping, pages, (void *)v9fs_vfs_readpage, filp); + fset = v9fs_inode2v9ses(mapping->host)->flush; + if (!fset) + /* + * Fall back to the slow path + */ + ret = read_cache_pages(mapping, pages, + (void *)v9fs_vfs_readpage, filp); + else + ret = v9fs_readpages_tryfast(filp, mapping, + pages, fset->num_pages); + p9_debug(P9_DEBUG_VFS, " = %d\n", ret); return ret; } -- 2.7.4 ^ permalink raw reply related [flat|nested] 15+ messages in thread
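The arithmetic in fast_filler() may be easier to see in isolation: one large (possibly short) read fills ctx->buf, and each page is then served as a slice of that buffer, with the tail of a partially covered page zeroed. Below is a minimal userspace model of that slicing; all names are invented and PAGE_SHIFT of 12 is assumed.

#include <stdio.h>
#include <string.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Fill one page from a buffer that starts at file page 'start_index'
 * and holds 'buf_len' valid bytes; returns -1 when the page lies
 * outside the valid data, mirroring fast_filler()'s failure path. */
static int fill_from_buf(char *page, unsigned long index,
			 const char *buf, unsigned long start_index,
			 size_t buf_len)
{
	size_t off = (index - start_index) << PAGE_SHIFT;
	size_t n;

	if (index < start_index || off >= buf_len)
		return -1;
	n = buf_len - off;
	if (n > PAGE_SIZE)
		n = PAGE_SIZE;
	memcpy(page, buf + off, n);
	memset(page + n, 0, PAGE_SIZE - n);	/* zero the short tail */
	return 0;
}

int main(void)
{
	char buf[3 * PAGE_SIZE];
	char page[PAGE_SIZE];
	size_t got = 2 * PAGE_SIZE + 100;	/* a short read */

	memset(buf, 'x', sizeof(buf));
	/* page 12 is fully valid, page 14 gets 100 bytes plus a zeroed
	 * tail, page 15 is past the data: prints "0 0 -1" */
	printf("%d %d %d\n",
	       fill_from_buf(page, 12, buf, 12, got),
	       fill_from_buf(page, 14, buf, 12, got),
	       fill_from_buf(page, 15, buf, 12, got));
	return 0;
}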
* [PATCH 3/7] 9p: v9fs fix ifnullfree.cocci warnings 2016-12-12 18:15 ` [PATCH 1/7] 9p: v9fs add writepages edward.shishkin 2016-12-12 18:15 ` [PATCH 2/7] 9p: v9fs new readpages edward.shishkin @ 2016-12-12 18:15 ` edward.shishkin 2016-12-12 18:15 ` [PATCH 4/7] 9p: v9fs fix odd_ptr_err.cocci warnings edward.shishkin ` (3 subsequent siblings) 5 siblings, 0 replies; 15+ messages in thread From: edward.shishkin @ 2016-12-12 18:15 UTC (permalink / raw) To: Eric Van Hensbergen, V9FS Developers Mailing List, Linux Filesystem Development List Cc: QEMU Developers Mailing List, ZhangWei, Claudio Fontana, Greg Kurz, Alexander Graf, kbuild test robot, Eduard Shishkin From: kbuild test robot <fengguang.wu@intel.com> fs/9p/v9fs.c:318:2-7: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values. fs/9p/v9fs.c:320:2-7: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values. NULL check before some freeing functions is not needed. Based on checkpatch warning "kfree(NULL) is safe this check is probably not required" and kfreeaddr.cocci by Julia Lawall. Generated by: scripts/coccinelle/free/ifnullfree.cocci Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Eduard Shishkin <eduard.shishkin@huawei.com> --- fs/9p/v9fs.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c index 3b49daf..e2f84a6 100644 --- a/fs/9p/v9fs.c +++ b/fs/9p/v9fs.c @@ -314,10 +314,8 @@ void put_flush_set(struct v9fs_flush_set *fset) { if (!fset) return; - if (fset->pages) - kfree(fset->pages); - if (fset->buf) - kfree(fset->buf); + kfree(fset->pages); + kfree(fset->buf); kfree(fset); } -- 2.7.4 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 4/7] 9p: v9fs fix odd_ptr_err.cocci warnings 2016-12-12 18:15 ` [PATCH 1/7] 9p: v9fs add writepages edward.shishkin 2016-12-12 18:15 ` [PATCH 2/7] 9p: v9fs new readpages edward.shishkin 2016-12-12 18:15 ` [PATCH 3/7] 9p: v9fs fix ifnullfree.cocci warnings edward.shishkin @ 2016-12-12 18:15 ` edward.shishkin 2016-12-12 18:15 ` [PATCH 5/7] 9p: v9fs fix semicolon.cocci warnings edward.shishkin ` (2 subsequent siblings) 5 siblings, 0 replies; 15+ messages in thread From: edward.shishkin @ 2016-12-12 18:15 UTC (permalink / raw) To: Eric Van Hensbergen, V9FS Developers Mailing List, Linux Filesystem Development List Cc: QEMU Developers Mailing List, ZhangWei, Claudio Fontana, Greg Kurz, Alexander Graf, kbuild test robot, Eduard Shishkin From: kbuild test robot <fengguang.wu@intel.com> fs/9p/vfs_super.c:145:6-12: inconsistent IS_ERR and PTR_ERR on line 146. PTR_ERR should access the value just tested by IS_ERR Semantic patch information: There can be false positives in the patch case, where it is the call to IS_ERR that is wrong. Generated by: scripts/coccinelle/tests/odd_ptr_err.cocci Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Eduard Shishkin <eduard.shishkin@huawei.com> --- fs/9p/vfs_super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c index c1f9af1..24aacec 100644 --- a/fs/9p/vfs_super.c +++ b/fs/9p/vfs_super.c @@ -143,7 +143,7 @@ static struct dentry *v9fs_mount(struct file_system_type *fs_type, int flags, if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) { retval = alloc_init_flush_set(v9ses); if (IS_ERR(v9ses->flush)) { - retval = PTR_ERR(fid); + retval = PTR_ERR(v9ses->flush); goto release_sb; } sb->s_d_op = &v9fs_cached_dentry_operations; -- 2.7.4 ^ permalink raw reply related [flat|nested] 15+ messages in thread
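A note on this fix: it makes the IS_ERR and PTR_ERR arguments consistent, but alloc_init_flush_set() returns an int (0 or -ENOMEM) and stores either NULL or a valid pointer in v9ses->flush, so IS_ERR() can never be true at this call site. A simpler form of the check in v9fs_mount(), testing the return value directly, might look like the sketch below; this is an editorial observation, not a patch from the series.

	if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) {
		retval = alloc_init_flush_set(v9ses);
		if (retval)
			goto release_sb;
		sb->s_d_op = &v9fs_cached_dentry_operations;
	} else
		sb->s_d_op = &v9fs_dentry_operations;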
* [PATCH 5/7] 9p: v9fs fix semicolon.cocci warnings 2016-12-12 18:15 ` [PATCH 1/7] 9p: v9fs add writepages edward.shishkin ` (2 preceding siblings ...) 2016-12-12 18:15 ` [PATCH 4/7] 9p: v9fs fix odd_ptr_err.cocci warnings edward.shishkin @ 2016-12-12 18:15 ` edward.shishkin 2016-12-12 18:15 ` [PATCH 6/7] 9p: v9fs fix readpages writepages contexts allocation edward.shishkin 2016-12-12 18:15 ` [PATCH 7/7] 9p: v9fs fix calculation of max number of merged pages edward.shishkin 5 siblings, 0 replies; 15+ messages in thread From: edward.shishkin @ 2016-12-12 18:15 UTC (permalink / raw) To: Eric Van Hensbergen, V9FS Developers Mailing List, Linux Filesystem Development List Cc: QEMU Developers Mailing List, ZhangWei, Claudio Fontana, Greg Kurz, Alexander Graf, kbuild test robot, Eduard Shishkin From: kbuild test robot <fengguang.wu@intel.com> fs/9p/vfs_addr.c:425:3-4: Unneeded semicolon fs/9p/vfs_addr.c:458:2-3: Unneeded semicolon Remove unneeded semicolon. Generated by: scripts/coccinelle/misc/semicolon.cocci Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Eduard Shishkin <eduard.shishkin@huawei.com> --- fs/9p/vfs_addr.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c index 4ad248e..afd8a13 100644 --- a/fs/9p/vfs_addr.c +++ b/fs/9p/vfs_addr.c @@ -576,7 +576,7 @@ static int flush_pages(struct page **pages, int nr_pages, #endif goto iter_send; } - }; + } iter_send: if (iter_len == 0) /* @@ -609,7 +609,7 @@ static int flush_pages(struct page **pages, int nr_pages, next_iter: if (ctx->done) return ret; - }; + } return 0; } -- 2.7.4 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 6/7] 9p: v9fs fix readpages writepages contexts allocation 2016-12-12 18:15 ` [PATCH 1/7] 9p: v9fs add writepages edward.shishkin ` (3 preceding siblings ...) 2016-12-12 18:15 ` [PATCH 5/7] 9p: v9fs fix semicolon.cocci warnings edward.shishkin @ 2016-12-12 18:15 ` edward.shishkin 2016-12-12 18:15 ` [PATCH 7/7] 9p: v9fs fix calculation of max number of merged pages edward.shishkin 5 siblings, 0 replies; 15+ messages in thread From: edward.shishkin @ 2016-12-12 18:15 UTC (permalink / raw) To: Eric Van Hensbergen, V9FS Developers Mailing List, Linux Filesystem Development List Cc: QEMU Developers Mailing List, ZhangWei, Claudio Fontana, Greg Kurz, Alexander Graf, Eduard Shishkin From: Eduard Shishkin <eduard.shishkin@huawei.com> Use GFP_KERNEL instead of GFP_USER when allocating buffers for readpages/writepages contexts. Signed-off-by: Eduard Shishkin <eduard.shishkin@huawei.com> --- fs/9p/v9fs.c | 2 +- fs/9p/vfs_addr.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c index e2f84a6..58bff9e 100644 --- a/fs/9p/v9fs.c +++ b/fs/9p/v9fs.c @@ -340,7 +340,7 @@ int alloc_init_flush_set(struct v9fs_session_info *v9ses) fset->pages = kzalloc(num_pages * sizeof(*fset->pages), GFP_KERNEL); if (!fset->pages) goto error; - fset->buf = kzalloc(num_pages << PAGE_SHIFT, GFP_USER); + fset->buf = kzalloc(num_pages << PAGE_SHIFT, GFP_KERNEL); if (!fset->buf) goto error; spin_lock_init(&(fset->lock)); diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c index afd8a13..4b2b1d6 100644 --- a/fs/9p/vfs_addr.c +++ b/fs/9p/vfs_addr.c @@ -119,7 +119,7 @@ static int init_readpages_ctx(struct v9fs_readpages_ctx *ctx, int num_pages) { memset(ctx, 0, sizeof(*ctx)); - ctx->buf = kmalloc(num_pages << PAGE_SHIFT, GFP_USER); + ctx->buf = kmalloc(num_pages << PAGE_SHIFT, GFP_KERNEL); if (!ctx->buf) return -ENOMEM; ctx->filp = filp; -- 2.7.4 ^ permalink raw reply related [flat|nested] 15+ messages in thread
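For context on the flag change: GFP_USER is intended for allocations made on behalf of user space, while these buffers are purely kernel-internal, so GFP_KERNEL is the natural choice. One could arguably go further on the read path, where a GFP_KERNEL allocation may enter filesystem reclaim; GFP_NOFS would rule that out. This is only a hedged suggestion, not part of the series:

--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
-	ctx->buf = kmalloc(num_pages << PAGE_SHIFT, GFP_KERNEL);
+	ctx->buf = kmalloc(num_pages << PAGE_SHIFT, GFP_NOFS);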
* [PATCH 7/7] 9p: v9fs fix calculation of max number of merged pages 2016-12-12 18:15 ` [PATCH 1/7] 9p: v9fs add writepages edward.shishkin ` (4 preceding siblings ...) 2016-12-12 18:15 ` [PATCH 6/7] 9p: v9fs fix readpages writepages contexts allocation edward.shishkin @ 2016-12-12 18:15 ` edward.shishkin 5 siblings, 0 replies; 15+ messages in thread From: edward.shishkin @ 2016-12-12 18:15 UTC (permalink / raw) To: Eric Van Hensbergen, V9FS Developers Mailing List, Linux Filesystem Development List Cc: QEMU Developers Mailing List, ZhangWei, Claudio Fontana, Greg Kurz, Alexander Graf, Eduard Shishkin From: Eduard Shishkin <eduard.shishkin@huawei.com> Don't merge too many pages when composing a 9p message, because: . it doesn't lead to an essential performance improvement; . it would let user space make the kernel allocate large amounts of memory. We use a limit of 256K (for the total size of all pages merged per message), as larger values don't provide any visible speedup. Signed-off-by: Eduard Shishkin <eduard.shishkin@huawei.com> --- fs/9p/v9fs.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c index 58bff9e..50a4034 100644 --- a/fs/9p/v9fs.c +++ b/fs/9p/v9fs.c @@ -319,6 +319,8 @@ void put_flush_set(struct v9fs_flush_set *fset) kfree(fset); } +#define MAX_FLUSH_DATA_SIZE (262144) + /** * Allocate and initialize flush set * Pre-conditions: valid msize is set @@ -333,6 +335,11 @@ int alloc_init_flush_set(struct v9fs_session_info *v9ses) if (num_pages < 2) /* speedup impossible */ return 0; + if (num_pages > (MAX_FLUSH_DATA_SIZE >> PAGE_SHIFT)) + /* + * no performance gain with larger values + */ + num_pages = MAX_FLUSH_DATA_SIZE >> PAGE_SHIFT; fset = kzalloc(sizeof(*fset), GFP_KERNEL); if (!fset) goto error; -- 2.7.4 ^ permalink raw reply related [flat|nested] 15+ messages in thread
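To make the effect of the cap concrete, here is the sizing arithmetic as a standalone userspace program; 4K pages are assumed, and the constants match the patch:

#include <stdio.h>

#define PAGE_SHIFT 12
#define MAX_FLUSH_DATA_SIZE (262144)

int main(void)
{
	unsigned int msize[] = { 8192, 131096, 1048576 };
	int i;

	for (i = 0; i < 3; i++) {
		unsigned int num_pages = msize[i] >> PAGE_SHIFT;

		if (num_pages > (MAX_FLUSH_DATA_SIZE >> PAGE_SHIFT))
			num_pages = MAX_FLUSH_DATA_SIZE >> PAGE_SHIFT;
		printf("msize %7u -> %3u pages (%u KB buffer)\n",
		       msize[i], num_pages,
		       (num_pages << PAGE_SHIFT) / 1024);
	}
	return 0;
}

So a typical msize of 131096 yields a 32-page (128K) buffer, well under the cap, while a 1M msize is clamped to 64 pages (256K); an msize below two pages disables the fast path entirely, per the "speedup impossible" check in alloc_init_flush_set().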
end of thread, other threads: [~2016-12-12 18:18 UTC | newest]

Thread overview: 15+ messages
2016-10-10 17:24 [RFC][PATCH 0/2] 9p: v9fs read and write speedup Edward Shishkin
2016-10-10 17:24 ` [PATCH 1/2] 9p: v9fs add writepages Edward Shishkin
2016-10-10 17:24 ` [PATCH 2/2] 9p: v9fs new readpages Edward Shishkin
2016-10-25 14:13 ` [Qemu-devel] " Alexander Graf
2016-12-09 18:42 ` Edward Shishkin
2016-10-25 14:01 ` [Qemu-devel] [PATCH 1/2] 9p: v9fs add writepages Alexander Graf
2016-12-09 18:43 ` Edward Shishkin
2016-12-12 18:13 ` [RFC][PATCH 0/7] 9p: v9fs read and write speedup - V2 Edward Shishkin
2016-12-12 18:15 ` [PATCH 1/7] 9p: v9fs add writepages edward.shishkin
2016-12-12 18:15 ` [PATCH 2/7] 9p: v9fs new readpages edward.shishkin
2016-12-12 18:15 ` [PATCH 3/7] 9p: v9fs fix ifnullfree.cocci warnings edward.shishkin
2016-12-12 18:15 ` [PATCH 4/7] 9p: v9fs fix odd_ptr_err.cocci warnings edward.shishkin
2016-12-12 18:15 ` [PATCH 5/7] 9p: v9fs fix semicolon.cocci warnings edward.shishkin
2016-12-12 18:15 ` [PATCH 6/7] 9p: v9fs fix readpages writepages contexts allocation edward.shishkin
2016-12-12 18:15 ` [PATCH 7/7] 9p: v9fs fix calculation of max number of merged pages edward.shishkin