* [RFC][PATCH 0/2] 9p: v9fs read and write speedup
@ 2016-10-10 17:24 Edward Shishkin
  2016-10-10 17:24 ` [PATCH 1/2] 9p: v9fs add writepages Edward Shishkin
  2016-12-12 18:13 ` [RFC][PATCH 0/7] 9p: v9fs read and write speedup - V2 Edward Shishkin
  0 siblings, 2 replies; 15+ messages in thread
From: Edward Shishkin @ 2016-10-10 17:24 UTC (permalink / raw)
  To: Eric Van Hensbergen, V9FS Developers Mailing List,
      Linux Filesystem Development List
  Cc: QEMU Developers Mailing List, ZhangWei, Claudio Fontana,
      Edward Shishkin

Hello everyone,

Progress in virtualization and cloud technologies has made it popular to
share file sets on host machines via the Plan 9 File Protocol (a sharing
setup also known as VirtFS). Another sharing setup, based on the NFS
protocol, is less popular for a number of reasons.

Unfortunately, the performance of the default VirtFS setup is poor. We
analyzed the reasons in our labs at Huawei Technologies and found that the
typical bottleneck is that transferring a given amount of data in many
small 9P messages is slower than transferring the same amount of data in a
single 9P message.

The number of 9P messages can be reduced (and performance thereby
improved) in several ways (*); however, some "hardcoded" bottlenecks
remain in the v9fs driver of the guest kernel. Specifically, the
read-ahead and write-behind paths of v9fs are poorly implemented: with the
current code there is no chance that more than PAGE_SIZE bytes of data
will be transmitted at a time.

To improve the situation we have introduced a special layer which
coalesces adjacent pages (when there are any) into a single 9P message.
This layer is implemented as private ->readpages() and ->writepages()
address-space operations for v9fs. To merge adjacent pages we use a buffer
whose size depends on the msize specified per mount session. On the
read-ahead path we allocate such buffers on demand. On the writeback path
we use a single buffer pre-allocated at mount time: writeback usually runs
in response to memory-pressure notifications, so we cannot afford
on-demand allocation there. All pages to be merged are copied into the
buffer at their respective offsets; then a single long 9P read (write)
message is constructed and transmitted (**).

As a consequence, only one writeback thread at a time is sped up; other
concurrent threads, which fail to obtain the buffer, fall back to the
usual (slow) path. If there is interest, I'll implement a solution with N
pre-allocated buffers, where N is the number of CPUs (see the sketch after
the first patch below).

This approach increases VirtFS bandwidth up to 3 times, bringing it close
to the bandwidth of VirtIO-blk (see the numbers in Appendix A below).

Comment. Our patches improve only asynchronous operations, i.e. those that
go through the page cache. Direct reads and writes are unaffected, for
obvious reasons. NOTE that by default v9fs works in direct mode, so in
order to see an effect you should specify an appropriate v9fs cache mount
option (e.g. "fscache").

----
(*) Specifying a larger msize (the maximal possible size of a 9P message)
reduces the number of 9P messages for direct operations performed in large
chunks. Disabling v9fs ACLs and security labels in the guest kernel (when
they are not needed) avoids extra messages.

(**) 9P, Plan 9 File Protocol specifications
     https://swtch.com/plan9port/man/man9/intro.html

Appendix A.
iozone -e -r chunk_size -s 6G -w -f

Throughput in MBytes/sec:

  operation      chunk_size    (1)    (2)    (3)
  ---------      ----------    ---    ---    ---
  write              1M        391    127    330
  read               1M        469    221    432
  write              4K        410    128    297
  read               4K        465    192    327
  random write       1M        403    145    313
  random read        1M        347    161    195
  random write       4K        344    119    131
  random read        4K         44     41     64

Legend:
(1): VirtIO-blk
(2): VirtIO-9p
(3): VirtIO-9p, guest kernel patched with our stuff

Hardware & Software:

Host:  8 CPU Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz, 16G RAM,
       SSD: Noname (throughput: write 410 M/sec, read 470 M/sec).
       Fedora 24, Kernel-4.7.4-200.fc24.x86_64, kvm+qemu-2.7.0, fs: ext4
Guest: 2 CPU: GenuineIntel 3.4GHz, 2G RAM, Network model: VirtIO
       Fedora 21, Kernel: 4.7.6

Settings:

VirtIO-blk: Guest FS: ext4;
VirtIO-9p:  mount option: "trans=virtio,version=9p2000.L,msize=131096,fscache"

Caches of the host and guest were dropped before every iozone phase.

CC-ed the QEMU developers list for possible comments and ACKs.
Please consider for inclusion.

Thanks,
Edward.

^ permalink raw reply	[flat|nested] 15+ messages in thread
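For reference, a guest-side mount that enables the cached paths could look
like the following command line (the mount tag "hostshare" and the mount
point are illustrative placeholders; the option string is the one used for
the benchmarks above):

    mount -t 9p -o trans=virtio,version=9p2000.L,msize=131096,fscache hostshare /mnt/virtfs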
* [PATCH 1/2] 9p: v9fs add writepages.
  2016-10-10 17:24 [RFC][PATCH 0/2] 9p: v9fs read and write speedup Edward Shishkin
@ 2016-10-10 17:24 ` Edward Shishkin
  2016-10-10 17:24   ` [PATCH 2/2] 9p: v9fs new readpages Edward Shishkin
  2016-10-25 14:01   ` [Qemu-devel] [PATCH 1/2] 9p: v9fs add writepages Alexander Graf
  2016-12-12 18:13 ` [RFC][PATCH 0/7] 9p: v9fs read and write speedup - V2 Edward Shishkin
  1 sibling, 2 replies; 15+ messages in thread
From: Edward Shishkin @ 2016-10-10 17:24 UTC (permalink / raw)
  To: Eric Van Hensbergen, V9FS Developers Mailing List,
      Linux Filesystem Development List
  Cc: QEMU Developers Mailing List, ZhangWei, Claudio Fontana,
      Edward Shishkin, Edward Shishkin

Add a v9fs private ->writepages() method of address_space
operations for merging pages into long 9p messages.

Signed-off-by: Edward Shishkin <edward.shishkin@gmail.com>
---
 fs/9p/v9fs.c      |  46 +++++++
 fs/9p/v9fs.h      |  22 +++-
 fs/9p/vfs_addr.c  | 357 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/9p/vfs_super.c |   8 +-
 4 files changed, 431 insertions(+), 2 deletions(-)

diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c
index 072e759..3b49daf 100644
--- a/fs/9p/v9fs.c
+++ b/fs/9p/v9fs.c
@@ -32,6 +32,7 @@
 #include <linux/parser.h>
 #include <linux/idr.h>
 #include <linux/slab.h>
+#include <linux/pagemap.h>
 #include <net/9p/9p.h>
 #include <net/9p/client.h>
 #include <net/9p/transport.h>
@@ -309,6 +310,49 @@ static int v9fs_parse_options(struct v9fs_session_info *v9ses, char *opts)
 	return ret;
 }
 
+void put_flush_set(struct v9fs_flush_set *fset)
+{
+	if (!fset)
+		return;
+	if (fset->pages)
+		kfree(fset->pages);
+	if (fset->buf)
+		kfree(fset->buf);
+	kfree(fset);
+}
+
+/**
+ * Allocate and initialize flush set
+ * Pre-conditions: valid msize is set
+ */
+int alloc_init_flush_set(struct v9fs_session_info *v9ses)
+{
+	int ret = -ENOMEM;
+	int num_pages;
+	struct v9fs_flush_set *fset = NULL;
+
+	num_pages = v9ses->clnt->msize >> PAGE_SHIFT;
+	if (num_pages < 2)
+		/* speedup impossible */
+		return 0;
+	fset = kzalloc(sizeof(*fset), GFP_KERNEL);
+	if (!fset)
+		goto error;
+	fset->num_pages = num_pages;
+	fset->pages = kzalloc(num_pages * sizeof(*fset->pages), GFP_KERNEL);
+	if (!fset->pages)
+		goto error;
+	fset->buf = kzalloc(num_pages << PAGE_SHIFT, GFP_USER);
+	if (!fset->buf)
+		goto error;
+	spin_lock_init(&(fset->lock));
+	v9ses->flush = fset;
+	return 0;
+ error:
+	put_flush_set(fset);
+	return ret;
+}
+
 /**
  * v9fs_session_init - initialize session
  * @v9ses: session information structure
@@ -444,6 +488,8 @@ void v9fs_session_close(struct v9fs_session_info *v9ses)
 	kfree(v9ses->uname);
 	kfree(v9ses->aname);
 
+	put_flush_set(v9ses->flush);
+
 	bdi_destroy(&v9ses->bdi);
 
 	spin_lock(&v9fs_sessionlist_lock);
diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index 6877050..d1092e4 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -23,6 +23,7 @@
 #ifndef FS_9P_V9FS_H
 #define FS_9P_V9FS_H
 
+#include <linux/kconfig.h>
 #include <linux/backing-dev.h>
 
 /**
@@ -69,6 +70,13 @@ enum p9_cache_modes {
 	CACHE_FSCACHE,
 };
 
+struct v9fs_flush_set {
+	struct page **pages;
+	int num_pages;
+	char *buf;
+	spinlock_t lock;
+};
+
 /**
  * struct v9fs_session_info - per-instance session information
 * @flags: session options of type &p9_session_flags
@@ -105,7 +113,7 @@ struct v9fs_session_info {
 	char *cachetag;
 	struct fscache_cookie *fscache;
 #endif
-
+	struct v9fs_flush_set *flush;	/* flush set for writepages */
 	char *uname;		/* user name to mount as */
 	char *aname;		/* name of remote hierarchy being mounted */
 	unsigned int maxdata;	/* max data for client interface */
@@ -158,6 +166,8 @@ extern const struct inode_operations v9fs_symlink_inode_operations_dotl;
 extern struct inode *v9fs_inode_from_fid_dotl(struct v9fs_session_info *v9ses,
 					      struct p9_fid *fid,
 					      struct super_block *sb, int new);
+extern int alloc_init_flush_set(struct v9fs_session_info *v9ses);
+extern void put_flush_set(struct v9fs_flush_set *fset);
 
 /* other default globals */
 #define V9FS_PORT	564
@@ -222,4 +232,14 @@ v9fs_get_new_inode_from_fid(struct v9fs_session_info *v9ses, struct p9_fid *fid,
 	return v9fs_inode_from_fid(v9ses, fid, sb, 1);
 }
 
+static inline int spin_trylock_flush_set(struct v9fs_flush_set *fset)
+{
+	return spin_trylock(&(fset->lock));
+}
+
+static inline void spin_unlock_flush_set(struct v9fs_flush_set *fset)
+{
+	spin_unlock(&(fset->lock));
+}
+
 #endif
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index 6181ad7..e871886 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -36,6 +36,7 @@
 #include <linux/uio.h>
 #include <net/9p/9p.h>
 #include <net/9p/client.h>
+#include <trace/events/writeback.h>
 
 #include "v9fs.h"
 #include "v9fs_vfs.h"
@@ -209,6 +210,361 @@ static int v9fs_vfs_writepage(struct page *page, struct writeback_control *wbc)
 	return retval;
 }
 
+static void redirty_pages_for_writeback(struct page **pages, int nr,
+					struct writeback_control *wbc)
+{
+	int i;
+	for (i = 0; i < nr; i++) {
+		lock_page(pages[i]);
+		redirty_page_for_writepage(wbc, pages[i]);
+		unlock_page(pages[i]);
+	}
+}
+
+static void set_pages_error(struct page **pages, int nr, int error)
+{
+	int i;
+	for (i = 0; i < nr; i++) {
+		lock_page(pages[i]);
+		SetPageError(pages[i]);
+		mapping_set_error(pages[i]->mapping, error);
+		unlock_page(pages[i]);
+	}
+}
+
+#define V9FS_WRITEPAGES_DEBUG   (0)
+
+struct flush_context {
+	struct writeback_control *wbc;
+	struct address_space *mapping;
+	struct v9fs_flush_set *fset;
+	pgoff_t done_index;
+	pgoff_t writeback_index;
+	pgoff_t index;
+	pgoff_t end; /* Inclusive */
+	const char *msg;
+	int cycled;
+	int range_whole;
+	int done;
+};
+
+/**
+ * Copy a page with file's data to buffer.
+ * Handle races with truncate, etc.
+ * Return number of copied bytes
+ *
+ * @page: page to copy data from;
+ * @page_nr: serial number of the page
+ */
+static int flush_page(struct page *page, int page_nr, struct flush_context *ctx)
+{
+	char *kdata;
+	loff_t isize;
+	int copied = 0;
+	struct writeback_control *wbc = ctx->wbc;
+	/*
+	 * At this point, the page may be truncated or
+	 * invalidated (changing page->mapping to NULL), or
+	 * even swizzled back from swapper_space to tmpfs file
+	 * mapping. However, page->index will not change
+	 * because we have a reference on the page.
+	 */
+	if (page->index > ctx->end) {
+		/*
+		 * can't be range_cyclic (1st pass) because
+		 * end == -1 in that case.
+		 */
+		ctx->done = 1;
+		ctx->msg = "page out of range";
+		goto exit;
+	}
+	ctx->done_index = page->index;
+	lock_page(page);
+	/*
+	 * Page truncated or invalidated. We can freely skip it
+	 * then, even for data integrity operations: the page
+	 * has disappeared concurrently, so there could be no
+	 * real expectation of this data integrity operation
+	 * even if there is now a new, dirty page at the same
+	 * pagecache address.
+	 */
+	if (unlikely(page->mapping != ctx->mapping)) {
+		unlock_page(page);
+		ctx->msg = "page truncated or invalidated";
+		goto exit;
+	}
+	if (!PageDirty(page)) {
+		/*
+		 * someone wrote it for us
+		 */
+		unlock_page(page);
+		ctx->msg = "page not dirty";
+		goto exit;
+	}
+	if (PageWriteback(page)) {
+		if (wbc->sync_mode != WB_SYNC_NONE)
+			wait_on_page_writeback(page);
+		else {
+			unlock_page(page);
+			ctx->msg = "page is writeback";
+			goto exit;
+		}
+	}
+	BUG_ON(PageWriteback(page));
+	if (!clear_page_dirty_for_io(page)) {
+		unlock_page(page);
+		ctx->msg = "failed to clear page dirty";
+		goto exit;
+	}
+	trace_wbc_writepage(wbc, inode_to_bdi(ctx->mapping->host));
+
+	set_page_writeback(page);
+	isize = i_size_read(ctx->mapping->host);
+	if (page->index == isize >> PAGE_SHIFT)
+		copied = isize & ~PAGE_MASK;
+	else
+		copied = PAGE_SIZE;
+	kdata = kmap_atomic(page);
+	memcpy(ctx->fset->buf + (page_nr << PAGE_SHIFT), kdata, copied);
+	kunmap_atomic(kdata);
+	end_page_writeback(page);
+
+	unlock_page(page);
+	/*
+	 * We stop writing back only if we are not doing
+	 * integrity sync. In case of integrity sync we have to
+	 * keep going until we have written all the pages
+	 * we tagged for writeback prior to entering this loop.
+	 */
+	if (--wbc->nr_to_write <= 0 && wbc->sync_mode == WB_SYNC_NONE)
+		ctx->done = 1;
+ exit:
+	return copied;
+}
+
+static int send_buffer(off_t offset, int len, struct flush_context *ctx)
+{
+	int ret = 0;
+	struct kvec kvec;
+	struct iov_iter iter;
+	struct v9fs_inode *v9inode = V9FS_I(ctx->mapping->host);
+
+	kvec.iov_base = ctx->fset->buf;
+	kvec.iov_len = len;
+	iov_iter_kvec(&iter, WRITE | ITER_KVEC, &kvec, 1, len);
+	BUG_ON(!v9inode->writeback_fid);
+
+	p9_client_write(v9inode->writeback_fid, offset, &iter, &ret);
+	return ret;
+}
+
+/**
+ * Helper function for managing 9pFS write requests.
+ * The main purpose of this function is to provide support for
+ * the coalescing of several pages into a single 9p message.
+ * This is similar to NFS's pagelist.
+ *
+ * Copy pages with adjacent indices to a buffer and send it to
+ * the server.
+ *
+ * @pages: array of pages with ascending indices;
+ * @nr_pages: number of pages in the array;
+ */
+static int flush_pages(struct page **pages, int nr_pages,
+		       struct flush_context *ctx)
+{
+	int ret;
+	int pos = 0;
+	int iter_pos;
+	int iter_nrpages;
+	pgoff_t iter_page_idx;
+
+	while (pos < nr_pages) {
+
+		int i;
+		int iter_len = 0;
+		struct page *page;
+
+		iter_pos = pos;
+		iter_nrpages = 0;
+		iter_page_idx = pages[pos]->index;
+
+		for (i = 0; pos < nr_pages; i++) {
+			int from_page;
+
+			page = pages[pos];
+			if (page->index != iter_page_idx + i) {
+				/*
+				 * Hole in the indices,
+				 * further coalescing is impossible.
+				 * Try to send what we have accumulated.
+				 * This page will be processed in the next
+				 * iteration
+				 */
+				goto iter_send;
+			}
+			from_page = flush_page(page, i, ctx);
+
+			iter_len += from_page;
+			iter_nrpages++;
+			pos++;
+
+			if (from_page != PAGE_SIZE) {
+				/*
+				 * Less than a full page was flushed,
+				 * further coalescing is impossible.
+				 * Try to send what we have accumulated.
+				 */
+#if V9FS_WRITEPAGES_DEBUG
+				if (from_page == 0)
+					printk("9p: page %lu is not flushed (%s)\n",
+					       page->index, ctx->msg);
+#endif
+				goto iter_send;
+			}
+		}
+	iter_send:
+		if (iter_len == 0)
+			/*
+			 * Nothing to send
+			 */
+			goto next_iter;
+		ret = send_buffer(iter_page_idx << PAGE_SHIFT,
+				  iter_len, ctx);
+		if (ret == -EAGAIN) {
+			redirty_pages_for_writeback(pages + iter_pos,
+						    iter_nrpages, ctx->wbc);
+			ret = 0;
+		} else if (ret < 0) {
+			/*
+			 * Something bad happened.
+			 * done_index is set past this chunk,
+			 * so media errors will not choke
+			 * background writeout for the entire
+			 * file.
+			 */
+			printk("9p: send_buffer failed (%d)\n", ret);
+
+			set_pages_error(pages + iter_pos, iter_nrpages, ret);
+			ctx->done_index =
+				pages[iter_pos + iter_nrpages - 1]->index + 1;
+			ctx->done = 1;
+			return ret;
+		} else
+			ret = 0;
+	next_iter:
+		if (ctx->done)
+			return ret;
+	}
+	return 0;
+}
+
+static void init_flush_context(struct flush_context *ctx,
+			       struct address_space *mapping,
+			       struct writeback_control *wbc,
+			       struct v9fs_flush_set *fset)
+{
+	ctx->wbc = wbc;
+	ctx->mapping = mapping;
+	ctx->fset = fset;
+	ctx->done = 0;
+	ctx->range_whole = 0;
+
+	if (wbc->range_cyclic) {
+		ctx->writeback_index = mapping->writeback_index;
+		ctx->index = ctx->writeback_index;
+		if (ctx->index == 0)
+			ctx->cycled = 1;
+		else
+			ctx->cycled = 0;
+		ctx->end = -1;
+	} else {
+		ctx->index = wbc->range_start >> PAGE_SHIFT;
+		ctx->end = wbc->range_end >> PAGE_SHIFT;
+		if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
+			ctx->range_whole = 1;
+		ctx->cycled = 1; /* ignore range_cyclic tests */
+	}
+}
+
+/**
+ * Pre-condition: flush set is locked
+ */
+static int v9fs_writepages_fastpath(struct address_space *mapping,
+				    struct writeback_control *wbc,
+				    struct v9fs_flush_set *fset)
+{
+	int ret = 0;
+	int tag;
+	int nr_pages;
+	struct page **pages = fset->pages;
+	struct flush_context ctx;
+
+	init_flush_context(&ctx, mapping, wbc, fset);
+
+	if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
+		tag = PAGECACHE_TAG_TOWRITE;
+	else
+		tag = PAGECACHE_TAG_DIRTY;
+retry:
+	if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
+		tag_pages_for_writeback(mapping, ctx.index, ctx.end);
+
+	ctx.done_index = ctx.index;
+
+	while (!ctx.done && (ctx.index <= ctx.end)) {
+		int i;
+		nr_pages = find_get_pages_tag(mapping, &ctx.index, tag,
+					1 + min(ctx.end - ctx.index,
+						(pgoff_t)(fset->num_pages - 1)),
+					pages);
+		if (nr_pages == 0)
+			break;
+
+		ret = flush_pages(pages, nr_pages, &ctx);
+		/*
+		 * unpin pages
+		 */
+		for (i = 0; i < nr_pages; i++)
+			put_page(pages[i]);
+		if (ret < 0)
+			break;
+		cond_resched();
+	}
+	if (!ctx.cycled && !ctx.done) {
+		/*
+		 * range_cyclic:
+		 * We hit the last page and there is more work
+		 * to be done: wrap back to the start of the file
+		 */
+		ctx.cycled = 1;
+		ctx.index = 0;
+		ctx.end = ctx.writeback_index - 1;
+		goto retry;
+	}
+	if (wbc->range_cyclic || (ctx.range_whole && wbc->nr_to_write > 0))
+		mapping->writeback_index = ctx.done_index;
+	return ret;
+}
+
+static int v9fs_writepages(struct address_space *mapping,
+			   struct writeback_control *wbc)
+{
+	int ret;
+	struct v9fs_flush_set *fset;
+
+	fset = v9fs_inode2v9ses(mapping->host)->flush;
+	if (!fset || !spin_trylock_flush_set(fset))
+		/*
+		 * fall back to the slow path
+		 */
+		return generic_writepages(mapping, wbc);
+
+	ret = v9fs_writepages_fastpath(mapping, wbc, fset);
+	spin_unlock_flush_set(fset);
+	return ret;
+}
+
 /**
  * v9fs_launder_page - Writeback a dirty page
  * Returns 0 on success.
@@ -342,6 +698,7 @@ const struct address_space_operations v9fs_addr_operations = {
 	.readpages = v9fs_vfs_readpages,
 	.set_page_dirty = __set_page_dirty_nobuffers,
 	.writepage = v9fs_vfs_writepage,
+	.writepages = v9fs_writepages,
 	.write_begin = v9fs_write_begin,
 	.write_end = v9fs_write_end,
 	.releasepage = v9fs_release_page,
diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
index de3ed86..c1f9af1 100644
--- a/fs/9p/vfs_super.c
+++ b/fs/9p/vfs_super.c
@@ -140,8 +140,11 @@ static struct dentry *v9fs_mount(struct file_system_type *fs_type, int flags,
 	}
 	v9fs_fill_super(sb, v9ses, flags, data);
 
-	if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE)
+	if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) {
+		retval = alloc_init_flush_set(v9ses);
+		if (retval)
+			goto release_sb;
 		sb->s_d_op = &v9fs_cached_dentry_operations;
-	else
+	} else
 		sb->s_d_op = &v9fs_dentry_operations;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread
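The cover letter floats the idea of N pre-allocated buffers (one per CPU)
so that concurrent writeback threads need not contend for the single flush
set. A rough sketch of what that could look like, reusing the helpers from
this patch; the flush_sets/nr_flush fields and alloc_init_flush_set_one()
are hypothetical and not part of the posted code:

    /* Hypothetical extension: one flush set per possible CPU. */
    static int alloc_init_flush_sets(struct v9fs_session_info *v9ses)
    {
    	int i;
    	int nr = num_possible_cpus();

    	v9ses->flush_sets = kcalloc(nr, sizeof(struct v9fs_flush_set *),
    				    GFP_KERNEL);
    	if (!v9ses->flush_sets)
    		return -ENOMEM;
    	for (i = 0; i < nr; i++)
    		/* assumed per-set variant of alloc_init_flush_set() */
    		if (alloc_init_flush_set_one(v9ses, &v9ses->flush_sets[i]))
    			goto error;
    	v9ses->nr_flush = nr;
    	return 0;
     error:
    	while (i--)
    		put_flush_set(v9ses->flush_sets[i]);
    	kfree(v9ses->flush_sets);
    	v9ses->flush_sets = NULL;
    	return -ENOMEM;
    }

    /* A writer would then try each set in turn instead of one trylock: */
    static struct v9fs_flush_set *get_flush_set(struct v9fs_session_info *v9ses)
    {
    	int i;

    	for (i = 0; i < v9ses->nr_flush; i++)
    		if (spin_trylock_flush_set(v9ses->flush_sets[i]))
    			return v9ses->flush_sets[i];
    	return NULL; /* all busy: fall back to generic_writepages() */
    }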
* [PATCH 2/2] 9p: v9fs new readpages.
  2016-10-10 17:24 ` [PATCH 1/2] 9p: v9fs add writepages Edward Shishkin
@ 2016-10-10 17:24   ` Edward Shishkin
  2016-10-25 14:13     ` [Qemu-devel] " Alexander Graf
  2016-10-25 14:01   ` [Qemu-devel] [PATCH 1/2] 9p: v9fs add writepages Alexander Graf
  1 sibling, 1 reply; 15+ messages in thread
From: Edward Shishkin @ 2016-10-10 17:24 UTC (permalink / raw)
  To: Eric Van Hensbergen, V9FS Developers Mailing List,
      Linux Filesystem Development List
  Cc: QEMU Developers Mailing List, ZhangWei, Claudio Fontana,
      Edward Shishkin, Edward Shishkin

Modify v9fs private ->readpages() method of address_space
operations for merging pages into long 9p messages.

Signed-off-by: Edward Shishkin <edward.shishkin@gmail.com>
---
 fs/9p/vfs_addr.c | 156 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 155 insertions(+), 1 deletion(-)

diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index e871886..4ad248e 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -34,6 +34,7 @@
 #include <linux/idr.h>
 #include <linux/sched.h>
 #include <linux/uio.h>
+#include <linux/slab.h>
 #include <net/9p/9p.h>
 #include <net/9p/client.h>
 #include <trace/events/writeback.h>
@@ -99,6 +100,148 @@ static int v9fs_vfs_readpage(struct file *filp, struct page *page)
 	return v9fs_fid_readpage(filp->private_data, page);
 }
 
+/*
+ * Context for "fast readpages"
+ */
+struct v9fs_readpages_ctx {
+	struct file *filp;
+	struct address_space *mapping;
+	pgoff_t start_index; /* index of the first page with actual data */
+	char *buf; /* buffer with actual data */
+	int len; /* length of the actual data */
+	int num_pages; /* maximal data chunk (in pages) that can be
+			  passed per transmission */
+};
+
+static int init_readpages_ctx(struct v9fs_readpages_ctx *ctx,
+			      struct file *filp,
+			      struct address_space *mapping,
+			      int num_pages)
+{
+	memset(ctx, 0, sizeof(*ctx));
+	ctx->buf = kmalloc(num_pages << PAGE_SHIFT, GFP_USER);
+	if (!ctx->buf)
+		return -ENOMEM;
+	ctx->filp = filp;
+	ctx->mapping = mapping;
+	ctx->num_pages = num_pages;
+	return 0;
+}
+
+static void done_readpages_ctx(struct v9fs_readpages_ctx *ctx)
+{
+	kfree(ctx->buf);
+}
+
+static int receive_buffer(struct file *filp,
+			  char *buf,
+			  off_t offset, /* offset in the file */
+			  int len,
+			  int *err)
+{
+	struct kvec kvec;
+	struct iov_iter iter;
+
+	kvec.iov_base = buf;
+	kvec.iov_len = len;
+	iov_iter_kvec(&iter, READ | ITER_KVEC, &kvec, 1, len);
+
+	return p9_client_read(filp->private_data, offset, &iter, err);
+}
+
+static int fast_filler(struct v9fs_readpages_ctx *ctx, struct page *page)
+{
+	int err;
+	int ret = 0;
+	char *kdata;
+	int to_page;
+	off_t off_in_buf;
+	struct inode *inode = page->mapping->host;
+
+	BUG_ON(!PageLocked(page));
+	/*
+	 * first, validate the buffer
+	 */
+	if (page->index < ctx->start_index ||
+	    ctx->start_index + ctx->num_pages < page->index) {
+		/*
+		 * No actual data in the buffer,
+		 * so actualize it
+		 */
+		ret = receive_buffer(ctx->filp,
+				     ctx->buf,
+				     page_offset(page),
+				     ctx->num_pages << PAGE_SHIFT,
+				     &err);
+		if (err) {
+			printk("failed to receive buffer off=%llu (%d)\n",
+			       (unsigned long long)page_offset(page),
+			       err);
+			ret = err;
+			goto done;
+		}
+		ctx->start_index = page->index;
+		ctx->len = ret;
+		ret = 0;
+	}
+	/*
+	 * fill the page with buffer's data
+	 */
+	off_in_buf = (page->index - ctx->start_index) << PAGE_SHIFT;
+	if (off_in_buf >= ctx->len) {
+		/*
+		 * No actual data to fill the page with
+		 */
+		ret = -1;
+		goto done;
+	}
+	to_page = ctx->len - off_in_buf;
+	if (to_page >= PAGE_SIZE)
+		to_page = PAGE_SIZE;
+
+	kdata = kmap_atomic(page);
+	memcpy(kdata, ctx->buf + off_in_buf, to_page);
+	memset(kdata + to_page, 0, PAGE_SIZE - to_page);
+	kunmap_atomic(kdata);
+
+	flush_dcache_page(page);
+	SetPageUptodate(page);
+	v9fs_readpage_to_fscache(inode, page);
+ done:
+	unlock_page(page);
+	return ret;
+}
+
+/**
+ * Try to read pages by groups. For every such group we issue only one
+ * read request to the server.
+ * @num_pages: maximal chunk of data (in pages) that can be passed per
+ * such request
+ */
+static int v9fs_readpages_tryfast(struct file *filp,
+				  struct address_space *mapping,
+				  struct list_head *pages,
+				  int num_pages)
+{
+	int ret;
+	struct v9fs_readpages_ctx ctx;
+
+	ret = init_readpages_ctx(&ctx, filp, mapping, num_pages);
+	if (ret)
+		/*
+		 * Cannot allocate resources for the fast path,
+		 * so do it the slow way
+		 */
+		return read_cache_pages(mapping, pages,
+					(void *)v9fs_vfs_readpage, filp);
+	else
+		ret = read_cache_pages(mapping, pages,
+				       (void *)fast_filler, &ctx);
+	done_readpages_ctx(&ctx);
+	return ret;
+}
+
 /**
  * v9fs_vfs_readpages - read a set of pages from 9P
  *
@@ -114,6 +257,7 @@ static int v9fs_vfs_readpages(struct file *filp, struct address_space *mapping,
 {
 	int ret = 0;
 	struct inode *inode;
+	struct v9fs_flush_set *fset;
 
 	inode = mapping->host;
 	p9_debug(P9_DEBUG_VFS, "inode: %p file: %p\n", inode, filp);
@@ -122,7 +266,17 @@ static int v9fs_vfs_readpages(struct file *filp, struct address_space *mapping,
 	if (ret == 0)
 		return ret;
 
-	ret = read_cache_pages(mapping, pages, (void *)v9fs_vfs_readpage, filp);
+	fset = v9fs_inode2v9ses(mapping->host)->flush;
+	if (!fset)
+		/*
+		 * Do it the slow way
+		 */
+		ret = read_cache_pages(mapping, pages,
+				       (void *)v9fs_vfs_readpage, filp);
+	else
+		ret = v9fs_readpages_tryfast(filp, mapping,
+					     pages, fset->num_pages);
+
 	p9_debug(P9_DEBUG_VFS, " = %d\n", ret);
 	return ret;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread
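To make the message sizing concrete under the settings from the cover
letter (msize=131096): num_pages = msize >> PAGE_SHIFT = 131096 / 4096
= 32, so each read-ahead buffer spans 32 pages and a single 9P request can
carry up to 128 KiB of file data instead of 4 KiB. The 24 bytes by which
this msize exceeds 128 KiB were presumably chosen to leave room for the 9P
I/O header, so that a full 128 KiB payload fits in one message.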
* Re: [Qemu-devel] [PATCH 2/2] 9p: v9fs new readpages. 2016-10-10 17:24 ` [PATCH 2/2] 9p: v9fs new readpages Edward Shishkin @ 2016-10-25 14:13 ` Alexander Graf 2016-12-09 18:42 ` Edward Shishkin 0 siblings, 1 reply; 15+ messages in thread From: Alexander Graf @ 2016-10-25 14:13 UTC (permalink / raw) To: Edward Shishkin, Eric Van Hensbergen, V9FS Developers Mailing List, Linux Filesystem Development List Cc: Edward Shishkin, Claudio Fontana, QEMU Developers Mailing List, ZhangWei On 10/10/2016 07:24 PM, Edward Shishkin wrote: > Modify v9fs private ->readpages() method of address_space > operations for merging pages into long 9p messages. > > Signed-off-by: Edward Shishkin <edward.shishkin@gmail.com> > --- > fs/9p/vfs_addr.c | 156 ++++++++++++++++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 155 insertions(+), 1 deletion(-) > > diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c > index e871886..4ad248e 100644 > --- a/fs/9p/vfs_addr.c > +++ b/fs/9p/vfs_addr.c > @@ -34,6 +34,7 @@ > #include <linux/idr.h> > #include <linux/sched.h> > #include <linux/uio.h> > +#include <linux/slab.h> > #include <net/9p/9p.h> > #include <net/9p/client.h> > #include <trace/events/writeback.h> > @@ -99,6 +100,148 @@ static int v9fs_vfs_readpage(struct file *filp, struct page *page) > return v9fs_fid_readpage(filp->private_data, page); > } > > +/* > + * Context for "fast readpages" > + */ > +struct v9fs_readpages_ctx { > + struct file *filp; > + struct address_space *mapping; > + pgoff_t start_index; /* index of the first page with actual data */ > + char *buf; /* buffer with actual data */ > + int len; /* length of the actual data */ > + int num_pages; /* maximal data chunk (in pages) that can be > + passed per transmission */ > +}; > + > +static int init_readpages_ctx(struct v9fs_readpages_ctx *ctx, > + struct file *filp, > + struct address_space *mapping, > + int num_pages) > +{ > + memset(ctx, 0, sizeof(*ctx)); > + ctx->buf = kmalloc(num_pages << PAGE_SHIFT, GFP_USER); Doesn't this a) have potential information leak to user space and b) allow user space to allocate big amounts of kernel memory? A limited page pool would probably be better here. I also don't quite grasp yet what pattern you're actually optimizing. Basically you're doing implicit read-ahead on behalf of the reader, right? So why would that be faster in a random 4k read scenario? Also, have you compared tmpfs hosted files to only benchmark the transmission path? 
> + if (!ctx->buf) > + return -ENOMEM; > + ctx->filp = filp; > + ctx->mapping = mapping; > + ctx->num_pages = num_pages; > + return 0; > +} > + > +static void done_readpages_ctx(struct v9fs_readpages_ctx *ctx) > +{ > + kfree(ctx->buf); > +} > + > +static int receive_buffer(struct file *filp, > + char *buf, > + off_t offset, /* offset in the file */ > + int len, > + int *err) > +{ > + struct kvec kvec; > + struct iov_iter iter; > + > + kvec.iov_base = buf; > + kvec.iov_len = len; > + iov_iter_kvec(&iter, READ | ITER_KVEC, &kvec, 1, len); > + > + return p9_client_read(filp->private_data, offset, &iter, err); > +} > + > +static int fast_filler(struct v9fs_readpages_ctx *ctx, struct page *page) > +{ > + int err; > + int ret = 0; > + char *kdata; > + int to_page; > + off_t off_in_buf; > + struct inode *inode = page->mapping->host; > + > + BUG_ON(!PageLocked(page)); > + /* > + * first, validate the buffer > + */ > + if (page->index < ctx->start_index || > + ctx->start_index + ctx->num_pages < page->index) { > + /* > + * No actual data in the buffer, > + * so actualize it > + */ > + ret = receive_buffer(ctx->filp, > + ctx->buf, > + page_offset(page), > + ctx->num_pages << PAGE_SHIFT, Doesn't this potentially read beyond the end of the file? Alex > + &err); > + if (err) { > + printk("failed to receive buffer off=%llu (%d)\n", > + (unsigned long long)page_offset(page), > + err); > + ret = err; > + goto done; > + } > + ctx->start_index = page->index; > + ctx->len = ret; > + ret = 0; > + } > + /* > + * fill the page with buffer's data > + */ > + off_in_buf = (page->index - ctx->start_index) << PAGE_SHIFT; > + if (off_in_buf >= ctx->len) { > + /* > + * No actual data to fill the page with > + */ > + ret = -1; > + goto done; > + } > + to_page = ctx->len - off_in_buf; > + if (to_page >= PAGE_SIZE) > + to_page = PAGE_SIZE; > + > + kdata = kmap_atomic(page); > + memcpy(kdata, ctx->buf + off_in_buf, to_page); > + memset(kdata + to_page, 0, PAGE_SIZE - to_page); > + kunmap_atomic(kdata); > + > + flush_dcache_page(page); > + SetPageUptodate(page); > + v9fs_readpage_to_fscache(inode, page); > + done: > + unlock_page(page); > + return ret; > +} > + > +/** > + * Try to read pages by groups. For every such group we issue only one > + * read request to the server. 
> + * @num_pages: maximal chunk of data (in pages) that can be passed per > + * such request > + */ > +static int v9fs_readpages_tryfast(struct file *filp, > + struct address_space *mapping, > + struct list_head *pages, > + int num_pages) > +{ > + int ret; > + struct v9fs_readpages_ctx ctx; > + > + ret = init_readpages_ctx(&ctx, filp, mapping, num_pages); > + if (ret) > + /* > + * Can not allocate resources for the fast path, > + * so do it by slow way > + */ > + return read_cache_pages(mapping, pages, > + (void *)v9fs_vfs_readpage, filp); > + > + else > + ret = read_cache_pages(mapping, pages, > + (void *)fast_filler, &ctx); > + done_readpages_ctx(&ctx); > + return ret; > +} > + > /** > * v9fs_vfs_readpages - read a set of pages from 9P > * > @@ -114,6 +257,7 @@ static int v9fs_vfs_readpages(struct file *filp, struct address_space *mapping, > { > int ret = 0; > struct inode *inode; > + struct v9fs_flush_set *fset; > > inode = mapping->host; > p9_debug(P9_DEBUG_VFS, "inode: %p file: %p\n", inode, filp); > @@ -122,7 +266,17 @@ static int v9fs_vfs_readpages(struct file *filp, struct address_space *mapping, > if (ret == 0) > return ret; > > - ret = read_cache_pages(mapping, pages, (void *)v9fs_vfs_readpage, filp); > + fset = v9fs_inode2v9ses(mapping->host)->flush; > + if (!fset) > + /* > + * Do it by slow way > + */ > + ret = read_cache_pages(mapping, pages, > + (void *)v9fs_vfs_readpage, filp); > + else > + ret = v9fs_readpages_tryfast(filp, mapping, > + pages, fset->num_pages); > + > p9_debug(P9_DEBUG_VFS, " = %d\n", ret); > return ret; > } ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 2/2] 9p: v9fs new readpages. 2016-10-25 14:13 ` [Qemu-devel] " Alexander Graf @ 2016-12-09 18:42 ` Edward Shishkin 0 siblings, 0 replies; 15+ messages in thread From: Edward Shishkin @ 2016-12-09 18:42 UTC (permalink / raw) To: Alexander Graf, Eric Van Hensbergen, V9FS Developers Mailing List, Linux Filesystem Development List Cc: Edward Shishkin, Claudio Fontana, QEMU Developers Mailing List, ZhangWei Hello Alexander, Thank you for the comments. Please, find my answers below. On 10/25/2016 04:13 PM, Alexander Graf wrote: > On 10/10/2016 07:24 PM, Edward Shishkin wrote: >> Modify v9fs private ->readpages() method of address_space >> operations for merging pages into long 9p messages. >> >> Signed-off-by: Edward Shishkin <edward.shishkin@gmail.com> >> --- >> fs/9p/vfs_addr.c | 156 >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++- >> 1 file changed, 155 insertions(+), 1 deletion(-) >> >> diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c >> index e871886..4ad248e 100644 >> --- a/fs/9p/vfs_addr.c >> +++ b/fs/9p/vfs_addr.c >> @@ -34,6 +34,7 @@ >> #include <linux/idr.h> >> #include <linux/sched.h> >> #include <linux/uio.h> >> +#include <linux/slab.h> >> #include <net/9p/9p.h> >> #include <net/9p/client.h> >> #include <trace/events/writeback.h> >> @@ -99,6 +100,148 @@ static int v9fs_vfs_readpage(struct file *filp, >> struct page *page) >> return v9fs_fid_readpage(filp->private_data, page); >> } >> +/* >> + * Context for "fast readpages" >> + */ >> +struct v9fs_readpages_ctx { >> + struct file *filp; >> + struct address_space *mapping; >> + pgoff_t start_index; /* index of the first page with actual data */ >> + char *buf; /* buffer with actual data */ >> + int len; /* length of the actual data */ >> + int num_pages; /* maximal data chunk (in pages) that can be >> + passed per transmission */ >> +}; >> + >> +static int init_readpages_ctx(struct v9fs_readpages_ctx *ctx, >> + struct file *filp, >> + struct address_space *mapping, >> + int num_pages) >> +{ >> + memset(ctx, 0, sizeof(*ctx)); >> + ctx->buf = kmalloc(num_pages << PAGE_SHIFT, GFP_USER); > > Doesn't this a) have potential information leak to user space Yes, allocation with such flag is a mistake. Will be fixed in v2. > and b) allow user space to allocate big amounts of kernel memory? Yes, I definitely missed a sanity check for the num_pages. Will be fixed in v2. > A limited page pool would probably be better here. > > I also don't quite grasp yet what pattern you're actually optimizing. reading/writing a large file in sequential/random manner. > Basically you're doing implicit read-ahead on behalf of the reader, right? Yes. As you can see, we always read (ctx->num_pages) number of pages. > So why would that be faster in a random 4k read scenario? You mean 41 MB/s vs 64 MB/s? With such read-ahead it's more likely that an up-to-date page will be present in the cache. So, why not? After all, our speedup of random 4K reads is not dramatic. > Also, have you compared tmpfs hosted files to only benchmark the > transmission path? No, I haven't. Only I/O speedup was a concern. I can obtain such numbers, if interesting. 
> >> + if (!ctx->buf) >> + return -ENOMEM; >> + ctx->filp = filp; >> + ctx->mapping = mapping; >> + ctx->num_pages = num_pages; >> + return 0; >> +} >> + >> +static void done_readpages_ctx(struct v9fs_readpages_ctx *ctx) >> +{ >> + kfree(ctx->buf); >> +} >> + >> +static int receive_buffer(struct file *filp, >> + char *buf, >> + off_t offset, /* offset in the file */ >> + int len, >> + int *err) >> +{ >> + struct kvec kvec; >> + struct iov_iter iter; >> + >> + kvec.iov_base = buf; >> + kvec.iov_len = len; >> + iov_iter_kvec(&iter, READ | ITER_KVEC, &kvec, 1, len); >> + >> + return p9_client_read(filp->private_data, offset, &iter, err); >> +} >> + >> +static int fast_filler(struct v9fs_readpages_ctx *ctx, struct page >> *page) >> +{ >> + int err; >> + int ret = 0; >> + char *kdata; >> + int to_page; >> + off_t off_in_buf; >> + struct inode *inode = page->mapping->host; >> + >> + BUG_ON(!PageLocked(page)); >> + /* >> + * first, validate the buffer >> + */ >> + if (page->index < ctx->start_index || >> + ctx->start_index + ctx->num_pages < page->index) { >> + /* >> + * No actual data in the buffer, >> + * so actualize it >> + */ >> + ret = receive_buffer(ctx->filp, >> + ctx->buf, >> + page_offset(page), >> + ctx->num_pages << PAGE_SHIFT, > > Doesn't this potentially read beyond the end of the file? POSIX doesn't prohibit to read beyond the end of a file. The only requirement is that the supplement should be filled with zeros. Full pages of such supplement are filled with zeros from the buffer: here we rely on the local file system on the host. Partial pages are filled with zeros by us at the place I have marked with (*) below. Thanks, Edward. > > Alex > >> + &err); >> + if (err) { >> + printk("failed to receive buffer off=%llu (%d)\n", >> + (unsigned long long)page_offset(page), >> + err); >> + ret = err; >> + goto done; >> + } >> + ctx->start_index = page->index; >> + ctx->len = ret; >> + ret = 0; >> + } >> + /* >> + * fill the page with buffer's data >> + */ >> + off_in_buf = (page->index - ctx->start_index) << PAGE_SHIFT; >> + if (off_in_buf >= ctx->len) { >> + /* >> + * No actual data to fill the page with >> + */ >> + ret = -1; >> + goto done; >> + } >> + to_page = ctx->len - off_in_buf; >> + if (to_page >= PAGE_SIZE) >> + to_page = PAGE_SIZE; >> + >> + kdata = kmap_atomic(page); >> + memcpy(kdata, ctx->buf + off_in_buf, to_page); >> + memset(kdata + to_page, 0, PAGE_SIZE - to_page); (*) >> + kunmap_atomic(kdata); >> + >> + flush_dcache_page(page); >> + SetPageUptodate(page); >> + v9fs_readpage_to_fscache(inode, page); >> + done: >> + unlock_page(page); >> + return ret; >> +} >> + >> +/** >> + * Try to read pages by groups. For every such group we issue only one >> + * read request to the server. 
>> + * @num_pages: maximal chunk of data (in pages) that can be passed per >> + * such request >> + */ >> +static int v9fs_readpages_tryfast(struct file *filp, >> + struct address_space *mapping, >> + struct list_head *pages, >> + int num_pages) >> +{ >> + int ret; >> + struct v9fs_readpages_ctx ctx; >> + >> + ret = init_readpages_ctx(&ctx, filp, mapping, num_pages); >> + if (ret) >> + /* >> + * Can not allocate resources for the fast path, >> + * so do it by slow way >> + */ >> + return read_cache_pages(mapping, pages, >> + (void *)v9fs_vfs_readpage, filp); >> + >> + else >> + ret = read_cache_pages(mapping, pages, >> + (void *)fast_filler, &ctx); >> + done_readpages_ctx(&ctx); >> + return ret; >> +} >> + >> /** >> * v9fs_vfs_readpages - read a set of pages from 9P >> * >> @@ -114,6 +257,7 @@ static int v9fs_vfs_readpages(struct file *filp, >> struct address_space *mapping, >> { >> int ret = 0; >> struct inode *inode; >> + struct v9fs_flush_set *fset; >> inode = mapping->host; >> p9_debug(P9_DEBUG_VFS, "inode: %p file: %p\n", inode, filp); >> @@ -122,7 +266,17 @@ static int v9fs_vfs_readpages(struct file *filp, >> struct address_space *mapping, >> if (ret == 0) >> return ret; >> - ret = read_cache_pages(mapping, pages, (void >> *)v9fs_vfs_readpage, filp); >> + fset = v9fs_inode2v9ses(mapping->host)->flush; >> + if (!fset) >> + /* >> + * Do it by slow way >> + */ >> + ret = read_cache_pages(mapping, pages, >> + (void *)v9fs_vfs_readpage, filp); >> + else >> + ret = v9fs_readpages_tryfast(filp, mapping, >> + pages, fset->num_pages); >> + >> p9_debug(P9_DEBUG_VFS, " = %d\n", ret); >> return ret; >> } > ^ permalink raw reply [flat|nested] 15+ messages in thread
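For readers following along, a minimal sketch of the two fixes promised
above (GFP_KERNEL instead of GFP_USER, plus a sanity check on num_pages),
written against the init_readpages_ctx() from this patch; illustrative
only, not the actual V2 change:

    static int init_readpages_ctx(struct v9fs_readpages_ctx *ctx,
    			      struct file *filp,
    			      struct address_space *mapping,
    			      int num_pages)
    {
    	memset(ctx, 0, sizeof(*ctx));
    	/*
    	 * Sanity check: num_pages derives from the mount-time msize,
    	 * so bound the allocation by it instead of trusting the caller.
    	 */
    	if (num_pages < 1 ||
    	    num_pages > (v9fs_inode2v9ses(mapping->host)->clnt->msize
    			 >> PAGE_SHIFT))
    		return -EINVAL;
    	/* GFP_KERNEL: this is a kernel-internal bounce buffer */
    	ctx->buf = kmalloc(num_pages << PAGE_SHIFT, GFP_KERNEL);
    	if (!ctx->buf)
    		return -ENOMEM;
    	ctx->filp = filp;
    	ctx->mapping = mapping;
    	ctx->num_pages = num_pages;
    	return 0;
    }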
* Re: [Qemu-devel] [PATCH 1/2] 9p: v9fs add writepages. 2016-10-10 17:24 ` [PATCH 1/2] 9p: v9fs add writepages Edward Shishkin 2016-10-10 17:24 ` [PATCH 2/2] 9p: v9fs new readpages Edward Shishkin @ 2016-10-25 14:01 ` Alexander Graf 2016-12-09 18:43 ` Edward Shishkin 1 sibling, 1 reply; 15+ messages in thread From: Alexander Graf @ 2016-10-25 14:01 UTC (permalink / raw) To: Edward Shishkin, Eric Van Hensbergen, V9FS Developers Mailing List, Linux Filesystem Development List Cc: Edward Shishkin, Claudio Fontana, QEMU Developers Mailing List, ZhangWei On 10/10/2016 07:24 PM, Edward Shishkin wrote: > Add a v9fs private ->writepages() method of address_space > operations for merging pages into long 9p messages. > > Signed-off-by: Edward Shishkin <edward.shishkin@gmail.com> > --- > fs/9p/v9fs.c | 46 +++++++ > fs/9p/v9fs.h | 22 +++- > fs/9p/vfs_addr.c | 357 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ > fs/9p/vfs_super.c | 8 +- > 4 files changed, 431 insertions(+), 2 deletions(-) > > diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c > index 072e759..3b49daf 100644 > --- a/fs/9p/v9fs.c > +++ b/fs/9p/v9fs.c > @@ -32,6 +32,7 @@ > #include <linux/parser.h> > #include <linux/idr.h> > #include <linux/slab.h> > +#include <linux/pagemap.h> > #include <net/9p/9p.h> > #include <net/9p/client.h> > #include <net/9p/transport.h> > @@ -309,6 +310,49 @@ static int v9fs_parse_options(struct v9fs_session_info *v9ses, char *opts) > return ret; > } > > +void put_flush_set(struct v9fs_flush_set *fset) > +{ > + if (!fset) > + return; > + if (fset->pages) > + kfree(fset->pages); > + if (fset->buf) > + kfree(fset->buf); > + kfree(fset); > +} > + > +/** > + * Allocate and initalize flush set > + * Pre-conditions: valid msize is set > + */ > +int alloc_init_flush_set(struct v9fs_session_info *v9ses) > +{ > + int ret = -ENOMEM; > + int num_pages; > + struct v9fs_flush_set *fset = NULL; > + > + num_pages = v9ses->clnt->msize >> PAGE_SHIFT; > + if (num_pages < 2) > + /* speedup impossible */ > + return 0; > + fset = kzalloc(sizeof(*fset), GFP_KERNEL); > + if (!fset) > + goto error; > + fset->num_pages = num_pages; > + fset->pages = kzalloc(num_pages * sizeof(*fset->pages), GFP_KERNEL); > + if (!fset->pages) > + goto error; > + fset->buf = kzalloc(num_pages << PAGE_SHIFT, GFP_USER); > + if (!fset->buf) > + goto error; > + spin_lock_init(&(fset->lock)); > + v9ses->flush = fset; > + return 0; > + error: > + put_flush_set(fset); > + return ret; > +} > + > /** > * v9fs_session_init - initialize session > * @v9ses: session information structure > @@ -444,6 +488,8 @@ void v9fs_session_close(struct v9fs_session_info *v9ses) > kfree(v9ses->uname); > kfree(v9ses->aname); > > + put_flush_set(v9ses->flush); > + > bdi_destroy(&v9ses->bdi); > > spin_lock(&v9fs_sessionlist_lock); > diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h > index 6877050..d1092e4 100644 > --- a/fs/9p/v9fs.h > +++ b/fs/9p/v9fs.h > @@ -23,6 +23,7 @@ > #ifndef FS_9P_V9FS_H > #define FS_9P_V9FS_H > > +#include <linux/kconfig.h> > #include <linux/backing-dev.h> > > /** > @@ -69,6 +70,13 @@ enum p9_cache_modes { > CACHE_FSCACHE, > }; > > +struct v9fs_flush_set { > + struct page **pages; > + int num_pages; > + char *buf; > + spinlock_t lock; > +}; > + > /** > * struct v9fs_session_info - per-instance session information > * @flags: session options of type &p9_session_flags > @@ -105,7 +113,7 @@ struct v9fs_session_info { > char *cachetag; > struct fscache_cookie *fscache; > #endif > - > + struct v9fs_flush_set *flush; /* flush set for writepages */ > char *uname; /* user name 
to mount as */ > char *aname; /* name of remote hierarchy being mounted */ > unsigned int maxdata; /* max data for client interface */ > @@ -158,6 +166,8 @@ extern const struct inode_operations v9fs_symlink_inode_operations_dotl; > extern struct inode *v9fs_inode_from_fid_dotl(struct v9fs_session_info *v9ses, > struct p9_fid *fid, > struct super_block *sb, int new); > +extern int alloc_init_flush_set(struct v9fs_session_info *v9ses); > +extern void put_flush_set(struct v9fs_flush_set *fset); > > /* other default globals */ > #define V9FS_PORT 564 > @@ -222,4 +232,14 @@ v9fs_get_new_inode_from_fid(struct v9fs_session_info *v9ses, struct p9_fid *fid, > return v9fs_inode_from_fid(v9ses, fid, sb, 1); > } > > +static inline int spin_trylock_flush_set(struct v9fs_flush_set *fset) > +{ > + return spin_trylock(&(fset->lock)); > +} > + > +static inline void spin_unlock_flush_set(struct v9fs_flush_set *fset) > +{ > + spin_unlock(&(fset->lock)); > +} > + > #endif > diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c > index 6181ad7..e871886 100644 > --- a/fs/9p/vfs_addr.c > +++ b/fs/9p/vfs_addr.c > @@ -36,6 +36,7 @@ > #include <linux/uio.h> > #include <net/9p/9p.h> > #include <net/9p/client.h> > +#include <trace/events/writeback.h> > > #include "v9fs.h" > #include "v9fs_vfs.h" > @@ -209,6 +210,361 @@ static int v9fs_vfs_writepage(struct page *page, struct writeback_control *wbc) > return retval; > } > > +static void redirty_pages_for_writeback(struct page **pages, int nr, > + struct writeback_control *wbc) > +{ > + int i; > + for (i = 0; i < nr; i++) { > + lock_page(pages[i]); > + redirty_page_for_writepage(wbc, pages[i]); > + unlock_page(pages[i]); > + } > +} > + > +static void set_pages_error(struct page **pages, int nr, int error) > +{ > + int i; > + for (i = 0; i < nr; i++) { > + lock_page(pages[i]); > + SetPageError(pages[i]); > + mapping_set_error(pages[i]->mapping, error); > + unlock_page(pages[i]); > + } > +} > + > +#define V9FS_WRITEPAGES_DEBUG (0) > + > +struct flush_context { > + struct writeback_control *wbc; > + struct address_space *mapping; > + struct v9fs_flush_set *fset; > + pgoff_t done_index; > + pgoff_t writeback_index; > + pgoff_t index; > + pgoff_t end; /* Inclusive */ > + const char *msg; > + int cycled; > + int range_whole; > + int done; > +}; > + > +/** > + * Copy a page with file's data to buffer. > + * Handle races with truncate, etc. > + * Return number of copied bytes > + * > + * @page: page to copy data from; > + * @page_nr: serial number of the page > + */ > +static int flush_page(struct page *page, int page_nr, struct flush_context *ctx) > +{ > + char *kdata; > + loff_t isize; > + int copied = 0; > + struct writeback_control *wbc = ctx->wbc; > + /* > + * At this point, the page may be truncated or > + * invalidated (changing page->mapping to NULL), or > + * even swizzled back from swapper_space to tmpfs file > + * mapping. However, page->index will not change > + * because we have a reference on the page. > + */ > + if (page->index > ctx->end) { > + /* > + * can't be range_cyclic (1st pass) because > + * end == -1 in that case. > + */ > + ctx->done = 1; > + ctx->msg = "page out of range"; > + goto exit; > + } > + ctx->done_index = page->index; > + lock_page(page); > + /* > + * Page truncated or invalidated. 
We can freely skip it > + * then, even for data integrity operations: the page > + * has disappeared concurrently, so there could be no > + * real expectation of this data interity operation > + * even if there is now a new, dirty page at the same > + * pagecache address. > + */ > + if (unlikely(page->mapping != ctx->mapping)) { > + unlock_page(page); > + ctx->msg = "page truncated or invalidated"; > + goto exit; > + } > + if (!PageDirty(page)) { > + /* > + * someone wrote it for us > + */ > + unlock_page(page); > + ctx->msg = "page not dirty"; > + goto exit; > + } > + if (PageWriteback(page)) { > + if (wbc->sync_mode != WB_SYNC_NONE) > + wait_on_page_writeback(page); > + else { > + unlock_page(page); > + ctx->msg = "page is writeback"; > + goto exit; > + } > + } > + BUG_ON(PageWriteback(page)); > + if (!clear_page_dirty_for_io(page)) { > + unlock_page(page); > + ctx->msg = "failed to clear page dirty"; > + goto exit; > + } > + trace_wbc_writepage(wbc, inode_to_bdi(ctx->mapping->host)); > + > + set_page_writeback(page); > + isize = i_size_read(ctx->mapping->host); > + if (page->index == isize >> PAGE_SHIFT) > + copied = isize & ~PAGE_MASK; > + else > + copied = PAGE_SIZE; > + kdata = kmap_atomic(page); > + memcpy(ctx->fset->buf + (page_nr << PAGE_SHIFT), kdata, copied); > + kunmap_atomic(kdata); > + end_page_writeback(page); > + > + unlock_page(page); > + /* > + * We stop writing back only if we are not doing > + * integrity sync. In case of integrity sync we have to > + * keep going until we have written all the pages > + * we tagged for writeback prior to entering this loop. > + */ > + if (--wbc->nr_to_write <= 0 && wbc->sync_mode == WB_SYNC_NONE) > + ctx->done = 1; > + exit: > + return copied; > +} > + > +static int send_buffer(off_t offset, int len, struct flush_context *ctx) > +{ > + int ret = 0; > + struct kvec kvec; > + struct iov_iter iter; > + struct v9fs_inode *v9inode = V9FS_I(ctx->mapping->host); > + > + kvec.iov_base = ctx->fset->buf; > + kvec.iov_len = len; > + iov_iter_kvec(&iter, WRITE | ITER_KVEC, &kvec, 1, len); > + BUG_ON(!v9inode->writeback_fid); > + > + p9_client_write(v9inode->writeback_fid, offset, &iter, &ret); > + return ret; > +} > + > +/** > + * Helper function for managing 9pFS write requests. > + * The main purpose of this function is to provide support for > + * the coalescing of several pages into a single 9p message. > + * This is similarly to NFS's pagelist. > + * > + * Copy pages with adjusent indices to a buffer and send it to > + * the server. Why do you need to copy the pages? The transport below 9p - virtio in your case - has native scatter-gather support, so you don't need to copy anything, no? Alex ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 1/2] 9p: v9fs add writepages. 2016-10-25 14:01 ` [Qemu-devel] [PATCH 1/2] 9p: v9fs add writepages Alexander Graf @ 2016-12-09 18:43 ` Edward Shishkin 0 siblings, 0 replies; 15+ messages in thread From: Edward Shishkin @ 2016-12-09 18:43 UTC (permalink / raw) To: Alexander Graf, Eric Van Hensbergen, V9FS Developers Mailing List, Linux Filesystem Development List Cc: Edward Shishkin, Claudio Fontana, QEMU Developers Mailing List, ZhangWei On 10/25/2016 04:01 PM, Alexander Graf wrote: > On 10/10/2016 07:24 PM, Edward Shishkin wrote: >> Add a v9fs private ->writepages() method of address_space >> operations for merging pages into long 9p messages. >> >> Signed-off-by: Edward Shishkin <edward.shishkin@gmail.com> >> --- >> fs/9p/v9fs.c | 46 +++++++ >> fs/9p/v9fs.h | 22 +++- >> fs/9p/vfs_addr.c | 357 >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++ >> fs/9p/vfs_super.c | 8 +- >> 4 files changed, 431 insertions(+), 2 deletions(-) >> >> diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c >> index 072e759..3b49daf 100644 >> --- a/fs/9p/v9fs.c >> +++ b/fs/9p/v9fs.c >> @@ -32,6 +32,7 @@ >> #include <linux/parser.h> >> #include <linux/idr.h> >> #include <linux/slab.h> >> +#include <linux/pagemap.h> >> #include <net/9p/9p.h> >> #include <net/9p/client.h> >> #include <net/9p/transport.h> >> @@ -309,6 +310,49 @@ static int v9fs_parse_options(struct >> v9fs_session_info *v9ses, char *opts) >> return ret; >> } >> +void put_flush_set(struct v9fs_flush_set *fset) >> +{ >> + if (!fset) >> + return; >> + if (fset->pages) >> + kfree(fset->pages); >> + if (fset->buf) >> + kfree(fset->buf); >> + kfree(fset); >> +} >> + >> +/** >> + * Allocate and initalize flush set >> + * Pre-conditions: valid msize is set >> + */ >> +int alloc_init_flush_set(struct v9fs_session_info *v9ses) >> +{ >> + int ret = -ENOMEM; >> + int num_pages; >> + struct v9fs_flush_set *fset = NULL; >> + >> + num_pages = v9ses->clnt->msize >> PAGE_SHIFT; >> + if (num_pages < 2) >> + /* speedup impossible */ >> + return 0; >> + fset = kzalloc(sizeof(*fset), GFP_KERNEL); >> + if (!fset) >> + goto error; >> + fset->num_pages = num_pages; >> + fset->pages = kzalloc(num_pages * sizeof(*fset->pages), >> GFP_KERNEL); >> + if (!fset->pages) >> + goto error; >> + fset->buf = kzalloc(num_pages << PAGE_SHIFT, GFP_USER); >> + if (!fset->buf) >> + goto error; >> + spin_lock_init(&(fset->lock)); >> + v9ses->flush = fset; >> + return 0; >> + error: >> + put_flush_set(fset); >> + return ret; >> +} >> + >> /** >> * v9fs_session_init - initialize session >> * @v9ses: session information structure >> @@ -444,6 +488,8 @@ void v9fs_session_close(struct v9fs_session_info >> *v9ses) >> kfree(v9ses->uname); >> kfree(v9ses->aname); >> + put_flush_set(v9ses->flush); >> + >> bdi_destroy(&v9ses->bdi); >> spin_lock(&v9fs_sessionlist_lock); >> diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h >> index 6877050..d1092e4 100644 >> --- a/fs/9p/v9fs.h >> +++ b/fs/9p/v9fs.h >> @@ -23,6 +23,7 @@ >> #ifndef FS_9P_V9FS_H >> #define FS_9P_V9FS_H >> +#include <linux/kconfig.h> >> #include <linux/backing-dev.h> >> /** >> @@ -69,6 +70,13 @@ enum p9_cache_modes { >> CACHE_FSCACHE, >> }; >> +struct v9fs_flush_set { >> + struct page **pages; >> + int num_pages; >> + char *buf; >> + spinlock_t lock; >> +}; >> + >> /** >> * struct v9fs_session_info - per-instance session information >> * @flags: session options of type &p9_session_flags >> @@ -105,7 +113,7 @@ struct v9fs_session_info { >> char *cachetag; >> struct fscache_cookie *fscache; >> #endif >> - >> + struct v9fs_flush_set 
*flush; /* flush set for writepages */ >> char *uname; /* user name to mount as */ >> char *aname; /* name of remote hierarchy being mounted */ >> unsigned int maxdata; /* max data for client interface */ >> @@ -158,6 +166,8 @@ extern const struct inode_operations >> v9fs_symlink_inode_operations_dotl; >> extern struct inode *v9fs_inode_from_fid_dotl(struct >> v9fs_session_info *v9ses, >> struct p9_fid *fid, >> struct super_block *sb, int new); >> +extern int alloc_init_flush_set(struct v9fs_session_info *v9ses); >> +extern void put_flush_set(struct v9fs_flush_set *fset); >> /* other default globals */ >> #define V9FS_PORT 564 >> @@ -222,4 +232,14 @@ v9fs_get_new_inode_from_fid(struct >> v9fs_session_info *v9ses, struct p9_fid *fid, >> return v9fs_inode_from_fid(v9ses, fid, sb, 1); >> } >> +static inline int spin_trylock_flush_set(struct v9fs_flush_set *fset) >> +{ >> + return spin_trylock(&(fset->lock)); >> +} >> + >> +static inline void spin_unlock_flush_set(struct v9fs_flush_set *fset) >> +{ >> + spin_unlock(&(fset->lock)); >> +} >> + >> #endif >> diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c >> index 6181ad7..e871886 100644 >> --- a/fs/9p/vfs_addr.c >> +++ b/fs/9p/vfs_addr.c >> @@ -36,6 +36,7 @@ >> #include <linux/uio.h> >> #include <net/9p/9p.h> >> #include <net/9p/client.h> >> +#include <trace/events/writeback.h> >> #include "v9fs.h" >> #include "v9fs_vfs.h" >> @@ -209,6 +210,361 @@ static int v9fs_vfs_writepage(struct page >> *page, struct writeback_control *wbc) >> return retval; >> } >> +static void redirty_pages_for_writeback(struct page **pages, int nr, >> + struct writeback_control *wbc) >> +{ >> + int i; >> + for (i = 0; i < nr; i++) { >> + lock_page(pages[i]); >> + redirty_page_for_writepage(wbc, pages[i]); >> + unlock_page(pages[i]); >> + } >> +} >> + >> +static void set_pages_error(struct page **pages, int nr, int error) >> +{ >> + int i; >> + for (i = 0; i < nr; i++) { >> + lock_page(pages[i]); >> + SetPageError(pages[i]); >> + mapping_set_error(pages[i]->mapping, error); >> + unlock_page(pages[i]); >> + } >> +} >> + >> +#define V9FS_WRITEPAGES_DEBUG (0) >> + >> +struct flush_context { >> + struct writeback_control *wbc; >> + struct address_space *mapping; >> + struct v9fs_flush_set *fset; >> + pgoff_t done_index; >> + pgoff_t writeback_index; >> + pgoff_t index; >> + pgoff_t end; /* Inclusive */ >> + const char *msg; >> + int cycled; >> + int range_whole; >> + int done; >> +}; >> + >> +/** >> + * Copy a page with file's data to buffer. >> + * Handle races with truncate, etc. >> + * Return number of copied bytes >> + * >> + * @page: page to copy data from; >> + * @page_nr: serial number of the page >> + */ >> +static int flush_page(struct page *page, int page_nr, struct >> flush_context *ctx) >> +{ >> + char *kdata; >> + loff_t isize; >> + int copied = 0; >> + struct writeback_control *wbc = ctx->wbc; >> + /* >> + * At this point, the page may be truncated or >> + * invalidated (changing page->mapping to NULL), or >> + * even swizzled back from swapper_space to tmpfs file >> + * mapping. However, page->index will not change >> + * because we have a reference on the page. >> + */ >> + if (page->index > ctx->end) { >> + /* >> + * can't be range_cyclic (1st pass) because >> + * end == -1 in that case. >> + */ >> + ctx->done = 1; >> + ctx->msg = "page out of range"; >> + goto exit; >> + } >> + ctx->done_index = page->index; >> + lock_page(page); >> + /* >> + * Page truncated or invalidated. 
We can freely skip it >> + * then, even for data integrity operations: the page >> + * has disappeared concurrently, so there could be no >> + * real expectation of this data interity operation >> + * even if there is now a new, dirty page at the same >> + * pagecache address. >> + */ >> + if (unlikely(page->mapping != ctx->mapping)) { >> + unlock_page(page); >> + ctx->msg = "page truncated or invalidated"; >> + goto exit; >> + } >> + if (!PageDirty(page)) { >> + /* >> + * someone wrote it for us >> + */ >> + unlock_page(page); >> + ctx->msg = "page not dirty"; >> + goto exit; >> + } >> + if (PageWriteback(page)) { >> + if (wbc->sync_mode != WB_SYNC_NONE) >> + wait_on_page_writeback(page); >> + else { >> + unlock_page(page); >> + ctx->msg = "page is writeback"; >> + goto exit; >> + } >> + } >> + BUG_ON(PageWriteback(page)); >> + if (!clear_page_dirty_for_io(page)) { >> + unlock_page(page); >> + ctx->msg = "failed to clear page dirty"; >> + goto exit; >> + } >> + trace_wbc_writepage(wbc, inode_to_bdi(ctx->mapping->host)); >> + >> + set_page_writeback(page); >> + isize = i_size_read(ctx->mapping->host); >> + if (page->index == isize >> PAGE_SHIFT) >> + copied = isize & ~PAGE_MASK; >> + else >> + copied = PAGE_SIZE; >> + kdata = kmap_atomic(page); >> + memcpy(ctx->fset->buf + (page_nr << PAGE_SHIFT), kdata, copied); >> + kunmap_atomic(kdata); >> + end_page_writeback(page); >> + >> + unlock_page(page); >> + /* >> + * We stop writing back only if we are not doing >> + * integrity sync. In case of integrity sync we have to >> + * keep going until we have written all the pages >> + * we tagged for writeback prior to entering this loop. >> + */ >> + if (--wbc->nr_to_write <= 0 && wbc->sync_mode == WB_SYNC_NONE) >> + ctx->done = 1; >> + exit: >> + return copied; >> +} >> + >> +static int send_buffer(off_t offset, int len, struct flush_context >> *ctx) >> +{ >> + int ret = 0; >> + struct kvec kvec; >> + struct iov_iter iter; >> + struct v9fs_inode *v9inode = V9FS_I(ctx->mapping->host); >> + >> + kvec.iov_base = ctx->fset->buf; >> + kvec.iov_len = len; >> + iov_iter_kvec(&iter, WRITE | ITER_KVEC, &kvec, 1, len); >> + BUG_ON(!v9inode->writeback_fid); >> + >> + p9_client_write(v9inode->writeback_fid, offset, &iter, &ret); >> + return ret; >> +} >> + >> +/** >> + * Helper function for managing 9pFS write requests. >> + * The main purpose of this function is to provide support for >> + * the coalescing of several pages into a single 9p message. >> + * This is similarly to NFS's pagelist. >> + * >> + * Copy pages with adjusent indices to a buffer and send it to >> + * the server. > > Why do you need to copy the pages? The transport below 9p - virtio in > your case - has native scatter-gather support, so you don't need to > copy anything, no? > Perhaps we can avoid copying pages. However it would mean modification of the 9P file transfer protocol, which is not aware of pages. I suspect that such activity requires substantial sponsorship, and I am not sure that it will be interesting for Huawei. Thanks, Edward. ^ permalink raw reply [flat|nested] 15+ messages in thread
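For context, Graf's suggestion would look roughly like the following on
the client side: p9_client_write() already takes an iov_iter, and the
kernel can describe pages directly with a bio_vec iterator, avoiding the
memcpy into the bounce buffer. A sketch only, assuming the flush_context
from patch 1 and the ITER_BVEC interface of kernels of this era; whether
the transport then handles such iterators efficiently, and whether any
protocol-level work remains, is exactly the open question in this
exchange:

    static int send_pages_nocopy(struct page **pages, int nr, size_t len,
    			     off_t offset, struct flush_context *ctx)
    {
    	int err = 0;
    	int i;
    	struct bio_vec *bvec;
    	struct iov_iter iter;
    	struct v9fs_inode *v9inode = V9FS_I(ctx->mapping->host);

    	bvec = kcalloc(nr, sizeof(*bvec), GFP_NOFS);
    	if (!bvec)
    		return -ENOMEM;
    	for (i = 0; i < nr; i++) {
    		/* describe the pages in place instead of copying them */
    		bvec[i].bv_page = pages[i];
    		bvec[i].bv_offset = 0;
    		bvec[i].bv_len = PAGE_SIZE;
    	}
    	/* the iterator is capped at len, covering a short final page */
    	iov_iter_bvec(&iter, ITER_BVEC | WRITE, bvec, nr, len);
    	p9_client_write(v9inode->writeback_fid, offset, &iter, &err);
    	kfree(bvec);
    	return err;
    }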
* [RFC][PATCH 0/7] 9p: v9fs read and write speedup - V2 2016-10-10 17:24 [RFC][PATCH 0/2] 9p: v9fs read and write speedup Edward Shishkin 2016-10-10 17:24 ` [PATCH 1/2] 9p: v9fs add writepages Edward Shishkin @ 2016-12-12 18:13 ` Edward Shishkin 2016-12-12 18:15 ` [PATCH 1/7] 9p: v9fs add writepages edward.shishkin 1 sibling, 1 reply; 15+ messages in thread From: Edward Shishkin @ 2016-12-12 18:13 UTC (permalink / raw) To: Eric Van Hensbergen, V9FS Developers Mailing List, Linux Filesystem Development List Cc: QEMU Developers Mailing List, ZhangWei, Claudio Fontana, Edward Shishkin, Greg Kurz, Alexander Graf Hello everyone, Version 2 of the patch series contains cleanups and bug fixes. Patches 1 and 2 remain unchanged. Patches 3, 4, and 5 are cleanups suggested by Fengguang Wu. Patches 6 and 7 fix bugs found by Alexander Graf. Comments and suggestions are welcome as usual. Thanks, Edward. ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/7] 9p: v9fs add writepages. 2016-12-12 18:13 ` [RFC][PATCH 0/7] 9p: v9fs read and write speedup - V2 Edward Shishkin @ 2016-12-12 18:15 ` edward.shishkin 2016-12-12 18:15 ` [PATCH 2/7] 9p: v9fs new readpages edward.shishkin ` (5 more replies) 0 siblings, 6 replies; 15+ messages in thread From: edward.shishkin @ 2016-12-12 18:15 UTC (permalink / raw) To: Eric Van Hensbergen, V9FS Developers Mailing List, Linux Filesystem Development List Cc: QEMU Developers Mailing List, ZhangWei, Claudio Fontana, Greg Kurz, Alexander Graf, Eduard Shishkin From: Eduard Shishkin <eduard.shishkin@huawei.com> Add a v9fs private ->writepages() method of address_space operations for merging pages into long 9p messages. Signed-off-by: Eduard Shishkin <eduard.shishkin@huawei.com> --- fs/9p/v9fs.c | 46 +++++++ fs/9p/v9fs.h | 22 +++- fs/9p/vfs_addr.c | 357 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ fs/9p/vfs_super.c | 8 +- 4 files changed, 431 insertions(+), 2 deletions(-) diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c index 072e759..3b49daf 100644 --- a/fs/9p/v9fs.c +++ b/fs/9p/v9fs.c @@ -32,6 +32,7 @@ #include <linux/parser.h> #include <linux/idr.h> #include <linux/slab.h> +#include <linux/pagemap.h> #include <net/9p/9p.h> #include <net/9p/client.h> #include <net/9p/transport.h> @@ -309,6 +310,49 @@ static int v9fs_parse_options(struct v9fs_session_info *v9ses, char *opts) return ret; } +void put_flush_set(struct v9fs_flush_set *fset) +{ + if (!fset) + return; + if (fset->pages) + kfree(fset->pages); + if (fset->buf) + kfree(fset->buf); + kfree(fset); +} + +/** + * Allocate and initialize flush set + * Pre-conditions: valid msize is set + */ +int alloc_init_flush_set(struct v9fs_session_info *v9ses) +{ + int ret = -ENOMEM; + int num_pages; + struct v9fs_flush_set *fset = NULL; + + num_pages = v9ses->clnt->msize >> PAGE_SHIFT; + if (num_pages < 2) + /* speedup impossible */ + return 0; + fset = kzalloc(sizeof(*fset), GFP_KERNEL); + if (!fset) + goto error; + fset->num_pages = num_pages; + fset->pages = kzalloc(num_pages * sizeof(*fset->pages), GFP_KERNEL); + if (!fset->pages) + goto error; + fset->buf = kzalloc(num_pages << PAGE_SHIFT, GFP_USER); + if (!fset->buf) + goto error; + spin_lock_init(&(fset->lock)); + v9ses->flush = fset; + return 0; + error: + put_flush_set(fset); + return ret; +} + /** * v9fs_session_init - initialize session * @v9ses: session information structure @@ -444,6 +488,8 @@ void v9fs_session_close(struct v9fs_session_info *v9ses) kfree(v9ses->uname); kfree(v9ses->aname); + put_flush_set(v9ses->flush); + bdi_destroy(&v9ses->bdi); spin_lock(&v9fs_sessionlist_lock); diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h index 443d12e..13a9db1 100644 --- a/fs/9p/v9fs.h +++ b/fs/9p/v9fs.h @@ -23,6 +23,7 @@ #ifndef FS_9P_V9FS_H #define FS_9P_V9FS_H +#include <linux/kconfig.h> #include <linux/backing-dev.h> /** @@ -69,6 +70,13 @@ enum p9_cache_modes { CACHE_FSCACHE, }; +struct v9fs_flush_set { + struct page **pages; + int num_pages; + char *buf; + spinlock_t lock; +}; + /** * struct v9fs_session_info - per-instance session information * @flags: session options of type &p9_session_flags @@ -105,7 +113,7 @@ struct v9fs_session_info { char *cachetag; struct fscache_cookie *fscache; #endif - + struct v9fs_flush_set *flush; /* flush set for writepages */ char *uname; /* user name to mount as */ char *aname; /* name of remote hierarchy being mounted */ unsigned int maxdata; /* max data for client interface */ @@ -159,6 +167,8 @@ extern const struct inode_operations v9fs_symlink_inode_operations_dotl;
extern struct inode *v9fs_inode_from_fid_dotl(struct v9fs_session_info *v9ses, struct p9_fid *fid, struct super_block *sb, int new); +extern int alloc_init_flush_set(struct v9fs_session_info *v9ses); +extern void put_flush_set(struct v9fs_flush_set *fset); /* other default globals */ #define V9FS_PORT 564 @@ -223,4 +233,14 @@ v9fs_get_new_inode_from_fid(struct v9fs_session_info *v9ses, struct p9_fid *fid, return v9fs_inode_from_fid(v9ses, fid, sb, 1); } +static inline int spin_trylock_flush_set(struct v9fs_flush_set *fset) +{ + return spin_trylock(&(fset->lock)); +} + +static inline void spin_unlock_flush_set(struct v9fs_flush_set *fset) +{ + spin_unlock(&(fset->lock)); +} + #endif diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c index 6181ad7..e871886 100644 --- a/fs/9p/vfs_addr.c +++ b/fs/9p/vfs_addr.c @@ -36,6 +36,7 @@ #include <linux/uio.h> #include <net/9p/9p.h> #include <net/9p/client.h> +#include <trace/events/writeback.h> #include "v9fs.h" #include "v9fs_vfs.h" @@ -209,6 +210,361 @@ static int v9fs_vfs_writepage(struct page *page, struct writeback_control *wbc) return retval; } +static void redirty_pages_for_writeback(struct page **pages, int nr, + struct writeback_control *wbc) +{ + int i; + for (i = 0; i < nr; i++) { + lock_page(pages[i]); + redirty_page_for_writepage(wbc, pages[i]); + unlock_page(pages[i]); + } +} + +static void set_pages_error(struct page **pages, int nr, int error) +{ + int i; + for (i = 0; i < nr; i++) { + lock_page(pages[i]); + SetPageError(pages[i]); + mapping_set_error(pages[i]->mapping, error); + unlock_page(pages[i]); + } +} + +#define V9FS_WRITEPAGES_DEBUG (0) + +struct flush_context { + struct writeback_control *wbc; + struct address_space *mapping; + struct v9fs_flush_set *fset; + pgoff_t done_index; + pgoff_t writeback_index; + pgoff_t index; + pgoff_t end; /* Inclusive */ + const char *msg; + int cycled; + int range_whole; + int done; +}; + +/** + * Copy a page with file's data to buffer. + * Handle races with truncate, etc. + * Return number of copied bytes + * + * @page: page to copy data from; + * @page_nr: serial number of the page + */ +static int flush_page(struct page *page, int page_nr, struct flush_context *ctx) +{ + char *kdata; + loff_t isize; + int copied = 0; + struct writeback_control *wbc = ctx->wbc; + /* + * At this point, the page may be truncated or + * invalidated (changing page->mapping to NULL), or + * even swizzled back from swapper_space to tmpfs file + * mapping. However, page->index will not change + * because we have a reference on the page. + */ + if (page->index > ctx->end) { + /* + * can't be range_cyclic (1st pass) because + * end == -1 in that case. + */ + ctx->done = 1; + ctx->msg = "page out of range"; + goto exit; + } + ctx->done_index = page->index; + lock_page(page); + /* + * Page truncated or invalidated. We can freely skip it + * then, even for data integrity operations: the page + * has disappeared concurrently, so there could be no + * real expectation of this data integrity operation + * even if there is now a new, dirty page at the same + * pagecache address.
+ */ + if (unlikely(page->mapping != ctx->mapping)) { + unlock_page(page); + ctx->msg = "page truncated or invalidated"; + goto exit; + } + if (!PageDirty(page)) { + /* + * someone wrote it for us + */ + unlock_page(page); + ctx->msg = "page not dirty"; + goto exit; + } + if (PageWriteback(page)) { + if (wbc->sync_mode != WB_SYNC_NONE) + wait_on_page_writeback(page); + else { + unlock_page(page); + ctx->msg = "page is writeback"; + goto exit; + } + } + BUG_ON(PageWriteback(page)); + if (!clear_page_dirty_for_io(page)) { + unlock_page(page); + ctx->msg = "failed to clear page dirty"; + goto exit; + } + trace_wbc_writepage(wbc, inode_to_bdi(ctx->mapping->host)); + + set_page_writeback(page); + isize = i_size_read(ctx->mapping->host); + if (page->index == isize >> PAGE_SHIFT) + copied = isize & ~PAGE_MASK; + else + copied = PAGE_SIZE; + kdata = kmap_atomic(page); + memcpy(ctx->fset->buf + (page_nr << PAGE_SHIFT), kdata, copied); + kunmap_atomic(kdata); + end_page_writeback(page); + + unlock_page(page); + /* + * We stop writing back only if we are not doing + * integrity sync. In case of integrity sync we have to + * keep going until we have written all the pages + * we tagged for writeback prior to entering this loop. + */ + if (--wbc->nr_to_write <= 0 && wbc->sync_mode == WB_SYNC_NONE) + ctx->done = 1; + exit: + return copied; +} + +static int send_buffer(off_t offset, int len, struct flush_context *ctx) +{ + int ret = 0; + struct kvec kvec; + struct iov_iter iter; + struct v9fs_inode *v9inode = V9FS_I(ctx->mapping->host); + + kvec.iov_base = ctx->fset->buf; + kvec.iov_len = len; + iov_iter_kvec(&iter, WRITE | ITER_KVEC, &kvec, 1, len); + BUG_ON(!v9inode->writeback_fid); + + p9_client_write(v9inode->writeback_fid, offset, &iter, &ret); + return ret; +} + +/** + * Helper function for managing 9pFS write requests. + * The main purpose of this function is to provide support for + * the coalescing of several pages into a single 9p message. + * This is similar to NFS's pagelist. + * + * Copy pages with adjacent indices to a buffer and send it to + * the server. + * + * @pages: array of pages with ascending indices; + * @nr_pages: number of pages in the array; + */ +static int flush_pages(struct page **pages, int nr_pages, + struct flush_context *ctx) +{ + int ret; + int pos = 0; + int iter_pos; + int iter_nrpages; + pgoff_t iter_page_idx; + + while (pos < nr_pages) { + + int i; + int iter_len = 0; + struct page *page; + + iter_pos = pos; + iter_nrpages = 0; + iter_page_idx = pages[pos]->index; + + for (i = 0; pos < nr_pages; i++) { + int from_page; + + page = pages[pos]; + if (page->index != iter_page_idx + i) { + /* + * Hole in the indices, + * further coalescing impossible. + * Try to send what we have accumulated. + * This page will be processed in the next + * iteration. + */ + goto iter_send; + } + from_page = flush_page(page, i, ctx); + + iter_len += from_page; + iter_nrpages++; + pos++; + + if (from_page != PAGE_SIZE) { + /* + * Not a full page was flushed, + * further coalescing impossible. + * Try to send what we have accumulated.
+ */ +#if V9FS_WRITEPAGES_DEBUG + if (from_page == 0) + printk("9p: page %lu is not flushed (%s)\n", + page->index, ctx->msg); +#endif + goto iter_send; + } + }; + iter_send: + if (iter_len == 0) + /* + * Nothing to send + */ + goto next_iter; + ret = send_buffer(iter_page_idx << PAGE_SHIFT, + iter_len, ctx); + if (ret == -EAGAIN) { + redirty_pages_for_writeback(pages + iter_pos, + iter_nrpages, ctx->wbc); + ret = 0; + } else if (ret < 0) { + /* + * Something bad happened. + * done_index is set past this chunk, + * so media errors will not choke + * background writeout for the entire + * file. + */ + printk("9p: send_buffer failed (%d)\n", ret); + + set_pages_error(pages + iter_pos, iter_nrpages, ret); + ctx->done_index = + pages[iter_pos + iter_nrpages - 1]->index + 1; + ctx->done = 1; + return ret; + } else + ret = 0; + next_iter: + if (ctx->done) + return ret; + }; + return 0; +} + +static void init_flush_context(struct flush_context *ctx, + struct address_space *mapping, + struct writeback_control *wbc, + struct v9fs_flush_set *fset) +{ + ctx->wbc = wbc; + ctx->mapping = mapping; + ctx->fset = fset; + ctx->done = 0; + ctx->range_whole = 0; + + if (wbc->range_cyclic) { + ctx->writeback_index = mapping->writeback_index; + ctx->index = ctx->writeback_index; + if (ctx->index == 0) + ctx->cycled = 1; + else + ctx->cycled = 0; + ctx->end = -1; + } else { + ctx->index = wbc->range_start >> PAGE_SHIFT; + ctx->end = wbc->range_end >> PAGE_SHIFT; + if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) + ctx->range_whole = 1; + ctx->cycled = 1; /* ignore range_cyclic tests */ + } +} + +/** + * Pre-condition: flush set is locked + */ +static int v9fs_writepages_fastpath(struct address_space *mapping, + struct writeback_control *wbc, + struct v9fs_flush_set *fset) +{ + int ret = 0; + int tag; + int nr_pages; + struct page **pages = fset->pages; + struct flush_context ctx; + + init_flush_context(&ctx, mapping, wbc, fset); + + if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages) + tag = PAGECACHE_TAG_TOWRITE; + else + tag = PAGECACHE_TAG_DIRTY; +retry: + if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages) + tag_pages_for_writeback(mapping, ctx.index, ctx.end); + + ctx.done_index = ctx.index; + + while (!ctx.done && (ctx.index <= ctx.end)) { + int i; + nr_pages = find_get_pages_tag(mapping, &ctx.index, tag, + 1 + min(ctx.end - ctx.index, + (pgoff_t)(fset->num_pages - 1)), + pages); + if (nr_pages == 0) + break; + + ret = flush_pages(pages, nr_pages, &ctx); + /* + * unpin pages + */ + for (i = 0; i < nr_pages; i++) + put_page(pages[i]); + if (ret < 0) + break; + cond_resched(); + } + if (!ctx.cycled && !ctx.done) { + /* + * range_cyclic: + * We hit the last page and there is more work + * to be done: wrap back to the start of the file + */ + ctx.cycled = 1; + ctx.index = 0; + ctx.end = ctx.writeback_index - 1; + goto retry; + } + if (wbc->range_cyclic || (ctx.range_whole && wbc->nr_to_write > 0)) + mapping->writeback_index = ctx.done_index; + return ret; +} + +static int v9fs_writepages(struct address_space *mapping, + struct writeback_control *wbc) +{ + int ret; + struct v9fs_flush_set *fset; + + fset = v9fs_inode2v9ses(mapping->host)->flush; + if (!fset || !spin_trylock_flush_set(fset)) + /* + * fall back to the slow path + */ + return generic_writepages(mapping, wbc); + + ret = v9fs_writepages_fastpath(mapping, wbc, fset); + spin_unlock_flush_set(fset); + return ret; +} + /** * v9fs_launder_page - Writeback a dirty page * Returns 0 on success.
@@ -342,6 +698,7 @@ const struct address_space_operations v9fs_addr_operations = { .readpages = v9fs_vfs_readpages, .set_page_dirty = __set_page_dirty_nobuffers, .writepage = v9fs_vfs_writepage, + .writepages = v9fs_writepages, .write_begin = v9fs_write_begin, .write_end = v9fs_write_end, .releasepage = v9fs_release_page, diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c index de3ed86..c1f9af1 100644 --- a/fs/9p/vfs_super.c +++ b/fs/9p/vfs_super.c @@ -140,8 +140,14 @@ static struct dentry *v9fs_mount(struct file_system_type *fs_type, int flags, } v9fs_fill_super(sb, v9ses, flags, data); - if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) + if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) { + retval = alloc_init_flush_set(v9ses); + if (IS_ERR(v9ses->flush)) { + retval = PTR_ERR(fid); + goto release_sb; + } sb->s_d_op = &v9fs_cached_dentry_operations; + } else sb->s_d_op = &v9fs_dentry_operations; -- 2.7.4 ^ permalink raw reply related [flat|nested] 15+ messages in thread
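As a side note for reviewers: the control flow of flush_pages() above reduces to grouping ascending page indices into maximal runs of adjacent indices and issuing one 9P write per run; a hole in the index sequence (or a partial page) ends a run. The standalone userspace model below shows just that grouping logic; all names are invented and it is not part of the posted series.

#include <stdio.h>

int main(void)
{
	/* dirty page indices as find_get_pages_tag() might return them */
	unsigned long idx[] = { 7, 8, 9, 12, 13, 20 };
	int n = sizeof(idx) / sizeof(idx[0]);
	int pos = 0;

	while (pos < n) {
		int start = pos++;
		/* extend the run while indices remain adjacent */
		while (pos < n && idx[pos] == idx[pos - 1] + 1)
			pos++;
		/* one 9P message would cover pages idx[start]..idx[pos - 1] */
		printf("send: first index %lu, %d page(s)\n",
		       idx[start], pos - start);
	}
	return 0;
}

With the indices above it prints three runs (7..9, 12..13, 20), i.e. six dirty pages go out in three messages instead of six.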
* [PATCH 2/7] 9p: v9fs new readpages. 2016-12-12 18:15 ` [PATCH 1/7] 9p: v9fs add writepages edward.shishkin @ 2016-12-12 18:15 ` edward.shishkin 2016-12-12 18:15 ` [PATCH 3/7] 9p: v9fs fix ifnullfree.cocci warnings edward.shishkin ` (4 subsequent siblings) 5 siblings, 0 replies; 15+ messages in thread From: edward.shishkin @ 2016-12-12 18:15 UTC (permalink / raw) To: Eric Van Hensbergen, V9FS Developers Mailing List, Linux Filesystem Development List Cc: QEMU Developers Mailing List, ZhangWei, Claudio Fontana, Greg Kurz, Alexander Graf, Eduard Shishkin From: Eduard Shishkin <eduard.shishkin@huawei.com> Modify v9fs private ->readpages() method of address_space operations for merging pages into long 9p messages. Signed-off-by: Eduard Shishkin <eduard.shishkin@huawei.com> --- fs/9p/vfs_addr.c | 156 ++++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 155 insertions(+), 1 deletion(-) diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c index e871886..4ad248e 100644 --- a/fs/9p/vfs_addr.c +++ b/fs/9p/vfs_addr.c @@ -34,6 +34,7 @@ #include <linux/idr.h> #include <linux/sched.h> #include <linux/uio.h> +#include <linux/slab.h> #include <net/9p/9p.h> #include <net/9p/client.h> #include <trace/events/writeback.h> @@ -99,6 +100,148 @@ static int v9fs_vfs_readpage(struct file *filp, struct page *page) return v9fs_fid_readpage(filp->private_data, page); } +/* + * Context for "fast readpages" + */ +struct v9fs_readpages_ctx { + struct file *filp; + struct address_space *mapping; + pgoff_t start_index; /* index of the first page with actual data */ + char *buf; /* buffer with actual data */ + int len; /* length of the actual data */ + int num_pages; /* maximal data chunk (in pages) that can be + passed per transmission */ +}; + +static int init_readpages_ctx(struct v9fs_readpages_ctx *ctx, + struct file *filp, + struct address_space *mapping, + int num_pages) +{ + memset(ctx, 0, sizeof(*ctx)); + ctx->buf = kmalloc(num_pages << PAGE_SHIFT, GFP_USER); + if (!ctx->buf) + return -ENOMEM; + ctx->filp = filp; + ctx->mapping = mapping; + ctx->num_pages = num_pages; + return 0; +} + +static void done_readpages_ctx(struct v9fs_readpages_ctx *ctx) +{ + kfree(ctx->buf); +} + +static int receive_buffer(struct file *filp, + char *buf, + off_t offset, /* offset in the file */ + int len, + int *err) +{ + struct kvec kvec; + struct iov_iter iter; + + kvec.iov_base = buf; + kvec.iov_len = len; + iov_iter_kvec(&iter, READ | ITER_KVEC, &kvec, 1, len); + + return p9_client_read(filp->private_data, offset, &iter, err); +} + +static int fast_filler(struct v9fs_readpages_ctx *ctx, struct page *page) +{ + int err; + int ret = 0; + char *kdata; + int to_page; + off_t off_in_buf; + struct inode *inode = page->mapping->host; + + BUG_ON(!PageLocked(page)); + /* + * first, validate the buffer + */ + if (page->index < ctx->start_index || + ctx->start_index + ctx->num_pages < page->index) { + /* + * No actual data in the buffer, + * so refresh it + */ + ret = receive_buffer(ctx->filp, + ctx->buf, + page_offset(page), + ctx->num_pages << PAGE_SHIFT, + &err); + if (err) { + printk("failed to receive buffer off=%llu (%d)\n", + (unsigned long long)page_offset(page), + err); + ret = err; + goto done; + } + ctx->start_index = page->index; + ctx->len = ret; + ret = 0; + } + /* + * fill the page with buffer's data + */ + off_in_buf = (page->index - ctx->start_index) << PAGE_SHIFT; + if (off_in_buf >= ctx->len) { + /* + * No actual data to fill the page with + */ + ret = -1; + goto done; + } + to_page = ctx->len - off_in_buf;
+ if (to_page >= PAGE_SIZE) + to_page = PAGE_SIZE; + + kdata = kmap_atomic(page); + memcpy(kdata, ctx->buf + off_in_buf, to_page); + memset(kdata + to_page, 0, PAGE_SIZE - to_page); + kunmap_atomic(kdata); + + flush_dcache_page(page); + SetPageUptodate(page); + v9fs_readpage_to_fscache(inode, page); + done: + unlock_page(page); + return ret; +} + +/** + * Try to read pages by groups. For every such group we issue only one + * read request to the server. + * @num_pages: maximal chunk of data (in pages) that can be passed per + * such request + */ +static int v9fs_readpages_tryfast(struct file *filp, + struct address_space *mapping, + struct list_head *pages, + int num_pages) +{ + int ret; + struct v9fs_readpages_ctx ctx; + + ret = init_readpages_ctx(&ctx, filp, mapping, num_pages); + if (ret) + /* + * Cannot allocate resources for the fast path, + * so fall back to the slow path + */ + return read_cache_pages(mapping, pages, + (void *)v9fs_vfs_readpage, filp); + + else + ret = read_cache_pages(mapping, pages, + (void *)fast_filler, &ctx); + done_readpages_ctx(&ctx); + return ret; +} + /** * v9fs_vfs_readpages - read a set of pages from 9P * @@ -114,6 +257,7 @@ static int v9fs_vfs_readpages(struct file *filp, struct address_space *mapping, { int ret = 0; struct inode *inode; + struct v9fs_flush_set *fset; inode = mapping->host; p9_debug(P9_DEBUG_VFS, "inode: %p file: %p\n", inode, filp); @@ -122,7 +266,17 @@ static int v9fs_vfs_readpages(struct file *filp, struct address_space *mapping, if (ret == 0) return ret; - ret = read_cache_pages(mapping, pages, (void *)v9fs_vfs_readpage, filp); + fset = v9fs_inode2v9ses(mapping->host)->flush; + if (!fset) + /* + * Fall back to the slow path + */ + ret = read_cache_pages(mapping, pages, + (void *)v9fs_vfs_readpage, filp); + else + ret = v9fs_readpages_tryfast(filp, mapping, + pages, fset->num_pages); + p9_debug(P9_DEBUG_VFS, " = %d\n", ret); return ret; } -- 2.7.4 ^ permalink raw reply related [flat|nested] 15+ messages in thread
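The arithmetic in fast_filler() may be easier to see in isolation: one large (possibly short) read fills ctx->buf, and each page is then served as a slice of that buffer, with the tail of a partially covered page zeroed. Below is a minimal userspace model of that slicing; all names are invented and PAGE_SHIFT of 12 is assumed.

#include <stdio.h>
#include <string.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Fill one page from a buffer that starts at file page 'start_index'
 * and holds 'buf_len' valid bytes; returns -1 when the page lies
 * outside the valid data, mirroring fast_filler()'s failure path. */
static int fill_from_buf(char *page, unsigned long index,
			 const char *buf, unsigned long start_index,
			 size_t buf_len)
{
	size_t off = (index - start_index) << PAGE_SHIFT;
	size_t n;

	if (index < start_index || off >= buf_len)
		return -1;
	n = buf_len - off;
	if (n > PAGE_SIZE)
		n = PAGE_SIZE;
	memcpy(page, buf + off, n);
	memset(page + n, 0, PAGE_SIZE - n);	/* zero the short tail */
	return 0;
}

int main(void)
{
	char buf[3 * PAGE_SIZE];
	char page[PAGE_SIZE];
	size_t got = 2 * PAGE_SIZE + 100;	/* a short read */

	memset(buf, 'x', sizeof(buf));
	/* page 12 is fully valid, page 14 gets 100 bytes plus a zeroed
	 * tail, page 15 is past the data: prints "0 0 -1" */
	printf("%d %d %d\n",
	       fill_from_buf(page, 12, buf, 12, got),
	       fill_from_buf(page, 14, buf, 12, got),
	       fill_from_buf(page, 15, buf, 12, got));
	return 0;
}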
* [PATCH 3/7] 9p: v9fs fix ifnullfree.cocci warnings 2016-12-12 18:15 ` [PATCH 1/7] 9p: v9fs add writepages edward.shishkin 2016-12-12 18:15 ` [PATCH 2/7] 9p: v9fs new readpages edward.shishkin @ 2016-12-12 18:15 ` edward.shishkin 2016-12-12 18:15 ` [PATCH 4/7] 9p: v9fs fix odd_ptr_err.cocci warnings edward.shishkin ` (3 subsequent siblings) 5 siblings, 0 replies; 15+ messages in thread From: edward.shishkin @ 2016-12-12 18:15 UTC (permalink / raw) To: Eric Van Hensbergen, V9FS Developers Mailing List, Linux Filesystem Development List Cc: QEMU Developers Mailing List, ZhangWei, Claudio Fontana, Greg Kurz, Alexander Graf, kbuild test robot, Eduard Shishkin From: kbuild test robot <fengguang.wu@intel.com> fs/9p/v9fs.c:318:2-7: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values. fs/9p/v9fs.c:320:2-7: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values. NULL check before some freeing functions is not needed. Based on checkpatch warning "kfree(NULL) is safe this check is probably not required" and kfreeaddr.cocci by Julia Lawall. Generated by: scripts/coccinelle/free/ifnullfree.cocci Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Eduard Shishkin <eduard.shishkin@huawei.com> --- fs/9p/v9fs.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c index 3b49daf..e2f84a6 100644 --- a/fs/9p/v9fs.c +++ b/fs/9p/v9fs.c @@ -314,10 +314,8 @@ void put_flush_set(struct v9fs_flush_set *fset) { if (!fset) return; - if (fset->pages) - kfree(fset->pages); - if (fset->buf) - kfree(fset->buf); + kfree(fset->pages); + kfree(fset->buf); kfree(fset); } -- 2.7.4 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 4/7] 9p: v9fs fix odd_ptr_err.cocci warnings 2016-12-12 18:15 ` [PATCH 1/7] 9p: v9fs add writepages edward.shishkin 2016-12-12 18:15 ` [PATCH 2/7] 9p: v9fs new readpages edward.shishkin 2016-12-12 18:15 ` [PATCH 3/7] 9p: v9fs fix ifnullfree.cocci warnings edward.shishkin @ 2016-12-12 18:15 ` edward.shishkin 2016-12-12 18:15 ` [PATCH 5/7] 9p: v9fs fix semicolon.cocci warnings edward.shishkin ` (2 subsequent siblings) 5 siblings, 0 replies; 15+ messages in thread From: edward.shishkin @ 2016-12-12 18:15 UTC (permalink / raw) To: Eric Van Hensbergen, V9FS Developers Mailing List, Linux Filesystem Development List Cc: QEMU Developers Mailing List, ZhangWei, Claudio Fontana, Greg Kurz, Alexander Graf, kbuild test robot, Eduard Shishkin From: kbuild test robot <fengguang.wu@intel.com> fs/9p/vfs_super.c:145:6-12: inconsistent IS_ERR and PTR_ERR on line 146. PTR_ERR should access the value just tested by IS_ERR Semantic patch information: There can be false positives in the patch case, where it is the call to IS_ERR that is wrong. Generated by: scripts/coccinelle/tests/odd_ptr_err.cocci Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Eduard Shishkin <eduard.shishkin@huawei.com> --- fs/9p/vfs_super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c index c1f9af1..24aacec 100644 --- a/fs/9p/vfs_super.c +++ b/fs/9p/vfs_super.c @@ -143,7 +143,7 @@ static struct dentry *v9fs_mount(struct file_system_type *fs_type, int flags, if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) { retval = alloc_init_flush_set(v9ses); if (IS_ERR(v9ses->flush)) { - retval = PTR_ERR(fid); + retval = PTR_ERR(v9ses->flush); goto release_sb; } sb->s_d_op = &v9fs_cached_dentry_operations; -- 2.7.4 ^ permalink raw reply related [flat|nested] 15+ messages in thread
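A note on this fix: it makes the IS_ERR and PTR_ERR arguments consistent, but alloc_init_flush_set() returns an int (0 or -ENOMEM) and stores either NULL or a valid pointer in v9ses->flush, so IS_ERR() can never be true at this call site. A simpler form of the check in v9fs_mount(), testing the return value directly, might look like the sketch below; this is an editorial observation, not a patch from the series.

	if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) {
		retval = alloc_init_flush_set(v9ses);
		if (retval)
			goto release_sb;
		sb->s_d_op = &v9fs_cached_dentry_operations;
	} else
		sb->s_d_op = &v9fs_dentry_operations;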
* [PATCH 5/7] 9p: v9fs fix semicolon.cocci warnings 2016-12-12 18:15 ` [PATCH 1/7] 9p: v9fs add writepages edward.shishkin ` (2 preceding siblings ...) 2016-12-12 18:15 ` [PATCH 4/7] 9p: v9fs fix odd_ptr_err.cocci warnings edward.shishkin @ 2016-12-12 18:15 ` edward.shishkin 2016-12-12 18:15 ` [PATCH 6/7] 9p: v9fs fix readpages writepages contexts allocation edward.shishkin 2016-12-12 18:15 ` [PATCH 7/7] 9p: v9fs fix calculation of max number of merged pages edward.shishkin 5 siblings, 0 replies; 15+ messages in thread From: edward.shishkin @ 2016-12-12 18:15 UTC (permalink / raw) To: Eric Van Hensbergen, V9FS Developers Mailing List, Linux Filesystem Development List Cc: QEMU Developers Mailing List, ZhangWei, Claudio Fontana, Greg Kurz, Alexander Graf, kbuild test robot, Eduard Shishkin From: kbuild test robot <fengguang.wu@intel.com> fs/9p/vfs_addr.c:425:3-4: Unneeded semicolon fs/9p/vfs_addr.c:458:2-3: Unneeded semicolon Remove unneeded semicolon. Generated by: scripts/coccinelle/misc/semicolon.cocci Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Eduard Shishkin <eduard.shishkin@huawei.com> --- fs/9p/vfs_addr.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c index 4ad248e..afd8a13 100644 --- a/fs/9p/vfs_addr.c +++ b/fs/9p/vfs_addr.c @@ -576,7 +576,7 @@ static int flush_pages(struct page **pages, int nr_pages, #endif goto iter_send; } - }; + } iter_send: if (iter_len == 0) /* @@ -609,7 +609,7 @@ static int flush_pages(struct page **pages, int nr_pages, next_iter: if (ctx->done) return ret; - }; + } return 0; } -- 2.7.4 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 6/7] 9p: v9fs fix readpages writepages contexts allocation 2016-12-12 18:15 ` [PATCH 1/7] 9p: v9fs add writepages edward.shishkin ` (3 preceding siblings ...) 2016-12-12 18:15 ` [PATCH 5/7] 9p: v9fs fix semicolon.cocci warnings edward.shishkin @ 2016-12-12 18:15 ` edward.shishkin 2016-12-12 18:15 ` [PATCH 7/7] 9p: v9fs fix calculation of max number of merged pages edward.shishkin 5 siblings, 0 replies; 15+ messages in thread From: edward.shishkin @ 2016-12-12 18:15 UTC (permalink / raw) To: Eric Van Hensbergen, V9FS Developers Mailing List, Linux Filesystem Development List Cc: QEMU Developers Mailing List, ZhangWei, Claudio Fontana, Greg Kurz, Alexander Graf, Eduard Shishkin From: Eduard Shishkin <eduard.shishkin@huawei.com> Use GFP_KERNEL instead of GFP_USER when allocating buffers for readpages/writepages contexts. Signed-off-by: Eduard Shishkin <eduard.shishkin@huawei.com> --- fs/9p/v9fs.c | 2 +- fs/9p/vfs_addr.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c index e2f84a6..58bff9e 100644 --- a/fs/9p/v9fs.c +++ b/fs/9p/v9fs.c @@ -340,7 +340,7 @@ int alloc_init_flush_set(struct v9fs_session_info *v9ses) fset->pages = kzalloc(num_pages * sizeof(*fset->pages), GFP_KERNEL); if (!fset->pages) goto error; - fset->buf = kzalloc(num_pages << PAGE_SHIFT, GFP_USER); + fset->buf = kzalloc(num_pages << PAGE_SHIFT, GFP_KERNEL); if (!fset->buf) goto error; spin_lock_init(&(fset->lock)); diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c index afd8a13..4b2b1d6 100644 --- a/fs/9p/vfs_addr.c +++ b/fs/9p/vfs_addr.c @@ -119,7 +119,7 @@ static int init_readpages_ctx(struct v9fs_readpages_ctx *ctx, int num_pages) { memset(ctx, 0, sizeof(*ctx)); - ctx->buf = kmalloc(num_pages << PAGE_SHIFT, GFP_USER); + ctx->buf = kmalloc(num_pages << PAGE_SHIFT, GFP_KERNEL); if (!ctx->buf) return -ENOMEM; ctx->filp = filp; -- 2.7.4 ^ permalink raw reply related [flat|nested] 15+ messages in thread
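For context on the flag change: GFP_USER is intended for allocations made on behalf of user space, while these buffers are purely kernel-internal, so GFP_KERNEL is the natural choice. One could arguably go further on the read path, where a GFP_KERNEL allocation may enter filesystem reclaim; GFP_NOFS would rule that out. This is only a hedged suggestion, not part of the series:

--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
-	ctx->buf = kmalloc(num_pages << PAGE_SHIFT, GFP_KERNEL);
+	ctx->buf = kmalloc(num_pages << PAGE_SHIFT, GFP_NOFS);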
* [PATCH 7/7] 9p: v9fs fix calculation of max number of merged pages 2016-12-12 18:15 ` [PATCH 1/7] 9p: v9fs add writepages edward.shishkin ` (4 preceding siblings ...) 2016-12-12 18:15 ` [PATCH 6/7] 9p: v9fs fix readpages writepages contexts allocation edward.shishkin @ 2016-12-12 18:15 ` edward.shishkin 5 siblings, 0 replies; 15+ messages in thread From: edward.shishkin @ 2016-12-12 18:15 UTC (permalink / raw) To: Eric Van Hensbergen, V9FS Developers Mailing List, Linux Filesystem Development List Cc: QEMU Developers Mailing List, ZhangWei, Claudio Fontana, Greg Kurz, Alexander Graf, Eduard Shishkin From: Eduard Shishkin <eduard.shishkin@huawei.com> Don't merge too many pages when composing a 9p message, because: . it doesn't lead to an essential performance improvement; . it would let user space make the kernel allocate large amounts of memory. We use a limit of 256K (for the total size of all pages merged per message), as larger values don't provide any visible speedup. Signed-off-by: Eduard Shishkin <eduard.shishkin@huawei.com> --- fs/9p/v9fs.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c index 58bff9e..50a4034 100644 --- a/fs/9p/v9fs.c +++ b/fs/9p/v9fs.c @@ -319,6 +319,8 @@ void put_flush_set(struct v9fs_flush_set *fset) kfree(fset); } +#define MAX_FLUSH_DATA_SIZE (262144) + /** * Allocate and initialize flush set * Pre-conditions: valid msize is set @@ -333,6 +335,11 @@ int alloc_init_flush_set(struct v9fs_session_info *v9ses) if (num_pages < 2) /* speedup impossible */ return 0; + if (num_pages > (MAX_FLUSH_DATA_SIZE >> PAGE_SHIFT)) + /* + * no performance gain with larger values + */ + num_pages = MAX_FLUSH_DATA_SIZE >> PAGE_SHIFT; fset = kzalloc(sizeof(*fset), GFP_KERNEL); if (!fset) goto error; -- 2.7.4 ^ permalink raw reply related [flat|nested] 15+ messages in thread
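To make the effect of the cap concrete, here is the sizing arithmetic as a standalone userspace program; 4K pages are assumed, and the constants match the patch:

#include <stdio.h>

#define PAGE_SHIFT 12
#define MAX_FLUSH_DATA_SIZE (262144)

int main(void)
{
	unsigned int msize[] = { 8192, 131096, 1048576 };
	int i;

	for (i = 0; i < 3; i++) {
		unsigned int num_pages = msize[i] >> PAGE_SHIFT;

		if (num_pages > (MAX_FLUSH_DATA_SIZE >> PAGE_SHIFT))
			num_pages = MAX_FLUSH_DATA_SIZE >> PAGE_SHIFT;
		printf("msize %7u -> %3u pages (%u KB buffer)\n",
		       msize[i], num_pages,
		       (num_pages << PAGE_SHIFT) / 1024);
	}
	return 0;
}

So a typical msize of 131096 yields a 32-page (128K) buffer, well under the cap, while a 1M msize is clamped to 64 pages (256K); an msize below two pages disables the fast path entirely, per the "speedup impossible" check in alloc_init_flush_set().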
end of thread, other threads: [~2016-12-12 18:18 UTC | newest]

Thread overview: 15+ messages
2016-10-10 17:24 [RFC][PATCH 0/2] 9p: v9fs read and write speedup Edward Shishkin
2016-10-10 17:24 ` [PATCH 1/2] 9p: v9fs add writepages Edward Shishkin
2016-10-10 17:24 ` [PATCH 2/2] 9p: v9fs new readpages Edward Shishkin
2016-10-25 14:13 ` [Qemu-devel] " Alexander Graf
2016-12-09 18:42 ` Edward Shishkin
2016-10-25 14:01 ` [Qemu-devel] [PATCH 1/2] 9p: v9fs add writepages Alexander Graf
2016-12-09 18:43 ` Edward Shishkin
2016-12-12 18:13 ` [RFC][PATCH 0/7] 9p: v9fs read and write speedup - V2 Edward Shishkin
2016-12-12 18:15 ` [PATCH 1/7] 9p: v9fs add writepages edward.shishkin
2016-12-12 18:15 ` [PATCH 2/7] 9p: v9fs new readpages edward.shishkin
2016-12-12 18:15 ` [PATCH 3/7] 9p: v9fs fix ifnullfree.cocci warnings edward.shishkin
2016-12-12 18:15 ` [PATCH 4/7] 9p: v9fs fix odd_ptr_err.cocci warnings edward.shishkin
2016-12-12 18:15 ` [PATCH 5/7] 9p: v9fs fix semicolon.cocci warnings edward.shishkin
2016-12-12 18:15 ` [PATCH 6/7] 9p: v9fs fix readpages writepages contexts allocation edward.shishkin
2016-12-12 18:15 ` [PATCH 7/7] 9p: v9fs fix calculation of max number of merged pages edward.shishkin