linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/13] [RFC] orangefs page cache
@ 2017-05-22  9:58 Martin Brandenburg
  2017-05-22  9:58 ` [PATCH 01/13] orangefs: move orangefs_address_operations to file.c Martin Brandenburg
                   ` (12 more replies)
  0 siblings, 13 replies; 20+ messages in thread
From: Martin Brandenburg @ 2017-05-22  9:58 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

OrangeFS has not used the page cache in the past.  This is a work in
progress patch to support it.  It does not yet make any attempt to do
writeback in a reasonable timeframe considering that OrangeFS is a
distributed filesystem.

What's done:

Inodes are written through write_back: getattrs and setattrs are not
necessarily sent to the server.

Pages are read from and written to the cache on readpage and writepage
and direct_IO works.  Some consolidation of code handling the different
read cases allow this diff to decrease code size.

Future work:

The old getattr cache will need to be changed significantly.  It
specifies a timeout which should be used by the new code.  Though the
old getattr cache gave quite a performance boost, it's cached data would
be destroyed on drop_inode.

Really the only way to do a getattr is during orangefs_iget.  Pages are
only read into the cache once.  Using this code with multiple clients
will not work at all.  Ultimately we will need to do writeback much
faster than at present and know when to fill requests from the cache and
when to require a server request.

The good news is that operating under such a delay found several race
conditions in our code which had not been detected because we always did
a server request in the past.

Does this look like a reasonable start?

Martin Brandenburg (13):
  orangefs: move orangefs_address_operations to file.c
  orangefs: remove orangefs_readpages
  orangefs: make orangefs_inode_read static
  orangefs: only set a_ops for regular files
  orangefs: BUG_ON if i_mode invalid
  orangefs: remove mapping_nrpages macro
  orangefs: set up and use backing_dev_info
  orangefs: initialize new inode size to zero
  orangefs: inodes linger in cache
  orangefs: implement direct_IO for the read case
  orangefs: lock inode during fsync
  orangefs: call generic_file_read_iter
  orangefs: implement write through the page cache

 fs/orangefs/file.c            | 232 ++++++++++++++++++++++++--------------
 fs/orangefs/inode.c           | 252 +++++-------------------------------------
 fs/orangefs/namei.c           |  13 ++-
 fs/orangefs/orangefs-kernel.h |   8 +-
 fs/orangefs/orangefs-utils.c  |   7 ++
 fs/orangefs/super.c           |  54 +++++++--
 fs/orangefs/symlink.c         |   1 -
 7 files changed, 242 insertions(+), 325 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 01/13] orangefs: move orangefs_address_operations to file.c
  2017-05-22  9:58 [PATCH 00/13] [RFC] orangefs page cache Martin Brandenburg
@ 2017-05-22  9:58 ` Martin Brandenburg
  2017-05-22  9:58 ` [PATCH 02/13] orangefs: remove orangefs_readpages Martin Brandenburg
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Martin Brandenburg @ 2017-05-22  9:58 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 fs/orangefs/file.c  | 132 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/orangefs/inode.c | 132 ----------------------------------------------------
 2 files changed, 132 insertions(+), 132 deletions(-)

diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index b421df1..9e409df 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -750,3 +750,135 @@ const struct file_operations orangefs_file_operations = {
 	.release	= orangefs_file_release,
 	.fsync		= orangefs_fsync,
 };
+
+static int read_one_page(struct page *page)
+{
+	int ret;
+	int max_block;
+	ssize_t bytes_read = 0;
+	struct inode *inode = page->mapping->host;
+	const __u32 blocksize = PAGE_SIZE;	/* inode->i_blksize */
+	const __u32 blockbits = PAGE_SHIFT;	/* inode->i_blkbits */
+	struct iov_iter to;
+	struct bio_vec bv = {.bv_page = page, .bv_len = PAGE_SIZE};
+
+	iov_iter_bvec(&to, ITER_BVEC | READ, &bv, 1, PAGE_SIZE);
+
+	gossip_debug(GOSSIP_INODE_DEBUG,
+		    "orangefs_readpage called with page %p\n",
+		     page);
+
+	max_block = ((inode->i_size / blocksize) + 1);
+
+	if (page->index < max_block) {
+		loff_t blockptr_offset = (((loff_t) page->index) << blockbits);
+
+		bytes_read = orangefs_inode_read(inode,
+						 &to,
+						 &blockptr_offset,
+						 inode->i_size);
+	}
+	/* this will only zero remaining unread portions of the page data */
+	iov_iter_zero(~0U, &to);
+	/* takes care of potential aliasing */
+	flush_dcache_page(page);
+	if (bytes_read < 0) {
+		ret = bytes_read;
+		SetPageError(page);
+	} else {
+		SetPageUptodate(page);
+		if (PageError(page))
+			ClearPageError(page);
+		ret = 0;
+	}
+	/* unlock the page after the ->readpage() routine completes */
+	unlock_page(page);
+	return ret;
+}
+
+static int orangefs_readpage(struct file *file, struct page *page)
+{
+	return read_one_page(page);
+}
+
+static int orangefs_readpages(struct file *file,
+			   struct address_space *mapping,
+			   struct list_head *pages,
+			   unsigned nr_pages)
+{
+	int page_idx;
+	int ret;
+
+	gossip_debug(GOSSIP_INODE_DEBUG, "orangefs_readpages called\n");
+
+	for (page_idx = 0; page_idx < nr_pages; page_idx++) {
+		struct page *page;
+
+		page = list_entry(pages->prev, struct page, lru);
+		list_del(&page->lru);
+		if (!add_to_page_cache(page,
+				       mapping,
+				       page->index,
+				       readahead_gfp_mask(mapping))) {
+			ret = read_one_page(page);
+			gossip_debug(GOSSIP_INODE_DEBUG,
+				"failure adding page to cache, read_one_page returned: %d\n",
+				ret);
+	      } else {
+			put_page(page);
+	      }
+	}
+	BUG_ON(!list_empty(pages));
+	return 0;
+}
+
+static void orangefs_invalidatepage(struct page *page,
+				 unsigned int offset,
+				 unsigned int length)
+{
+	gossip_debug(GOSSIP_INODE_DEBUG,
+		     "orangefs_invalidatepage called on page %p "
+		     "(offset is %u)\n",
+		     page,
+		     offset);
+
+	ClearPageUptodate(page);
+	ClearPageMappedToDisk(page);
+	return;
+
+}
+
+static int orangefs_releasepage(struct page *page, gfp_t foo)
+{
+	gossip_debug(GOSSIP_INODE_DEBUG,
+		     "orangefs_releasepage called on page %p\n",
+		     page);
+	return 0;
+}
+
+/*
+ * Having a direct_IO entry point in the address_space_operations
+ * struct causes the kernel to allows us to use O_DIRECT on
+ * open. Nothing will ever call this thing, but in the future we
+ * will need to be able to use O_DIRECT on open in order to support
+ * AIO. Modeled after NFS, they do this too.
+ */
+
+static ssize_t orangefs_direct_IO(struct kiocb *iocb,
+				  struct iov_iter *iter)
+{
+	gossip_debug(GOSSIP_INODE_DEBUG,
+		     "orangefs_direct_IO: %pD\n",
+		     iocb->ki_filp);
+
+	return -EINVAL;
+}
+
+/** ORANGEFS2 implementation of address space operations */
+const struct address_space_operations orangefs_address_operations = {
+	.readpage = orangefs_readpage,
+	.readpages = orangefs_readpages,
+	.invalidatepage = orangefs_invalidatepage,
+	.releasepage = orangefs_releasepage,
+	.direct_IO = orangefs_direct_IO,
+};
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 9428ea0..a277aeb8 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -13,138 +13,6 @@
 #include "orangefs-kernel.h"
 #include "orangefs-bufmap.h"
 
-static int read_one_page(struct page *page)
-{
-	int ret;
-	int max_block;
-	ssize_t bytes_read = 0;
-	struct inode *inode = page->mapping->host;
-	const __u32 blocksize = PAGE_SIZE;	/* inode->i_blksize */
-	const __u32 blockbits = PAGE_SHIFT;	/* inode->i_blkbits */
-	struct iov_iter to;
-	struct bio_vec bv = {.bv_page = page, .bv_len = PAGE_SIZE};
-
-	iov_iter_bvec(&to, ITER_BVEC | READ, &bv, 1, PAGE_SIZE);
-
-	gossip_debug(GOSSIP_INODE_DEBUG,
-		    "orangefs_readpage called with page %p\n",
-		     page);
-
-	max_block = ((inode->i_size / blocksize) + 1);
-
-	if (page->index < max_block) {
-		loff_t blockptr_offset = (((loff_t) page->index) << blockbits);
-
-		bytes_read = orangefs_inode_read(inode,
-						 &to,
-						 &blockptr_offset,
-						 inode->i_size);
-	}
-	/* this will only zero remaining unread portions of the page data */
-	iov_iter_zero(~0U, &to);
-	/* takes care of potential aliasing */
-	flush_dcache_page(page);
-	if (bytes_read < 0) {
-		ret = bytes_read;
-		SetPageError(page);
-	} else {
-		SetPageUptodate(page);
-		if (PageError(page))
-			ClearPageError(page);
-		ret = 0;
-	}
-	/* unlock the page after the ->readpage() routine completes */
-	unlock_page(page);
-	return ret;
-}
-
-static int orangefs_readpage(struct file *file, struct page *page)
-{
-	return read_one_page(page);
-}
-
-static int orangefs_readpages(struct file *file,
-			   struct address_space *mapping,
-			   struct list_head *pages,
-			   unsigned nr_pages)
-{
-	int page_idx;
-	int ret;
-
-	gossip_debug(GOSSIP_INODE_DEBUG, "orangefs_readpages called\n");
-
-	for (page_idx = 0; page_idx < nr_pages; page_idx++) {
-		struct page *page;
-
-		page = list_entry(pages->prev, struct page, lru);
-		list_del(&page->lru);
-		if (!add_to_page_cache(page,
-				       mapping,
-				       page->index,
-				       readahead_gfp_mask(mapping))) {
-			ret = read_one_page(page);
-			gossip_debug(GOSSIP_INODE_DEBUG,
-				"failure adding page to cache, read_one_page returned: %d\n",
-				ret);
-	      } else {
-			put_page(page);
-	      }
-	}
-	BUG_ON(!list_empty(pages));
-	return 0;
-}
-
-static void orangefs_invalidatepage(struct page *page,
-				 unsigned int offset,
-				 unsigned int length)
-{
-	gossip_debug(GOSSIP_INODE_DEBUG,
-		     "orangefs_invalidatepage called on page %p "
-		     "(offset is %u)\n",
-		     page,
-		     offset);
-
-	ClearPageUptodate(page);
-	ClearPageMappedToDisk(page);
-	return;
-
-}
-
-static int orangefs_releasepage(struct page *page, gfp_t foo)
-{
-	gossip_debug(GOSSIP_INODE_DEBUG,
-		     "orangefs_releasepage called on page %p\n",
-		     page);
-	return 0;
-}
-
-/*
- * Having a direct_IO entry point in the address_space_operations
- * struct causes the kernel to allows us to use O_DIRECT on
- * open. Nothing will ever call this thing, but in the future we
- * will need to be able to use O_DIRECT on open in order to support
- * AIO. Modeled after NFS, they do this too.
- */
-
-static ssize_t orangefs_direct_IO(struct kiocb *iocb,
-				  struct iov_iter *iter)
-{
-	gossip_debug(GOSSIP_INODE_DEBUG,
-		     "orangefs_direct_IO: %pD\n",
-		     iocb->ki_filp);
-
-	return -EINVAL;
-}
-
-/** ORANGEFS2 implementation of address space operations */
-const struct address_space_operations orangefs_address_operations = {
-	.readpage = orangefs_readpage,
-	.readpages = orangefs_readpages,
-	.invalidatepage = orangefs_invalidatepage,
-	.releasepage = orangefs_releasepage,
-	.direct_IO = orangefs_direct_IO,
-};
-
 static int orangefs_setattr_size(struct inode *inode, struct iattr *iattr)
 {
 	struct orangefs_inode_s *orangefs_inode = ORANGEFS_I(inode);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 02/13] orangefs: remove orangefs_readpages
  2017-05-22  9:58 [PATCH 00/13] [RFC] orangefs page cache Martin Brandenburg
  2017-05-22  9:58 ` [PATCH 01/13] orangefs: move orangefs_address_operations to file.c Martin Brandenburg
@ 2017-05-22  9:58 ` Martin Brandenburg
  2017-05-22  9:58 ` [PATCH 03/13] orangefs: make orangefs_inode_read static Martin Brandenburg
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Martin Brandenburg @ 2017-05-22  9:58 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

It's a copy of the loop which runs if it's not implemented from
read_pages from mm/readahead.c.  Best to just use the generic code.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 fs/orangefs/file.c | 39 +--------------------------------------
 1 file changed, 1 insertion(+), 38 deletions(-)

diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index 9e409df..5f9577b 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -751,7 +751,7 @@ const struct file_operations orangefs_file_operations = {
 	.fsync		= orangefs_fsync,
 };
 
-static int read_one_page(struct page *page)
+static int orangefs_readpage(struct file *file, struct page *page)
 {
 	int ret;
 	int max_block;
@@ -796,42 +796,6 @@ static int read_one_page(struct page *page)
 	return ret;
 }
 
-static int orangefs_readpage(struct file *file, struct page *page)
-{
-	return read_one_page(page);
-}
-
-static int orangefs_readpages(struct file *file,
-			   struct address_space *mapping,
-			   struct list_head *pages,
-			   unsigned nr_pages)
-{
-	int page_idx;
-	int ret;
-
-	gossip_debug(GOSSIP_INODE_DEBUG, "orangefs_readpages called\n");
-
-	for (page_idx = 0; page_idx < nr_pages; page_idx++) {
-		struct page *page;
-
-		page = list_entry(pages->prev, struct page, lru);
-		list_del(&page->lru);
-		if (!add_to_page_cache(page,
-				       mapping,
-				       page->index,
-				       readahead_gfp_mask(mapping))) {
-			ret = read_one_page(page);
-			gossip_debug(GOSSIP_INODE_DEBUG,
-				"failure adding page to cache, read_one_page returned: %d\n",
-				ret);
-	      } else {
-			put_page(page);
-	      }
-	}
-	BUG_ON(!list_empty(pages));
-	return 0;
-}
-
 static void orangefs_invalidatepage(struct page *page,
 				 unsigned int offset,
 				 unsigned int length)
@@ -877,7 +841,6 @@ static ssize_t orangefs_direct_IO(struct kiocb *iocb,
 /** ORANGEFS2 implementation of address space operations */
 const struct address_space_operations orangefs_address_operations = {
 	.readpage = orangefs_readpage,
-	.readpages = orangefs_readpages,
 	.invalidatepage = orangefs_invalidatepage,
 	.releasepage = orangefs_releasepage,
 	.direct_IO = orangefs_direct_IO,
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 03/13] orangefs: make orangefs_inode_read static
  2017-05-22  9:58 [PATCH 00/13] [RFC] orangefs page cache Martin Brandenburg
  2017-05-22  9:58 ` [PATCH 01/13] orangefs: move orangefs_address_operations to file.c Martin Brandenburg
  2017-05-22  9:58 ` [PATCH 02/13] orangefs: remove orangefs_readpages Martin Brandenburg
@ 2017-05-22  9:58 ` Martin Brandenburg
  2017-05-22  9:58 ` [PATCH 04/13] orangefs: only set a_ops for regular files Martin Brandenburg
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Martin Brandenburg @ 2017-05-22  9:58 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 fs/orangefs/file.c            | 6 ++----
 fs/orangefs/orangefs-kernel.h | 5 -----
 2 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index 5f9577b..9e0d334 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -402,10 +402,8 @@ static ssize_t do_readv_writev(enum ORANGEFS_io_type type, struct file *file,
  * Read data from a specified offset in a file (referenced by inode).
  * Data may be placed either in a user or kernel buffer.
  */
-ssize_t orangefs_inode_read(struct inode *inode,
-			    struct iov_iter *iter,
-			    loff_t *offset,
-			    loff_t readahead_size)
+static ssize_t orangefs_inode_read(struct inode *inode,
+    struct iov_iter *iter, loff_t *offset, loff_t readahead_size)
 {
 	struct orangefs_inode_s *orangefs_inode = ORANGEFS_I(inode);
 	size_t count = iov_iter_count(iter);
diff --git a/fs/orangefs/orangefs-kernel.h b/fs/orangefs/orangefs-kernel.h
index ea0ce50..9028501 100644
--- a/fs/orangefs/orangefs-kernel.h
+++ b/fs/orangefs/orangefs-kernel.h
@@ -463,11 +463,6 @@ ssize_t orangefs_listxattr(struct dentry *dentry, char *buffer, size_t size);
 struct inode *orangefs_iget(struct super_block *sb,
 			 struct orangefs_object_kref *ref);
 
-ssize_t orangefs_inode_read(struct inode *inode,
-			    struct iov_iter *iter,
-			    loff_t *offset,
-			    loff_t readahead_size);
-
 /*
  * defined in devorangefs-req.c
  */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 04/13] orangefs: only set a_ops for regular files
  2017-05-22  9:58 [PATCH 00/13] [RFC] orangefs page cache Martin Brandenburg
                   ` (2 preceding siblings ...)
  2017-05-22  9:58 ` [PATCH 03/13] orangefs: make orangefs_inode_read static Martin Brandenburg
@ 2017-05-22  9:58 ` Martin Brandenburg
  2017-05-22  9:58 ` [PATCH 05/13] orangefs: BUG_ON if i_mode invalid Martin Brandenburg
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Martin Brandenburg @ 2017-05-22  9:58 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 fs/orangefs/inode.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index a277aeb8..a9681b0 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -169,12 +169,11 @@ const struct inode_operations orangefs_file_inode_operations = {
 
 static int orangefs_init_iops(struct inode *inode)
 {
-	inode->i_mapping->a_ops = &orangefs_address_operations;
-
 	switch (inode->i_mode & S_IFMT) {
 	case S_IFREG:
 		inode->i_op = &orangefs_file_inode_operations;
 		inode->i_fop = &orangefs_file_operations;
+		inode->i_data.a_ops = &orangefs_address_operations;
 		inode->i_blkbits = PAGE_SHIFT;
 		break;
 	case S_IFLNK:
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 05/13] orangefs: BUG_ON if i_mode invalid
  2017-05-22  9:58 [PATCH 00/13] [RFC] orangefs page cache Martin Brandenburg
                   ` (3 preceding siblings ...)
  2017-05-22  9:58 ` [PATCH 04/13] orangefs: only set a_ops for regular files Martin Brandenburg
@ 2017-05-22  9:58 ` Martin Brandenburg
  2017-05-22  9:58 ` [PATCH 06/13] orangefs: remove mapping_nrpages macro Martin Brandenburg
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Martin Brandenburg @ 2017-05-22  9:58 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

This can't happen since it would have been caught (and a graceful
error returned) earlier.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 fs/orangefs/inode.c          | 14 ++++----------
 fs/orangefs/orangefs-utils.c |  4 ++++
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index a9681b0..9912104 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -175,22 +175,16 @@ static int orangefs_init_iops(struct inode *inode)
 		inode->i_fop = &orangefs_file_operations;
 		inode->i_data.a_ops = &orangefs_address_operations;
 		inode->i_blkbits = PAGE_SHIFT;
-		break;
+		return 0;
 	case S_IFLNK:
 		inode->i_op = &orangefs_symlink_inode_operations;
-		break;
+		return 0;
 	case S_IFDIR:
 		inode->i_op = &orangefs_dir_inode_operations;
 		inode->i_fop = &orangefs_dir_operations;
-		break;
-	default:
-		gossip_debug(GOSSIP_INODE_DEBUG,
-			     "%s: unsupported mode\n",
-			     __func__);
-		return -EINVAL;
+		return 0;
 	}
-
-	return 0;
+	BUG_ON(1);
 }
 
 /*
diff --git a/fs/orangefs/orangefs-utils.c b/fs/orangefs/orangefs-utils.c
index aab6f18..1fb90bd 100644
--- a/fs/orangefs/orangefs-utils.c
+++ b/fs/orangefs/orangefs-utils.c
@@ -294,6 +294,10 @@ int orangefs_inode_getattr(struct inode *inode, int new, int bypass,
 
 	type = orangefs_inode_type(new_op->
 	    downcall.resp.getattr.attributes.objtype);
+	if (type == -1) {
+		ret = -EIO;
+		goto out;
+	}
 	ret = orangefs_inode_is_stale(inode, new,
 	    &new_op->downcall.resp.getattr.attributes,
 	    new_op->downcall.resp.getattr.link_target);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 06/13] orangefs: remove mapping_nrpages macro
  2017-05-22  9:58 [PATCH 00/13] [RFC] orangefs page cache Martin Brandenburg
                   ` (4 preceding siblings ...)
  2017-05-22  9:58 ` [PATCH 05/13] orangefs: BUG_ON if i_mode invalid Martin Brandenburg
@ 2017-05-22  9:58 ` Martin Brandenburg
  2017-05-22  9:58 ` [PATCH 07/13] orangefs: set up and use backing_dev_info Martin Brandenburg
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Martin Brandenburg @ 2017-05-22  9:58 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 fs/orangefs/file.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index 9e0d334..0de41a3 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -599,8 +599,6 @@ static int orangefs_file_mmap(struct file *file, struct vm_area_struct *vma)
 	return generic_file_readonly_mmap(file, vma);
 }
 
-#define mapping_nrpages(idata) ((idata)->nrpages)
-
 /*
  * Called to notify the module that there are no more references to
  * this file (i.e. no processes have it open).
@@ -622,7 +620,7 @@ static int orangefs_file_release(struct inode *inode, struct file *file)
 	 */
 	if (file_inode(file) &&
 	    file_inode(file)->i_mapping &&
-	    mapping_nrpages(&file_inode(file)->i_data)) {
+	    file_inode(file)->i_mapping->nrpages) {
 		if (orangefs_features & ORANGEFS_FEATURE_READAHEAD) {
 			gossip_debug(GOSSIP_INODE_DEBUG,
 			    "calling flush_racache on %pU\n",
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 07/13] orangefs: set up and use backing_dev_info
  2017-05-22  9:58 [PATCH 00/13] [RFC] orangefs page cache Martin Brandenburg
                   ` (5 preceding siblings ...)
  2017-05-22  9:58 ` [PATCH 06/13] orangefs: remove mapping_nrpages macro Martin Brandenburg
@ 2017-05-22  9:58 ` Martin Brandenburg
  2017-05-22  9:58 ` [PATCH 08/13] orangefs: initialize new inode size to zero Martin Brandenburg
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Martin Brandenburg @ 2017-05-22  9:58 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

This is a heavily modified revert of
70823b9bf3290855a7df895d89bd8209182b52e3.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 fs/orangefs/orangefs-kernel.h |  1 +
 fs/orangefs/super.c           | 31 ++++++++++++++++++++++++-------
 2 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/fs/orangefs/orangefs-kernel.h b/fs/orangefs/orangefs-kernel.h
index 9028501..6dfb5b6 100644
--- a/fs/orangefs/orangefs-kernel.h
+++ b/fs/orangefs/orangefs-kernel.h
@@ -252,6 +252,7 @@ struct orangefs_sb_info_s {
 	int mount_pending;
 	int no_list;
 	struct list_head list;
+	struct backing_dev_info bdi;
 };
 
 /*
diff --git a/fs/orangefs/super.c b/fs/orangefs/super.c
index 5c7c273..4d994f8 100644
--- a/fs/orangefs/super.c
+++ b/fs/orangefs/super.c
@@ -399,15 +399,11 @@ static int orangefs_fill_sb(struct super_block *sb,
 		struct orangefs_fs_mount_response *fs_mount,
 		void *data, int silent)
 {
-	int ret = -EINVAL;
-	struct inode *root = NULL;
-	struct dentry *root_dentry = NULL;
+	int ret;
+	struct inode *root;
+	struct dentry *root_dentry;
 	struct orangefs_object_kref root_object;
 
-	/* alloc and init our private orangefs sb info */
-	sb->s_fs_info = kzalloc(sizeof(struct orangefs_sb_info_s), GFP_KERNEL);
-	if (!ORANGEFS_SB(sb))
-		return -ENOMEM;
 	ORANGEFS_SB(sb)->sb = sb;
 
 	ORANGEFS_SB(sb)->root_khandle = fs_mount->root_khandle;
@@ -430,6 +426,8 @@ static int orangefs_fill_sb(struct super_block *sb,
 	sb->s_blocksize_bits = orangefs_bufmap_shift_query();
 	sb->s_maxbytes = MAX_LFS_FILESIZE;
 
+	sb->s_bdi = &ORANGEFS_SB(sb)->bdi;
+
 	root_object.khandle = ORANGEFS_SB(sb)->root_khandle;
 	root_object.fs_id = ORANGEFS_SB(sb)->fs_id;
 	gossip_debug(GOSSIP_SUPER_DEBUG,
@@ -508,6 +506,23 @@ struct dentry *orangefs_mount(struct file_system_type *fst,
 		goto free_op;
 	}
 
+	/* alloc and init our private orangefs sb info */
+	sb->s_fs_info = kzalloc(sizeof(struct orangefs_sb_info_s), GFP_KERNEL);
+	if (!ORANGEFS_SB(sb)) {
+		d = ERR_PTR(-ENOMEM);
+		goto free_op;
+	}
+
+	ret = bdi_setup_and_register(&ORANGEFS_SB(sb)->bdi, "orangefs");
+	if (ret) {
+		/*
+		 * Ordinarily freed by orangefs_kill_sb but that won't
+		 * happen yet.
+		 */
+		kfree(sb->s_fs_info);
+		goto free_op;
+	}
+
 	ret = orangefs_fill_sb(sb,
 	      &new_op->downcall.resp.fs_mount, data,
 	      flags & MS_SILENT ? 1 : 0);
@@ -582,6 +597,8 @@ void orangefs_kill_sb(struct super_block *sb)
 	/* provided sb cleanup */
 	kill_anon_super(sb);
 
+	bdi_destroy(&ORANGEFS_SB(sb)->bdi);
+
 	/*
 	 * issue the unmount to userspace to tell it to remove the
 	 * dynamic mount info it has for this superblock
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 08/13] orangefs: initialize new inode size to zero
  2017-05-22  9:58 [PATCH 00/13] [RFC] orangefs page cache Martin Brandenburg
                   ` (6 preceding siblings ...)
  2017-05-22  9:58 ` [PATCH 07/13] orangefs: set up and use backing_dev_info Martin Brandenburg
@ 2017-05-22  9:58 ` Martin Brandenburg
  2017-05-22  9:58 ` [PATCH 09/13] orangefs: inodes linger in cache Martin Brandenburg
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Martin Brandenburg @ 2017-05-22  9:58 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

Without attribute caching, the correct size would be pulled from the
server.  With attribute caching, the wrong size could make it to
userspace.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 fs/orangefs/inode.c | 2 +-
 fs/orangefs/namei.c | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 9912104..3ee121a 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -296,7 +296,7 @@ struct inode *orangefs_new_inode(struct super_block *sb, struct inode *dir,
 	inode->i_uid = current_fsuid();
 	inode->i_gid = current_fsgid();
 	inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
-	inode->i_size = PAGE_SIZE;
+	inode->i_size = 0;
 	inode->i_rdev = dev;
 
 	error = insert_inode_locked4(inode, hash, orangefs_test_inode, ref);
diff --git a/fs/orangefs/namei.c b/fs/orangefs/namei.c
index 478e88b..020a402 100644
--- a/fs/orangefs/namei.c
+++ b/fs/orangefs/namei.c
@@ -319,6 +319,8 @@ static int orangefs_symlink(struct inode *dir,
 		     "Assigned symlink inode new number of %pU\n",
 		     get_khandle_from_ino(inode));
 
+	inode->i_size = strlen(symname);
+
 	d_instantiate(dentry, inode);
 	unlock_new_inode(inode);
 	orangefs_set_timeout(dentry);
@@ -384,6 +386,8 @@ static int orangefs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode
 		     "Assigned dir inode new number of %pU\n",
 		     get_khandle_from_ino(inode));
 
+	inode->i_size = PAGE_SIZE;
+
 	d_instantiate(dentry, inode);
 	unlock_new_inode(inode);
 	orangefs_set_timeout(dentry);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 09/13] orangefs: inodes linger in cache
  2017-05-22  9:58 [PATCH 00/13] [RFC] orangefs page cache Martin Brandenburg
                   ` (7 preceding siblings ...)
  2017-05-22  9:58 ` [PATCH 08/13] orangefs: initialize new inode size to zero Martin Brandenburg
@ 2017-05-22  9:58 ` Martin Brandenburg
  2017-05-22  9:58 ` [PATCH 10/13] orangefs: implement direct_IO for the read case Martin Brandenburg
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Martin Brandenburg @ 2017-05-22  9:58 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

This is a big change, but it boils down to implementing write_inode,
changing generic_delete_inode to generic_drop_inode, and changing
set/getattr to set/get from the in-memory inode.

This eliminates the benefit of the old getattr cache, though the code is
not removed.  Since inodes are not written back in anything resembling a
timely manner, using the file system from two machines is not
recommended.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 fs/orangefs/file.c            |   3 ++
 fs/orangefs/inode.c           | 101 ++++++++++--------------------------------
 fs/orangefs/namei.c           |   9 +++-
 fs/orangefs/orangefs-kernel.h |   2 -
 fs/orangefs/orangefs-utils.c  |   3 ++
 fs/orangefs/super.c           |  23 +++++++++-
 fs/orangefs/symlink.c         |   1 -
 7 files changed, 58 insertions(+), 84 deletions(-)

diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index 0de41a3..c3f44aa 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -515,6 +515,9 @@ static ssize_t orangefs_file_write_iter(struct kiocb *iocb, struct iov_iter *ite
 	iocb->ki_pos = pos;
 	orangefs_stats.writes++;
 
+	if (pos > i_size_read(file->f_mapping->host))
+		orangefs_i_size_write(file->f_mapping->host, pos);
+
 out:
 
 	inode_unlock(file->f_mapping->host);
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 3ee121a..d26b9b7 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -17,7 +17,6 @@ static int orangefs_setattr_size(struct inode *inode, struct iattr *iattr)
 {
 	struct orangefs_inode_s *orangefs_inode = ORANGEFS_I(inode);
 	struct orangefs_kernel_op_s *new_op;
-	loff_t orig_size;
 	int ret = -EINVAL;
 
 	gossip_debug(GOSSIP_INODE_DEBUG,
@@ -28,17 +27,6 @@ static int orangefs_setattr_size(struct inode *inode, struct iattr *iattr)
 		     orangefs_inode->refn.fs_id,
 		     iattr->ia_size);
 
-	/* Ensure that we have a up to date size, so we know if it changed. */
-	ret = orangefs_inode_getattr(inode, 0, 1, STATX_SIZE);
-	if (ret == -ESTALE)
-		ret = -EIO;
-	if (ret) {
-		gossip_err("%s: orangefs_inode_getattr failed, ret:%d:.\n",
-		    __func__, ret);
-		return ret;
-	}
-	orig_size = i_size_read(inode);
-
 	truncate_setsize(inode, iattr->ia_size);
 
 	new_op = op_alloc(ORANGEFS_VFS_OP_TRUNCATE);
@@ -64,9 +52,6 @@ static int orangefs_setattr_size(struct inode *inode, struct iattr *iattr)
 	if (ret != 0)
 		return ret;
 
-	if (orig_size != i_size_read(inode))
-		iattr->ia_valid |= ATTR_CTIME | ATTR_MTIME;
-
 	return ret;
 }
 
@@ -75,38 +60,25 @@ static int orangefs_setattr_size(struct inode *inode, struct iattr *iattr)
  */
 int orangefs_setattr(struct dentry *dentry, struct iattr *iattr)
 {
-	int ret = -EINVAL;
-	struct inode *inode = dentry->d_inode;
-
-	gossip_debug(GOSSIP_INODE_DEBUG,
-		     "orangefs_setattr: called on %pd\n",
-		     dentry);
-
-	ret = setattr_prepare(dentry, iattr);
-	if (ret)
-		goto out;
-
+	int r;
 	if (iattr->ia_valid & ATTR_SIZE) {
-		ret = orangefs_setattr_size(inode, iattr);
-		if (ret)
-			goto out;
+		/* XXX: ATTR_CTIME_SET and ATTR_MTIME_SET */
+		if (i_size_read(d_inode(dentry)) != iattr->ia_size)
+			iattr->ia_valid |= ATTR_CTIME | ATTR_MTIME;
 	}
-
-	setattr_copy(inode, iattr);
-	mark_inode_dirty(inode);
-
-	ret = orangefs_inode_setattr(inode, iattr);
-	gossip_debug(GOSSIP_INODE_DEBUG,
-		     "orangefs_setattr: inode_setattr returned %d\n",
-		     ret);
-
-	if (!ret && (iattr->ia_valid & ATTR_MODE))
-		/* change mod on a file that has ACLs */
-		ret = posix_acl_chmod(inode, inode->i_mode);
-
-out:
-	gossip_debug(GOSSIP_INODE_DEBUG, "orangefs_setattr: returning %d\n", ret);
-	return ret;
+	r = simple_setattr(dentry, iattr);
+	if (r)
+		return r;
+	if (iattr->ia_valid & ATTR_SIZE) {
+		r = orangefs_setattr_size(d_inode(dentry), iattr);
+		if (r)
+			return r;
+	}
+	if (iattr->ia_valid & ATTR_MODE) {
+		return posix_acl_chmod(d_inode(dentry),
+		    d_inode(dentry)->i_mode);
+	}
+	return 0;
 }
 
 /*
@@ -115,7 +87,6 @@ int orangefs_setattr(struct dentry *dentry, struct iattr *iattr)
 int orangefs_getattr(const struct path *path, struct kstat *stat,
 		     u32 request_mask, unsigned int flags)
 {
-	int ret = -ENOENT;
 	struct inode *inode = path->dentry->d_inode;
 	struct orangefs_inode_s *orangefs_inode = NULL;
 
@@ -123,38 +94,13 @@ int orangefs_getattr(const struct path *path, struct kstat *stat,
 		     "orangefs_getattr: called on %pd\n",
 		     path->dentry);
 
-	ret = orangefs_inode_getattr(inode, 0, 0, request_mask);
-	if (ret == 0) {
-		generic_fillattr(inode, stat);
-
-		/* override block size reported to stat */
-		orangefs_inode = ORANGEFS_I(inode);
-		stat->blksize = orangefs_inode->blksize;
-
-		if (request_mask & STATX_SIZE)
-			stat->result_mask = STATX_BASIC_STATS;
-		else
-			stat->result_mask = STATX_BASIC_STATS &
-			    ~STATX_SIZE;
-	}
-	return ret;
-}
-
-int orangefs_permission(struct inode *inode, int mask)
-{
-	int ret;
-
-	if (mask & MAY_NOT_BLOCK)
-		return -ECHILD;
-
-	gossip_debug(GOSSIP_INODE_DEBUG, "%s: refreshing\n", __func__);
+	generic_fillattr(inode, stat);
 
-	/* Make sure the permission (and other common attrs) are up to date. */
-	ret = orangefs_inode_getattr(inode, 0, 0, STATX_MODE);
-	if (ret < 0)
-		return ret;
-
-	return generic_permission(inode, mask);
+	/* override block size reported to stat */
+	orangefs_inode = ORANGEFS_I(inode);
+	stat->blksize = orangefs_inode->blksize;
+	stat->result_mask = STATX_BASIC_STATS;
+	return 0;
 }
 
 /* ORANGEDS2 implementation of VFS inode operations for files */
@@ -164,7 +110,6 @@ const struct inode_operations orangefs_file_inode_operations = {
 	.setattr = orangefs_setattr,
 	.getattr = orangefs_getattr,
 	.listxattr = orangefs_listxattr,
-	.permission = orangefs_permission,
 };
 
 static int orangefs_init_iops(struct inode *inode)
diff --git a/fs/orangefs/namei.c b/fs/orangefs/namei.c
index 020a402..9ba6291 100644
--- a/fs/orangefs/namei.c
+++ b/fs/orangefs/namei.c
@@ -446,14 +446,20 @@ static int orangefs_rename(struct inode *old_dir,
 	ret = service_operation(new_op,
 				"orangefs_rename",
 				get_interruptible_flag(old_dentry->d_inode));
-
 	gossip_debug(GOSSIP_NAME_DEBUG,
 		     "orangefs_rename: got downcall status %d\n",
 		     ret);
 
+	if (ret < 0)
+		goto out;
+
 	if (new_dentry->d_inode)
 		new_dentry->d_inode->i_ctime = current_time(new_dentry->d_inode);
 
+	new_dir->i_mtime = current_time(new_dir);
+	new_dir->i_ctime = current_time(new_dir);
+
+out:
 	op_release(new_op);
 	return ret;
 }
@@ -472,5 +478,4 @@ const struct inode_operations orangefs_dir_inode_operations = {
 	.setattr = orangefs_setattr,
 	.getattr = orangefs_getattr,
 	.listxattr = orangefs_listxattr,
-	.permission = orangefs_permission,
 };
diff --git a/fs/orangefs/orangefs-kernel.h b/fs/orangefs/orangefs-kernel.h
index 6dfb5b6..6cbbf44 100644
--- a/fs/orangefs/orangefs-kernel.h
+++ b/fs/orangefs/orangefs-kernel.h
@@ -440,8 +440,6 @@ int orangefs_setattr(struct dentry *dentry, struct iattr *iattr);
 int orangefs_getattr(const struct path *path, struct kstat *stat,
 		     u32 request_mask, unsigned int flags);
 
-int orangefs_permission(struct inode *inode, int mask);
-
 /*
  * defined in xattr.c
  */
diff --git a/fs/orangefs/orangefs-utils.c b/fs/orangefs/orangefs-utils.c
index 1fb90bd..43bb612 100644
--- a/fs/orangefs/orangefs-utils.c
+++ b/fs/orangefs/orangefs-utils.c
@@ -262,6 +262,9 @@ int orangefs_inode_getattr(struct inode *inode, int new, int bypass,
 	gossip_debug(GOSSIP_UTILS_DEBUG, "%s: called on inode %pU\n", __func__,
 	    get_khandle_from_ino(inode));
 
+	if (inode->i_state & I_DIRTY)
+		return 0;
+
 	if (!new && !bypass) {
 		/*
 		 * Must have all the attributes in the mask and be within cache
diff --git a/fs/orangefs/super.c b/fs/orangefs/super.c
index 4d994f8..4d59f76 100644
--- a/fs/orangefs/super.c
+++ b/fs/orangefs/super.c
@@ -298,11 +298,32 @@ static void orangefs_dirty_inode(struct inode *inode, int flags)
 	SetAtimeFlag(orangefs_inode);
 }
 
+static int orangefs_write_inode(struct inode *inode,
+    struct writeback_control *wbc)
+{
+	struct iattr iattr;
+	int r;
+	iattr.ia_valid = ATTR_MODE | ATTR_UID | ATTR_GID | ATTR_ATIME |
+	    ATTR_ATIME_SET | ATTR_MTIME | ATTR_MTIME_SET | ATTR_CTIME;
+	iattr.ia_mode = inode->i_mode;
+	iattr.ia_uid = inode->i_uid;
+	iattr.ia_gid = inode->i_gid;
+	iattr.ia_atime = inode->i_atime;
+	iattr.ia_mtime = inode->i_mtime;
+	iattr.ia_ctime = inode->i_ctime;
+	r = orangefs_inode_setattr(inode, &iattr);
+	if (r)
+		return r;
+	return r;
+
+}
+
 static const struct super_operations orangefs_s_ops = {
 	.alloc_inode = orangefs_alloc_inode,
 	.destroy_inode = orangefs_destroy_inode,
 	.dirty_inode = orangefs_dirty_inode,
-	.drop_inode = generic_delete_inode,
+	.write_inode = orangefs_write_inode,
+	.drop_inode = generic_drop_inode,
 	.statfs = orangefs_statfs,
 	.remount_fs = orangefs_remount_fs,
 	.show_options = generic_show_options,
diff --git a/fs/orangefs/symlink.c b/fs/orangefs/symlink.c
index 02b1bbd..97ef329 100644
--- a/fs/orangefs/symlink.c
+++ b/fs/orangefs/symlink.c
@@ -13,5 +13,4 @@ const struct inode_operations orangefs_symlink_inode_operations = {
 	.setattr = orangefs_setattr,
 	.getattr = orangefs_getattr,
 	.listxattr = orangefs_listxattr,
-	.permission = orangefs_permission,
 };
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 10/13] orangefs: implement direct_IO for the read case
  2017-05-22  9:58 [PATCH 00/13] [RFC] orangefs page cache Martin Brandenburg
                   ` (8 preceding siblings ...)
  2017-05-22  9:58 ` [PATCH 09/13] orangefs: inodes linger in cache Martin Brandenburg
@ 2017-05-22  9:58 ` Martin Brandenburg
  2017-05-22  9:58 ` [PATCH 11/13] orangefs: lock inode during fsync Martin Brandenburg
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 20+ messages in thread
From: Martin Brandenburg @ 2017-05-22  9:58 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

Since orangefs_file_read_iter now calls generic_file_read_iter, O_DIRECT
now goes through direct_IO in the read case.  In the write case,
orangefs_file_write_iter never calls direct_IO, but handles the direct
write manually.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 fs/orangefs/file.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index c3f44aa..cd126dd 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -819,22 +819,17 @@ static int orangefs_releasepage(struct page *page, gfp_t foo)
 	return 0;
 }
 
-/*
- * Having a direct_IO entry point in the address_space_operations
- * struct causes the kernel to allows us to use O_DIRECT on
- * open. Nothing will ever call this thing, but in the future we
- * will need to be able to use O_DIRECT on open in order to support
- * AIO. Modeled after NFS, they do this too.
- */
-
 static ssize_t orangefs_direct_IO(struct kiocb *iocb,
 				  struct iov_iter *iter)
 {
-	gossip_debug(GOSSIP_INODE_DEBUG,
-		     "orangefs_direct_IO: %pD\n",
-		     iocb->ki_filp);
-
-	return -EINVAL;
+	struct file *file = iocb->ki_filp;
+	loff_t pos = *(&iocb->ki_pos);
+	/*
+	 * This cannot happen until write_iter becomes
+	 * generic_file_write_iter.
+	 */
+	BUG_ON(iov_iter_rw(iter) != READ);
+	return do_readv_writev(ORANGEFS_IO_READ, file, &pos, iter);
 }
 
 /** ORANGEFS2 implementation of address space operations */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 11/13] orangefs: lock inode during fsync
  2017-05-22  9:58 [PATCH 00/13] [RFC] orangefs page cache Martin Brandenburg
                   ` (9 preceding siblings ...)
  2017-05-22  9:58 ` [PATCH 10/13] orangefs: implement direct_IO for the read case Martin Brandenburg
@ 2017-05-22  9:58 ` Martin Brandenburg
  2017-05-25 15:58   ` Jeff Layton
  2017-05-22  9:59 ` [PATCH 12/13] orangefs: call generic_file_read_iter Martin Brandenburg
  2017-05-22  9:59 ` [PATCH 13/13] orangefs: implement write through the page cache Martin Brandenburg
  12 siblings, 1 reply; 20+ messages in thread
From: Martin Brandenburg @ 2017-05-22  9:58 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 fs/orangefs/file.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index cd126dd..f8536a7 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -652,7 +652,9 @@ static int orangefs_fsync(struct file *file,
 	struct orangefs_kernel_op_s *new_op = NULL;
 
 	/* required call */
+	inode_lock(file_inode(file));
 	filemap_write_and_wait_range(file->f_mapping, start, end);
+	inode_unlock(file_inode(file));
 
 	new_op = op_alloc(ORANGEFS_VFS_OP_FSYNC);
 	if (!new_op)
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 12/13] orangefs: call generic_file_read_iter
  2017-05-22  9:58 [PATCH 00/13] [RFC] orangefs page cache Martin Brandenburg
                   ` (10 preceding siblings ...)
  2017-05-22  9:58 ` [PATCH 11/13] orangefs: lock inode during fsync Martin Brandenburg
@ 2017-05-22  9:59 ` Martin Brandenburg
  2017-05-22  9:59 ` [PATCH 13/13] orangefs: implement write through the page cache Martin Brandenburg
  12 siblings, 0 replies; 20+ messages in thread
From: Martin Brandenburg @ 2017-05-22  9:59 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

No need to manually implement this.  The generic implementation handles
direct IO as well.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 fs/orangefs/file.c | 17 +++--------------
 1 file changed, 3 insertions(+), 14 deletions(-)

diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index f8536a7..c03deea 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -441,22 +441,11 @@ static ssize_t orangefs_inode_read(struct inode *inode,
 	return ret;
 }
 
-static ssize_t orangefs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
+static ssize_t orangefs_file_read_iter(struct kiocb *iocb,
+    struct iov_iter *iter)
 {
-	struct file *file = iocb->ki_filp;
-	loff_t pos = *(&iocb->ki_pos);
-	ssize_t rc = 0;
-
-	BUG_ON(iocb->private);
-
-	gossip_debug(GOSSIP_FILE_DEBUG, "orangefs_file_read_iter\n");
-
 	orangefs_stats.reads++;
-
-	rc = do_readv_writev(ORANGEFS_IO_READ, file, &pos, iter);
-	iocb->ki_pos = pos;
-
-	return rc;
+	return generic_file_read_iter(iocb, iter);
 }
 
 static ssize_t orangefs_file_write_iter(struct kiocb *iocb, struct iov_iter *iter)
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 13/13] orangefs: implement write through the page cache
  2017-05-22  9:58 [PATCH 00/13] [RFC] orangefs page cache Martin Brandenburg
                   ` (11 preceding siblings ...)
  2017-05-22  9:59 ` [PATCH 12/13] orangefs: call generic_file_read_iter Martin Brandenburg
@ 2017-05-22  9:59 ` Martin Brandenburg
  2017-05-25 16:09   ` Jeff Layton
  12 siblings, 1 reply; 20+ messages in thread
From: Martin Brandenburg @ 2017-05-22  9:59 UTC (permalink / raw)
  To: hubcap, linux-fsdevel; +Cc: Martin Brandenburg

With this and the last commit, OrangeFS is capable of writing through
the page cache.  This should significantly increase performance of very
small writes, since writeback will not be done for every write call.

However it is not appropriate for use with multiple clients due to the
long writeback delay.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 fs/orangefs/file.c | 128 +++++++++++++++++++++++------------------------------
 1 file changed, 56 insertions(+), 72 deletions(-)

diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index c03deea..3ab0a1f 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -448,69 +448,11 @@ static ssize_t orangefs_file_read_iter(struct kiocb *iocb,
 	return generic_file_read_iter(iocb, iter);
 }
 
-static ssize_t orangefs_file_write_iter(struct kiocb *iocb, struct iov_iter *iter)
+static ssize_t orangefs_file_write_iter(struct kiocb *iocb,
+    struct iov_iter *iter)
 {
-	struct file *file = iocb->ki_filp;
-	loff_t pos;
-	ssize_t rc;
-
-	BUG_ON(iocb->private);
-
-	gossip_debug(GOSSIP_FILE_DEBUG, "orangefs_file_write_iter\n");
-
-	inode_lock(file->f_mapping->host);
-
-	/* Make sure generic_write_checks sees an up to date inode size. */
-	if (file->f_flags & O_APPEND) {
-		rc = orangefs_inode_getattr(file->f_mapping->host, 0, 1,
-		    STATX_SIZE);
-		if (rc == -ESTALE)
-			rc = -EIO;
-		if (rc) {
-			gossip_err("%s: orangefs_inode_getattr failed, "
-			    "rc:%zd:.\n", __func__, rc);
-			goto out;
-		}
-	}
-
-	if (file->f_pos > i_size_read(file->f_mapping->host))
-		orangefs_i_size_write(file->f_mapping->host, file->f_pos);
-
-	rc = generic_write_checks(iocb, iter);
-
-	if (rc <= 0) {
-		gossip_err("%s: generic_write_checks failed, rc:%zd:.\n",
-			   __func__, rc);
-		goto out;
-	}
-
-	/*
-	 * if we are appending, generic_write_checks would have updated
-	 * pos to the end of the file, so we will wait till now to set
-	 * pos...
-	 */
-	pos = *(&iocb->ki_pos);
-
-	rc = do_readv_writev(ORANGEFS_IO_WRITE,
-			     file,
-			     &pos,
-			     iter);
-	if (rc < 0) {
-		gossip_err("%s: do_readv_writev failed, rc:%zd:.\n",
-			   __func__, rc);
-		goto out;
-	}
-
-	iocb->ki_pos = pos;
 	orangefs_stats.writes++;
-
-	if (pos > i_size_read(file->f_mapping->host))
-		orangefs_i_size_write(file->f_mapping->host, pos);
-
-out:
-
-	inode_unlock(file->f_mapping->host);
-	return rc;
+	return generic_file_write_iter(iocb, iter);
 }
 
 /*
@@ -606,9 +548,8 @@ static int orangefs_file_release(struct inode *inode, struct file *file)
 	orangefs_flush_inode(inode);
 
 	/*
-	 * remove all associated inode pages from the page cache and
-	 * readahead cache (if any); this forces an expensive refresh of
-	 * data for the next caller of mmap (or 'get_block' accesses)
+	 * remove all associated inode pages from the readahead cache
+	 * (if any)
 	 */
 	if (file_inode(file) &&
 	    file_inode(file)->i_mapping &&
@@ -621,8 +562,6 @@ static int orangefs_file_release(struct inode *inode, struct file *file)
 			gossip_debug(GOSSIP_INODE_DEBUG,
 			    "flush_racache finished\n");
 		}
-		truncate_inode_pages(file_inode(file)->i_mapping,
-				     0);
 	}
 	return 0;
 }
@@ -741,6 +680,40 @@ const struct file_operations orangefs_file_operations = {
 	.fsync		= orangefs_fsync,
 };
 
+static int orangefs_writepage(struct page *page,
+    struct writeback_control *wbc)
+{
+	struct inode *inode = page->mapping->host;
+	struct iov_iter iter;
+	struct iovec iov;
+	loff_t off;
+	size_t len;
+	ssize_t r;
+	void *map;
+
+	off = page_offset(page);
+	len = i_size_read(inode);
+	if (off + PAGE_SIZE > len)
+		len = len - off;
+	else
+		len = PAGE_SIZE;
+
+	map = kmap_atomic(page);
+	iov.iov_base = map;
+	iov.iov_len = len;
+	iov_iter_init(&iter, WRITE, &iov, 1, len);
+
+	set_page_writeback(page);
+
+	r = wait_for_direct_io(ORANGEFS_IO_WRITE, inode, &off, &iter,
+	    len, 0);
+	kunmap_atomic(map);
+
+	end_page_writeback(page);
+	unlock_page(page);
+	return 0;
+}
+
 static int orangefs_readpage(struct file *file, struct page *page)
 {
 	int ret;
@@ -786,6 +759,17 @@ static int orangefs_readpage(struct file *file, struct page *page)
 	return ret;
 }
 
+static int orangefs_write_end(struct file *file,
+    struct address_space *mapping, loff_t pos, unsigned len,
+    unsigned copied, struct page *page, void *fsdata)
+{
+	int r;
+	r = simple_write_end(file, mapping, pos, len, copied, page,
+	    fsdata);
+	mark_inode_dirty_sync(file_inode(file));
+	return r;
+}
+
 static void orangefs_invalidatepage(struct page *page,
 				 unsigned int offset,
 				 unsigned int length)
@@ -815,17 +799,17 @@ static ssize_t orangefs_direct_IO(struct kiocb *iocb,
 {
 	struct file *file = iocb->ki_filp;
 	loff_t pos = *(&iocb->ki_pos);
-	/*
-	 * This cannot happen until write_iter becomes
-	 * generic_file_write_iter.
-	 */
-	BUG_ON(iov_iter_rw(iter) != READ);
-	return do_readv_writev(ORANGEFS_IO_READ, file, &pos, iter);
+	return do_readv_writev(iocb->ki_flags & IOCB_WRITE ?
+	    ORANGEFS_IO_WRITE : ORANGEFS_IO_READ, file, &pos, iter);
 }
 
 /** ORANGEFS2 implementation of address space operations */
 const struct address_space_operations orangefs_address_operations = {
+	.writepage = orangefs_writepage,
 	.readpage = orangefs_readpage,
+	.set_page_dirty = __set_page_dirty_nobuffers,
+	.write_begin = simple_write_begin,
+	.write_end = orangefs_write_end,
 	.invalidatepage = orangefs_invalidatepage,
 	.releasepage = orangefs_releasepage,
 	.direct_IO = orangefs_direct_IO,
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 11/13] orangefs: lock inode during fsync
  2017-05-22  9:58 ` [PATCH 11/13] orangefs: lock inode during fsync Martin Brandenburg
@ 2017-05-25 15:58   ` Jeff Layton
  2017-05-26 16:21     ` martin
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff Layton @ 2017-05-25 15:58 UTC (permalink / raw)
  To: Martin Brandenburg, hubcap, linux-fsdevel

On Mon, 2017-05-22 at 05:58 -0400, Martin Brandenburg wrote:
> Signed-off-by: Martin Brandenburg <martin@omnibond.com>
> ---
>  fs/orangefs/file.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
> index cd126dd..f8536a7 100644
> --- a/fs/orangefs/file.c
> +++ b/fs/orangefs/file.c
> @@ -652,7 +652,9 @@ static int orangefs_fsync(struct file *file,
>  	struct orangefs_kernel_op_s *new_op = NULL;
>  
>  	/* required call */
> +	inode_lock(file_inode(file));
>  	filemap_write_and_wait_range(file->f_mapping, start, end);
> +	inode_unlock(file_inode(file));
>  
>  	new_op = op_alloc(ORANGEFS_VFS_OP_FSYNC);
>  	if (!new_op)

Why? You're just writing back the cached file data here. There's no
reason to lock the inode for that, AFAICS.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 13/13] orangefs: implement write through the page cache
  2017-05-22  9:59 ` [PATCH 13/13] orangefs: implement write through the page cache Martin Brandenburg
@ 2017-05-25 16:09   ` Jeff Layton
  2017-05-26 18:09     ` martin
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff Layton @ 2017-05-25 16:09 UTC (permalink / raw)
  To: Martin Brandenburg, hubcap, linux-fsdevel

On Mon, 2017-05-22 at 05:59 -0400, Martin Brandenburg wrote:
> With this and the last commit, OrangeFS is capable of writing through
> the page cache.  This should significantly increase performance of very
> small writes, since writeback will not be done for every write call.
> 
> However it is not appropriate for use with multiple clients due to the
> long writeback delay.
> 
> Signed-off-by: Martin Brandenburg <martin@omnibond.com>
> ---
>  fs/orangefs/file.c | 128 +++++++++++++++++++++++------------------------------
>  1 file changed, 56 insertions(+), 72 deletions(-)
> 
> diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
> index c03deea..3ab0a1f 100644
> --- a/fs/orangefs/file.c
> +++ b/fs/orangefs/file.c
> @@ -448,69 +448,11 @@ static ssize_t orangefs_file_read_iter(struct kiocb *iocb,
>  	return generic_file_read_iter(iocb, iter);
>  }
>  
> -static ssize_t orangefs_file_write_iter(struct kiocb *iocb, struct iov_iter *iter)
> +static ssize_t orangefs_file_write_iter(struct kiocb *iocb,
> +    struct iov_iter *iter)
>  {
> -	struct file *file = iocb->ki_filp;
> -	loff_t pos;
> -	ssize_t rc;
> -
> -	BUG_ON(iocb->private);
> -
> -	gossip_debug(GOSSIP_FILE_DEBUG, "orangefs_file_write_iter\n");
> -
> -	inode_lock(file->f_mapping->host);
> -
> -	/* Make sure generic_write_checks sees an up to date inode size. */
> -	if (file->f_flags & O_APPEND) {
> -		rc = orangefs_inode_getattr(file->f_mapping->host, 0, 1,
> -		    STATX_SIZE);
> -		if (rc == -ESTALE)
> -			rc = -EIO;
> -		if (rc) {
> -			gossip_err("%s: orangefs_inode_getattr failed, "
> -			    "rc:%zd:.\n", __func__, rc);
> -			goto out;
> -		}
> -	}
> -
> -	if (file->f_pos > i_size_read(file->f_mapping->host))
> -		orangefs_i_size_write(file->f_mapping->host, file->f_pos);
> -
> -	rc = generic_write_checks(iocb, iter);
> -
> -	if (rc <= 0) {
> -		gossip_err("%s: generic_write_checks failed, rc:%zd:.\n",
> -			   __func__, rc);
> -		goto out;
> -	}
> -
> -	/*
> -	 * if we are appending, generic_write_checks would have updated
> -	 * pos to the end of the file, so we will wait till now to set
> -	 * pos...
> -	 */
> -	pos = *(&iocb->ki_pos);
> -
> -	rc = do_readv_writev(ORANGEFS_IO_WRITE,
> -			     file,
> -			     &pos,
> -			     iter);
> -	if (rc < 0) {
> -		gossip_err("%s: do_readv_writev failed, rc:%zd:.\n",
> -			   __func__, rc);
> -		goto out;
> -	}
> -
> -	iocb->ki_pos = pos;
>  	orangefs_stats.writes++;
> -
> -	if (pos > i_size_read(file->f_mapping->host))
> -		orangefs_i_size_write(file->f_mapping->host, pos);
> -
> -out:
> -
> -	inode_unlock(file->f_mapping->host);
> -	return rc;
> +	return generic_file_write_iter(iocb, iter);
>  }
>  
>  /*
> @@ -606,9 +548,8 @@ static int orangefs_file_release(struct inode *inode, struct file *file)
>  	orangefs_flush_inode(inode);
>  
>  	/*
> -	 * remove all associated inode pages from the page cache and
> -	 * readahead cache (if any); this forces an expensive refresh of
> -	 * data for the next caller of mmap (or 'get_block' accesses)
> +	 * remove all associated inode pages from the readahead cache
> +	 * (if any)
>  	 */
>  	if (file_inode(file) &&
>  	    file_inode(file)->i_mapping &&
> @@ -621,8 +562,6 @@ static int orangefs_file_release(struct inode *inode, struct file *file)
>  			gossip_debug(GOSSIP_INODE_DEBUG,
>  			    "flush_racache finished\n");
>  		}
> -		truncate_inode_pages(file_inode(file)->i_mapping,
> -				     0);
>  	}
>  	return 0;
>  }
> @@ -741,6 +680,40 @@ const struct file_operations orangefs_file_operations = {
>  	.fsync		= orangefs_fsync,
>  };
>  
> +static int orangefs_writepage(struct page *page,
> +    struct writeback_control *wbc)
> +{
> +	struct inode *inode = page->mapping->host;
> +	struct iov_iter iter;
> +	struct iovec iov;
> +	loff_t off;
> +	size_t len;
> +	ssize_t r;
> +	void *map;
> +
> +	off = page_offset(page);
> +	len = i_size_read(inode);
> +	if (off + PAGE_SIZE > len)
> +		len = len - off;
> +	else
> +		len = PAGE_SIZE;
> +
> +	map = kmap_atomic(page);
> +	iov.iov_base = map;
> +	iov.iov_len = len;
> +	iov_iter_init(&iter, WRITE, &iov, 1, len);
> +
> +	set_page_writeback(page);
> +

wait_for_direct_io can sleep, so you can't use kmap_atomic there.
Regular old kmap should be ok there though.

Also, you probably really don't want to use iovecs there, as those are
expected to deal in userland addresses and the kmap address won't be
one.

It may be cleaner to use a bio_vec there instead. You most likely
wouldn't need to kmap at all if you did that.

> +	r = wait_for_direct_io(ORANGEFS_IO_WRITE, inode, &off, &iter,
> +	    len, 0);
> +	kunmap_atomic(map);
> +

When writeback fails for some reason, you'll also want to call
mapping_set_error to help ensure that those errors get reported (unless
you're tracking them on your own somehow). I don't see where that's
being done in a cursory glance at these patches, but I could have missed
it.

> +	end_page_writeback(page);
> +	unlock_page(page);
> +	return 0;
> +}
> +
> static int orangefs_readpage(struct file *file, struct page *page)
>  {
>  	int ret;
> @@ -786,6 +759,17 @@ static int orangefs_readpage(struct file *file, struct page *page)
>  	return ret;
>  }
>  
> +static int orangefs_write_end(struct file *file,
> +    struct address_space *mapping, loff_t pos, unsigned len,
> +    unsigned copied, struct page *page, void *fsdata)
> +{
> +	int r;
> +	r = simple_write_end(file, mapping, pos, len, copied, page,
> +	    fsdata);
> +	mark_inode_dirty_sync(file_inode(file));
> +	return r;
> +}
> +
>  static void orangefs_invalidatepage(struct page *page,
>  				 unsigned int offset,
>  				 unsigned int length)
> @@ -815,17 +799,17 @@ static ssize_t orangefs_direct_IO(struct kiocb *iocb,
>  {
>  	struct file *file = iocb->ki_filp;
>  	loff_t pos = *(&iocb->ki_pos);
> -	/*
> -	 * This cannot happen until write_iter becomes
> -	 * generic_file_write_iter.
> -	 */
> -	BUG_ON(iov_iter_rw(iter) != READ);
> -	return do_readv_writev(ORANGEFS_IO_READ, file, &pos, iter);
> +	return do_readv_writev(iocb->ki_flags & IOCB_WRITE ?
> +	    ORANGEFS_IO_WRITE : ORANGEFS_IO_READ, file, &pos, iter);
>  }
>  
>  /** ORANGEFS2 implementation of address space operations */
>  const struct address_space_operations orangefs_address_operations = {
> +	.writepage = orangefs_writepage,
>  	.readpage = orangefs_readpage,
> +	.set_page_dirty = __set_page_dirty_nobuffers,
> +	.write_begin = simple_write_begin,
> +	.write_end = orangefs_write_end,
>  	.invalidatepage = orangefs_invalidatepage,
>  	.releasepage = orangefs_releasepage,
>  	.direct_IO = orangefs_direct_IO,

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 11/13] orangefs: lock inode during fsync
  2017-05-25 15:58   ` Jeff Layton
@ 2017-05-26 16:21     ` martin
  2017-05-26 16:58       ` Jeff Layton
  0 siblings, 1 reply; 20+ messages in thread
From: martin @ 2017-05-26 16:21 UTC (permalink / raw)
  To: Jeff Layton; +Cc: hubcap, linux-fsdevel

On Thu, May 25, 2017 at 11:58:57AM -0400, Jeff Layton wrote:
> On Mon, 2017-05-22 at 05:58 -0400, Martin Brandenburg wrote:
> > Signed-off-by: Martin Brandenburg <martin@omnibond.com>
> > ---
> >  fs/orangefs/file.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
> > index cd126dd..f8536a7 100644
> > --- a/fs/orangefs/file.c
> > +++ b/fs/orangefs/file.c
> > @@ -652,7 +652,9 @@ static int orangefs_fsync(struct file *file,
> >  	struct orangefs_kernel_op_s *new_op = NULL;
> >  
> >  	/* required call */
> > +	inode_lock(file_inode(file));
> >  	filemap_write_and_wait_range(file->f_mapping, start, end);
> > +	inode_unlock(file_inode(file));
> >  
> >  	new_op = op_alloc(ORANGEFS_VFS_OP_FSYNC);
> >  	if (!new_op)
> 
> Why? You're just writing back the cached file data here. There's no
> reason to lock the inode for that, AFAICS.
> 
> -- 
> Jeff Layton <jlayton@redhat.com>

Because FUSE does.  Now I see FUSE needs it for fuse_sync_writes.

	/*
	 * Wait for all pending writepages on the inode to finish.
	 *
	 * This is currently done by blocking further writes with FUSE_NOWRITE
	 * and waiting for all sent writes to complete.
	 *
	 * This must be called under i_mutex, otherwise the FUSE_NOWRITE usage
	 * could conflict with truncation.
	 */
	static void fuse_sync_writes(struct inode *inode)

I think OrangeFS is okay because writepage does not return until the
write has completed.  Does that sound right?

I'm not sure about the truncate conflict.  Truncates are sent to the
server immediately.  Can a pending write show past the end of the file
show up later?  I don't handle that at all.

Martin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 11/13] orangefs: lock inode during fsync
  2017-05-26 16:21     ` martin
@ 2017-05-26 16:58       ` Jeff Layton
  0 siblings, 0 replies; 20+ messages in thread
From: Jeff Layton @ 2017-05-26 16:58 UTC (permalink / raw)
  To: martin; +Cc: hubcap, linux-fsdevel

On Fri, 2017-05-26 at 12:21 -0400, martin@omnibond.com wrote:
> On Thu, May 25, 2017 at 11:58:57AM -0400, Jeff Layton wrote:
> > On Mon, 2017-05-22 at 05:58 -0400, Martin Brandenburg wrote:
> > > Signed-off-by: Martin Brandenburg <martin@omnibond.com>
> > > ---
> > >  fs/orangefs/file.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
> > > index cd126dd..f8536a7 100644
> > > --- a/fs/orangefs/file.c
> > > +++ b/fs/orangefs/file.c
> > > @@ -652,7 +652,9 @@ static int orangefs_fsync(struct file *file,
> > >  	struct orangefs_kernel_op_s *new_op = NULL;
> > >  
> > >  	/* required call */
> > > +	inode_lock(file_inode(file));
> > >  	filemap_write_and_wait_range(file->f_mapping, start, end);
> > > +	inode_unlock(file_inode(file));
> > >  
> > >  	new_op = op_alloc(ORANGEFS_VFS_OP_FSYNC);
> > >  	if (!new_op)
> > 
> > Why? You're just writing back the cached file data here. There's no
> > reason to lock the inode for that, AFAICS.
> > 
> > -- 
> > Jeff Layton <jlayton@redhat.com>
> 
> Because FUSE does.  Now I see FUSE needs it for fuse_sync_writes.
> 
> 	/*
> 	 * Wait for all pending writepages on the inode to finish.
> 	 *
> 	 * This is currently done by blocking further writes with FUSE_NOWRITE
> 	 * and waiting for all sent writes to complete.
> 	 *
> 	 * This must be called under i_mutex, otherwise the FUSE_NOWRITE usage
> 	 * could conflict with truncation.
> 	 */
> 	static void fuse_sync_writes(struct inode *inode)
> 
> I think OrangeFS is okay because writepage does not return until the
> write has completed.  Does that sound right?
> 

I think so.

> I'm not sure about the truncate conflict.  Truncates are sent to the
> server immediately.  Can a pending write show past the end of the file
> show up later?  I don't handle that at all.
> 

IIUC, the danger there is generally that the write and truncate could
get reordered. If they're both synchronous though then I don't see how
that could happen
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 13/13] orangefs: implement write through the page cache
  2017-05-25 16:09   ` Jeff Layton
@ 2017-05-26 18:09     ` martin
  2017-05-26 18:48       ` Jeff Layton
  0 siblings, 1 reply; 20+ messages in thread
From: martin @ 2017-05-26 18:09 UTC (permalink / raw)
  To: Jeff Layton; +Cc: hubcap, linux-fsdevel

On Thu, May 25, 2017 at 12:09:49PM -0400, Jeff Layton wrote:
> On Mon, 2017-05-22 at 05:59 -0400, Martin Brandenburg wrote:
> > With this and the last commit, OrangeFS is capable of writing through
> > the page cache.  This should significantly increase performance of very
> > small writes, since writeback will not be done for every write call.
> > 
> > However it is not appropriate for use with multiple clients due to the
> > long writeback delay.
> > 
> > Signed-off-by: Martin Brandenburg <martin@omnibond.com>
> > ---
> >  fs/orangefs/file.c | 128 +++++++++++++++++++++++------------------------------
> >  1 file changed, 56 insertions(+), 72 deletions(-)
> > 
> > diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
> > index c03deea..3ab0a1f 100644
> > --- a/fs/orangefs/file.c
> > +++ b/fs/orangefs/file.c
> > @@ -448,69 +448,11 @@ static ssize_t orangefs_file_read_iter(struct kiocb *iocb,
> >  	return generic_file_read_iter(iocb, iter);
> >  }
> >  
> > -static ssize_t orangefs_file_write_iter(struct kiocb *iocb, struct iov_iter *iter)
> > +static ssize_t orangefs_file_write_iter(struct kiocb *iocb,
> > +    struct iov_iter *iter)
> >  {
> > -	struct file *file = iocb->ki_filp;
> > -	loff_t pos;
> > -	ssize_t rc;
> > -
> > -	BUG_ON(iocb->private);
> > -
> > -	gossip_debug(GOSSIP_FILE_DEBUG, "orangefs_file_write_iter\n");
> > -
> > -	inode_lock(file->f_mapping->host);
> > -
> > -	/* Make sure generic_write_checks sees an up to date inode size. */
> > -	if (file->f_flags & O_APPEND) {
> > -		rc = orangefs_inode_getattr(file->f_mapping->host, 0, 1,
> > -		    STATX_SIZE);
> > -		if (rc == -ESTALE)
> > -			rc = -EIO;
> > -		if (rc) {
> > -			gossip_err("%s: orangefs_inode_getattr failed, "
> > -			    "rc:%zd:.\n", __func__, rc);
> > -			goto out;
> > -		}
> > -	}
> > -
> > -	if (file->f_pos > i_size_read(file->f_mapping->host))
> > -		orangefs_i_size_write(file->f_mapping->host, file->f_pos);
> > -
> > -	rc = generic_write_checks(iocb, iter);
> > -
> > -	if (rc <= 0) {
> > -		gossip_err("%s: generic_write_checks failed, rc:%zd:.\n",
> > -			   __func__, rc);
> > -		goto out;
> > -	}
> > -
> > -	/*
> > -	 * if we are appending, generic_write_checks would have updated
> > -	 * pos to the end of the file, so we will wait till now to set
> > -	 * pos...
> > -	 */
> > -	pos = *(&iocb->ki_pos);
> > -
> > -	rc = do_readv_writev(ORANGEFS_IO_WRITE,
> > -			     file,
> > -			     &pos,
> > -			     iter);
> > -	if (rc < 0) {
> > -		gossip_err("%s: do_readv_writev failed, rc:%zd:.\n",
> > -			   __func__, rc);
> > -		goto out;
> > -	}
> > -
> > -	iocb->ki_pos = pos;
> >  	orangefs_stats.writes++;
> > -
> > -	if (pos > i_size_read(file->f_mapping->host))
> > -		orangefs_i_size_write(file->f_mapping->host, pos);
> > -
> > -out:
> > -
> > -	inode_unlock(file->f_mapping->host);
> > -	return rc;
> > +	return generic_file_write_iter(iocb, iter);
> >  }
> >  
> >  /*
> > @@ -606,9 +548,8 @@ static int orangefs_file_release(struct inode *inode, struct file *file)
> >  	orangefs_flush_inode(inode);
> >  
> >  	/*
> > -	 * remove all associated inode pages from the page cache and
> > -	 * readahead cache (if any); this forces an expensive refresh of
> > -	 * data for the next caller of mmap (or 'get_block' accesses)
> > +	 * remove all associated inode pages from the readahead cache
> > +	 * (if any)
> >  	 */
> >  	if (file_inode(file) &&
> >  	    file_inode(file)->i_mapping &&
> > @@ -621,8 +562,6 @@ static int orangefs_file_release(struct inode *inode, struct file *file)
> >  			gossip_debug(GOSSIP_INODE_DEBUG,
> >  			    "flush_racache finished\n");
> >  		}
> > -		truncate_inode_pages(file_inode(file)->i_mapping,
> > -				     0);
> >  	}
> >  	return 0;
> >  }
> > @@ -741,6 +680,40 @@ const struct file_operations orangefs_file_operations = {
> >  	.fsync		= orangefs_fsync,
> >  };
> >  
> > +static int orangefs_writepage(struct page *page,
> > +    struct writeback_control *wbc)
> > +{
> > +	struct inode *inode = page->mapping->host;
> > +	struct iov_iter iter;
> > +	struct iovec iov;
> > +	loff_t off;
> > +	size_t len;
> > +	ssize_t r;
> > +	void *map;
> > +
> > +	off = page_offset(page);
> > +	len = i_size_read(inode);
> > +	if (off + PAGE_SIZE > len)
> > +		len = len - off;
> > +	else
> > +		len = PAGE_SIZE;
> > +
> > +	map = kmap_atomic(page);
> > +	iov.iov_base = map;
> > +	iov.iov_len = len;
> > +	iov_iter_init(&iter, WRITE, &iov, 1, len);
> > +
> > +	set_page_writeback(page);
> > +
> 
> wait_for_direct_io can sleep, so you can't use kmap_atomic there.
> Regular old kmap should be ok there though.
> 
> Also, you probably really don't want to use iovecs there, as those are
> expected to deal in userland addresses and the kmap address won't be
> one.
> 
> It may be cleaner to use a bio_vec there instead. You most likely
> wouldn't need to kmap at all if you did that.

I tried that since that's what readpage does.

	static int orangefs_writepage(struct page *page,
	    struct writeback_control *wbc)
	{
		struct inode *inode = page->mapping->host;
		struct iov_iter iter;
		struct bio_vec bv;
		loff_t off;
		size_t len;
		ssize_t r;

		off = page_offset(page);
		len = i_size_read(inode);
		if (off + PAGE_SIZE > len)
			len = len - off;
		else
			len = PAGE_SIZE;

		bv.bv_page = page;
		bv.bv_len = len;
		iov_iter_bvec(&iter, ITER_BVEC | WRITE, &bv, 1, len);

		set_page_writeback(page);

		r = wait_for_direct_io(ORANGEFS_IO_WRITE, inode, &off, &iter,
		   len, 0);
		printk("orangefs_writepage wait_for_direct_io returned %d\n", (int)r);

		end_page_writeback(page);
		unlock_page(page);
		return 0;
	}

But this produced:

	[  125.739396] BUG: unable to handle kernel paging request at ffff880303833802
	[  125.739429] IP: __memcpy+0x12/0x20
	[  125.739439] PGD 2ec7067 
	[  125.739441] PUD 0 

	[  125.739463] Oops: 0000 [#1] SMP
	[  125.739474] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl snd_hda_codec_realtek nfs snd_hda_codec_generic lockd grace fscache sunrpc adt7475 hwmon_vid nouveau video mxm_wmi wmi coretemp i2c_algo_bit kvm_intel drm_kms_helper ttm drm kvm snd_hda_intel snd_hda_codec irqbypass snd_hwdep evdev iTCO_wdt dcdbas iTCO_vendor_support pcspkr snd_hda_core serio_raw i2c_i801 snd_pcm snd_timer lpc_ich mfd_core i2c_core shpchp snd soundcore button acpi_cpufreq autofs4 ext4 crc16 jbd2 mbcache hid_generic usbhid hid sg sr_mod cdrom sd_mod usb_storage ata_generic ata_piix libata psmouse firewire_ohci scsi_mod r8169 firewire_core ehci_pci mii uhci_hcd ehci_hcd crc_itu_t usbcore usb_common
	[  125.739683] CPU: 1 PID: 111 Comm: kworker/u8:2 Not tainted 4.11.0+ #66
	[  125.739697] Hardware name: Dell Inc. Studio 540      /0M017G, BIOS 1.1.1 07/24/2009
	[  125.739718] Workqueue: writeback wb_workfn (flush-orangefs-1)
	[  125.739734] task: ffff88021d9bc100 task.stack: ffffc900016d8000
	[  125.739748] RIP: 0010:__memcpy+0x12/0x20
	[  125.739759] RSP: 0018:ffffc900016db800 EFLAGS: 00010246
	[  125.739773] RAX: ffff88020693d000 RBX: 0000000000001000 RCX: 0000000000000200
	[  125.739788] RDX: 0000000000000000 RSI: ffff880303833802 RDI: ffff88020693d000
	[  125.739803] RBP: ffffc900016db808 R08: 0000160000000000 R09: 00000000ffff8802
	[  125.739818] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000001000
	[  125.739834] R13: ffffc900016db990 R14: 0000000000000000 R15: ffff88020693e000
	[  125.739849] FS:  0000000000000000(0000) GS:ffff880227000000(0000) knlGS:0000000000000000
	[  125.739866] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[  125.739879] CR2: ffff880303833802 CR3: 0000000220d97000 CR4: 00000000000406e0
	[  125.739895] Call Trace:
	[  125.739906]  ? memcpy_from_page+0x46/0x80
	[  125.739918]  copy_from_iter+0x1f1/0x380
	[  125.739929]  ? get+0x55/0x2a0
	[  125.739942]  copy_page_from_iter+0x166/0x250
	[  125.739957]  orangefs_bufmap_copy_from_iovec+0x77/0xc0
	[  125.739971]  wait_for_direct_io+0x1cb/0x500
	[  125.739987]  orangefs_writepage+0x76/0xb0
	[  125.740002]  __writepage+0x16/0x40
	[  125.740014]  write_cache_pages+0x266/0x5c0
	[  125.740027]  ? sched_clock_local+0x17/0x90
	[  125.740040]  ? wb_position_ratio+0x1f0/0x1f0
	[  125.740060]  generic_writepages+0x52/0x70
	[  125.740074]  do_writepages+0x2b/0x30
	[  125.740084]  ? do_writepages+0x2b/0x30
	[  125.740096]  __writeback_single_inode+0x61/0x6d0
	[  125.740109]  ? _raw_spin_unlock+0x27/0x30
	[  125.740122]  writeback_sb_inodes+0x2e4/0x680
	[  125.740141]  __writeback_inodes_wb+0x8f/0xc0
	[  125.740155]  wb_writeback+0x32c/0x4f0
	[  125.740171]  wb_workfn+0x9c/0x560
	[  125.740181]  ? wb_workfn+0x9c/0x560
	[  125.740194]  ? process_one_work+0x164/0x650
	[  125.740210]  process_one_work+0x1e1/0x650
	[  125.740225]  worker_thread+0x69/0x4a0
	[  125.740240]  kthread+0x127/0x160
	[  125.740250]  ? process_one_work+0x650/0x650
	[  125.740262]  ? kthread_create_on_node+0x40/0x40
	[  125.740275]  ret_from_fork+0x31/0x40
	[  125.740291] Code: 75 05 e8 72 fb ff ff 48 8b 43 60 48 2b 43 50 88 43 4e 5b 5d f3 c3 90 90 90 66 66 90 66 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 f3 
	[  125.740413] RIP: __memcpy+0x12/0x20 RSP: ffffc900016db800
	[  125.740425] CR2: ffff880303833802
	[  125.740437] ---[ end trace 844fdaac4da7b68d ]---

> 
> > +	r = wait_for_direct_io(ORANGEFS_IO_WRITE, inode, &off, &iter,
> > +	    len, 0);
> > +	kunmap_atomic(map);
> > +
> 
> When writeback fails for some reason, you'll also want to call
> mapping_set_error to help ensure that those errors get reported (unless
> you're tracking them on your own somehow). I don't see where that's
> being done in a cursory glance at these patches, but I could have missed
> it.
> 
> > +	end_page_writeback(page);
> > +	unlock_page(page);
> > +	return 0;
> > +}
> > +
> > static int orangefs_readpage(struct file *file, struct page *page)
> >  {
> >  	int ret;
> > @@ -786,6 +759,17 @@ static int orangefs_readpage(struct file *file, struct page *page)
> >  	return ret;
> >  }
> >  
> > +static int orangefs_write_end(struct file *file,
> > +    struct address_space *mapping, loff_t pos, unsigned len,
> > +    unsigned copied, struct page *page, void *fsdata)
> > +{
> > +	int r;
> > +	r = simple_write_end(file, mapping, pos, len, copied, page,
> > +	    fsdata);
> > +	mark_inode_dirty_sync(file_inode(file));
> > +	return r;
> > +}
> > +
> >  static void orangefs_invalidatepage(struct page *page,
> >  				 unsigned int offset,
> >  				 unsigned int length)
> > @@ -815,17 +799,17 @@ static ssize_t orangefs_direct_IO(struct kiocb *iocb,
> >  {
> >  	struct file *file = iocb->ki_filp;
> >  	loff_t pos = *(&iocb->ki_pos);
> > -	/*
> > -	 * This cannot happen until write_iter becomes
> > -	 * generic_file_write_iter.
> > -	 */
> > -	BUG_ON(iov_iter_rw(iter) != READ);
> > -	return do_readv_writev(ORANGEFS_IO_READ, file, &pos, iter);
> > +	return do_readv_writev(iocb->ki_flags & IOCB_WRITE ?
> > +	    ORANGEFS_IO_WRITE : ORANGEFS_IO_READ, file, &pos, iter);
> >  }
> >  
> >  /** ORANGEFS2 implementation of address space operations */
> >  const struct address_space_operations orangefs_address_operations = {
> > +	.writepage = orangefs_writepage,
> >  	.readpage = orangefs_readpage,
> > +	.set_page_dirty = __set_page_dirty_nobuffers,
> > +	.write_begin = simple_write_begin,
> > +	.write_end = orangefs_write_end,
> >  	.invalidatepage = orangefs_invalidatepage,
> >  	.releasepage = orangefs_releasepage,
> >  	.direct_IO = orangefs_direct_IO,
> 
> -- 
> Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 13/13] orangefs: implement write through the page cache
  2017-05-26 18:09     ` martin
@ 2017-05-26 18:48       ` Jeff Layton
  0 siblings, 0 replies; 20+ messages in thread
From: Jeff Layton @ 2017-05-26 18:48 UTC (permalink / raw)
  To: martin; +Cc: hubcap, linux-fsdevel

On Fri, 2017-05-26 at 14:09 -0400, martin@omnibond.com wrote:
> On Thu, May 25, 2017 at 12:09:49PM -0400, Jeff Layton wrote:
> > On Mon, 2017-05-22 at 05:59 -0400, Martin Brandenburg wrote:
> > > With this and the last commit, OrangeFS is capable of writing through
> > > the page cache.  This should significantly increase performance of very
> > > small writes, since writeback will not be done for every write call.
> > > 
> > > However it is not appropriate for use with multiple clients due to the
> > > long writeback delay.
> > > 
> > > Signed-off-by: Martin Brandenburg <martin@omnibond.com>
> > > ---
> > >  fs/orangefs/file.c | 128 +++++++++++++++++++++++------------------------------
> > >  1 file changed, 56 insertions(+), 72 deletions(-)
> > > 
> > > diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
> > > index c03deea..3ab0a1f 100644
> > > --- a/fs/orangefs/file.c
> > > +++ b/fs/orangefs/file.c
> > > @@ -448,69 +448,11 @@ static ssize_t orangefs_file_read_iter(struct kiocb *iocb,
> > >  	return generic_file_read_iter(iocb, iter);
> > >  }
> > >  
> > > -static ssize_t orangefs_file_write_iter(struct kiocb *iocb, struct iov_iter *iter)
> > > +static ssize_t orangefs_file_write_iter(struct kiocb *iocb,
> > > +    struct iov_iter *iter)
> > >  {
> > > -	struct file *file = iocb->ki_filp;
> > > -	loff_t pos;
> > > -	ssize_t rc;
> > > -
> > > -	BUG_ON(iocb->private);
> > > -
> > > -	gossip_debug(GOSSIP_FILE_DEBUG, "orangefs_file_write_iter\n");
> > > -
> > > -	inode_lock(file->f_mapping->host);
> > > -
> > > -	/* Make sure generic_write_checks sees an up to date inode size. */
> > > -	if (file->f_flags & O_APPEND) {
> > > -		rc = orangefs_inode_getattr(file->f_mapping->host, 0, 1,
> > > -		    STATX_SIZE);
> > > -		if (rc == -ESTALE)
> > > -			rc = -EIO;
> > > -		if (rc) {
> > > -			gossip_err("%s: orangefs_inode_getattr failed, "
> > > -			    "rc:%zd:.\n", __func__, rc);
> > > -			goto out;
> > > -		}
> > > -	}
> > > -
> > > -	if (file->f_pos > i_size_read(file->f_mapping->host))
> > > -		orangefs_i_size_write(file->f_mapping->host, file->f_pos);
> > > -
> > > -	rc = generic_write_checks(iocb, iter);
> > > -
> > > -	if (rc <= 0) {
> > > -		gossip_err("%s: generic_write_checks failed, rc:%zd:.\n",
> > > -			   __func__, rc);
> > > -		goto out;
> > > -	}
> > > -
> > > -	/*
> > > -	 * if we are appending, generic_write_checks would have updated
> > > -	 * pos to the end of the file, so we will wait till now to set
> > > -	 * pos...
> > > -	 */
> > > -	pos = *(&iocb->ki_pos);
> > > -
> > > -	rc = do_readv_writev(ORANGEFS_IO_WRITE,
> > > -			     file,
> > > -			     &pos,
> > > -			     iter);
> > > -	if (rc < 0) {
> > > -		gossip_err("%s: do_readv_writev failed, rc:%zd:.\n",
> > > -			   __func__, rc);
> > > -		goto out;
> > > -	}
> > > -
> > > -	iocb->ki_pos = pos;
> > >  	orangefs_stats.writes++;
> > > -
> > > -	if (pos > i_size_read(file->f_mapping->host))
> > > -		orangefs_i_size_write(file->f_mapping->host, pos);
> > > -
> > > -out:
> > > -
> > > -	inode_unlock(file->f_mapping->host);
> > > -	return rc;
> > > +	return generic_file_write_iter(iocb, iter);
> > >  }
> > >  
> > >  /*
> > > @@ -606,9 +548,8 @@ static int orangefs_file_release(struct inode *inode, struct file *file)
> > >  	orangefs_flush_inode(inode);
> > >  
> > >  	/*
> > > -	 * remove all associated inode pages from the page cache and
> > > -	 * readahead cache (if any); this forces an expensive refresh of
> > > -	 * data for the next caller of mmap (or 'get_block' accesses)
> > > +	 * remove all associated inode pages from the readahead cache
> > > +	 * (if any)
> > >  	 */
> > >  	if (file_inode(file) &&
> > >  	    file_inode(file)->i_mapping &&
> > > @@ -621,8 +562,6 @@ static int orangefs_file_release(struct inode *inode, struct file *file)
> > >  			gossip_debug(GOSSIP_INODE_DEBUG,
> > >  			    "flush_racache finished\n");
> > >  		}
> > > -		truncate_inode_pages(file_inode(file)->i_mapping,
> > > -				     0);
> > >  	}
> > >  	return 0;
> > >  }
> > > @@ -741,6 +680,40 @@ const struct file_operations orangefs_file_operations = {
> > >  	.fsync		= orangefs_fsync,
> > >  };
> > >  
> > > +static int orangefs_writepage(struct page *page,
> > > +    struct writeback_control *wbc)
> > > +{
> > > +	struct inode *inode = page->mapping->host;
> > > +	struct iov_iter iter;
> > > +	struct iovec iov;
> > > +	loff_t off;
> > > +	size_t len;
> > > +	ssize_t r;
> > > +	void *map;
> > > +
> > > +	off = page_offset(page);
> > > +	len = i_size_read(inode);
> > > +	if (off + PAGE_SIZE > len)
> > > +		len = len - off;
> > > +	else
> > > +		len = PAGE_SIZE;
> > > +
> > > +	map = kmap_atomic(page);
> > > +	iov.iov_base = map;
> > > +	iov.iov_len = len;
> > > +	iov_iter_init(&iter, WRITE, &iov, 1, len);
> > > +
> > > +	set_page_writeback(page);
> > > +
> > 
> > wait_for_direct_io can sleep, so you can't use kmap_atomic there.
> > Regular old kmap should be ok there though.
> > 
> > Also, you probably really don't want to use iovecs there, as those are
> > expected to deal in userland addresses and the kmap address won't be
> > one.
> > 
> > It may be cleaner to use a bio_vec there instead. You most likely
> > wouldn't need to kmap at all if you did that.
> 
> I tried that since that's what readpage does.
> 
> 	static int orangefs_writepage(struct page *page,
> 	    struct writeback_control *wbc)
> 	{
> 		struct inode *inode = page->mapping->host;
> 		struct iov_iter iter;
> 		struct bio_vec bv;
> 		loff_t off;
> 		size_t len;
> 		ssize_t r;
> 
> 		off = page_offset(page);
> 		len = i_size_read(inode);
> 		if (off + PAGE_SIZE > len)
> 			len = len - off;
> 		else
> 			len = PAGE_SIZE;
> 
> 		bv.bv_page = page;
> 		bv.bv_len = len;

You didn't set bv.bv_offset there.

> 		iov_iter_bvec(&iter, ITER_BVEC | WRITE, &bv, 1, len);
> 
> 		set_page_writeback(page);
> 
> 		r = wait_for_direct_io(ORANGEFS_IO_WRITE, inode, &off, &iter,
> 		   len, 0);
> 		printk("orangefs_writepage wait_for_direct_io returned %d\n", (int)r);
> 
> 		end_page_writeback(page);
> 		unlock_page(page);
> 		return 0;
> 	}
> 
> But this produced:
> 
> 	[  125.739396] BUG: unable to handle kernel paging request at ffff880303833802
> 	[  125.739429] IP: __memcpy+0x12/0x20
> 	[  125.739439] PGD 2ec7067 
> 	[  125.739441] PUD 0 
> 
> 	[  125.739463] Oops: 0000 [#1] SMP
> 	[  125.739474] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl snd_hda_codec_realtek nfs snd_hda_codec_generic lockd grace fscache sunrpc adt7475 hwmon_vid nouveau video mxm_wmi wmi coretemp i2c_algo_bit kvm_intel drm_kms_helper ttm drm kvm snd_hda_intel snd_hda_codec irqbypass snd_hwdep evdev iTCO_wdt dcdbas iTCO_vendor_support pcspkr snd_hda_core serio_raw i2c_i801 snd_pcm snd_timer lpc_ich mfd_core i2c_core shpchp snd soundcore button acpi_cpufreq autofs4 ext4 crc16 jbd2 mbcache hid_generic usbhid hid sg sr_mod cdrom sd_mod usb_storage ata_generic ata_piix libata psmouse firewire_ohci scsi_mod r8169 firewire_core ehci_pci mii uhci_hcd ehci_hcd crc_itu_t usbcore usb_common
> 	[  125.739683] CPU: 1 PID: 111 Comm: kworker/u8:2 Not tainted 4.11.0+ #66
> 	[  125.739697] Hardware name: Dell Inc. Studio 540      /0M017G, BIOS 1.1.1 07/24/2009
> 	[  125.739718] Workqueue: writeback wb_workfn (flush-orangefs-1)
> 	[  125.739734] task: ffff88021d9bc100 task.stack: ffffc900016d8000
> 	[  125.739748] RIP: 0010:__memcpy+0x12/0x20
> 	[  125.739759] RSP: 0018:ffffc900016db800 EFLAGS: 00010246
> 	[  125.739773] RAX: ffff88020693d000 RBX: 0000000000001000 RCX: 0000000000000200
> 	[  125.739788] RDX: 0000000000000000 RSI: ffff880303833802 RDI: ffff88020693d000
> 	[  125.739803] RBP: ffffc900016db808 R08: 0000160000000000 R09: 00000000ffff8802
> 	[  125.739818] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000001000
> 	[  125.739834] R13: ffffc900016db990 R14: 0000000000000000 R15: ffff88020693e000
> 	[  125.739849] FS:  0000000000000000(0000) GS:ffff880227000000(0000) knlGS:0000000000000000
> 	[  125.739866] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 	[  125.739879] CR2: ffff880303833802 CR3: 0000000220d97000 CR4: 00000000000406e0
> 	[  125.739895] Call Trace:
> 	[  125.739906]  ? memcpy_from_page+0x46/0x80
> 	[  125.739918]  copy_from_iter+0x1f1/0x380
> 	[  125.739929]  ? get+0x55/0x2a0
> 	[  125.739942]  copy_page_from_iter+0x166/0x250
> 	[  125.739957]  orangefs_bufmap_copy_from_iovec+0x77/0xc0
> 	[  125.739971]  wait_for_direct_io+0x1cb/0x500
> 	[  125.739987]  orangefs_writepage+0x76/0xb0
> 	[  125.740002]  __writepage+0x16/0x40
> 	[  125.740014]  write_cache_pages+0x266/0x5c0
> 	[  125.740027]  ? sched_clock_local+0x17/0x90
> 	[  125.740040]  ? wb_position_ratio+0x1f0/0x1f0
> 	[  125.740060]  generic_writepages+0x52/0x70
> 	[  125.740074]  do_writepages+0x2b/0x30
> 	[  125.740084]  ? do_writepages+0x2b/0x30
> 	[  125.740096]  __writeback_single_inode+0x61/0x6d0
> 	[  125.740109]  ? _raw_spin_unlock+0x27/0x30
> 	[  125.740122]  writeback_sb_inodes+0x2e4/0x680
> 	[  125.740141]  __writeback_inodes_wb+0x8f/0xc0
> 	[  125.740155]  wb_writeback+0x32c/0x4f0
> 	[  125.740171]  wb_workfn+0x9c/0x560
> 	[  125.740181]  ? wb_workfn+0x9c/0x560
> 	[  125.740194]  ? process_one_work+0x164/0x650
> 	[  125.740210]  process_one_work+0x1e1/0x650
> 	[  125.740225]  worker_thread+0x69/0x4a0
> 	[  125.740240]  kthread+0x127/0x160
> 	[  125.740250]  ? process_one_work+0x650/0x650
> 	[  125.740262]  ? kthread_create_on_node+0x40/0x40
> 	[  125.740275]  ret_from_fork+0x31/0x40
> 	[  125.740291] Code: 75 05 e8 72 fb ff ff 48 8b 43 60 48 2b 43 50 88 43 4e 5b 5d f3 c3 90 90 90 66 66 90 66 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 f3 
> 	[  125.740413] RIP: __memcpy+0x12/0x20 RSP: ffffc900016db800
> 	[  125.740425] CR2: ffff880303833802
> 	[  125.740437] ---[ end trace 844fdaac4da7b68d ]---
> 
> > 
> > > +	r = wait_for_direct_io(ORANGEFS_IO_WRITE, inode, &off, &iter,
> > > +	    len, 0);
> > > +	kunmap_atomic(map);
> > > +
> > 
> > When writeback fails for some reason, you'll also want to call
> > mapping_set_error to help ensure that those errors get reported (unless
> > you're tracking them on your own somehow). I don't see where that's
> > being done in a cursory glance at these patches, but I could have missed
> > it.
> > 
> > > +	end_page_writeback(page);
> > > +	unlock_page(page);
> > > +	return 0;
> > > +}
> > > +
> > > static int orangefs_readpage(struct file *file, struct page *page)
> > >  {
> > >  	int ret;
> > > @@ -786,6 +759,17 @@ static int orangefs_readpage(struct file *file, struct page *page)
> > >  	return ret;
> > >  }
> > >  
> > > +static int orangefs_write_end(struct file *file,
> > > +    struct address_space *mapping, loff_t pos, unsigned len,
> > > +    unsigned copied, struct page *page, void *fsdata)
> > > +{
> > > +	int r;
> > > +	r = simple_write_end(file, mapping, pos, len, copied, page,
> > > +	    fsdata);
> > > +	mark_inode_dirty_sync(file_inode(file));
> > > +	return r;
> > > +}
> > > +
> > >  static void orangefs_invalidatepage(struct page *page,
> > >  				 unsigned int offset,
> > >  				 unsigned int length)
> > > @@ -815,17 +799,17 @@ static ssize_t orangefs_direct_IO(struct kiocb *iocb,
> > >  {
> > >  	struct file *file = iocb->ki_filp;
> > >  	loff_t pos = *(&iocb->ki_pos);
> > > -	/*
> > > -	 * This cannot happen until write_iter becomes
> > > -	 * generic_file_write_iter.
> > > -	 */
> > > -	BUG_ON(iov_iter_rw(iter) != READ);
> > > -	return do_readv_writev(ORANGEFS_IO_READ, file, &pos, iter);
> > > +	return do_readv_writev(iocb->ki_flags & IOCB_WRITE ?
> > > +	    ORANGEFS_IO_WRITE : ORANGEFS_IO_READ, file, &pos, iter);
> > >  }
> > >  
> > >  /** ORANGEFS2 implementation of address space operations */
> > >  const struct address_space_operations orangefs_address_operations = {
> > > +	.writepage = orangefs_writepage,
> > >  	.readpage = orangefs_readpage,
> > > +	.set_page_dirty = __set_page_dirty_nobuffers,
> > > +	.write_begin = simple_write_begin,
> > > +	.write_end = orangefs_write_end,
> > >  	.invalidatepage = orangefs_invalidatepage,
> > >  	.releasepage = orangefs_releasepage,
> > >  	.direct_IO = orangefs_direct_IO,
> > 
> > -- 
> > Jeff Layton <jlayton@redhat.com>

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2017-05-26 18:48 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-05-22  9:58 [PATCH 00/13] [RFC] orangefs page cache Martin Brandenburg
2017-05-22  9:58 ` [PATCH 01/13] orangefs: move orangefs_address_operations to file.c Martin Brandenburg
2017-05-22  9:58 ` [PATCH 02/13] orangefs: remove orangefs_readpages Martin Brandenburg
2017-05-22  9:58 ` [PATCH 03/13] orangefs: make orangefs_inode_read static Martin Brandenburg
2017-05-22  9:58 ` [PATCH 04/13] orangefs: only set a_ops for regular files Martin Brandenburg
2017-05-22  9:58 ` [PATCH 05/13] orangefs: BUG_ON if i_mode invalid Martin Brandenburg
2017-05-22  9:58 ` [PATCH 06/13] orangefs: remove mapping_nrpages macro Martin Brandenburg
2017-05-22  9:58 ` [PATCH 07/13] orangefs: set up and use backing_dev_info Martin Brandenburg
2017-05-22  9:58 ` [PATCH 08/13] orangefs: initialize new inode size to zero Martin Brandenburg
2017-05-22  9:58 ` [PATCH 09/13] orangefs: inodes linger in cache Martin Brandenburg
2017-05-22  9:58 ` [PATCH 10/13] orangefs: implement direct_IO for the read case Martin Brandenburg
2017-05-22  9:58 ` [PATCH 11/13] orangefs: lock inode during fsync Martin Brandenburg
2017-05-25 15:58   ` Jeff Layton
2017-05-26 16:21     ` martin
2017-05-26 16:58       ` Jeff Layton
2017-05-22  9:59 ` [PATCH 12/13] orangefs: call generic_file_read_iter Martin Brandenburg
2017-05-22  9:59 ` [PATCH 13/13] orangefs: implement write through the page cache Martin Brandenburg
2017-05-25 16:09   ` Jeff Layton
2017-05-26 18:09     ` martin
2017-05-26 18:48       ` Jeff Layton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).