[Cluster-devel] [PATCH v2 0/4] mm/gfs2: extend file_* API, and convert gfs2 to errseq

* [Cluster-devel] [PATCH v2 0/4] mm/gfs2: extend file_* API, and convert gfs2 to errseq_t error reporting
@ 2017-07-26 17:55 ` Jeff Layton
  0 siblings, 0 replies; 87+ messages in thread
From: Jeff Layton @ 2017-07-26 17:55 UTC (permalink / raw)
  To: cluster-devel.redhat.com

From: Jeff Layton <jlayton@redhat.com>

I sent a small patch earlier this week to make sync_file_range use
errseq_t reporting.

This set respins that patch as a series: one patch adds a bit more file_*
infrastructure, and the following patches make sync_file_range and fsync
on gfs2 report writeback errors properly.

There's also a small cleanup patch for mm/filemap.c to consolidate
the DAX handling checks in the existing infrastructure.

Jeff Layton (4):
  mm: consolidate dax / non-dax checks for writeback
  mm: add file_fdatawait_range and file_write_and_wait
  fs: convert sync_file_range to use errseq_t based error-tracking
  gfs2: convert to errseq_t based writeback error reporting for fsync

 fs/gfs2/file.c     |  6 +++--
 fs/sync.c          |  4 +--
 include/linux/fs.h |  7 +++++-
 mm/filemap.c       | 71 +++++++++++++++++++++++++++++++++++++++++++++++++-----
 4 files changed, 77 insertions(+), 11 deletions(-)

-- 
2.13.3
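
For readers unfamiliar with the errseq_t reporting mentioned in the cover
letter above, here is a minimal userspace sketch of the idea. It is not the
kernel implementation. It assumes a simplified model in which the mapping
keeps the latest error plus a counter, and each open file keeps a cursor into
that counter; the real errseq_t packs its state into a single 32-bit value and
has additional rules that this model leaves out. All model_* names are
hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Simplified stand-in for errseq_t: latest error plus a change counter. */
struct model_errseq {
	int err;          /* most recent error as a negative errno, 0 if none */
	unsigned int seq; /* bumped every time an error is recorded */
};

/* Stand-in for struct address_space: owns the shared error state. */
struct model_mapping {
	struct model_errseq wb_err;
};

/* Stand-in for struct file: remembers what this opener has already seen. */
struct model_file {
	struct model_mapping *mapping;
	unsigned int f_wb_err; /* counter value sampled at open or last check */
};

/* Open a file: sample the counter so earlier errors are not reported. */
static struct model_file model_open(struct model_mapping *mapping)
{
	struct model_file file = {
		.mapping = mapping,
		.f_wb_err = mapping->wb_err.seq,
	};
	return file;
}

/* Record a writeback failure against the mapping. */
static void model_set_wb_err(struct model_mapping *mapping, int err)
{
	assert(err < 0);
	mapping->wb_err.err = err;
	mapping->wb_err.seq++;
}

/* Report a new error once per open file, then advance that file's cursor. */
static int model_check_and_advance(struct model_file *file)
{
	struct model_errseq *cur = &file->mapping->wb_err;

	if (file->f_wb_err == cur->seq)
		return 0;
	file->f_wb_err = cur->seq;
	return cur->err;
}
```

In this model, two files opened before a failure each see the error exactly
once, and a file opened after the failure sees nothing.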

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH v2 0/4] mm/gfs2: extend file_* API, and convert gfs2 to errseq_t error reporting
@ 2017-07-26 17:55 ` Jeff Layton
  0 siblings, 0 replies; 87+ messages in thread
From: Jeff Layton @ 2017-07-26 17:55 UTC (permalink / raw)
  To: Alexander Viro, Jan Kara
  Cc: J . Bruce Fields, Andrew Morton, linux-fsdevel, linux-kernel,
	linux-mm, Matthew Wilcox, Bob Peterson, Steven Whitehouse,
	cluster-devel

From: Jeff Layton <jlayton@redhat.com>

I sent a small patch earlier this week to make sync_file_range use
errseq_t reporting.

This set respins that patch as a series: one patch adds a bit more file_*
infrastructure, and the following patches make sync_file_range and fsync
on gfs2 report writeback errors properly.

There's also a small cleanup patch for mm/filemap.c to consolidate
the DAX handling checks in the existing infrastructure.

Jeff Layton (4):
  mm: consolidate dax / non-dax checks for writeback
  mm: add file_fdatawait_range and file_write_and_wait
  fs: convert sync_file_range to use errseq_t based error-tracking
  gfs2: convert to errseq_t based writeback error reporting for fsync

 fs/gfs2/file.c     |  6 +++--
 fs/sync.c          |  4 +--
 include/linux/fs.h |  7 +++++-
 mm/filemap.c       | 71 +++++++++++++++++++++++++++++++++++++++++++++++++-----
 4 files changed, 77 insertions(+), 11 deletions(-)

-- 
2.13.3

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [Cluster-devel] [PATCH v2 1/4] mm: consolidate dax / non-dax checks for writeback
  2017-07-26 17:55 ` Jeff Layton
  (?)
@ 2017-07-26 17:55   ` Jeff Layton
  -1 siblings, 0 replies; 87+ messages in thread
From: Jeff Layton @ 2017-07-26 17:55 UTC (permalink / raw)
  To: cluster-devel.redhat.com

From: Jeff Layton <jlayton@redhat.com>

We have this complex conditional copied to several places. Turn it into
a helper function.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 mm/filemap.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index e1cca770688f..72e46e6f0d9a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -522,12 +522,17 @@ int filemap_fdatawait(struct address_space *mapping)
 }
 EXPORT_SYMBOL(filemap_fdatawait);
 
+static bool mapping_needs_writeback(struct address_space *mapping)
+{
+	return (!dax_mapping(mapping) && mapping->nrpages) ||
+	    (dax_mapping(mapping) && mapping->nrexceptional);
+}
+
 int filemap_write_and_wait(struct address_space *mapping)
 {
 	int err = 0;
 
-	if ((!dax_mapping(mapping) && mapping->nrpages) ||
-	    (dax_mapping(mapping) && mapping->nrexceptional)) {
+	if (mapping_needs_writeback(mapping)) {
 		err = filemap_fdatawrite(mapping);
 		/*
 		 * Even if the above returned error, the pages may be
@@ -566,8 +571,7 @@ int filemap_write_and_wait_range(struct address_space *mapping,
 {
 	int err = 0;
 
-	if ((!dax_mapping(mapping) && mapping->nrpages) ||
-	    (dax_mapping(mapping) && mapping->nrexceptional)) {
+	if (mapping_needs_writeback(mapping)) {
 		err = __filemap_fdatawrite_range(mapping, lstart, lend,
 						 WB_SYNC_ALL);
 		/* See comment of filemap_write_and_wait() */
@@ -656,8 +660,7 @@ int file_write_and_wait_range(struct file *file, loff_t lstart, loff_t lend)
 	int err = 0, err2;
 	struct address_space *mapping = file->f_mapping;
 
-	if ((!dax_mapping(mapping) && mapping->nrpages) ||
-	    (dax_mapping(mapping) && mapping->nrexceptional)) {
+	if (mapping_needs_writeback(mapping)) {
 		err = __filemap_fdatawrite_range(mapping, lstart, lend,
 						 WB_SYNC_ALL);
 		/* See comment of filemap_write_and_wait() */
-- 
2.13.3
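
The truth table of the helper introduced by the patch above can be checked in
userspace with a small mock. This is only a sketch: struct mock_mapping and
the mock_* functions are hypothetical stand-ins for struct address_space,
dax_mapping() and mapping_needs_writeback(), and the is_dax flag is an
assumption made so the example compiles outside the kernel.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the fields of struct address_space used here. */
struct mock_mapping {
	bool is_dax;                 /* what dax_mapping() would report */
	unsigned long nrpages;       /* count of page cache pages */
	unsigned long nrexceptional; /* count of exceptional entries */
};

static bool mock_dax_mapping(const struct mock_mapping *mapping)
{
	return mapping->is_dax;
}

/*
 * Same expression as the helper in the patch: a non-DAX mapping needs
 * writeback when it has pages, a DAX mapping when it has exceptional
 * entries.
 */
static bool mock_mapping_needs_writeback(const struct mock_mapping *mapping)
{
	return (!mock_dax_mapping(mapping) && mapping->nrpages) ||
	       (mock_dax_mapping(mapping) && mapping->nrexceptional);
}
```

Note that the counter belonging to the other mode is ignored: a non-DAX
mapping holding only exceptional entries, or a DAX mapping with a nonzero
nrpages, does not count as needing writeback.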



^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v2 1/4] mm: consolidate dax / non-dax checks for writeback
@ 2017-07-26 17:55   ` Jeff Layton
  0 siblings, 0 replies; 87+ messages in thread
From: Jeff Layton @ 2017-07-26 17:55 UTC (permalink / raw)
  To: Alexander Viro, Jan Kara
  Cc: J . Bruce Fields, Andrew Morton, linux-fsdevel, linux-kernel,
	linux-mm, Matthew Wilcox, Bob Peterson, Steven Whitehouse,
	cluster-devel

From: Jeff Layton <jlayton@redhat.com>

We have this complex conditional copied to several places. Turn it into
a helper function.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 mm/filemap.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index e1cca770688f..72e46e6f0d9a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -522,12 +522,17 @@ int filemap_fdatawait(struct address_space *mapping)
 }
 EXPORT_SYMBOL(filemap_fdatawait);
 
+static bool mapping_needs_writeback(struct address_space *mapping)
+{
+	return (!dax_mapping(mapping) && mapping->nrpages) ||
+	    (dax_mapping(mapping) && mapping->nrexceptional);
+}
+
 int filemap_write_and_wait(struct address_space *mapping)
 {
 	int err = 0;
 
-	if ((!dax_mapping(mapping) && mapping->nrpages) ||
-	    (dax_mapping(mapping) && mapping->nrexceptional)) {
+	if (mapping_needs_writeback(mapping)) {
 		err = filemap_fdatawrite(mapping);
 		/*
 		 * Even if the above returned error, the pages may be
@@ -566,8 +571,7 @@ int filemap_write_and_wait_range(struct address_space *mapping,
 {
 	int err = 0;
 
-	if ((!dax_mapping(mapping) && mapping->nrpages) ||
-	    (dax_mapping(mapping) && mapping->nrexceptional)) {
+	if (mapping_needs_writeback(mapping)) {
 		err = __filemap_fdatawrite_range(mapping, lstart, lend,
 						 WB_SYNC_ALL);
 		/* See comment of filemap_write_and_wait() */
@@ -656,8 +660,7 @@ int file_write_and_wait_range(struct file *file, loff_t lstart, loff_t lend)
 	int err = 0, err2;
 	struct address_space *mapping = file->f_mapping;
 
-	if ((!dax_mapping(mapping) && mapping->nrpages) ||
-	    (dax_mapping(mapping) && mapping->nrexceptional)) {
+	if (mapping_needs_writeback(mapping)) {
 		err = __filemap_fdatawrite_range(mapping, lstart, lend,
 						 WB_SYNC_ALL);
 		/* See comment of filemap_write_and_wait() */
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [Cluster-devel] [PATCH v2 1/4] mm: consolidate dax / non-dax checks for writeback
  2017-07-26 17:55   ` Jeff Layton
  (?)
@ 2017-07-27  8:43     ` Jan Kara
  -1 siblings, 0 replies; 87+ messages in thread
From: Jan Kara @ 2017-07-27  8:43 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed 26-07-17 13:55:35, Jeff Layton wrote:
> From: Jeff Layton <jlayton@redhat.com>
> 
> We have this complex conditional copied to several places. Turn it into
> a helper function.
> 
> Signed-off-by: Jeff Layton <jlayton@redhat.com>

Looks good. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  mm/filemap.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index e1cca770688f..72e46e6f0d9a 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -522,12 +522,17 @@ int filemap_fdatawait(struct address_space *mapping)
>  }
>  EXPORT_SYMBOL(filemap_fdatawait);
>  
> +static bool mapping_needs_writeback(struct address_space *mapping)
> +{
> +	return (!dax_mapping(mapping) && mapping->nrpages) ||
> +	    (dax_mapping(mapping) && mapping->nrexceptional);
> +}
> +
>  int filemap_write_and_wait(struct address_space *mapping)
>  {
>  	int err = 0;
>  
> -	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> -	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> +	if (mapping_needs_writeback(mapping)) {
>  		err = filemap_fdatawrite(mapping);
>  		/*
>  		 * Even if the above returned error, the pages may be
> @@ -566,8 +571,7 @@ int filemap_write_and_wait_range(struct address_space *mapping,
>  {
>  	int err = 0;
>  
> -	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> -	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> +	if (mapping_needs_writeback(mapping)) {
>  		err = __filemap_fdatawrite_range(mapping, lstart, lend,
>  						 WB_SYNC_ALL);
>  		/* See comment of filemap_write_and_wait() */
> @@ -656,8 +660,7 @@ int file_write_and_wait_range(struct file *file, loff_t lstart, loff_t lend)
>  	int err = 0, err2;
>  	struct address_space *mapping = file->f_mapping;
>  
> -	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> -	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> +	if (mapping_needs_writeback(mapping)) {
>  		err = __filemap_fdatawrite_range(mapping, lstart, lend,
>  						 WB_SYNC_ALL);
>  		/* See comment of filemap_write_and_wait() */
> -- 
> 2.13.3
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR



^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 1/4] mm: consolidate dax / non-dax checks for writeback
@ 2017-07-27  8:43     ` Jan Kara
  0 siblings, 0 replies; 87+ messages in thread
From: Jan Kara @ 2017-07-27  8:43 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Alexander Viro, Jan Kara, J . Bruce Fields, Andrew Morton,
	linux-fsdevel, linux-kernel, linux-mm, Matthew Wilcox,
	Bob Peterson, Steven Whitehouse, cluster-devel

On Wed 26-07-17 13:55:35, Jeff Layton wrote:
> From: Jeff Layton <jlayton@redhat.com>
> 
> We have this complex conditional copied to several places. Turn it into
> a helper function.
> 
> Signed-off-by: Jeff Layton <jlayton@redhat.com>

Looks good. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  mm/filemap.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index e1cca770688f..72e46e6f0d9a 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -522,12 +522,17 @@ int filemap_fdatawait(struct address_space *mapping)
>  }
>  EXPORT_SYMBOL(filemap_fdatawait);
>  
> +static bool mapping_needs_writeback(struct address_space *mapping)
> +{
> +	return (!dax_mapping(mapping) && mapping->nrpages) ||
> +	    (dax_mapping(mapping) && mapping->nrexceptional);
> +}
> +
>  int filemap_write_and_wait(struct address_space *mapping)
>  {
>  	int err = 0;
>  
> -	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> -	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> +	if (mapping_needs_writeback(mapping)) {
>  		err = filemap_fdatawrite(mapping);
>  		/*
>  		 * Even if the above returned error, the pages may be
> @@ -566,8 +571,7 @@ int filemap_write_and_wait_range(struct address_space *mapping,
>  {
>  	int err = 0;
>  
> -	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> -	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> +	if (mapping_needs_writeback(mapping)) {
>  		err = __filemap_fdatawrite_range(mapping, lstart, lend,
>  						 WB_SYNC_ALL);
>  		/* See comment of filemap_write_and_wait() */
> @@ -656,8 +660,7 @@ int file_write_and_wait_range(struct file *file, loff_t lstart, loff_t lend)
>  	int err = 0, err2;
>  	struct address_space *mapping = file->f_mapping;
>  
> -	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> -	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> +	if (mapping_needs_writeback(mapping)) {
>  		err = __filemap_fdatawrite_range(mapping, lstart, lend,
>  						 WB_SYNC_ALL);
>  		/* See comment of filemap_write_and_wait() */
> -- 
> 2.13.3
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
  2017-07-26 17:55 ` Jeff Layton
  (?)
@ 2017-07-26 17:55   ` Jeff Layton
  -1 siblings, 0 replies; 87+ messages in thread
From: Jeff Layton @ 2017-07-26 17:55 UTC (permalink / raw)
  To: cluster-devel.redhat.com

From: Jeff Layton <jlayton@redhat.com>

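Some filesystem fsync routines will need these helpers, which wait on or
write back a file and then report errors against its file->f_wb_err cursor.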

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 include/linux/fs.h |  7 ++++++-
 mm/filemap.c       | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 21e7df1ad613..bc57a79294f0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2544,6 +2544,8 @@ extern int filemap_fdatawait_range(struct address_space *, loff_t lstart,
 				   loff_t lend);
 extern bool filemap_range_has_page(struct address_space *, loff_t lstart,
 				  loff_t lend);
+extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart,
+						loff_t lend);
 extern int filemap_write_and_wait(struct address_space *mapping);
 extern int filemap_write_and_wait_range(struct address_space *mapping,
 				        loff_t lstart, loff_t lend);
@@ -2552,11 +2554,14 @@ extern int __filemap_fdatawrite_range(struct address_space *mapping,
 extern int filemap_fdatawrite_range(struct address_space *mapping,
 				loff_t start, loff_t end);
 extern int filemap_check_errors(struct address_space *mapping);
-
 extern void __filemap_set_wb_err(struct address_space *mapping, int err);
+
+extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart,
+						loff_t lend);
 extern int __must_check file_check_and_advance_wb_err(struct file *file);
 extern int __must_check file_write_and_wait_range(struct file *file,
 						loff_t start, loff_t end);
+extern int __must_check file_write_and_wait(struct file *file);
 
 /**
  * filemap_set_wb_err - set a writeback error on an address_space
diff --git a/mm/filemap.c b/mm/filemap.c
index 72e46e6f0d9a..b904a8dfa43d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -476,6 +476,29 @@ int filemap_fdatawait_range(struct address_space *mapping, loff_t start_byte,
 EXPORT_SYMBOL(filemap_fdatawait_range);
 
 /**
+ * file_fdatawait_range - wait for writeback to complete
+ * @file:		file pointing to address space structure to wait for
+ * @start_byte:		offset in bytes where the range starts
+ * @end_byte:		offset in bytes where the range ends (inclusive)
+ *
+ * Walk the list of under-writeback pages of the address space that file
+ * refers to, in the given range and wait for all of them.  Check error
+ * status of the address space vs. the file->f_wb_err cursor and return it.
+ *
+ * Since the error status of the file is advanced by this function,
+ * callers are responsible for checking the return value and handling and/or
+ * reporting the error.
+ */
+int file_fdatawait_range(struct file *file, loff_t start_byte, loff_t end_byte)
+{
+	struct address_space *mapping = file->f_mapping;
+
+	__filemap_fdatawait_range(mapping, start_byte, end_byte);
+	return file_check_and_advance_wb_err(file);
+}
+EXPORT_SYMBOL(file_fdatawait_range);
+
+/**
  * filemap_fdatawait_keep_errors - wait for writeback without clearing errors
  * @mapping: address space structure to wait for
  *
@@ -675,6 +698,39 @@ int file_write_and_wait_range(struct file *file, loff_t lstart, loff_t lend)
 EXPORT_SYMBOL(file_write_and_wait_range);
 
 /**
+ * file_write_and_wait - write out whole file and wait on it and return any
+ * 			 writeback errors since we last checked
+ * @file: file to write back and wait on
+ *
+ * Write back the whole file and wait on its mapping. Afterward, check for
+ * errors that may have occurred since our file->f_wb_err cursor was last
+ * updated.
+ */
+int file_write_and_wait(struct file *file)
+{
+	int err = 0, err2;
+	struct address_space *mapping = file->f_mapping;
+
+	if ((!dax_mapping(mapping) && mapping->nrpages) ||
+	    (dax_mapping(mapping) && mapping->nrexceptional)) {
+		err = filemap_fdatawrite(mapping);
+		/* See comment of filemap_write_and_wait() */
+		if (err != -EIO) {
+			loff_t i_size = i_size_read(mapping->host);
+
+			if (i_size != 0)
+				__filemap_fdatawait_range(mapping, 0,
+							  i_size - 1);
+		}
+	}
+	err2 = file_check_and_advance_wb_err(file);
+	if (!err)
+		err = err2;
+	return err;
+}
+EXPORT_SYMBOL(file_write_and_wait);
+
+/**
  * replace_page_cache_page - replace a pagecache page with a new one
  * @old:	page to be replaced
  * @new:	page to replace with
-- 
2.13.3
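
The control flow of file_write_and_wait() in the patch above can be modelled
in userspace to show how the two error sources are combined. This is a sketch
under stated assumptions, not kernel code: struct sketch_file and the
sketch_* functions are hypothetical, the needs_writeback flag stands in for
the dax / non-dax check, write_result for the return value of
filemap_fdatawrite(), and pending_wb_err for whatever
file_check_and_advance_wb_err() would report.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state file_write_and_wait() consults. */
struct sketch_file {
	bool needs_writeback; /* result of the dax / non-dax check */
	int write_result;     /* what starting writeback would return */
	long long size;       /* stands in for i_size_read() */
	int pending_wb_err;   /* error not yet reported to this file */
	bool waited;          /* set when the sketch "waits" on the range */
};

/* Report the pending error once and clear it, like advancing the cursor. */
static int sketch_check_and_advance(struct sketch_file *file)
{
	int err = file->pending_wb_err;

	file->pending_wb_err = 0;
	return err;
}

static int sketch_write_and_wait(struct sketch_file *file)
{
	int err = 0, err2;

	file->waited = false;
	if (file->needs_writeback) {
		err = file->write_result;
		/* As in the patch, the wait is skipped only for -EIO. */
		if (err != -EIO && file->size != 0)
			file->waited = true;
	}
	/* The cursor is checked and advanced even when nothing was written. */
	err2 = sketch_check_and_advance(file);
	if (!err)
		err = err2;
	return err;
}
```

In this model an error returned when starting writeback takes precedence over
the cursor-based error, but the cursor is advanced either way, so the pending
error is not reported again on the next call.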



^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
@ 2017-07-26 17:55   ` Jeff Layton
  0 siblings, 0 replies; 87+ messages in thread
From: Jeff Layton @ 2017-07-26 17:55 UTC (permalink / raw)
  To: Alexander Viro, Jan Kara
  Cc: J . Bruce Fields, Andrew Morton, linux-fsdevel, linux-kernel,
	linux-mm, Matthew Wilcox, Bob Peterson, Steven Whitehouse,
	cluster-devel

From: Jeff Layton <jlayton@redhat.com>

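Some filesystem fsync routines will need these helpers, which wait on or
write back a file and then report errors against its file->f_wb_err cursor.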

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 include/linux/fs.h |  7 ++++++-
 mm/filemap.c       | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 21e7df1ad613..bc57a79294f0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2544,6 +2544,8 @@ extern int filemap_fdatawait_range(struct address_space *, loff_t lstart,
 				   loff_t lend);
 extern bool filemap_range_has_page(struct address_space *, loff_t lstart,
 				  loff_t lend);
+extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart,
+						loff_t lend);
 extern int filemap_write_and_wait(struct address_space *mapping);
 extern int filemap_write_and_wait_range(struct address_space *mapping,
 				        loff_t lstart, loff_t lend);
@@ -2552,11 +2554,14 @@ extern int __filemap_fdatawrite_range(struct address_space *mapping,
 extern int filemap_fdatawrite_range(struct address_space *mapping,
 				loff_t start, loff_t end);
 extern int filemap_check_errors(struct address_space *mapping);
-
 extern void __filemap_set_wb_err(struct address_space *mapping, int err);
+
+extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart,
+						loff_t lend);
 extern int __must_check file_check_and_advance_wb_err(struct file *file);
 extern int __must_check file_write_and_wait_range(struct file *file,
 						loff_t start, loff_t end);
+extern int __must_check file_write_and_wait(struct file *file);
 
 /**
  * filemap_set_wb_err - set a writeback error on an address_space
diff --git a/mm/filemap.c b/mm/filemap.c
index 72e46e6f0d9a..b904a8dfa43d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -476,6 +476,29 @@ int filemap_fdatawait_range(struct address_space *mapping, loff_t start_byte,
 EXPORT_SYMBOL(filemap_fdatawait_range);
 
 /**
+ * file_fdatawait_range - wait for writeback to complete
+ * @file:		file pointing to address space structure to wait for
+ * @start_byte:		offset in bytes where the range starts
+ * @end_byte:		offset in bytes where the range ends (inclusive)
+ *
+ * Walk the list of under-writeback pages of the address space that file
+ * refers to, in the given range and wait for all of them.  Check error
+ * status of the address space vs. the file->f_wb_err cursor and return it.
+ *
+ * Since the error status of the file is advanced by this function,
+ * callers are responsible for checking the return value and handling and/or
+ * reporting the error.
+ */
+int file_fdatawait_range(struct file *file, loff_t start_byte, loff_t end_byte)
+{
+	struct address_space *mapping = file->f_mapping;
+
+	__filemap_fdatawait_range(mapping, start_byte, end_byte);
+	return file_check_and_advance_wb_err(file);
+}
+EXPORT_SYMBOL(file_fdatawait_range);
+
+/**
  * filemap_fdatawait_keep_errors - wait for writeback without clearing errors
  * @mapping: address space structure to wait for
  *
@@ -675,6 +698,39 @@ int file_write_and_wait_range(struct file *file, loff_t lstart, loff_t lend)
 EXPORT_SYMBOL(file_write_and_wait_range);
 
 /**
+ * file_write_and_wait - write out whole file and wait on it and return any
+ * 			 writeback errors since we last checked
+ * @file: file to write back and wait on
+ *
+ * Write back the whole file and wait on its mapping. Afterward, check for
+ * errors that may have occurred since our file->f_wb_err cursor was last
+ * updated.
+ */
+int file_write_and_wait(struct file *file)
+{
+	int err = 0, err2;
+	struct address_space *mapping = file->f_mapping;
+
+	if ((!dax_mapping(mapping) && mapping->nrpages) ||
+	    (dax_mapping(mapping) && mapping->nrexceptional)) {
+		err = filemap_fdatawrite(mapping);
+		/* See comment of filemap_write_and_wait() */
+		if (err != -EIO) {
+			loff_t i_size = i_size_read(mapping->host);
+
+			if (i_size != 0)
+				__filemap_fdatawait_range(mapping, 0,
+							  i_size - 1);
+		}
+	}
+	err2 = file_check_and_advance_wb_err(file);
+	if (!err)
+		err = err2;
+	return err;
+}
+EXPORT_SYMBOL(file_write_and_wait);
+
+/**
  * replace_page_cache_page - replace a pagecache page with a new one
  * @old:	page to be replaced
  * @new:	page to replace with
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
  2017-07-26 17:55   ` Jeff Layton
  (?)
@ 2017-07-26 19:13     ` Matthew Wilcox
  -1 siblings, 0 replies; 87+ messages in thread
From: Matthew Wilcox @ 2017-07-26 19:13 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed, Jul 26, 2017 at 01:55:36PM -0400, Jeff Layton wrote:
> +int file_write_and_wait(struct file *file)
> +{
> +	int err = 0, err2;
> +	struct address_space *mapping = file->f_mapping;
> +
> +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> +	    (dax_mapping(mapping) && mapping->nrexceptional)) {

Since patch 1 exists, shouldn't this use the new helper?

> +		err = filemap_fdatawrite(mapping);
> +		/* See comment of filemap_write_and_wait() */
> +		if (err != -EIO) {
> +			loff_t i_size = i_size_read(mapping->host);
> +
> +			if (i_size != 0)
> +				__filemap_fdatawait_range(mapping, 0,
> +							  i_size - 1);
> +		}
> +	}
> +	err2 = file_check_and_advance_wb_err(file);
> +	if (!err)
> +		err = err2;
> +	return err;

Would this be clearer written as:

	if (err)
		return err;
	return err2;

or even ...

	return err ? err : err2;



^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
@ 2017-07-26 19:13     ` Matthew Wilcox
  0 siblings, 0 replies; 87+ messages in thread
From: Matthew Wilcox @ 2017-07-26 19:13 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Alexander Viro, Jan Kara, J . Bruce Fields, Andrew Morton,
	linux-fsdevel, linux-kernel, linux-mm, Bob Peterson,
	Steven Whitehouse, cluster-devel

On Wed, Jul 26, 2017 at 01:55:36PM -0400, Jeff Layton wrote:
> +int file_write_and_wait(struct file *file)
> +{
> +	int err = 0, err2;
> +	struct address_space *mapping = file->f_mapping;
> +
> +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> +	    (dax_mapping(mapping) && mapping->nrexceptional)) {

Since patch 1 exists, shouldn't this use the new helper?

> +		err = filemap_fdatawrite(mapping);
> +		/* See comment of filemap_write_and_wait() */
> +		if (err != -EIO) {
> +			loff_t i_size = i_size_read(mapping->host);
> +
> +			if (i_size != 0)
> +				__filemap_fdatawait_range(mapping, 0,
> +							  i_size - 1);
> +		}
> +	}
> +	err2 = file_check_and_advance_wb_err(file);
> +	if (!err)
> +		err = err2;
> +	return err;

Would this be clearer written as:

	if (err)
		return err;
	return err2;

or even ...

	return err ? err : err2;
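
For reference, the three spellings under discussion (the patch's form and
the two alternatives suggested above) can be compared side by side. This is
a minimal userspace sketch, not kernel code; the function names are
hypothetical and exist only for illustration. All three implement the same
"first error wins" rule.

```c
#include <errno.h>

/* Form used in the patch: take err2 only if no error was seen yet. */
static int combine_assign(int err, int err2)
{
	if (!err)
		err = err2;
	return err;
}

/* Early-return form. */
static int combine_early_return(int err, int err2)
{
	if (err)
		return err;
	return err2;
}

/* Conditional-expression form. */
static int combine_ternary(int err, int err2)
{
	return err ? err : err2;
}
```

Since the behavior is identical, the choice between them is purely one of
readability.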

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
  2017-07-26 19:13     ` Matthew Wilcox
  (?)
@ 2017-07-26 22:18       ` Jeff Layton
  -1 siblings, 0 replies; 87+ messages in thread
From: Jeff Layton @ 2017-07-26 22:18 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed, 2017-07-26 at 12:13 -0700, Matthew Wilcox wrote:
> On Wed, Jul 26, 2017 at 01:55:36PM -0400, Jeff Layton wrote:
> > +int file_write_and_wait(struct file *file)
> > +{
> > +	int err = 0, err2;
> > +	struct address_space *mapping = file->f_mapping;
> > +
> > +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> > +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> 
> Since patch 1 exists, shouldn't this use the new helper?
> 

<facepalm>

yes, will fix


> > +		err = filemap_fdatawrite(mapping);
> > +		/* See comment of filemap_write_and_wait() */
> > +		if (err != -EIO) {
> > +			loff_t i_size = i_size_read(mapping->host);
> > +
> > +			if (i_size != 0)
> > +				__filemap_fdatawait_range(mapping, 0,
> > +							  i_size - 1);
> > +		}
> > +	}
> > +	err2 = file_check_and_advance_wb_err(file);
> > +	if (!err)
> > +		err = err2;
> > +	return err;
> 
> Would this be clearer written as:
> 
> 	if (err)
> 		return err;
> 	return err2;
> 
> or even ...
> 
> 	return err ? err : err2;
> 

Meh -- I like it the way I have it. If we don't have an error already,
then just take the one from the check and advance.

That said, I don't have a terribly strong preference here, so if anyone
does, then I can be easily persuaded.

-- 
Jeff Layton <jlayton@redhat.com>



^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
@ 2017-07-26 22:18       ` Jeff Layton
  0 siblings, 0 replies; 87+ messages in thread
From: Jeff Layton @ 2017-07-26 22:18 UTC (permalink / raw)
  To: Matthew Wilcox, Jeff Layton
  Cc: Alexander Viro, Jan Kara, J . Bruce Fields, Andrew Morton,
	linux-fsdevel, linux-kernel, linux-mm, Bob Peterson,
	Steven Whitehouse, cluster-devel

On Wed, 2017-07-26 at 12:13 -0700, Matthew Wilcox wrote:
> On Wed, Jul 26, 2017 at 01:55:36PM -0400, Jeff Layton wrote:
> > +int file_write_and_wait(struct file *file)
> > +{
> > +	int err = 0, err2;
> > +	struct address_space *mapping = file->f_mapping;
> > +
> > +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> > +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> 
> Since patch 1 exists, shouldn't this use the new helper?
> 

<facepalm>

yes, will fix


> > +		err = filemap_fdatawrite(mapping);
> > +		/* See comment of filemap_write_and_wait() */
> > +		if (err != -EIO) {
> > +			loff_t i_size = i_size_read(mapping->host);
> > +
> > +			if (i_size != 0)
> > +				__filemap_fdatawait_range(mapping, 0,
> > +							  i_size - 1);
> > +		}
> > +	}
> > +	err2 = file_check_and_advance_wb_err(file);
> > +	if (!err)
> > +		err = err2;
> > +	return err;
> 
> Would this be clearer written as:
> 
> 	if (err)
> 		return err;
> 	return err2;
> 
> or even ...
> 
> 	return err ? err : err2;
> 

Meh -- I like it the way I have it. If we don't have an error already,
then just take the one from the check and advance.

That said, I don't have a terribly strong preference here, so if anyone
does, then I can be easily persuaded.

-- 
Jeff Layton <jlayton@redhat.com>
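
The "check and advance" step referred to above can be pictured with a
deliberately simplified userspace model. Everything here is an assumption
made for illustration: the toy_* names are hypothetical, and the real
errseq_t packs an error code, a counter and a "seen" flag into one value
rather than using two separate fields. The sketch only shows the idea that
each open file carries its own cursor, so every file sees a given writeback
error exactly once.

```c
#include <errno.h>

/* Toy stand-ins for the kernel structures; NOT the real errseq_t layout. */
struct toy_mapping {
	unsigned int wb_seq;	/* bumped every time an error is recorded */
	int wb_err;		/* most recent writeback error (negative errno) */
};

struct toy_file {
	struct toy_mapping *mapping;
	unsigned int seen_seq;	/* cursor: last sequence this file observed */
};

/* Record a writeback error on the mapping. */
static void toy_set_wb_err(struct toy_mapping *m, int err)
{
	m->wb_err = err;
	m->wb_seq++;
}

/* At open time, sample the sequence so that older errors are not reported. */
static void toy_open(struct toy_file *f, struct toy_mapping *m)
{
	f->mapping = m;
	f->seen_seq = m->wb_seq;
}

/* Return the error if one was recorded since the cursor, then advance it. */
static int toy_check_and_advance(struct toy_file *f)
{
	if (f->seen_seq == f->mapping->wb_seq)
		return 0;
	f->seen_seq = f->mapping->wb_seq;
	return f->mapping->wb_err;
}
```

In this model, two files opened on the same mapping before an error is
recorded each get the error from their first check, and 0 from a second
check on the same file.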

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
  2017-07-26 17:55   ` Jeff Layton
  (?)
@ 2017-07-26 19:50     ` Bob Peterson
  -1 siblings, 0 replies; 87+ messages in thread
From: Bob Peterson @ 2017-07-26 19:50 UTC (permalink / raw)
  To: cluster-devel.redhat.com

----- Original Message -----
| From: Jeff Layton <jlayton@redhat.com>
| 
| Some filesystem fsync routines will need these.
| 
| Signed-off-by: Jeff Layton <jlayton@redhat.com>
| ---
|  include/linux/fs.h |  7 ++++++-
|  mm/filemap.c       | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
|  2 files changed, 62 insertions(+), 1 deletion(-)
(snip)
| diff --git a/mm/filemap.c b/mm/filemap.c
| index 72e46e6f0d9a..b904a8dfa43d 100644
| --- a/mm/filemap.c
| +++ b/mm/filemap.c
(snip)
| @@ -675,6 +698,39 @@ int file_write_and_wait_range(struct file *file, loff_t lstart, loff_t lend)
|  EXPORT_SYMBOL(file_write_and_wait_range);
|  
|  /**
| + * file_write_and_wait - write out whole file and wait on it and return any
| + * 			 writeback errors since we last checked
| + * @file: file to write back and wait on
| + *
| + * Write back the whole file and wait on its mapping. Afterward, check for
| + * errors that may have occurred since our file->f_wb_err cursor was last
| + * updated.
| + */
| +int file_write_and_wait(struct file *file)
| +{
| +	int err = 0, err2;
| +	struct address_space *mapping = file->f_mapping;
| +
| +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
| +	    (dax_mapping(mapping) && mapping->nrexceptional)) {

Seems like we should make the new function mapping_needs_writeback more
central (mm.h or fs.h?) and call it here ^.

| +		err = filemap_fdatawrite(mapping);
| +		/* See comment of filemap_write_and_wait() */
| +		if (err != -EIO) {
| +			loff_t i_size = i_size_read(mapping->host);
| +
| +			if (i_size != 0)
| +				__filemap_fdatawait_range(mapping, 0,
| +							  i_size - 1);
| +		}
| +	}
| +	err2 = file_check_and_advance_wb_err(file);
| +	if (!err)
| +		err = err2;
| +	return err;

In the past, I've seen more elegant constructs like:
        return (err ? err : err2);
but I don't know what's considered more ugly or hackish.

Regards,

Bob Peterson
Red Hat File Systems



^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
@ 2017-07-26 19:50     ` Bob Peterson
  0 siblings, 0 replies; 87+ messages in thread
From: Bob Peterson @ 2017-07-26 19:50 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Alexander Viro, Jan Kara, J . Bruce Fields, Andrew Morton,
	linux-fsdevel, linux-kernel, linux-mm, Matthew Wilcox,
	Steven Whitehouse, cluster-devel

----- Original Message -----
| From: Jeff Layton <jlayton@redhat.com>
| 
| Some filesystem fsync routines will need these.
| 
| Signed-off-by: Jeff Layton <jlayton@redhat.com>
| ---
|  include/linux/fs.h |  7 ++++++-
|  mm/filemap.c       | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
|  2 files changed, 62 insertions(+), 1 deletion(-)
(snip)
| diff --git a/mm/filemap.c b/mm/filemap.c
| index 72e46e6f0d9a..b904a8dfa43d 100644
| --- a/mm/filemap.c
| +++ b/mm/filemap.c
(snip)
| @@ -675,6 +698,39 @@ int file_write_and_wait_range(struct file *file, loff_t lstart, loff_t lend)
|  EXPORT_SYMBOL(file_write_and_wait_range);
|  
|  /**
| + * file_write_and_wait - write out whole file and wait on it and return any
| + * 			 writeback errors since we last checked
| + * @file: file to write back and wait on
| + *
| + * Write back the whole file and wait on its mapping. Afterward, check for
| + * errors that may have occurred since our file->f_wb_err cursor was last
| + * updated.
| + */
| +int file_write_and_wait(struct file *file)
| +{
| +	int err = 0, err2;
| +	struct address_space *mapping = file->f_mapping;
| +
| +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
| +	    (dax_mapping(mapping) && mapping->nrexceptional)) {

Seems like we should make the new function mapping_needs_writeback more
central (mm.h or fs.h?) and call it here ^.

| +		err = filemap_fdatawrite(mapping);
| +		/* See comment of filemap_write_and_wait() */
| +		if (err != -EIO) {
| +			loff_t i_size = i_size_read(mapping->host);
| +
| +			if (i_size != 0)
| +				__filemap_fdatawait_range(mapping, 0,
| +							  i_size - 1);
| +		}
| +	}
| +	err2 = file_check_and_advance_wb_err(file);
| +	if (!err)
| +		err = err2;
| +	return err;

In the past, I've seen more elegant constructs like:
        return (err ? err : err2);
but I don't know what's considered more ugly or hackish.

Regards,

Bob Peterson
Red Hat File Systems
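
The condition being discussed can be sketched as a single predicate. This is
a userspace model under stated assumptions: struct toy_mapping and its
is_dax field are hypothetical stand-ins for struct address_space and
dax_mapping(), and the body simply mirrors the open-coded test quoted above
rather than the actual helper from patch 1, which is not shown in this
message.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the fields the check looks at. */
struct toy_mapping {
	bool is_dax;			/* models dax_mapping(mapping) */
	unsigned long nrpages;		/* page cache pages */
	unsigned long nrexceptional;	/* exceptional (e.g. DAX) entries */
};

/* Non-DAX mappings need writeback if they have pages;
 * DAX mappings need it if they have exceptional entries. */
static bool toy_mapping_needs_writeback(const struct toy_mapping *m)
{
	return (!m->is_dax && m->nrpages) ||
	       (m->is_dax && m->nrexceptional);
}
```

With a shared predicate like this, each caller reduces to a single
if (needs_writeback(mapping)) test instead of repeating the two-line
condition.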

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
  2017-07-26 17:55   ` Jeff Layton
  (?)
@ 2017-07-27  8:49     ` Jan Kara
  -1 siblings, 0 replies; 87+ messages in thread
From: Jan Kara @ 2017-07-27  8:49 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed 26-07-17 13:55:36, Jeff Layton wrote:
> +int file_write_and_wait(struct file *file)
> +{
> +	int err = 0, err2;
> +	struct address_space *mapping = file->f_mapping;
> +
> +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> +		err = filemap_fdatawrite(mapping);
> +		/* See comment of filemap_write_and_wait() */
> +		if (err != -EIO) {
> +			loff_t i_size = i_size_read(mapping->host);
> +
> +			if (i_size != 0)
> +				__filemap_fdatawait_range(mapping, 0,
> +							  i_size - 1);
> +		}
> +	}

Err, what's the i_size check doing here? I'd just pass ~0 as the end of the
range and ignore i_size. It is much easier than trying to wrap your head
around possible races with file operations modifying i_size.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR



^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
@ 2017-07-27  8:49     ` Jan Kara
  0 siblings, 0 replies; 87+ messages in thread
From: Jan Kara @ 2017-07-27  8:49 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Alexander Viro, Jan Kara, J . Bruce Fields, Andrew Morton,
	linux-fsdevel, linux-kernel, linux-mm, Matthew Wilcox,
	Bob Peterson, Steven Whitehouse, cluster-devel

On Wed 26-07-17 13:55:36, Jeff Layton wrote:
> +int file_write_and_wait(struct file *file)
> +{
> +	int err = 0, err2;
> +	struct address_space *mapping = file->f_mapping;
> +
> +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> +		err = filemap_fdatawrite(mapping);
> +		/* See comment of filemap_write_and_wait() */
> +		if (err != -EIO) {
> +			loff_t i_size = i_size_read(mapping->host);
> +
> +			if (i_size != 0)
> +				__filemap_fdatawait_range(mapping, 0,
> +							  i_size - 1);
> +		}
> +	}

Err, what's the i_size check doing here? I'd just pass ~0 as the end of the
range and ignore i_size. It is much easier than trying to wrap your head
around possible races with file operations modifying i_size.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
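
The difference between the two range choices can be illustrated with a toy
userspace model. It is only a sketch under assumptions: the toy_* helpers
are hypothetical, pages are represented by their starting byte offsets, and
whether a stale i_size snapshot can actually occur at this point in the
kernel depends on locking that this model does not capture.

```c
#include <limits.h>
#include <stddef.h>

/* Count pages "under writeback" whose first byte lies in [start, end]. */
static size_t toy_pages_waited(const long long *wb_page_offsets, size_t n,
			       long long start, long long end)
{
	size_t i, waited = 0;

	for (i = 0; i < n; i++)
		if (wb_page_offsets[i] >= start && wb_page_offsets[i] <= end)
			waited++;
	return waited;
}

/* Range end as in the patch: last byte of the file per an i_size snapshot.
 * The caller is expected to skip the wait entirely when i_size is 0. */
static long long toy_end_from_size(long long i_size)
{
	return i_size - 1;
}
```

For example, with pages at offsets 0, 4096 and 8192 and an i_size snapshot
of 8192 taken before the file grew, a wait bounded by toy_end_from_size()
covers two pages, while a wait bounded by LLONG_MAX covers all three without
having to reason about i_size at all.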

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
  2017-07-27  8:49     ` Jan Kara
  (?)
@ 2017-07-27 12:48       ` Jeff Layton
  -1 siblings, 0 replies; 87+ messages in thread
From: Jeff Layton @ 2017-07-27 12:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote:
> On Wed 26-07-17 13:55:36, Jeff Layton wrote:
> > +int file_write_and_wait(struct file *file)
> > +{
> > +	int err = 0, err2;
> > +	struct address_space *mapping = file->f_mapping;
> > +
> > +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> > +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> > +		err = filemap_fdatawrite(mapping);
> > +		/* See comment of filemap_write_and_wait() */
> > +		if (err != -EIO) {
> > +			loff_t i_size = i_size_read(mapping->host);
> > +
> > +			if (i_size != 0)
> > +				__filemap_fdatawait_range(mapping, 0,
> > +							  i_size - 1);
> > +		}
> > +	}
> 
> Err, what's the i_size check doing here? I'd just pass ~0 as the end of the
> range and ignore i_size. It is much easier than trying to wrap your head
> around possible races with file operations modifying i_size.
> 
> 								Honza

I'm basically emulating _exactly_ what filemap_write_and_wait does here,
as I'm leery of making subtle behavior changes in the actual writeback
behavior. For example:

-----------------8<----------------
static inline int __filemap_fdatawrite(struct address_space *mapping,
        int sync_mode)
{
        return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode);
}

int filemap_fdatawrite(struct address_space *mapping)
{
        return __filemap_fdatawrite(mapping, WB_SYNC_ALL);
}
EXPORT_SYMBOL(filemap_fdatawrite);
-----------------8<----------------

...which then sets up the wbc with the right ranges and sync mode and
kicks off writepages. But then, it does the i_size_read to figure out
what range it should wait on (with the shortcut for the size == 0 case).

My assumption was that it was intentionally designed that way, but I'm
guessing from your comments that it wasn't? If so, then we can turn
file_write_and_wait into a static inline wrapper around
file_write_and_wait_range.
-- 
Jeff Layton <jlayton@redhat.com>



^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
@ 2017-07-27 12:48       ` Jeff Layton
  0 siblings, 0 replies; 87+ messages in thread
From: Jeff Layton @ 2017-07-27 12:48 UTC (permalink / raw)
  To: Jan Kara, Jeff Layton
  Cc: Alexander Viro, J . Bruce Fields, Andrew Morton, linux-fsdevel,
	linux-kernel, linux-mm, Matthew Wilcox, Bob Peterson,
	Steven Whitehouse, cluster-devel

On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote:
> On Wed 26-07-17 13:55:36, Jeff Layton wrote:
> > +int file_write_and_wait(struct file *file)
> > +{
> > +	int err = 0, err2;
> > +	struct address_space *mapping = file->f_mapping;
> > +
> > +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> > +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> > +		err = filemap_fdatawrite(mapping);
> > +		/* See comment of filemap_write_and_wait() */
> > +		if (err != -EIO) {
> > +			loff_t i_size = i_size_read(mapping->host);
> > +
> > +			if (i_size != 0)
> > +				__filemap_fdatawait_range(mapping, 0,
> > +							  i_size - 1);
> > +		}
> > +	}
> 
> Err, what's the i_size check doing here? I'd just pass ~0 as the end of the
> range and ignore i_size. It is much easier than trying to wrap your head
> around possible races with file operations modifying i_size.
> 
> 								Honza

I'm basically emulating _exactly_ what filemap_write_and_wait does here,
as I'm leery of making subtle behavior changes in the actual writeback
behavior. For example:

-----------------8<----------------
static inline int __filemap_fdatawrite(struct address_space *mapping,
        int sync_mode)
{
        return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode);
}

int filemap_fdatawrite(struct address_space *mapping)
{
        return __filemap_fdatawrite(mapping, WB_SYNC_ALL);
}
EXPORT_SYMBOL(filemap_fdatawrite);
-----------------8<----------------

...which then sets up the wbc with the right ranges and sync mode and
kicks off writepages. But then, it does the i_size_read to figure out
what range it should wait on (with the shortcut for the size == 0 case).

My assumption was that it was intentionally designed that way, but I'm
guessing from your comments that it wasn't? If so, then we can turn
file_write_and_wait into a static inline wrapper around
file_write_and_wait_range.
-- 
Jeff Layton <jlayton@redhat.com>
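
The wrapper idea floated above could look roughly like the following. This
is a userspace sketch, not the patch itself: struct toy_file and the stub
standing in for file_write_and_wait_range() are hypothetical, and the stub
only records the range it was asked to cover so that the wrapper's behavior
can be checked.

```c
#include <limits.h>

/* Hypothetical stand-in for struct file, just enough to observe the call. */
struct toy_file {
	long long last_start;	/* range recorded by the stub below */
	long long last_end;
	int pending_err;	/* error the stub should report */
};

/* Stub standing in for file_write_and_wait_range(); it only records args. */
static int toy_file_write_and_wait_range(struct toy_file *file,
					 long long start, long long end)
{
	file->last_start = start;
	file->last_end = end;
	return file->pending_err;
}

/* Whole-file variant as a thin wrapper over the ranged call. */
static inline int toy_file_write_and_wait(struct toy_file *file)
{
	return toy_file_write_and_wait_range(file, 0, LLONG_MAX);
}
```

The design point is that the whole-file case then needs no i_size read and
no separate error-combining logic; both live only in the ranged function.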

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
  2017-07-27 12:48       ` Jeff Layton
  (?)
@ 2017-07-31 11:27         ` Jeff Layton
  -1 siblings, 0 replies; 87+ messages in thread
From: Jeff Layton @ 2017-07-31 11:27 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote:
> On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote:
> > On Wed 26-07-17 13:55:36, Jeff Layton wrote:
> > > +int file_write_and_wait(struct file *file)
> > > +{
> > > +	int err = 0, err2;
> > > +	struct address_space *mapping = file->f_mapping;
> > > +
> > > +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> > > +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> > > +		err = filemap_fdatawrite(mapping);
> > > +		/* See comment of filemap_write_and_wait() */
> > > +		if (err != -EIO) {
> > > +			loff_t i_size = i_size_read(mapping->host);
> > > +
> > > +			if (i_size != 0)
> > > +				__filemap_fdatawait_range(mapping, 0,
> > > +							  i_size - 1);
> > > +		}
> > > +	}
> > 
> > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the
> > range and ignore i_size. It is much easier than trying to wrap your head
> > around possible races with file operations modifying i_size.
> > 
> > 								Honza
> 
> I'm basically emulating _exactly_ what filemap_write_and_wait does here,
> as I'm leery of making subtle behavior changes in the actual writeback
> behavior. For example:
> 
> -----------------8<----------------
> static inline int __filemap_fdatawrite(struct address_space *mapping,
>         int sync_mode)
> {
>         return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode);
> }
> 
> int filemap_fdatawrite(struct address_space *mapping)
> {
>         return __filemap_fdatawrite(mapping, WB_SYNC_ALL);
> }
> EXPORT_SYMBOL(filemap_fdatawrite);
> -----------------8<----------------
> 
> ...which then sets up the wbc with the right ranges and sync mode and
> kicks off writepages. But then, it does the i_size_read to figure out
> what range it should wait on (with the shortcut for the size == 0 case).
> 
> My assumption was that it was intentionally designed that way, but I'm
> guessing from your comments that it wasn't? If so, then we can turn
> file_write_and_wait into a static inline wrapper around
> file_write_and_wait_range.
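
A sketch of what that wrapper might look like, assuming
file_write_and_wait_range() takes a byte range with an inclusive end. The
struct file stub, the loff_t_sketch typedef and the recording stand-in for
the range function exist only so the snippet compiles in userspace; they
are not the kernel definitions.

```c
#include <assert.h>
#include <limits.h>

/* Userspace stand-ins so the sketch compiles outside the kernel. */
typedef long long loff_t_sketch;
struct file { int unused; };

static loff_t_sketch seen_start, seen_end;

/* Stub: records the requested range instead of doing any writeback. */
static int file_write_and_wait_range(struct file *file,
				     loff_t_sketch start, loff_t_sketch end)
{
	(void)file;
	assert(start <= end);
	seen_start = start;
	seen_end = end;
	return 0;
}

/* Whole-file variant: no i_size lookup, just the widest possible range. */
static inline int file_write_and_wait(struct file *file)
{
	return file_write_and_wait_range(file, 0, LLONG_MAX);
}
```

With this shape the whole-file call never reads i_size, so the race that
Honza mentions does not come up in this path.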

FWIW, I did a bit of archaeology in the linux-history tree and found
this patch from Marcelo in 2004. Is this optimization still helpful? If
not, then that does simplify the code a bit.

-------------------8<--------------------

[PATCH] small wait_on_page_writeback_range() optimization

filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end"
parameter.  This is not needed since we know the EOF from the inode.  Use
that instead.

Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---
 mm/filemap.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 78e18b7639b6..55fb7b4141e4 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range);
  */
 int filemap_fdatawait(struct address_space *mapping)
 {
-	return wait_on_page_writeback_range(mapping, 0, -1);
+	loff_t i_size = i_size_read(mapping->host);
+
+	if (i_size == 0)
+		return 0;
+
+	return wait_on_page_writeback_range(mapping, 0,
+				(i_size - 1) >> PAGE_CACHE_SHIFT);
 }
 EXPORT_SYMBOL(filemap_fdatawait);
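
For reference, the end index passed in the 2004 patch is the index of the
page holding the last byte of the file. A minimal userspace sketch of that
arithmetic, assuming 4 KiB pages (SKETCH_PAGE_SHIFT and last_page_index()
are made up for illustration and are not kernel symbols):

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the real page shift depends on the architecture. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Hypothetical helper: index of the page that holds the last byte of a
 * file of i_size bytes. Only meaningful for i_size > 0; the patch returns
 * early when i_size == 0 and never computes i_size - 1 in that case.
 */
static uint64_t last_page_index(int64_t i_size)
{
	assert(i_size > 0);
	return (uint64_t)(i_size - 1) >> SKETCH_PAGE_SHIFT;
}
```

Under that assumption a 4096-byte file ends at page index 0 and a
4097-byte file ends at page index 1.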



^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
  2017-07-31 11:27         ` Jeff Layton
  (?)
@ 2017-07-31 11:32           ` Steven Whitehouse
  -1 siblings, 0 replies; 87+ messages in thread
From: Steven Whitehouse @ 2017-07-31 11:32 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,


On 31/07/17 12:27, Jeff Layton wrote:
> On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote:
>> On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote:
>>> On Wed 26-07-17 13:55:36, Jeff Layton wrote:
>>>> +int file_write_and_wait(struct file *file)
>>>> +{
>>>> +	int err = 0, err2;
>>>> +	struct address_space *mapping = file->f_mapping;
>>>> +
>>>> +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
>>>> +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
>>>> +		err = filemap_fdatawrite(mapping);
>>>> +		/* See comment of filemap_write_and_wait() */
>>>> +		if (err != -EIO) {
>>>> +			loff_t i_size = i_size_read(mapping->host);
>>>> +
>>>> +			if (i_size != 0)
>>>> +				__filemap_fdatawait_range(mapping, 0,
>>>> +							  i_size - 1);
>>>> +		}
>>>> +	}
>>> Err, what's the i_size check doing here? I'd just pass ~0 as the end of the
>>> range and ignore i_size. It is much easier than trying to wrap your head
>>> around possible races with file operations modifying i_size.
>>>
>>> 								Honza
>> I'm basically emulating _exactly_ what filemap_write_and_wait does here,
>> as I'm leery of making subtle behavior changes in the actual writeback
>> behavior. For example:
>>
>> -----------------8<----------------
>> static inline int __filemap_fdatawrite(struct address_space *mapping,
>>          int sync_mode)
>> {
>>          return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode);
>> }
>>
>> int filemap_fdatawrite(struct address_space *mapping)
>> {
>>          return __filemap_fdatawrite(mapping, WB_SYNC_ALL);
>> }
>> EXPORT_SYMBOL(filemap_fdatawrite);
>> -----------------8<----------------
>>
>> ...which then sets up the wbc with the right ranges and sync mode and
>> kicks off writepages. But then, it does the i_size_read to figure out
>> what range it should wait on (with the shortcut for the size == 0 case).
>>
>> My assumption was that it was intentionally designed that way, but I'm
>> guessing from your comments that it wasn't? If so, then we can turn
>> file_write_and_wait into a static inline wrapper around
>> file_write_and_wait_range.
> FWIW, I did a bit of archaeology in the linux-history tree and found
> this patch from Marcelo in 2004. Is this optimization still helpful? If
> not, then that does simplify the code a bit.
>
> -------------------8<--------------------
>
> [PATCH] small wait_on_page_writeback_range() optimization
>
> filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end"
> parameter.  This is not needed since we know the EOF from the inode.  Use
> that instead.
>
> Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
> Signed-off-by: Andrew Morton <akpm@osdl.org>
> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
> ---
>   mm/filemap.c | 8 +++++++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 78e18b7639b6..55fb7b4141e4 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range);
>    */
>   int filemap_fdatawait(struct address_space *mapping)
>   {
> -	return wait_on_page_writeback_range(mapping, 0, -1);
> +	loff_t i_size = i_size_read(mapping->host);
> +
> +	if (i_size == 0)
> +		return 0;
> +
> +	return wait_on_page_writeback_range(mapping, 0,
> +				(i_size - 1) >> PAGE_CACHE_SHIFT);
>   }
>   EXPORT_SYMBOL(filemap_fdatawait);
>

Does this ever get called in cases where we would not hold fs locks? In
that case we definitely don't want to be relying on i_size.

Steve.



^ permalink raw reply	[flat|nested] 87+ messages in thread

* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
  2017-07-31 11:32           ` Steven Whitehouse
  (?)
@ 2017-07-31 11:44             ` Jeff Layton
  -1 siblings, 0 replies; 87+ messages in thread
From: Jeff Layton @ 2017-07-31 11:44 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote:
> Hi,
> 
> 
> On 31/07/17 12:27, Jeff Layton wrote:
> > On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote:
> > > On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote:
> > > > On Wed 26-07-17 13:55:36, Jeff Layton wrote:
> > > > > +int file_write_and_wait(struct file *file)
> > > > > +{
> > > > > +	int err = 0, err2;
> > > > > +	struct address_space *mapping = file->f_mapping;
> > > > > +
> > > > > +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> > > > > +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> > > > > +		err = filemap_fdatawrite(mapping);
> > > > > +		/* See comment of filemap_write_and_wait() */
> > > > > +		if (err != -EIO) {
> > > > > +			loff_t i_size = i_size_read(mapping->host);
> > > > > +
> > > > > +			if (i_size != 0)
> > > > > +				__filemap_fdatawait_range(mapping, 0,
> > > > > +							  i_size - 1);
> > > > > +		}
> > > > > +	}
> > > > 
> > > > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the
> > > > range and ignore i_size. It is much easier than trying to wrap your head
> > > > around possible races with file operations modifying i_size.
> > > > 
> > > > 								Honza
> > > 
> > > I'm basically emulating _exactly_ what filemap_write_and_wait does here,
> > > as I'm leery of making subtle behavior changes in the actual writeback
> > > behavior. For example:
> > > 
> > > -----------------8<----------------
> > > static inline int __filemap_fdatawrite(struct address_space *mapping,
> > >          int sync_mode)
> > > {
> > >          return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode);
> > > }
> > > 
> > > int filemap_fdatawrite(struct address_space *mapping)
> > > {
> > >          return __filemap_fdatawrite(mapping, WB_SYNC_ALL);
> > > }
> > > EXPORT_SYMBOL(filemap_fdatawrite);
> > > -----------------8<----------------
> > > 
> > > ...which then sets up the wbc with the right ranges and sync mode and
> > > kicks off writepages. But then, it does the i_size_read to figure out
> > > what range it should wait on (with the shortcut for the size == 0 case).
> > > 
> > > My assumption was that it was intentionally designed that way, but I'm
> > > guessing from your comments that it wasn't? If so, then we can turn
> > > file_write_and_wait into a static inline wrapper around
> > > file_write_and_wait_range.
> > 
> > FWIW, I did a bit of archaeology in the linux-history tree and found
> > this patch from Marcelo in 2004. Is this optimization still helpful? If
> > not, then that does simplify the code a bit.
> > 
> > -------------------8<--------------------
> > 
> > [PATCH] small wait_on_page_writeback_range() optimization
> > 
> > filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end"
> > parameter.  This is not needed since we know the EOF from the inode.  Use
> > that instead.
> > 
> > Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
> > Signed-off-by: Andrew Morton <akpm@osdl.org>
> > Signed-off-by: Linus Torvalds <torvalds@osdl.org>
> > ---
> >   mm/filemap.c | 8 +++++++-
> >   1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 78e18b7639b6..55fb7b4141e4 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range);
> >    */
> >   int filemap_fdatawait(struct address_space *mapping)
> >   {
> > -	return wait_on_page_writeback_range(mapping, 0, -1);
> > +	loff_t i_size = i_size_read(mapping->host);
> > +
> > +	if (i_size == 0)
> > +		return 0;
> > +
> > +	return wait_on_page_writeback_range(mapping, 0,
> > +				(i_size - 1) >> PAGE_CACHE_SHIFT);
> >   }
> >   EXPORT_SYMBOL(filemap_fdatawait);
> > 
> 
> Does this ever get called in cases where we would not hold fs locks? In
> that case we definitely don't want to be relying on i_size.
> 
> Steve.
> 

Yes. We can initiate and wait on writeback from any context where you
can sleep, really.

We're just waiting on whole file writeback here, so I don't think
there's anything wrong. As long as the i_size was valid at some point in
time prior to waiting then you're ok.

The question I have is more whether this optimization is still useful. 

What we do now is just walk the radix tree and wait_on_page_writeback
for each page. Do we gain anything by avoiding ranges beyond the current
EOF with the pagecache infrastructure of 2017?
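
To make the question concrete, here is a toy userspace model, not kernel
code: the cached pages are a sorted array of indices, and the walk only
touches pages that are actually present. Assuming nothing is cached beyond
EOF, clamping the end of the range to EOF visits exactly the same pages as
an unbounded end. All names here (struct sketch_page, count_waits()) are
invented for illustration; the real code looks pages up in the radix tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a cached page: its index and whether it is under writeback. */
struct sketch_page {
	uint64_t index;
	int writeback;
};

/*
 * Walk the pages present in [start, end] (inclusive; array sorted by index)
 * and count how many we would have to wait on. Indices with no cached page
 * are never visited, so a larger end costs nothing by itself.
 */
static size_t count_waits(const struct sketch_page *pages, size_t n,
			  uint64_t start, uint64_t end)
{
	size_t waits = 0;

	assert(start <= end);
	for (size_t i = 0; i < n; i++) {
		if (pages[i].index < start)
			continue;
		if (pages[i].index > end)
			break;
		if (pages[i].writeback)
			waits++;
	}
	return waits;
}
```

In this model the only case where the clamp changes anything is when pages
exist past the sampled EOF, which is also the case where the i_size race
matters.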

-- 
Jeff Layton <jlayton@redhat.com>



^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
@ 2017-07-31 11:44             ` Jeff Layton
  0 siblings, 0 replies; 87+ messages in thread
From: Jeff Layton @ 2017-07-31 11:44 UTC (permalink / raw)
  To: Steven Whitehouse, Jan Kara, Marcelo Tosatti
  Cc: Alexander Viro, J . Bruce Fields, Andrew Morton, linux-fsdevel,
	linux-kernel, linux-mm, Matthew Wilcox, Bob Peterson,
	cluster-devel

On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote:
> Hi,
> 
> 
> On 31/07/17 12:27, Jeff Layton wrote:
> > On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote:
> > > On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote:
> > > > On Wed 26-07-17 13:55:36, Jeff Layton wrote:
> > > > > +int file_write_and_wait(struct file *file)
> > > > > +{
> > > > > +	int err = 0, err2;
> > > > > +	struct address_space *mapping = file->f_mapping;
> > > > > +
> > > > > +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> > > > > +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> > > > > +		err = filemap_fdatawrite(mapping);
> > > > > +		/* See comment of filemap_write_and_wait() */
> > > > > +		if (err != -EIO) {
> > > > > +			loff_t i_size = i_size_read(mapping->host);
> > > > > +
> > > > > +			if (i_size != 0)
> > > > > +				__filemap_fdatawait_range(mapping, 0,
> > > > > +							  i_size - 1);
> > > > > +		}
> > > > > +	}
> > > > 
> > > > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the
> > > > range and ignore i_size. It is much easier than trying to wrap your head
> > > > around possible races with file operations modifying i_size.
> > > > 
> > > > 								Honza
> > > 
> > > I'm basically emulating _exactly_ what filemap_write_and_wait does here,
> > > as I'm leery of making subtle behavior changes in the actual writeback
> > > behavior. For example:
> > > 
> > > -----------------8<----------------
> > > static inline int __filemap_fdatawrite(struct address_space *mapping,
> > >          int sync_mode)
> > > {
> > >          return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode);
> > > }
> > > 
> > > int filemap_fdatawrite(struct address_space *mapping)
> > > {
> > >          return __filemap_fdatawrite(mapping, WB_SYNC_ALL);
> > > }
> > > EXPORT_SYMBOL(filemap_fdatawrite);
> > > -----------------8<----------------
> > > 
> > > ...which then sets up the wbc with the right ranges and sync mode and
> > > kicks off writepages. But then, it does the i_size_read to figure out
> > > what range it should wait on (with the shortcut for the size == 0 case).
> > > 
> > > My assumption was that it was intentionally designed that way, but I'm
> > > guessing from your comments that it wasn't? If so, then we can turn
> > > file_write_and_wait a static inline wrapper around
> > > file_write_and_wait_range.
> > 
> > FWIW, I did a bit of archaeology in the linux-history tree and found
> > this patch from Marcelo in 2004. Is this optimization still helpful? If
> > not, then that does simplify the code a bit.
> > 
> > -------------------8<--------------------
> > 
> > [PATCH] small wait_on_page_writeback_range() optimization
> > 
> > filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end"
> > parameter.  This is not needed since we know the EOF from the inode.  Use
> > that instead.
> > 
> > Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
> > Signed-off-by: Andrew Morton <akpm@osdl.org>
> > Signed-off-by: Linus Torvalds <torvalds@osdl.org>
> > ---
> >   mm/filemap.c | 8 +++++++-
> >   1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 78e18b7639b6..55fb7b4141e4 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range);
> >    */
> >   int filemap_fdatawait(struct address_space *mapping)
> >   {
> > -	return wait_on_page_writeback_range(mapping, 0, -1);
> > +	loff_t i_size = i_size_read(mapping->host);
> > +
> > +	if (i_size == 0)
> > +		return 0;
> > +
> > +	return wait_on_page_writeback_range(mapping, 0,
> > +				(i_size - 1) >> PAGE_CACHE_SHIFT);
> >   }
> >   EXPORT_SYMBOL(filemap_fdatawait);
> > 
> 
> Does this ever get called in cases where we would not hold fs locks? In 
> that case we definitely don't want to be relying on i_size,
> 
> Steve.
> 

Yes. We can initiate and wait on writeback from any context where you
can sleep, really.

We're just waiting on whole file writeback here, so I don't think
there's anything wrong. As long as the i_size was valid at some point in
time prior to waiting then you're ok.

The question I have is more whether this optimization is still useful. 

What we do now is just walk the radix tree and wait_on_page_writeback
for each page. Do we gain anything by avoiding ranges beyond the current
EOF with the pagecache infrastructure of 2017?
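
For anyone following along, here is a minimal userspace sketch of the arithmetic that the 2004 patch relies on: turning i_size into the index of the last page to wait on, with the shortcut for the size == 0 case. The helper name sketch_last_wait_index() and the fixed 4 KiB page shift are assumptions made for illustration; neither is kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4 KiB pages; the real value is per-architecture. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Hypothetical helper, not a kernel function. Computes the index of the
 * last page that a wait bounded at EOF would cover. Returns false for an
 * empty file, mirroring the "i_size == 0" early return in the 2004 patch.
 */
static bool sketch_last_wait_index(long long i_size, unsigned long long *last)
{
	assert(i_size >= 0);

	if (i_size == 0)
		return false;

	/* Byte offset of the last valid byte, converted to a page index. */
	*last = (unsigned long long)(i_size - 1) >> SKETCH_PAGE_SHIFT;
	return true;
}
```

With these assumptions, a 4096-byte file ends at page index 0 and a 4097-byte file at page index 1. Any page in the tree above that index is simply never visited by the bounded wait, which is the behavior being questioned here.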

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
  2017-07-31 11:44             ` Jeff Layton
  (?)
@ 2017-07-31 12:05               ` Steven Whitehouse
  -1 siblings, 0 replies; 87+ messages in thread
From: Steven Whitehouse @ 2017-07-31 12:05 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,


On 31/07/17 12:44, Jeff Layton wrote:
> On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote:
>> Hi,
>>
>>
>> On 31/07/17 12:27, Jeff Layton wrote:
>>> On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote:
>>>> On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote:
>>>>> On Wed 26-07-17 13:55:36, Jeff Layton wrote:
>>>>>> +int file_write_and_wait(struct file *file)
>>>>>> +{
>>>>>> +	int err = 0, err2;
>>>>>> +	struct address_space *mapping = file->f_mapping;
>>>>>> +
>>>>>> +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
>>>>>> +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
>>>>>> +		err = filemap_fdatawrite(mapping);
>>>>>> +		/* See comment of filemap_write_and_wait() */
>>>>>> +		if (err != -EIO) {
>>>>>> +			loff_t i_size = i_size_read(mapping->host);
>>>>>> +
>>>>>> +			if (i_size != 0)
>>>>>> +				__filemap_fdatawait_range(mapping, 0,
>>>>>> +							  i_size - 1);
>>>>>> +		}
>>>>>> +	}
>>>>> Err, what's the i_size check doing here? I'd just pass ~0 as the end of the
>>>>> range and ignore i_size. It is much easier than trying to wrap your head
>>>>> around possible races with file operations modifying i_size.
>>>>>
>>>>> 								Honza
>>>> I'm basically emulating _exactly_ what filemap_write_and_wait does here,
>>>> as I'm leery of making subtle behavior changes in the actual writeback
>>>> behavior. For example:
>>>>
>>>> -----------------8<----------------
>>>> static inline int __filemap_fdatawrite(struct address_space *mapping,
>>>>           int sync_mode)
>>>> {
>>>>           return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode);
>>>> }
>>>>
>>>> int filemap_fdatawrite(struct address_space *mapping)
>>>> {
>>>>           return __filemap_fdatawrite(mapping, WB_SYNC_ALL);
>>>> }
>>>> EXPORT_SYMBOL(filemap_fdatawrite);
>>>> -----------------8<----------------
>>>>
>>>> ...which then sets up the wbc with the right ranges and sync mode and
>>>> kicks off writepages. But then, it does the i_size_read to figure out
>>>> what range it should wait on (with the shortcut for the size == 0 case).
>>>>
>>>> My assumption was that it was intentionally designed that way, but I'm
>>>> guessing from your comments that it wasn't? If so, then we can turn
>>>> file_write_and_wait a static inline wrapper around
>>>> file_write_and_wait_range.
>>> FWIW, I did a bit of archaeology in the linux-history tree and found
>>> this patch from Marcelo in 2004. Is this optimization still helpful? If
>>> not, then that does simplify the code a bit.
>>>
>>> -------------------8<--------------------
>>>
>>> [PATCH] small wait_on_page_writeback_range() optimization
>>>
>>> filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end"
>>> parameter.  This is not needed since we know the EOF from the inode.  Use
>>> that instead.
>>>
>>> Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
>>> Signed-off-by: Andrew Morton <akpm@osdl.org>
>>> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
>>> ---
>>>    mm/filemap.c | 8 +++++++-
>>>    1 file changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/filemap.c b/mm/filemap.c
>>> index 78e18b7639b6..55fb7b4141e4 100644
>>> --- a/mm/filemap.c
>>> +++ b/mm/filemap.c
>>> @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range);
>>>     */
>>>    int filemap_fdatawait(struct address_space *mapping)
>>>    {
>>> -	return wait_on_page_writeback_range(mapping, 0, -1);
>>> +	loff_t i_size = i_size_read(mapping->host);
>>> +
>>> +	if (i_size == 0)
>>> +		return 0;
>>> +
>>> +	return wait_on_page_writeback_range(mapping, 0,
>>> +				(i_size - 1) >> PAGE_CACHE_SHIFT);
>>>    }
>>>    EXPORT_SYMBOL(filemap_fdatawait);
>>>
>> Does this ever get called in cases where we would not hold fs locks? In
>> that case we definitely don't want to be relying on i_size,
>>
>> Steve.
>>
> Yes. We can initiate and wait on writeback from any context where you
> can sleep, really.
>
> We're just waiting on whole file writeback here, so I don't think
> there's anything wrong. As long as the i_size was valid at some point in
> time prior to waiting then you're ok.
>
> The question I have is more whether this optimization is still useful.
>
> What we do now is just walk the radix tree and wait_on_page_writeback
> for each page. Do we gain anything by avoiding ranges beyond the current
> EOF with the pagecache infrastructure of 2017?
>

If this can be called from anywhere without fs locks, then i_size is not
known. That has been a problem in the past, since i_size may have changed
on another node. We avoid that in this case because we only change i_size
under an exclusive lock, and we only have dirty pages while holding an
exclusive lock.

There is another case though: if the inode is a block device, i_size will
be zero. That is the case for the address space that looks after rgrps for
GFS2. We do (luckily!) call filemap_fdatawait_range() directly in that
case. For "normal" inodes though, the address space for metadata is backed
by the block device inode, so that looks like it might be an issue, since
fs/gfs2/glops.c:inode_go_sync() calls filemap_fdatawait() on the
metamapping. It might potentially be an issue in other cases too.
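
To make the zero-i_size concern concrete, here is a toy userspace model (a sketch only, not GFS2 or mm code). struct sketch_mapping and the sketch_* functions are hypothetical stand-ins: the struct records which page indices are under writeback and what i_size_read() on the host inode would return, and the two wait variants count how many of those pages each approach would visit. The 4 KiB page shift is also an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

/* Toy stand-in for an address_space; not the kernel structure. */
struct sketch_mapping {
	long long host_i_size;         /* what i_size_read(mapping->host) would return */
	const unsigned long *wb_pages; /* indices of pages under writeback */
	size_t nr_wb;                  /* number of entries in wb_pages */
};

/* Count the pages that a wait over page indices [0, last] would visit. */
static size_t sketch_wait_range(const struct sketch_mapping *m,
				unsigned long last)
{
	size_t i, waited = 0;

	for (i = 0; i < m->nr_wb; i++)
		if (m->wb_pages[i] <= last)
			waited++;
	return waited;
}

/* EOF-bounded variant, following the shape of the 2004 optimization. */
static size_t sketch_wait_bounded(const struct sketch_mapping *m)
{
	assert(m->host_i_size >= 0);

	if (m->host_i_size == 0)
		return 0; /* nothing is waited on, whatever is in the tree */
	return sketch_wait_range(m,
		(unsigned long)((m->host_i_size - 1) >> SKETCH_PAGE_SHIFT));
}

/* Unbounded variant: ignore i_size and cover the whole index space. */
static size_t sketch_wait_unbounded(const struct sketch_mapping *m)
{
	return sketch_wait_range(m, ~0UL);
}
```

In this model, a mapping whose host reports i_size == 0 but which has pages under writeback gets no wait at all from the bounded variant, while the unbounded variant visits every page. Whether the real metamapping case behaves this way depends on details the model leaves out.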

Steve.



^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
@ 2017-07-31 12:05               ` Steven Whitehouse
  0 siblings, 0 replies; 87+ messages in thread
From: Steven Whitehouse @ 2017-07-31 12:05 UTC (permalink / raw)
  To: Jeff Layton, Jan Kara, Marcelo Tosatti
  Cc: Alexander Viro, J . Bruce Fields, Andrew Morton, linux-fsdevel,
	linux-kernel, linux-mm, Matthew Wilcox, Bob Peterson,
	cluster-devel

Hi,


On 31/07/17 12:44, Jeff Layton wrote:
> On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote:
>> Hi,
>>
>>
>> On 31/07/17 12:27, Jeff Layton wrote:
>>> On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote:
>>>> On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote:
>>>>> On Wed 26-07-17 13:55:36, Jeff Layton wrote:
>>>>>> +int file_write_and_wait(struct file *file)
>>>>>> +{
>>>>>> +	int err = 0, err2;
>>>>>> +	struct address_space *mapping = file->f_mapping;
>>>>>> +
>>>>>> +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
>>>>>> +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
>>>>>> +		err = filemap_fdatawrite(mapping);
>>>>>> +		/* See comment of filemap_write_and_wait() */
>>>>>> +		if (err != -EIO) {
>>>>>> +			loff_t i_size = i_size_read(mapping->host);
>>>>>> +
>>>>>> +			if (i_size != 0)
>>>>>> +				__filemap_fdatawait_range(mapping, 0,
>>>>>> +							  i_size - 1);
>>>>>> +		}
>>>>>> +	}
>>>>> Err, what's the i_size check doing here? I'd just pass ~0 as the end of the
>>>>> range and ignore i_size. It is much easier than trying to wrap your head
>>>>> around possible races with file operations modifying i_size.
>>>>>
>>>>> 								Honza
>>>> I'm basically emulating _exactly_ what filemap_write_and_wait does here,
>>>> as I'm leery of making subtle behavior changes in the actual writeback
>>>> behavior. For example:
>>>>
>>>> -----------------8<----------------
>>>> static inline int __filemap_fdatawrite(struct address_space *mapping,
>>>>           int sync_mode)
>>>> {
>>>>           return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode);
>>>> }
>>>>
>>>> int filemap_fdatawrite(struct address_space *mapping)
>>>> {
>>>>           return __filemap_fdatawrite(mapping, WB_SYNC_ALL);
>>>> }
>>>> EXPORT_SYMBOL(filemap_fdatawrite);
>>>> -----------------8<----------------
>>>>
>>>> ...which then sets up the wbc with the right ranges and sync mode and
>>>> kicks off writepages. But then, it does the i_size_read to figure out
>>>> what range it should wait on (with the shortcut for the size == 0 case).
>>>>
>>>> My assumption was that it was intentionally designed that way, but I'm
>>>> guessing from your comments that it wasn't? If so, then we can turn
>>>> file_write_and_wait a static inline wrapper around
>>>> file_write_and_wait_range.
>>> FWIW, I did a bit of archaeology in the linux-history tree and found
>>> this patch from Marcelo in 2004. Is this optimization still helpful? If
>>> not, then that does simplify the code a bit.
>>>
>>> -------------------8<--------------------
>>>
>>> [PATCH] small wait_on_page_writeback_range() optimization
>>>
>>> filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end"
>>> parameter.  This is not needed since we know the EOF from the inode.  Use
>>> that instead.
>>>
>>> Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
>>> Signed-off-by: Andrew Morton <akpm@osdl.org>
>>> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
>>> ---
>>>    mm/filemap.c | 8 +++++++-
>>>    1 file changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/filemap.c b/mm/filemap.c
>>> index 78e18b7639b6..55fb7b4141e4 100644
>>> --- a/mm/filemap.c
>>> +++ b/mm/filemap.c
>>> @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range);
>>>     */
>>>    int filemap_fdatawait(struct address_space *mapping)
>>>    {
>>> -	return wait_on_page_writeback_range(mapping, 0, -1);
>>> +	loff_t i_size = i_size_read(mapping->host);
>>> +
>>> +	if (i_size == 0)
>>> +		return 0;
>>> +
>>> +	return wait_on_page_writeback_range(mapping, 0,
>>> +				(i_size - 1) >> PAGE_CACHE_SHIFT);
>>>    }
>>>    EXPORT_SYMBOL(filemap_fdatawait);
>>>
>> Does this ever get called in cases where we would not hold fs locks? In
>> that case we definitely don't want to be relying on i_size,
>>
>> Steve.
>>
> Yes. We can initiate and wait on writeback from any context where you
> can sleep, really.
>
> We're just waiting on whole file writeback here, so I don't think
> there's anything wrong. As long as the i_size was valid at some point in
> time prior to waiting then you're ok.
>
> The question I have is more whether this optimization is still useful.
>
> What we do now is just walk the radix tree and wait_on_page_writeback
> for each page. Do we gain anything by avoiding ranges beyond the current
> EOF with the pagecache infrastructure of 2017?
>

If this can be called from anywhere without fs locks, then i_size is not
known. That has been a problem in the past, since i_size may have changed
on another node. We avoid that in this case because we only change i_size
under an exclusive lock, and we only have dirty pages while holding an
exclusive lock.

There is another case though: if the inode is a block device, i_size will
be zero. That is the case for the address space that looks after rgrps for
GFS2. We do (luckily!) call filemap_fdatawait_range() directly in that
case. For "normal" inodes though, the address space for metadata is backed
by the block device inode, so that looks like it might be an issue, since
fs/gfs2/glops.c:inode_go_sync() calls filemap_fdatawait() on the
metamapping. It might potentially be an issue in other cases too.

Steve.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
  2017-07-31 12:05               ` Steven Whitehouse
  (?)
@ 2017-07-31 12:22                 ` Jeff Layton
  -1 siblings, 0 replies; 87+ messages in thread
From: Jeff Layton @ 2017-07-31 12:22 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Mon, 2017-07-31 at 13:05 +0100, Steven Whitehouse wrote:
> Hi,
> 
> 
> On 31/07/17 12:44, Jeff Layton wrote:
> > On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote:
> > > Hi,
> > > 
> > > 
> > > On 31/07/17 12:27, Jeff Layton wrote:
> > > > On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote:
> > > > > On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote:
> > > > > > On Wed 26-07-17 13:55:36, Jeff Layton wrote:
> > > > > > > +int file_write_and_wait(struct file *file)
> > > > > > > +{
> > > > > > > +	int err = 0, err2;
> > > > > > > +	struct address_space *mapping = file->f_mapping;
> > > > > > > +
> > > > > > > +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> > > > > > > +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> > > > > > > +		err = filemap_fdatawrite(mapping);
> > > > > > > +		/* See comment of filemap_write_and_wait() */
> > > > > > > +		if (err != -EIO) {
> > > > > > > +			loff_t i_size = i_size_read(mapping->host);
> > > > > > > +
> > > > > > > +			if (i_size != 0)
> > > > > > > +				__filemap_fdatawait_range(mapping, 0,
> > > > > > > +							  i_size - 1);
> > > > > > > +		}
> > > > > > > +	}
> > > > > > 
> > > > > > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the
> > > > > > range and ignore i_size. It is much easier than trying to wrap your head
> > > > > > around possible races with file operations modifying i_size.
> > > > > > 
> > > > > > 								Honza
> > > > > 
> > > > > I'm basically emulating _exactly_ what filemap_write_and_wait does here,
> > > > > as I'm leery of making subtle behavior changes in the actual writeback
> > > > > behavior. For example:
> > > > > 
> > > > > -----------------8<----------------
> > > > > static inline int __filemap_fdatawrite(struct address_space *mapping,
> > > > >           int sync_mode)
> > > > > {
> > > > >           return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode);
> > > > > }
> > > > > 
> > > > > int filemap_fdatawrite(struct address_space *mapping)
> > > > > {
> > > > >           return __filemap_fdatawrite(mapping, WB_SYNC_ALL);
> > > > > }
> > > > > EXPORT_SYMBOL(filemap_fdatawrite);
> > > > > -----------------8<----------------
> > > > > 
> > > > > ...which then sets up the wbc with the right ranges and sync mode and
> > > > > kicks off writepages. But then, it does the i_size_read to figure out
> > > > > what range it should wait on (with the shortcut for the size == 0 case).
> > > > > 
> > > > > My assumption was that it was intentionally designed that way, but I'm
> > > > > guessing from your comments that it wasn't? If so, then we can turn
> > > > > file_write_and_wait a static inline wrapper around
> > > > > file_write_and_wait_range.
> > > > 
> > > > FWIW, I did a bit of archaeology in the linux-history tree and found
> > > > this patch from Marcelo in 2004. Is this optimization still helpful? If
> > > > not, then that does simplify the code a bit.
> > > > 
> > > > -------------------8<--------------------
> > > > 
> > > > [PATCH] small wait_on_page_writeback_range() optimization
> > > > 
> > > > filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end"
> > > > parameter.  This is not needed since we know the EOF from the inode.  Use
> > > > that instead.
> > > > 
> > > > Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
> > > > Signed-off-by: Andrew Morton <akpm@osdl.org>
> > > > Signed-off-by: Linus Torvalds <torvalds@osdl.org>
> > > > ---
> > > >    mm/filemap.c | 8 +++++++-
> > > >    1 file changed, 7 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/mm/filemap.c b/mm/filemap.c
> > > > index 78e18b7639b6..55fb7b4141e4 100644
> > > > --- a/mm/filemap.c
> > > > +++ b/mm/filemap.c
> > > > @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range);
> > > >     */
> > > >    int filemap_fdatawait(struct address_space *mapping)
> > > >    {
> > > > -	return wait_on_page_writeback_range(mapping, 0, -1);
> > > > +	loff_t i_size = i_size_read(mapping->host);
> > > > +
> > > > +	if (i_size == 0)
> > > > +		return 0;
> > > > +
> > > > +	return wait_on_page_writeback_range(mapping, 0,
> > > > +				(i_size - 1) >> PAGE_CACHE_SHIFT);
> > > >    }
> > > >    EXPORT_SYMBOL(filemap_fdatawait);
> > > > 
> > > 
> > > Does this ever get called in cases where we would not hold fs locks? In
> > > that case we definitely don't want to be relying on i_size,
> > > 
> > > Steve.
> > > 
> > 
> > Yes. We can initiate and wait on writeback from any context where you
> > can sleep, really.
> > 
> > We're just waiting on whole file writeback here, so I don't think
> > there's anything wrong. As long as the i_size was valid at some point in
> > time prior to waiting then you're ok.
> > 
> > The question I have is more whether this optimization is still useful.
> > 
> > What we do now is just walk the radix tree and wait_on_page_writeback
> > for each page. Do we gain anything by avoiding ranges beyond the current
> > EOF with the pagecache infrastructure of 2017?
> > 
> 
> If this can be called from anywhere without fs locks, then i_size is not 
> known. That has been a problem in the past since i_size may have changed 
> on another node. We avoid that in this case due to only changing i_size 
> under an exclusive lock, and also only having dirty pages when we have 
> an exclusive lock. There is another case though, if the inode is a block 
> device, i_size will be zero. That is the case for the address space that 
> looks after rgrps for GFS2. We do (luckily!) call 
> filemap_fdatawait_range() directly in that case. For "normal" inodes 
> though, the address space for metadata is backed by the block device 
> inode, so that looks like it might be an issue, since 
> fs/gfs2/glops.c:inode_go_sync() calls filemap_fdatawait() on the 
> metamapping. It might potentially be an issue in other cases too,
> 
> Steve.
> 

Some of those do sound problematic.

Again though, we're only waiting on writeback here, and I assume with
gfs2 that would only be pages that were written on the local node.

Is it possible to have pages under writeback and still in the tree,
but that are beyond the current i_size? It seems like that's the main
worrisome case.

-- 
Jeff Layton <jlayton@redhat.com>



^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
@ 2017-07-31 12:22                 ` Jeff Layton
  0 siblings, 0 replies; 87+ messages in thread
From: Jeff Layton @ 2017-07-31 12:22 UTC (permalink / raw)
  To: Steven Whitehouse, Jan Kara, Marcelo Tosatti
  Cc: Alexander Viro, J . Bruce Fields, Andrew Morton, linux-fsdevel,
	linux-kernel, linux-mm, Matthew Wilcox, Bob Peterson,
	cluster-devel

On Mon, 2017-07-31 at 13:05 +0100, Steven Whitehouse wrote:
> Hi,
> 
> 
> On 31/07/17 12:44, Jeff Layton wrote:
> > On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote:
> > > Hi,
> > > 
> > > 
> > > On 31/07/17 12:27, Jeff Layton wrote:
> > > > On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote:
> > > > > On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote:
> > > > > > On Wed 26-07-17 13:55:36, Jeff Layton wrote:
> > > > > > > +int file_write_and_wait(struct file *file)
> > > > > > > +{
> > > > > > > +	int err = 0, err2;
> > > > > > > +	struct address_space *mapping = file->f_mapping;
> > > > > > > +
> > > > > > > +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> > > > > > > +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> > > > > > > +		err = filemap_fdatawrite(mapping);
> > > > > > > +		/* See comment of filemap_write_and_wait() */
> > > > > > > +		if (err != -EIO) {
> > > > > > > +			loff_t i_size = i_size_read(mapping->host);
> > > > > > > +
> > > > > > > +			if (i_size != 0)
> > > > > > > +				__filemap_fdatawait_range(mapping, 0,
> > > > > > > +							  i_size - 1);
> > > > > > > +		}
> > > > > > > +	}
> > > > > > 
> > > > > > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the
> > > > > > range and ignore i_size. It is much easier than trying to wrap your head
> > > > > > around possible races with file operations modifying i_size.
> > > > > > 
> > > > > > 								Honza
> > > > > 
> > > > > I'm basically emulating _exactly_ what filemap_write_and_wait does here,
> > > > > as I'm leery of making subtle behavior changes in the actual writeback
> > > > > behavior. For example:
> > > > > 
> > > > > -----------------8<----------------
> > > > > static inline int __filemap_fdatawrite(struct address_space *mapping,
> > > > >           int sync_mode)
> > > > > {
> > > > >           return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode);
> > > > > }
> > > > > 
> > > > > int filemap_fdatawrite(struct address_space *mapping)
> > > > > {
> > > > >           return __filemap_fdatawrite(mapping, WB_SYNC_ALL);
> > > > > }
> > > > > EXPORT_SYMBOL(filemap_fdatawrite);
> > > > > -----------------8<----------------
> > > > > 
> > > > > ...which then sets up the wbc with the right ranges and sync mode and
> > > > > kicks off writepages. But then, it does the i_size_read to figure out
> > > > > what range it should wait on (with the shortcut for the size == 0 case).
> > > > > 
> > > > > My assumption was that it was intentionally designed that way, but I'm
> > > > > guessing from your comments that it wasn't? If so, then we can turn
> > > > > file_write_and_wait a static inline wrapper around
> > > > > file_write_and_wait_range.
> > > > 
> > > > FWIW, I did a bit of archaeology in the linux-history tree and found
> > > > this patch from Marcelo in 2004. Is this optimization still helpful? If
> > > > not, then that does simplify the code a bit.
> > > > 
> > > > -------------------8<--------------------
> > > > 
> > > > [PATCH] small wait_on_page_writeback_range() optimization
> > > > 
> > > > filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end"
> > > > parameter.  This is not needed since we know the EOF from the inode.  Use
> > > > that instead.
> > > > 
> > > > Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
> > > > Signed-off-by: Andrew Morton <akpm@osdl.org>
> > > > Signed-off-by: Linus Torvalds <torvalds@osdl.org>
> > > > ---
> > > >    mm/filemap.c | 8 +++++++-
> > > >    1 file changed, 7 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/mm/filemap.c b/mm/filemap.c
> > > > index 78e18b7639b6..55fb7b4141e4 100644
> > > > --- a/mm/filemap.c
> > > > +++ b/mm/filemap.c
> > > > @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range);
> > > >     */
> > > >    int filemap_fdatawait(struct address_space *mapping)
> > > >    {
> > > > -	return wait_on_page_writeback_range(mapping, 0, -1);
> > > > +	loff_t i_size = i_size_read(mapping->host);
> > > > +
> > > > +	if (i_size == 0)
> > > > +		return 0;
> > > > +
> > > > +	return wait_on_page_writeback_range(mapping, 0,
> > > > +				(i_size - 1) >> PAGE_CACHE_SHIFT);
> > > >    }
> > > >    EXPORT_SYMBOL(filemap_fdatawait);
> > > > 
> > > 
> > > Does this ever get called in cases where we would not hold fs locks? In
> > > that case we definitely don't want to be relying on i_size,
> > > 
> > > Steve.
> > > 
> > 
> > Yes. We can initiate and wait on writeback from any context where you
> > can sleep, really.
> > 
> > We're just waiting on whole file writeback here, so I don't think
> > there's anything wrong. As long as the i_size was valid at some point in
> > time prior to waiting then you're ok.
> > 
> > The question I have is more whether this optimization is still useful.
> > 
> > What we do now is just walk the radix tree and wait_on_page_writeback
> > for each page. Do we gain anything by avoiding ranges beyond the current
> > EOF with the pagecache infrastructure of 2017?
> > 
> 
> If this can be called from anywhere without fs locks, then i_size is not 
> known. That has been a problem in the past since i_size may have changed 
> on another node. We avoid that in this case due to only changing i_size 
> under an exclusive lock, and also only having dirty pages when we have 
> an exclusive lock. There is another case though, if the inode is a block 
> device, i_size will be zero. That is the case for the address space that 
> looks after rgrps for GFS2. We do (luckily!) call 
> filemap_fdatawait_range() directly in that case. For "normal" inodes 
> though, the address space for metadata is backed by the block device 
> inode, so that looks like it might be an issue, since 
> fs/gfs2/glops.c:inode_go_sync() calls filemap_fdatawait() on the 
> metamapping. It might potentially be an issue in other cases too,
> 
> Steve.
> 

Some of those do sound problematic.

Again though, we're only waiting on writeback here, and I assume with
gfs2 that would only be pages that were written on the local node.

Is it possible to have pages under writeback and still in the tree,
but that are beyond the current i_size? It seems like that's the main
worrisome case.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
@ 2017-07-31 12:22                 ` Jeff Layton
  0 siblings, 0 replies; 87+ messages in thread
From: Jeff Layton @ 2017-07-31 12:22 UTC (permalink / raw)
  To: Steven Whitehouse, Jan Kara, Marcelo Tosatti
  Cc: Alexander Viro, J . Bruce Fields, Andrew Morton, linux-fsdevel,
	linux-kernel, linux-mm, Matthew Wilcox, Bob Peterson,
	cluster-devel

On Mon, 2017-07-31 at 13:05 +0100, Steven Whitehouse wrote:
> Hi,
> 
> 
> On 31/07/17 12:44, Jeff Layton wrote:
> > On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote:
> > > Hi,
> > > 
> > > 
> > > On 31/07/17 12:27, Jeff Layton wrote:
> > > > On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote:
> > > > > On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote:
> > > > > > On Wed 26-07-17 13:55:36, Jeff Layton wrote:
> > > > > > > +int file_write_and_wait(struct file *file)
> > > > > > > +{
> > > > > > > +	int err = 0, err2;
> > > > > > > +	struct address_space *mapping = file->f_mapping;
> > > > > > > +
> > > > > > > +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> > > > > > > +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> > > > > > > +		err = filemap_fdatawrite(mapping);
> > > > > > > +		/* See comment of filemap_write_and_wait() */
> > > > > > > +		if (err != -EIO) {
> > > > > > > +			loff_t i_size = i_size_read(mapping->host);
> > > > > > > +
> > > > > > > +			if (i_size != 0)
> > > > > > > +				__filemap_fdatawait_range(mapping, 0,
> > > > > > > +							  i_size - 1);
> > > > > > > +		}
> > > > > > > +	}
> > > > > > 
> > > > > > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the
> > > > > > range and ignore i_size. It is much easier than trying to wrap your head
> > > > > > around possible races with file operations modifying i_size.
> > > > > > 
> > > > > > 								Honza
> > > > > 
> > > > > I'm basically emulating _exactly_ what filemap_write_and_wait does here,
> > > > > as I'm leery of making subtle behavior changes in the actual writeback
> > > > > behavior. For example:
> > > > > 
> > > > > -----------------8<----------------
> > > > > static inline int __filemap_fdatawrite(struct address_space *mapping,
> > > > >           int sync_mode)
> > > > > {
> > > > >           return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode);
> > > > > }
> > > > > 
> > > > > int filemap_fdatawrite(struct address_space *mapping)
> > > > > {
> > > > >           return __filemap_fdatawrite(mapping, WB_SYNC_ALL);
> > > > > }
> > > > > EXPORT_SYMBOL(filemap_fdatawrite);
> > > > > -----------------8<----------------
> > > > > 
> > > > > ...which then sets up the wbc with the right ranges and sync mode and
> > > > > kicks off writepages. But then, it does the i_size_read to figure out
> > > > > what range it should wait on (with the shortcut for the size == 0 case).
> > > > > 
> > > > > My assumption was that it was intentionally designed that way, but I'm
> > > > > guessing from your comments that it wasn't? If so, then we can turn
> > > > > file_write_and_wait into a static inline wrapper around
> > > > > file_write_and_wait_range.
> > > > 
> > > > FWIW, I did a bit of archaeology in the linux-history tree and found
> > > > this patch from Marcelo in 2004. Is this optimization still helpful? If
> > > > not, then that does simplify the code a bit.
> > > > 
> > > > -------------------8<--------------------
> > > > 
> > > > [PATCH] small wait_on_page_writeback_range() optimization
> > > > 
> > > > filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end"
> > > > parameter.  This is not needed since we know the EOF from the inode.  Use
> > > > that instead.
> > > > 
> > > > Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
> > > > Signed-off-by: Andrew Morton <akpm@osdl.org>
> > > > Signed-off-by: Linus Torvalds <torvalds@osdl.org>
> > > > ---
> > > >    mm/filemap.c | 8 +++++++-
> > > >    1 file changed, 7 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/mm/filemap.c b/mm/filemap.c
> > > > index 78e18b7639b6..55fb7b4141e4 100644
> > > > --- a/mm/filemap.c
> > > > +++ b/mm/filemap.c
> > > > @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range);
> > > >     */
> > > >    int filemap_fdatawait(struct address_space *mapping)
> > > >    {
> > > > -	return wait_on_page_writeback_range(mapping, 0, -1);
> > > > +	loff_t i_size = i_size_read(mapping->host);
> > > > +
> > > > +	if (i_size == 0)
> > > > +		return 0;
> > > > +
> > > > +	return wait_on_page_writeback_range(mapping, 0,
> > > > +				(i_size - 1) >> PAGE_CACHE_SHIFT);
> > > >    }
> > > >    EXPORT_SYMBOL(filemap_fdatawait);
> > > > 
> > > 
> > > Does this ever get called in cases where we would not hold fs locks? In
> > > that case we definitely don't want to be relying on i_size,
> > > 
> > > Steve.
> > > 
> > 
> > Yes. We can initiate and wait on writeback from any context where you
> > can sleep, really.
> > 
> > We're just waiting on whole file writeback here, so I don't think
> > there's anything wrong. As long as the i_size was valid at some point in
> > time prior to waiting then you're ok.
> > 
> > The question I have is more whether this optimization is still useful.
> > 
> > What we do now is just walk the radix tree and wait_on_page_writeback
> > for each page. Do we gain anything by avoiding ranges beyond the current
> > EOF with the pagecache infrastructure of 2017?
> > 
> 
> If this can be called from anywhere without fs locks, then i_size is not 
> known. That has been a problem in the past since i_size may have changed 
> on another node. We avoid that in this case due to only changing i_size 
> under an exclusive lock, and also only having dirty pages when we have 
> an exclusive lock. There is another case though, if the inode is a block 
> device, i_size will be zero. That is the case for the address space that 
> looks after rgrps for GFS2. We do (luckily!) call 
> filemap_fdatawait_range() directly in that case. For "normal" inodes 
> though, the address space for metadata is backed by the block device 
> inode, so that looks like it might be an issue, since 
> fs/gfs2/glops.c:inode_go_sync() calls filemap_fdatawait() on the 
> metamapping. It might potentially be an issue in other cases too,
> 
> Steve.
> 

Some of those do sound problematic.

Again though, we're only waiting on writeback here, and I assume with
gfs2 that would only be pages that were written on the local node.

Is it possible to have pages under writeback and still in the tree,
but that are beyond the current i_size? It seems like that's the main
worrisome case.
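
To make that case concrete, here is a toy userspace sketch (not kernel
code). It assumes 4 KiB pages and models the mapping as a small array of
writeback flags; every toy_* name is hypothetical. It compares a wait
bounded by i_size with a wait over the whole mapping when a page past
EOF is still under writeback:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define TOY_NR_PAGES   8	/* size of the toy page cache */

/* Hypothetical stand-in for an address_space plus its host inode. */
struct toy_mapping {
	long long i_size;		/* file size in bytes */
	bool writeback[TOY_NR_PAGES];	/* is page i still under writeback? */
};

/* Last page index covered by an i_size-bounded wait (needs i_size > 0). */
size_t toy_last_index(long long i_size)
{
	assert(i_size > 0);
	return (size_t)((i_size - 1) >> TOY_PAGE_SHIFT);
}

/* "Wait" on pages [0, end] by clearing their writeback flag. */
void toy_wait_range(struct toy_mapping *m, size_t end)
{
	for (size_t i = 0; i < TOY_NR_PAGES && i <= end; i++)
		m->writeback[i] = false;
}

/* Number of pages still under writeback. */
int toy_count_writeback(const struct toy_mapping *m)
{
	int n = 0;

	for (size_t i = 0; i < TOY_NR_PAGES; i++)
		n += m->writeback[i];
	return n;
}

/* 2004-style wait: bounded by i_size, skipped entirely when i_size == 0. */
int toy_wait_bounded(struct toy_mapping *m)
{
	if (m->i_size != 0)
		toy_wait_range(m, toy_last_index(m->i_size));
	return toy_count_writeback(m);
}

/* Unbounded wait: covers every page in the mapping. */
int toy_wait_whole(struct toy_mapping *m)
{
	toy_wait_range(m, TOY_NR_PAGES - 1);
	return toy_count_writeback(m);
}
```

In this model, any page with an index above (i_size - 1) >> PAGE_SHIFT is
never waited on by the bounded variant, and a mapping whose host reports
i_size == 0 is skipped altogether. Whether real pages can end up in that
state is exactly the open question.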

-- 
Jeff Layton <jlayton@redhat.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
  2017-07-31 12:22                 ` Jeff Layton
  (?)
@ 2017-07-31 12:25                   ` Steven Whitehouse
  -1 siblings, 0 replies; 87+ messages in thread
From: Steven Whitehouse @ 2017-07-31 12:25 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,


On 31/07/17 13:22, Jeff Layton wrote:
> On Mon, 2017-07-31 at 13:05 +0100, Steven Whitehouse wrote:
>> Hi,
>>
>>
>> On 31/07/17 12:44, Jeff Layton wrote:
>>> On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote:
>>>> Hi,
>>>>
>>>>
>>>> On 31/07/17 12:27, Jeff Layton wrote:
>>>>> On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote:
>>>>>> On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote:
>>>>>>> On Wed 26-07-17 13:55:36, Jeff Layton wrote:
>>>>>>>> +int file_write_and_wait(struct file *file)
>>>>>>>> +{
>>>>>>>> +	int err = 0, err2;
>>>>>>>> +	struct address_space *mapping = file->f_mapping;
>>>>>>>> +
>>>>>>>> +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
>>>>>>>> +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
>>>>>>>> +		err = filemap_fdatawrite(mapping);
>>>>>>>> +		/* See comment of filemap_write_and_wait() */
>>>>>>>> +		if (err != -EIO) {
>>>>>>>> +			loff_t i_size = i_size_read(mapping->host);
>>>>>>>> +
>>>>>>>> +			if (i_size != 0)
>>>>>>>> +				__filemap_fdatawait_range(mapping, 0,
>>>>>>>> +							  i_size - 1);
>>>>>>>> +		}
>>>>>>>> +	}
>>>>>>> Err, what's the i_size check doing here? I'd just pass ~0 as the end of the
>>>>>>> range and ignore i_size. It is much easier than trying to wrap your head
>>>>>>> around possible races with file operations modifying i_size.
>>>>>>>
>>>>>>> 								Honza
>>>>>> I'm basically emulating _exactly_ what filemap_write_and_wait does here,
>>>>>> as I'm leery of making subtle behavior changes in the actual writeback
>>>>>> behavior. For example:
>>>>>>
>>>>>> -----------------8<----------------
>>>>>> static inline int __filemap_fdatawrite(struct address_space *mapping,
>>>>>>            int sync_mode)
>>>>>> {
>>>>>>            return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode);
>>>>>> }
>>>>>>
>>>>>> int filemap_fdatawrite(struct address_space *mapping)
>>>>>> {
>>>>>>            return __filemap_fdatawrite(mapping, WB_SYNC_ALL);
>>>>>> }
>>>>>> EXPORT_SYMBOL(filemap_fdatawrite);
>>>>>> -----------------8<----------------
>>>>>>
>>>>>> ...which then sets up the wbc with the right ranges and sync mode and
>>>>>> kicks off writepages. But then, it does the i_size_read to figure out
>>>>>> what range it should wait on (with the shortcut for the size == 0 case).
>>>>>>
>>>>>> My assumption was that it was intentionally designed that way, but I'm
>>>>>> guessing from your comments that it wasn't? If so, then we can turn
>>>>>> file_write_and_wait into a static inline wrapper around
>>>>>> file_write_and_wait_range.
>>>>> FWIW, I did a bit of archaeology in the linux-history tree and found
>>>>> this patch from Marcelo in 2004. Is this optimization still helpful? If
>>>>> not, then that does simplify the code a bit.
>>>>>
>>>>> -------------------8<--------------------
>>>>>
>>>>> [PATCH] small wait_on_page_writeback_range() optimization
>>>>>
>>>>> filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end"
>>>>> parameter.  This is not needed since we know the EOF from the inode.  Use
>>>>> that instead.
>>>>>
>>>>> Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
>>>>> Signed-off-by: Andrew Morton <akpm@osdl.org>
>>>>> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
>>>>> ---
>>>>>     mm/filemap.c | 8 +++++++-
>>>>>     1 file changed, 7 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/mm/filemap.c b/mm/filemap.c
>>>>> index 78e18b7639b6..55fb7b4141e4 100644
>>>>> --- a/mm/filemap.c
>>>>> +++ b/mm/filemap.c
>>>>> @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range);
>>>>>      */
>>>>>     int filemap_fdatawait(struct address_space *mapping)
>>>>>     {
>>>>> -	return wait_on_page_writeback_range(mapping, 0, -1);
>>>>> +	loff_t i_size = i_size_read(mapping->host);
>>>>> +
>>>>> +	if (i_size == 0)
>>>>> +		return 0;
>>>>> +
>>>>> +	return wait_on_page_writeback_range(mapping, 0,
>>>>> +				(i_size - 1) >> PAGE_CACHE_SHIFT);
>>>>>     }
>>>>>     EXPORT_SYMBOL(filemap_fdatawait);
>>>>>
>>>> Does this ever get called in cases where we would not hold fs locks? In
>>>> that case we definitely don't want to be relying on i_size,
>>>>
>>>> Steve.
>>>>
>>> Yes. We can initiate and wait on writeback from any context where you
>>> can sleep, really.
>>>
>>> We're just waiting on whole file writeback here, so I don't think
>>> there's anything wrong. As long as the i_size was valid at some point in
>>> time prior to waiting then you're ok.
>>>
>>> The question I have is more whether this optimization is still useful.
>>>
>>> What we do now is just walk the radix tree and wait_on_page_writeback
>>> for each page. Do we gain anything by avoiding ranges beyond the current
>>> EOF with the pagecache infrastructure of 2017?
>>>
>> If this can be called from anywhere without fs locks, then i_size is not
>> known. That has been a problem in the past since i_size may have changed
>> on another node. We avoid that in this case due to only changing i_size
>> under an exclusive lock, and also only having dirty pages when we have
>> an exclusive lock. There is another case though, if the inode is a block
>> device, i_size will be zero. That is the case for the address space that
>> looks after rgrps for GFS2. We do (luckily!) call
>> filemap_fdatawait_range() directly in that case. For "normal" inodes
>> though, the address space for metadata is backed by the block device
>> inode, so that looks like it might be an issue, since
>> fs/gfs2/glops.c:inode_go_sync() calls filemap_fdatawait() on the
>> metamapping. It might potentially be an issue in other cases too,
>>
>> Steve.
>>
> Some of those do sound problematic.
>
> Again though, we're only waiting on writeback here, and I assume with
> gfs2 that would only be pages that were written on the local node.
Yes
>
> Is it possible to have pages under writeback and still in the tree,
> but that are beyond the current i_size? It seems like that's the main
> worrisome case.
>
That's what I was wondering too. I'm not 100% sure without some more
detailed investigation. Either way, the block device case also seems
problematic, although not impossible to special-case, I suppose. The real
question is what do we get from this optimisation? Is the pain of
checking correctness worth it for the benefits gained?

Steve.



^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
@ 2017-07-31 12:25                   ` Steven Whitehouse
  0 siblings, 0 replies; 87+ messages in thread
From: Steven Whitehouse @ 2017-07-31 12:25 UTC (permalink / raw)
  To: Jeff Layton, Jan Kara, Marcelo Tosatti
  Cc: Alexander Viro, J . Bruce Fields, Andrew Morton, linux-fsdevel,
	linux-kernel, linux-mm, Matthew Wilcox, Bob Peterson,
	cluster-devel

Hi,


On 31/07/17 13:22, Jeff Layton wrote:
> On Mon, 2017-07-31 at 13:05 +0100, Steven Whitehouse wrote:
>> Hi,
>>
>>
>> On 31/07/17 12:44, Jeff Layton wrote:
>>> On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote:
>>>> Hi,
>>>>
>>>>
>>>> On 31/07/17 12:27, Jeff Layton wrote:
>>>>> On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote:
>>>>>> On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote:
>>>>>>> On Wed 26-07-17 13:55:36, Jeff Layton wrote:
>>>>>>>> +int file_write_and_wait(struct file *file)
>>>>>>>> +{
>>>>>>>> +	int err = 0, err2;
>>>>>>>> +	struct address_space *mapping = file->f_mapping;
>>>>>>>> +
>>>>>>>> +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
>>>>>>>> +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
>>>>>>>> +		err = filemap_fdatawrite(mapping);
>>>>>>>> +		/* See comment of filemap_write_and_wait() */
>>>>>>>> +		if (err != -EIO) {
>>>>>>>> +			loff_t i_size = i_size_read(mapping->host);
>>>>>>>> +
>>>>>>>> +			if (i_size != 0)
>>>>>>>> +				__filemap_fdatawait_range(mapping, 0,
>>>>>>>> +							  i_size - 1);
>>>>>>>> +		}
>>>>>>>> +	}
>>>>>>> Err, what's the i_size check doing here? I'd just pass ~0 as the end of the
>>>>>>> range and ignore i_size. It is much easier than trying to wrap your head
>>>>>>> around possible races with file operations modifying i_size.
>>>>>>>
>>>>>>> 								Honza
>>>>>> I'm basically emulating _exactly_ what filemap_write_and_wait does here,
>>>>>> as I'm leery of making subtle behavior changes in the actual writeback
>>>>>> behavior. For example:
>>>>>>
>>>>>> -----------------8<----------------
>>>>>> static inline int __filemap_fdatawrite(struct address_space *mapping,
>>>>>>            int sync_mode)
>>>>>> {
>>>>>>            return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode);
>>>>>> }
>>>>>>
>>>>>> int filemap_fdatawrite(struct address_space *mapping)
>>>>>> {
>>>>>>            return __filemap_fdatawrite(mapping, WB_SYNC_ALL);
>>>>>> }
>>>>>> EXPORT_SYMBOL(filemap_fdatawrite);
>>>>>> -----------------8<----------------
>>>>>>
>>>>>> ...which then sets up the wbc with the right ranges and sync mode and
>>>>>> kicks off writepages. But then, it does the i_size_read to figure out
>>>>>> what range it should wait on (with the shortcut for the size == 0 case).
>>>>>>
>>>>>> My assumption was that it was intentionally designed that way, but I'm
>>>>>> guessing from your comments that it wasn't? If so, then we can turn
>>>>>> file_write_and_wait into a static inline wrapper around
>>>>>> file_write_and_wait_range.
>>>>> FWIW, I did a bit of archaeology in the linux-history tree and found
>>>>> this patch from Marcelo in 2004. Is this optimization still helpful? If
>>>>> not, then that does simplify the code a bit.
>>>>>
>>>>> -------------------8<--------------------
>>>>>
>>>>> [PATCH] small wait_on_page_writeback_range() optimization
>>>>>
>>>>> filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end"
>>>>> parameter.  This is not needed since we know the EOF from the inode.  Use
>>>>> that instead.
>>>>>
>>>>> Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
>>>>> Signed-off-by: Andrew Morton <akpm@osdl.org>
>>>>> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
>>>>> ---
>>>>>     mm/filemap.c | 8 +++++++-
>>>>>     1 file changed, 7 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/mm/filemap.c b/mm/filemap.c
>>>>> index 78e18b7639b6..55fb7b4141e4 100644
>>>>> --- a/mm/filemap.c
>>>>> +++ b/mm/filemap.c
>>>>> @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range);
>>>>>      */
>>>>>     int filemap_fdatawait(struct address_space *mapping)
>>>>>     {
>>>>> -	return wait_on_page_writeback_range(mapping, 0, -1);
>>>>> +	loff_t i_size = i_size_read(mapping->host);
>>>>> +
>>>>> +	if (i_size == 0)
>>>>> +		return 0;
>>>>> +
>>>>> +	return wait_on_page_writeback_range(mapping, 0,
>>>>> +				(i_size - 1) >> PAGE_CACHE_SHIFT);
>>>>>     }
>>>>>     EXPORT_SYMBOL(filemap_fdatawait);
>>>>>
>>>> Does this ever get called in cases where we would not hold fs locks? In
>>>> that case we definitely don't want to be relying on i_size,
>>>>
>>>> Steve.
>>>>
>>> Yes. We can initiate and wait on writeback from any context where you
>>> can sleep, really.
>>>
>>> We're just waiting on whole file writeback here, so I don't think
>>> there's anything wrong. As long as the i_size was valid at some point in
>>> time prior to waiting then you're ok.
>>>
>>> The question I have is more whether this optimization is still useful.
>>>
>>> What we do now is just walk the radix tree and wait_on_page_writeback
>>> for each page. Do we gain anything by avoiding ranges beyond the current
>>> EOF with the pagecache infrastructure of 2017?
>>>
>> If this can be called from anywhere without fs locks, then i_size is not
>> known. That has been a problem in the past since i_size may have changed
>> on another node. We avoid that in this case due to only changing i_size
>> under an exclusive lock, and also only having dirty pages when we have
>> an exclusive lock. There is another case though, if the inode is a block
>> device, i_size will be zero. That is the case for the address space that
>> looks after rgrps for GFS2. We do (luckily!) call
>> filemap_fdatawait_range() directly in that case. For "normal" inodes
>> though, the address space for metadata is backed by the block device
>> inode, so that looks like it might be an issue, since
>> fs/gfs2/glops.c:inode_go_sync() calls filemap_fdatawait() on the
>> metamapping. It might potentially be an issue in other cases too,
>>
>> Steve.
>>
> Some of those do sound problematic.
>
> Again though, we're only waiting on writeback here, and I assume with
> gfs2 that would only be pages that were written on the local node.
Yes
>
> Is it possible to have pages under writeback and still in the tree,
> but that are beyond the current i_size? It seems like that's the main
> worrisome case.
>
That's what I was wondering too. I'm not 100% sure without some more
detailed investigation. Either way, the block device case also seems
problematic, although not impossible to special-case, I suppose. The real
question is what do we get from this optimisation? Is the pain of
checking correctness worth it for the benefits gained?

Steve.
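
For reference, the simplification floated earlier in the thread (drop the
i_size bound and make file_write_and_wait a thin wrapper around the
ranged variant) might look roughly like the userspace sketch below. The
toy_* names and the struct are hypothetical stand-ins rather than kernel
API; only the shape of the wrapper is taken from the discussion.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for struct file; records the last requested range. */
struct toy_file {
	long long last_start;
	long long last_end;
	int calls;
};

/* Stand-in for a ranged write-and-wait: it only records the byte range. */
int toy_file_write_and_wait_range(struct toy_file *file,
				  long long start, long long end)
{
	assert(start >= 0 && start <= end);
	file->last_start = start;
	file->last_end = end;
	file->calls++;
	return 0;
}

/* Whole-file variant: no i_size read, just the widest possible range. */
static inline int toy_file_write_and_wait(struct toy_file *file)
{
	return toy_file_write_and_wait_range(file, 0, LLONG_MAX);
}
```

With this shape there is no i_size race to reason about and no special
case for a zero-sized host inode. The cost is that the wait may walk
entries past EOF, which is the trade-off being asked about here.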

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
  2017-07-31 12:22                 ` Jeff Layton
  (?)
@ 2017-07-31 12:38                   ` Bob Peterson
  -1 siblings, 0 replies; 87+ messages in thread
From: Bob Peterson @ 2017-07-31 12:38 UTC (permalink / raw)
  To: cluster-devel.redhat.com

----- Original Message -----
| > If this can be called from anywhere without fs locks, then i_size is not
| > known. That has been a problem in the past since i_size may have changed
| > on another node. We avoid that in this case due to only changing i_size
| > under an exclusive lock, and also only having dirty pages when we have
| > an exclusive lock. There is another case though, if the inode is a block
| > device, i_size will be zero. That is the case for the address space that
| > looks after rgrps for GFS2. We do (luckily!) call
| > filemap_fdatawait_range() directly in that case. For "normal" inodes
| > though, the address space for metadata is backed by the block device
| > inode, so that looks like it might be an issue, since
| > fs/gfs2/glops.c:inode_go_sync() calls filemap_fdatawait() on the
| > metamapping. It might potentially be an issue in other cases too,
| > 
| > Steve.
| > 
| 
| Some of those do sound problematic.
| 
| Again though, we're only waiting on writeback here, and I assume with
| gfs2 that would only be pages that were written on the local node.
| 
| Is it possible to have pages under writeback and still in the tree,
| but that are beyond the current i_size? It seems like that's the main
| worrisome case.
| 
| --
| Jeff Layton <jlayton@redhat.com>

Hi Jeff,

I believe the answer is yes.

I was recently "bitten" by a case where (whether due to a bug or not)
I had blocks allocated in a GFS2 file beyond i_size. I had implemented a
delete algorithm that used i_size, but I found cases where files couldn't
be deleted because of blocks hanging out past EOF. I'm not sure if they
can be in writeback, but possibly. It's already on my "to investigate"
list, but I haven't gotten to it yet. Yes, it seems like a bug. Yes, we
need to fix it. But now there may be lots of legacy file systems out in
the field that have this problem. Not sure if they can get to writeback
until I study the situation more closely.

I believe Ben Marzinski also may have come across a case in which we
can have blocks in writeback that are beyond i_size. See the commit
message on Ben's patch here:

https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/commit/fs/gfs2?h=for-next&id=fd4c5748b8d3f7420e8932ed0bde3d53cc8acc9d

Regards,

Bob Peterson
Red Hat File Systems

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
@ 2017-07-31 12:38                   ` Bob Peterson
  0 siblings, 0 replies; 87+ messages in thread
From: Bob Peterson @ 2017-07-31 12:38 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Steven Whitehouse, Jan Kara, Marcelo Tosatti, Alexander Viro,
	J . Bruce Fields, Andrew Morton, linux-fsdevel, linux-kernel,
	linux-mm, Matthew Wilcox, cluster-devel, Benjamin Marzinski

----- Original Message -----
| > If this can be called from anywhere without fs locks, then i_size is not
| > known. That has been a problem in the past since i_size may have changed
| > on another node. We avoid that in this case due to only changing i_size
| > under an exclusive lock, and also only having dirty pages when we have
| > an exclusive lock. There is another case though, if the inode is a block
| > device, i_size will be zero. That is the case for the address space that
| > looks after rgrps for GFS2. We do (luckily!) call
| > filemap_fdatawait_range() directly in that case. For "normal" inodes
| > though, the address space for metadata is backed by the block device
| > inode, so that looks like it might be an issue, since
| > fs/gfs2/glops.c:inode_go_sync() calls filemap_fdatawait() on the
| > metamapping. It might potentially be an issue in other cases too,
| > 
| > Steve.
| > 
| 
| Some of those do sound problematic.
| 
| Again though, we're only waiting on writeback here, and I assume with
| gfs2 that would only be pages that were written on the local node.
| 
| Is it possible to have pages under writeback and still in the tree,
| but that are beyond the current i_size? It seems like that's the main
| worrisome case.
| 
| --
| Jeff Layton <jlayton@redhat.com>

Hi Jeff,

I believe the answer is yes.

I was recently "bitten" by a case where (whether due to a bug or not)
I had blocks allocated in a GFS2 file beyond i_size. I had implemented a
delete algorithm that used i_size, but I found cases where files couldn't
be deleted because of blocks hanging out past EOF. I'm not sure if they
can be in writeback, but possibly. It's already on my "to investigate"
list, but I haven't gotten to it yet. Yes, it seems like a bug. Yes, we
need to fix it. But now there may be lots of legacy file systems out in
the field that have this problem. Not sure if they can get to writeback
until I study the situation more closely.

I believe Ben Marzinski also may have come across a case in which we
can have blocks in writeback that are beyond i_size. See the commit
message on Ben's patch here:

https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/commit/fs/gfs2?h=for-next&id=fd4c5748b8d3f7420e8932ed0bde3d53cc8acc9d

Regards,

Bob Peterson
Red Hat File Systems

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
  2017-07-31 11:44             ` Jeff Layton
  (?)
@ 2017-07-31 12:07               ` Jan Kara
  -1 siblings, 0 replies; 87+ messages in thread
From: Jan Kara @ 2017-07-31 12:07 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Mon 31-07-17 07:44:16, Jeff Layton wrote:
> On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote:
> > On 31/07/17 12:27, Jeff Layton wrote:
> > > On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote:
> > > > On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote:
> > > > > On Wed 26-07-17 13:55:36, Jeff Layton wrote:
> > > > > > +int file_write_and_wait(struct file *file)
> > > > > > +{
> > > > > > +	int err = 0, err2;
> > > > > > +	struct address_space *mapping = file->f_mapping;
> > > > > > +
> > > > > > +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> > > > > > +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> > > > > > +		err = filemap_fdatawrite(mapping);
> > > > > > +		/* See comment of filemap_write_and_wait() */
> > > > > > +		if (err != -EIO) {
> > > > > > +			loff_t i_size = i_size_read(mapping->host);
> > > > > > +
> > > > > > +			if (i_size != 0)
> > > > > > +				__filemap_fdatawait_range(mapping, 0,
> > > > > > +							  i_size - 1);
> > > > > > +		}
> > > > > > +	}
> > > > > 
> > > > > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the
> > > > > range and ignore i_size. It is much easier than trying to wrap your head
> > > > > around possible races with file operations modifying i_size.
> > > > > 
> > > > > 								Honza
> > > > 
> > > > I'm basically emulating _exactly_ what filemap_write_and_wait does here,
> > > > as I'm leery of making subtle behavior changes in the actual writeback
> > > > behavior. For example:
> > > > 
> > > > -----------------8<----------------
> > > > static inline int __filemap_fdatawrite(struct address_space *mapping,
> > > >          int sync_mode)
> > > > {
> > > >          return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode);
> > > > }
> > > > 
> > > > int filemap_fdatawrite(struct address_space *mapping)
> > > > {
> > > >          return __filemap_fdatawrite(mapping, WB_SYNC_ALL);
> > > > }
> > > > EXPORT_SYMBOL(filemap_fdatawrite);
> > > > -----------------8<----------------
> > > > 
> > > > ...which then sets up the wbc with the right ranges and sync mode and
> > > > kicks off writepages. But then, it does the i_size_read to figure out
> > > > what range it should wait on (with the shortcut for the size == 0 case).
> > > > 
> > > > My assumption was that it was intentionally designed that way, but I'm
> > > > guessing from your comments that it wasn't? If so, then we can turn
> > > > file_write_and_wait into a static inline wrapper around
> > > > file_write_and_wait_range.
> > > 
> > > FWIW, I did a bit of archaeology in the linux-history tree and found
> > > this patch from Marcelo in 2004. Is this optimization still helpful? If
> > > not, then that does simplify the code a bit.
> > > 
> > > -------------------8<--------------------
> > > 
> > > [PATCH] small wait_on_page_writeback_range() optimization
> > > 
> > > filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end"
> > > parameter.  This is not needed since we know the EOF from the inode.  Use
> > > that instead.
> > > 
> > > Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
> > > Signed-off-by: Andrew Morton <akpm@osdl.org>
> > > Signed-off-by: Linus Torvalds <torvalds@osdl.org>
> > > ---
> > >   mm/filemap.c | 8 +++++++-
> > >   1 file changed, 7 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/mm/filemap.c b/mm/filemap.c
> > > index 78e18b7639b6..55fb7b4141e4 100644
> > > --- a/mm/filemap.c
> > > +++ b/mm/filemap.c
> > > @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range);
> > >    */
> > >   int filemap_fdatawait(struct address_space *mapping)
> > >   {
> > > -	return wait_on_page_writeback_range(mapping, 0, -1);
> > > +	loff_t i_size = i_size_read(mapping->host);
> > > +
> > > +	if (i_size == 0)
> > > +		return 0;
> > > +
> > > +	return wait_on_page_writeback_range(mapping, 0,
> > > +				(i_size - 1) >> PAGE_CACHE_SHIFT);
> > >   }
> > >   EXPORT_SYMBOL(filemap_fdatawait);
> > > 
> > 
> > Does this ever get called in cases where we would not hold fs locks? In 
> > that case we definitely don't want to be relying on i_size,
> > 
> > Steve.
> > 
> 
> Yes. We can initiate and wait on writeback from any context where you
> can sleep, really.
> 
> We're just waiting on whole file writeback here, so I don't think
> there's anything wrong. As long as the i_size was valid at some point in
> time prior to waiting then you're ok.
> 
> The question I have is more whether this optimization is still useful. 
> 
> What we do now is just walk the radix tree and wait_on_page_writeback
> for each page. Do we gain anything by avoiding ranges beyond the current
> EOF with the pagecache infrastructure of 2017?

FWIW I'm not aware of any significant benefit of using i_size in
filemap_fdatawait() - we iterate to the end of the radix tree node anyway,
since pagevec_lookup_tag() does not support range searches (I'm working on
fixing that; however, even after that the benefit would still be rather
marginal).

What Marcelo might have meant even back in 2004 was that if we are in the
middle of truncate, i_size is already reduced but page cache not truncated
yet, then filemap_fdatawait() does not have to wait for writeback of
truncated pages. That might be a noticeable benefit even today if such a race
happens; however, I'm not sure it's worth optimizing for, and surprises
arising from randomly snapshotting i_size (which especially for clustered
filesystems may be out of date) IMHO outweigh the possible advantage.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR



^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
@ 2017-07-31 12:07               ` Jan Kara
  0 siblings, 0 replies; 87+ messages in thread
From: Jan Kara @ 2017-07-31 12:07 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Steven Whitehouse, Jan Kara, Marcelo Tosatti, Alexander Viro,
	J . Bruce Fields, Andrew Morton, linux-fsdevel, linux-kernel,
	linux-mm, Matthew Wilcox, Bob Peterson, cluster-devel

On Mon 31-07-17 07:44:16, Jeff Layton wrote:
> On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote:
> > On 31/07/17 12:27, Jeff Layton wrote:
> > > On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote:
> > > > On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote:
> > > > > On Wed 26-07-17 13:55:36, Jeff Layton wrote:
> > > > > > +int file_write_and_wait(struct file *file)
> > > > > > +{
> > > > > > +	int err = 0, err2;
> > > > > > +	struct address_space *mapping = file->f_mapping;
> > > > > > +
> > > > > > +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> > > > > > +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> > > > > > +		err = filemap_fdatawrite(mapping);
> > > > > > +		/* See comment of filemap_write_and_wait() */
> > > > > > +		if (err != -EIO) {
> > > > > > +			loff_t i_size = i_size_read(mapping->host);
> > > > > > +
> > > > > > +			if (i_size != 0)
> > > > > > +				__filemap_fdatawait_range(mapping, 0,
> > > > > > +							  i_size - 1);
> > > > > > +		}
> > > > > > +	}
> > > > > 
> > > > > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the
> > > > > range and ignore i_size. It is much easier than trying to wrap your head
> > > > > around possible races with file operations modifying i_size.
> > > > > 
> > > > > 								Honza
> > > > 
> > > > I'm basically emulating _exactly_ what filemap_write_and_wait does here,
> > > > as I'm leery of making subtle behavior changes in the actual writeback
> > > > behavior. For example:
> > > > 
> > > > -----------------8<----------------
> > > > static inline int __filemap_fdatawrite(struct address_space *mapping,
> > > >          int sync_mode)
> > > > {
> > > >          return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode);
> > > > }
> > > > 
> > > > int filemap_fdatawrite(struct address_space *mapping)
> > > > {
> > > >          return __filemap_fdatawrite(mapping, WB_SYNC_ALL);
> > > > }
> > > > EXPORT_SYMBOL(filemap_fdatawrite);
> > > > -----------------8<----------------
> > > > 
> > > > ...which then sets up the wbc with the right ranges and sync mode and
> > > > kicks off writepages. But then, it does the i_size_read to figure out
> > > > what range it should wait on (with the shortcut for the size == 0 case).
> > > > 
> > > > My assumption was that it was intentionally designed that way, but I'm
> > > > guessing from your comments that it wasn't? If so, then we can turn
> > > > file_write_and_wait into a static inline wrapper around
> > > > file_write_and_wait_range.
> > > 
> > > FWIW, I did a bit of archaeology in the linux-history tree and found
> > > this patch from Marcelo in 2004. Is this optimization still helpful? If
> > > not, then that does simplify the code a bit.
> > > 
> > > -------------------8<--------------------
> > > 
> > > [PATCH] small wait_on_page_writeback_range() optimization
> > > 
> > > filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end"
> > > parameter.  This is not needed since we know the EOF from the inode.  Use
> > > that instead.
> > > 
> > > Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
> > > Signed-off-by: Andrew Morton <akpm@osdl.org>
> > > Signed-off-by: Linus Torvalds <torvalds@osdl.org>
> > > ---
> > >   mm/filemap.c | 8 +++++++-
> > >   1 file changed, 7 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/mm/filemap.c b/mm/filemap.c
> > > index 78e18b7639b6..55fb7b4141e4 100644
> > > --- a/mm/filemap.c
> > > +++ b/mm/filemap.c
> > > @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range);
> > >    */
> > >   int filemap_fdatawait(struct address_space *mapping)
> > >   {
> > > -	return wait_on_page_writeback_range(mapping, 0, -1);
> > > +	loff_t i_size = i_size_read(mapping->host);
> > > +
> > > +	if (i_size == 0)
> > > +		return 0;
> > > +
> > > +	return wait_on_page_writeback_range(mapping, 0,
> > > +				(i_size - 1) >> PAGE_CACHE_SHIFT);
> > >   }
> > >   EXPORT_SYMBOL(filemap_fdatawait);
> > > 
> > 
> > Does this ever get called in cases where we would not hold fs locks? In 
> > that case we definitely don't want to be relying on i_size,
> > 
> > Steve.
> > 
> 
> Yes. We can initiate and wait on writeback from any context where you
> can sleep, really.
> 
> We're just waiting on whole file writeback here, so I don't think
> there's anything wrong. As long as the i_size was valid at some point in
> time prior to waiting then you're ok.
> 
> The question I have is more whether this optimization is still useful. 
> 
> What we do now is just walk the radix tree and wait_on_page_writeback
> for each page. Do we gain anything by avoiding ranges beyond the current
> EOF with the pagecache infrastructure of 2017?

FWIW I'm not aware of any significant benefit of using i_size in
filemap_fdatawait() - we iterate to the end of the radix tree node anyway,
since pagevec_lookup_tag() does not support range searches (I'm working on
fixing that; however, even after that the benefit would still be rather
marginal).

What Marcelo might have meant even back in 2004 was that if we are in the
middle of truncate, i_size is already reduced but page cache not truncated
yet, then filemap_fdatawait() does not have to wait for writeback of
truncated pages. That might be a noticeable benefit even today if such a race
happens; however, I'm not sure it's worth optimizing for, and surprises
arising from randomly snapshotting i_size (which especially for clustered
filesystems may be out of date) IMHO outweigh the possible advantage.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
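
For illustration only (not part of the posted patches): a small userspace
sketch of the trade-off discussed above. It follows the page-index form of
the 2004 patch, (i_size - 1) >> shift, and shows which pages a wait bounded
by an i_size snapshot would skip compared with an open-ended wait. The 4 KiB
page size, SKETCH_PAGE_SHIFT, and both function names are assumptions made
up for this sketch; none of them are kernel symbols.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Last page index covered by a wait bounded by an i_size snapshot.
 * Only meaningful for i_size > 0, as in the 2004 patch. */
static uint64_t last_index_from_isize(int64_t i_size)
{
	return (uint64_t)(i_size - 1) >> SKETCH_PAGE_SHIFT;
}

/*
 * Would a page at 'index' be waited on?  'bounded' selects the
 * i_size-limited range; otherwise the range is open-ended.
 */
static bool page_is_waited_on(uint64_t index, int64_t i_size, bool bounded)
{
	if (!bounded)
		return true;		/* whole mapping: nothing is skipped */
	if (i_size == 0)
		return false;		/* size == 0 shortcut skips everything */
	return index <= last_index_from_isize(i_size);
}
```

With a stale or zero i_size snapshot (the clustered and block-device cases
raised in this thread), the bounded variant silently skips pages that may
still be under writeback, while the open-ended variant cannot.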

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
  2017-07-31 12:07               ` Jan Kara
  (?)
@ 2017-07-31 13:00                 ` Jeff Layton
  -1 siblings, 0 replies; 87+ messages in thread
From: Jeff Layton @ 2017-07-31 13:00 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Mon, 2017-07-31 at 14:07 +0200, Jan Kara wrote:
> On Mon 31-07-17 07:44:16, Jeff Layton wrote:
> > On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote:
> > > On 31/07/17 12:27, Jeff Layton wrote:
> > > > On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote:
> > > > > On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote:
> > > > > > On Wed 26-07-17 13:55:36, Jeff Layton wrote:
> > > > > > > +int file_write_and_wait(struct file *file)
> > > > > > > +{
> > > > > > > +	int err = 0, err2;
> > > > > > > +	struct address_space *mapping = file->f_mapping;
> > > > > > > +
> > > > > > > +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> > > > > > > +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> > > > > > > +		err = filemap_fdatawrite(mapping);
> > > > > > > +		/* See comment of filemap_write_and_wait() */
> > > > > > > +		if (err != -EIO) {
> > > > > > > +			loff_t i_size = i_size_read(mapping->host);
> > > > > > > +
> > > > > > > +			if (i_size != 0)
> > > > > > > +				__filemap_fdatawait_range(mapping, 0,
> > > > > > > +							  i_size - 1);
> > > > > > > +		}
> > > > > > > +	}
> > > > > > 
> > > > > > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the
> > > > > > range and ignore i_size. It is much easier than trying to wrap your head
> > > > > > around possible races with file operations modifying i_size.
> > > > > > 
> > > > > > 								Honza
> > > > > 
> > > > > I'm basically emulating _exactly_ what filemap_write_and_wait does here,
> > > > > as I'm leery of making subtle behavior changes in the actual writeback
> > > > > behavior. For example:
> > > > > 
> > > > > -----------------8<----------------
> > > > > static inline int __filemap_fdatawrite(struct address_space *mapping,
> > > > >          int sync_mode)
> > > > > {
> > > > >          return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode);
> > > > > }
> > > > > 
> > > > > int filemap_fdatawrite(struct address_space *mapping)
> > > > > {
> > > > >          return __filemap_fdatawrite(mapping, WB_SYNC_ALL);
> > > > > }
> > > > > EXPORT_SYMBOL(filemap_fdatawrite);
> > > > > -----------------8<----------------
> > > > > 
> > > > > ...which then sets up the wbc with the right ranges and sync mode and
> > > > > kicks off writepages. But then, it does the i_size_read to figure out
> > > > > what range it should wait on (with the shortcut for the size == 0 case).
> > > > > 
> > > > > My assumption was that it was intentionally designed that way, but I'm
> > > > > guessing from your comments that it wasn't? If so, then we can turn
> > > > > file_write_and_wait into a static inline wrapper around
> > > > > file_write_and_wait_range.
> > > > 
> > > > FWIW, I did a bit of archaeology in the linux-history tree and found
> > > > this patch from Marcelo in 2004. Is this optimization still helpful? If
> > > > not, then that does simplify the code a bit.
> > > > 
> > > > -------------------8<--------------------
> > > > 
> > > > [PATCH] small wait_on_page_writeback_range() optimization
> > > > 
> > > > filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end"
> > > > parameter.  This is not needed since we know the EOF from the inode.  Use
> > > > that instead.
> > > > 
> > > > Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
> > > > Signed-off-by: Andrew Morton <akpm@osdl.org>
> > > > Signed-off-by: Linus Torvalds <torvalds@osdl.org>
> > > > ---
> > > >   mm/filemap.c | 8 +++++++-
> > > >   1 file changed, 7 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/mm/filemap.c b/mm/filemap.c
> > > > index 78e18b7639b6..55fb7b4141e4 100644
> > > > --- a/mm/filemap.c
> > > > +++ b/mm/filemap.c
> > > > @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range);
> > > >    */
> > > >   int filemap_fdatawait(struct address_space *mapping)
> > > >   {
> > > > -	return wait_on_page_writeback_range(mapping, 0, -1);
> > > > +	loff_t i_size = i_size_read(mapping->host);
> > > > +
> > > > +	if (i_size == 0)
> > > > +		return 0;
> > > > +
> > > > +	return wait_on_page_writeback_range(mapping, 0,
> > > > +				(i_size - 1) >> PAGE_CACHE_SHIFT);
> > > >   }
> > > >   EXPORT_SYMBOL(filemap_fdatawait);
> > > > 
> > > 
> > > Does this ever get called in cases where we would not hold fs locks? In 
> > > that case we definitely don't want to be relying on i_size,
> > > 
> > > Steve.
> > > 
> > 
> > Yes. We can initiate and wait on writeback from any context where you
> > can sleep, really.
> > 
> > We're just waiting on whole file writeback here, so I don't think
> > there's anything wrong. As long as the i_size was valid at some point in
> > time prior to waiting then you're ok.
> > 
> > The question I have is more whether this optimization is still useful. 
> > 
> > What we do now is just walk the radix tree and wait_on_page_writeback
> > for each page. Do we gain anything by avoiding ranges beyond the current
> > EOF with the pagecache infrastructure of 2017?
> 
> FWIW I'm not aware of any significant benefit of using i_size in
> filemap_fdatawait() - we iterate to the end of the radix tree node anyway,
> since pagevec_lookup_tag() does not support range searches (I'm working on
> fixing that; however, even after that the benefit would still be rather
> marginal).
> 
> What Marcelo might have meant even back in 2004 was that if we are in the
> middle of truncate, i_size is already reduced but page cache not truncated
> yet, then filemap_fdatawait() does not have to wait for writeback of
> truncated pages. That might be a noticeable benefit even today if such a race
> happens; however, I'm not sure it's worth optimizing for, and surprises
> arising from randomly snapshotting i_size (which especially for clustered
> filesystems may be out of date) IMHO outweigh the possible advantage.
> 
> 								Honza

Thanks for clarifying.

Given that file_write_and_wait is a new helper function anyway, I'll
just make it a wrapper around file_write_and_wait_range. Since it might
be racy, should we remove this optimization from the "legacy"
filemap_fdatawait / filemap_fdatawait_keep_errors calls?

-- 
Jeff Layton <jlayton@redhat.com>
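
For illustration only (not the posted patch): a userspace sketch of what
"a wrapper around file_write_and_wait_range" could look like. The struct
file below is a stand-in, and the file_write_and_wait_range() stub and its
(file, start, end) signature are assumptions made so the sketch compiles on
its own; the real helper would start writeback and wait on the range.

```c
#include <limits.h>

/* Stand-in for the kernel's struct file; only here so the sketch compiles. */
struct file {
	long long last_start;	/* hypothetical: records the requested range */
	long long last_end;
};

/*
 * Stub with the shape assumed for file_write_and_wait_range(); it only
 * records the byte range [start, end] it was asked to handle.
 */
static int file_write_and_wait_range(struct file *file,
				     long long start, long long end)
{
	file->last_start = start;
	file->last_end = end;
	return 0;
}

/* Whole-file variant: no i_size snapshot, just an open-ended range. */
static inline int file_write_and_wait(struct file *file)
{
	return file_write_and_wait_range(file, 0, LLONG_MAX);
}
```

Passing LLONG_MAX as the end mirrors what __filemap_fdatawrite() already
does for the write side in the code quoted above, so the write range and the
wait range would match without consulting i_size.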



^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
@ 2017-07-31 13:00                 ` Jeff Layton
  0 siblings, 0 replies; 87+ messages in thread
From: Jeff Layton @ 2017-07-31 13:00 UTC (permalink / raw)
  To: Jan Kara
  Cc: Steven Whitehouse, Marcelo Tosatti, Alexander Viro,
	J . Bruce Fields, Andrew Morton, linux-fsdevel, linux-kernel,
	linux-mm, Matthew Wilcox, Bob Peterson, cluster-devel

On Mon, 2017-07-31 at 14:07 +0200, Jan Kara wrote:
> On Mon 31-07-17 07:44:16, Jeff Layton wrote:
> > On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote:
> > > On 31/07/17 12:27, Jeff Layton wrote:
> > > > On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote:
> > > > > On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote:
> > > > > > On Wed 26-07-17 13:55:36, Jeff Layton wrote:
> > > > > > > +int file_write_and_wait(struct file *file)
> > > > > > > +{
> > > > > > > +	int err = 0, err2;
> > > > > > > +	struct address_space *mapping = file->f_mapping;
> > > > > > > +
> > > > > > > +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> > > > > > > +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> > > > > > > +		err = filemap_fdatawrite(mapping);
> > > > > > > +		/* See comment of filemap_write_and_wait() */
> > > > > > > +		if (err != -EIO) {
> > > > > > > +			loff_t i_size = i_size_read(mapping->host);
> > > > > > > +
> > > > > > > +			if (i_size != 0)
> > > > > > > +				__filemap_fdatawait_range(mapping, 0,
> > > > > > > +							  i_size - 1);
> > > > > > > +		}
> > > > > > > +	}
> > > > > > 
> > > > > > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the
> > > > > > range and ignore i_size. It is much easier than trying to wrap your head
> > > > > > around possible races with file operations modifying i_size.
> > > > > > 
> > > > > > 								Honza
> > > > > 
> > > > > I'm basically emulating _exactly_ what filemap_write_and_wait does here,
> > > > > as I'm leery of making subtle behavior changes in the actual writeback
> > > > > behavior. For example:
> > > > > 
> > > > > -----------------8<----------------
> > > > > static inline int __filemap_fdatawrite(struct address_space *mapping,
> > > > >          int sync_mode)
> > > > > {
> > > > >          return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode);
> > > > > }
> > > > > 
> > > > > int filemap_fdatawrite(struct address_space *mapping)
> > > > > {
> > > > >          return __filemap_fdatawrite(mapping, WB_SYNC_ALL);
> > > > > }
> > > > > EXPORT_SYMBOL(filemap_fdatawrite);
> > > > > -----------------8<----------------
> > > > > 
> > > > > ...which then sets up the wbc with the right ranges and sync mode and
> > > > > kicks off writepages. But then, it does the i_size_read to figure out
> > > > > what range it should wait on (with the shortcut for the size == 0 case).
> > > > > 
> > > > > My assumption was that it was intentionally designed that way, but I'm
> > > > > guessing from your comments that it wasn't? If so, then we can turn
> > > > > file_write_and_wait into a static inline wrapper around
> > > > > file_write_and_wait_range.
> > > > 
> > > > FWIW, I did a bit of archaeology in the linux-history tree and found
> > > > this patch from Marcelo in 2004. Is this optimization still helpful? If
> > > > not, then that does simplify the code a bit.
> > > > 
> > > > -------------------8<--------------------
> > > > 
> > > > [PATCH] small wait_on_page_writeback_range() optimization
> > > > 
> > > > filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end"
> > > > parameter.  This is not needed since we know the EOF from the inode.  Use
> > > > that instead.
> > > > 
> > > > Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
> > > > Signed-off-by: Andrew Morton <akpm@osdl.org>
> > > > Signed-off-by: Linus Torvalds <torvalds@osdl.org>
> > > > ---
> > > >   mm/filemap.c | 8 +++++++-
> > > >   1 file changed, 7 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/mm/filemap.c b/mm/filemap.c
> > > > index 78e18b7639b6..55fb7b4141e4 100644
> > > > --- a/mm/filemap.c
> > > > +++ b/mm/filemap.c
> > > > @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range);
> > > >    */
> > > >   int filemap_fdatawait(struct address_space *mapping)
> > > >   {
> > > > -	return wait_on_page_writeback_range(mapping, 0, -1);
> > > > +	loff_t i_size = i_size_read(mapping->host);
> > > > +
> > > > +	if (i_size == 0)
> > > > +		return 0;
> > > > +
> > > > +	return wait_on_page_writeback_range(mapping, 0,
> > > > +				(i_size - 1) >> PAGE_CACHE_SHIFT);
> > > >   }
> > > >   EXPORT_SYMBOL(filemap_fdatawait);
> > > > 
> > > 
> > > Does this ever get called in cases where we would not hold fs locks? In 
> > > that case we definitely don't want to be relying on i_size,
> > > 
> > > Steve.
> > > 
> > 
> > Yes. We can initiate and wait on writeback from any context where you
> > can sleep, really.
> > 
> > We're just waiting on whole file writeback here, so I don't think
> > there's anything wrong. As long as the i_size was valid at some point in
> > time prior to waiting then you're ok.
> > 
> > The question I have is more whether this optimization is still useful. 
> > 
> > What we do now is just walk the radix tree and wait_on_page_writeback
> > for each page. Do we gain anything by avoiding ranges beyond the current
> > EOF with the pagecache infrastructure of 2017?
> 
> FWIW I'm not aware of any significant benefit of using i_size in
> filemap_fdatawait() - we iterate to the end of the radix tree node anyway,
> since pagevec_lookup_tag() does not support range searches (I'm working
> on fixing that, but even after that the benefit would still be rather
> marginal).
> 
> What Marcelo might have meant even back in 2004 was that if we are in the
> middle of truncate, i_size is already reduced but page cache not truncated
> yet, then filemap_fdatawait() does not have to wait for writeback of
> truncated pages. That might be a noticeable benefit even today if such a
> race happens; however, I'm not sure it's worth optimizing for, and the
> surprises arising from randomly snapshotting i_size (which especially for
> clustered filesystems may be out of date) IMHO outweigh the possible
> advantage.
> 
> 								Honza

Thanks for clarifying.

Given that file_write_and_wait is a new helper function anyway, I'll
just make it a wrapper around file_write_and_wait_range. Since it might
be racy, should we remove this optimization from the "legacy"
filemap_fdatawait / filemap_fdatawait_keep_errors calls?

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
  2017-07-31 13:00                 ` Jeff Layton
  (?)
@ 2017-07-31 13:32                   ` Jan Kara
  -1 siblings, 0 replies; 87+ messages in thread
From: Jan Kara @ 2017-07-31 13:32 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Mon 31-07-17 09:00:37, Jeff Layton wrote:
> On Mon, 2017-07-31 at 14:07 +0200, Jan Kara wrote:
> > On Mon 31-07-17 07:44:16, Jeff Layton wrote:
> > > On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote:
> > > > On 31/07/17 12:27, Jeff Layton wrote:
> > > > > On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote:
> > > > > > On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote:
> > > > > > > On Wed 26-07-17 13:55:36, Jeff Layton wrote:
> > > > > > > > +int file_write_and_wait(struct file *file)
> > > > > > > > +{
> > > > > > > > +	int err = 0, err2;
> > > > > > > > +	struct address_space *mapping = file->f_mapping;
> > > > > > > > +
> > > > > > > > +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> > > > > > > > +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> > > > > > > > +		err = filemap_fdatawrite(mapping);
> > > > > > > > +		/* See comment of filemap_write_and_wait() */
> > > > > > > > +		if (err != -EIO) {
> > > > > > > > +			loff_t i_size = i_size_read(mapping->host);
> > > > > > > > +
> > > > > > > > +			if (i_size != 0)
> > > > > > > > +				__filemap_fdatawait_range(mapping, 0,
> > > > > > > > +							  i_size - 1);
> > > > > > > > +		}
> > > > > > > > +	}
> > > > > > > 
> > > > > > > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the
> > > > > > > range and ignore i_size. It is much easier than trying to wrap your head
> > > > > > > around possible races with file operations modifying i_size.
> > > > > > > 
> > > > > > > 								Honza
> > > > > > 
> > > > > > I'm basically emulating _exactly_ what filemap_write_and_wait does here,
> > > > > > as I'm leery of making subtle behavior changes in the actual writeback
> > > > > > behavior. For example:
> > > > > > 
> > > > > > -----------------8<----------------
> > > > > > static inline int __filemap_fdatawrite(struct address_space *mapping,
> > > > > >          int sync_mode)
> > > > > > {
> > > > > >          return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode);
> > > > > > }
> > > > > > 
> > > > > > int filemap_fdatawrite(struct address_space *mapping)
> > > > > > {
> > > > > >          return __filemap_fdatawrite(mapping, WB_SYNC_ALL);
> > > > > > }
> > > > > > EXPORT_SYMBOL(filemap_fdatawrite);
> > > > > > -----------------8<----------------
> > > > > > 
> > > > > > ...which then sets up the wbc with the right ranges and sync mode and
> > > > > > kicks off writepages. But then, it does the i_size_read to figure out
> > > > > > what range it should wait on (with the shortcut for the size == 0 case).
> > > > > > 
> > > > > > My assumption was that it was intentionally designed that way, but I'm
> > > > > > guessing from your comments that it wasn't? If so, then we can turn
> > > > > > file_write_and_wait into a static inline wrapper around
> > > > > > file_write_and_wait_range.
> > > > > 
> > > > > FWIW, I did a bit of archaeology in the linux-history tree and found
> > > > > this patch from Marcelo in 2004. Is this optimization still helpful? If
> > > > > not, then that does simplify the code a bit.
> > > > > 
> > > > > -------------------8<--------------------
> > > > > 
> > > > > [PATCH] small wait_on_page_writeback_range() optimization
> > > > > 
> > > > > filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end"
> > > > > parameter.  This is not needed since we know the EOF from the inode.  Use
> > > > > that instead.
> > > > > 
> > > > > Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
> > > > > Signed-off-by: Andrew Morton <akpm@osdl.org>
> > > > > Signed-off-by: Linus Torvalds <torvalds@osdl.org>
> > > > > ---
> > > > >   mm/filemap.c | 8 +++++++-
> > > > >   1 file changed, 7 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/mm/filemap.c b/mm/filemap.c
> > > > > index 78e18b7639b6..55fb7b4141e4 100644
> > > > > --- a/mm/filemap.c
> > > > > +++ b/mm/filemap.c
> > > > > @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range);
> > > > >    */
> > > > >   int filemap_fdatawait(struct address_space *mapping)
> > > > >   {
> > > > > -	return wait_on_page_writeback_range(mapping, 0, -1);
> > > > > +	loff_t i_size = i_size_read(mapping->host);
> > > > > +
> > > > > +	if (i_size == 0)
> > > > > +		return 0;
> > > > > +
> > > > > +	return wait_on_page_writeback_range(mapping, 0,
> > > > > +				(i_size - 1) >> PAGE_CACHE_SHIFT);
> > > > >   }
> > > > >   EXPORT_SYMBOL(filemap_fdatawait);
> > > > > 
> > > > 
> > > > Does this ever get called in cases where we would not hold fs locks? In 
> > > > that case we definitely don't want to be relying on i_size,
> > > > 
> > > > Steve.
> > > > 
> > > 
> > > Yes. We can initiate and wait on writeback from any context where you
> > > can sleep, really.
> > > 
> > > We're just waiting on whole file writeback here, so I don't think
> > > there's anything wrong. As long as the i_size was valid at some point in
> > > time prior to waiting then you're ok.
> > > 
> > > The question I have is more whether this optimization is still useful. 
> > > 
> > > What we do now is just walk the radix tree and wait_on_page_writeback
> > > for each page. Do we gain anything by avoiding ranges beyond the current
> > > EOF with the pagecache infrastructure of 2017?
> > 
> > FWIW I'm not aware of any significant benefit of using i_size in
> > filemap_fdatawait() - we iterate to the end of the radix tree node anyway,
> > since pagevec_lookup_tag() does not support range searches (I'm working
> > on fixing that, but even after that the benefit would still be rather
> > marginal).
> > 
> > What Marcelo might have meant even back in 2004 was that if we are in the
> > middle of truncate, i_size is already reduced but page cache not truncated
> > yet, then filemap_fdatawait() does not have to wait for writeback of
> > truncated pages. That might be a noticeable benefit even today if such a
> > race happens; however, I'm not sure it's worth optimizing for, and the
> > surprises arising from randomly snapshotting i_size (which especially for
> > clustered filesystems may be out of date) IMHO outweigh the possible
> > advantage.
> > 
> > 								Honza
> 
> Thanks for clarifying.
> 
> Given that file_write_and_wait is a new helper function anyway, I'll
> just make it a wrapper around file_write_and_wait_range. Since it might

Agreed.

> be racy, should we remove this optimization from the "legacy"
> filemap_fdatawait / filemap_fdatawait_keep_errors calls?

I'm for it.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR



^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
@ 2017-07-31 13:32                   ` Jan Kara
  0 siblings, 0 replies; 87+ messages in thread
From: Jan Kara @ 2017-07-31 13:32 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Jan Kara, Steven Whitehouse, Marcelo Tosatti, Alexander Viro,
	J . Bruce Fields, Andrew Morton, linux-fsdevel, linux-kernel,
	linux-mm, Matthew Wilcox, Bob Peterson, cluster-devel

On Mon 31-07-17 09:00:37, Jeff Layton wrote:
> On Mon, 2017-07-31 at 14:07 +0200, Jan Kara wrote:
> > On Mon 31-07-17 07:44:16, Jeff Layton wrote:
> > > On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote:
> > > > On 31/07/17 12:27, Jeff Layton wrote:
> > > > > On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote:
> > > > > > On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote:
> > > > > > > On Wed 26-07-17 13:55:36, Jeff Layton wrote:
> > > > > > > > +int file_write_and_wait(struct file *file)
> > > > > > > > +{
> > > > > > > > +	int err = 0, err2;
> > > > > > > > +	struct address_space *mapping = file->f_mapping;
> > > > > > > > +
> > > > > > > > +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> > > > > > > > +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> > > > > > > > +		err = filemap_fdatawrite(mapping);
> > > > > > > > +		/* See comment of filemap_write_and_wait() */
> > > > > > > > +		if (err != -EIO) {
> > > > > > > > +			loff_t i_size = i_size_read(mapping->host);
> > > > > > > > +
> > > > > > > > +			if (i_size != 0)
> > > > > > > > +				__filemap_fdatawait_range(mapping, 0,
> > > > > > > > +							  i_size - 1);
> > > > > > > > +		}
> > > > > > > > +	}
> > > > > > > 
> > > > > > > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the
> > > > > > > range and ignore i_size. It is much easier than trying to wrap your head
> > > > > > > around possible races with file operations modifying i_size.
> > > > > > > 
> > > > > > > 								Honza
> > > > > > 
> > > > > > I'm basically emulating _exactly_ what filemap_write_and_wait does here,
> > > > > > as I'm leery of making subtle behavior changes in the actual writeback
> > > > > > behavior. For example:
> > > > > > 
> > > > > > -----------------8<----------------
> > > > > > static inline int __filemap_fdatawrite(struct address_space *mapping,
> > > > > >          int sync_mode)
> > > > > > {
> > > > > >          return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode);
> > > > > > }
> > > > > > 
> > > > > > int filemap_fdatawrite(struct address_space *mapping)
> > > > > > {
> > > > > >          return __filemap_fdatawrite(mapping, WB_SYNC_ALL);
> > > > > > }
> > > > > > EXPORT_SYMBOL(filemap_fdatawrite);
> > > > > > -----------------8<----------------
> > > > > > 
> > > > > > ...which then sets up the wbc with the right ranges and sync mode and
> > > > > > kicks off writepages. But then, it does the i_size_read to figure out
> > > > > > what range it should wait on (with the shortcut for the size == 0 case).
> > > > > > 
> > > > > > My assumption was that it was intentionally designed that way, but I'm
> > > > > > guessing from your comments that it wasn't? If so, then we can turn
> > > > > > file_write_and_wait into a static inline wrapper around
> > > > > > file_write_and_wait_range.
> > > > > 
> > > > > FWIW, I did a bit of archaeology in the linux-history tree and found
> > > > > this patch from Marcelo in 2004. Is this optimization still helpful? If
> > > > > not, then that does simplify the code a bit.
> > > > > 
> > > > > -------------------8<--------------------
> > > > > 
> > > > > [PATCH] small wait_on_page_writeback_range() optimization
> > > > > 
> > > > > filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end"
> > > > > parameter.  This is not needed since we know the EOF from the inode.  Use
> > > > > that instead.
> > > > > 
> > > > > Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
> > > > > Signed-off-by: Andrew Morton <akpm@osdl.org>
> > > > > Signed-off-by: Linus Torvalds <torvalds@osdl.org>
> > > > > ---
> > > > >   mm/filemap.c | 8 +++++++-
> > > > >   1 file changed, 7 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/mm/filemap.c b/mm/filemap.c
> > > > > index 78e18b7639b6..55fb7b4141e4 100644
> > > > > --- a/mm/filemap.c
> > > > > +++ b/mm/filemap.c
> > > > > @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range);
> > > > >    */
> > > > >   int filemap_fdatawait(struct address_space *mapping)
> > > > >   {
> > > > > -	return wait_on_page_writeback_range(mapping, 0, -1);
> > > > > +	loff_t i_size = i_size_read(mapping->host);
> > > > > +
> > > > > +	if (i_size == 0)
> > > > > +		return 0;
> > > > > +
> > > > > +	return wait_on_page_writeback_range(mapping, 0,
> > > > > +				(i_size - 1) >> PAGE_CACHE_SHIFT);
> > > > >   }
> > > > >   EXPORT_SYMBOL(filemap_fdatawait);
> > > > > 
> > > > 
> > > > Does this ever get called in cases where we would not hold fs locks? In 
> > > > that case we definitely don't want to be relying on i_size,
> > > > 
> > > > Steve.
> > > > 
> > > 
> > > Yes. We can initiate and wait on writeback from any context where you
> > > can sleep, really.
> > > 
> > > We're just waiting on whole file writeback here, so I don't think
> > > there's anything wrong. As long as the i_size was valid at some point in
> > > time prior to waiting then you're ok.
> > > 
> > > The question I have is more whether this optimization is still useful. 
> > > 
> > > What we do now is just walk the radix tree and wait_on_page_writeback
> > > for each page. Do we gain anything by avoiding ranges beyond the current
> > > EOF with the pagecache infrastructure of 2017?
> > 
> > FWIW I'm not aware of any significant benefit of using i_size in
> > filemap_fdatawait() - we iterate to the end of the radix tree node anyway,
> > since pagevec_lookup_tag() does not support range searches (I'm working
> > on fixing that, but even after that the benefit would still be rather
> > marginal).
> > 
> > What Marcelo might have meant even back in 2004 was that if we are in the
> > middle of truncate, i_size is already reduced but page cache not truncated
> > yet, then filemap_fdatawait() does not have to wait for writeback of
> > truncated pages. That might be a noticeable benefit even today if such a
> > race happens; however, I'm not sure it's worth optimizing for, and the
> > surprises arising from randomly snapshotting i_size (which especially for
> > clustered filesystems may be out of date) IMHO outweigh the possible
> > advantage.
> > 
> > 								Honza
> 
> Thanks for clarifying.
> 
> Given that file_write_and_wait is a new helper function anyway, I'll
> just make it a wrapper around file_write_and_wait_range. Since it might

Agreed.

> be racy, should we remove this optimization from the "legacy"
> filemap_fdatawait / filemap_fdatawait_keep_errors calls?

I'm for it.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
@ 2017-07-31 13:32                   ` Jan Kara
  0 siblings, 0 replies; 87+ messages in thread
From: Jan Kara @ 2017-07-31 13:32 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Jan Kara, Steven Whitehouse, Marcelo Tosatti, Alexander Viro,
	J . Bruce Fields, Andrew Morton, linux-fsdevel, linux-kernel,
	linux-mm, Matthew Wilcox, Bob Peterson, cluster-devel

On Mon 31-07-17 09:00:37, Jeff Layton wrote:
> On Mon, 2017-07-31 at 14:07 +0200, Jan Kara wrote:
> > On Mon 31-07-17 07:44:16, Jeff Layton wrote:
> > > On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote:
> > > > On 31/07/17 12:27, Jeff Layton wrote:
> > > > > On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote:
> > > > > > On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote:
> > > > > > > On Wed 26-07-17 13:55:36, Jeff Layton wrote:
> > > > > > > > +int file_write_and_wait(struct file *file)
> > > > > > > > +{
> > > > > > > > +	int err = 0, err2;
> > > > > > > > +	struct address_space *mapping = file->f_mapping;
> > > > > > > > +
> > > > > > > > +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> > > > > > > > +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> > > > > > > > +		err = filemap_fdatawrite(mapping);
> > > > > > > > +		/* See comment of filemap_write_and_wait() */
> > > > > > > > +		if (err != -EIO) {
> > > > > > > > +			loff_t i_size = i_size_read(mapping->host);
> > > > > > > > +
> > > > > > > > +			if (i_size != 0)
> > > > > > > > +				__filemap_fdatawait_range(mapping, 0,
> > > > > > > > +							  i_size - 1);
> > > > > > > > +		}
> > > > > > > > +	}
> > > > > > > 
> > > > > > > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the
> > > > > > > range and ignore i_size. It is much easier than trying to wrap your head
> > > > > > > around possible races with file operations modifying i_size.
> > > > > > > 
> > > > > > > 								Honza
> > > > > > 
> > > > > > I'm basically emulating _exactly_ what filemap_write_and_wait does here,
> > > > > > as I'm leery of making subtle behavior changes in the actual writeback
> > > > > > behavior. For example:
> > > > > > 
> > > > > > -----------------8<----------------
> > > > > > static inline int __filemap_fdatawrite(struct address_space *mapping,
> > > > > >          int sync_mode)
> > > > > > {
> > > > > >          return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode);
> > > > > > }
> > > > > > 
> > > > > > int filemap_fdatawrite(struct address_space *mapping)
> > > > > > {
> > > > > >          return __filemap_fdatawrite(mapping, WB_SYNC_ALL);
> > > > > > }
> > > > > > EXPORT_SYMBOL(filemap_fdatawrite);
> > > > > > -----------------8<----------------
> > > > > > 
> > > > > > ...which then sets up the wbc with the right ranges and sync mode and
> > > > > > kicks off writepages. But then, it does the i_size_read to figure out
> > > > > > what range it should wait on (with the shortcut for the size == 0 case).
> > > > > > 
> > > > > > My assumption was that it was intentionally designed that way, but I'm
> > > > > > guessing from your comments that it wasn't? If so, then we can turn
> > > > > > file_write_and_wait a static inline wrapper around
> > > > > > file_write_and_wait_range.
> > > > > 
> > > > > FWIW, I did a bit of archaeology in the linux-history tree and found
> > > > > this patch from Marcelo in 2004. Is this optimization still helpful? If
> > > > > not, then that does simplify the code a bit.
> > > > > 
> > > > > -------------------8<--------------------
> > > > > 
> > > > > [PATCH] small wait_on_page_writeback_range() optimization
> > > > > 
> > > > > filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end"
> > > > > parameter.  This is not needed since we know the EOF from the inode.  Use
> > > > > that instead.
> > > > > 
> > > > > Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
> > > > > Signed-off-by: Andrew Morton <akpm@osdl.org>
> > > > > Signed-off-by: Linus Torvalds <torvalds@osdl.org>
> > > > > ---
> > > > >   mm/filemap.c | 8 +++++++-
> > > > >   1 file changed, 7 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/mm/filemap.c b/mm/filemap.c
> > > > > index 78e18b7639b6..55fb7b4141e4 100644
> > > > > --- a/mm/filemap.c
> > > > > +++ b/mm/filemap.c
> > > > > @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range);
> > > > >    */
> > > > >   int filemap_fdatawait(struct address_space *mapping)
> > > > >   {
> > > > > -	return wait_on_page_writeback_range(mapping, 0, -1);
> > > > > +	loff_t i_size = i_size_read(mapping->host);
> > > > > +
> > > > > +	if (i_size == 0)
> > > > > +		return 0;
> > > > > +
> > > > > +	return wait_on_page_writeback_range(mapping, 0,
> > > > > +				(i_size - 1) >> PAGE_CACHE_SHIFT);
> > > > >   }
> > > > >   EXPORT_SYMBOL(filemap_fdatawait);
> > > > > 
> > > > 
> > > > Does this ever get called in cases where we would not hold fs locks? In 
> > > > that case we definitely don't want to be relying on i_size,
> > > > 
> > > > Steve.
> > > > 
> > > 
> > > Yes. We can initiate and wait on writeback from any context where you
> > > can sleep, really.
> > > 
> > > We're just waiting on whole file writeback here, so I don't think
> > > there's anything wrong. As long as the i_size was valid at some point in
> > > time prior to waiting then you're ok.
> > > 
> > > The question I have is more whether this optimization is still useful. 
> > > 
> > > What we do now is just walk the radix tree and wait_on_page_writeback
> > > for each page. Do we gain anything by avoiding ranges beyond the current
> > > EOF with the pagecache infrastructure of 2017?
> > 
> > FWIW I'm not aware of any significant benefit of using i_size in
> > filemap_fdatawait() - we iterate to the end of the radix tree node anyway,
> > since pagevec_lookup_tag() does not support range searches (I'm working
> > on fixing that, but even then the benefit would still be rather
> > marginal).
> > 
> > What Marcelo might have meant back in 2004 was that if we are in the
> > middle of a truncate, where i_size is already reduced but the page cache
> > is not truncated yet, then filemap_fdatawait() does not have to wait for
> > writeback of the truncated pages. That might be a noticeable benefit even
> > today if such a race happens. However, I'm not sure it's worth optimizing
> > for, and the surprises arising from randomly snapshotting i_size (which,
> > especially for clustered filesystems, may be out of date) IMHO outweigh
> > the possible advantage.
> > 
> > 								Honza
> 
> Thanks for clarifying.
> 
> Given that file_write_and_wait is a new helper function anyway, I'll
> just make it a wrapper around file_write_and_wait_range. Since it might

Agreed.

> be racy, should we remove this optimization from the "legacy"
> filemap_fdatawait / filemap_fdatawait_keep_errors calls?

I'm for it.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 87+ messages in thread
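
The 2004 optimization discussed in the message above bounds the wait by
the index of the last page that i_size covers. A minimal sketch of that
arithmetic, assuming a 4 KiB page size (the kernel's page shift is
architecture dependent); last_page_index and SKETCH_PAGE_SHIFT are names
made up for illustration and are not kernel symbols:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. The real PAGE_SHIFT depends on the architecture. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Hypothetical helper: index of the last page cache page that can hold
 * data for a file of i_size bytes. Returns -1 for an empty file, where
 * there is nothing to wait on (the quoted patch returns early instead).
 */
static int64_t last_page_index(int64_t i_size)
{
	assert(i_size >= 0);
	if (i_size == 0)
		return -1;
	/* Same shape as (i_size - 1) >> PAGE_CACHE_SHIFT in the quoted patch. */
	return (i_size - 1) >> SKETCH_PAGE_SHIFT;
}
```

The value is only as good as the i_size snapshot it was computed from,
which is the concern raised in the thread for truncate races and
clustered filesystems.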

* [Cluster-devel] [PATCH v3] mm: add file_fdatawait_range and file_write_and_wait
  2017-07-26 17:55   ` Jeff Layton
  (?)
@ 2017-07-31 16:49     ` Jeff Layton
  -1 siblings, 0 replies; 87+ messages in thread
From: Jeff Layton @ 2017-07-31 16:49 UTC (permalink / raw)
  To: cluster-devel.redhat.com

From: Jeff Layton <jlayton@redhat.com>

Necessary now for gfs2_fsync and sync_file_range, but there will
eventually be other callers.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 include/linux/fs.h | 11 ++++++++++-
 mm/filemap.c       | 23 +++++++++++++++++++++++
 2 files changed, 33 insertions(+), 1 deletion(-)

v3: make file_write_and_wait a wrapper around file_write_and_wait_range

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 526b6a9f30d4..909210bd6366 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2549,6 +2549,8 @@ static inline int filemap_fdatawait(struct address_space *mapping)
 
 extern bool filemap_range_has_page(struct address_space *, loff_t lstart,
 				  loff_t lend);
+extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart,
+						loff_t lend);
 extern int filemap_write_and_wait(struct address_space *mapping);
 extern int filemap_write_and_wait_range(struct address_space *mapping,
 				        loff_t lstart, loff_t lend);
@@ -2557,12 +2559,19 @@ extern int __filemap_fdatawrite_range(struct address_space *mapping,
 extern int filemap_fdatawrite_range(struct address_space *mapping,
 				loff_t start, loff_t end);
 extern int filemap_check_errors(struct address_space *mapping);
-
 extern void __filemap_set_wb_err(struct address_space *mapping, int err);
+
+extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart,
+						loff_t lend);
 extern int __must_check file_check_and_advance_wb_err(struct file *file);
 extern int __must_check file_write_and_wait_range(struct file *file,
 						loff_t start, loff_t end);
 
+static inline int file_write_and_wait(struct file *file)
+{
+	return file_write_and_wait_range(file, 0, LLONG_MAX);
+}
+
 /**
  * filemap_set_wb_err - set a writeback error on an address_space
  * @mapping: mapping in which to set writeback error
diff --git a/mm/filemap.c b/mm/filemap.c
index 953804b29a75..85dfe3bee324 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -476,6 +476,29 @@ int filemap_fdatawait_range(struct address_space *mapping, loff_t start_byte,
 EXPORT_SYMBOL(filemap_fdatawait_range);
 
 /**
+ * file_fdatawait_range - wait for writeback to complete
+ * @file:		file pointing to address space structure to wait for
+ * @start_byte:		offset in bytes where the range starts
+ * @end_byte:		offset in bytes where the range ends (inclusive)
+ *
+ * Walk the list of under-writeback pages of the address space that file
+ * refers to, in the given range and wait for all of them.  Check error
+ * status of the address space vs. the file->f_wb_err cursor and return it.
+ *
+ * Since the error status of the file is advanced by this function,
+ * callers are responsible for checking the return value and handling and/or
+ * reporting the error.
+ */
+int file_fdatawait_range(struct file *file, loff_t start_byte, loff_t end_byte)
+{
+	struct address_space *mapping = file->f_mapping;
+
+	__filemap_fdatawait_range(mapping, start_byte, end_byte);
+	return file_check_and_advance_wb_err(file);
+}
+EXPORT_SYMBOL(file_fdatawait_range);
+
+/**
  * filemap_fdatawait_keep_errors - wait for writeback without clearing errors
  * @mapping: address space structure to wait for
  *
-- 
2.13.3



^ permalink raw reply related	[flat|nested] 87+ messages in thread
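
The kernel-doc in the patch above describes checking the address space's
error status against a per-file cursor and then advancing that cursor. A
simplified userspace model of that idea is sketched below. It is not the
kernel's errseq_t implementation: the real type packs its state into a
single 32-bit value, and every sketch_* name here is hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Simplified stand-in for errseq_t: an error code plus a change counter. */
struct sketch_errseq {
	int err;		/* most recent negative errno, or 0 */
	unsigned int seq;	/* bumped every time an error is recorded */
};

/* Hypothetical stand-ins for struct address_space and struct file. */
struct sketch_mapping {
	struct sketch_errseq wb_err;
};

struct sketch_file {
	struct sketch_mapping *mapping;
	unsigned int wb_cursor;	/* wb_err.seq as last seen by this file */
};

/* Record a writeback error against the mapping. */
static void sketch_set_wb_err(struct sketch_mapping *m, int err)
{
	assert(err < 0);
	m->wb_err.err = err;
	m->wb_err.seq++;
}

/* At open time, sample the counter so older errors are not reported. */
static void sketch_open(struct sketch_file *f, struct sketch_mapping *m)
{
	f->mapping = m;
	f->wb_cursor = m->wb_err.seq;
}

/* Report an error recorded since the cursor, then advance the cursor. */
static int sketch_check_and_advance(struct sketch_file *f)
{
	struct sketch_errseq cur = f->mapping->wb_err;

	if (cur.seq == f->wb_cursor)
		return 0;
	f->wb_cursor = cur.seq;
	return cur.err;
}
```

In this model each open file sees a given error exactly once, and two
files on the same mapping each get their own report, which is why the
caller must not discard the return value.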

* [PATCH v3] mm: add file_fdatawait_range and file_write_and_wait
@ 2017-07-31 16:49     ` Jeff Layton
  0 siblings, 0 replies; 87+ messages in thread
From: Jeff Layton @ 2017-07-31 16:49 UTC (permalink / raw)
  To: Alexander Viro, Jan Kara
  Cc: J . Bruce Fields, Andrew Morton, linux-fsdevel, linux-kernel,
	linux-mm, Matthew Wilcox, Bob Peterson, Steven Whitehouse,
	cluster-devel

From: Jeff Layton <jlayton@redhat.com>

Necessary now for gfs2_fsync and sync_file_range, but there will
eventually be other callers.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 include/linux/fs.h | 11 ++++++++++-
 mm/filemap.c       | 23 +++++++++++++++++++++++
 2 files changed, 33 insertions(+), 1 deletion(-)

v3: make file_write_and_wait a wrapper around file_write_and_wait_range

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 526b6a9f30d4..909210bd6366 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2549,6 +2549,8 @@ static inline int filemap_fdatawait(struct address_space *mapping)
 
 extern bool filemap_range_has_page(struct address_space *, loff_t lstart,
 				  loff_t lend);
+extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart,
+						loff_t lend);
 extern int filemap_write_and_wait(struct address_space *mapping);
 extern int filemap_write_and_wait_range(struct address_space *mapping,
 				        loff_t lstart, loff_t lend);
@@ -2557,12 +2559,19 @@ extern int __filemap_fdatawrite_range(struct address_space *mapping,
 extern int filemap_fdatawrite_range(struct address_space *mapping,
 				loff_t start, loff_t end);
 extern int filemap_check_errors(struct address_space *mapping);
-
 extern void __filemap_set_wb_err(struct address_space *mapping, int err);
+
+extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart,
+						loff_t lend);
 extern int __must_check file_check_and_advance_wb_err(struct file *file);
 extern int __must_check file_write_and_wait_range(struct file *file,
 						loff_t start, loff_t end);
 
+static inline int file_write_and_wait(struct file *file)
+{
+	return file_write_and_wait_range(file, 0, LLONG_MAX);
+}
+
 /**
  * filemap_set_wb_err - set a writeback error on an address_space
  * @mapping: mapping in which to set writeback error
diff --git a/mm/filemap.c b/mm/filemap.c
index 953804b29a75..85dfe3bee324 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -476,6 +476,29 @@ int filemap_fdatawait_range(struct address_space *mapping, loff_t start_byte,
 EXPORT_SYMBOL(filemap_fdatawait_range);
 
 /**
+ * file_fdatawait_range - wait for writeback to complete
+ * @file:		file pointing to address space structure to wait for
+ * @start_byte:		offset in bytes where the range starts
+ * @end_byte:		offset in bytes where the range ends (inclusive)
+ *
+ * Walk the list of under-writeback pages of the address space that file
+ * refers to, in the given range and wait for all of them.  Check error
+ * status of the address space vs. the file->f_wb_err cursor and return it.
+ *
+ * Since the error status of the file is advanced by this function,
+ * callers are responsible for checking the return value and handling and/or
+ * reporting the error.
+ */
+int file_fdatawait_range(struct file *file, loff_t start_byte, loff_t end_byte)
+{
+	struct address_space *mapping = file->f_mapping;
+
+	__filemap_fdatawait_range(mapping, start_byte, end_byte);
+	return file_check_and_advance_wb_err(file);
+}
+EXPORT_SYMBOL(file_fdatawait_range);
+
+/**
  * filemap_fdatawait_keep_errors - wait for writeback without clearing errors
  * @mapping: address space structure to wait for
  *
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [Cluster-devel] [PATCH v3] mm: add file_fdatawait_range and file_write_and_wait
  2017-07-31 16:49     ` Jeff Layton
  (?)
@ 2017-08-01  9:52       ` Jan Kara
  -1 siblings, 0 replies; 87+ messages in thread
From: Jan Kara @ 2017-08-01  9:52 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Mon 31-07-17 12:49:25, Jeff Layton wrote:
> From: Jeff Layton <jlayton@redhat.com>
> 
> Necessary now for gfs2_fsync and sync_file_range, but there will
> eventually be other callers.
> 
> Signed-off-by: Jeff Layton <jlayton@redhat.com>

Looks good to me. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  include/linux/fs.h | 11 ++++++++++-
>  mm/filemap.c       | 23 +++++++++++++++++++++++
>  2 files changed, 33 insertions(+), 1 deletion(-)
> 
> v3: make file_write_and_wait a wrapper around file_write_and_wait_range
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 526b6a9f30d4..909210bd6366 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2549,6 +2549,8 @@ static inline int filemap_fdatawait(struct address_space *mapping)
>  
>  extern bool filemap_range_has_page(struct address_space *, loff_t lstart,
>  				  loff_t lend);
> +extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart,
> +						loff_t lend);
>  extern int filemap_write_and_wait(struct address_space *mapping);
>  extern int filemap_write_and_wait_range(struct address_space *mapping,
>  				        loff_t lstart, loff_t lend);
> @@ -2557,12 +2559,19 @@ extern int __filemap_fdatawrite_range(struct address_space *mapping,
>  extern int filemap_fdatawrite_range(struct address_space *mapping,
>  				loff_t start, loff_t end);
>  extern int filemap_check_errors(struct address_space *mapping);
> -
>  extern void __filemap_set_wb_err(struct address_space *mapping, int err);
> +
> +extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart,
> +						loff_t lend);
>  extern int __must_check file_check_and_advance_wb_err(struct file *file);
>  extern int __must_check file_write_and_wait_range(struct file *file,
>  						loff_t start, loff_t end);
>  
> +static inline int file_write_and_wait(struct file *file)
> +{
> +	return file_write_and_wait_range(file, 0, LLONG_MAX);
> +}
> +
>  /**
>   * filemap_set_wb_err - set a writeback error on an address_space
>   * @mapping: mapping in which to set writeback error
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 953804b29a75..85dfe3bee324 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -476,6 +476,29 @@ int filemap_fdatawait_range(struct address_space *mapping, loff_t start_byte,
>  EXPORT_SYMBOL(filemap_fdatawait_range);
>  
>  /**
> + * file_fdatawait_range - wait for writeback to complete
> + * @file:		file pointing to address space structure to wait for
> + * @start_byte:		offset in bytes where the range starts
> + * @end_byte:		offset in bytes where the range ends (inclusive)
> + *
> + * Walk the list of under-writeback pages of the address space that file
> + * refers to, in the given range and wait for all of them.  Check error
> + * status of the address space vs. the file->f_wb_err cursor and return it.
> + *
> + * Since the error status of the file is advanced by this function,
> + * callers are responsible for checking the return value and handling and/or
> + * reporting the error.
> + */
> +int file_fdatawait_range(struct file *file, loff_t start_byte, loff_t end_byte)
> +{
> +	struct address_space *mapping = file->f_mapping;
> +
> +	__filemap_fdatawait_range(mapping, start_byte, end_byte);
> +	return file_check_and_advance_wb_err(file);
> +}
> +EXPORT_SYMBOL(file_fdatawait_range);
> +
> +/**
>   * filemap_fdatawait_keep_errors - wait for writeback without clearing errors
>   * @mapping: address space structure to wait for
>   *
> -- 
> 2.13.3
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR



^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3] mm: add file_fdatawait_range and file_write_and_wait
@ 2017-08-01  9:52       ` Jan Kara
  0 siblings, 0 replies; 87+ messages in thread
From: Jan Kara @ 2017-08-01  9:52 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Alexander Viro, Jan Kara, J . Bruce Fields, Andrew Morton,
	linux-fsdevel, linux-kernel, linux-mm, Matthew Wilcox,
	Bob Peterson, Steven Whitehouse, cluster-devel

On Mon 31-07-17 12:49:25, Jeff Layton wrote:
> From: Jeff Layton <jlayton@redhat.com>
> 
> Necessary now for gfs2_fsync and sync_file_range, but there will
> eventually be other callers.
> 
> Signed-off-by: Jeff Layton <jlayton@redhat.com>

Looks good to me. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  include/linux/fs.h | 11 ++++++++++-
>  mm/filemap.c       | 23 +++++++++++++++++++++++
>  2 files changed, 33 insertions(+), 1 deletion(-)
> 
> v3: make file_write_and_wait a wrapper around file_write_and_wait_range
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 526b6a9f30d4..909210bd6366 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2549,6 +2549,8 @@ static inline int filemap_fdatawait(struct address_space *mapping)
>  
>  extern bool filemap_range_has_page(struct address_space *, loff_t lstart,
>  				  loff_t lend);
> +extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart,
> +						loff_t lend);
>  extern int filemap_write_and_wait(struct address_space *mapping);
>  extern int filemap_write_and_wait_range(struct address_space *mapping,
>  				        loff_t lstart, loff_t lend);
> @@ -2557,12 +2559,19 @@ extern int __filemap_fdatawrite_range(struct address_space *mapping,
>  extern int filemap_fdatawrite_range(struct address_space *mapping,
>  				loff_t start, loff_t end);
>  extern int filemap_check_errors(struct address_space *mapping);
> -
>  extern void __filemap_set_wb_err(struct address_space *mapping, int err);
> +
> +extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart,
> +						loff_t lend);
>  extern int __must_check file_check_and_advance_wb_err(struct file *file);
>  extern int __must_check file_write_and_wait_range(struct file *file,
>  						loff_t start, loff_t end);
>  
> +static inline int file_write_and_wait(struct file *file)
> +{
> +	return file_write_and_wait_range(file, 0, LLONG_MAX);
> +}
> +
>  /**
>   * filemap_set_wb_err - set a writeback error on an address_space
>   * @mapping: mapping in which to set writeback error
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 953804b29a75..85dfe3bee324 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -476,6 +476,29 @@ int filemap_fdatawait_range(struct address_space *mapping, loff_t start_byte,
>  EXPORT_SYMBOL(filemap_fdatawait_range);
>  
>  /**
> + * file_fdatawait_range - wait for writeback to complete
> + * @file:		file pointing to address space structure to wait for
> + * @start_byte:		offset in bytes where the range starts
> + * @end_byte:		offset in bytes where the range ends (inclusive)
> + *
> + * Walk the list of under-writeback pages of the address space that file
> + * refers to, in the given range and wait for all of them.  Check error
> + * status of the address space vs. the file->f_wb_err cursor and return it.
> + *
> + * Since the error status of the file is advanced by this function,
> + * callers are responsible for checking the return value and handling and/or
> + * reporting the error.
> + */
> +int file_fdatawait_range(struct file *file, loff_t start_byte, loff_t end_byte)
> +{
> +	struct address_space *mapping = file->f_mapping;
> +
> +	__filemap_fdatawait_range(mapping, start_byte, end_byte);
> +	return file_check_and_advance_wb_err(file);
> +}
> +EXPORT_SYMBOL(file_fdatawait_range);
> +
> +/**
>   * filemap_fdatawait_keep_errors - wait for writeback without clearing errors
>   * @mapping: address space structure to wait for
>   *
> -- 
> 2.13.3
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [Cluster-devel] [PATCH v2 3/4] fs: convert sync_file_range to use errseq_t based error-tracking
  2017-07-26 17:55 ` Jeff Layton
  (?)
@ 2017-07-26 17:55   ` Jeff Layton
  -1 siblings, 0 replies; 87+ messages in thread
From: Jeff Layton @ 2017-07-26 17:55 UTC (permalink / raw)
  To: cluster-devel.redhat.com

From: Jeff Layton <jlayton@redhat.com>

sync_file_range doesn't call down into the filesystem directly at all.
It only kicks off writeback of pagecache pages and optionally waits
on the result.

Convert sync_file_range to use errseq_t based error tracking, under the
assumption that most users will prefer this behavior when errors occur.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/sync.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/sync.c b/fs/sync.c
index 2a54c1f22035..27d6b8bbcb6a 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -342,7 +342,7 @@ SYSCALL_DEFINE4(sync_file_range, int, fd, loff_t, offset, loff_t, nbytes,
 
 	ret = 0;
 	if (flags & SYNC_FILE_RANGE_WAIT_BEFORE) {
-		ret = filemap_fdatawait_range(mapping, offset, endbyte);
+		ret = file_fdatawait_range(f.file, offset, endbyte);
 		if (ret < 0)
 			goto out_put;
 	}
@@ -355,7 +355,7 @@ SYSCALL_DEFINE4(sync_file_range, int, fd, loff_t, offset, loff_t, nbytes,
 	}
 
 	if (flags & SYNC_FILE_RANGE_WAIT_AFTER)
-		ret = filemap_fdatawait_range(mapping, offset, endbyte);
+		ret = file_fdatawait_range(f.file, offset, endbyte);
 
 out_put:
 	fdput(f);
-- 
2.13.3



^ permalink raw reply related	[flat|nested] 87+ messages in thread
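
For context, this is roughly how a userspace caller reaches the code path
changed by the patch above. It is a minimal sketch using the Linux
sync_file_range(2) wrapper from glibc; flush_range is a hypothetical
helper name, and the combination of all three flags is just one possible
choice:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical helper: start writeback for the byte range
 * [offset, offset + nbytes) of fd and wait for it to complete.
 * Returns 0 on success, or -1 with errno set on failure.
 */
static int flush_range(int fd, off_t offset, off_t nbytes)
{
	assert(offset >= 0 && nbytes >= 0);
	return sync_file_range(fd, offset, nbytes,
			       SYNC_FILE_RANGE_WAIT_BEFORE |
			       SYNC_FILE_RANGE_WRITE |
			       SYNC_FILE_RANGE_WAIT_AFTER);
}
```

SYNC_FILE_RANGE_WAIT_BEFORE and SYNC_FILE_RANGE_WAIT_AFTER correspond to
the two wait calls the patch converts. Note that sync_file_range does not
flush file metadata, so it is not a substitute for fsync where durability
is required.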

* [PATCH v2 3/4] fs: convert sync_file_range to use errseq_t based error-tracking
@ 2017-07-26 17:55   ` Jeff Layton
  0 siblings, 0 replies; 87+ messages in thread
From: Jeff Layton @ 2017-07-26 17:55 UTC (permalink / raw)
  To: Alexander Viro, Jan Kara
  Cc: J . Bruce Fields, Andrew Morton, linux-fsdevel, linux-kernel,
	linux-mm, Matthew Wilcox, Bob Peterson, Steven Whitehouse,
	cluster-devel

From: Jeff Layton <jlayton@redhat.com>

sync_file_range doesn't call down into the filesystem directly at all.
It only kicks off writeback of pagecache pages and optionally waits
on the result.

Convert sync_file_range to use errseq_t based error tracking, under the
assumption that most users will prefer this behavior when errors occur.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/sync.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/sync.c b/fs/sync.c
index 2a54c1f22035..27d6b8bbcb6a 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -342,7 +342,7 @@ SYSCALL_DEFINE4(sync_file_range, int, fd, loff_t, offset, loff_t, nbytes,
 
 	ret = 0;
 	if (flags & SYNC_FILE_RANGE_WAIT_BEFORE) {
-		ret = filemap_fdatawait_range(mapping, offset, endbyte);
+		ret = file_fdatawait_range(f.file, offset, endbyte);
 		if (ret < 0)
 			goto out_put;
 	}
@@ -355,7 +355,7 @@ SYSCALL_DEFINE4(sync_file_range, int, fd, loff_t, offset, loff_t, nbytes,
 	}
 
 	if (flags & SYNC_FILE_RANGE_WAIT_AFTER)
-		ret = filemap_fdatawait_range(mapping, offset, endbyte);
+		ret = file_fdatawait_range(f.file, offset, endbyte);
 
 out_put:
 	fdput(f);
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [Cluster-devel] [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync
  2017-07-26 17:55 ` Jeff Layton
  (?)
@ 2017-07-26 17:55   ` Jeff Layton
  -1 siblings, 0 replies; 87+ messages in thread
From: Jeff Layton @ 2017-07-26 17:55 UTC (permalink / raw)
  To: cluster-devel.redhat.com

From: Jeff Layton <jlayton@redhat.com>

This means that we need to export the new file_fdatawait_range symbol.

Also, fix a place where a writeback error might get dropped in the
gfs2_is_jdata case.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/gfs2/file.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index c2062a108d19..c53ac6efd04c 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
 		if (ret)
 			return ret;
 		if (gfs2_is_jdata(ip))
-			filemap_write_and_wait(mapping);
+			ret = file_write_and_wait(file);
+		if (ret)
+			return ret;
 		gfs2_ail_flush(ip->i_gl, 1);
 	}
 
 	if (mapping->nrpages)
-		ret = filemap_fdatawait_range(mapping, start, end);
+		ret = file_fdatawait_range(file, start, end);
 
 	return ret ? ret : ret1;
 }
-- 
2.13.3



^ permalink raw reply related	[flat|nested] 87+ messages in thread
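
The dropped-error fix mentioned in the commit message above can be
modelled in isolation. This is a minimal sketch with stubs, not the gfs2
code: stub_write_and_wait, fsync_before and fsync_after are hypothetical
names, and the stub just returns -EIO on request.

```c
#include <errno.h>

/* Stub standing in for the write-and-wait call; fails on request. */
static int stub_write_and_wait(int fail)
{
	return fail ? -EIO : 0;
}

/* Shape of the old code: the jdata writeback result is discarded. */
static int fsync_before(int is_jdata, int fail)
{
	int ret = 0;

	if (is_jdata)
		stub_write_and_wait(fail);	/* return value ignored */
	return ret;
}

/* Shape of the patched code: the result is captured and propagated. */
static int fsync_after(int is_jdata, int fail)
{
	int ret = 0;

	if (is_jdata)
		ret = stub_write_and_wait(fail);
	if (ret)
		return ret;
	/* The AIL flush would follow here in the real function. */
	return ret;
}
```

With a failing stub, fsync_before reports success while fsync_after
returns the error to the caller.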

* [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync
@ 2017-07-26 17:55   ` Jeff Layton
  0 siblings, 0 replies; 87+ messages in thread
From: Jeff Layton @ 2017-07-26 17:55 UTC (permalink / raw)
  To: Alexander Viro, Jan Kara
  Cc: J . Bruce Fields, Andrew Morton, linux-fsdevel, linux-kernel,
	linux-mm, Matthew Wilcox, Bob Peterson, Steven Whitehouse,
	cluster-devel

From: Jeff Layton <jlayton@redhat.com>

Convert gfs2_fsync to the file_* helpers so that fsync reports writeback
errors via errseq_t. This means that we need to export the new
file_fdatawait_range symbol.

Also, fix a place where a writeback error might get dropped in the
gfs2_is_jdata case.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/gfs2/file.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index c2062a108d19..c53ac6efd04c 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
 		if (ret)
 			return ret;
 		if (gfs2_is_jdata(ip))
-			filemap_write_and_wait(mapping);
+			ret = file_write_and_wait(file);
+		if (ret)
+			return ret;
 		gfs2_ail_flush(ip->i_gl, 1);
 	}
 
 	if (mapping->nrpages)
-		ret = filemap_fdatawait_range(mapping, start, end);
+		ret = file_fdatawait_range(file, start, end);
 
 	return ret ? ret : ret1;
 }
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [Cluster-devel] [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync
  2017-07-26 17:55   ` Jeff Layton
  (?)
@ 2017-07-26 19:21     ` Matthew Wilcox
  -1 siblings, 0 replies; 87+ messages in thread
From: Matthew Wilcox @ 2017-07-26 19:21 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed, Jul 26, 2017 at 01:55:38PM -0400, Jeff Layton wrote:
> @@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
>  		if (ret)
>  			return ret;
>  		if (gfs2_is_jdata(ip))
> -			filemap_write_and_wait(mapping);
> +			ret = file_write_and_wait(file);
> +		if (ret)
> +			return ret;
>  		gfs2_ail_flush(ip->i_gl, 1);
>  	}

Do we want to skip flushing the AIL if there was an error (possibly
previously encountered)?  I'd think we'd want to flush the AIL then report
the error, like this:

 		if (gfs2_is_jdata(ip))
-			filemap_write_and_wait(mapping);
+			ret = file_write_and_wait(file);
 		gfs2_ail_flush(ip->i_gl, 1);
+		if (ret)
+			return ret;
 	}
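
The difference between the two orderings can be made concrete with a
user-space simulation. This is a sketch, not gfs2 code: sim_state,
sim_write_and_wait and sim_ail_flush are hypothetical stand-ins for the
state and calls in the hunks above, and the two fsync_* functions only
mirror the control flow of the posted patch and of the suggested
alternative.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel calls discussed above. */
struct sim_state {
	int  writeback_err;  /* what the simulated write-and-wait returns */
	bool ail_flushed;    /* set when the simulated AIL flush runs */
};

static int sim_write_and_wait(struct sim_state *s)
{
	return s->writeback_err;
}

static void sim_ail_flush(struct sim_state *s)
{
	s->ail_flushed = true;
}

/* Ordering in the posted patch: return before flushing the AIL. */
int fsync_bail_early(struct sim_state *s)
{
	int ret = sim_write_and_wait(s);

	if (ret)
		return ret;
	sim_ail_flush(s);
	return 0;
}

/* Ordering suggested in the reply: flush first, then report. */
int fsync_flush_then_report(struct sim_state *s)
{
	int ret = sim_write_and_wait(s);

	sim_ail_flush(s);
	return ret;
}
```

Both variants hand the error back to the caller. They differ only in
whether the flush has run by the time the error is returned, which is
the point under discussion.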



^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync
@ 2017-07-26 19:21     ` Matthew Wilcox
  0 siblings, 0 replies; 87+ messages in thread
From: Matthew Wilcox @ 2017-07-26 19:21 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Alexander Viro, Jan Kara, J . Bruce Fields, Andrew Morton,
	linux-fsdevel, linux-kernel, linux-mm, Bob Peterson,
	Steven Whitehouse, cluster-devel

On Wed, Jul 26, 2017 at 01:55:38PM -0400, Jeff Layton wrote:
> @@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
>  		if (ret)
>  			return ret;
>  		if (gfs2_is_jdata(ip))
> -			filemap_write_and_wait(mapping);
> +			ret = file_write_and_wait(file);
> +		if (ret)
> +			return ret;
>  		gfs2_ail_flush(ip->i_gl, 1);
>  	}

Do we want to skip flushing the AIL if there was an error (possibly
previously encountered)?  I'd think we'd want to flush the AIL then report
the error, like this:

 		if (gfs2_is_jdata(ip))
-			filemap_write_and_wait(mapping);
+			ret = file_write_and_wait(file);
 		gfs2_ail_flush(ip->i_gl, 1);
+		if (ret)
+			return ret;
 	}

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [Cluster-devel] [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync
  2017-07-26 19:21     ` Matthew Wilcox
  (?)
@ 2017-07-26 22:22       ` Jeff Layton
  -1 siblings, 0 replies; 87+ messages in thread
From: Jeff Layton @ 2017-07-26 22:22 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed, 2017-07-26 at 12:21 -0700, Matthew Wilcox wrote:
> On Wed, Jul 26, 2017 at 01:55:38PM -0400, Jeff Layton wrote:
> > @@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
> >  		if (ret)
> >  			return ret;
> >  		if (gfs2_is_jdata(ip))
> > -			filemap_write_and_wait(mapping);
> > +			ret = file_write_and_wait(file);
> > +		if (ret)
> > +			return ret;
> >  		gfs2_ail_flush(ip->i_gl, 1);
> >  	}
> 
> Do we want to skip flushing the AIL if there was an error (possibly
> previously encountered)?  I'd think we'd want to flush the AIL then report
> the error, like this:
> 

I wondered about that. Note that earlier in the function, we also bail
out without flushing the AIL if sync_inode_metadata fails, so I assumed
that we'd want to do the same here. 

I could definitely be wrong and am fine with changing it if so.
Discarding the error like we do today seems wrong though.

Bob, thoughts?


>  		if (gfs2_is_jdata(ip))
> -			filemap_write_and_wait(mapping);
> +			ret = file_write_and_wait(file);
>  		gfs2_ail_flush(ip->i_gl, 1);
> +		if (ret)
> +			return ret;
>  	}
-- 
Jeff Layton <jlayton@redhat.com>



^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync
@ 2017-07-26 22:22       ` Jeff Layton
  0 siblings, 0 replies; 87+ messages in thread
From: Jeff Layton @ 2017-07-26 22:22 UTC (permalink / raw)
  To: Matthew Wilcox, Jeff Layton
  Cc: Alexander Viro, Jan Kara, J . Bruce Fields, Andrew Morton,
	linux-fsdevel, linux-kernel, linux-mm, Bob Peterson,
	Steven Whitehouse, cluster-devel

On Wed, 2017-07-26 at 12:21 -0700, Matthew Wilcox wrote:
> On Wed, Jul 26, 2017 at 01:55:38PM -0400, Jeff Layton wrote:
> > @@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
> >  		if (ret)
> >  			return ret;
> >  		if (gfs2_is_jdata(ip))
> > -			filemap_write_and_wait(mapping);
> > +			ret = file_write_and_wait(file);
> > +		if (ret)
> > +			return ret;
> >  		gfs2_ail_flush(ip->i_gl, 1);
> >  	}
> 
> Do we want to skip flushing the AIL if there was an error (possibly
> previously encountered)?  I'd think we'd want to flush the AIL then report
> the error, like this:
> 

I wondered about that. Note that earlier in the function, we also bail
out without flushing the AIL if sync_inode_metadata fails, so I assumed
that we'd want to do the same here. 

I could definitely be wrong and am fine with changing it if so.
Discarding the error like we do today seems wrong though.

Bob, thoughts?


>  		if (gfs2_is_jdata(ip))
> -			filemap_write_and_wait(mapping);
> +			ret = file_write_and_wait(file);
>  		gfs2_ail_flush(ip->i_gl, 1);
> +		if (ret)
> +			return ret;
>  	}
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [Cluster-devel] [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync
  2017-07-26 22:22       ` Jeff Layton
  (?)
@ 2017-07-27 12:47         ` Bob Peterson
  -1 siblings, 0 replies; 87+ messages in thread
From: Bob Peterson @ 2017-07-27 12:47 UTC (permalink / raw)
  To: cluster-devel.redhat.com

----- Original Message -----
| On Wed, 2017-07-26 at 12:21 -0700, Matthew Wilcox wrote:
| > On Wed, Jul 26, 2017 at 01:55:38PM -0400, Jeff Layton wrote:
| > > @@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t
| > > start, loff_t end,
| > >  		if (ret)
| > >  			return ret;
| > >  		if (gfs2_is_jdata(ip))
| > > -			filemap_write_and_wait(mapping);
| > > +			ret = file_write_and_wait(file);
| > > +		if (ret)
| > > +			return ret;
| > >  		gfs2_ail_flush(ip->i_gl, 1);
| > >  	}
| > 
| > Do we want to skip flushing the AIL if there was an error (possibly
| > previously encountered)?  I'd think we'd want to flush the AIL then report
| > the error, like this:
| > 
| 
| I wondered about that. Note that earlier in the function, we also bail
| out without flushing the AIL if sync_inode_metadata fails, so I assumed
| that we'd want to do the same here.
| 
| I could definitely be wrong and am fine with changing it if so.
| Discarding the error like we do today seems wrong though.
| 
| Bob, thoughts?

Hi Jeff, Matthew,

I'm not sure there's a right or wrong answer here. I don't know what's
best from a "correctness" point of view.

I guess I'm leaning toward Jeff's original solution where we don't
call gfs2_ail_flush() on error. The main purpose of ail_flush is to
go through buffer descriptors (bds) attached to the glock and generate
revokes for them in a new transaction. If there's an error condition,
trying to go through more hoops will probably just get us into more
trouble. If the error is -ENOMEM, we don't want to allocate new memory
for the new transaction. If the error is -EIO, we probably don't
want to encourage more writing either.

So on the one hand, it might be good to get rid of the buffer descriptors
so we don't leak memory, but that's probably also done elsewhere.
I have not chased down what happens in that case, but the same thing
would happen in the existing -EIO case a few lines above.

On the other hand, we probably don't want to start a new transaction
and start adding revokes to it, and such, due to the error.

Perhaps Steve Whitehouse can weigh in?

Regards,

Bob Peterson
Red Hat File Systems

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync
@ 2017-07-27 12:47         ` Bob Peterson
  0 siblings, 0 replies; 87+ messages in thread
From: Bob Peterson @ 2017-07-27 12:47 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Matthew Wilcox, Jeff Layton, Alexander Viro, Jan Kara,
	J . Bruce Fields, Andrew Morton, linux-fsdevel, linux-kernel,
	linux-mm, Steven Whitehouse, cluster-devel

----- Original Message -----
| On Wed, 2017-07-26 at 12:21 -0700, Matthew Wilcox wrote:
| > On Wed, Jul 26, 2017 at 01:55:38PM -0400, Jeff Layton wrote:
| > > @@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t
| > > start, loff_t end,
| > >  		if (ret)
| > >  			return ret;
| > >  		if (gfs2_is_jdata(ip))
| > > -			filemap_write_and_wait(mapping);
| > > +			ret = file_write_and_wait(file);
| > > +		if (ret)
| > > +			return ret;
| > >  		gfs2_ail_flush(ip->i_gl, 1);
| > >  	}
| > 
| > Do we want to skip flushing the AIL if there was an error (possibly
| > previously encountered)?  I'd think we'd want to flush the AIL then report
| > the error, like this:
| > 
| 
| I wondered about that. Note that earlier in the function, we also bail
| out without flushing the AIL if sync_inode_metadata fails, so I assumed
| that we'd want to do the same here.
| 
| I could definitely be wrong and am fine with changing it if so.
| Discarding the error like we do today seems wrong though.
| 
| Bob, thoughts?

Hi Jeff, Matthew,

I'm not sure there's a right or wrong answer here. I don't know what's
best from a "correctness" point of view.

I guess I'm leaning toward Jeff's original solution where we don't
call gfs2_ail_flush() on error. The main purpose of ail_flush is to
go through buffer descriptors (bds) attached to the glock and generate
revokes for them in a new transaction. If there's an error condition,
trying to go through more hoops will probably just get us into more
trouble. If the error is -ENOMEM, we don't want to allocate new memory
for the new transaction. If the error is -EIO, we probably don't
want to encourage more writing either.

So on the one hand, it might be good to get rid of the buffer descriptors
so we don't leak memory, but that's probably also done elsewhere.
I have not chased down what happens in that case, but the same thing
would happen in the existing -EIO case a few lines above.

On the other hand, we probably don't want to start a new transaction
and start adding revokes to it, and such, due to the error.

Perhaps Steve Whitehouse can weigh in?

Regards,

Bob Peterson
Red Hat File Systems

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [Cluster-devel] [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync
  2017-07-27 12:47         ` Bob Peterson
  (?)
@ 2017-07-28 12:37           ` Steven Whitehouse
  -1 siblings, 0 replies; 87+ messages in thread
From: Steven Whitehouse @ 2017-07-28 12:37 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,


On 27/07/17 13:47, Bob Peterson wrote:
> ----- Original Message -----
> | On Wed, 2017-07-26 at 12:21 -0700, Matthew Wilcox wrote:
> | > On Wed, Jul 26, 2017 at 01:55:38PM -0400, Jeff Layton wrote:
> | > > @@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t
> | > > start, loff_t end,
> | > >  		if (ret)
> | > >  			return ret;
> | > >  		if (gfs2_is_jdata(ip))
> | > > -			filemap_write_and_wait(mapping);
> | > > +			ret = file_write_and_wait(file);
> | > > +		if (ret)
> | > > +			return ret;
> | > >  		gfs2_ail_flush(ip->i_gl, 1);
> | > >  	}
> | >
> | > Do we want to skip flushing the AIL if there was an error (possibly
> | > previously encountered)?  I'd think we'd want to flush the AIL then report
> | > the error, like this:
> | >
> |
> | I wondered about that. Note that earlier in the function, we also bail
> | out without flushing the AIL if sync_inode_metadata fails, so I assumed
> | that we'd want to do the same here.
> |
> | I could definitely be wrong and am fine with changing it if so.
> | Discarding the error like we do today seems wrong though.
> |
> | Bob, thoughts?
>
> Hi Jeff, Matthew,
>
> I'm not sure there's a right or wrong answer here. I don't know what's
> best from a "correctness" point of view.
>
> I guess I'm leaning toward Jeff's original solution where we don't
> call gfs2_ail_flush() on error. The main purpose of ail_flush is to
> go through buffer descriptors (bds) attached to the glock and generate
> revokes for them in a new transaction. If there's an error condition,
> trying to go through more hoops will probably just get us into more
> trouble. If the error is -ENOMEM, we don't want to allocate new memory
> for the new transaction. If the error is -EIO, we probably don't
> want to encourage more writing either.
>
> So on the one hand, it might be good to get rid of the buffer descriptors
> so we don't leak memory, but that's probably also done elsewhere.
> I have not chased down what happens in that case, but the same thing
> would happen in the existing -EIO case a few lines above.
>
> On the other hand, we probably don't want to start a new transaction
> and start adding revokes to it, and such, due to the error.
>
> Perhaps Steve Whitehouse can weigh in?
>
> Regards,
>
> Bob Peterson
> Red Hat File Systems

Yes, we probably do want to skip the ail flush if there is an error. We 
don't know whether the error is permanent or transient at that stage. If 
a previous stage of the fsync has failed, then there may be nothing for 
the next stage to do anyway, so it is probably not a big deal either 
way. So long as the error is reported to the caller, then we should be ok.

Steve.



^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync
@ 2017-07-28 12:37           ` Steven Whitehouse
  0 siblings, 0 replies; 87+ messages in thread
From: Steven Whitehouse @ 2017-07-28 12:37 UTC (permalink / raw)
  To: Bob Peterson, Jeff Layton
  Cc: Matthew Wilcox, Jeff Layton, Alexander Viro, Jan Kara,
	J . Bruce Fields, Andrew Morton, linux-fsdevel, linux-kernel,
	linux-mm, cluster-devel

Hi,


On 27/07/17 13:47, Bob Peterson wrote:
> ----- Original Message -----
> | On Wed, 2017-07-26 at 12:21 -0700, Matthew Wilcox wrote:
> | > On Wed, Jul 26, 2017 at 01:55:38PM -0400, Jeff Layton wrote:
> | > > @@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t
> | > > start, loff_t end,
> | > >  		if (ret)
> | > >  			return ret;
> | > >  		if (gfs2_is_jdata(ip))
> | > > -			filemap_write_and_wait(mapping);
> | > > +			ret = file_write_and_wait(file);
> | > > +		if (ret)
> | > > +			return ret;
> | > >  		gfs2_ail_flush(ip->i_gl, 1);
> | > >  	}
> | >
> | > Do we want to skip flushing the AIL if there was an error (possibly
> | > previously encountered)?  I'd think we'd want to flush the AIL then report
> | > the error, like this:
> | >
> |
> | I wondered about that. Note that earlier in the function, we also bail
> | out without flushing the AIL if sync_inode_metadata fails, so I assumed
> | that we'd want to do the same here.
> |
> | I could definitely be wrong and am fine with changing it if so.
> | Discarding the error like we do today seems wrong though.
> |
> | Bob, thoughts?
>
> Hi Jeff, Matthew,
>
> I'm not sure there's a right or wrong answer here. I don't know what's
> best from a "correctness" point of view.
>
> I guess I'm leaning toward Jeff's original solution where we don't
> call gfs2_ail_flush() on error. The main purpose of ail_flush is to
> go through buffer descriptors (bds) attached to the glock and generate
> revokes for them in a new transaction. If there's an error condition,
> trying to go through more hoops will probably just get us into more
> trouble. If the error is -ENOMEM, we don't want to allocate new memory
> for the new transaction. If the error is -EIO, we probably don't
> want to encourage more writing either.
>
> So on the one hand, it might be good to get rid of the buffer descriptors
> so we don't leak memory, but that's probably also done elsewhere.
> I have not chased down what happens in that case, but the same thing
> would happen in the existing -EIO case a few lines above.
>
> On the other hand, we probably don't want to start a new transaction
> and start adding revokes to it, and such, due to the error.
>
> Perhaps Steve Whitehouse can weigh in?
>
> Regards,
>
> Bob Peterson
> Red Hat File Systems

Yes, we probably do want to skip the ail flush if there is an error. We 
don't know whether the error is permanent or transient at that stage. If 
a previous stage of the fsync has failed, then there may be nothing for 
the next stage to do anyway, so it is probably not a big deal either 
way. So long as the error is reported to the caller, then we should be ok.

Steve.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync
@ 2017-07-28 12:37           ` Steven Whitehouse
  0 siblings, 0 replies; 87+ messages in thread
From: Steven Whitehouse @ 2017-07-28 12:37 UTC (permalink / raw)
  To: Bob Peterson, Jeff Layton
  Cc: Matthew Wilcox, Jeff Layton, Alexander Viro, Jan Kara,
	J . Bruce Fields, Andrew Morton, linux-fsdevel, linux-kernel,
	linux-mm, cluster-devel

Hi,


On 27/07/17 13:47, Bob Peterson wrote:
> ----- Original Message -----
> | On Wed, 2017-07-26 at 12:21 -0700, Matthew Wilcox wrote:
> | > On Wed, Jul 26, 2017 at 01:55:38PM -0400, Jeff Layton wrote:
> | > > @@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t
> | > > start, loff_t end,
> | > >  		if (ret)
> | > >  			return ret;
> | > >  		if (gfs2_is_jdata(ip))
> | > > -			filemap_write_and_wait(mapping);
> | > > +			ret = file_write_and_wait(file);
> | > > +		if (ret)
> | > > +			return ret;
> | > >  		gfs2_ail_flush(ip->i_gl, 1);
> | > >  	}
> | >
> | > Do we want to skip flushing the AIL if there was an error (possibly
> | > previously encountered)?  I'd think we'd want to flush the AIL then report
> | > the error, like this:
> | >
> |
> | I wondered about that. Note that earlier in the function, we also bail
> | out without flushing the AIL if sync_inode_metadata fails, so I assumed
> | that we'd want to do the same here.
> |
> | I could definitely be wrong and am fine with changing it if so.
> | Discarding the error like we do today seems wrong though.
> |
> | Bob, thoughts?
>
> Hi Jeff, Matthew,
>
> I'm not sure there's a right or wrong answer here. I don't know what's
> best from a "correctness" point of view.
>
> I guess I'm leaning toward Jeff's original solution where we don't
> call gfs2_ail_flush() on error. The main purpose of ail_flush is to
> go through buffer descriptors (bds) attached to the glock and generate
> revokes for them in a new transaction. If there's an error condition,
> trying to go through more hoops will probably just get us into more
> trouble. If the error is -ENOMEM, we don't want to allocate new memory
> for the new transaction. If the error is -EIO, we probably don't
> want to encourage more writing either.
>
> So on the one hand, it might be good to get rid of the buffer descriptors
> so we don't leak memory, but that's probably also done elsewhere.
> I have not chased down what happens in that case, but the same thing
> would happen in the existing -EIO case a few lines above.
>
> On the other hand, we probably don't want to start a new transaction
> and start adding revokes to it, and such, due to the error.
>
> Perhaps Steve Whitehouse can weigh in?
>
> Regards,
>
> Bob Peterson
> Red Hat File Systems

Yes, we probably do want to skip the ail flush if there is an error. We 
don't know whether the error is permanent or transient at that stage. If 
a previous stage of the fsync has failed, then there may be nothing for 
the next stage to do anyway, so it is probably not a big deal either 
way. So long as the error is reported to the caller, then we should be ok,

Steve.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [Cluster-devel] [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync
  2017-07-28 12:37           ` Steven Whitehouse
  (?)
@ 2017-07-28 12:47             ` Jeff Layton
  -1 siblings, 0 replies; 87+ messages in thread
From: Jeff Layton @ 2017-07-28 12:47 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Fri, 2017-07-28 at 13:37 +0100, Steven Whitehouse wrote:
> Hi,
> 
> 
> On 27/07/17 13:47, Bob Peterson wrote:
> > ----- Original Message -----
> > > On Wed, 2017-07-26 at 12:21 -0700, Matthew Wilcox wrote:
> > > > On Wed, Jul 26, 2017 at 01:55:38PM -0400, Jeff Layton wrote:
> > > > > @@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t
> > > > > start, loff_t end,
> > > > >  		if (ret)
> > > > >  			return ret;
> > > > >  		if (gfs2_is_jdata(ip))
> > > > > -			filemap_write_and_wait(mapping);
> > > > > +			ret = file_write_and_wait(file);
> > > > > +		if (ret)
> > > > > +			return ret;
> > > > >  		gfs2_ail_flush(ip->i_gl, 1);
> > > > >  	}
> > > > 
> > > > Do we want to skip flushing the AIL if there was an error (possibly
> > > > previously encountered)?  I'd think we'd want to flush the AIL then report
> > > > the error, like this:
> > > > 
> > > 
> > > I wondered about that. Note that earlier in the function, we also bail
> > > out without flushing the AIL if sync_inode_metadata fails, so I assumed
> > > that we'd want to do the same here.
> > > 
> > > I could definitely be wrong and am fine with changing it if so.
> > > Discarding the error like we do today seems wrong though.
> > > 
> > > Bob, thoughts?
> > 
> > Hi Jeff, Matthew,
> > 
> > I'm not sure there's a right or wrong answer here. I don't know what's
> > best from a "correctness" point of view.
> > 
> > I guess I'm leaning toward Jeff's original solution where we don't
> > call gfs2_ail_flush() on error. The main purpose of ail_flush is to
> > go through buffer descriptors (bds) attached to the glock and generate
> > revokes for them in a new transaction. If there's an error condition,
> > trying to go through more hoops will probably just get us into more
> > trouble. If the error is -ENOMEM, we don't want to allocate new memory
> > for the new transaction. If the error is -EIO, we probably don't
> > want to encourage more writing either.
> > 
> > So on the one hand, it might be good to get rid of the buffer descriptors
> > so we don't leak memory, but that's probably also done elsewhere.
> > I have not chased down what happens in that case, but the same thing
> > would happen in the existing -EIO case a few lines above.
> > 
> > On the other hand, we probably don't want to start a new transaction
> > and start adding revokes to it, and such, due to the error.
> > 
> > Perhaps Steve Whitehouse can weigh in?
> > 
> > Regards,
> > 
> > Bob Peterson
> > Red Hat File Systems
> 
> Yes, we probably do want to skip the ail flush if there is an error. We 
> don't know whether the error is permanent or transient at that stage. If 
> a previous stage of the fsync has failed, then there may be nothing for 
> the next stage to do anyway, so it is probably not a big deal either 
> way. So long as the error is reported to the caller, then we should be ok.
> 
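For illustration only, the two orderings discussed above can be modeled
in plain userspace C. This is a sketch, not the gfs2 code: struct
fsync_model and the model_* helpers are hypothetical stand-ins for
gfs2_is_jdata(), file_write_and_wait() and gfs2_ail_flush(), and
fsync_flush_then_report() is only my reading of the alternative Matthew
raised, since his snippet is not quoted here.

```c
#include <stdbool.h>
#include <errno.h>

/* Hypothetical model state; none of these names exist in the kernel. */
struct fsync_model {
	bool is_jdata;      /* models gfs2_is_jdata(ip) */
	int  writeback_err; /* value the file_write_and_wait() stand-in returns */
	int  ail_flushes;   /* counts calls to the gfs2_ail_flush() stand-in */
};

/* Stand-in for file_write_and_wait(): just reports the configured error. */
static int model_file_write_and_wait(const struct fsync_model *m)
{
	return m->writeback_err;
}

/* Stand-in for gfs2_ail_flush(): only records that it was called. */
static void model_ail_flush(struct fsync_model *m)
{
	m->ail_flushes++;
}

/* Ordering used in the patch: return the error and skip the AIL flush. */
static int fsync_skip_flush_on_error(struct fsync_model *m)
{
	int ret = 0;

	if (m->is_jdata)
		ret = model_file_write_and_wait(m);
	if (ret)
		return ret;
	model_ail_flush(m);
	return 0;
}

/* Assumed alternative: always flush the AIL, then report the error. */
static int fsync_flush_then_report(struct fsync_model *m)
{
	int ret = 0;

	if (m->is_jdata)
		ret = model_file_write_and_wait(m);
	model_ail_flush(m);
	return ret;
}
```

With is_jdata set and writeback_err set to -EIO, both variants return
-EIO to the caller; they differ only in whether ail_flushes is bumped.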

Ok, cool. I'll plan to carry this patch for now as it depends on an
earlier one in the series. One more question though:

Is it correct in the gfs2_is_jdata case to ignore the range that was
passed in from the caller? ->fsync gets start and end arguments, but
this will always write back the whole range. Is that necessary in this
case?

-- 
Jeff Layton <jlayton@redhat.com>



^ permalink raw reply	[flat|nested] 87+ messages in thread

* [Cluster-devel] [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync
  2017-07-28 12:47             ` Jeff Layton
  (?)
@ 2017-07-28 12:54               ` Steven Whitehouse
  -1 siblings, 0 replies; 87+ messages in thread
From: Steven Whitehouse @ 2017-07-28 12:54 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,


On 28/07/17 13:47, Jeff Layton wrote:
> On Fri, 2017-07-28 at 13:37 +0100, Steven Whitehouse wrote:
>> Hi,
>>
>>
>> On 27/07/17 13:47, Bob Peterson wrote:
>>> ----- Original Message -----
>>>> On Wed, 2017-07-26 at 12:21 -0700, Matthew Wilcox wrote:
>>>>> On Wed, Jul 26, 2017 at 01:55:38PM -0400, Jeff Layton wrote:
>>>>>> @@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
>>>>>>   		if (ret)
>>>>>>   			return ret;
>>>>>>   		if (gfs2_is_jdata(ip))
>>>>>> -			filemap_write_and_wait(mapping);
>>>>>> +			ret = file_write_and_wait(file);
>>>>>> +		if (ret)
>>>>>> +			return ret;
>>>>>>   		gfs2_ail_flush(ip->i_gl, 1);
>>>>>>   	}
>>>>> Do we want to skip flushing the AIL if there was an error (possibly
>>>>> previously encountered)?  I'd think we'd want to flush the AIL then report
>>>>> the error, like this:
>>>>>
>>>> I wondered about that. Note that earlier in the function, we also bail
>>>> out without flushing the AIL if sync_inode_metadata fails, so I assumed
>>>> that we'd want to do the same here.
>>>>
>>>> I could definitely be wrong and am fine with changing it if so.
>>>> Discarding the error like we do today seems wrong though.
>>>>
>>>> Bob, thoughts?
>>> Hi Jeff, Matthew,
>>>
>>> I'm not sure there's a right or wrong answer here. I don't know what's
>>> best from a "correctness" point of view.
>>>
>>> I guess I'm leaning toward Jeff's original solution where we don't
>>> call gfs2_ail_flush() on error. The main purpose of ail_flush is to
>>> go through buffer descriptors (bds) attached to the glock and generate
>>> revokes for them in a new transaction. If there's an error condition,
>>> trying to go through more hoops will probably just get us into more
>>> trouble. If the error is -ENOMEM, we don't want to allocate new memory
>>> for the new transaction. If the error is -EIO, we probably don't
>>> want to encourage more writing either.
>>>
>>> So on the one hand, it might be good to get rid of the buffer descriptors
>>> so we don't leak memory, but that's probably also done elsewhere.
>>> I have not chased down what happens in that case, but the same thing
>>> would happen in the existing -EIO case a few lines above.
>>>
>>> On the other hand, we probably don't want to start a new transaction
>>> and start adding revokes to it, and such, due to the error.
>>>
>>> Perhaps Steve Whitehouse can weigh in?
>>>
>>> Regards,
>>>
>>> Bob Peterson
>>> Red Hat File Systems
>> Yes, we probably do want to skip the ail flush if there is an error. We
>> don't know whether the error is permanent or transient at that stage. If
>> a previous stage of the fsync has failed, then there may be nothing for
>> the next stage to do anyway, so it is probably not a big deal either
>> way. So long as the error is reported to the caller, then we should be ok.
>>
> Ok, cool. I'll plan to carry this patch for now as it depends on an
> earlier one in the series. One more question though:
>
> Is it correct in the gfs2_is_jdata case to ignore the range that was
> passed in from the caller? ->fsync gets start and end arguments, but
> this will always write back the whole range. Is that necessary in this
> case?
>
It probably doesn't matter really. We try to discourage the use of jdata
from userspace. A few internal files still use it, and it is there for
backwards compatibility more than anything, so performance is generally
not a concern for jdata. The ordered write mode is the important one.

So you are right that it might be better to pass the range into that call
too, but it is not likely that anybody will notice the performance
improvement.
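
To make the whole-file versus range distinction concrete, here is a
small userspace model. It is a sketch under stated assumptions, not
kernel code: struct model_mapping, model_write_range() and
model_write_all() are hypothetical names, the page size is assumed to
be 4096 bytes, and offsets are assumed to be non-negative.

```c
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096L
#define MODEL_NPAGES    8

/* Hypothetical model of a page cache mapping: one dirty flag per page. */
struct model_mapping {
	bool dirty[MODEL_NPAGES];
};

/*
 * Write back the dirty pages overlapping the byte range [start, end]
 * (inclusive, like the start/end that ->fsync receives).
 * Returns the number of pages written.
 */
static int model_write_range(struct model_mapping *m, long start, long end)
{
	long first = start / MODEL_PAGE_SIZE;
	long last = end / MODEL_PAGE_SIZE;
	int written = 0;

	if (last >= MODEL_NPAGES)
		last = MODEL_NPAGES - 1;
	for (long i = first; i <= last; i++) {
		if (m->dirty[i]) {
			m->dirty[i] = false;
			written++;
		}
	}
	return written;
}

/* Whole-file writeback, ignoring whatever range the caller asked for. */
static int model_write_all(struct model_mapping *m)
{
	return model_write_range(m, 0, MODEL_NPAGES * MODEL_PAGE_SIZE - 1);
}
```

Both calls clean every dirty page the caller asked about; the range
variant simply leaves dirty pages outside [start, end] for later, which
is the saving being weighed here.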

Steve.



^ permalink raw reply	[flat|nested] 87+ messages in thread

end of thread, other threads:[~2017-08-01  9:52 UTC | newest]

Thread overview: 87+ messages
2017-07-26 17:55 [Cluster-devel] [PATCH v2 0/4] mm/gfs2: extend file_* API, and convert gfs2 to errseq_t error reporting Jeff Layton
2017-07-26 17:55 ` Jeff Layton
2017-07-26 17:55 ` Jeff Layton
2017-07-26 17:55 ` [Cluster-devel] [PATCH v2 1/4] mm: consolidate dax / non-dax checks for writeback Jeff Layton
2017-07-26 17:55   ` Jeff Layton
2017-07-26 17:55   ` Jeff Layton
2017-07-27  8:43   ` [Cluster-devel] " Jan Kara
2017-07-27  8:43     ` Jan Kara
2017-07-27  8:43     ` Jan Kara
2017-07-26 17:55 ` [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait Jeff Layton
2017-07-26 17:55   ` Jeff Layton
2017-07-26 17:55   ` Jeff Layton
2017-07-26 19:13   ` [Cluster-devel] " Matthew Wilcox
2017-07-26 19:13     ` Matthew Wilcox
2017-07-26 19:13     ` Matthew Wilcox
2017-07-26 22:18     ` [Cluster-devel] " Jeff Layton
2017-07-26 22:18       ` Jeff Layton
2017-07-26 22:18       ` Jeff Layton
2017-07-26 19:50   ` [Cluster-devel] " Bob Peterson
2017-07-26 19:50     ` Bob Peterson
2017-07-26 19:50     ` Bob Peterson
2017-07-27  8:49   ` [Cluster-devel] " Jan Kara
2017-07-27  8:49     ` Jan Kara
2017-07-27  8:49     ` Jan Kara
2017-07-27 12:48     ` [Cluster-devel] " Jeff Layton
2017-07-27 12:48       ` Jeff Layton
2017-07-27 12:48       ` Jeff Layton
2017-07-31 11:27       ` [Cluster-devel] " Jeff Layton
2017-07-31 11:27         ` Jeff Layton
2017-07-31 11:27         ` Jeff Layton
2017-07-31 11:32         ` [Cluster-devel] " Steven Whitehouse
2017-07-31 11:32           ` Steven Whitehouse
2017-07-31 11:32           ` Steven Whitehouse
2017-07-31 11:44           ` [Cluster-devel] " Jeff Layton
2017-07-31 11:44             ` Jeff Layton
2017-07-31 11:44             ` Jeff Layton
2017-07-31 12:05             ` [Cluster-devel] " Steven Whitehouse
2017-07-31 12:05               ` Steven Whitehouse
2017-07-31 12:05               ` Steven Whitehouse
2017-07-31 12:22               ` [Cluster-devel] " Jeff Layton
2017-07-31 12:22                 ` Jeff Layton
2017-07-31 12:22                 ` Jeff Layton
2017-07-31 12:25                 ` [Cluster-devel] " Steven Whitehouse
2017-07-31 12:25                   ` Steven Whitehouse
2017-07-31 12:25                   ` Steven Whitehouse
2017-07-31 12:38                 ` [Cluster-devel] " Bob Peterson
2017-07-31 12:38                   ` Bob Peterson
2017-07-31 12:38                   ` Bob Peterson
2017-07-31 12:07             ` [Cluster-devel] " Jan Kara
2017-07-31 12:07               ` Jan Kara
2017-07-31 12:07               ` Jan Kara
2017-07-31 13:00               ` [Cluster-devel] " Jeff Layton
2017-07-31 13:00                 ` Jeff Layton
2017-07-31 13:00                 ` Jeff Layton
2017-07-31 13:32                 ` [Cluster-devel] " Jan Kara
2017-07-31 13:32                   ` Jan Kara
2017-07-31 13:32                   ` Jan Kara
2017-07-31 16:49   ` [Cluster-devel] [PATCH v3] " Jeff Layton
2017-07-31 16:49     ` Jeff Layton
2017-07-31 16:49     ` Jeff Layton
2017-08-01  9:52     ` [Cluster-devel] " Jan Kara
2017-08-01  9:52       ` Jan Kara
2017-08-01  9:52       ` Jan Kara
2017-07-26 17:55 ` [Cluster-devel] [PATCH v2 3/4] fs: convert sync_file_range to use errseq_t based error-tracking Jeff Layton
2017-07-26 17:55   ` Jeff Layton
2017-07-26 17:55   ` Jeff Layton
2017-07-26 17:55 ` [Cluster-devel] [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync Jeff Layton
2017-07-26 17:55   ` Jeff Layton
2017-07-26 17:55   ` Jeff Layton
2017-07-26 19:21   ` [Cluster-devel] " Matthew Wilcox
2017-07-26 19:21     ` Matthew Wilcox
2017-07-26 19:21     ` Matthew Wilcox
2017-07-26 22:22     ` [Cluster-devel] " Jeff Layton
2017-07-26 22:22       ` Jeff Layton
2017-07-26 22:22       ` Jeff Layton
2017-07-27 12:47       ` [Cluster-devel] " Bob Peterson
2017-07-27 12:47         ` Bob Peterson
2017-07-27 12:47         ` Bob Peterson
2017-07-28 12:37         ` [Cluster-devel] " Steven Whitehouse
2017-07-28 12:37           ` Steven Whitehouse
2017-07-28 12:37           ` Steven Whitehouse
2017-07-28 12:47           ` [Cluster-devel] " Jeff Layton
2017-07-28 12:47             ` Jeff Layton
2017-07-28 12:47             ` Jeff Layton
2017-07-28 12:54             ` [Cluster-devel] " Steven Whitehouse
2017-07-28 12:54               ` Steven Whitehouse
2017-07-28 12:54               ` Steven Whitehouse
