* [PATCH v2 0/4] mm/gfs2: extend file_* API, and convert gfs2 to errseq_t error reporting
From: Jeff Layton @ 2017-07-26 17:55 UTC
To: Alexander Viro, Jan Kara
Cc: J . Bruce Fields, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm, Matthew Wilcox, Bob Peterson, Steven Whitehouse, cluster-devel

From: Jeff Layton <jlayton@redhat.com>

I sent a small patch earlier this week to make sync_file_range use
errseq_t reporting. This set respins it into a series: one patch adds a
bit more file_* infrastructure, and two more make sync_file_range and
fsync on gfs2 report writeback errors properly.

There's also a small cleanup patch for mm/filemap.c to consolidate the
DAX handling checks in the existing infrastructure.

Jeff Layton (4):
  mm: consolidate dax / non-dax checks for writeback
  mm: add file_fdatawait_range and file_write_and_wait
  fs: convert sync_file_range to use errseq_t based error-tracking
  gfs2: convert to errseq_t based writeback error reporting for fsync

 fs/gfs2/file.c     |  6 +++--
 fs/sync.c          |  4 +--
 include/linux/fs.h |  7 +++++-
 mm/filemap.c       | 71 +++++++++++++++++++++++++++++++++++++++++++++++++-----
 4 files changed, 77 insertions(+), 11 deletions(-)

-- 
2.13.3
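The cover letter assumes familiarity with errseq_t. As a rough illustration only, the user-space C sketch below models the reporting rule the series relies on: a writeback error is recorded once against the mapping, and each open file keeps its own cursor (the role file->f_wb_err plays in these patches), so in this model every file that was open when the error happened sees it exactly once. All names here (struct mapping_model, struct file_model, model_set_err(), model_open(), model_check_and_advance(), model_demo()) are hypothetical; the kernel's errseq_t packs the error, a "seen" flag and a counter into one 32-bit value, and its sampling rules differ in detail.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-mapping error state (NOT the kernel's errseq_t layout). */
struct mapping_model {
	int last_err;     /* most recent writeback error, e.g. -EIO */
	unsigned int seq; /* bumped every time an error is recorded */
};

/* Hypothetical per-open-file cursor, playing the role of file->f_wb_err. */
struct file_model {
	struct mapping_model *mapping;
	unsigned int seen_seq; /* value of mapping->seq this file last reported */
};

/* Record a writeback error against the mapping; only the latest is kept. */
void model_set_err(struct mapping_model *m, int err)
{
	m->last_err = err;
	m->seq++;
}

/* "Open" a file: sample the sequence so earlier errors are not reported to it. */
void model_open(struct file_model *f, struct mapping_model *m)
{
	f->mapping = m;
	f->seen_seq = m->seq;
}

/* Report an error recorded since this file last checked, then advance its cursor. */
int model_check_and_advance(struct file_model *f)
{
	struct mapping_model *m = f->mapping;

	if (f->seen_seq == m->seq)
		return 0;
	f->seen_seq = m->seq;
	return m->last_err;
}

/* Two files open when the error occurs each see it once; a later opener does not. */
void model_demo(void)
{
	struct mapping_model m = { 0, 0 };
	struct file_model a, b, c;

	model_open(&a, &m);
	model_open(&b, &m);
	model_set_err(&m, -EIO);

	assert(model_check_and_advance(&a) == -EIO);
	assert(model_check_and_advance(&a) == 0);
	assert(model_check_and_advance(&b) == -EIO);

	model_open(&c, &m);
	assert(model_check_and_advance(&c) == 0);
}
```

Because the cursor advances on every check, a caller that ignores the return value loses the error for that file, which is why the new file_* helpers are marked __must_check.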
* [PATCH v2 1/4] mm: consolidate dax / non-dax checks for writeback
From: Jeff Layton @ 2017-07-26 17:55 UTC
To: Alexander Viro, Jan Kara
Cc: J . Bruce Fields, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm, Matthew Wilcox, Bob Peterson, Steven Whitehouse, cluster-devel

From: Jeff Layton <jlayton@redhat.com>

We have this complex conditional copied to several places. Turn it into
a helper function.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 mm/filemap.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index e1cca770688f..72e46e6f0d9a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -522,12 +522,17 @@ int filemap_fdatawait(struct address_space *mapping)
 }
 EXPORT_SYMBOL(filemap_fdatawait);
 
+static bool mapping_needs_writeback(struct address_space *mapping)
+{
+	return (!dax_mapping(mapping) && mapping->nrpages) ||
+	       (dax_mapping(mapping) && mapping->nrexceptional);
+}
+
 int filemap_write_and_wait(struct address_space *mapping)
 {
 	int err = 0;
 
-	if ((!dax_mapping(mapping) && mapping->nrpages) ||
-	    (dax_mapping(mapping) && mapping->nrexceptional)) {
+	if (mapping_needs_writeback(mapping)) {
 		err = filemap_fdatawrite(mapping);
 		/*
 		 * Even if the above returned error, the pages may be
@@ -566,8 +571,7 @@ int filemap_write_and_wait_range(struct address_space *mapping,
 {
 	int err = 0;
 
-	if ((!dax_mapping(mapping) && mapping->nrpages) ||
-	    (dax_mapping(mapping) && mapping->nrexceptional)) {
+	if (mapping_needs_writeback(mapping)) {
 		err = __filemap_fdatawrite_range(mapping, lstart, lend,
 						 WB_SYNC_ALL);
 		/* See comment of filemap_write_and_wait() */
@@ -656,8 +660,7 @@ int file_write_and_wait_range(struct file *file, loff_t lstart, loff_t lend)
 	int err = 0, err2;
 	struct address_space *mapping = file->f_mapping;
 
-	if ((!dax_mapping(mapping) && mapping->nrpages) ||
-	    (dax_mapping(mapping) && mapping->nrexceptional)) {
+	if (mapping_needs_writeback(mapping)) {
 		err = __filemap_fdatawrite_range(mapping, lstart, lend,
 						 WB_SYNC_ALL);
 		/* See comment of filemap_write_and_wait() */
-- 
2.13.3
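The helper's condition reduces to a small truth table: a non-DAX mapping needs writeback only when it has page-cache pages (nrpages), while a DAX mapping is judged by its exceptional-entry count (nrexceptional) instead. A minimal user-space sketch, assuming a stub struct in place of struct address_space and a plain is_dax flag in place of dax_mapping(); struct mapping_stub, stub_needs_writeback() and check_needs_writeback() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the struct address_space fields the helper
 * reads; is_dax replaces the kernel's dax_mapping(mapping) test.
 */
struct mapping_stub {
	bool is_dax;
	unsigned long nrpages;       /* page-cache pages */
	unsigned long nrexceptional; /* exceptional radix-tree entries */
};

/* Same condition as mapping_needs_writeback() in the patch. */
bool stub_needs_writeback(const struct mapping_stub *m)
{
	return (!m->is_dax && m->nrpages) ||
	       (m->is_dax && m->nrexceptional);
}

/* Each flavour of mapping only consults its own counter. */
void check_needs_writeback(void)
{
	struct mapping_stub pages = { false, 3, 0 };
	struct mapping_stub no_pages = { false, 0, 5 };
	struct mapping_stub dax_entries = { true, 0, 2 };
	struct mapping_stub dax_no_entries = { true, 4, 0 };

	assert(stub_needs_writeback(&pages));
	assert(!stub_needs_writeback(&no_pages));       /* nrexceptional ignored */
	assert(stub_needs_writeback(&dax_entries));
	assert(!stub_needs_writeback(&dax_no_entries)); /* nrpages ignored */
}
```

Folding the test into one static helper means later callers in mm/filemap.c cannot drift out of sync with each other, which is the point of the cleanup.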
* Re: [PATCH v2 1/4] mm: consolidate dax / non-dax checks for writeback
From: Jan Kara @ 2017-07-27 8:43 UTC
To: Jeff Layton
Cc: Alexander Viro, Jan Kara, J . Bruce Fields, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm, Matthew Wilcox, Bob Peterson, Steven Whitehouse, cluster-devel

On Wed 26-07-17 13:55:35, Jeff Layton wrote:
> From: Jeff Layton <jlayton@redhat.com>
>
> We have this complex conditional copied to several places. Turn it into
> a helper function.
>
> Signed-off-by: Jeff Layton <jlayton@redhat.com>

Looks good. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  mm/filemap.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index e1cca770688f..72e46e6f0d9a 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -522,12 +522,17 @@ int filemap_fdatawait(struct address_space *mapping)
>  }
>  EXPORT_SYMBOL(filemap_fdatawait);
>
> +static bool mapping_needs_writeback(struct address_space *mapping)
> +{
> +	return (!dax_mapping(mapping) && mapping->nrpages) ||
> +	       (dax_mapping(mapping) && mapping->nrexceptional);
> +}
> +
>  int filemap_write_and_wait(struct address_space *mapping)
>  {
>  	int err = 0;
>
> -	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> -	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> +	if (mapping_needs_writeback(mapping)) {
>  		err = filemap_fdatawrite(mapping);
>  		/*
>  		 * Even if the above returned error, the pages may be
> @@ -566,8 +571,7 @@ int filemap_write_and_wait_range(struct address_space *mapping,
>  {
>  	int err = 0;
>
> -	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> -	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> +	if (mapping_needs_writeback(mapping)) {
>  		err = __filemap_fdatawrite_range(mapping, lstart, lend,
>  						 WB_SYNC_ALL);
>  		/* See comment of filemap_write_and_wait() */
> @@ -656,8 +660,7 @@ int file_write_and_wait_range(struct file *file, loff_t lstart, loff_t lend)
>  	int err = 0, err2;
>  	struct address_space *mapping = file->f_mapping;
>
> -	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> -	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> +	if (mapping_needs_writeback(mapping)) {
>  		err = __filemap_fdatawrite_range(mapping, lstart, lend,
>  						 WB_SYNC_ALL);
>  		/* See comment of filemap_write_and_wait() */
> -- 
> 2.13.3
>
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
* [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
From: Jeff Layton @ 2017-07-26 17:55 UTC
To: Alexander Viro, Jan Kara
Cc: J . Bruce Fields, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm, Matthew Wilcox, Bob Peterson, Steven Whitehouse, cluster-devel

From: Jeff Layton <jlayton@redhat.com>

Some filesystem fsync routines will need these.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 include/linux/fs.h |  7 ++++++-
 mm/filemap.c       | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 21e7df1ad613..bc57a79294f0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2544,6 +2544,8 @@ extern int filemap_fdatawait_range(struct address_space *, loff_t lstart,
 				   loff_t lend);
 extern bool filemap_range_has_page(struct address_space *, loff_t lstart,
 				  loff_t lend);
+extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart,
+						loff_t lend);
 extern int filemap_write_and_wait(struct address_space *mapping);
 extern int filemap_write_and_wait_range(struct address_space *mapping,
 				        loff_t lstart, loff_t lend);
@@ -2552,11 +2554,14 @@ extern int __filemap_fdatawrite_range(struct address_space *mapping,
 extern int filemap_fdatawrite_range(struct address_space *mapping,
 				loff_t start, loff_t end);
 extern int filemap_check_errors(struct address_space *mapping);
-
 extern void __filemap_set_wb_err(struct address_space *mapping, int err);
+
+extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart,
+						loff_t lend);
 extern int __must_check file_check_and_advance_wb_err(struct file *file);
 extern int __must_check file_write_and_wait_range(struct file *file,
 						loff_t start, loff_t end);
+extern int __must_check file_write_and_wait(struct file *file);
 
 /**
  * filemap_set_wb_err - set a writeback error on an address_space
diff --git a/mm/filemap.c b/mm/filemap.c
index 72e46e6f0d9a..b904a8dfa43d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -476,6 +476,29 @@ int filemap_fdatawait_range(struct address_space *mapping, loff_t start_byte,
 EXPORT_SYMBOL(filemap_fdatawait_range);
 
 /**
+ * file_fdatawait_range - wait for writeback to complete
+ * @file: file pointing to address space structure to wait for
+ * @start_byte: offset in bytes where the range starts
+ * @end_byte: offset in bytes where the range ends (inclusive)
+ *
+ * Walk the list of under-writeback pages of the address space that file
+ * refers to, in the given range and wait for all of them. Check error
+ * status of the address space vs. the file->f_wb_err cursor and return it.
+ *
+ * Since the error status of the file is advanced by this function,
+ * callers are responsible for checking the return value and handling and/or
+ * reporting the error.
+ */
+int file_fdatawait_range(struct file *file, loff_t start_byte, loff_t end_byte)
+{
+	struct address_space *mapping = file->f_mapping;
+
+	__filemap_fdatawait_range(mapping, start_byte, end_byte);
+	return file_check_and_advance_wb_err(file);
+}
+EXPORT_SYMBOL(file_fdatawait_range);
+
+/**
  * filemap_fdatawait_keep_errors - wait for writeback without clearing errors
  * @mapping: address space structure to wait for
  *
@@ -675,6 +698,39 @@ int file_write_and_wait_range(struct file *file, loff_t lstart, loff_t lend)
 EXPORT_SYMBOL(file_write_and_wait_range);
 
 /**
+ * file_write_and_wait - write out whole file and wait on it and return any
+ *			 writeback errors since we last checked
+ * @file: file to write back and wait on
+ *
+ * Write back the whole file and wait on its mapping. Afterward, check for
+ * errors that may have occurred since our file->f_wb_err cursor was last
+ * updated.
+ */
+int file_write_and_wait(struct file *file)
+{
+	int err = 0, err2;
+	struct address_space *mapping = file->f_mapping;
+
+	if ((!dax_mapping(mapping) && mapping->nrpages) ||
+	    (dax_mapping(mapping) && mapping->nrexceptional)) {
+		err = filemap_fdatawrite(mapping);
+		/* See comment of filemap_write_and_wait() */
+		if (err != -EIO) {
+			loff_t i_size = i_size_read(mapping->host);
+
+			if (i_size != 0)
+				__filemap_fdatawait_range(mapping, 0,
+							  i_size - 1);
+		}
+	}
+	err2 = file_check_and_advance_wb_err(file);
+	if (!err)
+		err = err2;
+	return err;
+}
+EXPORT_SYMBOL(file_write_and_wait);
+
+/**
  * replace_page_cache_page - replace a pagecache page with a new one
  * @old:	page to be replaced
  * @new:	page to replace with
-- 
2.13.3
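The control flow of file_write_and_wait() can be traced with a user-space model. This is a sketch under stated assumptions: the results of filemap_fdatawrite() and file_check_and_advance_wb_err() are passed in as plain ints, the wait is only recorded rather than performed, and struct wb_trace, model_write_and_wait() and check_write_and_wait() are hypothetical. It shows the points the patch encodes: the wait is skipped after -EIO or for an empty file, the wait range end is inclusive (i_size - 1), the file's error cursor is checked whether or not writeback was needed, and an error from the write step takes precedence over the one reported by the cursor.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical record of whether (and up to where) the model waited. */
struct wb_trace {
	bool waited;
	long long wait_end; /* inclusive end offset handed to the wait */
};

/*
 * Models the control flow of file_write_and_wait() from the patch.
 * write_err stands in for filemap_fdatawrite()'s result and cursor_err
 * for file_check_and_advance_wb_err()'s; the wait is only recorded.
 */
int model_write_and_wait(bool needs_writeback, int write_err,
			 long long i_size, int cursor_err, struct wb_trace *t)
{
	int err = 0, err2;

	t->waited = false;
	t->wait_end = -1;
	if (needs_writeback) {
		err = write_err;
		/* Like the patch: no wait after -EIO, none for an empty file. */
		if (err != -EIO && i_size != 0) {
			t->waited = true;
			t->wait_end = i_size - 1;
		}
	}
	/* The cursor check is unconditional in the patch. */
	err2 = cursor_err;
	if (!err)
		err = err2;
	return err;
}

/* A few traced cases. */
void check_write_and_wait(void)
{
	struct wb_trace t;

	assert(model_write_and_wait(true, 0, 4096, 0, &t) == 0);
	assert(t.waited && t.wait_end == 4095);

	assert(model_write_and_wait(true, 0, 0, 0, &t) == 0);
	assert(!t.waited);

	assert(model_write_and_wait(true, -EIO, 4096, -ENOSPC, &t) == -EIO);
	assert(!t.waited);

	assert(model_write_and_wait(false, 0, 4096, -ENOSPC, &t) == -ENOSPC);
	assert(!t.waited);
}
```

The i_size != 0 guard matters because the range end is inclusive: for an empty file, i_size - 1 would be -1, which is not a valid end offset.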
* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait

From: Matthew Wilcox @ 2017-07-26 19:13 UTC
To: cluster-devel.redhat.com

On Wed, Jul 26, 2017 at 01:55:36PM -0400, Jeff Layton wrote:
> +int file_write_and_wait(struct file *file)
> +{
> +	int err = 0, err2;
> +	struct address_space *mapping = file->f_mapping;
> +
> +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> +	    (dax_mapping(mapping) && mapping->nrexceptional)) {

Since patch 1 exists, shouldn't this use the new helper?

> +		err = filemap_fdatawrite(mapping);
> +		/* See comment of filemap_write_and_wait() */
> +		if (err != -EIO) {
> +			loff_t i_size = i_size_read(mapping->host);
> +
> +			if (i_size != 0)
> +				__filemap_fdatawait_range(mapping, 0,
> +							  i_size - 1);
> +		}
> +	}
> +	err2 = file_check_and_advance_wb_err(file);
> +	if (!err)
> +		err = err2;
> +	return err;

Would this be clearer written as:

	if (err)
		return err;
	return err2;

or even ...

	return err ? err : err2;
* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait

From: Jeff Layton @ 2017-07-26 22:18 UTC
To: cluster-devel.redhat.com

On Wed, 2017-07-26 at 12:13 -0700, Matthew Wilcox wrote:
> On Wed, Jul 26, 2017 at 01:55:36PM -0400, Jeff Layton wrote:
> > +int file_write_and_wait(struct file *file)
> > +{
> > +	int err = 0, err2;
> > +	struct address_space *mapping = file->f_mapping;
> > +
> > +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> > +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> 
> Since patch 1 exists, shouldn't this use the new helper?
> 

<facepalm> yes, will fix

> > +		err = filemap_fdatawrite(mapping);
> > +		/* See comment of filemap_write_and_wait() */
> > +		if (err != -EIO) {
> > +			loff_t i_size = i_size_read(mapping->host);
> > +
> > +			if (i_size != 0)
> > +				__filemap_fdatawait_range(mapping, 0,
> > +							  i_size - 1);
> > +		}
> > +	}
> > +	err2 = file_check_and_advance_wb_err(file);
> > +	if (!err)
> > +		err = err2;
> > +	return err;
> 
> Would this be clearer written as:
> 
> 	if (err)
> 		return err;
> 	return err2;
> 
> or even ...
> 
> 	return err ? err : err2;
> 

Meh -- I like it the way I have it. If we don't have an error already,
then just take the one from the check and advance.

That said, I don't have a terribly strong preference here, so if anyone
does, then I can be easily persuaded.

-- 
Jeff Layton <jlayton@redhat.com>
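All three spellings discussed above return the same value: an error from filemap_fdatawrite() takes precedence, and the error from the cursor check is returned only when there was none. The sketch below (the function names are hypothetical) makes that equivalence easy to check. Note that in the patch, file_check_and_advance_wb_err() runs before the errors are combined, so the file's cursor is advanced even when the earlier error is the one returned; the choice between the forms is purely stylistic.

```c
#include <assert.h>

/* Form used in the patch: keep the first error, otherwise take err2. */
static int combine_patch(int err, int err2)
{
	if (!err)
		err = err2;
	return err;
}

/* Early-return form suggested in review. */
static int combine_early_return(int err, int err2)
{
	if (err)
		return err;
	return err2;
}

/* Ternary form suggested in review. */
static int combine_ternary(int err, int err2)
{
	return err ? err : err2;
}
```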
* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait

From: Bob Peterson @ 2017-07-26 19:50 UTC
To: cluster-devel.redhat.com

----- Original Message -----
| From: Jeff Layton <jlayton@redhat.com>
| 
| Some filesystem fsync routines will need these.
| 
| Signed-off-by: Jeff Layton <jlayton@redhat.com>
| ---
|  include/linux/fs.h |  7 ++++++-
|  mm/filemap.c       | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
|  2 files changed, 62 insertions(+), 1 deletion(-)
(snip)
| diff --git a/mm/filemap.c b/mm/filemap.c
| index 72e46e6f0d9a..b904a8dfa43d 100644
| --- a/mm/filemap.c
| +++ b/mm/filemap.c
(snip)
| @@ -675,6 +698,39 @@ int file_write_and_wait_range(struct file *file, loff_t lstart, loff_t lend)
|  EXPORT_SYMBOL(file_write_and_wait_range);
|  
|  /**
| + * file_write_and_wait - write out whole file and wait on it and return any
| + *			 writeback errors since we last checked
| + * @file: file to write back and wait on
| + *
| + * Write back the whole file and wait on its mapping. Afterward, check for
| + * errors that may have occurred since our file->f_wb_err cursor was last
| + * updated.
| + */
| +int file_write_and_wait(struct file *file)
| +{
| +	int err = 0, err2;
| +	struct address_space *mapping = file->f_mapping;
| +
| +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
| +	    (dax_mapping(mapping) && mapping->nrexceptional)) {

Seems like we should make the new function mapping_needs_writeback more
central (mm.h or fs.h?) and call it here ^.

| +		err = filemap_fdatawrite(mapping);
| +		/* See comment of filemap_write_and_wait() */
| +		if (err != -EIO) {
| +			loff_t i_size = i_size_read(mapping->host);
| +
| +			if (i_size != 0)
| +				__filemap_fdatawait_range(mapping, 0,
| +							  i_size - 1);
| +		}
| +	}
| +	err2 = file_check_and_advance_wb_err(file);
| +	if (!err)
| +		err = err2;
| +	return err;

In the past, I've seen more elegant constructs like:

	return (err ? err : err2);

but I don't know what's considered more ugly or hackish.

Regards,

Bob Peterson
Red Hat File Systems
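The condition that Bob Peterson and Matthew Wilcox want to replace with the patch 1 helper can be modelled in userspace as below. This is a sketch assuming mapping_needs_writeback() tests the same condition as the open-coded check (its definition is not quoted in this thread); the struct and the `_model` names are hypothetical stand-ins for struct address_space, dax_mapping() and the nrpages/nrexceptional counters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the struct address_space fields involved. */
struct mapping_model {
	bool dax;                     /* models dax_mapping(mapping) */
	unsigned long nrpages;        /* models mapping->nrpages */
	unsigned long nrexceptional;  /* models mapping->nrexceptional */
};

/*
 * Same condition as the open-coded check in file_write_and_wait():
 * a DAX mapping is checked for exceptional entries, a non-DAX mapping
 * for page-cache pages.
 */
static bool mapping_needs_writeback_model(const struct mapping_model *m)
{
	return (!m->dax && m->nrpages) || (m->dax && m->nrexceptional);
}
```

Factoring the test into one helper keeps file_write_and_wait() and the other writeback paths from drifting apart if the DAX accounting changes later.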
* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait

From: Jan Kara @ 2017-07-27 8:49 UTC
To: cluster-devel.redhat.com

On Wed 26-07-17 13:55:36, Jeff Layton wrote:
> +int file_write_and_wait(struct file *file)
> +{
> +	int err = 0, err2;
> +	struct address_space *mapping = file->f_mapping;
> +
> +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> +		err = filemap_fdatawrite(mapping);
> +		/* See comment of filemap_write_and_wait() */
> +		if (err != -EIO) {
> +			loff_t i_size = i_size_read(mapping->host);
> +
> +			if (i_size != 0)
> +				__filemap_fdatawait_range(mapping, 0,
> +							  i_size - 1);
> +		}
> +	}

Err, what's the i_size check doing here? I'd just pass ~0 as the end of the
range and ignore i_size. It is much easier than trying to wrap your head
around possible races with file operations modifying i_size.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait

From: Jeff Layton @ 2017-07-27 12:48 UTC
To: cluster-devel.redhat.com

On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote:
> On Wed 26-07-17 13:55:36, Jeff Layton wrote:
> > +int file_write_and_wait(struct file *file)
> > +{
> > +	int err = 0, err2;
> > +	struct address_space *mapping = file->f_mapping;
> > +
> > +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> > +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> > +		err = filemap_fdatawrite(mapping);
> > +		/* See comment of filemap_write_and_wait() */
> > +		if (err != -EIO) {
> > +			loff_t i_size = i_size_read(mapping->host);
> > +
> > +			if (i_size != 0)
> > +				__filemap_fdatawait_range(mapping, 0,
> > +							  i_size - 1);
> > +		}
> > +	}
> 
> Err, what's the i_size check doing here? I'd just pass ~0 as the end of the
> range and ignore i_size. It is much easier than trying to wrap your head
> around possible races with file operations modifying i_size.
> 
> 								Honza

I'm basically emulating _exactly_ what filemap_write_and_wait does here,
as I'm leery of making subtle behavior changes in the actual writeback
behavior. For example:

-----------------8<----------------
static inline int __filemap_fdatawrite(struct address_space *mapping,
	int sync_mode)
{
	return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode);
}

int filemap_fdatawrite(struct address_space *mapping)
{
	return __filemap_fdatawrite(mapping, WB_SYNC_ALL);
}
EXPORT_SYMBOL(filemap_fdatawrite);
-----------------8<----------------

...which then sets up the wbc with the right ranges and sync mode and
kicks off writepages. But then, it does the i_size_read to figure out
what range it should wait on (with the shortcut for the size == 0 case).

My assumption was that it was intentionally designed that way, but I'm
guessing from your comments that it wasn't? If so, then we can turn
file_write_and_wait a static inline wrapper around
file_write_and_wait_range.

-- 
Jeff Layton <jlayton@redhat.com>
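Jeff's description and Jan Kara's suggestion differ only in the range handed to the wait step. filemap_fdatawrite() starts writeback over [0, LLONG_MAX], while the patch waits on [0, i_size - 1] using an i_size read afterwards, and skips the wait when i_size is 0. The hypothetical sketch below computes both ranges; it is plain arithmetic, not kernel code, and it uses LLONG_MAX for the whole-file end because that is the bound filemap_fdatawrite() itself passes (Jan wrote ~0).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical description of a byte range to wait on (end is inclusive). */
struct wait_range_model {
	bool empty;       /* true when there is nothing to wait on */
	long long start;  /* long long stands in for loff_t */
	long long end;
};

/* Range the patch waits on: derived from i_size, skipped for empty files. */
static struct wait_range_model wait_range_from_isize(long long i_size)
{
	struct wait_range_model r = { true, 0, 0 };

	if (i_size != 0) {
		r.empty = false;
		r.end = i_size - 1;
	}
	return r;
}

/* Range Jan Kara proposes: the whole file, matching the write side. */
static struct wait_range_model wait_range_whole_file(void)
{
	struct wait_range_model r = { false, 0, LLONG_MAX };

	return r;
}
```

With the whole-file range, the write and wait steps cover the same span by construction, so there is no need to reason about i_size changing in between. That is the simplification Jan is arguing for, and it is what would let file_write_and_wait() become a thin wrapper around file_write_and_wait_range().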
* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait

From: Jeff Layton @ 2017-07-31 11:27 UTC
To: cluster-devel.redhat.com

On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote:
> On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote:
> > On Wed 26-07-17 13:55:36, Jeff Layton wrote:
> > > +int file_write_and_wait(struct file *file)
> > > +{
> > > +	int err = 0, err2;
> > > +	struct address_space *mapping = file->f_mapping;
> > > +
> > > +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> > > +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> > > +		err = filemap_fdatawrite(mapping);
> > > +		/* See comment of filemap_write_and_wait() */
> > > +		if (err != -EIO) {
> > > +			loff_t i_size = i_size_read(mapping->host);
> > > +
> > > +			if (i_size != 0)
> > > +				__filemap_fdatawait_range(mapping, 0,
> > > +							  i_size - 1);
> > > +		}
> > > +	}
> > 
> > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the
> > range and ignore i_size. It is much easier than trying to wrap your head
> > around possible races with file operations modifying i_size.
> > 
> > 								Honza
> 
> I'm basically emulating _exactly_ what filemap_write_and_wait does here,
> as I'm leery of making subtle behavior changes in the actual writeback
> behavior. For example:
> 
> -----------------8<----------------
> static inline int __filemap_fdatawrite(struct address_space *mapping,
> 	int sync_mode)
> {
> 	return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode);
> }
> 
> int filemap_fdatawrite(struct address_space *mapping)
> {
> 	return __filemap_fdatawrite(mapping, WB_SYNC_ALL);
> }
> EXPORT_SYMBOL(filemap_fdatawrite);
> -----------------8<----------------
> 
> ...which then sets up the wbc with the right ranges and sync mode and
> kicks off writepages. But then, it does the i_size_read to figure out
> what range it should wait on (with the shortcut for the size == 0 case).
> 
> My assumption was that it was intentionally designed that way, but I'm
> guessing from your comments that it wasn't? If so, then we can turn
> file_write_and_wait a static inline wrapper around
> file_write_and_wait_range.

FWIW, I did a bit of archaeology in the linux-history tree and found this
patch from Marcelo in 2004. Is this optimization still helpful? If not,
then that does simplify the code a bit.

-------------------8<--------------------
[PATCH] small wait_on_page_writeback_range() optimization

filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end"
parameter. This is not needed since we know the EOF from the inode. Use
that instead.

Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---
 mm/filemap.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 78e18b7639b6..55fb7b4141e4 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range);
  */
 int filemap_fdatawait(struct address_space *mapping)
 {
-	return wait_on_page_writeback_range(mapping, 0, -1);
+	loff_t i_size = i_size_read(mapping->host);
+
+	if (i_size == 0)
+		return 0;
+
+	return wait_on_page_writeback_range(mapping, 0,
+				(i_size - 1) >> PAGE_CACHE_SHIFT);
 }
 EXPORT_SYMBOL(filemap_fdatawait);
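The 2004 change Jeff found converts the inode size into the index of the last page that can hold data below EOF. Before it, the end index was -1, which, converted to an unsigned page index, means "every page to the end of the mapping". The arithmetic can be sketched as follows, assuming 4 KiB pages (a PAGE_SHIFT of 12; the real value is architecture-dependent). The function and macro names are hypothetical, and the function returns false for an empty file, mirroring the early return in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4 KiB pages. The kernel's PAGE_SHIFT varies by architecture. */
#define PAGE_SHIFT_MODEL 12

/*
 * Models the end-index computation from the 2004 filemap_fdatawait()
 * patch: for a non-empty file, store the index of the last page that
 * can contain data below EOF in *last and return true. An empty file
 * has nothing to wait on, so return false.
 */
static bool last_page_index_model(long long i_size, unsigned long long *last)
{
	if (i_size == 0)
		return false;
	*last = (unsigned long long)(i_size - 1) >> PAGE_SHIFT_MODEL;
	return true;
}
```

For example, a 4096-byte file ends in page 0 and a 4097-byte file ends in page 1. The saving is only in how far the wait loop scans, which is why Jan's whole-range suggestion trades it for simpler reasoning.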
* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
From: Steven Whitehouse @ 2017-07-31 11:32 UTC
  To: cluster-devel.redhat.com

Hi,

On 31/07/17 12:27, Jeff Layton wrote:
> On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote:
>> On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote:
>>> On Wed 26-07-17 13:55:36, Jeff Layton wrote:
>>>> +int file_write_and_wait(struct file *file)
>>>> +{
>>>> +	int err = 0, err2;
>>>> +	struct address_space *mapping = file->f_mapping;
>>>> +
>>>> +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
>>>> +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
>>>> +		err = filemap_fdatawrite(mapping);
>>>> +		/* See comment of filemap_write_and_wait() */
>>>> +		if (err != -EIO) {
>>>> +			loff_t i_size = i_size_read(mapping->host);
>>>> +
>>>> +			if (i_size != 0)
>>>> +				__filemap_fdatawait_range(mapping, 0,
>>>> +							  i_size - 1);
>>>> +		}
>>>> +	}
>>> Err, what's the i_size check doing here? I'd just pass ~0 as the end of the
>>> range and ignore i_size. It is much easier than trying to wrap your head
>>> around possible races with file operations modifying i_size.
>>>
>>> 								Honza
>> I'm basically emulating _exactly_ what filemap_write_and_wait does here,
>> as I'm leery of making subtle behavior changes in the actual writeback
>> behavior. For example:
>>
>> -----------------8<----------------
>> static inline int __filemap_fdatawrite(struct address_space *mapping,
>> 	int sync_mode)
>> {
>> 	return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode);
>> }
>>
>> int filemap_fdatawrite(struct address_space *mapping)
>> {
>> 	return __filemap_fdatawrite(mapping, WB_SYNC_ALL);
>> }
>> EXPORT_SYMBOL(filemap_fdatawrite);
>> -----------------8<----------------
>>
>> ...which then sets up the wbc with the right ranges and sync mode and
>> kicks off writepages. But then, it does the i_size_read to figure out
>> what range it should wait on (with the shortcut for the size == 0 case).
>>
>> My assumption was that it was intentionally designed that way, but I'm
>> guessing from your comments that it wasn't? If so, then we can turn
>> file_write_and_wait a static inline wrapper around
>> file_write_and_wait_range.
> FWIW, I did a bit of archaeology in the linux-history tree and found
> this patch from Marcelo in 2004. Is this optimization still helpful? If
> not, then that does simplify the code a bit.
>
> -------------------8<--------------------
>
> [PATCH] small wait_on_page_writeback_range() optimization
>
> filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end"
> parameter. This is not needed since we know the EOF from the inode. Use
> that instead.
>
> Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
> Signed-off-by: Andrew Morton <akpm@osdl.org>
> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
> ---
>  mm/filemap.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 78e18b7639b6..55fb7b4141e4 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range);
>   */
>  int filemap_fdatawait(struct address_space *mapping)
>  {
> -	return wait_on_page_writeback_range(mapping, 0, -1);
> +	loff_t i_size = i_size_read(mapping->host);
> +
> +	if (i_size == 0)
> +		return 0;
> +
> +	return wait_on_page_writeback_range(mapping, 0,
> +				(i_size - 1) >> PAGE_CACHE_SHIFT);
>  }
>  EXPORT_SYMBOL(filemap_fdatawait);
>

Does this ever get called in cases where we would not hold fs locks? In
that case we definitely don't want to be relying on i_size,

Steve.
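The race that Jan and Steve are worried about can be sketched with a userspace model. Everything here (`struct model_mapping`, the 4 KiB page size, the eight-page array, the helper names) is a hypothetical stand-in, not kernel API. It contrasts waiting up to an i_size sampled once, as the quoted draft does, with Jan's suggestion of always waiting up to LLONG_MAX, which is also the shape a whole-file wrapper around a range-based helper would take.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical userspace model, not kernel code: 4 KiB pages, 8 slots. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_NR_PAGES 8

struct model_mapping {
	long long i_size;               /* current file size in bytes */
	int writeback[MODEL_NR_PAGES];  /* 1 = page still under writeback */
};

/* "Wait" on (here: clear and count) pages covering [start_byte, end_byte]. */
static int model_wait_range(struct model_mapping *m,
			    long long start_byte, long long end_byte)
{
	int waited = 0;

	assert(start_byte >= 0);
	if (end_byte < start_byte)
		return 0;
	for (long long idx = start_byte >> MODEL_PAGE_SHIFT;
	     idx < MODEL_NR_PAGES && idx <= (end_byte >> MODEL_PAGE_SHIFT);
	     idx++) {
		if (m->writeback[idx]) {
			m->writeback[idx] = 0;
			waited++;
		}
	}
	return waited;
}

/* Draft approach: bound the wait by an i_size value sampled earlier. */
static int model_wait_to_sampled_eof(struct model_mapping *m, long long sampled)
{
	if (sampled == 0)
		return 0;
	return model_wait_range(m, 0, sampled - 1);
}

/* Jan's suggestion: ignore i_size and wait on the whole range. */
static inline int model_wait_whole_file(struct model_mapping *m)
{
	return model_wait_range(m, 0, LLONG_MAX);
}
```

If the file grows between the i_size sample and the wait, the sampled variant skips the page past the stale EOF, while the unbounded variant waits on it. Whether that skipped page matters for fsync is what the rest of the thread discusses; the model only shows which pages each approach covers.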
* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
From: Jeff Layton @ 2017-07-31 11:44 UTC
  To: cluster-devel.redhat.com

On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote:
> Hi,
>
>
> On 31/07/17 12:27, Jeff Layton wrote:
> > On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote:
> > > On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote:
> > > > On Wed 26-07-17 13:55:36, Jeff Layton wrote:
> > > > > +int file_write_and_wait(struct file *file)
> > > > > +{
> > > > > +	int err = 0, err2;
> > > > > +	struct address_space *mapping = file->f_mapping;
> > > > > +
> > > > > +	if ((!dax_mapping(mapping) && mapping->nrpages) ||
> > > > > +	    (dax_mapping(mapping) && mapping->nrexceptional)) {
> > > > > +		err = filemap_fdatawrite(mapping);
> > > > > +		/* See comment of filemap_write_and_wait() */
> > > > > +		if (err != -EIO) {
> > > > > +			loff_t i_size = i_size_read(mapping->host);
> > > > > +
> > > > > +			if (i_size != 0)
> > > > > +				__filemap_fdatawait_range(mapping, 0,
> > > > > +							  i_size - 1);
> > > > > +		}
> > > > > +	}
> > > >
> > > > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the
> > > > range and ignore i_size. It is much easier than trying to wrap your head
> > > > around possible races with file operations modifying i_size.
> > > >
> > > > 								Honza
> > >
> > > I'm basically emulating _exactly_ what filemap_write_and_wait does here,
> > > as I'm leery of making subtle behavior changes in the actual writeback
> > > behavior. For example:
> > >
> > > -----------------8<----------------
> > > static inline int __filemap_fdatawrite(struct address_space *mapping,
> > > 	int sync_mode)
> > > {
> > > 	return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode);
> > > }
> > >
> > > int filemap_fdatawrite(struct address_space *mapping)
> > > {
> > > 	return __filemap_fdatawrite(mapping, WB_SYNC_ALL);
> > > }
> > > EXPORT_SYMBOL(filemap_fdatawrite);
> > > -----------------8<----------------
> > >
> > > ...which then sets up the wbc with the right ranges and sync mode and
> > > kicks off writepages. But then, it does the i_size_read to figure out
> > > what range it should wait on (with the shortcut for the size == 0 case).
> > >
> > > My assumption was that it was intentionally designed that way, but I'm
> > > guessing from your comments that it wasn't? If so, then we can turn
> > > file_write_and_wait a static inline wrapper around
> > > file_write_and_wait_range.
> >
> > FWIW, I did a bit of archaeology in the linux-history tree and found
> > this patch from Marcelo in 2004. Is this optimization still helpful? If
> > not, then that does simplify the code a bit.
> >
> > -------------------8<--------------------
> >
> > [PATCH] small wait_on_page_writeback_range() optimization
> >
> > filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end"
> > parameter. This is not needed since we know the EOF from the inode. Use
> > that instead.
> >
> > Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
> > Signed-off-by: Andrew Morton <akpm@osdl.org>
> > Signed-off-by: Linus Torvalds <torvalds@osdl.org>
> > ---
> >  mm/filemap.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 78e18b7639b6..55fb7b4141e4 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range);
> >   */
> >  int filemap_fdatawait(struct address_space *mapping)
> >  {
> > -	return wait_on_page_writeback_range(mapping, 0, -1);
> > +	loff_t i_size = i_size_read(mapping->host);
> > +
> > +	if (i_size == 0)
> > +		return 0;
> > +
> > +	return wait_on_page_writeback_range(mapping, 0,
> > +				(i_size - 1) >> PAGE_CACHE_SHIFT);
> >  }
> >  EXPORT_SYMBOL(filemap_fdatawait);
> >
>
> Does this ever get called in cases where we would not hold fs locks? In
> that case we definitely don't want to be relying on i_size,
>
> Steve.
>

Yes. We can initiate and wait on writeback from any context where you can
sleep, really. We're just waiting on whole file writeback here, so I don't
think there's anything wrong. As long as the i_size was valid at some
point in time prior to waiting then you're ok.

The question I have is more whether this optimization is still useful.
What we do now is just walk the radix tree and wait_on_page_writeback for
each page. Do we gain anything by avoiding ranges beyond the current EOF
with the pagecache infrastructure of 2017?
--
Jeff Layton <jlayton@redhat.com>
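Jeff's question is whether capping the wait at EOF saves work when the walk only visits pages that are actually tagged for writeback. A rough userspace model can frame it; the names are hypothetical, and a sorted array of page indices stands in for a tagged radix-tree lookup, ignoring per-level tree costs. In this model the cap only reduces work when tagged pages exist beyond EOF; otherwise a capped and an uncapped walk visit the same entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of a tagged page-cache lookup: only the indices of
 * pages currently tagged for writeback are stored, in ascending order.
 */
struct model_tagged {
	const uint64_t *idx;	/* sorted page indices under writeback */
	size_t nr;
};

/* Count how many tagged entries a walk over [start, end] would visit. */
static size_t model_visit_range(const struct model_tagged *t,
				uint64_t start, uint64_t end)
{
	size_t lo = 0, hi = t->nr, visited = 0;

	assert(start <= end);

	/* Seek: binary search for the first tagged index >= start. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (t->idx[mid] < start)
			lo = mid + 1;
		else
			hi = mid;
	}

	/* Walk: step through tagged entries until one lies past end. */
	for (size_t i = lo; i < t->nr && t->idx[i] <= end; i++)
		visited++;
	return visited;
}
```

With tagged pages at indices 0, 3 and 7 and EOF in page 7, a walk ending at 7 and one ending at UINT64_MAX both visit three entries, so in this model the EOF bound buys nothing unless there are tagged pages past EOF.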
* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait @ 2017-07-31 12:05 ` Steven Whitehouse 0 siblings, 0 replies; 87+ messages in thread From: Steven Whitehouse @ 2017-07-31 12:05 UTC (permalink / raw) To: cluster-devel.redhat.com Hi, On 31/07/17 12:44, Jeff Layton wrote: > On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote: >> Hi, >> >> >> On 31/07/17 12:27, Jeff Layton wrote: >>> On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote: >>>> On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote: >>>>> On Wed 26-07-17 13:55:36, Jeff Layton wrote: >>>>>> +int file_write_and_wait(struct file *file) >>>>>> +{ >>>>>> + int err = 0, err2; >>>>>> + struct address_space *mapping = file->f_mapping; >>>>>> + >>>>>> + if ((!dax_mapping(mapping) && mapping->nrpages) || >>>>>> + (dax_mapping(mapping) && mapping->nrexceptional)) { >>>>>> + err = filemap_fdatawrite(mapping); >>>>>> + /* See comment of filemap_write_and_wait() */ >>>>>> + if (err != -EIO) { >>>>>> + loff_t i_size = i_size_read(mapping->host); >>>>>> + >>>>>> + if (i_size != 0) >>>>>> + __filemap_fdatawait_range(mapping, 0, >>>>>> + i_size - 1); >>>>>> + } >>>>>> + } >>>>> Err, what's the i_size check doing here? I'd just pass ~0 as the end of the >>>>> range and ignore i_size. It is much easier than trying to wrap your head >>>>> around possible races with file operations modifying i_size. >>>>> >>>>> Honza >>>> I'm basically emulating _exactly_ what filemap_write_and_wait does here, >>>> as I'm leery of making subtle behavior changes in the actual writeback >>>> behavior.
For example: >>>> >>>> -----------------8<---------------- >>>> static inline int __filemap_fdatawrite(struct address_space *mapping, >>>> int sync_mode) >>>> { >>>> return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode); >>>> } >>>> >>>> int filemap_fdatawrite(struct address_space *mapping) >>>> { >>>> return __filemap_fdatawrite(mapping, WB_SYNC_ALL); >>>> } >>>> EXPORT_SYMBOL(filemap_fdatawrite); >>>> -----------------8<---------------- >>>> >>>> ...which then sets up the wbc with the right ranges and sync mode and >>>> kicks off writepages. But then, it does the i_size_read to figure out >>>> what range it should wait on (with the shortcut for the size == 0 case). >>>> >>>> My assumption was that it was intentionally designed that way, but I'm >>>> guessing from your comments that it wasn't? If so, then we can turn >>>> file_write_and_wait a static inline wrapper around >>>> file_write_and_wait_range. >>> FWIW, I did a bit of archaeology in the linux-history tree and found >>> this patch from Marcelo in 2004. Is this optimization still helpful? If >>> not, then that does simplify the code a bit. >>> >>> -------------------8<-------------------- >>> >>> [PATCH] small wait_on_page_writeback_range() optimization >>> >>> filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end" >>> parameter. This is not needed since we know the EOF from the inode. Use >>> that instead. 
>>> >>> Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com> >>> Signed-off-by: Andrew Morton <akpm@osdl.org> >>> Signed-off-by: Linus Torvalds <torvalds@osdl.org> >>> --- >>> mm/filemap.c | 8 +++++++- >>> 1 file changed, 7 insertions(+), 1 deletion(-) >>> >>> diff --git a/mm/filemap.c b/mm/filemap.c >>> index 78e18b7639b6..55fb7b4141e4 100644 >>> --- a/mm/filemap.c >>> +++ b/mm/filemap.c >>> @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range); >>> */ >>> int filemap_fdatawait(struct address_space *mapping) >>> { >>> - return wait_on_page_writeback_range(mapping, 0, -1); >>> + loff_t i_size = i_size_read(mapping->host); >>> + >>> + if (i_size == 0) >>> + return 0; >>> + >>> + return wait_on_page_writeback_range(mapping, 0, >>> + (i_size - 1) >> PAGE_CACHE_SHIFT); >>> } >>> EXPORT_SYMBOL(filemap_fdatawait); >>> >> Does this ever get called in cases where we would not hold fs locks? In >> that case we definitely don't want to be relying on i_size, >> >> Steve. >> > Yes. We can initiate and wait on writeback from any context where you > can sleep, really. > > We're just waiting on whole file writeback here, so I don't think > there's anything wrong. As long as the i_size was valid at some point in > time prior to waiting then you're ok. > > The question I have is more whether this optimization is still useful. > > What we do now is just walk the radix tree and wait_on_page_writeback > for each page. Do we gain anything by avoiding ranges beyond the current > EOF with the pagecache infrastructure of 2017? > If this can be called from anywhere without fs locks, then i_size is not known. That has been a problem in the past since i_size may have changed on another node. We avoid that in this case due to only changing i_size under an exclusive lock, and also only having dirty pages when we have an exclusive lock. There is another case though, if the inode is a block device, i_size will be zero. 
That is the case for the address space that looks after rgrps for GFS2. We do (luckily!) call filemap_fdatawait_range() directly in that case. For "normal" inodes though, the address space for metadata is backed by the block device inode, so that looks like it might be an issue, since fs/gfs2/glops.c:inode_go_sync() calls filemap_fdatawait() on the metamapping. It might potentially be an issue in other cases too, Steve. ^ permalink raw reply [flat|nested] 87+ messages in thread
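Steve's concern can be illustrated with a small userspace model. Suppose i_size_read() on the mapping's host returns 0 while pages in that mapping are still under writeback, which is the situation he describes for a block-device-backed metadata mapping. In that case an i_size-bounded wait returns immediately, but a wait over the full range still sees those pages. This is a sketch under stated assumptions, not kernel code. struct model_mapping, model_wait_range(), model_wait_isize() and model_wait_full() are hypothetical. A fixed array of flags stands in for the page cache's writeback tag, and "waiting" is reduced to counting the pages that would be waited on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define MODEL_NR_PAGES 8	/* assumption: tiny fixed-size page cache */

/* Hypothetical stand-in for an address_space and its host's i_size. */
struct model_mapping {
	int64_t i_size;			/* value i_size_read(host) would return */
	bool writeback[MODEL_NR_PAGES];	/* page index -> under writeback? */
};

/* Count the pages under writeback in [first, last]; models the tagged walk. */
size_t model_wait_range(const struct model_mapping *m, int64_t first,
			int64_t last)
{
	size_t waited = 0;

	for (int64_t i = first; i <= last && i < MODEL_NR_PAGES; i++)
		if (m->writeback[i])
			waited++;
	return waited;
}

/* Wait bounded by i_size, as in the 2004 filemap_fdatawait(). */
size_t model_wait_isize(const struct model_mapping *m)
{
	assert(m->i_size >= 0);
	if (m->i_size == 0)
		return 0;	/* early return: nothing is waited on */
	return model_wait_range(m, 0, (m->i_size - 1) >> MODEL_PAGE_SHIFT);
}

/* Wait over the whole file, ignoring i_size (Jan's suggestion). */
size_t model_wait_full(const struct model_mapping *m)
{
	return model_wait_range(m, 0, INT64_MAX >> MODEL_PAGE_SHIFT);
}
```

Take a model mapping with i_size set to 0 and pages 2 and 5 flagged. model_wait_isize() reports 0 pages, while model_wait_full() reports 2.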
* Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait @ 2017-07-31 12:05 ` Steven Whitehouse 0 siblings, 0 replies; 87+ messages in thread From: Steven Whitehouse @ 2017-07-31 12:05 UTC (permalink / raw) To: Jeff Layton, Jan Kara, Marcelo Tosatti Cc: Alexander Viro, J . Bruce Fields, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm, Matthew Wilcox, Bob Peterson, cluster-devel Hi, On 31/07/17 12:44, Jeff Layton wrote: > On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote: >> Hi, >> >> >> On 31/07/17 12:27, Jeff Layton wrote: >>> On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote: >>>> On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote: >>>>> On Wed 26-07-17 13:55:36, Jeff Layton wrote: >>>>>> +int file_write_and_wait(struct file *file) >>>>>> +{ >>>>>> + int err = 0, err2; >>>>>> + struct address_space *mapping = file->f_mapping; >>>>>> + >>>>>> + if ((!dax_mapping(mapping) && mapping->nrpages) || >>>>>> + (dax_mapping(mapping) && mapping->nrexceptional)) { >>>>>> + err = filemap_fdatawrite(mapping); >>>>>> + /* See comment of filemap_write_and_wait() */ >>>>>> + if (err != -EIO) { >>>>>> + loff_t i_size = i_size_read(mapping->host); >>>>>> + >>>>>> + if (i_size != 0) >>>>>> + __filemap_fdatawait_range(mapping, 0, >>>>>> + i_size - 1); >>>>>> + } >>>>>> + } >>>>> Err, what's the i_size check doing here? I'd just pass ~0 as the end of the >>>>> range and ignore i_size. It is much easier than trying to wrap your head >>>>> around possible races with file operations modifying i_size. >>>>> >>>>> Honza >>>> I'm basically emulating _exactly_ what filemap_write_and_wait does here, >>>> as I'm leery of making subtle behavior changes in the actual writeback >>>> behavior. 
For example: >>>> >>>> -----------------8<---------------- >>>> static inline int __filemap_fdatawrite(struct address_space *mapping, >>>> int sync_mode) >>>> { >>>> return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode); >>>> } >>>> >>>> int filemap_fdatawrite(struct address_space *mapping) >>>> { >>>> return __filemap_fdatawrite(mapping, WB_SYNC_ALL); >>>> } >>>> EXPORT_SYMBOL(filemap_fdatawrite); >>>> -----------------8<---------------- >>>> >>>> ...which then sets up the wbc with the right ranges and sync mode and >>>> kicks off writepages. But then, it does the i_size_read to figure out >>>> what range it should wait on (with the shortcut for the size == 0 case). >>>> >>>> My assumption was that it was intentionally designed that way, but I'm >>>> guessing from your comments that it wasn't? If so, then we can turn >>>> file_write_and_wait a static inline wrapper around >>>> file_write_and_wait_range. >>> FWIW, I did a bit of archaeology in the linux-history tree and found >>> this patch from Marcelo in 2004. Is this optimization still helpful? If >>> not, then that does simplify the code a bit. >>> >>> -------------------8<-------------------- >>> >>> [PATCH] small wait_on_page_writeback_range() optimization >>> >>> filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end" >>> parameter. This is not needed since we know the EOF from the inode. Use >>> that instead. 
>>> >>> Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com> >>> Signed-off-by: Andrew Morton <akpm@osdl.org> >>> Signed-off-by: Linus Torvalds <torvalds@osdl.org> >>> --- >>> mm/filemap.c | 8 +++++++- >>> 1 file changed, 7 insertions(+), 1 deletion(-) >>> >>> diff --git a/mm/filemap.c b/mm/filemap.c >>> index 78e18b7639b6..55fb7b4141e4 100644 >>> --- a/mm/filemap.c >>> +++ b/mm/filemap.c >>> @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range); >>> */ >>> int filemap_fdatawait(struct address_space *mapping) >>> { >>> - return wait_on_page_writeback_range(mapping, 0, -1); >>> + loff_t i_size = i_size_read(mapping->host); >>> + >>> + if (i_size == 0) >>> + return 0; >>> + >>> + return wait_on_page_writeback_range(mapping, 0, >>> + (i_size - 1) >> PAGE_CACHE_SHIFT); >>> } >>> EXPORT_SYMBOL(filemap_fdatawait); >>> >> Does this ever get called in cases where we would not hold fs locks? In >> that case we definitely don't want to be relying on i_size, >> >> Steve. >> > Yes. We can initiate and wait on writeback from any context where you > can sleep, really. > > We're just waiting on whole file writeback here, so I don't think > there's anything wrong. As long as the i_size was valid at some point in > time prior to waiting then you're ok. > > The question I have is more whether this optimization is still useful. > > What we do now is just walk the radix tree and wait_on_page_writeback > for each page. Do we gain anything by avoiding ranges beyond the current > EOF with the pagecache infrastructure of 2017? > If this can be called from anywhere without fs locks, then i_size is not known. That has been a problem in the past since i_size may have changed on another node. We avoid that in this case due to only changing i_size under an exclusive lock, and also only having dirty pages when we have an exclusive lock. There is another case though, if the inode is a block device, i_size will be zero. 
That is the case for the address space that looks after rgrps for GFS2. We do (luckily!) call filemap_fdatawait_range() directly in that case. For "normal" inodes though, the address space for metadata is backed by the block device inode, so that looks like it might be an issue, since fs/gfs2/glops.c:inode_go_sync() calls filemap_fdatawait() on the metamapping. It might potentially be an issue in other cases too, Steve. ^ permalink raw reply [flat|nested] 87+ messages in thread
* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait @ 2017-07-31 12:22 ` Jeff Layton 0 siblings, 0 replies; 87+ messages in thread From: Jeff Layton @ 2017-07-31 12:22 UTC (permalink / raw) To: cluster-devel.redhat.com On Mon, 2017-07-31 at 13:05 +0100, Steven Whitehouse wrote: > Hi, > > > On 31/07/17 12:44, Jeff Layton wrote: > > On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote: > > > Hi, > > > > > > > > > On 31/07/17 12:27, Jeff Layton wrote: > > > > On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote: > > > > > On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote: > > > > > > On Wed 26-07-17 13:55:36, Jeff Layton wrote: > > > > > > > +int file_write_and_wait(struct file *file) > > > > > > > +{ > > > > > > > + int err = 0, err2; > > > > > > > + struct address_space *mapping = file->f_mapping; > > > > > > > + > > > > > > > + if ((!dax_mapping(mapping) && mapping->nrpages) || > > > > > > > + (dax_mapping(mapping) && mapping->nrexceptional)) { > > > > > > > + err = filemap_fdatawrite(mapping); > > > > > > > + /* See comment of filemap_write_and_wait() */ > > > > > > > + if (err != -EIO) { > > > > > > > + loff_t i_size = i_size_read(mapping->host); > > > > > > > + > > > > > > > + if (i_size != 0) > > > > > > > + __filemap_fdatawait_range(mapping, 0, > > > > > > > + i_size - 1); > > > > > > > + } > > > > > > > + } > > > > > > > > > > > > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the > > > > > > range and ignore i_size. It is much easier than trying to wrap your head > > > > > > around possible races with file operations modifying i_size. > > > > > > > > > > > > Honza > > > > > > > > > > I'm basically emulating _exactly_ what filemap_write_and_wait does here, > > > > > as I'm leery of making subtle behavior changes in the actual writeback > > > > > behavior.
For example: > > > > > > > > > > -----------------8<---------------- > > > > > static inline int __filemap_fdatawrite(struct address_space *mapping, > > > > > int sync_mode) > > > > > { > > > > > return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode); > > > > > } > > > > > > > > > > int filemap_fdatawrite(struct address_space *mapping) > > > > > { > > > > > return __filemap_fdatawrite(mapping, WB_SYNC_ALL); > > > > > } > > > > > EXPORT_SYMBOL(filemap_fdatawrite); > > > > > -----------------8<---------------- > > > > > > > > > > ...which then sets up the wbc with the right ranges and sync mode and > > > > > kicks off writepages. But then, it does the i_size_read to figure out > > > > > what range it should wait on (with the shortcut for the size == 0 case). > > > > > > > > > > My assumption was that it was intentionally designed that way, but I'm > > > > > guessing from your comments that it wasn't? If so, then we can turn > > > > > file_write_and_wait a static inline wrapper around > > > > > file_write_and_wait_range. > > > > > > > > FWIW, I did a bit of archaeology in the linux-history tree and found > > > > this patch from Marcelo in 2004. Is this optimization still helpful? If > > > > not, then that does simplify the code a bit. > > > > > > > > -------------------8<-------------------- > > > > > > > > [PATCH] small wait_on_page_writeback_range() optimization > > > > > > > > filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end" > > > > parameter. This is not needed since we know the EOF from the inode. Use > > > > that instead. 
> > > > > > > > Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com> > > > > Signed-off-by: Andrew Morton <akpm@osdl.org> > > > > Signed-off-by: Linus Torvalds <torvalds@osdl.org> > > > > --- > > > > mm/filemap.c | 8 +++++++- > > > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > > > index 78e18b7639b6..55fb7b4141e4 100644 > > > > --- a/mm/filemap.c > > > > +++ b/mm/filemap.c > > > > @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range); > > > > */ > > > > int filemap_fdatawait(struct address_space *mapping) > > > > { > > > > - return wait_on_page_writeback_range(mapping, 0, -1); > > > > + loff_t i_size = i_size_read(mapping->host); > > > > + > > > > + if (i_size == 0) > > > > + return 0; > > > > + > > > > + return wait_on_page_writeback_range(mapping, 0, > > > > + (i_size - 1) >> PAGE_CACHE_SHIFT); > > > > } > > > > EXPORT_SYMBOL(filemap_fdatawait); > > > > > > > > > > Does this ever get called in cases where we would not hold fs locks? In > > > that case we definitely don't want to be relying on i_size, > > > > > > Steve. > > > > > > > Yes. We can initiate and wait on writeback from any context where you > > can sleep, really. > > > > We're just waiting on whole file writeback here, so I don't think > > there's anything wrong. As long as the i_size was valid at some point in > > time prior to waiting then you're ok. > > > > The question I have is more whether this optimization is still useful. > > > > What we do now is just walk the radix tree and wait_on_page_writeback > > for each page. Do we gain anything by avoiding ranges beyond the current > > EOF with the pagecache infrastructure of 2017? > > > > If this can be called from anywhere without fs locks, then i_size is not > known. That has been a problem in the past since i_size may have changed > on another node. 
We avoid that in this case due to only changing i_size > under an exclusive lock, and also only having dirty pages when we have > an exclusive lock. There is another case though, if the inode is a block > device, i_size will be zero. That is the case for the address space that > looks after rgrps for GFS2. We do (luckily!) call > filemap_fdatawait_range() directly in that case. For "normal" inodes > though, the address space for metadata is backed by the block device > inode, so that looks like it might be an issue, since > fs/gfs2/glops.c:inode_go_sync() calls filemap_fdatawait() on the > metamapping. It might potentially be an issue in other cases too, > > Steve. > Some of those do sound problematic. Again though, we're only waiting on writeback here, and I assume with gfs2 that would only be pages that were written on the local node. Is it possible to have pages under writeback and still in the tree, but that are beyond the current i_size? It seems like that's the main worrisome case. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 87+ messages in thread
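Jeff's question can be restated as a predicate. Given an i_size snapshot, which page indices start at or beyond EOF and would therefore be skipped by an i_size-bounded wait? Below is a minimal sketch, assuming 4 KiB pages. page_beyond_eof() and count_skipped_writeback() are hypothetical helpers, and the array of indices stands in for pages that are still tagged for writeback. Whether such pages can actually exist in gfs2 is exactly the open question here. The code only shows what would be missed if, for example, EOF moved below a page after that page's writeback had started.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * True if page @index starts at or past EOF for a file of @i_size bytes,
 * i.e. it lies outside the range an i_size-bounded wait would cover.
 */
bool page_beyond_eof(uint64_t index, int64_t i_size)
{
	assert(i_size >= 0);
	if (i_size == 0)
		return true;	/* the i_size-bounded wait covers no pages */
	return index > (uint64_t)((i_size - 1) >> MODEL_PAGE_SHIFT);
}

/* Count writeback pages (given by index) that an i_size-bounded wait skips. */
size_t count_skipped_writeback(const uint64_t *wb_index, size_t n,
			       int64_t i_size)
{
	size_t skipped = 0;

	for (size_t k = 0; k < n; k++)
		if (page_beyond_eof(wb_index[k], i_size))
			skipped++;
	return skipped;
}
```

Consider a 4096-byte file with pages 0 and 3 under writeback: one page is skipped. A full-range wait, by construction, skips none.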
* Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait @ 2017-07-31 12:22 ` Jeff Layton 0 siblings, 0 replies; 87+ messages in thread From: Jeff Layton @ 2017-07-31 12:22 UTC (permalink / raw) To: Steven Whitehouse, Jan Kara, Marcelo Tosatti Cc: Alexander Viro, J . Bruce Fields, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm, Matthew Wilcox, Bob Peterson, cluster-devel On Mon, 2017-07-31 at 13:05 +0100, Steven Whitehouse wrote: > Hi, > > > On 31/07/17 12:44, Jeff Layton wrote: > > On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote: > > > Hi, > > > > > > > > > On 31/07/17 12:27, Jeff Layton wrote: > > > > On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote: > > > > > On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote: > > > > > > On Wed 26-07-17 13:55:36, Jeff Layton wrote: > > > > > > > +int file_write_and_wait(struct file *file) > > > > > > > +{ > > > > > > > + int err = 0, err2; > > > > > > > + struct address_space *mapping = file->f_mapping; > > > > > > > + > > > > > > > + if ((!dax_mapping(mapping) && mapping->nrpages) || > > > > > > > + (dax_mapping(mapping) && mapping->nrexceptional)) { > > > > > > > + err = filemap_fdatawrite(mapping); > > > > > > > + /* See comment of filemap_write_and_wait() */ > > > > > > > + if (err != -EIO) { > > > > > > > + loff_t i_size = i_size_read(mapping->host); > > > > > > > + > > > > > > > + if (i_size != 0) > > > > > > > + __filemap_fdatawait_range(mapping, 0, > > > > > > > + i_size - 1); > > > > > > > + } > > > > > > > + } > > > > > > > > > > > > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the > > > > > > range and ignore i_size. It is much easier than trying to wrap your head > > > > > > around possible races with file operations modifying i_size. 
> > > > > > > > > > > > Honza > > > > > > > > > > I'm basically emulating _exactly_ what filemap_write_and_wait does here, > > > > > as I'm leery of making subtle behavior changes in the actual writeback > > > > > behavior. For example: > > > > > > > > > > -----------------8<---------------- > > > > > static inline int __filemap_fdatawrite(struct address_space *mapping, > > > > > int sync_mode) > > > > > { > > > > > return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode); > > > > > } > > > > > > > > > > int filemap_fdatawrite(struct address_space *mapping) > > > > > { > > > > > return __filemap_fdatawrite(mapping, WB_SYNC_ALL); > > > > > } > > > > > EXPORT_SYMBOL(filemap_fdatawrite); > > > > > -----------------8<---------------- > > > > > > > > > > ...which then sets up the wbc with the right ranges and sync mode and > > > > > kicks off writepages. But then, it does the i_size_read to figure out > > > > > what range it should wait on (with the shortcut for the size == 0 case). > > > > > > > > > > My assumption was that it was intentionally designed that way, but I'm > > > > > guessing from your comments that it wasn't? If so, then we can turn > > > > > file_write_and_wait a static inline wrapper around > > > > > file_write_and_wait_range. > > > > > > > > FWIW, I did a bit of archaeology in the linux-history tree and found > > > > this patch from Marcelo in 2004. Is this optimization still helpful? If > > > > not, then that does simplify the code a bit. > > > > > > > > -------------------8<-------------------- > > > > > > > > [PATCH] small wait_on_page_writeback_range() optimization > > > > > > > > filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end" > > > > parameter. This is not needed since we know the EOF from the inode. Use > > > > that instead. 
> > > > > > > > Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com> > > > > Signed-off-by: Andrew Morton <akpm@osdl.org> > > > > Signed-off-by: Linus Torvalds <torvalds@osdl.org> > > > > --- > > > > mm/filemap.c | 8 +++++++- > > > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > > > index 78e18b7639b6..55fb7b4141e4 100644 > > > > --- a/mm/filemap.c > > > > +++ b/mm/filemap.c > > > > @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range); > > > > */ > > > > int filemap_fdatawait(struct address_space *mapping) > > > > { > > > > - return wait_on_page_writeback_range(mapping, 0, -1); > > > > + loff_t i_size = i_size_read(mapping->host); > > > > + > > > > + if (i_size == 0) > > > > + return 0; > > > > + > > > > + return wait_on_page_writeback_range(mapping, 0, > > > > + (i_size - 1) >> PAGE_CACHE_SHIFT); > > > > } > > > > EXPORT_SYMBOL(filemap_fdatawait); > > > > > > > > > > Does this ever get called in cases where we would not hold fs locks? In > > > that case we definitely don't want to be relying on i_size, > > > > > > Steve. > > > > > > > Yes. We can initiate and wait on writeback from any context where you > > can sleep, really. > > > > We're just waiting on whole file writeback here, so I don't think > > there's anything wrong. As long as the i_size was valid at some point in > > time prior to waiting then you're ok. > > > > The question I have is more whether this optimization is still useful. > > > > What we do now is just walk the radix tree and wait_on_page_writeback > > for each page. Do we gain anything by avoiding ranges beyond the current > > EOF with the pagecache infrastructure of 2017? > > > > If this can be called from anywhere without fs locks, then i_size is not > known. That has been a problem in the past since i_size may have changed > on another node. 
We avoid that in this case due to only changing i_size > under an exclusive lock, and also only having dirty pages when we have > an exclusive lock. There is another case though, if the inode is a block > device, i_size will be zero. That is the case for the address space that > looks after rgrps for GFS2. We do (luckily!) call > filemap_fdatawait_range() directly in that case. For "normal" inodes > though, the address space for metadata is backed by the block device > inode, so that looks like it might be an issue, since > fs/gfs2/glops.c:inode_go_sync() calls filemap_fdatawait() on the > metamapping. It might potentially be an issue in other cases too, > > Steve. > Some of those do sound problematic. Again though, we're only waiting on writeback here, and I assume with gfs2 that would only be pages that were written on the local node. Is it possible to have pages under writeback and still in the tree, but that are beyond the current i_size? It seems like that's the main worrisome case. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait @ 2017-07-31 12:22 ` Jeff Layton 0 siblings, 0 replies; 87+ messages in thread From: Jeff Layton @ 2017-07-31 12:22 UTC (permalink / raw) To: Steven Whitehouse, Jan Kara, Marcelo Tosatti Cc: Alexander Viro, J . Bruce Fields, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm, Matthew Wilcox, Bob Peterson, cluster-devel On Mon, 2017-07-31 at 13:05 +0100, Steven Whitehouse wrote: > Hi, > > > On 31/07/17 12:44, Jeff Layton wrote: > > On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote: > > > Hi, > > > > > > > > > On 31/07/17 12:27, Jeff Layton wrote: > > > > On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote: > > > > > On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote: > > > > > > On Wed 26-07-17 13:55:36, Jeff Layton wrote: > > > > > > > +int file_write_and_wait(struct file *file) > > > > > > > +{ > > > > > > > + int err = 0, err2; > > > > > > > + struct address_space *mapping = file->f_mapping; > > > > > > > + > > > > > > > + if ((!dax_mapping(mapping) && mapping->nrpages) || > > > > > > > + (dax_mapping(mapping) && mapping->nrexceptional)) { > > > > > > > + err = filemap_fdatawrite(mapping); > > > > > > > + /* See comment of filemap_write_and_wait() */ > > > > > > > + if (err != -EIO) { > > > > > > > + loff_t i_size = i_size_read(mapping->host); > > > > > > > + > > > > > > > + if (i_size != 0) > > > > > > > + __filemap_fdatawait_range(mapping, 0, > > > > > > > + i_size - 1); > > > > > > > + } > > > > > > > + } > > > > > > > > > > > > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the > > > > > > range and ignore i_size. It is much easier than trying to wrap your head > > > > > > around possible races with file operations modifying i_size. 
> > > > > > > > > > > > Honza > > > > > > > > > > I'm basically emulating _exactly_ what filemap_write_and_wait does here, > > > > > as I'm leery of making subtle behavior changes in the actual writeback > > > > > behavior. For example: > > > > > > > > > > -----------------8<---------------- > > > > > static inline int __filemap_fdatawrite(struct address_space *mapping, > > > > > int sync_mode) > > > > > { > > > > > return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode); > > > > > } > > > > > > > > > > int filemap_fdatawrite(struct address_space *mapping) > > > > > { > > > > > return __filemap_fdatawrite(mapping, WB_SYNC_ALL); > > > > > } > > > > > EXPORT_SYMBOL(filemap_fdatawrite); > > > > > -----------------8<---------------- > > > > > > > > > > ...which then sets up the wbc with the right ranges and sync mode and > > > > > kicks off writepages. But then, it does the i_size_read to figure out > > > > > what range it should wait on (with the shortcut for the size == 0 case). > > > > > > > > > > My assumption was that it was intentionally designed that way, but I'm > > > > > guessing from your comments that it wasn't? If so, then we can turn > > > > > file_write_and_wait a static inline wrapper around > > > > > file_write_and_wait_range. > > > > > > > > FWIW, I did a bit of archaeology in the linux-history tree and found > > > > this patch from Marcelo in 2004. Is this optimization still helpful? If > > > > not, then that does simplify the code a bit. > > > > > > > > -------------------8<-------------------- > > > > > > > > [PATCH] small wait_on_page_writeback_range() optimization > > > > > > > > filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end" > > > > parameter. This is not needed since we know the EOF from the inode. Use > > > > that instead. 
> > > > > > > > Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com> > > > > Signed-off-by: Andrew Morton <akpm@osdl.org> > > > > Signed-off-by: Linus Torvalds <torvalds@osdl.org> > > > > --- > > > > mm/filemap.c | 8 +++++++- > > > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > > > index 78e18b7639b6..55fb7b4141e4 100644 > > > > --- a/mm/filemap.c > > > > +++ b/mm/filemap.c > > > > @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range); > > > > */ > > > > int filemap_fdatawait(struct address_space *mapping) > > > > { > > > > - return wait_on_page_writeback_range(mapping, 0, -1); > > > > + loff_t i_size = i_size_read(mapping->host); > > > > + > > > > + if (i_size == 0) > > > > + return 0; > > > > + > > > > + return wait_on_page_writeback_range(mapping, 0, > > > > + (i_size - 1) >> PAGE_CACHE_SHIFT); > > > > } > > > > EXPORT_SYMBOL(filemap_fdatawait); > > > > > > > > > > Does this ever get called in cases where we would not hold fs locks? In > > > that case we definitely don't want to be relying on i_size, > > > > > > Steve. > > > > > > > Yes. We can initiate and wait on writeback from any context where you > > can sleep, really. > > > > We're just waiting on whole file writeback here, so I don't think > > there's anything wrong. As long as the i_size was valid at some point in > > time prior to waiting then you're ok. > > > > The question I have is more whether this optimization is still useful. > > > > What we do now is just walk the radix tree and wait_on_page_writeback > > for each page. Do we gain anything by avoiding ranges beyond the current > > EOF with the pagecache infrastructure of 2017? > > > > If this can be called from anywhere without fs locks, then i_size is not > known. That has been a problem in the past since i_size may have changed > on another node. 
We avoid that in this case due to only changing i_size > under an exclusive lock, and also only having dirty pages when we have > an exclusive lock. There is another case though, if the inode is a block > device, i_size will be zero. That is the case for the address space that > looks after rgrps for GFS2. We do (luckily!) call > filemap_fdatawait_range() directly in that case. For "normal" inodes > though, the address space for metadata is backed by the block device > inode, so that looks like it might be an issue, since > fs/gfs2/glops.c:inode_go_sync() calls filemap_fdatawait() on the > metamapping. It might potentially be an issue in other cases too, > > Steve. > Some of those do sound problematic. Again though, we're only waiting on writeback here, and I assume with gfs2 that would only be pages that were written on the local node. Is it possible to have pages under writeback and in still in the tree, but that are beyond the current i_size? It seems like that's the main worrisome case. -- Jeff Layton <jlayton@redhat.com> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 87+ messages in thread
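For illustration only, the arithmetic behind the two choices discussed here can be modelled in userspace C. This is a minimal sketch, not kernel code. MODEL_PAGE_SHIFT is a hypothetical stand-in for PAGE_SHIFT that assumes 4 KiB pages, and the helper names are invented. The sketch contrasts two ways of picking the end of the wait range. The first is bounded by i_size and keeps the i_size == 0 shortcut from the 2004 patch. The second ignores i_size and waits up to LLONG_MAX, as Jan suggests. The 2004 code computed a page index, (i_size - 1) >> PAGE_CACHE_SHIFT. The proposed file_write_and_wait instead passes a byte offset, i_size - 1, to __filemap_fdatawait_range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PAGE_SHIFT, assuming 4 KiB pages. */
#define MODEL_PAGE_SHIFT 12

/* Byte offset -> page-cache index, mirroring "byte >> PAGE_SHIFT". */
static uint64_t model_byte_to_index(int64_t byte)
{
	assert(byte >= 0);	/* loff_t offsets here are never negative */
	return (uint64_t)byte >> MODEL_PAGE_SHIFT;
}

/*
 * i_size-bounded variant (the 2004 optimization): returns false when
 * there is nothing to wait on (i_size == 0); otherwise stores the last
 * page index to wait on in *last.
 */
static bool model_isize_wait_end(int64_t i_size, uint64_t *last)
{
	assert(i_size >= 0);
	if (i_size == 0)
		return false;
	*last = model_byte_to_index(i_size - 1);
	return true;
}

/* Whole-file variant (Jan's suggestion): ignore i_size, wait up to LLONG_MAX. */
static uint64_t model_full_wait_end(void)
{
	return model_byte_to_index(INT64_MAX);
}
```

In this model, a mapping whose host inode reports i_size == 0 takes the shortcut and waits on nothing at all. The block-device case raised in this thread is one example. The whole-range variant, by contrast, always covers every index.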
* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait 2017-07-31 12:22 ` Jeff Layton (?) @ 2017-07-31 12:25 ` Steven Whitehouse -1 siblings, 0 replies; 87+ messages in thread From: Steven Whitehouse @ 2017-07-31 12:25 UTC (permalink / raw) To: cluster-devel.redhat.com Hi, On 31/07/17 13:22, Jeff Layton wrote: > On Mon, 2017-07-31 at 13:05 +0100, Steven Whitehouse wrote: >> Hi, >> >> >> On 31/07/17 12:44, Jeff Layton wrote: >>> On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote: >>>> Hi, >>>> >>>> >>>> On 31/07/17 12:27, Jeff Layton wrote: >>>>> On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote: >>>>>> On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote: >>>>>>> On Wed 26-07-17 13:55:36, Jeff Layton wrote: >>>>>>>> +int file_write_and_wait(struct file *file) >>>>>>>> +{ >>>>>>>> + int err = 0, err2; >>>>>>>> + struct address_space *mapping = file->f_mapping; >>>>>>>> + >>>>>>>> + if ((!dax_mapping(mapping) && mapping->nrpages) || >>>>>>>> + (dax_mapping(mapping) && mapping->nrexceptional)) { >>>>>>>> + err = filemap_fdatawrite(mapping); >>>>>>>> + /* See comment of filemap_write_and_wait() */ >>>>>>>> + if (err != -EIO) { >>>>>>>> + loff_t i_size = i_size_read(mapping->host); >>>>>>>> + >>>>>>>> + if (i_size != 0) >>>>>>>> + __filemap_fdatawait_range(mapping, 0, >>>>>>>> + i_size - 1); >>>>>>>> + } >>>>>>>> + } >>>>>>> Err, what's the i_size check doing here? I'd just pass ~0 as the end of the >>>>>>> range and ignore i_size. It is much easier than trying to wrap your head >>>>>>> around possible races with file operations modifying i_size. >>>>>>> >>>>>>> Honza >>>>>> I'm basically emulating _exactly_ what filemap_write_and_wait does here, >>>>>> as I'm leery of making subtle behavior changes in the actual writeback >>>>>> behavior. 
For example: >>>>>> >>>>>> -----------------8<---------------- >>>>>> static inline int __filemap_fdatawrite(struct address_space *mapping, >>>>>> int sync_mode) >>>>>> { >>>>>> return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode); >>>>>> } >>>>>> >>>>>> int filemap_fdatawrite(struct address_space *mapping) >>>>>> { >>>>>> return __filemap_fdatawrite(mapping, WB_SYNC_ALL); >>>>>> } >>>>>> EXPORT_SYMBOL(filemap_fdatawrite); >>>>>> -----------------8<---------------- >>>>>> >>>>>> ...which then sets up the wbc with the right ranges and sync mode and >>>>>> kicks off writepages. But then, it does the i_size_read to figure out >>>>>> what range it should wait on (with the shortcut for the size == 0 case). >>>>>> >>>>>> My assumption was that it was intentionally designed that way, but I'm >>>>>> guessing from your comments that it wasn't? If so, then we can turn >>>>>> file_write_and_wait a static inline wrapper around >>>>>> file_write_and_wait_range. >>>>> FWIW, I did a bit of archaeology in the linux-history tree and found >>>>> this patch from Marcelo in 2004. Is this optimization still helpful? If >>>>> not, then that does simplify the code a bit. >>>>> >>>>> -------------------8<-------------------- >>>>> >>>>> [PATCH] small wait_on_page_writeback_range() optimization >>>>> >>>>> filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end" >>>>> parameter. This is not needed since we know the EOF from the inode. Use >>>>> that instead. 
>>>>> >>>>> Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com> >>>>> Signed-off-by: Andrew Morton <akpm@osdl.org> >>>>> Signed-off-by: Linus Torvalds <torvalds@osdl.org> >>>>> --- >>>>> mm/filemap.c | 8 +++++++- >>>>> 1 file changed, 7 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/mm/filemap.c b/mm/filemap.c >>>>> index 78e18b7639b6..55fb7b4141e4 100644 >>>>> --- a/mm/filemap.c >>>>> +++ b/mm/filemap.c >>>>> @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range); >>>>> */ >>>>> int filemap_fdatawait(struct address_space *mapping) >>>>> { >>>>> - return wait_on_page_writeback_range(mapping, 0, -1); >>>>> + loff_t i_size = i_size_read(mapping->host); >>>>> + >>>>> + if (i_size == 0) >>>>> + return 0; >>>>> + >>>>> + return wait_on_page_writeback_range(mapping, 0, >>>>> + (i_size - 1) >> PAGE_CACHE_SHIFT); >>>>> } >>>>> EXPORT_SYMBOL(filemap_fdatawait); >>>>> >>>> Does this ever get called in cases where we would not hold fs locks? In >>>> that case we definitely don't want to be relying on i_size, >>>> >>>> Steve. >>>> >>> Yes. We can initiate and wait on writeback from any context where you >>> can sleep, really. >>> >>> We're just waiting on whole file writeback here, so I don't think >>> there's anything wrong. As long as the i_size was valid at some point in >>> time prior to waiting then you're ok. >>> >>> The question I have is more whether this optimization is still useful. >>> >>> What we do now is just walk the radix tree and wait_on_page_writeback >>> for each page. Do we gain anything by avoiding ranges beyond the current >>> EOF with the pagecache infrastructure of 2017? >>> >> If this can be called from anywhere without fs locks, then i_size is not >> known. That has been a problem in the past since i_size may have changed >> on another node. We avoid that in this case due to only changing i_size >> under an exclusive lock, and also only having dirty pages when we have >> an exclusive lock. 
There is another case though, if the inode is a block >> device, i_size will be zero. That is the case for the address space that >> looks after rgrps for GFS2. We do (luckily!) call >> filemap_fdatawait_range() directly in that case. For "normal" inodes >> though, the address space for metadata is backed by the block device >> inode, so that looks like it might be an issue, since >> fs/gfs2/glops.c:inode_go_sync() calls filemap_fdatawait() on the >> metamapping. It might potentially be an issue in other cases too, >> >> Steve. >> > Some of those do sound problematic. > > Again though, we're only waiting on writeback here, and I assume with > gfs2 that would only be pages that were written on the local node. Yes > > Is it possible to have pages under writeback and in still in the tree, > but that are beyond the current i_size? It seems like that's the main > worrisome case. > That's what I was wondering too. I'm not 100% sure without some more detailed investigation. Either way, the block device case also seems problematic, although not impossible to special-case, I suppose. The real question is: what do we get from this optimisation? Is the pain of checking correctness worth the benefits gained? Steve. ^ permalink raw reply [flat|nested] 87+ messages in thread
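This sketch uses a hypothetical userspace model to count how many pages still flagged as under writeback each strategy would actually wait on. struct model_page and the model_* helpers are illustrative assumptions, not kernel APIs. A flat array stands in for the radix tree. The example covers three cases:

- With a one-page i_size, a page at index 5 that is still under writeback is skipped by the bounded wait.
- The whole-range wait catches that page.
- With i_size == 0, the bounded wait covers nothing at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* hypothetical: 4 KiB pages */

/* Illustrative stand-in for a cached page: its index and PG_writeback state. */
struct model_page {
	uint64_t index;
	bool writeback;
};

/* Count pages under writeback whose index falls in [first, last]. */
static size_t model_wait_range(const struct model_page *pages, size_t n,
			       uint64_t first, uint64_t last)
{
	size_t waited = 0;

	for (size_t i = 0; i < n; i++)
		if (pages[i].writeback &&
		    pages[i].index >= first && pages[i].index <= last)
			waited++;
	return waited;
}

/* 2004-style wait: bounded by i_size, nothing at all when i_size == 0. */
static size_t model_wait_isize(const struct model_page *pages, size_t n,
			       int64_t i_size)
{
	assert(i_size >= 0);
	if (i_size == 0)
		return 0;
	return model_wait_range(pages, n, 0,
				(uint64_t)(i_size - 1) >> MODEL_PAGE_SHIFT);
}

/* Whole-range wait: ignores i_size and stops at LLONG_MAX >> PAGE_SHIFT. */
static size_t model_wait_full(const struct model_page *pages, size_t n)
{
	return model_wait_range(pages, n, 0,
				(uint64_t)INT64_MAX >> MODEL_PAGE_SHIFT);
}
```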
* Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait @ 2017-07-31 12:25 ` Steven Whitehouse 0 siblings, 0 replies; 87+ messages in thread From: Steven Whitehouse @ 2017-07-31 12:25 UTC (permalink / raw) To: Jeff Layton, Jan Kara, Marcelo Tosatti Cc: Alexander Viro, J . Bruce Fields, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm, Matthew Wilcox, Bob Peterson, cluster-devel Hi, On 31/07/17 13:22, Jeff Layton wrote: > On Mon, 2017-07-31 at 13:05 +0100, Steven Whitehouse wrote: >> Hi, >> >> >> On 31/07/17 12:44, Jeff Layton wrote: >>> On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote: >>>> Hi, >>>> >>>> >>>> On 31/07/17 12:27, Jeff Layton wrote: >>>>> On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote: >>>>>> On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote: >>>>>>> On Wed 26-07-17 13:55:36, Jeff Layton wrote: >>>>>>>> +int file_write_and_wait(struct file *file) >>>>>>>> +{ >>>>>>>> + int err = 0, err2; >>>>>>>> + struct address_space *mapping = file->f_mapping; >>>>>>>> + >>>>>>>> + if ((!dax_mapping(mapping) && mapping->nrpages) || >>>>>>>> + (dax_mapping(mapping) && mapping->nrexceptional)) { >>>>>>>> + err = filemap_fdatawrite(mapping); >>>>>>>> + /* See comment of filemap_write_and_wait() */ >>>>>>>> + if (err != -EIO) { >>>>>>>> + loff_t i_size = i_size_read(mapping->host); >>>>>>>> + >>>>>>>> + if (i_size != 0) >>>>>>>> + __filemap_fdatawait_range(mapping, 0, >>>>>>>> + i_size - 1); >>>>>>>> + } >>>>>>>> + } >>>>>>> Err, what's the i_size check doing here? I'd just pass ~0 as the end of the >>>>>>> range and ignore i_size. It is much easier than trying to wrap your head >>>>>>> around possible races with file operations modifying i_size. >>>>>>> >>>>>>> Honza >>>>>> I'm basically emulating _exactly_ what filemap_write_and_wait does here, >>>>>> as I'm leery of making subtle behavior changes in the actual writeback >>>>>> behavior. 
For example: >>>>>> >>>>>> -----------------8<---------------- >>>>>> static inline int __filemap_fdatawrite(struct address_space *mapping, >>>>>> int sync_mode) >>>>>> { >>>>>> return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode); >>>>>> } >>>>>> >>>>>> int filemap_fdatawrite(struct address_space *mapping) >>>>>> { >>>>>> return __filemap_fdatawrite(mapping, WB_SYNC_ALL); >>>>>> } >>>>>> EXPORT_SYMBOL(filemap_fdatawrite); >>>>>> -----------------8<---------------- >>>>>> >>>>>> ...which then sets up the wbc with the right ranges and sync mode and >>>>>> kicks off writepages. But then, it does the i_size_read to figure out >>>>>> what range it should wait on (with the shortcut for the size == 0 case). >>>>>> >>>>>> My assumption was that it was intentionally designed that way, but I'm >>>>>> guessing from your comments that it wasn't? If so, then we can turn >>>>>> file_write_and_wait a static inline wrapper around >>>>>> file_write_and_wait_range. >>>>> FWIW, I did a bit of archaeology in the linux-history tree and found >>>>> this patch from Marcelo in 2004. Is this optimization still helpful? If >>>>> not, then that does simplify the code a bit. >>>>> >>>>> -------------------8<-------------------- >>>>> >>>>> [PATCH] small wait_on_page_writeback_range() optimization >>>>> >>>>> filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end" >>>>> parameter. This is not needed since we know the EOF from the inode. Use >>>>> that instead. 
>>>>> >>>>> Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com> >>>>> Signed-off-by: Andrew Morton <akpm@osdl.org> >>>>> Signed-off-by: Linus Torvalds <torvalds@osdl.org> >>>>> --- >>>>> mm/filemap.c | 8 +++++++- >>>>> 1 file changed, 7 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/mm/filemap.c b/mm/filemap.c >>>>> index 78e18b7639b6..55fb7b4141e4 100644 >>>>> --- a/mm/filemap.c >>>>> +++ b/mm/filemap.c >>>>> @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range); >>>>> */ >>>>> int filemap_fdatawait(struct address_space *mapping) >>>>> { >>>>> - return wait_on_page_writeback_range(mapping, 0, -1); >>>>> + loff_t i_size = i_size_read(mapping->host); >>>>> + >>>>> + if (i_size == 0) >>>>> + return 0; >>>>> + >>>>> + return wait_on_page_writeback_range(mapping, 0, >>>>> + (i_size - 1) >> PAGE_CACHE_SHIFT); >>>>> } >>>>> EXPORT_SYMBOL(filemap_fdatawait); >>>>> >>>> Does this ever get called in cases where we would not hold fs locks? In >>>> that case we definitely don't want to be relying on i_size, >>>> >>>> Steve. >>>> >>> Yes. We can initiate and wait on writeback from any context where you >>> can sleep, really. >>> >>> We're just waiting on whole file writeback here, so I don't think >>> there's anything wrong. As long as the i_size was valid at some point in >>> time prior to waiting then you're ok. >>> >>> The question I have is more whether this optimization is still useful. >>> >>> What we do now is just walk the radix tree and wait_on_page_writeback >>> for each page. Do we gain anything by avoiding ranges beyond the current >>> EOF with the pagecache infrastructure of 2017? >>> >> If this can be called from anywhere without fs locks, then i_size is not >> known. That has been a problem in the past since i_size may have changed >> on another node. We avoid that in this case due to only changing i_size >> under an exclusive lock, and also only having dirty pages when we have >> an exclusive lock. 
There is another case though, if the inode is a block >> device, i_size will be zero. That is the case for the address space that >> looks after rgrps for GFS2. We do (luckily!) call >> filemap_fdatawait_range() directly in that case. For "normal" inodes >> though, the address space for metadata is backed by the block device >> inode, so that looks like it might be an issue, since >> fs/gfs2/glops.c:inode_go_sync() calls filemap_fdatawait() on the >> metamapping. It might potentially be an issue in other cases too, >> >> Steve. >> > Some of those do sound problematic. > > Again though, we're only waiting on writeback here, and I assume with > gfs2 that would only be pages that were written on the local node. Yes > > Is it possible to have pages under writeback and in still in the tree, > but that are beyond the current i_size? It seems like that's the main > worrisome case. > That's what I was wondering too. I'm not 100% sure without some more detailed investigation. Either way, the block device case also seems problematic, although not impossible to special-case, I suppose. The real question is: what do we get from this optimisation? Is the pain of checking correctness worth the benefits gained? Steve. ^ permalink raw reply [flat|nested] 87+ messages in thread
* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait 2017-07-31 12:22 ` Jeff Layton (?) @ 2017-07-31 12:38 ` Bob Peterson -1 siblings, 0 replies; 87+ messages in thread From: Bob Peterson @ 2017-07-31 12:38 UTC (permalink / raw) To: cluster-devel.redhat.com ----- Original Message ----- | > If this can be called from anywhere without fs locks, then i_size is not | > known. That has been a problem in the past since i_size may have changed | > on another node. We avoid that in this case due to only changing i_size | > under an exclusive lock, and also only having dirty pages when we have | > an exclusive lock. There is another case though, if the inode is a block | > device, i_size will be zero. That is the case for the address space that | > looks after rgrps for GFS2. We do (luckily!) call | > filemap_fdatawait_range() directly in that case. For "normal" inodes | > though, the address space for metadata is backed by the block device | > inode, so that looks like it might be an issue, since | > fs/gfs2/glops.c:inode_go_sync() calls filemap_fdatawait() on the | > metamapping. It might potentially be an issue in other cases too, | > | > Steve. | > | | Some of those do sound problematic. | | Again though, we're only waiting on writeback here, and I assume with | gfs2 that would only be pages that were written on the local node. | | Is it possible to have pages under writeback and in still in the tree, | but that are beyond the current i_size? It seems like that's the main | worrisome case. | | -- | Jeff Layton <jlayton@redhat.com> Hi Jeff, I believe the answer is yes. I was recently "bitten" by a case where (whether due to a bug or not) I had blocks allocated in a GFS2 file beyond i_size. I had implemented a delete algorithm that used i_size, but I found cases where files couldn't be deleted because of blocks hanging out past EOF. I'm not sure if they can be in writeback, but possibly. 
It's already on my "to investigate" list, but I haven't gotten to it yet. Yes, it seems like a bug. Yes, we need to fix it. But now there may be lots of legacy file systems out in the field that have this problem. Not sure if they can get to writeback until I study the situation more closely. I believe Ben Marzinski also may have come across a case in which we can have blocks in writeback that are beyond i_size. See the commit message on Ben's patch here: https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/commit/fs/gfs2?h=for-next&id=fd4c5748b8d3f7420e8932ed0bde3d53cc8acc9d Regards, Bob Peterson Red Hat File Systems ^ permalink raw reply [flat|nested] 87+ messages in thread
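A scan bounded by i_size misses on-disk allocations past EOF in the same way that it misses cached pages past EOF. The hypothetical helper below illustrates the condition described here. It counts allocated logical blocks that start at or beyond EOF, which are the blocks a deallocation pass limited to i_size would never visit. The helper works on a plain array of logical block numbers and takes the block size as a parameter. It is not GFS2 code and does not reflect how GFS2 stores its metadata.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical debugging helper: given the logical block numbers that are
 * allocated to a file, count those lying wholly beyond EOF, i.e. blocks that
 * a deallocation pass bounded by i_size would never visit.
 */
static size_t model_blocks_past_eof(const uint64_t *lblocks, size_t n,
				    int64_t i_size, uint32_t bsize)
{
	uint64_t first_past;
	size_t count = 0;

	assert(i_size >= 0 && bsize > 0);
	/* First logical block that starts at or after EOF (round up). */
	first_past = ((uint64_t)i_size + bsize - 1) / bsize;
	for (size_t i = 0; i < n; i++)
		if (lblocks[i] >= first_past)
			count++;
	return count;
}
```

For example, with 4096-byte blocks and an i_size of 4096, allocated blocks 1 and 7 both lie past EOF. A pass that stops at i_size leaves both of those blocks behind.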
* Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait @ 2017-07-31 12:38 ` Bob Peterson 0 siblings, 0 replies; 87+ messages in thread From: Bob Peterson @ 2017-07-31 12:38 UTC (permalink / raw) To: Jeff Layton Cc: Steven Whitehouse, Jan Kara, Marcelo Tosatti, Alexander Viro, J . Bruce Fields, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm, Matthew Wilcox, cluster-devel, Benjamin Marzinski ----- Original Message ----- | > If this can be called from anywhere without fs locks, then i_size is not | > known. That has been a problem in the past since i_size may have changed | > on another node. We avoid that in this case due to only changing i_size | > under an exclusive lock, and also only having dirty pages when we have | > an exclusive lock. There is another case though, if the inode is a block | > device, i_size will be zero. That is the case for the address space that | > looks after rgrps for GFS2. We do (luckily!) call | > filemap_fdatawait_range() directly in that case. For "normal" inodes | > though, the address space for metadata is backed by the block device | > inode, so that looks like it might be an issue, since | > fs/gfs2/glops.c:inode_go_sync() calls filemap_fdatawait() on the | > metamapping. It might potentially be an issue in other cases too, | > | > Steve. | > | | Some of those do sound problematic. | | Again though, we're only waiting on writeback here, and I assume with | gfs2 that would only be pages that were written on the local node. | | Is it possible to have pages under writeback and in still in the tree, | but that are beyond the current i_size? It seems like that's the main | worrisome case. | | -- | Jeff Layton <jlayton@redhat.com> Hi Jeff, I believe the answer is yes. I was recently "bitten" by a case where (whether due to a bug or not) I had blocks allocated in a GFS2 file beyond i_size. 
I had implemented a delete algorithm that used i_size, but I found cases where files couldn't be deleted because of blocks hanging out past EOF. I'm not sure if they can be in writeback, but possibly. It's already on my "to investigate" list, but I haven't gotten to it yet. Yes, it seems like a bug. Yes, we need to fix it. But now there may be lots of legacy file systems out in the field that have this problem. Not sure if they can get to writeback until I study the situation more closely. I believe Ben Marzinski also may have come across a case in which we can have blocks in writeback that are beyond i_size. See the commit message on Ben's patch here: https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/commit/fs/gfs2?h=for-next&id=fd4c5748b8d3f7420e8932ed0bde3d53cc8acc9d Regards, Bob Peterson Red Hat File Systems ^ permalink raw reply [flat|nested] 87+ messages in thread
* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait 2017-07-31 11:44 ` Jeff Layton (?) @ 2017-07-31 12:07 ` Jan Kara -1 siblings, 0 replies; 87+ messages in thread From: Jan Kara @ 2017-07-31 12:07 UTC (permalink / raw) To: cluster-devel.redhat.com On Mon 31-07-17 07:44:16, Jeff Layton wrote: > On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote: > > On 31/07/17 12:27, Jeff Layton wrote: > > > On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote: > > > > On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote: > > > > > On Wed 26-07-17 13:55:36, Jeff Layton wrote: > > > > > > +int file_write_and_wait(struct file *file) > > > > > > +{ > > > > > > + int err = 0, err2; > > > > > > + struct address_space *mapping = file->f_mapping; > > > > > > + > > > > > > + if ((!dax_mapping(mapping) && mapping->nrpages) || > > > > > > + (dax_mapping(mapping) && mapping->nrexceptional)) { > > > > > > + err = filemap_fdatawrite(mapping); > > > > > > + /* See comment of filemap_write_and_wait() */ > > > > > > + if (err != -EIO) { > > > > > > + loff_t i_size = i_size_read(mapping->host); > > > > > > + > > > > > > + if (i_size != 0) > > > > > > + __filemap_fdatawait_range(mapping, 0, > > > > > > + i_size - 1); > > > > > > + } > > > > > > + } > > > > > > > > > > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the > > > > > range and ignore i_size. It is much easier than trying to wrap your head > > > > > around possible races with file operations modifying i_size. > > > > > > > > > > Honza > > > > > > > > I'm basically emulating _exactly_ what filemap_write_and_wait does here, > > > > as I'm leery of making subtle behavior changes in the actual writeback > > > > behavior. 
For example: > > > > > > > > -----------------8<---------------- > > > > static inline int __filemap_fdatawrite(struct address_space *mapping, > > > > int sync_mode) > > > > { > > > > return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode); > > > > } > > > > > > > > int filemap_fdatawrite(struct address_space *mapping) > > > > { > > > > return __filemap_fdatawrite(mapping, WB_SYNC_ALL); > > > > } > > > > EXPORT_SYMBOL(filemap_fdatawrite); > > > > -----------------8<---------------- > > > > > > > > ...which then sets up the wbc with the right ranges and sync mode and > > > > kicks off writepages. But then, it does the i_size_read to figure out > > > > what range it should wait on (with the shortcut for the size == 0 case). > > > > > > > > My assumption was that it was intentionally designed that way, but I'm > > > > guessing from your comments that it wasn't? If so, then we can turn > > > > file_write_and_wait a static inline wrapper around > > > > file_write_and_wait_range. > > > > > > FWIW, I did a bit of archaeology in the linux-history tree and found > > > this patch from Marcelo in 2004. Is this optimization still helpful? If > > > not, then that does simplify the code a bit. > > > > > > -------------------8<-------------------- > > > > > > [PATCH] small wait_on_page_writeback_range() optimization > > > > > > filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end" > > > parameter. This is not needed since we know the EOF from the inode. Use > > > that instead. 
> > > > > > Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com> > > > Signed-off-by: Andrew Morton <akpm@osdl.org> > > > Signed-off-by: Linus Torvalds <torvalds@osdl.org> > > > --- > > > mm/filemap.c | 8 +++++++- > > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > > index 78e18b7639b6..55fb7b4141e4 100644 > > > --- a/mm/filemap.c > > > +++ b/mm/filemap.c > > > @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range); > > > */ > > > int filemap_fdatawait(struct address_space *mapping) > > > { > > > - return wait_on_page_writeback_range(mapping, 0, -1); > > > + loff_t i_size = i_size_read(mapping->host); > > > + > > > + if (i_size == 0) > > > + return 0; > > > + > > > + return wait_on_page_writeback_range(mapping, 0, > > > + (i_size - 1) >> PAGE_CACHE_SHIFT); > > > } > > > EXPORT_SYMBOL(filemap_fdatawait); > > > > > > > Does this ever get called in cases where we would not hold fs locks? In > > that case we definitely don't want to be relying on i_size, > > > > Steve. > > > > Yes. We can initiate and wait on writeback from any context where you > can sleep, really. > > We're just waiting on whole file writeback here, so I don't think > there's anything wrong. As long as the i_size was valid at some point in > time prior to waiting then you're ok. > > The question I have is more whether this optimization is still useful. > > What we do now is just walk the radix tree and wait_on_page_writeback > for each page. Do we gain anything by avoiding ranges beyond the current > EOF with the pagecache infrastructure of 2017? FWIW I'm not aware of any significant benefit of using i_size in filemap_fdatawait() - we iterate to the end of the radix tree node anyway, since pagevec_lookup_tag() does not support range searches (I'm working on fixing that; however, even after that the benefit would still be rather marginal).
What Marcelo might have meant even back in 2004 was that if we are in the middle of a truncate, where i_size has already been reduced but the page cache has not been truncated yet, then filemap_fdatawait() does not have to wait for writeback of the truncated pages. That might be a noticeable benefit even today if such a race happens; however, I'm not sure it's worth optimizing for, and the surprises arising from randomly snapshotting i_size (which, especially for clustered filesystems, may be out of date) IMHO outweigh the possible advantage. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR
* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait 2017-07-31 12:07 ` Jan Kara (?) @ 2017-07-31 13:00 ` Jeff Layton -1 siblings, 0 replies; 87+ messages in thread From: Jeff Layton @ 2017-07-31 13:00 UTC (permalink / raw) To: cluster-devel.redhat.com On Mon, 2017-07-31 at 14:07 +0200, Jan Kara wrote: > On Mon 31-07-17 07:44:16, Jeff Layton wrote: > > On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote: > > > On 31/07/17 12:27, Jeff Layton wrote: > > > > On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote: > > > > > On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote: > > > > > > On Wed 26-07-17 13:55:36, Jeff Layton wrote: > > > > > > > +int file_write_and_wait(struct file *file) > > > > > > > +{ > > > > > > > + int err = 0, err2; > > > > > > > + struct address_space *mapping = file->f_mapping; > > > > > > > + > > > > > > > + if ((!dax_mapping(mapping) && mapping->nrpages) || > > > > > > > + (dax_mapping(mapping) && mapping->nrexceptional)) { > > > > > > > + err = filemap_fdatawrite(mapping); > > > > > > > + /* See comment of filemap_write_and_wait() */ > > > > > > > + if (err != -EIO) { > > > > > > > + loff_t i_size = i_size_read(mapping->host); > > > > > > > + > > > > > > > + if (i_size != 0) > > > > > > > + __filemap_fdatawait_range(mapping, 0, > > > > > > > + i_size - 1); > > > > > > > + } > > > > > > > + } > > > > > > > > > > > > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the > > > > > > range and ignore i_size. It is much easier than trying to wrap your head > > > > > > around possible races with file operations modifying i_size. > > > > > > > > > > > > Honza > > > > > > > > > > I'm basically emulating _exactly_ what filemap_write_and_wait does here, > > > > > as I'm leery of making subtle behavior changes in the actual writeback > > > > > behavior. 
For example: > > > > > > > > > > -----------------8<---------------- > > > > > static inline int __filemap_fdatawrite(struct address_space *mapping, > > > > > int sync_mode) > > > > > { > > > > > return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode); > > > > > } > > > > > > > > > > int filemap_fdatawrite(struct address_space *mapping) > > > > > { > > > > > return __filemap_fdatawrite(mapping, WB_SYNC_ALL); > > > > > } > > > > > EXPORT_SYMBOL(filemap_fdatawrite); > > > > > -----------------8<---------------- > > > > > > > > > > ...which then sets up the wbc with the right ranges and sync mode and > > > > > kicks off writepages. But then, it does the i_size_read to figure out > > > > > what range it should wait on (with the shortcut for the size == 0 case). > > > > > > > > > > My assumption was that it was intentionally designed that way, but I'm > > > > > guessing from your comments that it wasn't? If so, then we can turn > > > > > file_write_and_wait a static inline wrapper around > > > > > file_write_and_wait_range. > > > > > > > > FWIW, I did a bit of archaeology in the linux-history tree and found > > > > this patch from Marcelo in 2004. Is this optimization still helpful? If > > > > not, then that does simplify the code a bit. > > > > > > > > -------------------8<-------------------- > > > > > > > > [PATCH] small wait_on_page_writeback_range() optimization > > > > > > > > filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end" > > > > parameter. This is not needed since we know the EOF from the inode. Use > > > > that instead. 
> > > > > > > > Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com> > > > > Signed-off-by: Andrew Morton <akpm@osdl.org> > > > > Signed-off-by: Linus Torvalds <torvalds@osdl.org> > > > > --- > > > > mm/filemap.c | 8 +++++++- > > > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > > > index 78e18b7639b6..55fb7b4141e4 100644 > > > > --- a/mm/filemap.c > > > > +++ b/mm/filemap.c > > > > @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range); > > > > */ > > > > int filemap_fdatawait(struct address_space *mapping) > > > > { > > > > - return wait_on_page_writeback_range(mapping, 0, -1); > > > > + loff_t i_size = i_size_read(mapping->host); > > > > + > > > > + if (i_size == 0) > > > > + return 0; > > > > + > > > > + return wait_on_page_writeback_range(mapping, 0, > > > > + (i_size - 1) >> PAGE_CACHE_SHIFT); > > > > } > > > > EXPORT_SYMBOL(filemap_fdatawait); > > > > > > > > > > Does this ever get called in cases where we would not hold fs locks? In > > > that case we definitely don't want to be relying on i_size, > > > > > > Steve. > > > > > > > Yes. We can initiate and wait on writeback from any context where you > > can sleep, really. > > > > We're just waiting on whole file writeback here, so I don't think > > there's anything wrong. As long as the i_size was valid at some point in > > time prior to waiting then you're ok. > > > > The question I have is more whether this optimization is still useful. > > > > What we do now is just walk the radix tree and wait_on_page_writeback > > for each page. Do we gain anything by avoiding ranges beyond the current > > EOF with the pagecache infrastructure of 2017? 
> > FWIW I'm not aware of any significant benefit of using i_size in > filemap_fdatawait() - we iterate to the end of the radix tree node anyway > since pagevec_lookup_tag() does not support range searches anyway (I'm > working on fixing that however even after that the benefit would be still > rather marginal). > > What Marcello might have meant even back in 2004 was that if we are in the > middle of truncate, i_size is already reduced but page cache not truncated > yet, then filemap_fdatawait() does not have to wait for writeback of > truncated pages. That might be a noticeable benefit even today if such race > happens however I'm not sure it's worth optimizing for and surprises > arising from randomly snapshotting i_size (which especially for clustered > filesystems may be out of date) IMHO overweight the possible advantage. > > Honza Thanks for clarifying. Given that file_write_and_wait is a new helper function anyway, I'll just make it a wrapper around file_write_and_wait_range. Since relying on i_size might be racy, should we remove this optimization from the "legacy" filemap_fdatawait / filemap_fdatawait_keep_errors calls? -- Jeff Layton <jlayton@redhat.com>
* Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait @ 2017-07-31 13:00 ` Jeff Layton 0 siblings, 0 replies; 87+ messages in thread From: Jeff Layton @ 2017-07-31 13:00 UTC (permalink / raw) To: Jan Kara Cc: Steven Whitehouse, Marcelo Tosatti, Alexander Viro, J . Bruce Fields, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm, Matthew Wilcox, Bob Peterson, cluster-devel On Mon, 2017-07-31 at 14:07 +0200, Jan Kara wrote: > On Mon 31-07-17 07:44:16, Jeff Layton wrote: > > On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote: > > > On 31/07/17 12:27, Jeff Layton wrote: > > > > On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote: > > > > > On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote: > > > > > > On Wed 26-07-17 13:55:36, Jeff Layton wrote: > > > > > > > +int file_write_and_wait(struct file *file) > > > > > > > +{ > > > > > > > + int err = 0, err2; > > > > > > > + struct address_space *mapping = file->f_mapping; > > > > > > > + > > > > > > > + if ((!dax_mapping(mapping) && mapping->nrpages) || > > > > > > > + (dax_mapping(mapping) && mapping->nrexceptional)) { > > > > > > > + err = filemap_fdatawrite(mapping); > > > > > > > + /* See comment of filemap_write_and_wait() */ > > > > > > > + if (err != -EIO) { > > > > > > > + loff_t i_size = i_size_read(mapping->host); > > > > > > > + > > > > > > > + if (i_size != 0) > > > > > > > + __filemap_fdatawait_range(mapping, 0, > > > > > > > + i_size - 1); > > > > > > > + } > > > > > > > + } > > > > > > > > > > > > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the > > > > > > range and ignore i_size. It is much easier than trying to wrap your head > > > > > > around possible races with file operations modifying i_size. > > > > > > > > > > > > Honza > > > > > > > > > > I'm basically emulating _exactly_ what filemap_write_and_wait does here, > > > > > as I'm leery of making subtle behavior changes in the actual writeback > > > > > behavior. 
For example: > > > > > > > > > > -----------------8<---------------- > > > > > static inline int __filemap_fdatawrite(struct address_space *mapping, > > > > > int sync_mode) > > > > > { > > > > > return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode); > > > > > } > > > > > > > > > > int filemap_fdatawrite(struct address_space *mapping) > > > > > { > > > > > return __filemap_fdatawrite(mapping, WB_SYNC_ALL); > > > > > } > > > > > EXPORT_SYMBOL(filemap_fdatawrite); > > > > > -----------------8<---------------- > > > > > > > > > > ...which then sets up the wbc with the right ranges and sync mode and > > > > > kicks off writepages. But then, it does the i_size_read to figure out > > > > > what range it should wait on (with the shortcut for the size == 0 case). > > > > > > > > > > My assumption was that it was intentionally designed that way, but I'm > > > > > guessing from your comments that it wasn't? If so, then we can turn > > > > > file_write_and_wait a static inline wrapper around > > > > > file_write_and_wait_range. > > > > > > > > FWIW, I did a bit of archaeology in the linux-history tree and found > > > > this patch from Marcelo in 2004. Is this optimization still helpful? If > > > > not, then that does simplify the code a bit. > > > > > > > > -------------------8<-------------------- > > > > > > > > [PATCH] small wait_on_page_writeback_range() optimization > > > > > > > > filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end" > > > > parameter. This is not needed since we know the EOF from the inode. Use > > > > that instead. 
> > > > > > > > Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com> > > > > Signed-off-by: Andrew Morton <akpm@osdl.org> > > > > Signed-off-by: Linus Torvalds <torvalds@osdl.org> > > > > --- > > > > mm/filemap.c | 8 +++++++- > > > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > > > index 78e18b7639b6..55fb7b4141e4 100644 > > > > --- a/mm/filemap.c > > > > +++ b/mm/filemap.c > > > > @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range); > > > > */ > > > > int filemap_fdatawait(struct address_space *mapping) > > > > { > > > > - return wait_on_page_writeback_range(mapping, 0, -1); > > > > + loff_t i_size = i_size_read(mapping->host); > > > > + > > > > + if (i_size == 0) > > > > + return 0; > > > > + > > > > + return wait_on_page_writeback_range(mapping, 0, > > > > + (i_size - 1) >> PAGE_CACHE_SHIFT); > > > > } > > > > EXPORT_SYMBOL(filemap_fdatawait); > > > > > > > > > > Does this ever get called in cases where we would not hold fs locks? In > > > that case we definitely don't want to be relying on i_size, > > > > > > Steve. > > > > > > > Yes. We can initiate and wait on writeback from any context where you > > can sleep, really. > > > > We're just waiting on whole file writeback here, so I don't think > > there's anything wrong. As long as the i_size was valid at some point in > > time prior to waiting then you're ok. > > > > The question I have is more whether this optimization is still useful. > > > > What we do now is just walk the radix tree and wait_on_page_writeback > > for each page. Do we gain anything by avoiding ranges beyond the current > > EOF with the pagecache infrastructure of 2017? 
> > FWIW I'm not aware of any significant benefit of using i_size in > filemap_fdatawait() - we iterate to the end of the radix tree node anyway > since pagevec_lookup_tag() does not support range searches anyway (I'm > working on fixing that however even after that the benefit would be still > rather marginal). > > What Marcello might have meant even back in 2004 was that if we are in the > middle of truncate, i_size is already reduced but page cache not truncated > yet, then filemap_fdatawait() does not have to wait for writeback of > truncated pages. That might be a noticeable benefit even today if such race > happens however I'm not sure it's worth optimizing for and surprises > arising from randomly snapshotting i_size (which especially for clustered > filesystems may be out of date) IMHO overweight the possible advantage. > > Honza Thanks for clarifying. Given that file_write_and_wait is a new helper function anyway, I'll just make it a wrapper around file_write_and_wait_range. Since it might be racy, should remove this optimization from the "legacy" filemap_fdatawait / filemap_fdatawait_keep_errors calls? -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait @ 2017-07-31 13:00 ` Jeff Layton 0 siblings, 0 replies; 87+ messages in thread From: Jeff Layton @ 2017-07-31 13:00 UTC (permalink / raw) To: Jan Kara Cc: Steven Whitehouse, Marcelo Tosatti, Alexander Viro, J . Bruce Fields, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm, Matthew Wilcox, Bob Peterson, cluster-devel On Mon, 2017-07-31 at 14:07 +0200, Jan Kara wrote: > On Mon 31-07-17 07:44:16, Jeff Layton wrote: > > On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote: > > > On 31/07/17 12:27, Jeff Layton wrote: > > > > On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote: > > > > > On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote: > > > > > > On Wed 26-07-17 13:55:36, Jeff Layton wrote: > > > > > > > +int file_write_and_wait(struct file *file) > > > > > > > +{ > > > > > > > + int err = 0, err2; > > > > > > > + struct address_space *mapping = file->f_mapping; > > > > > > > + > > > > > > > + if ((!dax_mapping(mapping) && mapping->nrpages) || > > > > > > > + (dax_mapping(mapping) && mapping->nrexceptional)) { > > > > > > > + err = filemap_fdatawrite(mapping); > > > > > > > + /* See comment of filemap_write_and_wait() */ > > > > > > > + if (err != -EIO) { > > > > > > > + loff_t i_size = i_size_read(mapping->host); > > > > > > > + > > > > > > > + if (i_size != 0) > > > > > > > + __filemap_fdatawait_range(mapping, 0, > > > > > > > + i_size - 1); > > > > > > > + } > > > > > > > + } > > > > > > > > > > > > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the > > > > > > range and ignore i_size. It is much easier than trying to wrap your head > > > > > > around possible races with file operations modifying i_size. > > > > > > > > > > > > Honza > > > > > > > > > > I'm basically emulating _exactly_ what filemap_write_and_wait does here, > > > > > as I'm leery of making subtle behavior changes in the actual writeback > > > > > behavior. 
For example: > > > > > > > > > > -----------------8<---------------- > > > > > static inline int __filemap_fdatawrite(struct address_space *mapping, > > > > > int sync_mode) > > > > > { > > > > > return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode); > > > > > } > > > > > > > > > > int filemap_fdatawrite(struct address_space *mapping) > > > > > { > > > > > return __filemap_fdatawrite(mapping, WB_SYNC_ALL); > > > > > } > > > > > EXPORT_SYMBOL(filemap_fdatawrite); > > > > > -----------------8<---------------- > > > > > > > > > > ...which then sets up the wbc with the right ranges and sync mode and > > > > > kicks off writepages. But then, it does the i_size_read to figure out > > > > > what range it should wait on (with the shortcut for the size == 0 case). > > > > > > > > > > My assumption was that it was intentionally designed that way, but I'm > > > > > guessing from your comments that it wasn't? If so, then we can turn > > > > > file_write_and_wait a static inline wrapper around > > > > > file_write_and_wait_range. > > > > > > > > FWIW, I did a bit of archaeology in the linux-history tree and found > > > > this patch from Marcelo in 2004. Is this optimization still helpful? If > > > > not, then that does simplify the code a bit. > > > > > > > > -------------------8<-------------------- > > > > > > > > [PATCH] small wait_on_page_writeback_range() optimization > > > > > > > > filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end" > > > > parameter. This is not needed since we know the EOF from the inode. Use > > > > that instead. 
> > > > > > > > Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com> > > > > Signed-off-by: Andrew Morton <akpm@osdl.org> > > > > Signed-off-by: Linus Torvalds <torvalds@osdl.org> > > > > --- > > > > mm/filemap.c | 8 +++++++- > > > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > > > index 78e18b7639b6..55fb7b4141e4 100644 > > > > --- a/mm/filemap.c > > > > +++ b/mm/filemap.c > > > > @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range); > > > > */ > > > > int filemap_fdatawait(struct address_space *mapping) > > > > { > > > > - return wait_on_page_writeback_range(mapping, 0, -1); > > > > + loff_t i_size = i_size_read(mapping->host); > > > > + > > > > + if (i_size == 0) > > > > + return 0; > > > > + > > > > + return wait_on_page_writeback_range(mapping, 0, > > > > + (i_size - 1) >> PAGE_CACHE_SHIFT); > > > > } > > > > EXPORT_SYMBOL(filemap_fdatawait); > > > > > > > > > > Does this ever get called in cases where we would not hold fs locks? In > > > that case we definitely don't want to be relying on i_size, > > > > > > Steve. > > > > > > > Yes. We can initiate and wait on writeback from any context where you > > can sleep, really. > > > > We're just waiting on whole file writeback here, so I don't think > > there's anything wrong. As long as the i_size was valid at some point in > > time prior to waiting then you're ok. > > > > The question I have is more whether this optimization is still useful. > > > > What we do now is just walk the radix tree and wait_on_page_writeback > > for each page. Do we gain anything by avoiding ranges beyond the current > > EOF with the pagecache infrastructure of 2017? 
> > FWIW I'm not aware of any significant benefit of using i_size in > filemap_fdatawait() - we iterate to the end of the radix tree node anyway > since pagevec_lookup_tag() does not support range searches anyway (I'm > working on fixing that; however, even after that the benefit would still be > rather marginal). > > What Marcelo might have meant even back in 2004 was that if we are in the > middle of truncate, i_size is already reduced but page cache not truncated > yet, then filemap_fdatawait() does not have to wait for writeback of > truncated pages. That might be a noticeable benefit even today if such a race > happens; however, I'm not sure it's worth optimizing for, and surprises > arising from randomly snapshotting i_size (which especially for clustered > filesystems may be out of date) IMHO outweigh the possible advantage. > > Honza Thanks for clarifying. Given that file_write_and_wait is a new helper function anyway, I'll just make it a wrapper around file_write_and_wait_range. Since it might be racy, should we remove this optimization from the "legacy" filemap_fdatawait / filemap_fdatawait_keep_errors calls? -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 87+ messages in thread
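For reference, the 2004 change discussed above only alters which page indices the wait covers. The following is a minimal userspace sketch of that arithmetic, not kernel code: it assumes 4 KiB pages (`ISZ_PAGE_SHIFT` stands in for the kernel's PAGE_SHIFT), and the helper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumption: 4 KiB pages; a stand-in for the kernel's PAGE_SHIFT. */
#define ISZ_PAGE_SHIFT 12

/*
 * Hypothetical helper modelling the 2004 filemap_fdatawait() change:
 * derive the last page index to wait on from a snapshot of i_size.
 * Returns false when the snapshot says the file is empty, in which
 * case the old code returned 0 without waiting at all.
 */
static bool isize_last_index(long long i_size, long long *last_index)
{
	assert(i_size >= 0);
	if (i_size == 0)
		return false;
	*last_index = (i_size - 1) >> ISZ_PAGE_SHIFT;
	return true;
}

/* Would a wait bounded by this i_size snapshot cover page @index? */
static bool isize_wait_covers(long long i_size, long long index)
{
	long long last;

	if (!isize_last_index(i_size, &last))
		return false;
	return index >= 0 && index <= last;
}

/*
 * Would a whole-file wait over the byte range [0, LLONG_MAX], i.e.
 * passing LLONG_MAX as the end, cover page @index?
 */
static bool whole_file_wait_covers(long long index)
{
	return index >= 0 && index <= (LLONG_MAX >> ISZ_PAGE_SHIFT);
}
```

With a stale 8192-byte snapshot, `isize_wait_covers(8192, 5)` returns false, so a page at index 5 still under writeback would be skipped, whereas `whole_file_wait_covers(5)` returns true whatever i_size says. In this model, bounding the wait by i_size saves little, and a stale or racing snapshot silently narrows it.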
* [Cluster-devel] [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait 2017-07-31 13:00 ` Jeff Layton (?) @ 2017-07-31 13:32 ` Jan Kara -1 siblings, 0 replies; 87+ messages in thread From: Jan Kara @ 2017-07-31 13:32 UTC (permalink / raw) To: cluster-devel.redhat.com On Mon 31-07-17 09:00:37, Jeff Layton wrote: > On Mon, 2017-07-31 at 14:07 +0200, Jan Kara wrote: > > On Mon 31-07-17 07:44:16, Jeff Layton wrote: > > > On Mon, 2017-07-31 at 12:32 +0100, Steven Whitehouse wrote: > > > > On 31/07/17 12:27, Jeff Layton wrote: > > > > > On Thu, 2017-07-27 at 08:48 -0400, Jeff Layton wrote: > > > > > > On Thu, 2017-07-27 at 10:49 +0200, Jan Kara wrote: > > > > > > > On Wed 26-07-17 13:55:36, Jeff Layton wrote: > > > > > > > > +int file_write_and_wait(struct file *file) > > > > > > > > +{ > > > > > > > > + int err = 0, err2; > > > > > > > > + struct address_space *mapping = file->f_mapping; > > > > > > > > + > > > > > > > > + if ((!dax_mapping(mapping) && mapping->nrpages) || > > > > > > > > + (dax_mapping(mapping) && mapping->nrexceptional)) { > > > > > > > > + err = filemap_fdatawrite(mapping); > > > > > > > > + /* See comment of filemap_write_and_wait() */ > > > > > > > > + if (err != -EIO) { > > > > > > > > + loff_t i_size = i_size_read(mapping->host); > > > > > > > > + > > > > > > > > + if (i_size != 0) > > > > > > > > + __filemap_fdatawait_range(mapping, 0, > > > > > > > > + i_size - 1); > > > > > > > > + } > > > > > > > > + } > > > > > > > > > > > > > > Err, what's the i_size check doing here? I'd just pass ~0 as the end of the > > > > > > > range and ignore i_size. It is much easier than trying to wrap your head > > > > > > > around possible races with file operations modifying i_size. > > > > > > > > > > > > > > Honza > > > > > > > > > > > > I'm basically emulating _exactly_ what filemap_write_and_wait does here, > > > > > > as I'm leery of making subtle behavior changes in the actual writeback > > > > > > behavior. 
For example: > > > > > > > > > > > > -----------------8<---------------- > > > > > > static inline int __filemap_fdatawrite(struct address_space *mapping, > > > > > > int sync_mode) > > > > > > { > > > > > > return __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode); > > > > > > } > > > > > > > > > > > > int filemap_fdatawrite(struct address_space *mapping) > > > > > > { > > > > > > return __filemap_fdatawrite(mapping, WB_SYNC_ALL); > > > > > > } > > > > > > EXPORT_SYMBOL(filemap_fdatawrite); > > > > > > -----------------8<---------------- > > > > > > > > > > > > ...which then sets up the wbc with the right ranges and sync mode and > > > > > > kicks off writepages. But then, it does the i_size_read to figure out > > > > > > what range it should wait on (with the shortcut for the size == 0 case). > > > > > > > > > > > > My assumption was that it was intentionally designed that way, but I'm > > > > > > guessing from your comments that it wasn't? If so, then we can turn > > > > > > file_write_and_wait into a static inline wrapper around > > > > > > file_write_and_wait_range. > > > > > > > > > > FWIW, I did a bit of archaeology in the linux-history tree and found > > > > > this patch from Marcelo in 2004. Is this optimization still helpful? If > > > > > not, then that does simplify the code a bit. > > > > > > > > > > -------------------8<-------------------- > > > > > > > > > > [PATCH] small wait_on_page_writeback_range() optimization > > > > > > > > > > filemap_fdatawait() calls wait_on_page_writeback_range() with -1 as "end" > > > > > parameter. This is not needed since we know the EOF from the inode. Use > > > > > that instead.
> > > > > > > > > > Signed-off-by: Marcelo Tosatti <marcelo.tosatti@cyclades.com> > > > > > Signed-off-by: Andrew Morton <akpm@osdl.org> > > > > > Signed-off-by: Linus Torvalds <torvalds@osdl.org> > > > > > --- > > > > > mm/filemap.c | 8 +++++++- > > > > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/mm/filemap.c b/mm/filemap.c > > > > > index 78e18b7639b6..55fb7b4141e4 100644 > > > > > --- a/mm/filemap.c > > > > > +++ b/mm/filemap.c > > > > > @@ -287,7 +287,13 @@ EXPORT_SYMBOL(sync_page_range); > > > > > */ > > > > > int filemap_fdatawait(struct address_space *mapping) > > > > > { > > > > > - return wait_on_page_writeback_range(mapping, 0, -1); > > > > > + loff_t i_size = i_size_read(mapping->host); > > > > > + > > > > > + if (i_size == 0) > > > > > + return 0; > > > > > + > > > > > + return wait_on_page_writeback_range(mapping, 0, > > > > > + (i_size - 1) >> PAGE_CACHE_SHIFT); > > > > > } > > > > > EXPORT_SYMBOL(filemap_fdatawait); > > > > > > > > > > > > > Does this ever get called in cases where we would not hold fs locks? In > > > > that case we definitely don't want to be relying on i_size, > > > > > > > > Steve. > > > > > > > > > > Yes. We can initiate and wait on writeback from any context where you > > > can sleep, really. > > > > > > We're just waiting on whole file writeback here, so I don't think > > > there's anything wrong. As long as the i_size was valid at some point in > > > time prior to waiting then you're ok. > > > > > > The question I have is more whether this optimization is still useful. > > > > > > What we do now is just walk the radix tree and wait_on_page_writeback > > > for each page. Do we gain anything by avoiding ranges beyond the current > > > EOF with the pagecache infrastructure of 2017? 
> > > > FWIW I'm not aware of any significant benefit of using i_size in > > filemap_fdatawait() - we iterate to the end of the radix tree node anyway > > since pagevec_lookup_tag() does not support range searches anyway (I'm > > working on fixing that; however, even after that the benefit would still be > > rather marginal). > > > > What Marcelo might have meant even back in 2004 was that if we are in the > > middle of truncate, i_size is already reduced but page cache not truncated > > yet, then filemap_fdatawait() does not have to wait for writeback of > > truncated pages. That might be a noticeable benefit even today if such a race > > happens; however, I'm not sure it's worth optimizing for, and surprises > > arising from randomly snapshotting i_size (which especially for clustered > > filesystems may be out of date) IMHO outweigh the possible advantage. > > > > Honza > > Thanks for clarifying. > > Given that file_write_and_wait is a new helper function anyway, I'll > just make it a wrapper around file_write_and_wait_range. Since it might Agreed. > be racy, should we remove this optimization from the "legacy" > filemap_fdatawait / filemap_fdatawait_keep_errors calls? I'm for it. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 87+ messages in thread
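The direction agreed here, waiting on the whole byte range [0, LLONG_MAX] instead of a range derived from i_size, can be illustrated with a toy page cache. This is a userspace model under stated assumptions (a fixed 16-page array of writeback flags and 4 KiB pages); `struct wb_cache` and the `wb_*` functions are hypothetical and do not mirror the kernel's radix-tree walk.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define WB_PAGE_SHIFT 12  /* assumption: 4 KiB pages */
#define WB_NR_PAGES 16    /* size of the toy page cache */

/* Toy stand-in for an address_space: one writeback flag per page index. */
struct wb_cache {
	bool under_writeback[WB_NR_PAGES];
};

/*
 * Model of waiting on writeback over the inclusive byte range
 * [start_byte, end_byte]: each page in range that is under writeback
 * is "waited on" by clearing its flag. Returns how many pages were
 * waited on.
 */
static size_t wb_wait_range(struct wb_cache *c,
			    long long start_byte, long long end_byte)
{
	long long first = start_byte >> WB_PAGE_SHIFT;
	long long last = end_byte >> WB_PAGE_SHIFT;
	size_t waited = 0;

	assert(start_byte >= 0 && start_byte <= end_byte);
	/* Nothing beyond the toy cache can be under writeback. */
	if (last >= WB_NR_PAGES)
		last = WB_NR_PAGES - 1;
	for (long long i = first; i <= last; i++) {
		if (c->under_writeback[i]) {
			c->under_writeback[i] = false;
			waited++;
		}
	}
	return waited;
}

/* Whole-file wait that ignores i_size entirely. */
static size_t wb_wait_whole_file(struct wb_cache *c)
{
	return wb_wait_range(c, 0, LLONG_MAX);
}

/* Legacy-style wait bounded by an i_size snapshot, for comparison. */
static size_t wb_wait_isize(struct wb_cache *c, long long i_size)
{
	if (i_size == 0)
		return 0;
	return wb_wait_range(c, 0, i_size - 1);
}
```

Seeding pages 1 and 5 as under writeback and calling `wb_wait_isize(&c, 8192)` waits on page 1 only; a following `wb_wait_whole_file(&c)` picks up page 5. In this model the whole-range call is a superset of the i_size-bounded one, so dropping the optimization can only add waits, never skip them.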
* [Cluster-devel] [PATCH v3] mm: add file_fdatawait_range and file_write_and_wait 2017-07-26 17:55 ` Jeff Layton (?) @ 2017-07-31 16:49 ` Jeff Layton -1 siblings, 0 replies; 87+ messages in thread From: Jeff Layton @ 2017-07-31 16:49 UTC (permalink / raw) To: cluster-devel.redhat.com From: Jeff Layton <jlayton@redhat.com> Necessary now for gfs2_fsync and sync_file_range, but there will eventually be other callers. Signed-off-by: Jeff Layton <jlayton@redhat.com> --- include/linux/fs.h | 11 ++++++++++- mm/filemap.c | 23 +++++++++++++++++++++++ 2 files changed, 33 insertions(+), 1 deletion(-) v3: make file_write_and_wait a wrapper around file_write_and_wait_range diff --git a/include/linux/fs.h b/include/linux/fs.h index 526b6a9f30d4..909210bd6366 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2549,6 +2549,8 @@ static inline int filemap_fdatawait(struct address_space *mapping) extern bool filemap_range_has_page(struct address_space *, loff_t lstart, loff_t lend); +extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart, + loff_t lend); extern int filemap_write_and_wait(struct address_space *mapping); extern int filemap_write_and_wait_range(struct address_space *mapping, loff_t lstart, loff_t lend); @@ -2557,12 +2559,19 @@ extern int __filemap_fdatawrite_range(struct address_space *mapping, extern int filemap_fdatawrite_range(struct address_space *mapping, loff_t start, loff_t end); extern int filemap_check_errors(struct address_space *mapping); - extern void __filemap_set_wb_err(struct address_space *mapping, int err); + +extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart, + loff_t lend); extern int __must_check file_check_and_advance_wb_err(struct file *file); extern int __must_check file_write_and_wait_range(struct file *file, loff_t start, loff_t end); +static inline int file_write_and_wait(struct file *file) +{ + return file_write_and_wait_range(file, 0, LLONG_MAX); +} + /** * filemap_set_wb_err - set 
a writeback error on an address_space * @mapping: mapping in which to set writeback error diff --git a/mm/filemap.c b/mm/filemap.c index 953804b29a75..85dfe3bee324 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -476,6 +476,29 @@ int filemap_fdatawait_range(struct address_space *mapping, loff_t start_byte, EXPORT_SYMBOL(filemap_fdatawait_range); /** + * file_fdatawait_range - wait for writeback to complete + * @file: file pointing to address space structure to wait for + * @start_byte: offset in bytes where the range starts + * @end_byte: offset in bytes where the range ends (inclusive) + * + * Walk the list of under-writeback pages of the address space that file + * refers to, in the given range and wait for all of them. Check error + * status of the address space vs. the file->f_wb_err cursor and return it. + * + * Since the error status of the file is advanced by this function, + * callers are responsible for checking the return value and handling and/or + * reporting the error. + */ +int file_fdatawait_range(struct file *file, loff_t start_byte, loff_t end_byte) +{ + struct address_space *mapping = file->f_mapping; + + __filemap_fdatawait_range(mapping, start_byte, end_byte); + return file_check_and_advance_wb_err(file); +} +EXPORT_SYMBOL(file_fdatawait_range); + +/** * filemap_fdatawait_keep_errors - wait for writeback without clearing errors * @mapping: address space structure to wait for * -- 2.13.3 ^ permalink raw reply related [flat|nested] 87+ messages in thread
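`file_fdatawait_range()` returns whatever writeback error the mapping has recorded since the file's `f_wb_err` cursor last advanced, and advancing that cursor is why the docstring makes callers responsible for checking the result. The sketch below is a deliberately simplified userspace model of that contract, not the kernel's implementation: the real `errseq_t` packs the errno, a counter and a "seen" flag into one 32-bit value, which this model does not reproduce, and all `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <errno.h> /* callers pass kernel-style negative errno values, e.g. -EIO */

/* Simplified stand-in for errseq_t: latest error plus a sequence counter. */
struct toy_errseq {
	int err;          /* most recent writeback error, negative errno */
	unsigned int seq; /* bumped every time an error is recorded */
};

/* Toy address_space: only the shared writeback error state is modelled. */
struct toy_mapping {
	struct toy_errseq wb_err;
};

/* Toy struct file: a cursor into the mapping's error sequence. */
struct toy_file {
	struct toy_mapping *mapping;
	unsigned int f_wb_err; /* sequence value this file has already reported */
};

/* Record a writeback error against the mapping. */
static void toy_set_wb_err(struct toy_mapping *m, int err)
{
	assert(err < 0);
	m->wb_err.err = err;
	m->wb_err.seq++;
}

/* "Open" a file: start its cursor at the mapping's current sequence. */
static void toy_open(struct toy_file *f, struct toy_mapping *m)
{
	f->mapping = m;
	f->f_wb_err = m->wb_err.seq;
}

/*
 * Check-and-advance step: if an error was recorded since this file's
 * cursor, return it and move the cursor forward so the same error is
 * not reported through this file again.
 */
static int toy_check_and_advance_wb_err(struct toy_file *f)
{
	const struct toy_errseq *e = &f->mapping->wb_err;

	if (f->f_wb_err == e->seq)
		return 0;
	f->f_wb_err = e->seq;
	return e->err;
}

/*
 * Model of file_fdatawait_range(): wait for writeback on the range
 * (not modelled here), then report and consume any new error.
 */
static int toy_fdatawait_range(struct toy_file *f)
{
	return toy_check_and_advance_wb_err(f);
}
```

In this model, two files opened on the same mapping each get `-EIO` once after `toy_set_wb_err(&m, -EIO)`, and a second call through the same file returns 0, so a caller that ignores the first return value loses the error for that file.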
* [Cluster-devel] [PATCH v3] mm: add file_fdatawait_range and file_write_and_wait 2017-07-31 16:49 ` Jeff Layton (?) @ 2017-08-01 9:52 ` Jan Kara -1 siblings, 0 replies; 87+ messages in thread From: Jan Kara @ 2017-08-01 9:52 UTC (permalink / raw) To: cluster-devel.redhat.com On Mon 31-07-17 12:49:25, Jeff Layton wrote: > From: Jeff Layton <jlayton@redhat.com> > > Necessary now for gfs2_fsync and sync_file_range, but there will > eventually be other callers. > > Signed-off-by: Jeff Layton <jlayton@redhat.com> Looks good to me. You can add: Reviewed-by: Jan Kara <jack@suse.cz> Honza > --- > include/linux/fs.h | 11 ++++++++++- > mm/filemap.c | 23 +++++++++++++++++++++++ > 2 files changed, 33 insertions(+), 1 deletion(-) > > v3: make file_write_and_wait a wrapper around file_write_and_wait_range > > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 526b6a9f30d4..909210bd6366 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -2549,6 +2549,8 @@ static inline int filemap_fdatawait(struct address_space *mapping) > > extern bool filemap_range_has_page(struct address_space *, loff_t lstart, > loff_t lend); > +extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart, > + loff_t lend); > extern int filemap_write_and_wait(struct address_space *mapping); > extern int filemap_write_and_wait_range(struct address_space *mapping, > loff_t lstart, loff_t lend); > @@ -2557,12 +2559,19 @@ extern int __filemap_fdatawrite_range(struct address_space *mapping, > extern int filemap_fdatawrite_range(struct address_space *mapping, > loff_t start, loff_t end); > extern int filemap_check_errors(struct address_space *mapping); > - > extern void __filemap_set_wb_err(struct address_space *mapping, int err); > + > +extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart, > + loff_t lend); > extern int __must_check file_check_and_advance_wb_err(struct file *file); > extern int __must_check file_write_and_wait_range(struct file 
*file, > loff_t start, loff_t end); > > +static inline int file_write_and_wait(struct file *file) > +{ > + return file_write_and_wait_range(file, 0, LLONG_MAX); > +} > + > /** > * filemap_set_wb_err - set a writeback error on an address_space > * @mapping: mapping in which to set writeback error > diff --git a/mm/filemap.c b/mm/filemap.c > index 953804b29a75..85dfe3bee324 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -476,6 +476,29 @@ int filemap_fdatawait_range(struct address_space *mapping, loff_t start_byte, > EXPORT_SYMBOL(filemap_fdatawait_range); > > /** > + * file_fdatawait_range - wait for writeback to complete > + * @file: file pointing to address space structure to wait for > + * @start_byte: offset in bytes where the range starts > + * @end_byte: offset in bytes where the range ends (inclusive) > + * > + * Walk the list of under-writeback pages of the address space that file > + * refers to, in the given range and wait for all of them. Check error > + * status of the address space vs. the file->f_wb_err cursor and return it. > + * > + * Since the error status of the file is advanced by this function, > + * callers are responsible for checking the return value and handling and/or > + * reporting the error. > + */ > +int file_fdatawait_range(struct file *file, loff_t start_byte, loff_t end_byte) > +{ > + struct address_space *mapping = file->f_mapping; > + > + __filemap_fdatawait_range(mapping, start_byte, end_byte); > + return file_check_and_advance_wb_err(file); > +} > +EXPORT_SYMBOL(file_fdatawait_range); > + > +/** > * filemap_fdatawait_keep_errors - wait for writeback without clearing errors > * @mapping: address space structure to wait for > * > -- > 2.13.3 > -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 87+ messages in thread
* [Cluster-devel] [PATCH v2 3/4] fs: convert sync_file_range to use errseq_t based error-tracking 2017-07-26 17:55 ` Jeff Layton (?) @ 2017-07-26 17:55 ` Jeff Layton -1 siblings, 0 replies; 87+ messages in thread From: Jeff Layton @ 2017-07-26 17:55 UTC (permalink / raw) To: cluster-devel.redhat.com From: Jeff Layton <jlayton@redhat.com> sync_file_range doesn't call down into the filesystem directly at all. It only kicks off writeback of pagecache pages and optionally waits on the result. Convert sync_file_range to use errseq_t based error tracking, under the assumption that most users will prefer this behavior when errors occur. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jeff Layton <jlayton@redhat.com> --- fs/sync.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/sync.c b/fs/sync.c index 2a54c1f22035..27d6b8bbcb6a 100644 --- a/fs/sync.c +++ b/fs/sync.c @@ -342,7 +342,7 @@ SYSCALL_DEFINE4(sync_file_range, int, fd, loff_t, offset, loff_t, nbytes, ret = 0; if (flags & SYNC_FILE_RANGE_WAIT_BEFORE) { - ret = filemap_fdatawait_range(mapping, offset, endbyte); + ret = file_fdatawait_range(f.file, offset, endbyte); if (ret < 0) goto out_put; } @@ -355,7 +355,7 @@ SYSCALL_DEFINE4(sync_file_range, int, fd, loff_t, offset, loff_t, nbytes, } if (flags & SYNC_FILE_RANGE_WAIT_AFTER) - ret = filemap_fdatawait_range(mapping, offset, endbyte); + ret = file_fdatawait_range(f.file, offset, endbyte); out_put: fdput(f); -- 2.13.3 ^ permalink raw reply related [flat|nested] 87+ messages in thread
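The behavioural change in sync_file_range can be illustrated with two toy models. The assumptions are as follows: flag_* approximates the old AS_EIO bit that filemap_check_errors() tests and clears, and seq_* approximates per-file errseq_t cursors. Both are hypothetical user-space stand-ins, not kernel code. With the flag, only the first waiter sees the error. With cursors, every open file that sampled the mapping before the error sees it exactly once.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Old scheme (model): one error bit per mapping, shared by all openers. */
struct flag_mapping {
	bool as_eio;
};

static void flag_set_err(struct flag_mapping *m)
{
	m->as_eio = true;
}

/* Test-and-clear: whoever checks first consumes the error. */
static int flag_check_errors(struct flag_mapping *m)
{
	if (m->as_eio) {
		m->as_eio = false;
		return -EIO;
	}
	return 0;
}

/* New scheme (model): a counter on the mapping and a cursor per open file. */
struct seq_mapping {
	unsigned int seq;
	int err;
};

struct seq_cursor {
	unsigned int since;
};

static void seq_set_err(struct seq_mapping *m, int err)
{
	m->err = err;
	m->seq++;
}

/* Sample the mapping when the file is opened. */
static struct seq_cursor seq_open_cursor(const struct seq_mapping *m)
{
	struct seq_cursor c = { .since = m->seq };
	return c;
}

/* Each cursor independently reports an error it has not yet seen. */
static int seq_check_and_advance(const struct seq_mapping *m,
				 struct seq_cursor *c)
{
	if (c->since == m->seq)
		return 0;
	c->since = m->seq;
	return m->err;
}
```

For example, with two openers a and b, the flag model returns -EIO to the first check and 0 to the second, while the cursor model returns -EIO to both a and b, and 0 on a repeated check through a.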
* [Cluster-devel] [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync 2017-07-26 17:55 ` Jeff Layton (?) @ 2017-07-26 17:55 ` Jeff Layton -1 siblings, 0 replies; 87+ messages in thread From: Jeff Layton @ 2017-07-26 17:55 UTC (permalink / raw) To: cluster-devel.redhat.com From: Jeff Layton <jlayton@redhat.com> This means that we need to export the new file_fdatawait_range symbol. Also, fix a place where a writeback error might get dropped in the gfs2_is_jdata case. Signed-off-by: Jeff Layton <jlayton@redhat.com> --- fs/gfs2/file.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c index c2062a108d19..c53ac6efd04c 100644 --- a/fs/gfs2/file.c +++ b/fs/gfs2/file.c @@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end, if (ret) return ret; if (gfs2_is_jdata(ip)) - filemap_write_and_wait(mapping); + ret = file_write_and_wait(file); + if (ret) + return ret; gfs2_ail_flush(ip->i_gl, 1); } if (mapping->nrpages) - ret = filemap_fdatawait_range(mapping, start, end); + ret = file_fdatawait_range(file, start, end); return ret ? ret : ret1; } -- 2.13.3 ^ permalink raw reply related [flat|nested] 87+ messages in thread
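A hedged sketch of why the jdata branch could lose an error before this patch. It assumes, as in the flag-based scheme of that era, that the error check inside filemap_write_and_wait() clears the mapping's error bit, so the later filemap_fdatawait_range() call finds nothing to report. All names below are hypothetical stand-ins. The sketch models only the missing return-value check, not the switch to errseq_t, and the real gfs2_fsync() does considerably more.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the mapping's AS_EIO bit. */
static bool as_eio;

/* Flag-based check: report the error once, then clear it. */
static int check_errors(void)
{
	if (as_eio) {
		as_eio = false;
		return -EIO;
	}
	return 0;
}

/* Stand-ins for filemap_write_and_wait() and filemap_fdatawait_range();
 * only their error-checking side effect is modelled. */
static int write_and_wait(void) { return check_errors(); }
static int fdatawait_range(void) { return check_errors(); }

/* Shape of the jdata path before the patch. */
static int fsync_jdata_before(void)
{
	write_and_wait();          /* result ignored, but the bit is now clear */
	return fdatawait_range();  /* so there is nothing left to report */
}

/* Shape after the patch: the first error is returned to the caller. */
static int fsync_jdata_after(void)
{
	int ret = write_and_wait();

	if (ret)
		return ret;
	return fdatawait_range();
}
```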
* [Cluster-devel] [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync 2017-07-26 17:55 ` Jeff Layton (?) @ 2017-07-26 19:21 ` Matthew Wilcox -1 siblings, 0 replies; 87+ messages in thread From: Matthew Wilcox @ 2017-07-26 19:21 UTC (permalink / raw) To: cluster-devel.redhat.com On Wed, Jul 26, 2017 at 01:55:38PM -0400, Jeff Layton wrote: > @@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end, > if (ret) > return ret; > if (gfs2_is_jdata(ip)) > - filemap_write_and_wait(mapping); > + ret = file_write_and_wait(file); > + if (ret) > + return ret; > gfs2_ail_flush(ip->i_gl, 1); > } Do we want to skip flushing the AIL if there was an error (possibly previously encountered)? I'd think we'd want to flush the AIL then report the error, like this: if (gfs2_is_jdata(ip)) - filemap_write_and_wait(mapping); + ret = file_write_and_wait(file); gfs2_ail_flush(ip->i_gl, 1); + if (ret) + return ret; } ^ permalink raw reply [flat|nested] 87+ messages in thread
* [Cluster-devel] [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync 2017-07-26 19:21 ` Matthew Wilcox (?) @ 2017-07-26 22:22 ` Jeff Layton -1 siblings, 0 replies; 87+ messages in thread From: Jeff Layton @ 2017-07-26 22:22 UTC (permalink / raw) To: cluster-devel.redhat.com On Wed, 2017-07-26 at 12:21 -0700, Matthew Wilcox wrote: > On Wed, Jul 26, 2017 at 01:55:38PM -0400, Jeff Layton wrote: > > @@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end, > > if (ret) > > return ret; > > if (gfs2_is_jdata(ip)) > > - filemap_write_and_wait(mapping); > > + ret = file_write_and_wait(file); > > + if (ret) > > + return ret; > > gfs2_ail_flush(ip->i_gl, 1); > > } > > Do we want to skip flushing the AIL if there was an error (possibly > previously encountered)? I'd think we'd want to flush the AIL then report > the error, like this: > I wondered about that. Note that earlier in the function, we also bail out without flushing the AIL if sync_inode_metadata fails, so I assumed that we'd want to do the same here. I could definitely be wrong and am fine with changing it if so. Discarding the error like we do today seems wrong though. Bob, thoughts? > if (gfs2_is_jdata(ip)) > - filemap_write_and_wait(mapping); > + ret = file_write_and_wait(file); > gfs2_ail_flush(ip->i_gl, 1); > + if (ret) > + return ret; > } -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 87+ messages in thread
* [Cluster-devel] [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync 2017-07-26 22:22 ` Jeff Layton (?) @ 2017-07-27 12:47 ` Bob Peterson -1 siblings, 0 replies; 87+ messages in thread From: Bob Peterson @ 2017-07-27 12:47 UTC (permalink / raw) To: cluster-devel.redhat.com ----- Original Message ----- | On Wed, 2017-07-26 at 12:21 -0700, Matthew Wilcox wrote: | > On Wed, Jul 26, 2017 at 01:55:38PM -0400, Jeff Layton wrote: | > > @@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t | > > start, loff_t end, | > > if (ret) | > > return ret; | > > if (gfs2_is_jdata(ip)) | > > - filemap_write_and_wait(mapping); | > > + ret = file_write_and_wait(file); | > > + if (ret) | > > + return ret; | > > gfs2_ail_flush(ip->i_gl, 1); | > > } | > | > Do we want to skip flushing the AIL if there was an error (possibly | > previously encountered)? I'd think we'd want to flush the AIL then report | > the error, like this: | > | | I wondered about that. Note that earlier in the function, we also bail | out without flushing the AIL if sync_inode_metadata fails, so I assumed | that we'd want to do the same here. | | I could definitely be wrong and am fine with changing it if so. | Discarding the error like we do today seems wrong though. | | Bob, thoughts? Hi Jeff, Matthew, I'm not sure there's a right or wrong answer here. I don't know what's best from a "correctness" point of view. I guess I'm leaning toward Jeff's original solution where we don't call gfs2_ail_flush() on error. The main purpose of ail_flush is to go through buffer descriptors (bds) attached to the glock and generate revokes for them in a new transaction. If there's an error condition, trying to go through more hoops will probably just get us into more trouble. If the error is -ENOMEM, we don't want to allocate new memory for the new transaction. If the error is -EIO, we probably don't want to encourage more writing either. 
So on the one hand, it might be good to get rid of the buffer descriptors so we don't leak memory, but that's probably also done elsewhere. I have not chased down what happens in that case, but the same thing would happen in the existing -EIO case a few lines above. On the other hand, we probably don't want to start a new transaction and start adding revokes to it, and such, due to the error. Perhaps Steve Whitehouse can weigh in? Regards, Bob Peterson Red Hat File Systems ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync @ 2017-07-27 12:47 ` Bob Peterson 0 siblings, 0 replies; 87+ messages in thread From: Bob Peterson @ 2017-07-27 12:47 UTC (permalink / raw) To: Jeff Layton Cc: Matthew Wilcox, Jeff Layton, Alexander Viro, Jan Kara, J . Bruce Fields, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm, Steven Whitehouse, cluster-devel ----- Original Message ----- | On Wed, 2017-07-26 at 12:21 -0700, Matthew Wilcox wrote: | > On Wed, Jul 26, 2017 at 01:55:38PM -0400, Jeff Layton wrote: | > > @@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t | > > start, loff_t end, | > > if (ret) | > > return ret; | > > if (gfs2_is_jdata(ip)) | > > - filemap_write_and_wait(mapping); | > > + ret = file_write_and_wait(file); | > > + if (ret) | > > + return ret; | > > gfs2_ail_flush(ip->i_gl, 1); | > > } | > | > Do we want to skip flushing the AIL if there was an error (possibly | > previously encountered)? I'd think we'd want to flush the AIL then report | > the error, like this: | > | | I wondered about that. Note that earlier in the function, we also bail | out without flushing the AIL if sync_inode_metadata fails, so I assumed | that we'd want to do the same here. | | I could definitely be wrong and am fine with changing it if so. | Discarding the error like we do today seems wrong though. | | Bob, thoughts? Hi Jeff, Matthew, I'm not sure there's a right or wrong answer here. I don't know what's best from a "correctness" point of view. I guess I'm leaning toward Jeff's original solution where we don't call gfs2_ail_flush() on error. The main purpose of ail_flush is to go through buffer descriptors (bds) attached to the glock and generate revokes for them in a new transaction. If there's an error condition, trying to go through more hoops will probably just get us into more trouble. If the error is -ENOMEM, we don't want to allocate new memory for the new transaction. 
If the error is -EIO, we probably don't want to encourage more writing either. So on the one hand, it might be good to get rid of the buffer descriptors so we don't leak memory, but that's probably also done elsewhere. I have not chased down what happens in that case, but the same thing would happen in the existing -EIO case a few lines above. On the other hand, we probably don't want to start a new transaction and start adding revokes to it, and such, due to the error. Perhaps Steve Whitehouse can weigh in? Regards, Bob Peterson Red Hat File Systems ^ permalink raw reply [flat|nested] 87+ messages in thread
* [Cluster-devel] [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync 2017-07-27 12:47 ` Bob Peterson (?) @ 2017-07-28 12:37 ` Steven Whitehouse -1 siblings, 0 replies; 87+ messages in thread From: Steven Whitehouse @ 2017-07-28 12:37 UTC (permalink / raw) To: cluster-devel.redhat.com Hi, On 27/07/17 13:47, Bob Peterson wrote: > ----- Original Message ----- > | On Wed, 2017-07-26 at 12:21 -0700, Matthew Wilcox wrote: > | > On Wed, Jul 26, 2017 at 01:55:38PM -0400, Jeff Layton wrote: > | > > @@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t > | > > start, loff_t end, > | > > if (ret) > | > > return ret; > | > > if (gfs2_is_jdata(ip)) > | > > - filemap_write_and_wait(mapping); > | > > + ret = file_write_and_wait(file); > | > > + if (ret) > | > > + return ret; > | > > gfs2_ail_flush(ip->i_gl, 1); > | > > } > | > > | > Do we want to skip flushing the AIL if there was an error (possibly > | > previously encountered)? I'd think we'd want to flush the AIL then report > | > the error, like this: > | > > | > | I wondered about that. Note that earlier in the function, we also bail > | out without flushing the AIL if sync_inode_metadata fails, so I assumed > | that we'd want to do the same here. > | > | I could definitely be wrong and am fine with changing it if so. > | Discarding the error like we do today seems wrong though. > | > | Bob, thoughts? > > Hi Jeff, Matthew, > > I'm not sure there's a right or wrong answer here. I don't know what's > best from a "correctness" point of view. > > I guess I'm leaning toward Jeff's original solution where we don't > call gfs2_ail_flush() on error. The main purpose of ail_flush is to > go through buffer descriptors (bds) attached to the glock and generate > revokes for them in a new transaction. If there's an error condition, > trying to go through more hoops will probably just get us into more > trouble. 
If the error is -ENOMEM, we don't want to allocate new memory > for the new transaction. If the error is -EIO, we probably don't > want to encourage more writing either. > > So on the one hand, it might be good to get rid of the buffer descriptors > so we don't leak memory, but that's probably also done elsewhere. > I have not chased down what happens in that case, but the same thing > would happen in the existing -EIO case a few lines above. > > On the other hand, we probably don't want to start a new transaction > and start adding revokes to it, and such, due to the error. > > Perhaps Steve Whitehouse can weigh in? > > Regards, > > Bob Peterson > Red Hat File Systems Yes, we probably do want to skip the ail flush if there is an error. We don't know whether the error is permanent or transient at that stage. If a previous stage of the fsync has failed, then there may be nothing for the next stage to do anyway, so it is probably not a big deal either way. So long as the error is reported to the caller, then we should be ok, Steve. ^ permalink raw reply [flat|nested] 87+ messages in thread
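Steve's condition, that the error is reported to the caller, is what the errseq_t conversion is meant to guarantee. Each open file samples the mapping's error state and later reports any error recorded since that sample, once per file. The following is a simplified, single-threaded model of that idea. The type and function names are hypothetical. The bit layout (a 12-bit errno, a "seen" flag, and a counter above it) is an assumption loosely modelled on lib/errseq.c, without its atomics or its exact sampling rules.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t eseq_t; /* hypothetical stand-in for errseq_t */

#define ES_ERRNO_BITS 12
#define ES_ERRNO_MASK ((1u << ES_ERRNO_BITS) - 1) /* low bits: errno value */
#define ES_SEEN       (1u << ES_ERRNO_BITS)        /* someone reported it */
#define ES_CTR_INC    (1u << (ES_ERRNO_BITS + 1))  /* counter lives above */

/* Writeback path: record a new error (err is a negative errno). */
static void es_set(eseq_t *es, int err)
{
	assert(err < 0 && (eseq_t)(-err) <= ES_ERRNO_MASK);
	eseq_t old = *es;
	eseq_t new = (old & ~(ES_ERRNO_MASK | ES_SEEN)) | (eseq_t)(-err);

	/* Bump the counter only if the previous error was already seen,
	 * so repeated unseen errors collapse into one report. */
	if (old & ES_SEEN)
		new += ES_CTR_INC;
	*es = new;
}

/* open(): remember the current error state for this file. */
static eseq_t es_sample(const eseq_t *es)
{
	return *es;
}

/* fsync(): return the error once per file if it changed since the sample. */
static int es_check_and_advance(eseq_t *es, eseq_t *since)
{
	eseq_t cur = *es;

	if ((cur & ~ES_SEEN) == (*since & ~ES_SEEN))
		return 0;
	*es = cur | ES_SEEN;    /* mark as reported (affects counter bumps) */
	*since = cur | ES_SEEN; /* this file has now seen it */
	return -(int)(cur & ES_ERRNO_MASK);
}
```

In this model, two descriptors opened before a writeback failure each get the error (here -5, i.e. -EIO) from their next fsync and 0 after that, until a new error is recorded.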
* Re: [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync @ 2017-07-28 12:37 ` Steven Whitehouse 0 siblings, 0 replies; 87+ messages in thread From: Steven Whitehouse @ 2017-07-28 12:37 UTC (permalink / raw) To: Bob Peterson, Jeff Layton Cc: Matthew Wilcox, Jeff Layton, Alexander Viro, Jan Kara, J . Bruce Fields, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm, cluster-devel Hi, On 27/07/17 13:47, Bob Peterson wrote: > ----- Original Message ----- > | On Wed, 2017-07-26 at 12:21 -0700, Matthew Wilcox wrote: > | > On Wed, Jul 26, 2017 at 01:55:38PM -0400, Jeff Layton wrote: > | > > @@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t > | > > start, loff_t end, > | > > if (ret) > | > > return ret; > | > > if (gfs2_is_jdata(ip)) > | > > - filemap_write_and_wait(mapping); > | > > + ret = file_write_and_wait(file); > | > > + if (ret) > | > > + return ret; > | > > gfs2_ail_flush(ip->i_gl, 1); > | > > } > | > > | > Do we want to skip flushing the AIL if there was an error (possibly > | > previously encountered)? I'd think we'd want to flush the AIL then report > | > the error, like this: > | > > | > | I wondered about that. Note that earlier in the function, we also bail > | out without flushing the AIL if sync_inode_metadata fails, so I assumed > | that we'd want to do the same here. > | > | I could definitely be wrong and am fine with changing it if so. > | Discarding the error like we do today seems wrong though. > | > | Bob, thoughts? > > Hi Jeff, Matthew, > > I'm not sure there's a right or wrong answer here. I don't know what's > best from a "correctness" point of view. > > I guess I'm leaning toward Jeff's original solution where we don't > call gfs2_ail_flush() on error. The main purpose of ail_flush is to > go through buffer descriptors (bds) attached to the glock and generate > revokes for them in a new transaction. 
If there's an error condition, > trying to go through more hoops will probably just get us into more > trouble. If the error is -ENOMEM, we don't want to allocate new memory > for the new transaction. If the error is -EIO, we probably don't > want to encourage more writing either. > > So on the one hand, it might be good to get rid of the buffer descriptors > so we don't leak memory, but that's probably also done elsewhere. > I have not chased down what happens in that case, but the same thing > would happen in the existing -EIO case a few lines above. > > On the other hand, we probably don't want to start a new transaction > and start adding revokes to it, and such, due to the error. > > Perhaps Steve Whitehouse can weigh in? > > Regards, > > Bob Peterson > Red Hat File Systems Yes, we probably do want to skip the ail flush if there is an error. We don't know whether the error is permanent or transient at that stage. If a previous stage of the fsync has failed, then there may be nothing for the next stage to do anyway, so it is probably not a big deal either way. So long as the error is reported to the caller, then we should be ok, Steve. ^ permalink raw reply [flat|nested] 87+ messages in thread
* [Cluster-devel] [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync 2017-07-28 12:37 ` Steven Whitehouse (?) @ 2017-07-28 12:47 ` Jeff Layton -1 siblings, 0 replies; 87+ messages in thread From: Jeff Layton @ 2017-07-28 12:47 UTC (permalink / raw) To: cluster-devel.redhat.com On Fri, 2017-07-28 at 13:37 +0100, Steven Whitehouse wrote: > Hi, > > > On 27/07/17 13:47, Bob Peterson wrote: > > ----- Original Message ----- > > > On Wed, 2017-07-26 at 12:21 -0700, Matthew Wilcox wrote: > > > > On Wed, Jul 26, 2017 at 01:55:38PM -0400, Jeff Layton wrote: > > > > > @@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t > > > > > start, loff_t end, > > > > > if (ret) > > > > > return ret; > > > > > if (gfs2_is_jdata(ip)) > > > > > - filemap_write_and_wait(mapping); > > > > > + ret = file_write_and_wait(file); > > > > > + if (ret) > > > > > + return ret; > > > > > gfs2_ail_flush(ip->i_gl, 1); > > > > > } > > > > > > > > Do we want to skip flushing the AIL if there was an error (possibly > > > > previously encountered)? I'd think we'd want to flush the AIL then report > > > > the error, like this: > > > > > > > > > > I wondered about that. Note that earlier in the function, we also bail > > > out without flushing the AIL if sync_inode_metadata fails, so I assumed > > > that we'd want to do the same here. > > > > > > I could definitely be wrong and am fine with changing it if so. > > > Discarding the error like we do today seems wrong though. > > > > > > Bob, thoughts? > > > > Hi Jeff, Matthew, > > > > I'm not sure there's a right or wrong answer here. I don't know what's > > best from a "correctness" point of view. > > > > I guess I'm leaning toward Jeff's original solution where we don't > > call gfs2_ail_flush() on error. The main purpose of ail_flush is to > > go through buffer descriptors (bds) attached to the glock and generate > > revokes for them in a new transaction. 
If there's an error condition, > > trying to go through more hoops will probably just get us into more > > trouble. If the error is -ENOMEM, we don't want to allocate new memory > > for the new transaction. If the error is -EIO, we probably don't > > want to encourage more writing either. > > > > So on the one hand, it might be good to get rid of the buffer descriptors > > so we don't leak memory, but that's probably also done elsewhere. > > I have not chased down what happens in that case, but the same thing > > would happen in the existing -EIO case a few lines above. > > > > On the other hand, we probably don't want to start a new transaction > > and start adding revokes to it, and such, due to the error. > > > > Perhaps Steve Whitehouse can weigh in? > > > > Regards, > > > > Bob Peterson > > Red Hat File Systems > > Yes, we probably do want to skip the ail flush if there is an error. We > don't know whether the error is permanent or transient at that stage. If > a previous stage of the fsync has failed, then there may be nothing for > the next stage to do anyway, so it is probably not a big deal either > way. So long as the error is reported to the caller, then we should be ok, > Ok, cool. I'll plan to carry this patch for now as it depends on an earlier one in the series. One more question though: Is it correct in the gfs2_is_jdata case to ignore the range that was passed in from the caller? ->fsync gets start and end arguments, but this will always write back the whole range. Is that necessary in this case? -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 87+ messages in thread
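Jeff's question concerns honouring the ->fsync byte range. The sketch below uses a toy page cache: a dirty-flag array, assumed 4 KiB pages, and all names hypothetical. It shows how an inclusive [start, end] byte range maps to page indices, and how the whole-file case is the same call with end set to LLONG_MAX. Mainline kernels also provide a ranged helper, file_write_and_wait_range(), that takes the same start/end pair. The code here does not model or call it.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define TOY_PAGE_SHIFT 12 /* assumed 4 KiB pages */
#define TOY_NPAGES     16 /* hypothetical 64 KiB file */

static bool dirty[TOY_NPAGES]; /* toy page cache: true = page needs writeback */

/* "Write back" (here: just clean) dirty pages overlapping [start, end].
 * Both offsets are inclusive byte positions, as passed to ->fsync(). */
static int toy_write_range(long long start, long long end)
{
	assert(start >= 0 && start <= end);
	long long first = start >> TOY_PAGE_SHIFT;
	long long last = end >> TOY_PAGE_SHIFT;
	int written = 0;

	if (last >= TOY_NPAGES)
		last = TOY_NPAGES - 1; /* clamp, e.g. end == LLONG_MAX */
	for (long long i = first; i <= last; i++) {
		if (dirty[i]) {
			dirty[i] = false;
			written++;
		}
	}
	return written;
}

/* Whole-file variant: the same call with the range widened to everything. */
static int toy_write_all(void)
{
	return toy_write_range(0, LLONG_MAX);
}
```

The ranged form writes only the pages the caller asked about, while the whole-file form may write many more. Per Steve's reply, that difference is unlikely to be noticed for jdata files.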
* Re: [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync @ 2017-07-28 12:47 ` Jeff Layton 0 siblings, 0 replies; 87+ messages in thread From: Jeff Layton @ 2017-07-28 12:47 UTC (permalink / raw) To: Steven Whitehouse, Bob Peterson Cc: Matthew Wilcox, Jeff Layton, Alexander Viro, Jan Kara, J . Bruce Fields, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm, cluster-devel On Fri, 2017-07-28 at 13:37 +0100, Steven Whitehouse wrote: > Hi, > > > On 27/07/17 13:47, Bob Peterson wrote: > > ----- Original Message ----- > > > On Wed, 2017-07-26 at 12:21 -0700, Matthew Wilcox wrote: > > > > On Wed, Jul 26, 2017 at 01:55:38PM -0400, Jeff Layton wrote: > > > > > @@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t > > > > > start, loff_t end, > > > > > if (ret) > > > > > return ret; > > > > > if (gfs2_is_jdata(ip)) > > > > > - filemap_write_and_wait(mapping); > > > > > + ret = file_write_and_wait(file); > > > > > + if (ret) > > > > > + return ret; > > > > > gfs2_ail_flush(ip->i_gl, 1); > > > > > } > > > > > > > > Do we want to skip flushing the AIL if there was an error (possibly > > > > previously encountered)? I'd think we'd want to flush the AIL then report > > > > the error, like this: > > > > > > > > > > I wondered about that. Note that earlier in the function, we also bail > > > out without flushing the AIL if sync_inode_metadata fails, so I assumed > > > that we'd want to do the same here. > > > > > > I could definitely be wrong and am fine with changing it if so. > > > Discarding the error like we do today seems wrong though. > > > > > > Bob, thoughts? > > > > Hi Jeff, Matthew, > > > > I'm not sure there's a right or wrong answer here. I don't know what's > > best from a "correctness" point of view. > > > > I guess I'm leaning toward Jeff's original solution where we don't > > call gfs2_ail_flush() on error. 
The main purpose of ail_flush is to > > go through buffer descriptors (bds) attached to the glock and generate > > revokes for them in a new transaction. If there's an error condition, > > trying to go through more hoops will probably just get us into more > > trouble. If the error is -ENOMEM, we don't want to allocate new memory > > for the new transaction. If the error is -EIO, we probably don't > > want to encourage more writing either. > > > > So on the one hand, it might be good to get rid of the buffer descriptors > > so we don't leak memory, but that's probably also done elsewhere. > > I have not chased down what happens in that case, but the same thing > > would happen in the existing -EIO case a few lines above. > > > > On the other hand, we probably don't want to start a new transaction > > and start adding revokes to it, and such, due to the error. > > > > Perhaps Steve Whitehouse can weigh in? > > > > Regards, > > > > Bob Peterson > > Red Hat File Systems > > Yes, we probably do want to skip the ail flush if there is an error. We > don't know whether the error is permanent or transient at that stage. If > a previous stage of the fsync has failed, then there may be nothing for > the next stage to do anyway, so it is probably not a big deal either > way. So long as the error is reported to the caller, then we should be ok, > Ok, cool. I'll plan to carry this patch for now as it depends on an earlier one in the series. One more question though: Is it correct in the gfs2_is_jdata case to ignore the range that was passed in from the caller? ->fsync gets start and end arguments, but this will always write back the whole range. Is that necessary in this case? -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 87+ messages in thread
* [Cluster-devel] [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync 2017-07-28 12:47 ` Jeff Layton (?) @ 2017-07-28 12:54 ` Steven Whitehouse -1 siblings, 0 replies; 87+ messages in thread From: Steven Whitehouse @ 2017-07-28 12:54 UTC (permalink / raw) To: cluster-devel.redhat.com Hi, On 28/07/17 13:47, Jeff Layton wrote: > On Fri, 2017-07-28 at 13:37 +0100, Steven Whitehouse wrote: >> Hi, >> >> >> On 27/07/17 13:47, Bob Peterson wrote: >>> ----- Original Message ----- >>>> On Wed, 2017-07-26 at 12:21 -0700, Matthew Wilcox wrote: >>>>> On Wed, Jul 26, 2017 at 01:55:38PM -0400, Jeff Layton wrote: >>>>>> @@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t >>>>>> start, loff_t end, >>>>>> if (ret) >>>>>> return ret; >>>>>> if (gfs2_is_jdata(ip)) >>>>>> - filemap_write_and_wait(mapping); >>>>>> + ret = file_write_and_wait(file); >>>>>> + if (ret) >>>>>> + return ret; >>>>>> gfs2_ail_flush(ip->i_gl, 1); >>>>>> } >>>>> Do we want to skip flushing the AIL if there was an error (possibly >>>>> previously encountered)? I'd think we'd want to flush the AIL then report >>>>> the error, like this: >>>>> >>>> I wondered about that. Note that earlier in the function, we also bail >>>> out without flushing the AIL if sync_inode_metadata fails, so I assumed >>>> that we'd want to do the same here. >>>> >>>> I could definitely be wrong and am fine with changing it if so. >>>> Discarding the error like we do today seems wrong though. >>>> >>>> Bob, thoughts? >>> Hi Jeff, Matthew, >>> >>> I'm not sure there's a right or wrong answer here. I don't know what's >>> best from a "correctness" point of view. >>> >>> I guess I'm leaning toward Jeff's original solution where we don't >>> call gfs2_ail_flush() on error. The main purpose of ail_flush is to >>> go through buffer descriptors (bds) attached to the glock and generate >>> revokes for them in a new transaction. 
If there's an error condition, >>> trying to go through more hoops will probably just get us into more >>> trouble. If the error is -ENOMEM, we don't want to allocate new memory >>> for the new transaction. If the error is -EIO, we probably don't >>> want to encourage more writing either. >>> >>> So on the one hand, it might be good to get rid of the buffer descriptors >>> so we don't leak memory, but that's probably also done elsewhere. >>> I have not chased down what happens in that case, but the same thing >>> would happen in the existing -EIO case a few lines above. >>> >>> On the other hand, we probably don't want to start a new transaction >>> and start adding revokes to it, and such, due to the error. >>> >>> Perhaps Steve Whitehouse can weigh in? >>> >>> Regards, >>> >>> Bob Peterson >>> Red Hat File Systems >> Yes, we probably do want to skip the ail flush if there is an error. We >> don't know whether the error is permanent or transient at that stage. If >> a previous stage of the fsync has failed, then there may be nothing for >> the next stage to do anyway, so it is probably not a big deal either >> way. So long as the error is reported to the caller, then we should be ok, >> > Ok, cool. I'll plan to carry this patch for now as it depends on an > earlier one in the series. One more question though: > > Is it correct in the gfs2_is_jdata case to ignore the range that was > passed in from the caller? ->fsync gets start and end arguments, but > this will always write back the whole range. Is that necessary in this > case? > It probably doesn't matter really. We try to discourage the use of jdata from userspace. There are a few internal files that use it still, and it is there for backwards compatibility more than anything. So performance is generally not a problem for that. The ordered write mode is the important one. 
So you are right that it might be better to add the range into that call too, but it is not likely that anybody will notice the performance improvement, Steve. ^ permalink raw reply [flat|nested] 87+ messages in thread
Thread overview:
2017-07-26 17:55 Jeff Layton: [PATCH v2 0/4] mm/gfs2: extend file_* API, and convert gfs2 to errseq_t error reporting
2017-07-26 17:55 Jeff Layton: [PATCH v2 1/4] mm: consolidate dax / non-dax checks for writeback
  2017-07-27 8:43 Jan Kara
2017-07-26 17:55 Jeff Layton: [PATCH v2 2/4] mm: add file_fdatawait_range and file_write_and_wait
  2017-07-26 19:13 Matthew Wilcox
  2017-07-26 22:18 Jeff Layton
  2017-07-26 19:50 Bob Peterson
  2017-07-27 8:49 Jan Kara
  2017-07-27 12:48 Jeff Layton
  2017-07-31 11:27 Jeff Layton
  2017-07-31 11:32 Steven Whitehouse
  2017-07-31 11:44 Jeff Layton
  2017-07-31 12:05 Steven Whitehouse
  2017-07-31 12:22 Jeff Layton
  2017-07-31 12:25 Steven Whitehouse
  2017-07-31 12:38 Bob Peterson
  2017-07-31 12:07 Jan Kara
  2017-07-31 13:00 Jeff Layton
  2017-07-31 13:32 Jan Kara
  2017-07-31 16:49 Jeff Layton: [PATCH v3] mm: add file_fdatawait_range and file_write_and_wait
  2017-08-01 9:52 Jan Kara
2017-07-26 17:55 Jeff Layton: [PATCH v2 3/4] fs: convert sync_file_range to use errseq_t based error-tracking
2017-07-26 17:55 Jeff Layton: [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync
  2017-07-26 19:21 Matthew Wilcox
  2017-07-26 22:22 Jeff Layton
  2017-07-27 12:47 Bob Peterson
  2017-07-28 12:37 Steven Whitehouse
  2017-07-28 12:47 Jeff Layton
  2017-07-28 12:54 Steven Whitehouse