* [PATCH 1/1] nilfs2: remove unnecessary call to nilfs_construct_dsync_segment()
@ 2014-11-01 17:01 Andreas Rohner
[not found] ` <1414861267-13103-1-git-send-email-andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
0 siblings, 1 reply; 7+ messages in thread
From: Andreas Rohner @ 2014-11-01 17:01 UTC (permalink / raw)
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA; +Cc: Andreas Rohner
If some of the pages between start and end are dirty, then
filemap_write_and_wait_range() calls nilfs_writepages() with WB_SYNC_ALL
set in the writeback_control structure. This initiates the construction
of a dsync segment via nilfs_construct_dsync_segment(). The problem is,
that the construction of a dsync segment doesnt't remove the inode from
die i_dirty list and doesn't clear the NILFS_I_DIRTY flag. So
nilfs_inode_dirty() still returns true after
nilfs_construct_dsync_segment() succeded. This leads to an
unnecessary second call to nilfs_construct_dsync_segment() in
nilfs_sync_file() if datasync is true.
This patch simply removes the second invokation of
nilfs_construct_dsync_segment().
Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
---
fs/nilfs2/file.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/fs/nilfs2/file.c b/fs/nilfs2/file.c
index e9e3325..b12e0ab 100644
--- a/fs/nilfs2/file.c
+++ b/fs/nilfs2/file.c
@@ -46,13 +46,9 @@ int nilfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
return err;
mutex_lock(&inode->i_mutex);
- if (nilfs_inode_dirty(inode)) {
- if (datasync)
- err = nilfs_construct_dsync_segment(inode->i_sb, inode,
- 0, LLONG_MAX);
- else
- err = nilfs_construct_segment(inode->i_sb);
- }
+ if (!datasync && nilfs_inode_dirty(inode))
+ err = nilfs_construct_segment(inode->i_sb);
+
mutex_unlock(&inode->i_mutex);
nilfs = inode->i_sb->s_fs_info;
--
2.1.3
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 7+ messages in thread[parent not found: <1414861267-13103-1-git-send-email-andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>]
* Re: [PATCH 1/1] nilfs2: remove unnecessary call to nilfs_construct_dsync_segment() [not found] ` <1414861267-13103-1-git-send-email-andreas.rohner-hi6Y0CQ0nG0@public.gmane.org> @ 2014-11-04 14:34 ` Ryusuke Konishi [not found] ` <20141104.233458.804333886341114381.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> 0 siblings, 1 reply; 7+ messages in thread From: Ryusuke Konishi @ 2014-11-04 14:34 UTC (permalink / raw) To: Andreas Rohner; +Cc: linux-nilfs Hi Andreas, On Sat, 1 Nov 2014 18:01:07 +0100, Andreas Rohner wrote: > If some of the pages between start and end are dirty, then > filemap_write_and_wait_range() calls nilfs_writepages() with WB_SYNC_ALL > set in the writeback_control structure. This initiates the construction > of a dsync segment via nilfs_construct_dsync_segment(). The problem is, > that the construction of a dsync segment doesnt't remove the inode from > die i_dirty list and doesn't clear the NILFS_I_DIRTY flag. So > nilfs_inode_dirty() still returns true after > nilfs_construct_dsync_segment() succeded. This leads to an > unnecessary second call to nilfs_construct_dsync_segment() in > nilfs_sync_file() if datasync is true. > > This patch simply removes the second invokation of > nilfs_construct_dsync_segment(). > > Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org> Thank you for posting this patch. This optimization looks to become possible by the commit 02c24a821 "fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers". I haven't noticed that the change makes it possible to simplify nilfs_sync_file() like this. One my simple question is why you removed the call to nilfs_construct_dsync_segment() instead of filemap_write_and_wait_range(). If the datasync flag is false, nilfs_sync_file() first calls nilfs_construct_dsync_segment() via filemap_write_and_wait_range() __filemap_fdatawrite_range(,, WB_SYNC_ALL) do_writepages() nilfs_writepages() nilfs_construct_dsync_segment() and then calls nilfs_construct_segment(). Since each call to nilfs_construct_segment() or nilfs_construct_dsync_segment() implies an IO completion wait, it seems that this doubles the latency of fsync(). Do you really need to call filemap_write_and_wait_range() in nilfs_sync_file() ? Regards, Ryusuke Konishi > --- > fs/nilfs2/file.c | 10 +++------- > 1 file changed, 3 insertions(+), 7 deletions(-) > > diff --git a/fs/nilfs2/file.c b/fs/nilfs2/file.c > index e9e3325..b12e0ab 100644 > --- a/fs/nilfs2/file.c > +++ b/fs/nilfs2/file.c > @@ -46,13 +46,9 @@ int nilfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync) > return err; > mutex_lock(&inode->i_mutex); > > - if (nilfs_inode_dirty(inode)) { > - if (datasync) > - err = nilfs_construct_dsync_segment(inode->i_sb, inode, > - 0, LLONG_MAX); > - else > - err = nilfs_construct_segment(inode->i_sb); > - } > + if (!datasync && nilfs_inode_dirty(inode)) > + err = nilfs_construct_segment(inode->i_sb); > + > mutex_unlock(&inode->i_mutex); > > nilfs = inode->i_sb->s_fs_info; > -- > 2.1.3 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <20141104.233458.804333886341114381.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>]
* Re: [PATCH 1/1] nilfs2: remove unnecessary call to nilfs_construct_dsync_segment() [not found] ` <20141104.233458.804333886341114381.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> @ 2014-11-04 15:50 ` Andreas Rohner [not found] ` <5458F5BD.2030908-hi6Y0CQ0nG0@public.gmane.org> 0 siblings, 1 reply; 7+ messages in thread From: Andreas Rohner @ 2014-11-04 15:50 UTC (permalink / raw) To: Ryusuke Konishi; +Cc: linux-nilfs Hi Ryusuke, On 2014-11-04 15:34, Ryusuke Konishi wrote: > Hi Andreas, > On Sat, 1 Nov 2014 18:01:07 +0100, Andreas Rohner wrote: >> If some of the pages between start and end are dirty, then >> filemap_write_and_wait_range() calls nilfs_writepages() with WB_SYNC_ALL >> set in the writeback_control structure. This initiates the construction >> of a dsync segment via nilfs_construct_dsync_segment(). The problem is, >> that the construction of a dsync segment doesnt't remove the inode from >> die i_dirty list and doesn't clear the NILFS_I_DIRTY flag. So >> nilfs_inode_dirty() still returns true after >> nilfs_construct_dsync_segment() succeded. This leads to an >> unnecessary second call to nilfs_construct_dsync_segment() in >> nilfs_sync_file() if datasync is true. >> >> This patch simply removes the second invokation of >> nilfs_construct_dsync_segment(). >> >> Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org> > > Thank you for posting this patch. > > This optimization looks to become possible by the commit 02c24a821 > "fs: push i_mutex and filemap_write_and_wait down into ->fsync() > handlers". I haven't noticed that the change makes it possible to > simplify nilfs_sync_file() like this. > > One my simple question is why you removed the call to > nilfs_construct_dsync_segment() instead of > filemap_write_and_wait_range(). > > If the datasync flag is false, nilfs_sync_file() first calls > nilfs_construct_dsync_segment() via > > filemap_write_and_wait_range() > __filemap_fdatawrite_range(,, WB_SYNC_ALL) > do_writepages() > nilfs_writepages() > nilfs_construct_dsync_segment() > > and then calls nilfs_construct_segment(). Exactly. > Since each call to nilfs_construct_segment() or > nilfs_construct_dsync_segment() implies an IO completion wait, it > seems that this doubles the latency of fsync(). > > Do you really need to call filemap_write_and_wait_range() in > nilfs_sync_file() ? I don't think we need it, but I found the following paragraph in Documentation/filesystems/porting: [mandatory] If you have your own ->fsync() you must make sure to call filemap_write_and_wait_range() so that all dirty pages are synced out properly. You must also keep in mind that ->fsync() is not called with i_mutex held anymore, so if you require i_mutex locking you must make sure to take it and release it yourself. So I was unsure, if it is safe to remove it. But maybe I interpreted that wrongly, since nilfs_construct_dsync_segment() and nilfs_construct_segment() write out all dirty pages anyway, there is no need for filemap_write_and_wait_range(). Also do we need i_mutex? As far as I can tell all relevant code blocks are wrapped in nilfs_transaction_begin/commit/abort(). Best regards, Andreas Rohner > Regards, > Ryusuke Konishi > > >> --- >> fs/nilfs2/file.c | 10 +++------- >> 1 file changed, 3 insertions(+), 7 deletions(-) >> >> diff --git a/fs/nilfs2/file.c b/fs/nilfs2/file.c >> index e9e3325..b12e0ab 100644 >> --- a/fs/nilfs2/file.c >> +++ b/fs/nilfs2/file.c >> @@ -46,13 +46,9 @@ int nilfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync) >> return err; >> mutex_lock(&inode->i_mutex); >> >> - if (nilfs_inode_dirty(inode)) { >> - if (datasync) >> - err = nilfs_construct_dsync_segment(inode->i_sb, inode, >> - 0, LLONG_MAX); >> - else >> - err = nilfs_construct_segment(inode->i_sb); >> - } >> + if (!datasync && nilfs_inode_dirty(inode)) >> + err = nilfs_construct_segment(inode->i_sb); >> + >> mutex_unlock(&inode->i_mutex); >> >> nilfs = inode->i_sb->s_fs_info; >> -- >> 2.1.3 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in >> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <5458F5BD.2030908-hi6Y0CQ0nG0@public.gmane.org>]
* Re: [PATCH 1/1] nilfs2: remove unnecessary call to nilfs_construct_dsync_segment() [not found] ` <5458F5BD.2030908-hi6Y0CQ0nG0@public.gmane.org> @ 2014-11-05 0:07 ` Ryusuke Konishi [not found] ` <20141105.090704.583801206386960498.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> 0 siblings, 1 reply; 7+ messages in thread From: Ryusuke Konishi @ 2014-11-05 0:07 UTC (permalink / raw) To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA On Tue, 04 Nov 2014 16:50:21 +0100, Andreas Rohner wrote: > On 2014-11-04 15:34, Ryusuke Konishi wrote: >> Since each call to nilfs_construct_segment() or >> nilfs_construct_dsync_segment() implies an IO completion wait, it >> seems that this doubles the latency of fsync(). >> >> Do you really need to call filemap_write_and_wait_range() in >> nilfs_sync_file() ? > > I don't think we need it, but I found the following paragraph in > Documentation/filesystems/porting: > > [mandatory] > If you have your own ->fsync() you must make sure to call > filemap_write_and_wait_range() so that all dirty pages are synced out > properly. You must also keep in mind that ->fsync() is not called with > i_mutex held anymore, so if you require i_mutex locking you must make > sure to take it and release it yourself. > > So I was unsure, if it is safe to remove it. But maybe I interpreted > that wrongly, since nilfs_construct_dsync_segment() and > nilfs_construct_segment() write out all dirty pages anyway, there is no > need for filemap_write_and_wait_range(). I found filemap_write_and_wait_range() returns error status of already done page I/Os via filemap_check_errors(). We need to look into what it does. > Also do we need i_mutex? As far as I can tell all relevant code blocks > are wrapped in nilfs_transaction_begin/commit/abort(). Yes, we may also remove the i_mutex. We have to confirm what i_mutex protects for nilfs. Regards, Ryusuke Konishi -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <20141105.090704.583801206386960498.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>]
* Re: [PATCH 1/1] nilfs2: remove unnecessary call to nilfs_construct_dsync_segment() [not found] ` <20141105.090704.583801206386960498.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> @ 2014-11-05 17:08 ` Andreas Rohner [not found] ` <545A59A9.8090506-hi6Y0CQ0nG0@public.gmane.org> 0 siblings, 1 reply; 7+ messages in thread From: Andreas Rohner @ 2014-11-05 17:08 UTC (permalink / raw) To: Ryusuke Konishi; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA On 2014-11-05 01:07, Ryusuke Konishi wrote: > On Tue, 04 Nov 2014 16:50:21 +0100, Andreas Rohner wrote: >> On 2014-11-04 15:34, Ryusuke Konishi wrote: >>> Since each call to nilfs_construct_segment() or >>> nilfs_construct_dsync_segment() implies an IO completion wait, it >>> seems that this doubles the latency of fsync(). >>> >>> Do you really need to call filemap_write_and_wait_range() in >>> nilfs_sync_file() ? >> >> I don't think we need it, but I found the following paragraph in >> Documentation/filesystems/porting: >> >> [mandatory] >> If you have your own ->fsync() you must make sure to call >> filemap_write_and_wait_range() so that all dirty pages are synced out >> properly. You must also keep in mind that ->fsync() is not called with >> i_mutex held anymore, so if you require i_mutex locking you must make >> sure to take it and release it yourself. >> >> So I was unsure, if it is safe to remove it. But maybe I interpreted >> that wrongly, since nilfs_construct_dsync_segment() and >> nilfs_construct_segment() write out all dirty pages anyway, there is no >> need for filemap_write_and_wait_range(). > > I found filemap_write_and_wait_range() returns error status of > already done page I/Os via filemap_check_errors(). We need to > look into what it does. I have looked into this a bit. AS_EIO and AS_ENOSPC are asynchronous error flags, set by the function mapping_set_error(). However I don't think this is relevant for NILFS2, because it implements its own writepages() function: nilfs_sync_file() filemap_write_and_wait_range() __filemap_fdatawrite_range() do_writepages() writepages() nilfs_writepages() mapping_set_error() would only be called if NILFS2 would use generic_writepages() like this: nilfs_sync_file() filemap_write_and_wait_range() __filemap_fdatawrite_range() do_writepages() generic_writepages() But it doesn't, so we can ignore filemap_check_errors(). Furthermore NILFS2 doesn't use the generic writeback mechanism of the kernel at all. It creates its own bio in nilfs_segbuf_submit_bh(), submits the bio with nilfs_segbuf_submit_bio() and waits for it with nilfs_segbuf_wait() and records IO-errors in segbuf->sb_err, so there is no need to check AS_EIO and AS_ENOSPC. I think filemap_write_and_wait_range() is mostly useful for in place updates. A copy on write filesystem like NILFS2 doesn't need it. BTRFS doesn't use it either in its fsync function... >> Also do we need i_mutex? As far as I can tell all relevant code blocks >> are wrapped in nilfs_transaction_begin/commit/abort(). > > Yes, we may also remove the i_mutex. We have to confirm what i_mutex > protects for nilfs. There are some callback functions which are called with i_mutex already held, but I can't find documentation about that right now. I'm sure I saw it somewhere. Anyway I am going to look into this as well. Regards, Andreas Rohner > Regards, > Ryusuke Konishi > -- > To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <545A59A9.8090506-hi6Y0CQ0nG0@public.gmane.org>]
* Re: [PATCH 1/1] nilfs2: remove unnecessary call to nilfs_construct_dsync_segment() [not found] ` <545A59A9.8090506-hi6Y0CQ0nG0@public.gmane.org> @ 2014-11-11 16:58 ` Ryusuke Konishi [not found] ` <20141112.015839.1565017343097841658.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> 0 siblings, 1 reply; 7+ messages in thread From: Ryusuke Konishi @ 2014-11-11 16:58 UTC (permalink / raw) To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA On Wed, 05 Nov 2014 18:08:57 +0100, Andreas Rohner wrote: > On 2014-11-05 01:07, Ryusuke Konishi wrote: >> On Tue, 04 Nov 2014 16:50:21 +0100, Andreas Rohner wrote: >> >> I found filemap_write_and_wait_range() returns error status of >> already done page I/Os via filemap_check_errors(). We need to >> look into what it does. > > I have looked into this a bit. AS_EIO and AS_ENOSPC are asynchronous > error flags, set by the function mapping_set_error(). However I don't > think this is relevant for NILFS2, because it implements its own > writepages() function: > > nilfs_sync_file() > filemap_write_and_wait_range() > __filemap_fdatawrite_range() > do_writepages() > writepages() > nilfs_writepages() > > mapping_set_error() would only be called if NILFS2 would use > generic_writepages() like this: > > nilfs_sync_file() > filemap_write_and_wait_range() > __filemap_fdatawrite_range() > do_writepages() > generic_writepages() > > But it doesn't, so we can ignore filemap_check_errors(). Furthermore > NILFS2 doesn't use the generic writeback mechanism of the kernel at all. > It creates its own bio in nilfs_segbuf_submit_bh(), submits the bio with > nilfs_segbuf_submit_bio() and waits for it with nilfs_segbuf_wait() and > records IO-errors in segbuf->sb_err, so there is no need to check AS_EIO > and AS_ENOSPC. > > I think filemap_write_and_wait_range() is mostly useful for in place > updates. A copy on write filesystem like NILFS2 doesn't need it. BTRFS > doesn't use it either in its fsync function... OK. I confirmed the current NILFS2 doesn't need to check AS_EIO and AS_ENOSPC because NILFS2 doesn't evict erroneous pages from the page cache; NILFS2 tries to keep such pages until they will be successfully written back to disk. Therefore it can detect error of already finished page-IOs without calling filemap_fdatawait_range() or filemap_check_errors(). On the other hand, regular filesystems need to these functions in fsync() because they will clear the dirty flags of pages and buffers even if the writeback failed due to an IO error or a disk full error. Without AS_EIO and AS_ENOSPC check, fsync() of these filesystems can miss the errors. However this design policy of NILFS2 has a defect that the system easily falls into a memory shortage with too many dirty pages when I/O errors will block. If we would introduce a simliar logic to NILFS2, it will need the following changes: - nilfs_abort_logs() and nilfs_end_page_io() are changed so that they call mapping_set_error() instead of redirtying data pages on error. - nilfs_sync_file() will be also changed so that it first calls filemap_fdatawait_range() and catches errors previously happened. Regards, Ryusuke Konishi -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <20141112.015839.1565017343097841658.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>]
* Re: [PATCH 1/1] nilfs2: remove unnecessary call to nilfs_construct_dsync_segment() [not found] ` <20141112.015839.1565017343097841658.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> @ 2014-11-11 21:19 ` Andreas Rohner 0 siblings, 0 replies; 7+ messages in thread From: Andreas Rohner @ 2014-11-11 21:19 UTC (permalink / raw) To: Ryusuke Konishi; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA On 2014-11-11 17:58, Ryusuke Konishi wrote: > On Wed, 05 Nov 2014 18:08:57 +0100, Andreas Rohner wrote: >> On 2014-11-05 01:07, Ryusuke Konishi wrote: >>> On Tue, 04 Nov 2014 16:50:21 +0100, Andreas Rohner wrote: >>> >>> I found filemap_write_and_wait_range() returns error status of >>> already done page I/Os via filemap_check_errors(). We need to >>> look into what it does. >> >> I have looked into this a bit. AS_EIO and AS_ENOSPC are asynchronous >> error flags, set by the function mapping_set_error(). However I don't >> think this is relevant for NILFS2, because it implements its own >> writepages() function: >> >> nilfs_sync_file() >> filemap_write_and_wait_range() >> __filemap_fdatawrite_range() >> do_writepages() >> writepages() >> nilfs_writepages() >> >> mapping_set_error() would only be called if NILFS2 would use >> generic_writepages() like this: >> >> nilfs_sync_file() >> filemap_write_and_wait_range() >> __filemap_fdatawrite_range() >> do_writepages() >> generic_writepages() >> >> But it doesn't, so we can ignore filemap_check_errors(). Furthermore >> NILFS2 doesn't use the generic writeback mechanism of the kernel at all. >> It creates its own bio in nilfs_segbuf_submit_bh(), submits the bio with >> nilfs_segbuf_submit_bio() and waits for it with nilfs_segbuf_wait() and >> records IO-errors in segbuf->sb_err, so there is no need to check AS_EIO >> and AS_ENOSPC. >> >> I think filemap_write_and_wait_range() is mostly useful for in place >> updates. A copy on write filesystem like NILFS2 doesn't need it. BTRFS >> doesn't use it either in its fsync function... > > OK. I confirmed the current NILFS2 doesn't need to check AS_EIO and > AS_ENOSPC because NILFS2 doesn't evict erroneous pages from the page > cache; NILFS2 tries to keep such pages until they will be successfully > written back to disk. Therefore it can detect error of already > finished page-IOs without calling filemap_fdatawait_range() or > filemap_check_errors(). > > On the other hand, regular filesystems need to these functions in > fsync() because they will clear the dirty flags of pages and buffers > even if the writeback failed due to an IO error or a disk full error. > Without AS_EIO and AS_ENOSPC check, fsync() of these filesystems > can miss the errors. So it is necessary to catch the errors that happened before the call to fsync() and if the pages are not redirtied, the errors could be missed. I didn't know that. > However this design policy of NILFS2 has a defect that the system > easily falls into a memory shortage with too many dirty pages when > I/O errors will block. > > If we would introduce a simliar logic to NILFS2, it will need the > following changes: > > - nilfs_abort_logs() and nilfs_end_page_io() are changed so that they > call mapping_set_error() instead of redirtying data pages on error. > > - nilfs_sync_file() will be also changed so that it first calls > filemap_fdatawait_range() and catches errors previously happened. I don't know if I fully understand the problem. So I may be completely wrong here. But if there is only one bad sector somewhere, which causes an I/O Error, you could potentially lose the data from a whole segment and there would possibly be data loss all over the file system. There could also be inconsistent metadata, because the metadata files would also lose data. NILFS has a good chance of recovering from an I/O error, because it will mark the segment with nilfs_sufile_set_error() and write the pages to a different segment. Isn't that worth the extra memory? Best regards, Andreas Rohner > Regards, > Ryusuke Konishi > -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2014-11-11 21:19 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-11-01 17:01 [PATCH 1/1] nilfs2: remove unnecessary call to nilfs_construct_dsync_segment() Andreas Rohner
[not found] ` <1414861267-13103-1-git-send-email-andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
2014-11-04 14:34 ` Ryusuke Konishi
[not found] ` <20141104.233458.804333886341114381.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
2014-11-04 15:50 ` Andreas Rohner
[not found] ` <5458F5BD.2030908-hi6Y0CQ0nG0@public.gmane.org>
2014-11-05 0:07 ` Ryusuke Konishi
[not found] ` <20141105.090704.583801206386960498.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
2014-11-05 17:08 ` Andreas Rohner
[not found] ` <545A59A9.8090506-hi6Y0CQ0nG0@public.gmane.org>
2014-11-11 16:58 ` Ryusuke Konishi
[not found] ` <20141112.015839.1565017343097841658.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
2014-11-11 21:19 ` Andreas Rohner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).