* Space leak in f2fs @ 2015-05-13 7:17 hujianyang 2015-05-13 17:46 ` [f2fs-dev] " Jaegeuk Kim 0 siblings, 1 reply; 11+ messages in thread From: hujianyang @ 2015-05-13 7:17 UTC (permalink / raw) To: Jaegeuk Kim Cc: Chao Yu, cm224.lee, linux-f2fs-devel, linux-fsdevel@vger.kernel.org Hi Jaegeuk, I found a space leak problem in f2fs. This problem could lead to ENOSPC error during stress tests, e.g. ltp. <<<test_output>>> growfiles(gf15): 11656 growfiles.c/2249: 16920 tlibio.c/739 write(6, buf, 1352) ret:-1, errno=28 No space left on device gf15 1 TFAIL : growfiles.c:132: Test failed ... And can be reproduced by these steps whether background_gc is on or not: 1) format a 4GB f2fs partition 2) dd a 3G file, 3) unlink it. Do these steps again and again. Soon, after one unlink operation, you can see the space of the 3G file is not free. Fs-Server:/mnt/f2fs # df . Filesystem 1K-blocks Used Available Use% Mounted on /dev/sdd3 4193280 301064 3854328 8% /mnt/f2fs Fs-Server:/mnt/f2fs # dd if=/dev/zero of=./test bs=1M count=3072 3072+0 records in 3072+0 records out 3221225472 bytes (3.2 GB) copied, 3.1892 s, 1.0 GB/s Fs-Server:/mnt/f2fs # unlink ./test Fs-Server:/mnt/f2fs # dd if=/dev/zero of=./test bs=1M count=3072 3072+0 records in 3072+0 records out 3221225472 bytes (3.2 GB) copied, 3.44288 s, 936 MB/s Fs-Server:/mnt/f2fs # unlink ./test Fs-Server:/mnt/f2fs # df . Filesystem 1K-blocks Used Available Use% Mounted on /dev/sdd3 4193280 3449888 705504 84% /mnt/f2fs Fs-Server:/mnt/f2fs # ls Fs-Server:/mnt/f2fs # ls Fs-Server:/mnt/f2fs # df . Filesystem 1K-blocks Used Available Use% Mounted on /dev/sdd3 4193280 3449888 705504 84% /mnt/f2fs Fs-Server:/mnt/f2fs # dd if=/dev/zero of=./test bs=1M count=3072 dd: writing `./test': No space left on device 689+0 records in 688+0 records out 721719296 bytes (722 MB) copied, 0.618972 s, 1.2 GB/s Fs-Server:/mnt/f2fs # df . Filesystem 1K-blocks Used Available Use% Mounted on /dev/sdd3 4193280 4155392 0 100% /mnt/f2fs Fs-Server:/mnt/f2fs # ls -l test -rw-r--r-- 1 root root 721719296 May 13 14:52 test We can reuse the leaking space after a sync call: Fs-Server:/mnt/f2fs # df . Filesystem 1K-blocks Used Available Use% Mounted on /dev/sdd3 4193280 4155392 0 100% /mnt/f2fs Fs-Server:/mnt/f2fs # sync Fs-Server:/mnt/f2fs # df . Filesystem 1K-blocks Used Available Use% Mounted on /dev/sdd3 4193280 1006568 3148824 25% /mnt/f2fs I found this may caused by .drop_inode in f2fs. see f2fs_drop_inode() diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 19438f2..7646d2a 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -424,15 +424,6 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb) static int f2fs_drop_inode(struct inode *inode) { - /* - * This is to avoid a deadlock condition like below. - * writeback_single_inode(inode) - * - f2fs_write_data_page - * - f2fs_gc -> iput -> evict - * - inode_wait_for_writeback(inode) - */ - if (!inode_unhashed(inode) && inode->i_state & I_SYNC) - return 0; return generic_drop_inode(inode); } After removing these code, this problem is fixed. But this function is introduced by commit 531ad7d58c6476c5856653448b4c7d26427502b4 to fix a deadlock problem. I wish you and other developers in this list could help me to fix this problem in a correct way. Thanks, Hu ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [f2fs-dev] Space leak in f2fs 2015-05-13 7:17 Space leak in f2fs hujianyang @ 2015-05-13 17:46 ` Jaegeuk Kim 2015-05-14 0:24 ` Jaegeuk Kim 0 siblings, 1 reply; 11+ messages in thread From: Jaegeuk Kim @ 2015-05-13 17:46 UTC (permalink / raw) To: hujianyang; +Cc: linux-fsdevel@vger.kernel.org, linux-f2fs-devel Hi Hu, Thank you for the report. When I tried to reproduce this, it seems that this can occur under somewhat stressful condition. I could't reach out to this problem. Nevertheless, I think this is a possible scenario, so I wrote a patch for this. Could you test this patch? Thanks, --- fs/f2fs/super.c | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 19438f2..4593cd1 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -431,8 +431,26 @@ static int f2fs_drop_inode(struct inode *inode) * - f2fs_gc -> iput -> evict * - inode_wait_for_writeback(inode) */ - if (!inode_unhashed(inode) && inode->i_state & I_SYNC) + if (!inode_unhashed(inode) && inode->i_state & I_SYNC) { + if (!inode->i_nlink && !is_bad_inode(inode)) { + spin_unlock(&inode->i_lock); + + i_size_write(inode, 0); + + if (F2FS_HAS_BLOCKS(inode)) + f2fs_truncate(inode); + + f2fs_lock_op(F2FS_I_SB(inode)); + remove_inode_page(inode); + f2fs_unlock_op(F2FS_I_SB(inode)); + + /* avoid any write_inode call */ + clear_inode_flag(F2FS_I(inode), FI_DIRTY_INODE); + + spin_lock(&inode->i_lock); + } return 0; + } return generic_drop_inode(inode); } -- 2.1.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: Space leak in f2fs 2015-05-13 17:46 ` [f2fs-dev] " Jaegeuk Kim @ 2015-05-14 0:24 ` Jaegeuk Kim 2015-05-14 1:40 ` hujianyang 0 siblings, 1 reply; 11+ messages in thread From: Jaegeuk Kim @ 2015-05-14 0:24 UTC (permalink / raw) To: hujianyang; +Cc: linux-fsdevel@vger.kernel.org, linux-f2fs-devel Hi Hu, Found a bug in the previous patch. Could you check this out? Thanks, --- fs/f2fs/super.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 19438f2..647591b 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -431,8 +431,17 @@ static int f2fs_drop_inode(struct inode *inode) * - f2fs_gc -> iput -> evict * - inode_wait_for_writeback(inode) */ - if (!inode_unhashed(inode) && inode->i_state & I_SYNC) + if (!inode_unhashed(inode) && inode->i_state & I_SYNC) { + if (!inode->i_nlink && !is_bad_inode(inode)) { + spin_unlock(&inode->i_lock); + i_size_write(inode, 0); + + if (F2FS_HAS_BLOCKS(inode)) + f2fs_truncate(inode); + spin_lock(&inode->i_lock); + } return 0; + } return generic_drop_inode(inode); } -- 2.1.1 ------------------------------------------------------------------------------ One dashboard for servers and applications across Physical-Virtual-Cloud Widest out-of-the-box monitoring support with 50+ applications Performance metrics, stats and reports that give you Actionable Insights Deep dive visibility with transaction tracing using APM Insight. http://ad.doubleclick.net/ddm/clk/290420510;117567292;y ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: Space leak in f2fs 2015-05-14 0:24 ` Jaegeuk Kim @ 2015-05-14 1:40 ` hujianyang 2015-05-14 1:45 ` [f2fs-dev] " Jaegeuk Kim 2015-05-14 21:14 ` Jaegeuk Kim 0 siblings, 2 replies; 11+ messages in thread From: hujianyang @ 2015-05-14 1:40 UTC (permalink / raw) To: Jaegeuk Kim; +Cc: linux-fsdevel@vger.kernel.org, linux-f2fs-devel Hi Jaegeuk, I've tested this patch. It's OK now. Seems this problem is fixed. I'd like to push this patch to my local tree and run a formal stress test next week. Will you push this patch to f2fs-dev branch? If you have other modification to this fix, please let me know. Thanks very much! Hu On 2015/5/14 8:24, Jaegeuk Kim wrote: > Hi Hu, > > Found a bug in the previous patch. > Could you check this out? > > Thanks, > > --- > fs/f2fs/super.c | 11 ++++++++++- > 1 file changed, 10 insertions(+), 1 deletion(-) > > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c > index 19438f2..647591b 100644 > --- a/fs/f2fs/super.c > +++ b/fs/f2fs/super.c > @@ -431,8 +431,17 @@ static int f2fs_drop_inode(struct inode *inode) > * - f2fs_gc -> iput -> evict > * - inode_wait_for_writeback(inode) > */ > - if (!inode_unhashed(inode) && inode->i_state & I_SYNC) > + if (!inode_unhashed(inode) && inode->i_state & I_SYNC) { > + if (!inode->i_nlink && !is_bad_inode(inode)) { > + spin_unlock(&inode->i_lock); > + i_size_write(inode, 0); > + > + if (F2FS_HAS_BLOCKS(inode)) > + f2fs_truncate(inode); > + spin_lock(&inode->i_lock); > + } > return 0; > + } > return generic_drop_inode(inode); > } > > ------------------------------------------------------------------------------ One dashboard for servers and applications across Physical-Virtual-Cloud Widest out-of-the-box monitoring support with 50+ applications Performance metrics, stats and reports that give you Actionable Insights Deep dive visibility with transaction tracing using APM Insight. http://ad.doubleclick.net/ddm/clk/290420510;117567292;y ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [f2fs-dev] Space leak in f2fs 2015-05-14 1:40 ` hujianyang @ 2015-05-14 1:45 ` Jaegeuk Kim 2015-05-14 21:14 ` Jaegeuk Kim 1 sibling, 0 replies; 11+ messages in thread From: Jaegeuk Kim @ 2015-05-14 1:45 UTC (permalink / raw) To: hujianyang; +Cc: linux-fsdevel@vger.kernel.org, linux-f2fs-devel I'll push the patch into dev branch right now. Let me know, if there is any problem. Thanks, On Thu, May 14, 2015 at 09:40:25AM +0800, hujianyang wrote: > Hi Jaegeuk, > > I've tested this patch. It's OK now. Seems this problem is fixed. > > I'd like to push this patch to my local tree and run a formal > stress test next week. Will you push this patch to f2fs-dev branch? > > If you have other modification to this fix, please let me know. > > Thanks very much! > Hu > > On 2015/5/14 8:24, Jaegeuk Kim wrote: > > Hi Hu, > > > > Found a bug in the previous patch. > > Could you check this out? > > > > Thanks, > > > > --- > > fs/f2fs/super.c | 11 ++++++++++- > > 1 file changed, 10 insertions(+), 1 deletion(-) > > > > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c > > index 19438f2..647591b 100644 > > --- a/fs/f2fs/super.c > > +++ b/fs/f2fs/super.c > > @@ -431,8 +431,17 @@ static int f2fs_drop_inode(struct inode *inode) > > * - f2fs_gc -> iput -> evict > > * - inode_wait_for_writeback(inode) > > */ > > - if (!inode_unhashed(inode) && inode->i_state & I_SYNC) > > + if (!inode_unhashed(inode) && inode->i_state & I_SYNC) { > > + if (!inode->i_nlink && !is_bad_inode(inode)) { > > + spin_unlock(&inode->i_lock); > > + i_size_write(inode, 0); > > + > > + if (F2FS_HAS_BLOCKS(inode)) > > + f2fs_truncate(inode); > > + spin_lock(&inode->i_lock); > > + } > > return 0; > > + } > > return generic_drop_inode(inode); > > } > > > > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [f2fs-dev] Space leak in f2fs 2015-05-14 1:40 ` hujianyang 2015-05-14 1:45 ` [f2fs-dev] " Jaegeuk Kim @ 2015-05-14 21:14 ` Jaegeuk Kim 2015-05-15 8:31 ` Chao Yu 1 sibling, 1 reply; 11+ messages in thread From: Jaegeuk Kim @ 2015-05-14 21:14 UTC (permalink / raw) To: hujianyang; +Cc: linux-fsdevel@vger.kernel.org, linux-f2fs-devel Hi Hu, I've been rethinking about whole this issue differently. And, now I'm starting to test with the below patch instead of previous one. Thanks, Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> --- fs/f2fs/checkpoint.c | 19 +++++++++++++++++++ fs/f2fs/data.c | 4 ++++ fs/f2fs/f2fs.h | 1 + fs/f2fs/super.c | 15 --------------- 4 files changed, 24 insertions(+), 15 deletions(-) diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c index 7b7a9d8..74875fb 100644 --- a/fs/f2fs/checkpoint.c +++ b/fs/f2fs/checkpoint.c @@ -378,6 +378,20 @@ static void __remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type) spin_unlock(&im->ino_lock); } +static bool __exist_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type) +{ + struct inode_management *im = &sbi->im[type]; + struct ino_entry *e; + bool exist = false; + + spin_lock(&im->ino_lock); + e = radix_tree_lookup(&im->ino_root, ino); + if (e) + exist = true; + spin_unlock(&im->ino_lock); + return exist; +} + void add_dirty_inode(struct f2fs_sb_info *sbi, nid_t ino, int type) { /* add new dirty ino entry into list */ @@ -458,6 +472,11 @@ void remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino) __remove_ino_entry(sbi, ino, ORPHAN_INO); } +bool is_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino) +{ + return __exist_ino_entry(sbi, ino, ORPHAN_INO); +} + static void recover_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino) { struct inode *inode = f2fs_iget(sbi->sb, ino); diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index b0cc2aa..1988f5f 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -1749,6 +1749,10 @@ write: goto out; } + /* if orphan inode, we don't need to write its data */ + if (is_orphan_inode(sbi, inode->i_ino)) + goto out; + if (!wbc->for_reclaim) need_balance_fs = true; else if (has_not_enough_free_secs(sbi, 0)) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 8f1f21a..697346a 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -1726,6 +1726,7 @@ int acquire_orphan_inode(struct f2fs_sb_info *); void release_orphan_inode(struct f2fs_sb_info *); void add_orphan_inode(struct f2fs_sb_info *, nid_t); void remove_orphan_inode(struct f2fs_sb_info *, nid_t); +bool is_orphan_inode(struct f2fs_sb_info *, nid_t); void recover_orphan_inodes(struct f2fs_sb_info *); int get_valid_checkpoint(struct f2fs_sb_info *); void update_dirty_page(struct inode *, struct page *); diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 19438f2..1d0973a 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -422,20 +422,6 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb) return &fi->vfs_inode; } -static int f2fs_drop_inode(struct inode *inode) -{ - /* - * This is to avoid a deadlock condition like below. - * writeback_single_inode(inode) - * - f2fs_write_data_page - * - f2fs_gc -> iput -> evict - * - inode_wait_for_writeback(inode) - */ - if (!inode_unhashed(inode) && inode->i_state & I_SYNC) - return 0; - return generic_drop_inode(inode); -} - /* * f2fs_dirty_inode() is called from __mark_inode_dirty() * @@ -759,7 +745,6 @@ restore_opts: static struct super_operations f2fs_sops = { .alloc_inode = f2fs_alloc_inode, - .drop_inode = f2fs_drop_inode, .destroy_inode = f2fs_destroy_inode, .write_inode = f2fs_write_inode, .dirty_inode = f2fs_dirty_inode, -- 2.1.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* RE: [f2fs-dev] Space leak in f2fs 2015-05-14 21:14 ` Jaegeuk Kim @ 2015-05-15 8:31 ` Chao Yu 2015-05-16 0:55 ` Jaegeuk Kim 0 siblings, 1 reply; 11+ messages in thread From: Chao Yu @ 2015-05-15 8:31 UTC (permalink / raw) To: 'Jaegeuk Kim', 'hujianyang' Cc: linux-fsdevel, linux-f2fs-devel Hi Jaegeuk, > -----Original Message----- > From: Jaegeuk Kim [mailto:jaegeuk@kernel.org] > Sent: Friday, May 15, 2015 5:14 AM > To: hujianyang > Cc: linux-fsdevel@vger.kernel.org; linux-f2fs-devel@lists.sourceforge.net > Subject: Re: [f2fs-dev] Space leak in f2fs > > Hi Hu, > > I've been rethinking about whole this issue differently. > And, now I'm starting to test with the below patch instead of previous one. > > Thanks, > > Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> > --- > fs/f2fs/checkpoint.c | 19 +++++++++++++++++++ > fs/f2fs/data.c | 4 ++++ > fs/f2fs/f2fs.h | 1 + > fs/f2fs/super.c | 15 --------------- > 4 files changed, 24 insertions(+), 15 deletions(-) > > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c > index 7b7a9d8..74875fb 100644 > --- a/fs/f2fs/checkpoint.c > +++ b/fs/f2fs/checkpoint.c > @@ -378,6 +378,20 @@ static void __remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int > type) > spin_unlock(&im->ino_lock); > } > > +static bool __exist_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type) > +{ > + struct inode_management *im = &sbi->im[type]; > + struct ino_entry *e; > + bool exist = false; > + > + spin_lock(&im->ino_lock); > + e = radix_tree_lookup(&im->ino_root, ino); > + if (e) > + exist = true; > + spin_unlock(&im->ino_lock); > + return exist; > +} > + > void add_dirty_inode(struct f2fs_sb_info *sbi, nid_t ino, int type) > { > /* add new dirty ino entry into list */ > @@ -458,6 +472,11 @@ void remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino) > __remove_ino_entry(sbi, ino, ORPHAN_INO); > } > > +bool is_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino) > +{ > + return __exist_ino_entry(sbi, ino, ORPHAN_INO); > +} > + > static void recover_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino) > { > struct inode *inode = f2fs_iget(sbi->sb, ino); > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c > index b0cc2aa..1988f5f 100644 > --- a/fs/f2fs/data.c > +++ b/fs/f2fs/data.c > @@ -1749,6 +1749,10 @@ write: > goto out; > } > > + /* if orphan inode, we don't need to write its data */ > + if (is_orphan_inode(sbi, inode->i_ino)) > + goto out; When user create a temp file by invoking open with O_TMPFILE flag, in ->tmpfile our temp file will be added into orphan list as its nlink is zero. If we skip writting out data for this orphan inode, later, even though we add nlink/directory entry for orphan inode by calling linkat, our file will contain inconsistent data between in-memory and on-disk. So how about considering for this case? BTW, the previous fixing patch looks good to me. Thanks, > + > if (!wbc->for_reclaim) > need_balance_fs = true; > else if (has_not_enough_free_secs(sbi, 0)) > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > index 8f1f21a..697346a 100644 > --- a/fs/f2fs/f2fs.h > +++ b/fs/f2fs/f2fs.h > @@ -1726,6 +1726,7 @@ int acquire_orphan_inode(struct f2fs_sb_info *); > void release_orphan_inode(struct f2fs_sb_info *); > void add_orphan_inode(struct f2fs_sb_info *, nid_t); > void remove_orphan_inode(struct f2fs_sb_info *, nid_t); > +bool is_orphan_inode(struct f2fs_sb_info *, nid_t); > void recover_orphan_inodes(struct f2fs_sb_info *); > int get_valid_checkpoint(struct f2fs_sb_info *); > void update_dirty_page(struct inode *, struct page *); > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c > index 19438f2..1d0973a 100644 > --- a/fs/f2fs/super.c > +++ b/fs/f2fs/super.c > @@ -422,20 +422,6 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb) > return &fi->vfs_inode; > } > > -static int f2fs_drop_inode(struct inode *inode) > -{ > - /* > - * This is to avoid a deadlock condition like below. > - * writeback_single_inode(inode) > - * - f2fs_write_data_page > - * - f2fs_gc -> iput -> evict > - * - inode_wait_for_writeback(inode) > - */ > - if (!inode_unhashed(inode) && inode->i_state & I_SYNC) > - return 0; > - return generic_drop_inode(inode); > -} > - > /* > * f2fs_dirty_inode() is called from __mark_inode_dirty() > * > @@ -759,7 +745,6 @@ restore_opts: > > static struct super_operations f2fs_sops = { > .alloc_inode = f2fs_alloc_inode, > - .drop_inode = f2fs_drop_inode, > .destroy_inode = f2fs_destroy_inode, > .write_inode = f2fs_write_inode, > .dirty_inode = f2fs_dirty_inode, > -- > 2.1.1 > > > > ------------------------------------------------------------------------------ > One dashboard for servers and applications across Physical-Virtual-Cloud > Widest out-of-the-box monitoring support with 50+ applications > Performance metrics, stats and reports that give you Actionable Insights > Deep dive visibility with transaction tracing using APM Insight. > http://ad.doubleclick.net/ddm/clk/290420510;117567292;y > _______________________________________________ > Linux-f2fs-devel mailing list > Linux-f2fs-devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [f2fs-dev] Space leak in f2fs 2015-05-15 8:31 ` Chao Yu @ 2015-05-16 0:55 ` Jaegeuk Kim 2015-05-18 2:43 ` Chao Yu 0 siblings, 1 reply; 11+ messages in thread From: Jaegeuk Kim @ 2015-05-16 0:55 UTC (permalink / raw) To: Chao Yu; +Cc: 'hujianyang', linux-fsdevel, linux-f2fs-devel Hi Chao, On Fri, May 15, 2015 at 04:31:43PM +0800, Chao Yu wrote: > Hi Jaegeuk, > [snip] > > + /* if orphan inode, we don't need to write its data */ > > + if (is_orphan_inode(sbi, inode->i_ino)) > > + goto out; > > When user create a temp file by invoking open with O_TMPFILE flag, > in ->tmpfile our temp file will be added into orphan list as its > nlink is zero. > > If we skip writting out data for this orphan inode, later, even though > we add nlink/directory entry for orphan inode by calling linkat, > our file will contain inconsistent data between in-memory and on-disk. > > So how about considering for this case? Right. How about the below patch? > > BTW, the previous fixing patch looks good to me. But, my new concern here is a memory pressure. If we do not drop the inode when iput was called, we need to wait for another time slot to reclaim its memory. Thanks, --- fs/f2fs/checkpoint.c | 19 +++++++++++++++++++ fs/f2fs/data.c | 8 ++++++++ fs/f2fs/dir.c | 1 + fs/f2fs/f2fs.h | 2 ++ fs/f2fs/super.c | 14 +++++++++++++- 5 files changed, 43 insertions(+), 1 deletion(-) diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c index 7b7a9d8..74875fb 100644 --- a/fs/f2fs/checkpoint.c +++ b/fs/f2fs/checkpoint.c @@ -378,6 +378,20 @@ static void __remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type) spin_unlock(&im->ino_lock); } +static bool __exist_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type) +{ + struct inode_management *im = &sbi->im[type]; + struct ino_entry *e; + bool exist = false; + + spin_lock(&im->ino_lock); + e = radix_tree_lookup(&im->ino_root, ino); + if (e) + exist = true; + spin_unlock(&im->ino_lock); + return exist; +} + void add_dirty_inode(struct f2fs_sb_info *sbi, nid_t ino, int type) { /* add new dirty ino entry into list */ @@ -458,6 +472,11 @@ void remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino) __remove_ino_entry(sbi, ino, ORPHAN_INO); } +bool is_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino) +{ + return __exist_ino_entry(sbi, ino, ORPHAN_INO); +} + static void recover_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino) { struct inode *inode = f2fs_iget(sbi->sb, ino); diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index b0cc2aa..d883c14 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -1749,6 +1749,14 @@ write: goto out; } + /* + * if orphan inode, we don't need to write its data, + * but, tmpfile is not the case. + */ + if (is_orphan_inode(sbi, inode->i_ino) && + !is_inode_flag_set(F2FS_I(inode), FI_TMP_INODE)) + goto out; + if (!wbc->for_reclaim) need_balance_fs = true; else if (has_not_enough_free_secs(sbi, 0)) diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c index 3e92376..a2ea1b9 100644 --- a/fs/f2fs/dir.c +++ b/fs/f2fs/dir.c @@ -648,6 +648,7 @@ int f2fs_do_tmpfile(struct inode *inode, struct inode *dir) update_inode(inode, page); f2fs_put_page(page, 1); + set_inode_flag(F2FS_I(inode), FI_TMP_INODE); clear_inode_flag(F2FS_I(inode), FI_NEW_INODE); fail: up_write(&F2FS_I(inode)->i_sem); diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index cdcae06..de21d38 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -1337,6 +1337,7 @@ static inline void f2fs_change_bit(unsigned int nr, char *addr) /* used for f2fs_inode_info->flags */ enum { FI_NEW_INODE, /* indicate newly allocated inode */ + FI_TMP_INODE, /* indicate tmpfile */ FI_DIRTY_INODE, /* indicate inode is dirty or not */ FI_DIRTY_DIR, /* indicate directory has dirty pages */ FI_INC_LINK, /* need to increment i_nlink */ @@ -1726,6 +1727,7 @@ int acquire_orphan_inode(struct f2fs_sb_info *); void release_orphan_inode(struct f2fs_sb_info *); void add_orphan_inode(struct f2fs_sb_info *, nid_t); void remove_orphan_inode(struct f2fs_sb_info *, nid_t); +bool is_orphan_inode(struct f2fs_sb_info *, nid_t); void recover_orphan_inodes(struct f2fs_sb_info *); int get_valid_checkpoint(struct f2fs_sb_info *); void update_dirty_page(struct inode *, struct page *); diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 7464d08..98af3bf 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -430,9 +430,21 @@ static int f2fs_drop_inode(struct inode *inode) * - f2fs_write_data_page * - f2fs_gc -> iput -> evict * - inode_wait_for_writeback(inode) + * In order to avoid that, f2fs_write_data_page does not write data + * pages for orphan inode except tmpfile. + * Nevertheless, we need to truncate the tmpfile's data to avoid + * needless cleaning. */ - if (!inode_unhashed(inode) && inode->i_state & I_SYNC) + if (is_inode_flag_set(F2FS_I(inode), FI_TMP_INODE) && + inode->i_state & I_SYNC) { + spin_unlock(&inode->i_lock); + i_size_write(inode, 0); + + if (F2FS_HAS_BLOCKS(inode)) + f2fs_truncate(inode); + spin_lock(&inode->i_lock); return 0; + } return generic_drop_inode(inode); } -- 2.1.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* RE: [f2fs-dev] Space leak in f2fs 2015-05-16 0:55 ` Jaegeuk Kim @ 2015-05-18 2:43 ` Chao Yu 2015-05-18 2:50 ` Nicholas Krause 2015-05-18 5:44 ` [f2fs-dev] " Jaegeuk Kim 0 siblings, 2 replies; 11+ messages in thread From: Chao Yu @ 2015-05-18 2:43 UTC (permalink / raw) To: 'Jaegeuk Kim' Cc: 'hujianyang', linux-fsdevel, linux-f2fs-devel Hi Jaegeuk, > -----Original Message----- > From: Jaegeuk Kim [mailto:jaegeuk@kernel.org] > Sent: Saturday, May 16, 2015 8:56 AM > To: Chao Yu > Cc: 'hujianyang'; linux-fsdevel@vger.kernel.org; linux-f2fs-devel@lists.sourceforge.net > Subject: Re: [f2fs-dev] Space leak in f2fs > > Hi Chao, > > On Fri, May 15, 2015 at 04:31:43PM +0800, Chao Yu wrote: > > Hi Jaegeuk, > > > > [snip] > > > > + /* if orphan inode, we don't need to write its data */ > > > + if (is_orphan_inode(sbi, inode->i_ino)) > > > + goto out; > > > > When user create a temp file by invoking open with O_TMPFILE flag, > > in ->tmpfile our temp file will be added into orphan list as its > > nlink is zero. > > > > If we skip writting out data for this orphan inode, later, even though > > we add nlink/directory entry for orphan inode by calling linkat, > > our file will contain inconsistent data between in-memory and on-disk. > > > > So how about considering for this case? > > Right. > How about the below patch? > > > > > BTW, the previous fixing patch looks good to me. > > But, my new concern here is a memory pressure. If we do not drop the inode > when iput was called, we need to wait for another time slot to reclaim its > memory. Agree. Please see below. > > Thanks, > > --- > fs/f2fs/checkpoint.c | 19 +++++++++++++++++++ > fs/f2fs/data.c | 8 ++++++++ > fs/f2fs/dir.c | 1 + > fs/f2fs/f2fs.h | 2 ++ > fs/f2fs/super.c | 14 +++++++++++++- > 5 files changed, 43 insertions(+), 1 deletion(-) > > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c > index 7b7a9d8..74875fb 100644 > --- a/fs/f2fs/checkpoint.c > +++ b/fs/f2fs/checkpoint.c > @@ -378,6 +378,20 @@ static void __remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int > type) > spin_unlock(&im->ino_lock); > } > > +static bool __exist_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type) > +{ > + struct inode_management *im = &sbi->im[type]; > + struct ino_entry *e; > + bool exist = false; > + > + spin_lock(&im->ino_lock); > + e = radix_tree_lookup(&im->ino_root, ino); > + if (e) > + exist = true; > + spin_unlock(&im->ino_lock); > + return exist; > +} > + > void add_dirty_inode(struct f2fs_sb_info *sbi, nid_t ino, int type) > { > /* add new dirty ino entry into list */ > @@ -458,6 +472,11 @@ void remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino) > __remove_ino_entry(sbi, ino, ORPHAN_INO); > } > > +bool is_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino) > +{ > + return __exist_ino_entry(sbi, ino, ORPHAN_INO); > +} > + > static void recover_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino) > { > struct inode *inode = f2fs_iget(sbi->sb, ino); > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c > index b0cc2aa..d883c14 100644 > --- a/fs/f2fs/data.c > +++ b/fs/f2fs/data.c > @@ -1749,6 +1749,14 @@ write: > goto out; > } > > + /* > + * if orphan inode, we don't need to write its data, > + * but, tmpfile is not the case. > + */ > + if (is_orphan_inode(sbi, inode->i_ino) && > + !is_inode_flag_set(F2FS_I(inode), FI_TMP_INODE)) For normal inode, all dirty pages will not be written out, and after that pages can be reclaimed by VM any time due to they are be cleaned when flush. Then any process who held the orphan inode may not read any original data correctly from this inode. And here is the unlink description in POSIX: "If one or more processes have the file open when the last link is removed, the link shall be removed before unlink() returns, but the removal of the file contents shall be postponed until all references to the file are closed." To my understanding for above description, we should keep data of helded orphan inode in memory or on disk until it is not referenced by any processes. How do you think of it? using "if (is_orphan_inode(sbi, inode->i_ino) && !atomic_read(&inode->i_count))" to skip writing at the beginning of ->writepage()? Thanks, > + goto out; > + > if (!wbc->for_reclaim) > need_balance_fs = true; > else if (has_not_enough_free_secs(sbi, 0)) > diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c > index 3e92376..a2ea1b9 100644 > --- a/fs/f2fs/dir.c > +++ b/fs/f2fs/dir.c > @@ -648,6 +648,7 @@ int f2fs_do_tmpfile(struct inode *inode, struct inode *dir) > update_inode(inode, page); > f2fs_put_page(page, 1); > > + set_inode_flag(F2FS_I(inode), FI_TMP_INODE); > clear_inode_flag(F2FS_I(inode), FI_NEW_INODE); > fail: > up_write(&F2FS_I(inode)->i_sem); > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > index cdcae06..de21d38 100644 > --- a/fs/f2fs/f2fs.h > +++ b/fs/f2fs/f2fs.h > @@ -1337,6 +1337,7 @@ static inline void f2fs_change_bit(unsigned int nr, char *addr) > /* used for f2fs_inode_info->flags */ > enum { > FI_NEW_INODE, /* indicate newly allocated inode */ > + FI_TMP_INODE, /* indicate tmpfile */ > FI_DIRTY_INODE, /* indicate inode is dirty or not */ > FI_DIRTY_DIR, /* indicate directory has dirty pages */ > FI_INC_LINK, /* need to increment i_nlink */ > @@ -1726,6 +1727,7 @@ int acquire_orphan_inode(struct f2fs_sb_info *); > void release_orphan_inode(struct f2fs_sb_info *); > void add_orphan_inode(struct f2fs_sb_info *, nid_t); > void remove_orphan_inode(struct f2fs_sb_info *, nid_t); > +bool is_orphan_inode(struct f2fs_sb_info *, nid_t); > void recover_orphan_inodes(struct f2fs_sb_info *); > int get_valid_checkpoint(struct f2fs_sb_info *); > void update_dirty_page(struct inode *, struct page *); > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c > index 7464d08..98af3bf 100644 > --- a/fs/f2fs/super.c > +++ b/fs/f2fs/super.c > @@ -430,9 +430,21 @@ static int f2fs_drop_inode(struct inode *inode) > * - f2fs_write_data_page > * - f2fs_gc -> iput -> evict > * - inode_wait_for_writeback(inode) > + * In order to avoid that, f2fs_write_data_page does not write data > + * pages for orphan inode except tmpfile. > + * Nevertheless, we need to truncate the tmpfile's data to avoid > + * needless cleaning. > */ > - if (!inode_unhashed(inode) && inode->i_state & I_SYNC) > + if (is_inode_flag_set(F2FS_I(inode), FI_TMP_INODE) && > + inode->i_state & I_SYNC) { > + spin_unlock(&inode->i_lock); > + i_size_write(inode, 0); > + > + if (F2FS_HAS_BLOCKS(inode)) > + f2fs_truncate(inode); > + spin_lock(&inode->i_lock); > return 0; > + } > return generic_drop_inode(inode); > } > > -- > 2.1.1 ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Space leak in f2fs 2015-05-18 2:43 ` Chao Yu @ 2015-05-18 2:50 ` Nicholas Krause 2015-05-18 5:44 ` [f2fs-dev] " Jaegeuk Kim 1 sibling, 0 replies; 11+ messages in thread From: Nicholas Krause @ 2015-05-18 2:50 UTC (permalink / raw) To: Chao Yu, 'Jaegeuk Kim'; +Cc: linux-fsdevel, linux-f2fs-devel On May 17, 2015 10:43:14 PM EDT, Chao Yu <chao2.yu@samsung.com> wrote: >Hi Jaegeuk, > >> -----Original Message----- >> From: Jaegeuk Kim [mailto:jaegeuk@kernel.org] >> Sent: Saturday, May 16, 2015 8:56 AM >> To: Chao Yu >> Cc: 'hujianyang'; linux-fsdevel@vger.kernel.org; >linux-f2fs-devel@lists.sourceforge.net >> Subject: Re: [f2fs-dev] Space leak in f2fs >> >> Hi Chao, >> >> On Fri, May 15, 2015 at 04:31:43PM +0800, Chao Yu wrote: >> > Hi Jaegeuk, >> > >> >> [snip] >> >> > > + /* if orphan inode, we don't need to write its data */ >> > > + if (is_orphan_inode(sbi, inode->i_ino)) >> > > + goto out; >> > >> > When user create a temp file by invoking open with O_TMPFILE flag, >> > in ->tmpfile our temp file will be added into orphan list as its >> > nlink is zero. >> > >> > If we skip writting out data for this orphan inode, later, even >though >> > we add nlink/directory entry for orphan inode by calling linkat, >> > our file will contain inconsistent data between in-memory and >on-disk. >> > >> > So how about considering for this case? >> >> Right. >> How about the below patch? >> >> > >> > BTW, the previous fixing patch looks good to me. >> >> But, my new concern here is a memory pressure. If we do not drop the >inode >> when iput was called, we need to wait for another time slot to >reclaim its >> memory. > >Agree. Please see below. > >> >> Thanks, >> >> --- >> fs/f2fs/checkpoint.c | 19 +++++++++++++++++++ >> fs/f2fs/data.c | 8 ++++++++ >> fs/f2fs/dir.c | 1 + >> fs/f2fs/f2fs.h | 2 ++ >> fs/f2fs/super.c | 14 +++++++++++++- >> 5 files changed, 43 insertions(+), 1 deletion(-) >> >> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c >> index 7b7a9d8..74875fb 100644 >> --- a/fs/f2fs/checkpoint.c >> +++ b/fs/f2fs/checkpoint.c >> @@ -378,6 +378,20 @@ static void __remove_ino_entry(struct >f2fs_sb_info *sbi, nid_t ino, int >> type) >> spin_unlock(&im->ino_lock); >> } >> >> +static bool __exist_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, >int type) >> +{ >> + struct inode_management *im = &sbi->im[type]; >> + struct ino_entry *e; >> + bool exist = false; >> + >> + spin_lock(&im->ino_lock); >> + e = radix_tree_lookup(&im->ino_root, ino); >> + if (e) >> + exist = true; >> + spin_unlock(&im->ino_lock); >> + return exist; >> +} >> + >> void add_dirty_inode(struct f2fs_sb_info *sbi, nid_t ino, int type) >> { >> /* add new dirty ino entry into list */ >> @@ -458,6 +472,11 @@ void remove_orphan_inode(struct f2fs_sb_info >*sbi, nid_t ino) >> __remove_ino_entry(sbi, ino, ORPHAN_INO); >> } >> >> +bool is_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino) >> +{ >> + return __exist_ino_entry(sbi, ino, ORPHAN_INO); >> +} >> + >> static void recover_orphan_inode(struct f2fs_sb_info *sbi, nid_t >ino) >> { >> struct inode *inode = f2fs_iget(sbi->sb, ino); >> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c >> index b0cc2aa..d883c14 100644 >> --- a/fs/f2fs/data.c >> +++ b/fs/f2fs/data.c >> @@ -1749,6 +1749,14 @@ write: >> goto out; >> } >> >> + /* >> + * if orphan inode, we don't need to write its data, >> + * but, tmpfile is not the case. >> + */ >> + if (is_orphan_inode(sbi, inode->i_ino) && >> + !is_inode_flag_set(F2FS_I(inode), FI_TMP_INODE)) > >For normal inode, all dirty pages will not be written out, and after >that pages >can be reclaimed by VM any time due to they are be cleaned when flush. >Then any >process who held the orphan inode may not read any original data >correctly from >this inode. > >And here is the unlink description in POSIX: >"If one or more processes have the file open when the last link is >removed, >the link shall be removed before unlink() returns, but the removal of >the >file contents shall be postponed until all references to the file are >closed." > >To my understanding for above description, we should keep data of >helded orphan >inode in memory or on disk until it is not referenced by any processes. > >How do you think of it? > >using "if (is_orphan_inode(sbi, inode->i_ino) && >!atomic_read(&inode->i_count))" >to skip writing at the beginning of ->writepage()? > >Thanks, > Chao, Your correct here, I was going to recommend this but my explanation was pretty badly worded. Again also I am not that well versed in the f2fs code base so I wasn't sure if my answer was correct. Nick >> + goto out; >> + >> if (!wbc->for_reclaim) >> need_balance_fs = true; >> else if (has_not_enough_free_secs(sbi, 0)) >> diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c >> index 3e92376..a2ea1b9 100644 >> --- a/fs/f2fs/dir.c >> +++ b/fs/f2fs/dir.c >> @@ -648,6 +648,7 @@ int f2fs_do_tmpfile(struct inode *inode, struct >inode *dir) >> update_inode(inode, page); >> f2fs_put_page(page, 1); >> >> + set_inode_flag(F2FS_I(inode), FI_TMP_INODE); >> clear_inode_flag(F2FS_I(inode), FI_NEW_INODE); >> fail: >> up_write(&F2FS_I(inode)->i_sem); >> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h >> index cdcae06..de21d38 100644 >> --- a/fs/f2fs/f2fs.h >> +++ b/fs/f2fs/f2fs.h >> @@ -1337,6 +1337,7 @@ static inline void f2fs_change_bit(unsigned int >nr, char *addr) >> /* used for f2fs_inode_info->flags */ >> enum { >> FI_NEW_INODE, /* indicate newly allocated inode */ >> + FI_TMP_INODE, /* indicate tmpfile */ >> FI_DIRTY_INODE, /* indicate inode is dirty or not */ >> FI_DIRTY_DIR, /* indicate directory has dirty pages */ >> FI_INC_LINK, /* need to increment i_nlink */ >> @@ -1726,6 +1727,7 @@ int acquire_orphan_inode(struct f2fs_sb_info >*); >> void release_orphan_inode(struct f2fs_sb_info *); >> void add_orphan_inode(struct f2fs_sb_info *, nid_t); >> void remove_orphan_inode(struct f2fs_sb_info *, nid_t); >> +bool is_orphan_inode(struct f2fs_sb_info *, nid_t); >> void recover_orphan_inodes(struct f2fs_sb_info *); >> int get_valid_checkpoint(struct f2fs_sb_info *); >> void update_dirty_page(struct inode *, struct page *); >> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c >> index 7464d08..98af3bf 100644 >> --- a/fs/f2fs/super.c >> +++ b/fs/f2fs/super.c >> @@ -430,9 +430,21 @@ static int f2fs_drop_inode(struct inode *inode) >> * - f2fs_write_data_page >> * - f2fs_gc -> iput -> evict >> * - inode_wait_for_writeback(inode) >> + * In order to avoid that, f2fs_write_data_page does not write data >> + * pages for orphan inode except tmpfile. >> + * Nevertheless, we need to truncate the tmpfile's data to avoid >> + * needless cleaning. >> */ >> - if (!inode_unhashed(inode) && inode->i_state & I_SYNC) >> + if (is_inode_flag_set(F2FS_I(inode), FI_TMP_INODE) && >> + inode->i_state & I_SYNC) { >> + spin_unlock(&inode->i_lock); >> + i_size_write(inode, 0); >> + >> + if (F2FS_HAS_BLOCKS(inode)) >> + f2fs_truncate(inode); >> + spin_lock(&inode->i_lock); >> return 0; >> + } >> return generic_drop_inode(inode); >> } >> >> -- >> 2.1.1 > > > >------------------------------------------------------------------------------ >One dashboard for servers and applications across >Physical-Virtual-Cloud >Widest out-of-the-box monitoring support with 50+ applications >Performance metrics, stats and reports that give you Actionable >Insights >Deep dive visibility with transaction tracing using APM Insight. >http://ad.doubleclick.net/ddm/clk/290420510;117567292;y >_______________________________________________ >Linux-f2fs-devel mailing list >Linux-f2fs-devel@lists.sourceforge.net >https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel -- Sent from my Android device with K-9 Mail. Please excuse my brevity. ------------------------------------------------------------------------------ One dashboard for servers and applications across Physical-Virtual-Cloud Widest out-of-the-box monitoring support with 50+ applications Performance metrics, stats and reports that give you Actionable Insights Deep dive visibility with transaction tracing using APM Insight. http://ad.doubleclick.net/ddm/clk/290420510;117567292;y ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [f2fs-dev] Space leak in f2fs 2015-05-18 2:43 ` Chao Yu 2015-05-18 2:50 ` Nicholas Krause @ 2015-05-18 5:44 ` Jaegeuk Kim 1 sibling, 0 replies; 11+ messages in thread From: Jaegeuk Kim @ 2015-05-18 5:44 UTC (permalink / raw) To: Chao Yu; +Cc: 'hujianyang', linux-fsdevel, linux-f2fs-devel Hi Chao, On Mon, May 18, 2015 at 10:43:14AM +0800, Chao Yu wrote: > Hi Jaegeuk, > > > -----Original Message----- > > From: Jaegeuk Kim [mailto:jaegeuk@kernel.org] > > Sent: Saturday, May 16, 2015 8:56 AM > > To: Chao Yu > > Cc: 'hujianyang'; linux-fsdevel@vger.kernel.org; linux-f2fs-devel@lists.sourceforge.net > > Subject: Re: [f2fs-dev] Space leak in f2fs > > > > Hi Chao, > > > > On Fri, May 15, 2015 at 04:31:43PM +0800, Chao Yu wrote: > > > Hi Jaegeuk, > > > > > > > [snip] > > > > > > + /* if orphan inode, we don't need to write its data */ > > > > + if (is_orphan_inode(sbi, inode->i_ino)) > > > > + goto out; > > > > > > When user create a temp file by invoking open with O_TMPFILE flag, > > > in ->tmpfile our temp file will be added into orphan list as its > > > nlink is zero. > > > > > > If we skip writting out data for this orphan inode, later, even though > > > we add nlink/directory entry for orphan inode by calling linkat, > > > our file will contain inconsistent data between in-memory and on-disk. > > > > > > So how about considering for this case? > > > > Right. > > How about the below patch? > > > > > > > > BTW, the previous fixing patch looks good to me. > > > > But, my new concern here is a memory pressure. If we do not drop the inode > > when iput was called, we need to wait for another time slot to reclaim its > > memory. > > Agree. Please see below. > > > > > Thanks, > > > > --- > > fs/f2fs/checkpoint.c | 19 +++++++++++++++++++ > > fs/f2fs/data.c | 8 ++++++++ > > fs/f2fs/dir.c | 1 + > > fs/f2fs/f2fs.h | 2 ++ > > fs/f2fs/super.c | 14 +++++++++++++- > > 5 files changed, 43 insertions(+), 1 deletion(-) > > > > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c > > index 7b7a9d8..74875fb 100644 > > --- a/fs/f2fs/checkpoint.c > > +++ b/fs/f2fs/checkpoint.c > > @@ -378,6 +378,20 @@ static void __remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int > > type) > > spin_unlock(&im->ino_lock); > > } > > > > +static bool __exist_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type) > > +{ > > + struct inode_management *im = &sbi->im[type]; > > + struct ino_entry *e; > > + bool exist = false; > > + > > + spin_lock(&im->ino_lock); > > + e = radix_tree_lookup(&im->ino_root, ino); > > + if (e) > > + exist = true; > > + spin_unlock(&im->ino_lock); > > + return exist; > > +} > > + > > void add_dirty_inode(struct f2fs_sb_info *sbi, nid_t ino, int type) > > { > > /* add new dirty ino entry into list */ > > @@ -458,6 +472,11 @@ void remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino) > > __remove_ino_entry(sbi, ino, ORPHAN_INO); > > } > > > > +bool is_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino) > > +{ > > + return __exist_ino_entry(sbi, ino, ORPHAN_INO); > > +} > > + > > static void recover_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino) > > { > > struct inode *inode = f2fs_iget(sbi->sb, ino); > > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c > > index b0cc2aa..d883c14 100644 > > --- a/fs/f2fs/data.c > > +++ b/fs/f2fs/data.c > > @@ -1749,6 +1749,14 @@ write: > > goto out; > > } > > > > + /* > > + * if orphan inode, we don't need to write its data, > > + * but, tmpfile is not the case. > > + */ > > + if (is_orphan_inode(sbi, inode->i_ino) && > > + !is_inode_flag_set(F2FS_I(inode), FI_TMP_INODE)) > > For normal inode, all dirty pages will not be written out, and after that pages > can be reclaimed by VM any time due to they are be cleaned when flush. Then any > process who held the orphan inode may not read any original data correctly from > this inode. Urg, right. Indeed, I have not to do this for orphan inodes. > > And here is the unlink description in POSIX: > "If one or more processes have the file open when the last link is removed, > the link shall be removed before unlink() returns, but the removal of the > file contents shall be postponed until all references to the file are closed." > > To my understanding for above description, we should keep data of helded orphan > inode in memory or on disk until it is not referenced by any processes. > > How do you think of it? > > using "if (is_orphan_inode(sbi, inode->i_ino) && !atomic_read(&inode->i_count))" > to skip writing at the beginning of ->writepage()? Hmm, IMO, we can't use i_count without i_lock. And this doesn't clearly address the original race condition. For now, simply we'd better keep v2 which only truncates data blocks in f2fs_drop_inode. Thank you for pointing this out. Thanks, > > Thanks, > > > + goto out; > > + > > if (!wbc->for_reclaim) > > need_balance_fs = true; > > else if (has_not_enough_free_secs(sbi, 0)) > > diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c > > index 3e92376..a2ea1b9 100644 > > --- a/fs/f2fs/dir.c > > +++ b/fs/f2fs/dir.c > > @@ -648,6 +648,7 @@ int f2fs_do_tmpfile(struct inode *inode, struct inode *dir) > > update_inode(inode, page); > > f2fs_put_page(page, 1); > > > > + set_inode_flag(F2FS_I(inode), FI_TMP_INODE); > > clear_inode_flag(F2FS_I(inode), FI_NEW_INODE); > > fail: > > up_write(&F2FS_I(inode)->i_sem); > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > > index cdcae06..de21d38 100644 > > --- a/fs/f2fs/f2fs.h > > +++ b/fs/f2fs/f2fs.h > > @@ -1337,6 +1337,7 @@ static inline void f2fs_change_bit(unsigned int nr, char *addr) > > /* used for f2fs_inode_info->flags */ > > enum { > > FI_NEW_INODE, /* indicate newly allocated inode */ > > + FI_TMP_INODE, /* indicate tmpfile */ > > FI_DIRTY_INODE, /* indicate inode is dirty or not */ > > FI_DIRTY_DIR, /* indicate directory has dirty pages */ > > FI_INC_LINK, /* need to increment i_nlink */ > > @@ -1726,6 +1727,7 @@ int acquire_orphan_inode(struct f2fs_sb_info *); > > void release_orphan_inode(struct f2fs_sb_info *); > > void add_orphan_inode(struct f2fs_sb_info *, nid_t); > > void remove_orphan_inode(struct f2fs_sb_info *, nid_t); > > +bool is_orphan_inode(struct f2fs_sb_info *, nid_t); > > void recover_orphan_inodes(struct f2fs_sb_info *); > > int get_valid_checkpoint(struct f2fs_sb_info *); > > void update_dirty_page(struct inode *, struct page *); > > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c > > index 7464d08..98af3bf 100644 > > --- a/fs/f2fs/super.c > > +++ b/fs/f2fs/super.c > > @@ -430,9 +430,21 @@ static int f2fs_drop_inode(struct inode *inode) > > * - f2fs_write_data_page > > * - f2fs_gc -> iput -> evict > > * - inode_wait_for_writeback(inode) > > + * In order to avoid that, f2fs_write_data_page does not write data > > + * pages for orphan inode except tmpfile. > > + * Nevertheless, we need to truncate the tmpfile's data to avoid > > + * needless cleaning. > > */ > > - if (!inode_unhashed(inode) && inode->i_state & I_SYNC) > > + if (is_inode_flag_set(F2FS_I(inode), FI_TMP_INODE) && > > + inode->i_state & I_SYNC) { > > + spin_unlock(&inode->i_lock); > > + i_size_write(inode, 0); > > + > > + if (F2FS_HAS_BLOCKS(inode)) > > + f2fs_truncate(inode); > > + spin_lock(&inode->i_lock); > > return 0; > > + } > > return generic_drop_inode(inode); > > } > > > > -- > > 2.1.1 ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2015-05-18 5:44 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-05-13 7:17 Space leak in f2fs hujianyang 2015-05-13 17:46 ` [f2fs-dev] " Jaegeuk Kim 2015-05-14 0:24 ` Jaegeuk Kim 2015-05-14 1:40 ` hujianyang 2015-05-14 1:45 ` [f2fs-dev] " Jaegeuk Kim 2015-05-14 21:14 ` Jaegeuk Kim 2015-05-15 8:31 ` Chao Yu 2015-05-16 0:55 ` Jaegeuk Kim 2015-05-18 2:43 ` Chao Yu 2015-05-18 2:50 ` Nicholas Krause 2015-05-18 5:44 ` [f2fs-dev] " Jaegeuk Kim
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).