[PATCH] nilfs2: fix segctor bug that causes file system corruption

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] nilfs2: fix segctor bug that causes file system corruption
@ 2014-01-02 14:48 Andreas Rohner
       [not found] ` <1388674131-22579-1-git-send-email-andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Andreas Rohner @ 2014-01-02 14:48 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA, slava-yeENwD64cLxBDgjK7y7TUQ
  Cc: Andreas Rohner

There is a bug in the function nilfs_segctor_collect, which results in
active data being written to a segment, that is marked as clean. It is
possible, that this segment is selected for a later segment
construction, whereby the old data is overwritten.

The problem shows itself with the following kernel log message:

nilfs_sufile_do_cancel_free: segment 6533 must be clean

Usually a few hours later the file system gets corrupted:

NILFS: bad btree node (blocknr=8748107): level = 0, flags = 0x0,
nchildren = 0
NILFS error (device sdc1): nilfs_bmap_last_key: broken bmap
(inode number=114660)

The issue can be reproduced with a file system that is nearly full and
with the cleaner running, while some IO intensive task is running.
Although it is quite hard to reproduce.

This is what happens:

1. The cleaner starts the segment construction
2. nilfs_segctor_collect is called
3. sc_stage is on NILFS_ST_SUFILE and segments are freed
4. sc_stage is on NILFS_ST_DAT current segment is full
5. nilfs_segctor_extend_segments is called, which
   allocates a new segment
6. The new segment is one of the segments freed in step 3
7. nilfs_sufile_cancel_freev is called and produces an error message
8. Loop around and the collection starts again
9. sc_stage is on NILFS_ST_SUFILE and segments are freed
   including the newly allocated segment, which will contain active data
   and can be allocated at a later time
10. A few hours later another segment construction allocates the segment
    and causes file system corruption

This can be prevented by simply reordering the statements. If
nilfs_sufile_cancel_freev is called before nilfs_segctor_extend_segments
the freed segments are marked as dirty and cannot be allocated any more.

Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
---
 fs/nilfs2/segment.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 9f6b486..a1a1916 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1440,17 +1440,19 @@ static int nilfs_segctor_collect(struct nilfs_sc_info *sci,
 
 		nilfs_clear_logs(&sci->sc_segbufs);
 
-		err = nilfs_segctor_extend_segments(sci, nilfs, nadd);
-		if (unlikely(err))
-			return err;
-
 		if (sci->sc_stage.flags & NILFS_CF_SUFREED) {
 			err = nilfs_sufile_cancel_freev(nilfs->ns_sufile,
 							sci->sc_freesegs,
 							sci->sc_nfreesegs,
 							NULL);
 			WARN_ON(err); /* do not happen */
+			sci->sc_stage.flags &= ~NILFS_CF_SUFREED;
 		}
+
+		err = nilfs_segctor_extend_segments(sci, nilfs, nadd);
+		if (unlikely(err))
+			return err;
+
 		nadd = min_t(int, nadd << 1, SC_MAX_SEGDELTA);
 		sci->sc_stage = prev_stage;
 	}
-- 
1.8.5.2

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] nilfs2: fix segctor bug that causes file system corruption
       [not found]     ` <741898E4-B794-47FF-ABF6-2DCACA2AED43-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
@ 2014-01-02 16:38       ` Andreas Rohner
       [not found]         ` <52C5961F.8000608-hi6Y0CQ0nG0@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Andreas Rohner @ 2014-01-02 16:38 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs, Ryusuke Konishi

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

On 2014-01-02 17:59, Vyacheslav Dubeyko wrote:
> What version of the kernel did you used during the issue
> reproducing? Do you have the patch [1] in your kernel?

I used kernel version 3.12.4. A call to

git log --grep="Vyacheslav Dubeyko"

gives me your your patch as the first entry in the commit log, so I am
quite certain, that it was applied.

> Of course, you can discover some new issue. But, currently, I can't
> understand relation between the error message from
> nilfs_sufile_do_cancel_free() and next messages from
> nilfs_bmap_last_key() from your description. So, I need to think
> over your description more deeply and I need in confirmation that
> you reproduce the issue with presence of patch [1] in your kernel.

I am sorry if my description was not clear enough. The problem is with
the retry loop in nilfs_segctor_collect. If the current segment buffer
is full a new segment is allocated and the collection is retried. But
if the allocation of a new segment happens before
nilfs_sufile_cancel_freev, there is a possibility, that a segment is
selected, that before was freed during nilfs_segctor_collect_blocks.
Now the collection is retried and the segments are freed again. This
time though the newly allocated segment is freed as well, without any
error messages. It is still used to write active data, while it is
marked as free in the SUFILE. At some later time it is overwritten,
which can cause all kinds of errors.

If you follow the 10 steps I outlined in my commit message, you should
be able to see the problem. If some of the steps are unclear, I am
happy to provide a more thorough explanation.

best regards,
Andreas Rohner
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJSxZYfAAoJEPtvGDmQA801AK4P/3d5V2l77GPVPLiTUDBGoLm8
hFH86x5tcrFuoGCbP72FC9RtslVa654BMaieu2jqpmrJRKMQLEaEfBgsAgwC6sT1
mN1JFp7iMLlF4ECoCy3sJNKFVNNPhA+O7Rt/LJL94PLQayqHCl0RKGUZpjxIp5/v
mncjOsePwOVLbMpeYcNVcDL7IKc5hZ6xyB9Z1uqgwxvcsgFESacurfeS71Fj57wU
UMnD31oOqKYqgXEhyA+Vj8Ow5bOmYcE0WbYZh3obZSD2Efr2GWqnUZKyIY/K3+aM
dlcgOYTou4/rbxND/2rh15u8aSJSIDls8sORvQ0fu9bANmEAZZI+AvmdJA15feR4
kmoYJB0oD+0dGAA7JMRlDCte/wz2dRcHVlJ5Rz/SJ/flE83NEbjn9SCqispqJXJE
3IcMU/37/onTS/3q8CXSB836Pym5vTN9PXRzrFZeUrKcxWLgRYrtC4/PaLt8KdhC
HWM7U8TCV9LFEJ2efEqbNjLYOnyCVe36OQEOaeTCeshdtN2PZj+9YQBKFYnNoOXs
q0LXVUXhXrcTSyd7xaZv/HxrIsqpzz0iqyVW8yiZO2CZjJy4t5kxh58j4dmCh8Ku
C90f3FCUWxbMr2pu/0lWw4ySyF119iO1nhJnRHqfBr2iGdYlUcyqEA9ylqwGVAhQ
b/Pb36DpG9NQNQOwTcsP
=bSZ7
-----END PGP SIGNATURE-----
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nilfs2: fix segctor bug that causes file system corruption
       [not found] ` <1388674131-22579-1-git-send-email-andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
@ 2014-01-02 16:59   ` Vyacheslav Dubeyko
       [not found]     ` <741898E4-B794-47FF-ABF6-2DCACA2AED43-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Vyacheslav Dubeyko @ 2014-01-02 16:59 UTC (permalink / raw)
  To: Andreas Rohner; +Cc: linux-nilfs, Ryusuke Konishi

Hi Andreas,

On Jan 2, 2014, at 5:48 PM, Andreas Rohner wrote:

> There is a bug in the function nilfs_segctor_collect, which results in
> active data being written to a segment, that is marked as clean. It is
> possible, that this segment is selected for a later segment
> construction, whereby the old data is overwritten.
> 

Thank you for your patch. But, currently, I doubt that your patch is correct.

What version of the kernel did you used during the issue reproducing?
Do you have the patch [1] in your kernel?

The description of the issue looks like the problem that it was fixed in my
patch. So, do you confident that you reproduce the issue with presence of patch [1]
in your kernel? Could you check and confirm it?

Of course, you can discover some new issue. But, currently, I can't understand
relation between the error message from nilfs_sufile_do_cancel_free() and next
messages from nilfs_bmap_last_key() from your description. So, I need to think over
your description more deeply and I need in confirmation that you reproduce the
issue with presence of patch [1] in your kernel.

Thanks,
Vyacheslav Dubeyko.

[1] http://comments.gmane.org/gmane.comp.file-systems.nilfs.user/3082

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nilfs2: fix segctor bug that causes file system corruption
       [not found]         ` <52C5961F.8000608-hi6Y0CQ0nG0@public.gmane.org>
@ 2014-01-02 17:17           ` Ryusuke Konishi
       [not found]             ` <20140103.021703.220046134.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Ryusuke Konishi @ 2014-01-02 17:17 UTC (permalink / raw)
  To: Andreas Rohner; +Cc: Vyacheslav Dubeyko, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi, Andreas, and Vyacheslav

On Thu, 02 Jan 2014 17:38:55 +0100, Andreas Rohner wrote:
>> Of course, you can discover some new issue. But, currently, I can't
>> understand relation between the error message from
>> nilfs_sufile_do_cancel_free() and next messages from
>> nilfs_bmap_last_key() from your description. So, I need to think
>> over your description more deeply and I need in confirmation that
>> you reproduce the issue with presence of patch [1] in your kernel.
> 
> I am sorry if my description was not clear enough. The problem is with
> the retry loop in nilfs_segctor_collect. If the current segment buffer
> is full a new segment is allocated and the collection is retried. But
> if the allocation of a new segment happens before
> nilfs_sufile_cancel_freev, there is a possibility, that a segment is
> selected, that before was freed during nilfs_segctor_collect_blocks.
> Now the collection is retried and the segments are freed again. This
> time though the newly allocated segment is freed as well, without any
> error messages. It is still used to write active data, while it is
> marked as free in the SUFILE. At some later time it is overwritten,
> which can cause all kinds of errors.

Thank you for posting this patch.

I reviewed the patch, and yes, the patch looks correct and needed to
fix the retry logic of nilfs_segctor_collect().

I am thinking of sending this to upstream.  But, before that, please
test the patch enough in the same situation to confirm that it
actually fixes the problem and it doesn't cause other issues.

At this point, I feel this patch has no defects nor side effects, but
I cannot test it in the same situation until the next week.


Thanks,
Ryusuke Konishi

> If you follow the 10 steps I outlined in my commit message, you should
> be able to see the problem. If some of the steps are unclear, I am
> happy to provide a more thorough explanation.
> 
> best regards,
> Andreas Rohner
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nilfs2: fix segctor bug that causes file system corruption
       [not found]                 ` <88B1B6B6-1AC9-4BC4-9776-C07D1360A99F-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
@ 2014-01-02 19:54                   ` Ryusuke Konishi
  0 siblings, 0 replies; 12+ messages in thread
From: Ryusuke Konishi @ 2014-01-02 19:54 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: Andreas Rohner, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Vyacheslav,
On Thu, 2 Jan 2014 23:16:22 +0300, Vyacheslav Dubeyko wrote:
> Hi Ryusuke,
> 
> On Jan 2, 2014, at 8:17 PM, Ryusuke Konishi wrote:
> 
>> Hi, Andreas, and Vyacheslav
>> 
>> Thank you for posting this patch.
>> 
>> I reviewed the patch, and yes, the patch looks correct and needed to
>> fix the retry logic of nilfs_segctor_collect().
>> 
>> I am thinking of sending this to upstream.  But, before that, please
>> test the patch enough in the same situation to confirm that it
>> actually fixes the problem and it doesn't cause other issues.
>> 
>> At this point, I feel this patch has no defects nor side effects, but
>> I cannot test it in the same situation until the next week.
>> 
> 
> 		if (sci->sc_stage.flags & NILFS_CF_SUFREED) {
> 			err = nilfs_sufile_cancel_freev(nilfs->ns_sufile,
> 							sci->sc_freesegs,
> 							sci->sc_nfreesegs,
> 							NULL);
> 			WARN_ON(err); /* do not happen */
> +			sci->sc_stage.flags &= ~NILFS_CF_SUFREED;
> 		}
> 
> I see adding the code of clear NILFS_CF_SUFREED flag in the patch.
> But nilfs_segctor_abort_construction() contains likewise code:
> 
> 1727         if (sci->sc_stage.flags & NILFS_CF_SUFREED) {
> 1728                 ret = nilfs_sufile_cancel_freev(nilfs->ns_sufile,
> 1729                                                 sci->sc_freesegs,
> 1730                                                 sci->sc_nfreesegs,
> 1731                                                 NULL);
> 1732                 WARN_ON(ret); /* do not happen */
> 1733         }
> 
> Maybe, this code needs in flag clearing too?

It is unnecessary for nilfs_segctor_abort_construction() because
nilfs_segctor_abort_construction() is called just before escaping from
nilfs_segctor_do_construct().  The NILFS_CF_SUFREED flag has a meaning
only inside nilfs_segctor_do_construct().

> Anyway, I can see setting this flag and cannot see clearing
> of it. Andreas suggests to clear this flag in the patch.
> Is logic of setting/clearing this flag fully correct?

I think his change is correct, and it is the only call site of
nilfs_sufile_cancel_freev() that needs clearing NILFS_CF_SUFREED flag.

Thank you for your attention to this flag.

Regards,
Ryusuke Konishi


> With the best regards,
> Vyacheslav Dubeyko.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nilfs2: fix segctor bug that causes file system corruption
       [not found]             ` <20140103.021703.220046134.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
@ 2014-01-02 20:16               ` Vyacheslav Dubeyko
       [not found]                 ` <88B1B6B6-1AC9-4BC4-9776-C07D1360A99F-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
  2014-01-02 22:45               ` Andreas Rohner
  1 sibling, 1 reply; 12+ messages in thread
From: Vyacheslav Dubeyko @ 2014-01-02 20:16 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: Andreas Rohner, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Ryusuke,

On Jan 2, 2014, at 8:17 PM, Ryusuke Konishi wrote:

> Hi, Andreas, and Vyacheslav
> 
> Thank you for posting this patch.
> 
> I reviewed the patch, and yes, the patch looks correct and needed to
> fix the retry logic of nilfs_segctor_collect().
> 
> I am thinking of sending this to upstream.  But, before that, please
> test the patch enough in the same situation to confirm that it
> actually fixes the problem and it doesn't cause other issues.
> 
> At this point, I feel this patch has no defects nor side effects, but
> I cannot test it in the same situation until the next week.
> 

		if (sci->sc_stage.flags & NILFS_CF_SUFREED) {
			err = nilfs_sufile_cancel_freev(nilfs->ns_sufile,
							sci->sc_freesegs,
							sci->sc_nfreesegs,
							NULL);
			WARN_ON(err); /* do not happen */
+			sci->sc_stage.flags &= ~NILFS_CF_SUFREED;
		}

I see adding the code of clear NILFS_CF_SUFREED flag in the patch.
But nilfs_segctor_abort_construction() contains likewise code:

1727         if (sci->sc_stage.flags & NILFS_CF_SUFREED) {
1728                 ret = nilfs_sufile_cancel_freev(nilfs->ns_sufile,
1729                                                 sci->sc_freesegs,
1730                                                 sci->sc_nfreesegs,
1731                                                 NULL);
1732                 WARN_ON(ret); /* do not happen */
1733         }

Maybe, this code needs in flag clearing too?

Anyway, I can see setting this flag and cannot see clearing
of it. Andreas suggests to clear this flag in the patch.
Is logic of setting/clearing this flag fully correct?

With the best regards,
Vyacheslav Dubeyko.

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nilfs2: fix segctor bug that causes file system corruption
       [not found]             ` <20140103.021703.220046134.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
  2014-01-02 20:16               ` Vyacheslav Dubeyko
@ 2014-01-02 22:45               ` Andreas Rohner
       [not found]                 ` <52C5EC14.2030300-hi6Y0CQ0nG0@public.gmane.org>
  1 sibling, 1 reply; 12+ messages in thread
From: Andreas Rohner @ 2014-01-02 22:45 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: Vyacheslav Dubeyko, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Ryusuke and Vyacheslav,

On 2014-01-02 18:17, Ryusuke Konishi wrote:
> I am thinking of sending this to upstream.  But, before that, please
> test the patch enough in the same situation to confirm that it
> actually fixes the problem and it doesn't cause other issues.

I have tested it before, but to be on the safe side, I have rerun my
benchmark (it runs for several hours), and as far as I can tell the
patch does fix the problem and it doesn't cause any other issues.

best regards,
Andreas Rohner
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nilfs2: fix segctor bug that causes file system corruption
       [not found]                 ` <52C5EC14.2030300-hi6Y0CQ0nG0@public.gmane.org>
@ 2014-01-03  2:05                   ` Ryusuke Konishi
  0 siblings, 0 replies; 12+ messages in thread
From: Ryusuke Konishi @ 2014-01-03  2:05 UTC (permalink / raw)
  To: Andreas Rohner; +Cc: Vyacheslav Dubeyko, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Thu, 02 Jan 2014 23:45:40 +0100, Andreas Rohner wrote:
> Hi Ryusuke and Vyacheslav,
> 
> On 2014-01-02 18:17, Ryusuke Konishi wrote:
>> I am thinking of sending this to upstream.  But, before that, please
>> test the patch enough in the same situation to confirm that it
>> actually fixes the problem and it doesn't cause other issues.
> 
> I have tested it before, but to be on the safe side, I have rerun my
> benchmark (it runs for several hours), and as far as I can tell the
> patch does fix the problem and it doesn't cause any other issues.

Thank you.

I'll send it to upstream with a few additional tags (your Tested-by
tag and a cc:stable tag) after reviewing again.

Thanks,
Ryusuke Konishi
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nilfs2: fix segctor bug that causes file system corruption
@ 2014-01-03 18:48 Mark Trumpold
  2014-01-03 19:37 ` Andreas Rohner
  0 siblings, 1 reply; 12+ messages in thread
From: Mark Trumpold @ 2014-01-03 18:48 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA; +Cc: andreas.rohner-hi6Y0CQ0nG0

On Thu, 02 Jan 2014, Andreas Rohner wrote:
>>If you follow the 10 steps I outlined in my commit message, 
>>you should be able to see the problem. If some of the steps are unclear, 
>>I am happy to provide a more thorough explanation.

Hi Andreas,
Would it be possible to share the 10 steps to reproducing the problem?
I want to evaluate the risk in my context before going through another
kernel spin.
Regards and thanks,
Mark T.



--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nilfs2: fix segctor bug that causes file system corruption
  2014-01-03 18:48 Mark Trumpold
@ 2014-01-03 19:37 ` Andreas Rohner
  0 siblings, 0 replies; 12+ messages in thread
From: Andreas Rohner @ 2014-01-03 19:37 UTC (permalink / raw)
  To: Mark Trumpold, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On 2014-01-03 19:48, Mark Trumpold wrote:
> On Thu, 02 Jan 2014, Andreas Rohner wrote:
>>> If you follow the 10 steps I outlined in my commit message, 
>>> you should be able to see the problem. If some of the steps are unclear, 
>>> I am happy to provide a more thorough explanation.
> 
> Hi Andreas,
> Would it be possible to share the 10 steps to reproducing the problem?
> I want to evaluate the risk in my context before going through another
> kernel spin.
> Regards and thanks,
> Mark T.

Hi Mark,

I wasn't referring to 10 steps to reproduce the problem, but to the 10
steps in my commit message, which describe how the problem occurs in the
code. But of course I can share my setup and my benchmark.

I use a 100 GB Volume and fill it with dd up to 20GB. Then I replay the
Lair62 NFS Traces from the IOTTA Repository [1]. In Parallel to that I
run a script, that selects every 5 minutes a random checkpoint and
converts it into a snapshot. If there are more than 3 snapshots, the
oldest snapshot is converted back to a checkpoint. One run of this takes
a little more than 4 hours. But it takes about three runs for the bug to
be reproduced. It is quite hard to reproduce it, since a lot of things
need to go wrong at the same time.

There should be a simpler way to trigger it. If the volume is nearly
full and most of the data is protected from the cleaner by a snapshot
and the cleaner runs at a high frequency and lots of DAT-Entries need to
be written out (e.g.: Deletion of a large file).

best regards,
Andreas Rohner

[1] http://iotta.snia.org/historical_section?tracetype_id=2
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] nilfs2: fix segctor bug that causes file system corruption
@ 2014-01-08 17:22 Ryusuke Konishi
       [not found] ` <1389201735-3739-1-git-send-email-konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Ryusuke Konishi @ 2014-01-08 17:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA, Andreas Rohner,
	Ryusuke Konishi

From: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>

There is a bug in the function nilfs_segctor_collect, which results in
active data being written to a segment, that is marked as clean. It is
possible, that this segment is selected for a later segment
construction, whereby the old data is overwritten.

The problem shows itself with the following kernel log message:

nilfs_sufile_do_cancel_free: segment 6533 must be clean

Usually a few hours later the file system gets corrupted:

NILFS: bad btree node (blocknr=8748107): level = 0, flags = 0x0,
nchildren = 0
NILFS error (device sdc1): nilfs_bmap_last_key: broken bmap
(inode number=114660)

The issue can be reproduced with a file system that is nearly full and
with the cleaner running, while some IO intensive task is running.
Although it is quite hard to reproduce.

This is what happens:

1. The cleaner starts the segment construction
2. nilfs_segctor_collect is called
3. sc_stage is on NILFS_ST_SUFILE and segments are freed
4. sc_stage is on NILFS_ST_DAT current segment is full
5. nilfs_segctor_extend_segments is called, which
   allocates a new segment
6. The new segment is one of the segments freed in step 3
7. nilfs_sufile_cancel_freev is called and produces an error message
8. Loop around and the collection starts again
9. sc_stage is on NILFS_ST_SUFILE and segments are freed
   including the newly allocated segment, which will contain active data
   and can be allocated at a later time
10. A few hours later another segment construction allocates the segment
    and causes file system corruption

This can be prevented by simply reordering the statements. If
nilfs_sufile_cancel_freev is called before nilfs_segctor_extend_segments
the freed segments are marked as dirty and cannot be allocated any more.

Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
Reviewed-by: Ryusuke Konishi <konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
Tested-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
Cc: stable <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
---
 fs/nilfs2/segment.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 9f6b486..a1a1916 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1440,17 +1440,19 @@ static int nilfs_segctor_collect(struct nilfs_sc_info *sci,
 
 		nilfs_clear_logs(&sci->sc_segbufs);
 
-		err = nilfs_segctor_extend_segments(sci, nilfs, nadd);
-		if (unlikely(err))
-			return err;
-
 		if (sci->sc_stage.flags & NILFS_CF_SUFREED) {
 			err = nilfs_sufile_cancel_freev(nilfs->ns_sufile,
 							sci->sc_freesegs,
 							sci->sc_nfreesegs,
 							NULL);
 			WARN_ON(err); /* do not happen */
+			sci->sc_stage.flags &= ~NILFS_CF_SUFREED;
 		}
+
+		err = nilfs_segctor_extend_segments(sci, nilfs, nadd);
+		if (unlikely(err))
+			return err;
+
 		nadd = min_t(int, nadd << 1, SC_MAX_SEGDELTA);
 		sci->sc_stage = prev_stage;
 	}
-- 
1.7.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] nilfs2: fix segctor bug that causes file system corruption
       [not found] ` <1389201735-3739-1-git-send-email-konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
@ 2014-01-08 17:55   ` Ryusuke Konishi
  0 siblings, 0 replies; 12+ messages in thread
From: Ryusuke Konishi @ 2014-01-08 17:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA, Andreas Rohner,
	Ryusuke Konishi

Hi Andrew,

Please send this bug-fix to upstream.  It fixes a file system
corruption problem in a near disk full condition.

Thanks,
Ryusuke Konishi

On Thu,  9 Jan 2014 02:22:15 +0900, Ryusuke Konishi wrote:
> From: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
> 
> There is a bug in the function nilfs_segctor_collect, which results in
> active data being written to a segment, that is marked as clean. It is
> possible, that this segment is selected for a later segment
> construction, whereby the old data is overwritten.
> 
> The problem shows itself with the following kernel log message:
> 
> nilfs_sufile_do_cancel_free: segment 6533 must be clean
> 
> Usually a few hours later the file system gets corrupted:
> 
> NILFS: bad btree node (blocknr=8748107): level = 0, flags = 0x0,
> nchildren = 0
> NILFS error (device sdc1): nilfs_bmap_last_key: broken bmap
> (inode number=114660)
> 
> The issue can be reproduced with a file system that is nearly full and
> with the cleaner running, while some IO intensive task is running.
> Although it is quite hard to reproduce.
> 
> This is what happens:
> 
> 1. The cleaner starts the segment construction
> 2. nilfs_segctor_collect is called
> 3. sc_stage is on NILFS_ST_SUFILE and segments are freed
> 4. sc_stage is on NILFS_ST_DAT current segment is full
> 5. nilfs_segctor_extend_segments is called, which
>    allocates a new segment
> 6. The new segment is one of the segments freed in step 3
> 7. nilfs_sufile_cancel_freev is called and produces an error message
> 8. Loop around and the collection starts again
> 9. sc_stage is on NILFS_ST_SUFILE and segments are freed
>    including the newly allocated segment, which will contain active data
>    and can be allocated at a later time
> 10. A few hours later another segment construction allocates the segment
>     and causes file system corruption
> 
> This can be prevented by simply reordering the statements. If
> nilfs_sufile_cancel_freev is called before nilfs_segctor_extend_segments
> the freed segments are marked as dirty and cannot be allocated any more.
> 
> Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
> Reviewed-by: Ryusuke Konishi <konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
> Tested-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
> Cc: stable <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
> Signed-off-by: Ryusuke Konishi <konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
> ---
>  fs/nilfs2/segment.c |   10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
> index 9f6b486..a1a1916 100644
> --- a/fs/nilfs2/segment.c
> +++ b/fs/nilfs2/segment.c
> @@ -1440,17 +1440,19 @@ static int nilfs_segctor_collect(struct nilfs_sc_info *sci,
>  
>  		nilfs_clear_logs(&sci->sc_segbufs);
>  
> -		err = nilfs_segctor_extend_segments(sci, nilfs, nadd);
> -		if (unlikely(err))
> -			return err;
> -
>  		if (sci->sc_stage.flags & NILFS_CF_SUFREED) {
>  			err = nilfs_sufile_cancel_freev(nilfs->ns_sufile,
>  							sci->sc_freesegs,
>  							sci->sc_nfreesegs,
>  							NULL);
>  			WARN_ON(err); /* do not happen */
> +			sci->sc_stage.flags &= ~NILFS_CF_SUFREED;
>  		}
> +
> +		err = nilfs_segctor_extend_segments(sci, nilfs, nadd);
> +		if (unlikely(err))
> +			return err;
> +
>  		nadd = min_t(int, nadd << 1, SC_MAX_SEGDELTA);
>  		sci->sc_stage = prev_stage;
>  	}
> -- 
> 1.7.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-01-08 17:55 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-01-02 14:48 [PATCH] nilfs2: fix segctor bug that causes file system corruption Andreas Rohner
     [not found] ` <1388674131-22579-1-git-send-email-andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
2014-01-02 16:59   ` Vyacheslav Dubeyko
     [not found]     ` <741898E4-B794-47FF-ABF6-2DCACA2AED43-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2014-01-02 16:38       ` Andreas Rohner
     [not found]         ` <52C5961F.8000608-hi6Y0CQ0nG0@public.gmane.org>
2014-01-02 17:17           ` Ryusuke Konishi
     [not found]             ` <20140103.021703.220046134.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
2014-01-02 20:16               ` Vyacheslav Dubeyko
     [not found]                 ` <88B1B6B6-1AC9-4BC4-9776-C07D1360A99F-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
2014-01-02 19:54                   ` Ryusuke Konishi
2014-01-02 22:45               ` Andreas Rohner
     [not found]                 ` <52C5EC14.2030300-hi6Y0CQ0nG0@public.gmane.org>
2014-01-03  2:05                   ` Ryusuke Konishi
  -- strict thread matches above, loose matches on Subject: below --
2014-01-03 18:48 Mark Trumpold
2014-01-03 19:37 ` Andreas Rohner
2014-01-08 17:22 Ryusuke Konishi
     [not found] ` <1389201735-3739-1-git-send-email-konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
2014-01-08 17:55   ` Ryusuke Konishi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.