* [BUG] Big fiability issue?
@ 2016-04-03 17:44 Mathieu Belanger
  0 siblings, 0 replies; 7+ messages in thread
From: Mathieu Belanger @ 2016-04-03 17:44 UTC
  To: reiserfs-devel

Hi. I have had a recurring problem since I started using Reiser4 again (with kernels 4.4 and 4.6-rc1).

The power here is not very reliable: sometimes I get a power outage, and when that happens I lose whole directories, even ones I was not using or writing to.

- On my laptop (kernel 4.4), three weeks ago, when that happened, I was not able to launch Firefox after the reboot, so I ran fsck. According to fsck it lost some files: everything in /sbin. After the fsck I could not boot that installation anymore (as expected without /sbin/udev...).

- On my desktop (kernel 4.6-rc1), I had two power outages yesterday. To be proactive, right after the power problem I booted in single-user mode to run fsck before doing anything else. I lost about 2 files, which fsck put in /lost+found. They were an sqlite file and an mp3, so no problem. BUT today I needed to use CLion, and guess what?

destroyfx@Tanith /tmp/comlin64 $ ls /opt
ls: reading directory '/opt': Input/output error

I tried rm -Rf /opt, with no success. I ran another fsck today; it put 3 more files in lost+found and did not fix /opt.

I used Reiser4 about two years ago. At that time I did a lot of overclocking tests, and the machine crashed many times without losing anything. In addition, that machine was scrypt mining with 6 video cards (with cold joints in the PCI-e extenders), which made it crash a couple of times a week; no problem there either, no lost files.

So was I just lucky two years ago not to lose anything after hundreds of crashes? Because now it is really bad. (I did not even write to /opt in that session, and on my laptop I had not written to /sbin for at least two weeks before it failed.)

-- 
Mathieu Belanger <admin@korinar.com>


* Re: [BUG] Big fiability issue?
@ 2016-04-05 15:43 Edward Shishkin
  2016-04-06  0:23 ` Ivan Shapovalov
  2016-04-06 14:25 ` Mathieu Belanger
  0 siblings, 2 replies; 7+ messages in thread
From: Edward Shishkin @ 2016-04-05 15:43 UTC
  To: Mathieu Bélanger; +Cc: Reiserfs development mailing list

[-- Attachment #1: Type: text/plain, Size: 607 bytes --]

Hello Mathieu,

I found that by default reiser4 still relies on a block layer feature
that is no longer supported: so-called "barriers". And yes, without
them bad things can happen on a power outage, although whether they
actually happen is a matter of luck.

The attached patch removes the rest of block barriers support in
reiser4. So, now we honestly wait for IO completion of wandered
blocks (overwrite set) before submitting a journal header (journal
footer).
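
For illustration only, the ordering described above can be modelled with a
small user-space sketch. It is not reiser4 code: the event names, the trace
structure and the three stand-in functions are hypothetical, and they only
record the order in which the patched commit path is described to act
(submit wandered blocks, wait for completion, then write the journal header
with flush/FUA semantics).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event codes for this sketch; not reiser4 identifiers. */
enum event {
	EV_SUBMIT_WANDERED,        /* wandered blocks handed to the device */
	EV_WAIT_COMPLETION,        /* caller waited for all submitted IO */
	EV_WRITE_HEADER_FLUSH_FUA  /* journal header written with flush + FUA */
};

struct event_trace {
	enum event events[8];
	size_t count;
};

static void record(struct event_trace *trace, enum event ev)
{
	assert(trace->count < sizeof(trace->events) / sizeof(trace->events[0]));
	trace->events[trace->count++] = ev;
}

/* Stand-ins for the real IO paths; in this sketch they always succeed. */
static int submit_wandered_blocks(struct event_trace *trace)
{
	record(trace, EV_SUBMIT_WANDERED);
	return 0;
}

static int wait_for_io_completion(struct event_trace *trace)
{
	record(trace, EV_WAIT_COMPLETION);
	return 0;
}

static int write_journal_header(struct event_trace *trace)
{
	record(trace, EV_WRITE_HEADER_FLUSH_FUA);
	return 0;
}

/*
 * Commit ordering without block barriers: the header is submitted only
 * after every wandered block has completed, so the header can never
 * reach the disk ahead of the blocks it points to.
 */
static int commit_sketch(struct event_trace *trace)
{
	int ret;

	ret = submit_wandered_blocks(trace);
	if (ret)
		return ret;
	ret = wait_for_io_completion(trace);
	if (ret)
		return ret;
	return write_journal_header(trace);
}
```

The point of the sketch is that ordering is enforced by the caller waiting,
not by a flag on the request; the flush/FUA flag on the header write only
makes the header itself durable.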

Not sure if it will address your problem though. Also, data corruption
after rw-mounting of checked (rebuild-fs) partition is still a concern.

Thanks,
Edward.

[-- Attachment #2: reiser4-drop-barriers-support.patch --]
[-- Type: text/x-patch, Size: 5377 bytes --]

Drop residual block barriers support.

Signed-off-by: Edward Shishkin <edward.shishkin@gmail.com>
---
 fs/reiser4/init_super.c |    2 --
 fs/reiser4/super.h      |    2 --
 fs/reiser4/wander.c     |   47 +++++++++++------------------------------------
 fs/reiser4/writeout.h   |    2 +-
 4 files changed, 12 insertions(+), 41 deletions(-)

--- a/fs/reiser4/init_super.c
+++ b/fs/reiser4/init_super.c
@@ -492,8 +492,6 @@ int reiser4_init_super_data(struct super
 	PUSH_BIT_OPT("dont_load_bitmap", REISER4_DONT_LOAD_BITMAP);
 	/* disable transaction commits during write() */
 	PUSH_BIT_OPT("atomic_write", REISER4_ATOMIC_WRITE);
-	/* disable use of write barriers in the reiser4 log writer. */
-	PUSH_BIT_OPT("no_write_barrier", REISER4_NO_WRITE_BARRIER);
 	/* enable issuing of discard requests */
 	PUSH_BIT_OPT("discard", REISER4_DISCARD);
 	/* disable hole punching at flush time */
--- a/fs/reiser4/super.h
+++ b/fs/reiser4/super.h
@@ -50,8 +50,6 @@ typedef enum {
 	REISER4_DONT_LOAD_BITMAP = 5,
 	/* enforce atomicity during write(2) */
 	REISER4_ATOMIC_WRITE = 6,
-	/* don't use write barriers in the log writer code. */
-	REISER4_NO_WRITE_BARRIER = 7,
 	/* enable issuing of discard requests */
 	REISER4_DISCARD = 8,
 	/* disable hole punching at flush time */
--- a/fs/reiser4/wander.c
+++ b/fs/reiser4/wander.c
@@ -224,11 +224,6 @@ static void done_commit_handle(struct co
 	assert("zam-690", list_empty(&ch->tx_list));
 }
 
-static inline int reiser4_use_write_barrier(struct super_block * s)
-{
-	return !reiser4_is_set(s, REISER4_NO_WRITE_BARRIER);
-}
-
 /* fill journal header block data  */
 static void format_journal_header(struct commit_handle *ch)
 {
@@ -420,7 +415,7 @@ store_wmap_actor(txn_atom * atom UNUSED_
    set is written to wandered locations and all wander records are written
    also. Updated journal header blocks contains a pointer (block number) to
    first wander record of the just written transaction */
-static int update_journal_header(struct commit_handle *ch, int use_barrier)
+static int update_journal_header(struct commit_handle *ch)
 {
 	struct reiser4_super_info_data *sbinfo = get_super_private(ch->super);
 	jnode *jh = sbinfo->journal_header;
@@ -430,7 +425,7 @@ static int update_journal_header(struct
 	format_journal_header(ch);
 
 	ret = write_jnodes_to_disk_extent(jh, 1, jnode_get_block(jh), NULL,
-					  use_barrier ? WRITEOUT_BARRIER : 0);
+					  WRITEOUT_FLUSH_FUA);
 	if (ret)
 		return ret;
 
@@ -450,7 +445,7 @@ static int update_journal_header(struct
 /* This function is called after write-back is finished. We update journal
    footer block and free blocks which were occupied by wandered blocks and
    transaction wander records */
-static int update_journal_footer(struct commit_handle *ch, int use_barrier)
+static int update_journal_footer(struct commit_handle *ch)
 {
 	reiser4_super_info_data *sbinfo = get_super_private(ch->super);
 
@@ -461,7 +456,7 @@ static int update_journal_footer(struct
 	format_journal_footer(ch);
 
 	ret = write_jnodes_to_disk_extent(jf, 1, jnode_get_block(jf), NULL,
-					  use_barrier ? WRITEOUT_BARRIER : 0);
+					  WRITEOUT_FLUSH_FUA);
 	if (ret)
 		return ret;
 
@@ -713,7 +708,7 @@ static int write_jnodes_to_disk_extent(
 	flush_queue_t *fq, int flags)
 {
 	struct super_block *super = reiser4_get_current_sb();
-	int write_op = ( flags & WRITEOUT_BARRIER ) ? WRITE_FLUSH_FUA : WRITE;
+	int write_op = ( flags & WRITEOUT_FLUSH_FUA ) ? WRITE_FLUSH_FUA : WRITE;
 	jnode *cur = first;
 	reiser4_block_nr block;
 
@@ -1101,7 +1096,6 @@ static int alloc_tx(struct commit_handle
 static int commit_tx(struct commit_handle *ch)
 {
 	flush_queue_t *fq;
-	int barrier;
 	int ret;
 
 	/* Grab more space for wandered records. */
@@ -1126,23 +1120,16 @@ static int commit_tx(struct commit_handl
 	reiser4_fq_put(fq);
 	if (ret)
 		return ret;
- 	barrier = reiser4_use_write_barrier(ch->super);
-	if (!barrier) {
-		ret = current_atom_finish_all_fq();
-		if (ret)
-			return ret;
-	}
-	ret = update_journal_header(ch, barrier);
-	if (!barrier || ret)
+	ret = current_atom_finish_all_fq();
+	if (ret)
 		return ret;
-	return current_atom_finish_all_fq();
+	return update_journal_header(ch);
 }
 
 static int write_tx_back(struct commit_handle * ch)
 {
 	flush_queue_t *fq;
 	int ret;
-	int barrier;
 
 	fq = get_fq_for_current_atom();
 	if (IS_ERR(fq))
@@ -1153,22 +1140,10 @@ static int write_tx_back(struct commit_h
 	reiser4_fq_put(fq);
 	if (ret)
 		return ret;
-
-	barrier = reiser4_use_write_barrier(ch->super);
-	if (!barrier) {
-		ret = current_atom_finish_all_fq();
-		if (ret)
-			return ret;
-	}
-	ret = update_journal_footer(ch, barrier);
+	ret = current_atom_finish_all_fq();
 	if (ret)
 		return ret;
-	if (barrier) {
-		ret = current_atom_finish_all_fq();
-		if (ret)
-			return ret;
-	}
-	return 0;
+	return update_journal_footer(ch);
 }
 
 /* We assume that at this moment all captured blocks are marked as RELOC or
@@ -1486,7 +1461,7 @@ static int replay_transaction(const stru
 		}
 	}
 
-	ret = update_journal_footer(&ch, 0);
+	ret = update_journal_footer(&ch);
 
       free_ow_set:
 
--- a/fs/reiser4/writeout.h
+++ b/fs/reiser4/writeout.h
@@ -4,7 +4,7 @@
 
 #define WRITEOUT_SINGLE_STREAM (0x1)
 #define WRITEOUT_FOR_PAGE_RECLAIM  (0x2)
-#define WRITEOUT_BARRIER (0x4)
+#define WRITEOUT_FLUSH_FUA (0x4)
 
 extern int reiser4_get_writeout_flags(void);
 


* Re: [BUG] Big fiability issue?
  2016-04-05 15:43 [BUG] Big fiability issue? Edward Shishkin
@ 2016-04-06  0:23 ` Ivan Shapovalov
  2016-04-06 15:03   ` Edward Shishkin
  2016-04-06 14:25 ` Mathieu Belanger
  1 sibling, 1 reply; 7+ messages in thread
From: Ivan Shapovalov @ 2016-04-06  0:23 UTC
  To: Edward Shishkin, Mathieu Bélanger; +Cc: Reiserfs development mailing list

[-- Attachment #1: Type: text/plain, Size: 880 bytes --]

On 2016-04-05 at 17:43 +0200, Edward Shishkin wrote:
> Hello Mathieu,
> 
> I found that by default reiser4 still relies on a block layer
> feature,
> which is not longer supported. This is so-called "barriers". And yes,
> on the power outage bad things are bound to happen. However, it
> is up to bad luck.

Hm. Write barriers are not supported? `man mount | grep barrier` yields
many results... or are they different barriers?

--
Ivan Shapovalov / intelfx /

> 
> The attached patch removes the rest of block barriers support in
> reiser4. So, now we honestly wait for IO completion of wandered
> blocks (overwrite set) before submitting a journal header (journal
> footer).
> 
> Not sure if it will address your problem though. Also, data
> corruption
> after rw-mounting of checked (rebuild-fs) partition is still a
> concern.
> 
> Thanks,
> Edward.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


* Re: [BUG] Big fiability issue?
  2016-04-05 15:43 [BUG] Big fiability issue? Edward Shishkin
  2016-04-06  0:23 ` Ivan Shapovalov
@ 2016-04-06 14:25 ` Mathieu Belanger
  2016-04-06 14:35   ` Mathieu Belanger
  2016-04-06 16:32   ` Edward Shishkin
  1 sibling, 2 replies; 7+ messages in thread
From: Mathieu Belanger @ 2016-04-06 14:25 UTC
  To: Edward Shishkin; +Cc: Mathieu Bélanger, Reiserfs development mailing list

Well, I booted SystemRescueCd to fsck my /dev/sda2 and mount it ro to
do the backup, and I was able to get all the data back, with a bonus:
I can mount the fscked /dev/sda2 rw on SystemRescueCd and the corruption
does not come back. SystemRescueCd is based on Gentoo, I think, but the
one I have has a 3.18 kernel. I manually compiled the latest tools to be
sure, and I got an error because 3.18 supports format40 4.0.0, not 4.0.1,
but that error did not affect me.

I will do more tests with that "corrupt" partition before restoring
the backup.

On Tue, 5 Apr 2016 17:43:45 +0200
Edward Shishkin <edward.shishkin@gmail.com> wrote:

> Hello Mathieu,
> 
> I found that by default reiser4 still relies on a block layer feature,
> which is not longer supported. This is so-called "barriers". And yes,
> on the power outage bad things are bound to happen. However, it
> is up to bad luck.
> 
> The attached patch removes the rest of block barriers support in
> reiser4. So, now we honestly wait for IO completion of wandered
> blocks (overwrite set) before submitting a journal header (journal
> footer).
> 
> Not sure if it will address your problem though. Also, data corruption
> after rw-mounting of checked (rebuild-fs) partition is still a concern.
> 
> Thanks,
> Edward.


-- 
Mathieu Belanger <admin@korinar.com>


* Re: [BUG] Big fiability issue?
  2016-04-06 14:25 ` Mathieu Belanger
@ 2016-04-06 14:35   ` Mathieu Belanger
  2016-04-06 16:32   ` Edward Shishkin
  1 sibling, 0 replies; 7+ messages in thread
From: Mathieu Belanger @ 2016-04-06 14:35 UTC
  To: Mathieu Belanger
  Cc: Edward Shishkin, Mathieu Bélanger,
	Reiserfs development mailing list

Another message to confirm that everything is fine once booted into
my main Gentoo: the corruption does not come back. All directories
(/opt, /mnt) are back, and the broken binaries work again as if nothing
had happened.

So from now on, until the cause is found, I will use SystemRescueCd to
run fsck and mount the partition rw there first, and then reboot into my
normal OS.

Mathieu

On Wed, 6 Apr 2016 09:25:40 -0500
Mathieu Belanger <admin@korinar.com> wrote:

> Well, I did boot systemrescuecd to fsck my /dev/sda2 and mount it ro to
> do the backup and I was able to get all the data back but with a bonus :
> I can mount the fscked /dev/sda2 in rw on systemrescuecd, the corruption
> don't come back. SystemrescueCD is based on Gentoo I think. But the one
> I got got a 3.18 kernel. I did manually compile the latest tools to be
> sure and a got an error because 3.18 support format40 4.0.0, not 4.0.1
> but that error did not affect me.
> 
> I will do more test with that "corrupt" partition before restorating
> the backup.
> 
> On Tue, 5 Apr 2016 17:43:45 +0200
> Edward Shishkin <edward.shishkin@gmail.com> wrote:
> 
> > Hello Mathieu,
> > 
> > I found that by default reiser4 still relies on a block layer feature,
> > which is not longer supported. This is so-called "barriers". And yes,
> > on the power outage bad things are bound to happen. However, it
> > is up to bad luck.
> > 
> > The attached patch removes the rest of block barriers support in
> > reiser4. So, now we honestly wait for IO completion of wandered
> > blocks (overwrite set) before submitting a journal header (journal
> > footer).
> > 
> > Not sure if it will address your problem though. Also, data corruption
> > after rw-mounting of checked (rebuild-fs) partition is still a concern.
> > 
> > Thanks,
> > Edward.
> 
> 
> -- 
> Mathieu Belanger <admin@korinar.com>


-- 
Mathieu Belanger <admin@korinar.com>


* Re: [BUG] Big fiability issue?
  2016-04-06  0:23 ` Ivan Shapovalov
@ 2016-04-06 15:03   ` Edward Shishkin
  0 siblings, 0 replies; 7+ messages in thread
From: Edward Shishkin @ 2016-04-06 15:03 UTC
  To: intelfx, Mathieu Bélanger; +Cc: Reiserfs development mailing list

Hello Ivan,

Write barriers mean a proper ordering of writes. This is needed to
guarantee consistency. For example, commit record should be written
*after* writes of all journal blocks, etc. In reiser4 write barriers
are always "on" by design.
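
As a user-space analogy of this ordering rule (not reiser4 code), the sketch
below appends journal data to a file, waits with fsync() until it is durable,
and only then appends the commit record. The helper names write_all() and
journal_commit() are hypothetical; write() and fsync() are the standard
POSIX calls.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes. Returns 0 or -1. */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n <= 0)
			return -1;
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Hypothetical helper: the commit record is written only after the
 * journal blocks are known to be on stable storage. The first fsync()
 * plays the role of the ordering point between the two writes.
 */
static int journal_commit(int fd, const char *blocks, const char *commit_record)
{
	assert(blocks != NULL && commit_record != NULL);

	if (write_all(fd, blocks, strlen(blocks)) < 0)
		return -1;
	if (fsync(fd) < 0)	/* wait until the journal blocks are durable */
		return -1;
	if (write_all(fd, commit_record, strlen(commit_record)) < 0)
		return -1;
	return fsync(fd);	/* make the commit record durable as well */
}
```

If power is lost before the second write, recovery finds journal blocks
without a commit record and can discard them; it should never find a commit
record that refers to blocks that were not written.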

There were 2 ways to provide such ordering: hardware and software.
The hardware way requires additional support from the hard
drive and the block layer. The Linux block layer had such support
until ~6 years ago, when it was dropped:
https://lwn.net/Articles/400541/
See also Linux/Documentation/block/barrier.txt in old kernels.

Accordingly, we should discontinue the "hardware barriers" support
in reiser4 and unconditionally switch to the software implementation
that we already have for devices without hardware barrier support.
I forgot to make this switch in due time.

Thanks,
Edward.

On 04/06/2016 02:23 AM, Ivan Shapovalov wrote:
> On 2016-04-05 at 17:43 +0200, Edward Shishkin wrote:
>> Hello Mathieu,
>>
>> I found that by default reiser4 still relies on a block layer
>> feature,
>> which is not longer supported. This is so-called "barriers". And yes,
>> on the power outage bad things are bound to happen. However, it
>> is up to bad luck.
> Hm. Write barriers are not supported? `man mount | grep barrier` yields
> many results... or are they different barriers?
>
> --
> Ivan Shapovalov / intelfx /
>
>> The attached patch removes the rest of block barriers support in
>> reiser4. So, now we honestly wait for IO completion of wandered
>> blocks (overwrite set) before submitting a journal header (journal
>> footer).
>>
>> Not sure if it will address your problem though. Also, data
>> corruption
>> after rw-mounting of checked (rebuild-fs) partition is still a
>> concern.
>>
>> Thanks,
>> Edward.



* Re: [BUG] Big fiability issue?
  2016-04-06 14:25 ` Mathieu Belanger
  2016-04-06 14:35   ` Mathieu Belanger
@ 2016-04-06 16:32   ` Edward Shishkin
  1 sibling, 0 replies; 7+ messages in thread
From: Edward Shishkin @ 2016-04-06 16:32 UTC
  To: Mathieu Belanger; +Cc: Mathieu Bélanger, Reiserfs development mailing list



On 04/06/2016 04:25 PM, Mathieu Belanger wrote:
> Well, I did boot systemrescuecd to fsck my /dev/sda2 and mount it ro to
> do the backup and I was able to get all the data back but with a bonus :
> I can mount the fscked /dev/sda2 in rw on systemrescuecd, the corruption
> don't come back. SystemrescueCD is based on Gentoo I think. But the one
> I got got a 3.18 kernel.


Please mount it with the "no_write_barrier" option on all kernels older
than 4.5.1 (not yet released). It is a horrible name, of course: it
should have been something like "no_block_write_barrier", but it is
too late to fix...
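
A minimal sketch of that rule, for illustration only. The two helpers are
hypothetical and not part of reiser4 or reiser4progs, and the sketch assumes
that the kernel version alone tells whether the fix is present, which does
not hold for an unpatched build of a newer kernel such as a 4.6 release
candidate. The option string itself is the one quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: true for kernel versions older than 4.5.1,
 * which is where the workaround is recommended above.
 */
static bool needs_no_write_barrier(int major, int minor, int patch)
{
	assert(major >= 0 && minor >= 0 && patch >= 0);

	if (major != 4)
		return major < 4;
	if (minor != 5)
		return minor < 5;
	return patch < 1;
}

/* Returns the reiser4 mount option string to use, or "" for none. */
static const char *reiser4_mount_options(int major, int minor, int patch)
{
	return needs_no_write_barrier(major, minor, patch)
		? "no_write_barrier" : "";
}
```

The returned string would be passed as the option list, for example as the
argument of mount -o or as the data argument of mount(2).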

Thanks,
Edward.


>   I did manually compile the latest tools to be
> sure and a got an error because 3.18 support format40 4.0.0, not 4.0.1
> but that error did not affect me.
>
> I will do more test with that "corrupt" partition before restorating
> the backup.
>
> On Tue, 5 Apr 2016 17:43:45 +0200
> Edward Shishkin <edward.shishkin@gmail.com> wrote:
>
>> Hello Mathieu,
>>
>> I found that by default reiser4 still relies on a block layer feature,
>> which is not longer supported. This is so-called "barriers". And yes,
>> on the power outage bad things are bound to happen. However, it
>> is up to bad luck.
>>
>> The attached patch removes the rest of block barriers support in
>> reiser4. So, now we honestly wait for IO completion of wandered
>> blocks (overwrite set) before submitting a journal header (journal
>> footer).
>>
>> Not sure if it will address your problem though. Also, data corruption
>> after rw-mounting of checked (rebuild-fs) partition is still a concern.
>>
>> Thanks,
>> Edward.
>


