Filesystem corruption

All of lore.kernel.org
 help / color / mirror / Atom feed

* Filesystem corruption
@ 2001-01-31 14:20 Carsten Langgaard
  2001-01-31 15:52 ` Florian Lohoff
  2001-02-05 10:02 ` Ralf Baechle
  0 siblings, 2 replies; 89+ messages in thread
From: Carsten Langgaard @ 2001-01-31 14:20 UTC (permalink / raw)
  To: linux-mips

Has anyone seen problems with fsck on the latest 2.4.0 kernel ?
My filesystem gets corrupted from time to time when I use the latest
2.4.0 kernel.

/Carsten


--
_    _ ____  ___   Carsten Langgaard   Mailto:carstenl@mips.com
|\  /|||___)(___   MIPS Denmark        Direct: +45 4486 5527
| \/ |||    ____)  Lautrupvang 4B      Switch: +45 4486 5555
  TECHNOLOGIES     2750 Ballerup       Fax...: +45 4486 5556
                   Denmark             http://www.mips.com

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
  2001-01-31 14:20 Carsten Langgaard
@ 2001-01-31 15:52 ` Florian Lohoff
  2001-01-31 16:24   ` Carsten Langgaard
  2001-02-05 10:02 ` Ralf Baechle
  1 sibling, 1 reply; 89+ messages in thread
From: Florian Lohoff @ 2001-01-31 15:52 UTC (permalink / raw)
  To: Carsten Langgaard; +Cc: linux-mips

On Wed, Jan 31, 2001 at 03:20:35PM +0100, Carsten Langgaard wrote:
> 
> Has anyone seen problems with fsck on the latest 2.4.0 kernel ?
> My filesystem gets corrupted from time to time when I use the latest
> 2.4.0 kernel.
> 

Hmm - nope - 2.4.0 Bigendian here 

resume:~# uptime
 3:50pm  up 6 days, 10 min,  1 user,  load average: 0.00, 0.00, 0.00
resume:~# uname -a
Linux resume.rfc822.org 2.4.0 #3 Thu Jan 25 16:25:23 CET 2001 mips unknown

Flo
-- 
Florian Lohoff                  flo@rfc822.org             +49-5201-669912
     Why is it called "common sense" when nobody seems to have any?

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
  2001-01-31 15:52 ` Florian Lohoff
@ 2001-01-31 16:24   ` Carsten Langgaard
  2001-01-31 16:48     ` Florian Lohoff
  0 siblings, 1 reply; 89+ messages in thread
From: Carsten Langgaard @ 2001-01-31 16:24 UTC (permalink / raw)
  To: Florian Lohoff; +Cc: linux-mips

Try use fsck.

/Carsten

Florian Lohoff wrote:

> On Wed, Jan 31, 2001 at 03:20:35PM +0100, Carsten Langgaard wrote:
> >
> > Has anyone seen problems with fsck on the latest 2.4.0 kernel ?
> > My filesystem gets corrupted from time to time when I use the latest
> > 2.4.0 kernel.
> >
>
> Hmm - nope - 2.4.0 Bigendian here
>
> resume:~# uptime
>  3:50pm  up 6 days, 10 min,  1 user,  load average: 0.00, 0.00, 0.00
> resume:~# uname -a
> Linux resume.rfc822.org 2.4.0 #3 Thu Jan 25 16:25:23 CET 2001 mips unknown
>
> Flo
> --
> Florian Lohoff                  flo@rfc822.org             +49-5201-669912
>      Why is it called "common sense" when nobody seems to have any?

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
  2001-01-31 16:24   ` Carsten Langgaard
@ 2001-01-31 16:48     ` Florian Lohoff
  0 siblings, 0 replies; 89+ messages in thread
From: Florian Lohoff @ 2001-01-31 16:48 UTC (permalink / raw)
  To: Carsten Langgaard; +Cc: linux-mips

On Wed, Jan 31, 2001 at 05:24:58PM +0100, Carsten Langgaard wrote:
> 
> Try use fsck.
> 

*Urgs* Trouble ...



resume:~# df
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/sda1              2074328   1360061    607040  70% /
/dev/sde1              3839092    217476   3426600   6% /chroot
/dev/sdb1              4003992   3708044     89260  98% /home2
/dev/sdc1              4003992    449472   3347832  12% /home3
/dev/sdd1              4003992   1134620   2662684  30% /ftp.rfc822.org
resume:~# umount /ftp.rfc822.org/
resume:~# fsck -f /dev/sdd1
Parallelizing fsck version 1.18 (11-Nov-1999)
e2fsck 1.18, 11-Nov-1999 for EXT2 FS 0.5b, 95/08/09
Pass 1: Checking inodes, blocks, and sizes
Inode 64654, i_blocks is 42696, should be 44744.  Fix<y>? yes

Duplicate blocks found... invoking duplicate block passes.
Pass 1B: Rescan for duplicate/bad blocks
Duplicate/bad block(s) in inode 64654: 265881 ... ... ...
Duplicate/bad block(s) in inode 193927: 265881 ... ... ...
Pass 1C: Scan directories for inodes with dup blocks.
Pass 1D: Reconciling duplicate blocks
(There are 2 inodes containing duplicate/bad blocks.)

File /kernel/kernel-image-2.4.0-ip22-r4k.tgz (inode #193927, mod time Thu Jan 25 11:17:00 2001) 
  has 251 duplicate block(s), shared with 1 file(s):
	/devel/gcc-20000822-mips.tar.gz (inode #64654, mod time Mon Aug 28 17:14:56 2000)
Clone duplicate/bad blocks<y>? yes

File /devel/gcc-20000822-mips.tar.gz (inode #64654, mod time Mon Aug 28 17:14:56 2000) 
  has 251 duplicate block(s), shared with 1 file(s):
	/kernel/kernel-image-2.4.0-ip22-r4k.tgz (inode #193927, mod time Thu Jan 25 11:17:00 2001)
Duplicated blocks already reassigned or cloned.

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  +265876 +265877 +265878 +265879 +265880
Fix<y>? yes

Free blocks count wrong for group #0 (29960, counted=29709).
Fix<y>? yes

Free blocks count wrong for group #8 (5, counted=0).
Fix<y>? yes

Free blocks count wrong (717343, counted=717087).
Fix<y>? yes


/dev/sdd1: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sdd1: 6277/1034240 files (21.0% non-contiguous), 316359/1033446 blocks



I ran -test6 and -test9 before.

Flo
-- 
Florian Lohoff                  flo@rfc822.org             +49-5201-669912
     Why is it called "common sense" when nobody seems to have any?

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
  2001-01-31 14:20 Carsten Langgaard
  2001-01-31 15:52 ` Florian Lohoff
@ 2001-02-05 10:02 ` Ralf Baechle
  2001-02-05 12:10     ` Alan Cox
  1 sibling, 1 reply; 89+ messages in thread
From: Ralf Baechle @ 2001-02-05 10:02 UTC (permalink / raw)
  To: Carsten Langgaard; +Cc: linux-mips

On Wed, Jan 31, 2001 at 03:20:35PM +0100, Carsten Langgaard wrote:

> Has anyone seen problems with fsck on the latest 2.4.0 kernel ?
> My filesystem gets corrupted from time to time when I use the latest
> 2.4.0 kernel.

2.4.1 is known to cause fs corruption for all architectures; 2.4.0 should
actually be fine.  I just reached 8 days of uptime on a 32p Origin 2000,
so it can't be that bad.

  Ralf

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
@ 2001-02-05 12:10     ` Alan Cox
  0 siblings, 0 replies; 89+ messages in thread
From: Alan Cox @ 2001-02-05 12:10 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: Carsten Langgaard, linux-mips

> 2.4.1 is known to cause fs corruption for all architectures; 2.4.0 should
> actually be fine.  I just reached 8 days of uptime on a 32p Origin 2000,
> so it can't be that bad.

Im tracking fs corruption and worse on 2.4.0 as well (zero page corruptions
since 2.4.0test10 for example)

I dont believe any 2.4 is currently 'safe'

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
@ 2001-02-05 12:10     ` Alan Cox
  0 siblings, 0 replies; 89+ messages in thread
From: Alan Cox @ 2001-02-05 12:10 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: Carsten Langgaard, linux-mips

> 2.4.1 is known to cause fs corruption for all architectures; 2.4.0 should
> actually be fine.  I just reached 8 days of uptime on a 32p Origin 2000,
> so it can't be that bad.

Im tracking fs corruption and worse on 2.4.0 as well (zero page corruptions
since 2.4.0test10 for example)

I dont believe any 2.4 is currently 'safe'

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
  2001-02-05 12:10     ` Alan Cox
  (?)
@ 2001-02-05 12:56     ` Geert Uytterhoeven
  2001-02-05 13:01         ` Alan Cox
  -1 siblings, 1 reply; 89+ messages in thread
From: Geert Uytterhoeven @ 2001-02-05 12:56 UTC (permalink / raw)
  To: Alan Cox; +Cc: Ralf Baechle, Carsten Langgaard, linux-mips

On Mon, 5 Feb 2001, Alan Cox wrote:
> > 2.4.1 is known to cause fs corruption for all architectures; 2.4.0 should
> > actually be fine.  I just reached 8 days of uptime on a 32p Origin 2000,
> > so it can't be that bad.
> 
> Im tracking fs corruption and worse on 2.4.0 as well (zero page corruptions
> since 2.4.0test10 for example)

Is the zero page mapped on non-m68k architectures?

> I dont believe any 2.4 is currently 'safe'

Ugh...

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
@ 2001-02-05 13:01         ` Alan Cox
  0 siblings, 0 replies; 89+ messages in thread
From: Alan Cox @ 2001-02-05 13:01 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: Alan Cox, Ralf Baechle, Carsten Langgaard, linux-mips

> Is the zero page mapped on non-m68k architectures?

It can certainly be hit by DMA and kernel memory ops

> > I dont believe any 2.4 is currently 'safe'
> Ugh...

We'll get there, its doing pretty well for most folks

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
@ 2001-02-05 13:01         ` Alan Cox
  0 siblings, 0 replies; 89+ messages in thread
From: Alan Cox @ 2001-02-05 13:01 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: Alan Cox, Ralf Baechle, Carsten Langgaard, linux-mips

> Is the zero page mapped on non-m68k architectures?

It can certainly be hit by DMA and kernel memory ops

> > I dont believe any 2.4 is currently 'safe'
> Ugh...

We'll get there, its doing pretty well for most folks

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
@ 2001-02-05 13:16 Ian Chilton
  0 siblings, 0 replies; 89+ messages in thread
From: Ian Chilton @ 2001-02-05 13:16 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-mips

Hello,

> I dont believe any 2.4 is currently 'safe'

auchhh..

If Alan Cox himself (nearly as bad as Linus saying it..) is saying
that, I am glad I am still running 2.2.17/18 on my servers and am
wondering if I should have upgraded my workstations to 2.4.1  ;(


Bye for Now,

Ian

                                \|||/
                                (o o)
 /---------------------------ooO-(_)-Ooo---------------------------\
 |  Ian Chilton        (IRC Nick - GadgetMan)     ICQ #: 16007717  |
 |-----------------------------------------------------------------|
 |  E-Mail: ian@ichilton.co.uk     Web: http://www.ichilton.co.uk  |
 |-----------------------------------------------------------------|
 |        Proofread carefully to see if you any words out.         |
 \-----------------------------------------------------------------/

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
@ 2001-02-05 16:00 Ian Chilton
  0 siblings, 0 replies; 89+ messages in thread
From: Ian Chilton @ 2001-02-05 16:00 UTC (permalink / raw)
  To: J. Scott Kasten; +Cc: linux-mips

Hello,

> If you're worried about it, do what I do.  Pick one server that always
> runs a known stable release and keep your working/home directories on it
> as NFS exports.  Run your development kernel/tools on an NFS client box.
> That way when it croaks, you don't wast a lot of you time fscking and
> possibly loosing files.


That's basically what I do..


Bye for Now,

Ian


                                \|||/ 
                                (o o)
 /---------------------------ooO-(_)-Ooo---------------------------\
 |  Ian Chilton        (IRC Nick - GadgetMan)     ICQ #: 16007717  |
 |-----------------------------------------------------------------|
 |  E-Mail: ian@ichilton.co.uk     Web: http://www.ichilton.co.uk  |
 |-----------------------------------------------------------------|
 |         Budget: A method for going broke methodically.          |
 \-----------------------------------------------------------------/

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
@ 2001-02-05 22:01           ` Ralf Baechle
  0 siblings, 0 replies; 89+ messages in thread
From: Ralf Baechle @ 2001-02-05 22:01 UTC (permalink / raw)
  To: Alan Cox; +Cc: Geert Uytterhoeven, Carsten Langgaard, linux-mips

On Mon, Feb 05, 2001 at 01:01:33PM +0000, Alan Cox wrote:

> > Is the zero page mapped on non-m68k architectures?
> 
> It can certainly be hit by DMA and kernel memory ops
> 
> > > I dont believe any 2.4 is currently 'safe'
> > Ugh...
> 
> We'll get there, its doing pretty well for most folks

I hope so.  For many of us 2.2 is no longer an option.  That is at least
without heavy patching to add support for hardware that isn't supported
by 2.2.

  Ralf

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
@ 2001-02-05 22:01           ` Ralf Baechle
  0 siblings, 0 replies; 89+ messages in thread
From: Ralf Baechle @ 2001-02-05 22:01 UTC (permalink / raw)
  To: Alan Cox; +Cc: Geert Uytterhoeven, Carsten Langgaard, linux-mips

On Mon, Feb 05, 2001 at 01:01:33PM +0000, Alan Cox wrote:

> > Is the zero page mapped on non-m68k architectures?
> 
> It can certainly be hit by DMA and kernel memory ops
> 
> > > I dont believe any 2.4 is currently 'safe'
> > Ugh...
> 
> We'll get there, its doing pretty well for most folks

I hope so.  For many of us 2.2 is no longer an option.  That is at least
without heavy patching to add support for hardware that isn't supported
by 2.2.

  Ralf

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Filesystem Corruption
@ 2002-06-06 18:00 Kurt
  2002-06-07  7:15 ` Oleg Drokin
  0 siblings, 1 reply; 89+ messages in thread
From: Kurt @ 2002-06-06 18:00 UTC (permalink / raw)
  To: reiserfs-list

Hello all,
	I currently have a system configured as follows :-
1) LVM version 1.0.1-rc4(ish)(03/10/2001)
2) /dev/PROJ/proj on /proj type reiserfs (rw,noatime,notail)
3) /dev/PROJ/proj        239G  142G   97G  60% /proj
4) 2.4.17 with reiserfs tools 3.x.0k
5) Reiserfs compiled in (CONFIG_REISERFS_CHECK set to NO)
6) 256 MB RAM ("sar -r" shows memory usage is not abnormal for this box)
7)Tuns of very small files based on log processing
I am told by my co-worker that the system unresponsive and showed reiserfs 
related errors on the console.
Upon restart they noticed that the file 
/proj/webtrends/receive/bama/www3/access.01Jun.r.gz was unreadable by root 
(permission denied).
I did a reiserfsck on the drive and noticed that access.01Jun.r.gz returned an 
error stating the file pointed to nowhere.
I was unable to complete a reiserfsck --fix-fixable because of the length of 
time that this (fsck) process took since this was an unscheduled downtime.
During the weekend i will attempt to do the fsck again, however i really 
needed to know if this problem has been observed by anyone else, and what 
steps they took to fix the problem.
-Kurt



-- 
================================================
Kurt Palmer                                      SysAdmin
kpalmer@advance.net                Advance Internet
201-459-2846


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Filesystem Corruption
@ 2002-06-06 18:00 Kurt
  0 siblings, 0 replies; 89+ messages in thread
From: Kurt @ 2002-06-06 18:00 UTC (permalink / raw)
  To: reiserfs-list

[-- Attachment #1: Type: text/plain, Size: 1734 bytes --]

                                                                  
 (Embedded                                                        
 image moved   Kurt <kpalmer@advance.net>                         
 to file:      06/06/2002 02:00 PM                                
 pic13835.pcx)                                                    
                                                                  







Hello all,
     I currently have a system configured as follows :-
1) LVM version 1.0.1-rc4(ish)(03/10/2001)
2) /dev/PROJ/proj on /proj type reiserfs (rw,noatime,notail)
3) /dev/PROJ/proj        239G  142G   97G  60% /proj
4) 2.4.17 with reiserfs tools 3.x.0k
5) Reiserfs compiled in (CONFIG_REISERFS_CHECK set to NO)
6) 256 MB RAM ("sar -r" shows memory usage is not abnormal for this box)
7)Tuns of very small files based on log processing
I am told by my co-worker that the system unresponsive and showed reiserfs
related errors on the console.
Upon restart they noticed that the file
/proj/webtrends/receive/bama/www3/access.01Jun.r.gz was unreadable by root
(permission denied).
I did a reiserfsck on the drive and noticed that access.01Jun.r.gz returned an
error stating the file pointed to nowhere.
I was unable to complete a reiserfsck --fix-fixable because of the length of
time that this (fsck) process took since this was an unscheduled downtime.
During the weekend i will attempt to do the fsck again, however i really
needed to know if this problem has been observed by anyone else, and what
steps they took to fix the problem.
-Kurt



--
================================================
Kurt Palmer                                      SysAdmin
kpalmer@advance.net                Advance Internet
201-459-2846

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Filesystem Corruption
@ 2002-06-06 18:00 Kurt
  0 siblings, 0 replies; 89+ messages in thread
From: Kurt @ 2002-06-06 18:00 UTC (permalink / raw)
  To: reiserfs-list

[-- Attachment #1: Type: text/plain, Size: 1846 bytes --]

                                                                  
 (Embedded                                                        
 image moved   Kurt <kpalmer@advance.net>                         
 to file:      06/06/2002 02:00 PM                                
 pic24262.pcx)                                                    
                                                                  








 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic13835.pcx)








Hello all,
     I currently have a system configured as follows :-
1) LVM version 1.0.1-rc4(ish)(03/10/2001)
2) /dev/PROJ/proj on /proj type reiserfs (rw,noatime,notail)
3) /dev/PROJ/proj        239G  142G   97G  60% /proj
4) 2.4.17 with reiserfs tools 3.x.0k
5) Reiserfs compiled in (CONFIG_REISERFS_CHECK set to NO)
6) 256 MB RAM ("sar -r" shows memory usage is not abnormal for this box)
7)Tuns of very small files based on log processing
I am told by my co-worker that the system unresponsive and showed reiserfs
related errors on the console.
Upon restart they noticed that the file
/proj/webtrends/receive/bama/www3/access.01Jun.r.gz was unreadable by root
(permission denied).
I did a reiserfsck on the drive and noticed that access.01Jun.r.gz returned an
error stating the file pointed to nowhere.
I was unable to complete a reiserfsck --fix-fixable because of the length of
time that this (fsck) process took since this was an unscheduled downtime.
During the weekend i will attempt to do the fsck again, however i really
needed to know if this problem has been observed by anyone else, and what
steps they took to fix the problem.
-Kurt



--
================================================
Kurt Palmer                                      SysAdmin
kpalmer@advance.net                Advance Internet
201-459-2846

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Filesystem Corruption
@ 2002-06-06 18:00 Kurt
  0 siblings, 0 replies; 89+ messages in thread
From: Kurt @ 2002-06-06 18:00 UTC (permalink / raw)
  To: reiserfs-list

[-- Attachment #1: Type: text/plain, Size: 1958 bytes --]

                                                                  
 (Embedded                                                        
 image moved   Kurt <kpalmer@advance.net>                         
 to file:      06/06/2002 02:00 PM                                
 pic11654.pcx)                                                    
                                                                  








 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic24262.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic13835.pcx)








Hello all,
     I currently have a system configured as follows :-
1) LVM version 1.0.1-rc4(ish)(03/10/2001)
2) /dev/PROJ/proj on /proj type reiserfs (rw,noatime,notail)
3) /dev/PROJ/proj        239G  142G   97G  60% /proj
4) 2.4.17 with reiserfs tools 3.x.0k
5) Reiserfs compiled in (CONFIG_REISERFS_CHECK set to NO)
6) 256 MB RAM ("sar -r" shows memory usage is not abnormal for this box)
7)Tuns of very small files based on log processing
I am told by my co-worker that the system unresponsive and showed reiserfs
related errors on the console.
Upon restart they noticed that the file
/proj/webtrends/receive/bama/www3/access.01Jun.r.gz was unreadable by root
(permission denied).
I did a reiserfsck on the drive and noticed that access.01Jun.r.gz returned an
error stating the file pointed to nowhere.
I was unable to complete a reiserfsck --fix-fixable because of the length of
time that this (fsck) process took since this was an unscheduled downtime.
During the weekend i will attempt to do the fsck again, however i really
needed to know if this problem has been observed by anyone else, and what
steps they took to fix the problem.
-Kurt



--
================================================
Kurt Palmer                                      SysAdmin
kpalmer@advance.net                Advance Internet
201-459-2846

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Filesystem Corruption
@ 2002-06-06 18:00 Kurt
  0 siblings, 0 replies; 89+ messages in thread
From: Kurt @ 2002-06-06 18:00 UTC (permalink / raw)
  To: reiserfs-list

[-- Attachment #1: Type: text/plain, Size: 2070 bytes --]

                                                                  
 (Embedded                                                        
 image moved   Kurt <kpalmer@advance.net>                         
 to file:      06/06/2002 02:00 PM                                
 pic04883.pcx)                                                    
                                                                  








 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic11654.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic24262.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic13835.pcx)








Hello all,
     I currently have a system configured as follows :-
1) LVM version 1.0.1-rc4(ish)(03/10/2001)
2) /dev/PROJ/proj on /proj type reiserfs (rw,noatime,notail)
3) /dev/PROJ/proj        239G  142G   97G  60% /proj
4) 2.4.17 with reiserfs tools 3.x.0k
5) Reiserfs compiled in (CONFIG_REISERFS_CHECK set to NO)
6) 256 MB RAM ("sar -r" shows memory usage is not abnormal for this box)
7)Tuns of very small files based on log processing
I am told by my co-worker that the system unresponsive and showed reiserfs
related errors on the console.
Upon restart they noticed that the file
/proj/webtrends/receive/bama/www3/access.01Jun.r.gz was unreadable by root
(permission denied).
I did a reiserfsck on the drive and noticed that access.01Jun.r.gz returned an
error stating the file pointed to nowhere.
I was unable to complete a reiserfsck --fix-fixable because of the length of
time that this (fsck) process took since this was an unscheduled downtime.
During the weekend i will attempt to do the fsck again, however i really
needed to know if this problem has been observed by anyone else, and what
steps they took to fix the problem.
-Kurt



--
================================================
Kurt Palmer                                      SysAdmin
kpalmer@advance.net                Advance Internet
201-459-2846

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Filesystem Corruption
@ 2002-06-06 18:00 Kurt
  0 siblings, 0 replies; 89+ messages in thread
From: Kurt @ 2002-06-06 18:00 UTC (permalink / raw)
  To: reiserfs-list

[-- Attachment #1: Type: text/plain, Size: 2182 bytes --]

                                                                  
 (Embedded                                                        
 image moved   Kurt <kpalmer@advance.net>                         
 to file:      06/06/2002 02:00 PM                                
 pic08003.pcx)                                                    
                                                                  








 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic04883.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic11654.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic24262.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic13835.pcx)








Hello all,
     I currently have a system configured as follows :-
1) LVM version 1.0.1-rc4(ish)(03/10/2001)
2) /dev/PROJ/proj on /proj type reiserfs (rw,noatime,notail)
3) /dev/PROJ/proj        239G  142G   97G  60% /proj
4) 2.4.17 with reiserfs tools 3.x.0k
5) Reiserfs compiled in (CONFIG_REISERFS_CHECK set to NO)
6) 256 MB RAM ("sar -r" shows memory usage is not abnormal for this box)
7)Tuns of very small files based on log processing
I am told by my co-worker that the system unresponsive and showed reiserfs
related errors on the console.
Upon restart they noticed that the file
/proj/webtrends/receive/bama/www3/access.01Jun.r.gz was unreadable by root
(permission denied).
I did a reiserfsck on the drive and noticed that access.01Jun.r.gz returned an
error stating the file pointed to nowhere.
I was unable to complete a reiserfsck --fix-fixable because of the length of
time that this (fsck) process took since this was an unscheduled downtime.
During the weekend i will attempt to do the fsck again, however i really
needed to know if this problem has been observed by anyone else, and what
steps they took to fix the problem.
-Kurt



--
================================================
Kurt Palmer                                      SysAdmin
kpalmer@advance.net                Advance Internet
201-459-2846

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Filesystem Corruption
@ 2002-06-06 18:00 Kurt
  0 siblings, 0 replies; 89+ messages in thread
From: Kurt @ 2002-06-06 18:00 UTC (permalink / raw)
  To: reiserfs-list

[-- Attachment #1: Type: text/plain, Size: 2294 bytes --]

                                                                  
 (Embedded                                                        
 image moved   Kurt <kpalmer@advance.net>                         
 to file:      06/06/2002 02:00 PM                                
 pic06540.pcx)                                                    
                                                                  








 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic08003.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic04883.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic11654.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic24262.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic13835.pcx)








Hello all,
     I currently have a system configured as follows :-
1) LVM version 1.0.1-rc4(ish)(03/10/2001)
2) /dev/PROJ/proj on /proj type reiserfs (rw,noatime,notail)
3) /dev/PROJ/proj        239G  142G   97G  60% /proj
4) 2.4.17 with reiserfs tools 3.x.0k
5) Reiserfs compiled in (CONFIG_REISERFS_CHECK set to NO)
6) 256 MB RAM ("sar -r" shows memory usage is not abnormal for this box)
7)Tuns of very small files based on log processing
I am told by my co-worker that the system unresponsive and showed reiserfs
related errors on the console.
Upon restart they noticed that the file
/proj/webtrends/receive/bama/www3/access.01Jun.r.gz was unreadable by root
(permission denied).
I did a reiserfsck on the drive and noticed that access.01Jun.r.gz returned an
error stating the file pointed to nowhere.
I was unable to complete a reiserfsck --fix-fixable because of the length of
time that this (fsck) process took since this was an unscheduled downtime.
During the weekend i will attempt to do the fsck again, however i really
needed to know if this problem has been observed by anyone else, and what
steps they took to fix the problem.
-Kurt



--
================================================
Kurt Palmer                                      SysAdmin
kpalmer@advance.net                Advance Internet
201-459-2846

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Filesystem Corruption
@ 2002-06-06 18:00 Kurt
  0 siblings, 0 replies; 89+ messages in thread
From: Kurt @ 2002-06-06 18:00 UTC (permalink / raw)
  To: reiserfs-list

[-- Attachment #1: Type: text/plain, Size: 2406 bytes --]

                                                                  
 (Embedded                                                        
 image moved   Kurt <kpalmer@advance.net>                         
 to file:      06/06/2002 02:00 PM                                
 pic19921.pcx)                                                    
                                                                  








 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic06540.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic08003.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic04883.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic11654.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic24262.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic13835.pcx)








Hello all,
     I currently have a system configured as follows :-
1) LVM version 1.0.1-rc4(ish)(03/10/2001)
2) /dev/PROJ/proj on /proj type reiserfs (rw,noatime,notail)
3) /dev/PROJ/proj        239G  142G   97G  60% /proj
4) 2.4.17 with reiserfs tools 3.x.0k
5) Reiserfs compiled in (CONFIG_REISERFS_CHECK set to NO)
6) 256 MB RAM ("sar -r" shows memory usage is not abnormal for this box)
7)Tuns of very small files based on log processing
I am told by my co-worker that the system unresponsive and showed reiserfs
related errors on the console.
Upon restart they noticed that the file
/proj/webtrends/receive/bama/www3/access.01Jun.r.gz was unreadable by root
(permission denied).
I did a reiserfsck on the drive and noticed that access.01Jun.r.gz returned an
error stating the file pointed to nowhere.
I was unable to complete a reiserfsck --fix-fixable because of the length of
time that this (fsck) process took since this was an unscheduled downtime.
During the weekend i will attempt to do the fsck again, however i really
needed to know if this problem has been observed by anyone else, and what
steps they took to fix the problem.
-Kurt



--
================================================
Kurt Palmer                                      SysAdmin
kpalmer@advance.net                Advance Internet
201-459-2846

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Filesystem Corruption
@ 2002-06-06 18:00 Kurt
  0 siblings, 0 replies; 89+ messages in thread
From: Kurt @ 2002-06-06 18:00 UTC (permalink / raw)
  To: reiserfs-list

[-- Attachment #1: Type: text/plain, Size: 2518 bytes --]

                                                                  
 (Embedded                                                        
 image moved   Kurt <kpalmer@advance.net>                         
 to file:      06/06/2002 02:00 PM                                
 pic18956.pcx)                                                    
                                                                  








 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic19921.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic06540.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic08003.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic04883.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic11654.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic24262.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic13835.pcx)








Hello all,
     I currently have a system configured as follows :-
1) LVM version 1.0.1-rc4(ish)(03/10/2001)
2) /dev/PROJ/proj on /proj type reiserfs (rw,noatime,notail)
3) /dev/PROJ/proj        239G  142G   97G  60% /proj
4) 2.4.17 with reiserfs tools 3.x.0k
5) Reiserfs compiled in (CONFIG_REISERFS_CHECK set to NO)
6) 256 MB RAM ("sar -r" shows memory usage is not abnormal for this box)
7)Tuns of very small files based on log processing
I am told by my co-worker that the system unresponsive and showed reiserfs
related errors on the console.
Upon restart they noticed that the file
/proj/webtrends/receive/bama/www3/access.01Jun.r.gz was unreadable by root
(permission denied).
I did a reiserfsck on the drive and noticed that access.01Jun.r.gz returned an
error stating the file pointed to nowhere.
I was unable to complete a reiserfsck --fix-fixable because of the length of
time that this (fsck) process took since this was an unscheduled downtime.
During the weekend i will attempt to do the fsck again, however i really
needed to know if this problem has been observed by anyone else, and what
steps they took to fix the problem.
-Kurt



--
================================================
Kurt Palmer                                      SysAdmin
kpalmer@advance.net                Advance Internet
201-459-2846

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Filesystem Corruption
@ 2002-06-06 18:00 Kurt
  0 siblings, 0 replies; 89+ messages in thread
From: Kurt @ 2002-06-06 18:00 UTC (permalink / raw)
  To: reiserfs-list

[-- Attachment #1: Type: text/plain, Size: 2630 bytes --]

                                                                  
 (Embedded                                                        
 image moved   Kurt <kpalmer@advance.net>                         
 to file:      06/06/2002 02:00 PM                                
 pic30134.pcx)                                                    
                                                                  








 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic18956.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic19921.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic06540.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic08003.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic04883.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic11654.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic24262.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic13835.pcx)








Hello all,
     I currently have a system configured as follows :-
1) LVM version 1.0.1-rc4(ish)(03/10/2001)
2) /dev/PROJ/proj on /proj type reiserfs (rw,noatime,notail)
3) /dev/PROJ/proj        239G  142G   97G  60% /proj
4) 2.4.17 with reiserfs tools 3.x.0k
5) Reiserfs compiled in (CONFIG_REISERFS_CHECK set to NO)
6) 256 MB RAM ("sar -r" shows memory usage is not abnormal for this box)
7)Tuns of very small files based on log processing
I am told by my co-worker that the system unresponsive and showed reiserfs
related errors on the console.
Upon restart they noticed that the file
/proj/webtrends/receive/bama/www3/access.01Jun.r.gz was unreadable by root
(permission denied).
I did a reiserfsck on the drive and noticed that access.01Jun.r.gz returned an
error stating the file pointed to nowhere.
I was unable to complete a reiserfsck --fix-fixable because of the length of
time that this (fsck) process took since this was an unscheduled downtime.
During the weekend i will attempt to do the fsck again, however i really
needed to know if this problem has been observed by anyone else, and what
steps they took to fix the problem.
-Kurt



--
================================================
Kurt Palmer                                      SysAdmin
kpalmer@advance.net                Advance Internet
201-459-2846

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Filesystem Corruption
@ 2002-06-06 18:00 Kurt
  0 siblings, 0 replies; 89+ messages in thread
From: Kurt @ 2002-06-06 18:00 UTC (permalink / raw)
  To: reiserfs-list

[-- Attachment #1: Type: text/plain, Size: 2742 bytes --]

                                                                  
 (Embedded                                                        
 image moved   Kurt <kpalmer@advance.net>                         
 to file:      06/06/2002 02:00 PM                                
 pic29967.pcx)                                                    
                                                                  








 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic30134.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic18956.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic19921.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic06540.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic08003.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic04883.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic11654.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic24262.pcx)









 (Embedded
 image moved   Kurt <kpalmer@advance.net>
 to file:      06/06/2002 02:00 PM
 pic13835.pcx)








Hello all,
     I currently have a system configured as follows :-
1) LVM version 1.0.1-rc4(ish)(03/10/2001)
2) /dev/PROJ/proj on /proj type reiserfs (rw,noatime,notail)
3) /dev/PROJ/proj        239G  142G   97G  60% /proj
4) 2.4.17 with reiserfs tools 3.x.0k
5) Reiserfs compiled in (CONFIG_REISERFS_CHECK set to NO)
6) 256 MB RAM ("sar -r" shows memory usage is not abnormal for this box)
7)Tuns of very small files based on log processing
I am told by my co-worker that the system unresponsive and showed reiserfs
related errors on the console.
Upon restart they noticed that the file
/proj/webtrends/receive/bama/www3/access.01Jun.r.gz was unreadable by root
(permission denied).
I did a reiserfsck on the drive and noticed that access.01Jun.r.gz returned an
error stating the file pointed to nowhere.
I was unable to complete a reiserfsck --fix-fixable because of the length of
time that this (fsck) process took since this was an unscheduled downtime.
During the weekend i will attempt to do the fsck again, however i really
needed to know if this problem has been observed by anyone else, and what
steps they took to fix the problem.
-Kurt



--
================================================
Kurt Palmer                                      SysAdmin
kpalmer@advance.net                Advance Internet
201-459-2846

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem Corruption
  2002-06-06 18:00 Kurt
@ 2002-06-07  7:15 ` Oleg Drokin
  2002-06-11 16:49   ` Kurt
  0 siblings, 1 reply; 89+ messages in thread
From: Oleg Drokin @ 2002-06-07  7:15 UTC (permalink / raw)
  To: Kurt; +Cc: reiserfs-list

Hello!

On Thu, Jun 06, 2002 at 02:00:01PM -0400, Kurt wrote:

> error stating the file pointed to nowhere.
> I was unable to complete a reiserfsck --fix-fixable because of the length of 
> time that this (fsck) process took since this was an unscheduled downtime.
> During the weekend i will attempt to do the fsck again, however i really 
> needed to know if this problem has been observed by anyone else, and what 
> steps they took to fix the problem.

We recommend you to upgrade your kernel to 2.4.18.
To know what exact problem is it would be very useful if you'd posted excerpts
from kernel logs with actual errors.
Thank you.

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem Corruption
  2002-06-07  7:15 ` Oleg Drokin
@ 2002-06-11 16:49   ` Kurt
  0 siblings, 0 replies; 89+ messages in thread
From: Kurt @ 2002-06-11 16:49 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: reiserfs-list

Thanks Oleg,
		sorry for the late response (i was out of the office) , you may find the 
following information on the last crash useful :-
+++++++++++++++
3 04:32:37 devo kernel: vs-13075: reiserfs_read_inode2: dead inode read from 
disk [854 1695654 0x0 SD]. This is likely to be race with knfsd. Ignore
Jun  3 04:32:39 devo kernel: vs-13060: reiserfs_update_sd: stat data of object 
[854 1695654 0x0 SD] (nlink == 1) not found (pos 1)
Jun  3 04:41:38 devo kernel: vs-13060: reiserfs_update_sd: stat data of object 
[854 1695654 0x0 SD] (nlink == 1) not found (pos 1)
Jun  3 04:41:43 devo kernel: vs-13060: reiserfs_update_sd: stat data of object 
[854 1695654 0x0 SD] (nlink == 1) not found (pos 1)
++++++++++++
I will upgrade the kernel and reiserfs tools this week and inform you of the 
result after a fsck.
-Kurt

On Friday 07 June 2002 3:15 am, Oleg Drokin wrote:
> Hello!
>
> On Thu, Jun 06, 2002 at 02:00:01PM -0400, Kurt wrote:
> > error stating the file pointed to nowhere.
> > I was unable to complete a reiserfsck --fix-fixable because of the length
> > of time that this (fsck) process took since this was an unscheduled
> > downtime. During the weekend i will attempt to do the fsck again, however
> > i really needed to know if this problem has been observed by anyone else,
> > and what steps they took to fix the problem.
>
> We recommend you to upgrade your kernel to 2.4.18.
> To know what exact problem is it would be very useful if you'd posted
> excerpts from kernel logs with actual errors.
> Thank you.
>
> Bye,
>     Oleg

-- 
================================================
Kurt Palmer                                      SysAdmin
kpalmer@advance.net                Advance Internet
201-459-2846


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Filesystem Corruption
@ 2002-09-05 15:57 Brian Tinsley
  0 siblings, 0 replies; 89+ messages in thread
From: Brian Tinsley @ 2002-09-05 15:57 UTC (permalink / raw)
  To: reiserfs-list

We had problems on a production filesystem, apparently from a machine 
crash. I ran reiserfsck (vers. 3.6.3) on this filesystem and received a 
message that one corruption can only be fixed during --rebuild-tree. My 
question is if I do this, is there any chance of data loss or will the 
filesystem be safely repaired? I've never had to do anything but a 
simple --fix-fixable before (without any problems).

-- 
Brian Tinsley
Chief Systems Engineer
Emageon

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Filesystem corruption
@ 2003-08-13 16:05 Locke
  2003-08-14  7:49 ` Oleg Drokin
  0 siblings, 1 reply; 89+ messages in thread
From: Locke @ 2003-08-13 16:05 UTC (permalink / raw)
  To: reiserfs-list

[-- Attachment #1: Type: text/plain, Size: 4521 bytes --]

  Hi,

    I've got a problem with reiserfs today while I was trying to access 
my network files. I tried browsing my network drive and found out that 
some of my directories were empty. So I unmounted the partition and ran 
reiserfsck(3.6.8), it said I had 4 corruptions and told me to run 
--rebuild-tree. And so I did and it recovered only 7.8GB of 47.8GB of 
the files. I'm guessing the reason why it recovered so little was 
because that because I was running a 7.8GB+40GB LVM and the 40GB 
pyhsical volume wasn't working and left it with only 7.8GB.

Here's the specs of my system:   linux-2.4.21, reisfs-3.6.8, LVM-1.0.7 
(7.8GB + 40GB)
Partitions:
    /dev/hda (ext2)    /   3.2GB
    /dev/hdb+/dev/hdg => /dev/main_vg/storage_lv(reiserfs)   
/mnt/storage   47.8GB

Here's some output of dmesg at the point where I discovered the problem:

is_tree_node: node level 0 does not match to the expected one 1
vs-5150: search_by_key: invalid format found in block 8838461. Fsck?
is_tree_node: node level 0 does not match to the expected one 1
vs-5150: search_by_key: invalid format found in block 8838461. Fsck?
is_tree_node: node level 0 does not match to the expected one 1
vs-5150: search_by_key: invalid format found in block 11534730. Fsck?
is_tree_node: node level 0 does not match to the expected one 1
vs-5150: search_by_key: invalid format found in block 8838461. Fsck?
is_tree_node: node level 0 does not match to the expected one 1
vs-5150: search_by_key: invalid format found in block 8838461. Fsck?
is_tree_node: node level 0 does not match to the expected one 1
vs-5150: search_by_key: invalid format found in block 3412777. Fsck?
is_tree_node: node level 0 does not match to the expected one 1
vs-5150: search_by_key: invalid format found in block 11534730. Fsck?
is_tree_node: node level 0 does not match to the expected one 1
vs-5150: search_by_key: invalid format found in block 11604101. Fsck?
is_tree_node: node level 0 does not match to the expected one 1
vs-5150: search_by_key: invalid format found in block 11534730. Fsck?

reiserfs: checking transaction log (device 3a:00) ...
is_tree_node: node level 32769 does not match to the expected one 1
vs-5150: search_by_key: invalid format found in block 5505049. Fsck?
vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat 
data of [1 2 0x0 SD]
Using r5 hash to sort names
is_tree_node: node level 0 does not match to the expected one 2
vs-5150: search_by_key: invalid format found in block 2412772. Fsck?
vs-2140: finish_unfinished: search_by_key returned -2
ReiserFS version 3.6.25
reiserfs: checking transaction log (device 3a:00) ...
is_leaf: free space seems wrong: level=1, nr_items=1, free_space=778 rdkey
vs-5150: search_by_key: invalid format found in block 5505049. Fsck?
vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat 
data of [1 2 0x0 SD]
Using r5 hash to sort names
is_tree_node: node level 0 does not match to the expected one 2
vs-5150: search_by_key: invalid format found in block 2412772. Fsck?
vs-2140: finish_unfinished: search_by_key returned -2
ReiserFS version 3.6.25
VFS: Can't find ext3 filesystem on dev lvm(58,0).
reiserfs: checking transaction log (device 3a:00) ...
is_leaf: free space seems wrong: level=1, nr_items=1, free_space=778 rdkey
vs-5150: search_by_key: invalid format found in block 5505049. Fsck?
vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat 
data of [1 2 0x0 SD]
Using r5 hash to sort names
is_tree_node: node level 0 does not match to the expected one 2
vs-5150: search_by_key: invalid format found in block 2412772. Fsck?
vs-2140: finish_unfinished: search_by_key returned -2
ReiserFS version 3.6.25

And also when rebooting after the corruption I saw several error 
messages for all drives, hda, hdb and hdg
**

hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

**The messages are copied from the FAQ in namesys.com because they 
looked similar so I'm not sure if they're the exactly same.

I tried loading a previous kernel(2.4.20) and the error messages were 
gone, this was probably because of some errors I made when configuring 
the 2.4.21 kernel. It was the first time I've compiled the kernel 
without thoroughly checking the configurations and now I suffer the 
consequences.

Is there anything I can try to recover more data?

Regards,
Kent

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
  2003-08-13 16:05 Locke
@ 2003-08-14  7:49 ` Oleg Drokin
  0 siblings, 0 replies; 89+ messages in thread
From: Oleg Drokin @ 2003-08-14  7:49 UTC (permalink / raw)
  To: Locke; +Cc: reiserfs-list

Hello!

On Thu, Aug 14, 2003 at 12:05:28AM +0800, Locke wrote:
> the files. I'm guessing the reason why it recovered so little was 
> because that because I was running a 7.8GB+40GB LVM and the 40GB 
> pyhsical volume wasn't working and left it with only 7.8GB.

Yes of course.

> is_tree_node: node level 0 does not match to the expected one 1
> vs-5150: search_by_key: invalid format found in block 8838461. Fsck?

So LVM substitures zero filled blocks instead of data if physical volume
is unavailable.
Of course reiserfsck happily thrown all of those blocks out of the tree.

> And also when rebooting after the corruption I saw several error 
> messages for all drives, hda, hdb and hdg
> **
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Also you should consider replacing your noisy IDE cable for primary IDE
controller with not noisy one. Or just run in lower UDMA mode.

> **The messages are copied from the FAQ in namesys.com because they 
> looked similar so I'm not sure if they're the exactly same.

Well, if they are not the same, you'd better write them down on paper.

> Is there anything I can try to recover more data?

You might try to get LVM up again and run reiserfsck --rebuild tree.
Some more stuff wuill be restored.
Though still you will have lots of files' content lost and there is no way
to restore it anymore.
Also use reiserfsck 3.6.11

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Filesystem corruption
@ 2007-05-27 13:18 Laurent CARON
  2007-05-28 12:23 ` Vladimir V. Saveliev
                   ` (2 more replies)
  0 siblings, 3 replies; 89+ messages in thread
From: Laurent CARON @ 2007-05-27 13:18 UTC (permalink / raw)
  To: reiserfs-list, reiserfs-dev

Hi,

A few days ago, one of my procmail suddenly receipes stopped to work.

I didn't care much since this only was for 1 or 2 mails.

Yesterday, i took time to dig it a bit further and looked at the
filesystem on my mail server

Here is the output of ls -al in the Maildir where my mails are stored

total 1341
drwx------   6 lcaron mail       256 2007-05-24 10:35 ./
drwx------ 363 lcaron mail     12184 2007-05-25 21:52 ../
-rw-r--r--   1 lcaron mail        17 2004-05-25 09:19 courierimapacl
drwx------   2 lcaron mail        48 2004-05-25 09:20 courierimapkeywords/
-rw-r--r--   1 lcaron lcaron  169365 2007-05-24 10:35 courierimapuiddb
drwx------   2 lcaron mail   1185016 2007-05-24 10:26 cur/
-rw-------   1 lcaron mail         0 2004-05-25 09:19 maildirfolder
?---------   ? ?      ?            ?                ? new
drwx------   2 lcaron mail        48 2007-05-24 19:16 tmp/


The entry that scares me is
?---------   ? ?      ?            ?                ? new

Seems to me it is a filesystem corruption.

Any other solution than rebuild-tree ?

Thanks

Laurent


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
  2007-05-27 13:18 Laurent CARON
@ 2007-05-28 12:23 ` Vladimir V. Saveliev
  2007-05-28 14:10   ` Laurent CARON
       [not found] ` <Pine.LNX.4.64.0705280025570.10429@sheep.housecafe.de>
       [not found] ` <465BA9AC.8040805@ultraviolet.org>
  2 siblings, 1 reply; 89+ messages in thread
From: Vladimir V. Saveliev @ 2007-05-28 12:23 UTC (permalink / raw)
  To: Laurent CARON; +Cc: reiserfs-dev, reiserfs-list

Hello

On Sunday 27 May 2007 17:18, Laurent CARON wrote:
> Hi,
> 
> A few days ago, one of my procmail suddenly receipes stopped to work.
> 
> I didn't care much since this only was for 1 or 2 mails.
> 
> Yesterday, i took time to dig it a bit further and looked at the
> filesystem on my mail server
> 
> Here is the output of ls -al in the Maildir where my mails are stored
> 
> total 1341
> drwx------   6 lcaron mail       256 2007-05-24 10:35 ./
> drwx------ 363 lcaron mail     12184 2007-05-25 21:52 ../
> -rw-r--r--   1 lcaron mail        17 2004-05-25 09:19 courierimapacl
> drwx------   2 lcaron mail        48 2004-05-25 09:20 courierimapkeywords/
> -rw-r--r--   1 lcaron lcaron  169365 2007-05-24 10:35 courierimapuiddb
> drwx------   2 lcaron mail   1185016 2007-05-24 10:26 cur/
> -rw-------   1 lcaron mail         0 2004-05-25 09:19 maildirfolder
> ?---------   ? ?      ?            ?                ? new
> drwx------   2 lcaron mail        48 2007-05-24 19:16 tmp/
> 
> 
> The entry that scares me is
> ?---------   ? ?      ?            ?                ? new
> 
> Seems to me it is a filesystem corruption.
> 
> Any other solution than rebuild-tree ?
> 

Did you try "rm -rf new"?


> Thanks
> 
> Laurent
> 
> 
> 

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
  2007-05-28 12:23 ` Vladimir V. Saveliev
@ 2007-05-28 14:10   ` Laurent CARON
  2007-05-28 17:13     ` Vladimir V. Saveliev
  0 siblings, 1 reply; 89+ messages in thread
From: Laurent CARON @ 2007-05-28 14:10 UTC (permalink / raw)
  To: reiserfs-list; +Cc: Vladimir V. Saveliev, reiserfs-dev

Vladimir V. Saveliev a écrit :
> Did you try "rm -rf new"?

$ rm -rf new
rm: cannot lstat `new': Permission denied

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
  2007-05-28 14:10   ` Laurent CARON
@ 2007-05-28 17:13     ` Vladimir V. Saveliev
  2007-05-28 17:27       ` Laurent CARON
  0 siblings, 1 reply; 89+ messages in thread
From: Vladimir V. Saveliev @ 2007-05-28 17:13 UTC (permalink / raw)
  To: Laurent CARON; +Cc: reiserfs-list

Hello

On Monday 28 May 2007 18:10, Laurent CARON wrote:
> Vladimir V. Saveliev a écrit :
> > Did you try "rm -rf new"?
> 
> $ rm -rf new
> rm: cannot lstat `new': Permission denied
> 
> 
Is there anything from reiserfs in system logs?

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
  2007-05-28 17:13     ` Vladimir V. Saveliev
@ 2007-05-28 17:27       ` Laurent CARON
  0 siblings, 0 replies; 89+ messages in thread
From: Laurent CARON @ 2007-05-28 17:27 UTC (permalink / raw)
  To: reiserfs-list; +Cc: Vladimir V. Saveliev

Vladimir V. Saveliev a écrit :
> Is there anything from reiserfs in system logs?
> 

Nothing from reiserfs/kernel in

I did experience a similar bug on another computer a while ago (this bug 
was "fixed" by rebuilding the tree).

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
       [not found] ` <Pine.LNX.4.64.0705280025570.10429@sheep.housecafe.de>
@ 2007-05-28 17:31   ` Christian Kujau
  2007-05-28 18:16     ` Laurent CARON
  0 siblings, 1 reply; 89+ messages in thread
From: Christian Kujau @ 2007-05-28 17:31 UTC (permalink / raw)
  To: Christian Kujau; +Cc: reiserfs-list

[resending, because lncsa.com bounced my mail]

On Mon, 28 May 2007, Christian Kujau wrote:
> On Sun, 27 May 2007, Laurent CARON wrote:
>> The entry that scares me is
>> ?---------   ? ?      ?            ?                ? new
>> 
>> Seems to me it is a filesystem corruption.
>> Any other solution than rebuild-tree ?
>
> Please try to check the fs with a current version of reiserfsprogs first. As 
> the manpage advises, try --check first and use --rebuild-tree only if you 
> know what you're doing, IOW: have a current backup.
>
> Also, which kernel/machine is this running on? Do you know *why* this 
> corruption may have occured? Any recent hardware issues? Is ther anything in 
> the logs regarding fs/device errors?
>
> C.
> -- 
> BOFH excuse #448:
>
> vi needs to be upgraded to vii

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
  2007-05-28 17:31   ` Christian Kujau
@ 2007-05-28 18:16     ` Laurent CARON
  2007-05-28 23:19       ` Christian Kujau
  2007-05-29  8:39       ` Vladimir V. Saveliev
  0 siblings, 2 replies; 89+ messages in thread
From: Laurent CARON @ 2007-05-28 18:16 UTC (permalink / raw)
  To: reiserfs-list

Christian Kujau a écrit :
>> Please try to check the fs with a current version of reiserfsprogs 
>> first. As the manpage advises, try --check first and use 
>> --rebuild-tree only if you know what you're doing, IOW: have a current 
>> backup.

Over the past few years, i experienced a few reiser corruption on 
various hardware (dell, hp, asus, sata, scsi, ide...) with the same 
symptoms (unredable file/dir).
Always ran check which told me to run fix-fixable or rebuild-tree, which 
I did after ensuring of backup reliability, and the error was corrected 
(after eventually losing a few files i fortunately had in the backups).

>>
>> Also, which kernel/machine is this running on? Do you know *why* this 
>> corruption may have occured? Any recent hardware issues? Is ther 
>> anything in the logs regarding fs/device errors?

Kernel is 2.6.19.
The machine does not seem to have any HW issue, nothing strange in the 
logs..... :$
This is just a plain Dell 2650 server with a bunch of SCSI HDD, software 
raid5 array, reiserfs on top of it.

Laurent

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
  2007-05-28 18:16     ` Laurent CARON
@ 2007-05-28 23:19       ` Christian Kujau
  2007-05-29  8:39       ` Vladimir V. Saveliev
  1 sibling, 0 replies; 89+ messages in thread
From: Christian Kujau @ 2007-05-28 23:19 UTC (permalink / raw)
  To: reiserfs-list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Mon, 28 May 2007, Laurent CARON wrote:
> Always ran check which told me to run fix-fixable or rebuild-tree, which I 
> did after ensuring of backup reliability, and the error was corrected (after 
> eventually losing a few files i fortunately had in the backups).

Well, lucky you :)

> The machine does not seem to have any HW issue, nothing strange in the 
> logs..... :$
> This is just a plain Dell 2650 server with a bunch of SCSI HDD, software 
> raid5 array, reiserfs on top of it.

...and no power-failures, bad memory whatsoever?
Hm, too bad, since now it's unclear 
what *caused* the corruptions in the first place. You'll probably 
(hopefully) be able to correct this corruption with --rebuild-tree but 
I'd have a close look on this filesystem for further curruptions.

Christian.
- -- 
BOFH excuse #118:

the router thinks its a printer.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFGW2N/+A7rjkF8z0wRAg9yAJ9PgWYfv1KC1Z3o/cVXScqxTYDPfwCdHKDD
Wy3p1M9ODJFfuqn0JaCEu8U=
=uCAH
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
       [not found] ` <465BA9AC.8040805@ultraviolet.org>
@ 2007-05-29  8:15   ` Vladimir V. Saveliev
  2007-05-29 12:36     ` Toby Thain
  0 siblings, 1 reply; 89+ messages in thread
From: Vladimir V. Saveliev @ 2007-05-29  8:15 UTC (permalink / raw)
  To: Tracy R Reed; +Cc: Laurent CARON, reiserfs-list

Hello

On Tuesday 29 May 2007 08:18, Tracy R Reed wrote:
> Laurent CARON wrote:
> > Seems to me it is a filesystem corruption.
> 
> Did I miss it or did not a single person ask you if this happened with
> reiserfs 3 or 4?
> 

Laurent mentioned rebuild-tree mode of reiserfsck. So the problem happened  with reiserfs 3.

> I would be quite surprised if this were reiser 3 and not so surprised if
> it were reiser 4 which is still beta afaik.
> 
> Reiser has a nasty reputation for filesystem corruption more than any
> other fs. I have always found reiser3 to be rock solid but you can't
> mention using reiserfs in mixed company without someone accusing you of
> throwing your data away. You would think the developers would be doing
> more to counter this but I have been following reiserfs for years and
> nobody seems to really care all that much.
> 

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
  2007-05-28 18:16     ` Laurent CARON
  2007-05-28 23:19       ` Christian Kujau
@ 2007-05-29  8:39       ` Vladimir V. Saveliev
  1 sibling, 0 replies; 89+ messages in thread
From: Vladimir V. Saveliev @ 2007-05-29  8:39 UTC (permalink / raw)
  To: Laurent CARON; +Cc: reiserfs-list

Hello

On Monday 28 May 2007 22:16, Laurent CARON wrote:
> Christian Kujau a écrit :
> >> Please try to check the fs with a current version of reiserfsprogs 
> >> first. As the manpage advises, try --check first and use 
> >> --rebuild-tree only if you know what you're doing, IOW: have a current 
> >> backup.
> 
> Over the past few years, i experienced a few reiser corruption on 
> various hardware (dell, hp, asus, sata, scsi, ide...) with the same 
> symptoms (unredable file/dir).
> Always ran check which told me to run fix-fixable or rebuild-tree, which 
> I did after ensuring of backup reliability, and the error was corrected 
> (after eventually losing a few files i fortunately had in the backups).
> 

Would you run reiserfsck --check -l log and let us see the log?
That may give a hint about which kind of corruptions do you have.

> >>
> >> Also, which kernel/machine is this running on? Do you know *why* this 
> >> corruption may have occured? Any recent hardware issues? Is ther 
> >> anything in the logs regarding fs/device errors?
> 
> Kernel is 2.6.19.
> The machine does not seem to have any HW issue, nothing strange in the 
> logs..... :$
> This is just a plain Dell 2650 server with a bunch of SCSI HDD, software 
> raid5 array, reiserfs on top of it.
> 
> Laurent
> 
> 
> 

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
  2007-05-29  8:15   ` Vladimir V. Saveliev
@ 2007-05-29 12:36     ` Toby Thain
  2007-05-30 13:25       ` David Masover
  2007-05-30 16:08       ` Vladimir V. Saveliev
  0 siblings, 2 replies; 89+ messages in thread
From: Toby Thain @ 2007-05-29 12:36 UTC (permalink / raw)
  To: Vladimir V. Saveliev; +Cc: ReiserFS List

>>  I have always found reiser3 to be rock solid

My experienced too, over many server years.

>> but you can't
>> mention using reiserfs in mixed company without someone accusing  
>> you of
>> throwing your data away.

People who repeat this rarely have any direct experience of Reiser;  
they repeat what they've heard; like all myths and legends they are  
transmitted orally rather than based on scientific observation.

>> You would think the developers would be doing
>> more to counter this but I have been following reiserfs for years and
>> nobody seems to really care all that much.
>>

Can't do much about human nature. MySQL suffers from the same  
baseless poisoned folk wisdom.

--Toby

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
  2007-05-29 12:36     ` Toby Thain
@ 2007-05-30 13:25       ` David Masover
  2007-05-30 16:02         ` Vladimir V. Saveliev
  2007-05-30 16:42         ` Toby Thain
  2007-05-30 16:08       ` Vladimir V. Saveliev
  1 sibling, 2 replies; 89+ messages in thread
From: David Masover @ 2007-05-30 13:25 UTC (permalink / raw)
  To: reiserfs-list

[-- Attachment #1: Type: text/plain, Size: 1821 bytes --]

On Tuesday 29 May 2007 07:36:13 Toby Thain wrote:

> >> but you can't
> >> mention using reiserfs in mixed company without someone accusing
> >> you of
> >> throwing your data away.
>
> People who repeat this rarely have any direct experience of Reiser;
> they repeat what they've heard; like all myths and legends they are
> transmitted orally rather than based on scientific observation.

Well, there is one problem I vaguely remember that I don't think has been 
addressed, I think it was one of those lets-put-it-off-till-v4 things. It was 
the fact that there are a limited number of inodes (or keys, or whatever you 
call a unique file), and no way of knowing how many you have left until your 
FS will suddenly, one day refuse to create another file.

(For comparison, ext3 seems to support not only telling you how many inodes 
you have left, but tuning that on the fly.)

But, I haven't run into that, and the only problem I've had lately has been 
Reiser4 losing data, and crashing occasionally. I switched most of my data 
off of Reiser4 and onto XFS for that reason. I've also been using ext3 in 
some places, and Reiser3 in others (one place in particular where space is 
limited, but I will have tons of small files).

I later learned that XFS does out-of-order writes by default, making me think 
I should give up and invest in UPS hardware. But, switching away from Reiser4 
means I no longer see random files (including stuff in, for example, /sbin, 
that I hadn't touched in months) go up in smoke.

Ordinarily I like to help debug things, but not at the risk of my data. Maybe 
I'll try again later, and see if I can reproduce it in a VM or somewhere 
safe...

I do still follow the list, though, in case something interesting happens. It 
was fun while it lasted!

[-- Attachment #2: Type: application/pgp-signature, Size: 827 bytes --]

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
  2007-05-30 13:25       ` David Masover
@ 2007-05-30 16:02         ` Vladimir V. Saveliev
  2007-05-30 20:06           ` David Masover
  2007-05-30 16:42         ` Toby Thain
  1 sibling, 1 reply; 89+ messages in thread
From: Vladimir V. Saveliev @ 2007-05-30 16:02 UTC (permalink / raw)
  To: David Masover; +Cc: reiserfs-list

Hello

On Wednesday 30 May 2007 17:25, David Masover wrote:
> On Tuesday 29 May 2007 07:36:13 Toby Thain wrote:
> 
> > >> but you can't
> > >> mention using reiserfs in mixed company without someone accusing
> > >> you of
> > >> throwing your data away.
> >
> > People who repeat this rarely have any direct experience of Reiser;
> > they repeat what they've heard; like all myths and legends they are
> > transmitted orally rather than based on scientific observation.
> 
> Well, there is one problem I vaguely remember that I don't think has been 
> addressed, I think it was one of those lets-put-it-off-till-v4 things. It was 
> the fact that there are a limited number of inodes (or keys, or whatever you 
> call a unique file), and no way of knowing how many you have left until your 
> FS will suddenly, one day refuse to create another file.
> 

reiserfs is limited to ~2^32 file creations. It is possible to exhaust but I do not remember any reports about that.

> (For comparison, ext3 seems to support not only telling you how many inodes 
> you have left, but tuning that on the fly.)
> 
> But, I haven't run into that, and the only problem I've had lately has been 
> Reiser4 losing data, and crashing occasionally. I switched most of my data 
> off of Reiser4 and onto XFS for that reason. I've also been using ext3 in 
> some places, and Reiser3 in others (one place in particular where space is 
> limited, but I will have tons of small files).
> 
> I later learned that XFS does out-of-order writes by default, making me think 
> I should give up and invest in UPS hardware. But, switching away from Reiser4 
> means I no longer see random files (including stuff in, for example, /sbin, 
> that I hadn't touched in months) go up in smoke.
> 
> Ordinarily I like to help debug things, but not at the risk of my data. Maybe 
> I'll try again later, and see if I can reproduce it in a VM or somewhere 
> safe...
> 
that would be great, thanks

> I do still follow the list, though, in case something interesting happens. It 
> was fun while it lasted!
> 

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
  2007-05-29 12:36     ` Toby Thain
  2007-05-30 13:25       ` David Masover
@ 2007-05-30 16:08       ` Vladimir V. Saveliev
  1 sibling, 0 replies; 89+ messages in thread
From: Vladimir V. Saveliev @ 2007-05-30 16:08 UTC (permalink / raw)
  To: Toby Thain; +Cc: reiserfs-list

Hello

On Tuesday 29 May 2007 16:36, Toby Thain wrote:
> >>  I have always found reiser3 to be rock solid
> 
> My experienced too, over many server years.
> 
> >> but you can't
> >> mention using reiserfs in mixed company without someone accusing  
> >> you of
> >> throwing your data away.
> 
> People who repeat this rarely have any direct experience of Reiser;  
> they repeat what they've heard; like all myths and legends they are  
> transmitted orally rather than based on scientific observation.
> 
well, there were in past several bad stories when reiserfsck was unable restore filesystems because it was unable to find
reiserfs metadata.
Later we found that sometimes (for unknown (but not likely due to reiserfs problem) reason) partition table changes so that 
beginning of a partition gets shifted by few sectors. So, now, when a user reports that reiserfs metadata disappered from a device completely - recovering a partition table to 
original state makes data available again.

> >> You would think the developers would be doing
> >> more to counter this but I have been following reiserfs for years and
> >> nobody seems to really care all that much.
> >>
> 
> Can't do much about human nature. MySQL suffers from the same  
> baseless poisoned folk wisdom.
> 
> --Toby
> 
> 

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
  2007-05-30 13:25       ` David Masover
  2007-05-30 16:02         ` Vladimir V. Saveliev
@ 2007-05-30 16:42         ` Toby Thain
  2007-05-30 19:42           ` David Masover
  1 sibling, 1 reply; 89+ messages in thread
From: Toby Thain @ 2007-05-30 16:42 UTC (permalink / raw)
  To: David Masover; +Cc: ReiserFS List


On 30-May-07, at 10:25 AM, David Masover wrote:

> On Tuesday 29 May 2007 07:36:13 Toby Thain wrote:
>
>>>> but you can't
>>>> mention using reiserfs in mixed company without someone accusing
>>>> you of
>>>> throwing your data away.
>>
>> People who repeat this rarely have any direct experience of Reiser;
>> they repeat what they've heard; like all myths and legends they are
>> transmitted orally rather than based on scientific observation.
>
> Well, there is one problem I vaguely remember that I don't think  
> has been
> addressed, I think it was one of those lets-put-it-off-till-v4  
> things. It was
> the fact that there are a limited number of inodes (or keys, or  
> whatever you
> call a unique file),

But does it cause data loss? One usually sees claims that "reiserfs  
ate my data", or "I heard reiserfs ate somebody's data", but without  
supplying a root cause - bad memory? powerfail? bad disk? etc.

> and no way of knowing how many you have left until your
> FS will suddenly, one day refuse to create another file.
>

> ... switching away from Reiser4
> means I no longer see random files (including stuff in, for  
> example, /sbin,
> that I hadn't touched in months) go up in smoke.

I only wish sanity had prevailed over  kernel inclusion, then we'd  
see it shaken down a lot quicker, like R3 was.

>
> Ordinarily I like to help debug things, but not at the risk of my  
> data. Maybe
> I'll try again later, and see if I can reproduce it in a VM or  
> somewhere
> safe...
>
> I do still follow the list, though, in case something interesting  
> happens.

Yeah, R4 is "something interesting". :) I still hope it gets finished...

--Toby

> It
> was fun while it lasted!


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
@ 2007-05-30 17:22 devsk
  2007-05-30 19:24 ` Toby Thain
  2007-05-30 20:03 ` David Masover
  0 siblings, 2 replies; 89+ messages in thread
From: devsk @ 2007-05-30 17:22 UTC (permalink / raw)
  To: Toby Thain, David Masover; +Cc: ReiserFS List

[-- Attachment #1: Type: text/plain, Size: 3263 bytes --]

I think people just like to spread FUD without doing any analysis of what really caused the FS corruption. It can be anything from a bad 3rd party driver to bad hardware ('bad blocks', does anybody check for them before mkfs these days? I do). People also like to try those untested patchsets, containing every blah that's thrown out by so called 'kernel hackers' which makes your system 10x faster. Rieser4 seems like an easy candidate to vent their anger on afterwards.

I have used R4 for a year now and I have had to reset my PC, troubleshooting problems with vmware/mythtv/cisco vpn client/nvidia, so many times that its not even funny! And R4 didn't give me any problems even once. It boots right up, without any files lost and consistent FS as a subsequent livecd boot and fsck proved it everytime. If I did that to ext or xfs, I would have lost big time. Only files I have ever lost were on ext3 during a sudden power failure. I don't trust safety of my data on any FS but Rieserfs. I hope people don't leave this good piece of code to rot!!

-devsk

----- Original Message ----
From: Toby Thain <toby@smartgames.ca>
To: David Masover <ninja@slaphack.com>
Cc: ReiserFS List <reiserfs-list@namesys.com>
Sent: Wednesday, May 30, 2007 9:42:01 AM
Subject: Re: Filesystem corruption

On 30-May-07, at 10:25 AM, David Masover wrote:

> On Tuesday 29 May 2007 07:36:13 Toby Thain wrote:
>
>>>> but you can't
>>>> mention using reiserfs in mixed company without someone accusing
>>>> you of
>>>> throwing your data away.
>>
>> People who repeat this rarely have any direct experience of Reiser;
>> they repeat what they've heard; like all myths and legends they are
>> transmitted orally rather than based on scientific observation.
>
> Well, there is one problem I vaguely remember that I don't think  
> has been
> addressed, I think it was one of those lets-put-it-off-till-v4  
> things. It was
> the fact that there are a limited number of inodes (or keys, or  
> whatever you
> call a unique file),

But does it cause data loss? One usually sees claims that "reiserfs  
ate my data", or "I heard reiserfs ate somebody's data", but without  
supplying a root cause - bad memory? powerfail? bad disk? etc.

> and no way of knowing how many you have left until your
> FS will suddenly, one day refuse to create another file.
>

> ... switching away from Reiser4
> means I no longer see random files (including stuff in, for  
> example, /sbin,
> that I hadn't touched in months) go up in smoke.

I only wish sanity had prevailed over  kernel inclusion, then we'd  
see it shaken down a lot quicker, like R3 was.

>
> Ordinarily I like to help debug things, but not at the risk of my  
> data. Maybe
> I'll try again later, and see if I can reproduce it in a VM or  
> somewhere
> safe...
>
> I do still follow the list, though, in case something interesting  
> happens.

Yeah, R4 is "something interesting". :) I still hope it gets finished...

--Toby

> It
> was fun while it lasted!

____________________________________________________________________________________Boardwalk for $500? In 2007? Ha! Play Monopoly Here and Now (it's updated for today's economy) at Yahoo! Games.
http://get.games.yahoo.com/proddesc?gamekey=monopolyherenow  

[-- Attachment #2: Type: text/html, Size: 4130 bytes --]

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
  2007-05-30 17:22 devsk
@ 2007-05-30 19:24 ` Toby Thain
  2007-05-30 20:03 ` David Masover
  1 sibling, 0 replies; 89+ messages in thread
From: Toby Thain @ 2007-05-30 19:24 UTC (permalink / raw)
  To: devsk; +Cc: David Masover, ReiserFS List


On 30-May-07, at 2:22 PM, devsk wrote:

> I think people just like to spread FUD without doing any analysis  
> of what really caused the FS corruption.

I fear you're right. OTOH, filesystem developers on this list (and  
others including ZFS list) tend to be extremely meticulous.

--Toby


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
  2007-05-30 16:42         ` Toby Thain
@ 2007-05-30 19:42           ` David Masover
  0 siblings, 0 replies; 89+ messages in thread
From: David Masover @ 2007-05-30 19:42 UTC (permalink / raw)
  To: reiserfs-list

[-- Attachment #1: Type: text/plain, Size: 827 bytes --]

On Wednesday 30 May 2007 11:42:01 Toby Thain wrote:

> But does it cause data loss? One usually sees claims that "reiserfs
> ate my data", or "I heard reiserfs ate somebody's data", but without
> supplying a root cause - bad memory? powerfail? bad disk? etc.

Power failure shouldn't kill a filesystem, and generally shouldn't eat data 
that was written to disk before the failure. (Although I could complain all 
day here about why corruption happens anyway when you do any kind of 
out-of-order operations...  I am looking forward to that Reiser4 transaction 
API, so we can finally get rid of the tmpfile+rename hack.)

But in any case, there were some kernels -- 2.4.16, I think? -- in which 
reiserfs was unstable and did corrupt easily. I believe that was tracked down 
to kernel bugs outside of reiserfs.

[-- Attachment #2: Type: application/pgp-signature, Size: 827 bytes --]

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
  2007-05-30 17:22 devsk
  2007-05-30 19:24 ` Toby Thain
@ 2007-05-30 20:03 ` David Masover
  2007-05-31  0:11   ` Ingo Bormuth
  1 sibling, 1 reply; 89+ messages in thread
From: David Masover @ 2007-05-30 20:03 UTC (permalink / raw)
  To: devsk; +Cc: Toby Thain, ReiserFS List

[-- Attachment #1: Type: text/plain, Size: 3186 bytes --]

On Wednesday 30 May 2007 12:22:17 devsk wrote:

> I have used R4 for a year now and I have had to reset my PC,
> troubleshooting problems with vmware/mythtv/cisco vpn client/nvidia, so
> many times that its not even funny! And R4 didn't give me any problems even
> once. It boots right up, without any files lost and consistent FS as a
> subsequent livecd boot and fsck proved it everytime.

That happened to me for maybe a year or so, I'm not sure. Then, slowly, I 
started to get problems. The machine crashing due to some nvidia bug -- or 
even a reiser-specific oops or something -- then I'd have to fsck it, which 
would take an hour or more, then I'd boot, and apparently no problems.

Only, recently, these fsck-a-thons started happening more and more often, and 
I started to lose random files. They'd just be silently truncated to 0 bytes. 
And not files I was writing a lot -- I'm talking about things 
like /bin/mount.

Now, maybe it's an amd64-specific bug. Or (somehow) a dmraid-specific bug, or 
a dont_load_bitmap bug. (Who can blame me; without dont_load_bitmap, it takes 
at least 30 seconds, maybe a minute to mount.) Could even be, somehow, a 
Gentoo-specific bug. Could be a 350-gig-partition bug, or even a bug of the 
it-hates-me variety. (My server ran Reiser4 for awhile longer, with no 
problems, but I wasn't about to take chances there.)

But, I switched a friend over to Ubuntu, and he had the same kind of problems. 
In fact, he had them first (I thought it was his computer, for awhile).

Finally, we switched to stock Ubuntu kernels and XFS, me on dmraid, him on 
normal linux raid5 (md), and we now have no problems. It's even faster -- the 
biggest gain for Reiser4 was /usr/portage, which doesn't exist on Ubuntu.

> If I did that to ext 
> or xfs, I would have lost big time.

Well, I'm on XFS on my desktop now, and ext3 on my server. No problems at all 
so far. Also much faster, because my desktop now has a repacker (xfs_fsr).

> I hope people don't leave this good piece of code to rot!!

Me too, but you know, I can no longer afford to spend a few hours running fsck 
for no apparent reason. I no longer have a machine that can do anything but 
just work.

The killer feature of Reiser4, as implemented, is small file performance that 
makes ReiserFSv3 weep, and v3 makes XFS weep. All the other stuff we were 
promised is either planned for a later release (repacker, pseudofiles, 
transaction API) or barely working (cryptocompress).

And on just about any setup I work on today, small file performance is a small 
enough priority that even the slightest hint of instability is a 
deal-breaker. Enough people feel the same way that ext3 is still widely used. 
And if it's ever really crucial, there's reiserfs3.

So, you can blame it on my hardware, or on not getting kernel inclusion, or 
anything you want, but the only place I still use Reiser4 is on the 
gameserver at our LAN party, and we're thinking of moving that to something 
like ext3 or xfs, just so we don't need custom kernels. And after all, that's 
a gameserver, it's not like the filesystem is the bottleneck anyway.

[-- Attachment #2: Type: application/pgp-signature, Size: 827 bytes --]

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
  2007-05-30 16:02         ` Vladimir V. Saveliev
@ 2007-05-30 20:06           ` David Masover
  0 siblings, 0 replies; 89+ messages in thread
From: David Masover @ 2007-05-30 20:06 UTC (permalink / raw)
  To: Vladimir V. Saveliev; +Cc: reiserfs-list

[-- Attachment #1: Type: text/plain, Size: 842 bytes --]

On Wednesday 30 May 2007 11:02:26 Vladimir V. Saveliev wrote:

> > Ordinarily I like to help debug things, but not at the risk of my data.
> > Maybe I'll try again later, and see if I can reproduce it in a VM or
> > somewhere safe...
>
> that would be great, thanks

Keep in mind, it's unlikely, given I don't have much resembling my original 
setup left around. And it was fairly random, under fairly normal usage 
patterns -- just I'd suddenly notice my movie had stopped playing, and I'd 
hit ctrl+alt+f8 and find a bunch of reiser4 error messages.

Is it at all likely that this is an amd64 bug? (The only two places I've seen 
it are on my box and my friend's, both amd64 on some sort of RAID.) If you 
don't have enough testers or hardware for amd64, I can try (again) to setup a 
working x86_64 VM for you to test on.

[-- Attachment #2: Type: application/pgp-signature, Size: 827 bytes --]

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
@ 2007-05-30 20:13 devsk
  0 siblings, 0 replies; 89+ messages in thread
From: devsk @ 2007-05-30 20:13 UTC (permalink / raw)
  To: David Masover; +Cc: Toby Thain, ReiserFS List

[-- Attachment #1: Type: text/plain, Size: 4131 bytes --]

David, Its funny how my setup is very similar to yours: gentoo, amd64, nvraid using dmraid. mount/mkfs is VERY fast (less than a second) here, and I don't use any specific mount options except noatime. My partition is about 16GB though, hosting '/' and /home.

what sources do you use? I use gentoo-sources (currently using 2.6.21-r2) with the latest stable patch (currently 2.6.21) from namesys, applied manually. Nothing else. I use suspend-to-ram (with a UPS) and the whole system is rock solid.

-devsk

----- Original Message ----
From: David Masover <ninja@slaphack.com>
To: devsk <funtoos@yahoo.com>
Cc: Toby Thain <toby@smartgames.ca>; ReiserFS List <reiserfs-list@namesys.com>
Sent: Wednesday, May 30, 2007 1:03:14 PM
Subject: Re: Filesystem corruption

On Wednesday 30 May 2007 12:22:17 devsk wrote:

> I have used R4 for a year now and I have had to reset my PC,
> troubleshooting problems with vmware/mythtv/cisco vpn client/nvidia, so
> many times that its not even funny! And R4 didn't give me any problems even
> once. It boots right up, without any files lost and consistent FS as a
> subsequent livecd boot and fsck proved it everytime.

That happened to me for maybe a year or so, I'm not sure. Then, slowly, I 
started to get problems. The machine crashing due to some nvidia bug -- or 
even a reiser-specific oops or something -- then I'd have to fsck it, which 
would take an hour or more, then I'd boot, and apparently no problems.

Only, recently, these fsck-a-thons started happening more and more often, and 
I started to lose random files. They'd just be silently truncated to 0 bytes. 
And not files I was writing a lot -- I'm talking about things 
like /bin/mount.

Now, maybe it's an amd64-specific bug. Or (somehow) a dmraid-specific bug, or 
a dont_load_bitmap bug. (Who can blame me; without dont_load_bitmap, it takes 
at least 30 seconds, maybe a minute to mount.) Could even be, somehow, a 
Gentoo-specific bug. Could be a 350-gig-partition bug, or even a bug of the 
it-hates-me variety. (My server ran Reiser4 for awhile longer, with no 
problems, but I wasn't about to take chances there.)

But, I switched a friend over to Ubuntu, and he had the same kind of problems. 
In fact, he had them first (I thought it was his computer, for awhile).

Finally, we switched to stock Ubuntu kernels and XFS, me on dmraid, him on 
normal linux raid5 (md), and we now have no problems. It's even faster -- the 
biggest gain for Reiser4 was /usr/portage, which doesn't exist on Ubuntu.

> If I did that to ext 
> or xfs, I would have lost big time.

Well, I'm on XFS on my desktop now, and ext3 on my server. No problems at all 
so far. Also much faster, because my desktop now has a repacker (xfs_fsr).

> I hope people don't leave this good piece of code to rot!!

Me too, but you know, I can no longer afford to spend a few hours running fsck 
for no apparent reason. I no longer have a machine that can do anything but 
just work.

The killer feature of Reiser4, as implemented, is small file performance that 
makes ReiserFSv3 weep, and v3 makes XFS weep. All the other stuff we were 
promised is either planned for a later release (repacker, pseudofiles, 
transaction API) or barely working (cryptocompress).

And on just about any setup I work on today, small file performance is a small 
enough priority that even the slightest hint of instability is a 
deal-breaker. Enough people feel the same way that ext3 is still widely used. 
And if it's ever really crucial, there's reiserfs3.

So, you can blame it on my hardware, or on not getting kernel inclusion, or 
anything you want, but the only place I still use Reiser4 is on the 
gameserver at our LAN party, and we're thinking of moving that to something 
like ext3 or xfs, just so we don't need custom kernels. And after all, that's 
a gameserver, it's not like the filesystem is the bottleneck anyway.

____________________________________________________________________________________Building a website is a piece of cake. Yahoo! Small Business gives you all the tools to get online.
http://smallbusiness.yahoo.com/webhosting 

[-- Attachment #2: Type: text/html, Size: 4744 bytes --]

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
  2007-05-30 20:03 ` David Masover
@ 2007-05-31  0:11   ` Ingo Bormuth
  2007-06-02 23:10     ` Edward Shishkin
  0 siblings, 1 reply; 89+ messages in thread
From: Ingo Bormuth @ 2007-05-31  0:11 UTC (permalink / raw)
  To: reiserfs-list

On 2007-05-30 15:03, David Masover wrote:

> Only, recently, these fsck-a-thons started happening more and more often, and 
> I started to lose random files. They'd just be silently truncated to 0 bytes. 
> And not files I was writing a lot -- I'm talking about things 
> like /bin/mount.

Hm, same here. I lost /bin/sleep several times. I have a little script
printing status messages to the screen, sleeping two seconds and print
again - you name it. The probability that /bin/sleep is accessed at the
same time the system crashes is quite high (this is _no_ write access,
the system is even mounted noatime).

How could pure execution of a file cause corruption of the file itself?
Any idea ?

Apart from that single file, I never had any serious problems with
reiser4 on three busy systems for years - fsck.reiser4 works like charme.

-- 
Ingo Bormuth, voicebox & fax: +49-(0)-12125-10226517
public key 86326EC9, http://ibormuth.efil.de/contact

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
  2007-05-31  0:11   ` Ingo Bormuth
@ 2007-06-02 23:10     ` Edward Shishkin
  2007-06-04  2:55       ` Ingo Bormuth
  0 siblings, 1 reply; 89+ messages in thread
From: Edward Shishkin @ 2007-06-02 23:10 UTC (permalink / raw)
  To: Ingo Bormuth; +Cc: reiserfs-list

Ingo Bormuth wrote:

>On 2007-05-30 15:03, David Masover wrote:
>
>  
>
>>Only, recently, these fsck-a-thons started happening more and more often, and 
>>I started to lose random files. They'd just be silently truncated to 0 bytes. 
>>And not files I was writing a lot -- I'm talking about things 
>>like /bin/mount.
>>    
>>
>
>Hm, same here. I lost /bin/sleep several times.
>

Would you please describe the problem in more details?
What kernel version? What does "I lost /bin/sleep" mean?
Does it mean that:
1. /bin/sleep was truncated to 0 bytes, i.e. "ls -l /bin/sleep" shows  
something like
-rwxr-xr-x  1 root root 0 2005-04-20 18:32 /bin/sleep
2. /bin/sleep disappeared ("ls -l /bin" doesn't show this file)
3. /bin/sleep exists, but filled by zeros
etc...

Thanks,
Edward.

> I have a little script
>printing status messages to the screen, sleeping two seconds and print
>again - you name it. The probability that /bin/sleep is accessed at the
>same time the system crashes is quite high (this is _no_ write access,
>the system is even mounted noatime).
>
>How could pure execution of a file cause corruption of the file itself?
>Any idea ?
>
>Apart from that single file, I never had any serious problems with
>reiser4 on three busy systems for years - fsck.reiser4 works like charme.
>
>
>  
>


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
  2007-06-02 23:10     ` Edward Shishkin
@ 2007-06-04  2:55       ` Ingo Bormuth
  2007-06-04  9:41         ` Edward Shishkin
  0 siblings, 1 reply; 89+ messages in thread
From: Ingo Bormuth @ 2007-06-04  2:55 UTC (permalink / raw)
  To: reiserfs-list

On 2007-06-03 03:10, Edward Shishkin wrote:
> Ingo Bormuth wrote:
> >Hm, same here. I lost /bin/sleep several times.

> Would you please describe the problem in more details?
> What kernel version? What does "I lost /bin/sleep" mean?
> Does it mean that:
> 1. /bin/sleep was truncated to 0 bytes, i.e. "ls -l /bin/sleep" shows  
> something like
> -rwxr-xr-x  1 root root 0 2005-04-20 18:32 /bin/sleep
> 2. /bin/sleep disappeared ("ls -l /bin" doesn't show this file)
> 3. /bin/sleep exists, but filled by zeros
> etc...

The file was removed by 'fsck.reiser4 --fix' which emmitted a
message about deleting a corrupted file. (Case 2 in your list).

This always happened after a system freeze or power loss.
The machine freezes quite frequently - I think it has a DMA problem.
Nevertheless I don't see how a file that was not written to can
get corrupted.

Current kernel is 2.6.20.5 (the reiser4 patch I submitted to this 
list on may 2nd).

Root is mounted rw,noatime,nodiratime,onerror=remount-ro,tmgr.atom_max_age=60

Hope that helps.



-- 
Ingo Bormuth, voicebox & fax: +49-(0)-12125-10226517
public key 86326EC9, http://ibormuth.efil.de/contact


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
  2007-06-04  2:55       ` Ingo Bormuth
@ 2007-06-04  9:41         ` Edward Shishkin
  2007-06-05 23:20           ` Ingo Bormuth
  0 siblings, 1 reply; 89+ messages in thread
From: Edward Shishkin @ 2007-06-04  9:41 UTC (permalink / raw)
  To: Ingo Bormuth; +Cc: reiserfs-list

Ingo Bormuth wrote:

>On 2007-06-03 03:10, Edward Shishkin wrote:
>  
>
>>Ingo Bormuth wrote:
>>    
>>
>>>Hm, same here. I lost /bin/sleep several times.
>>>      
>>>
>
>  
>
>>Would you please describe the problem in more details?
>>What kernel version? What does "I lost /bin/sleep" mean?
>>Does it mean that:
>>1. /bin/sleep was truncated to 0 bytes, i.e. "ls -l /bin/sleep" shows  
>>something like
>>-rwxr-xr-x  1 root root 0 2005-04-20 18:32 /bin/sleep
>>2. /bin/sleep disappeared ("ls -l /bin" doesn't show this file)
>>3. /bin/sleep exists, but filled by zeros
>>etc...
>>    
>>
>
>The file was removed by 'fsck.reiser4 --fix' which emmitted a
>message about deleting a corrupted file. (Case 2 in your list).
>
>This always happened after a system freeze or power loss.
>The machine freezes quite frequently - I think it has a DMA problem.
>Nevertheless I don't see how a file that was not written to can
>get corrupted.
>
>  
>

When performing mapping read (needed for execution, etc) reiser4 
converts small
files from tails to extents and back (your /bin/sleep is less then 4 * 
blocksize, right?)

>Current kernel is 2.6.20.5 (the reiser4 patch I submitted to this 
>list on may 2nd).
>  
>

Please, rebuild your kernel with the official patch
http://ftp.namesys.com/pub/reiser4-for-2.6/2.6.20/
It contains a bugfix related to tail conversion (races when acquiring 
exclusive access).

Please, report, if such data loss still takes place after upgrade.

Thanks,
Edward.

>Root is mounted rw,noatime,nodiratime,onerror=remount-ro,tmgr.atom_max_age=60
>
>Hope that helps.
>
>
>
>  
>


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
  2007-06-04  9:41         ` Edward Shishkin
@ 2007-06-05 23:20           ` Ingo Bormuth
  0 siblings, 0 replies; 89+ messages in thread
From: Ingo Bormuth @ 2007-06-05 23:20 UTC (permalink / raw)
  To: reiserfs-list

On 2007-06-04 13:41, Edward Shishkin wrote:

> When performing mapping read (needed for execution, etc) reiser4
> converts small files from tails to extents and back (your /bin/sleep
> is less then 4 * blocksize, right?)

Yes, it's 15k. 

The conversion is done on disk, even when mounted read only?  I'd like
to see the logic in the code. In case you just know by heart, it' would
be nice if you could give me a little hint where to start at.

> Please, rebuild your kernel with the official patch
> [...]
> Please, report, if such data loss still takes place after upgrade.

I'll keep you informed ...

Thanks.


-- 
Ingo Bormuth, voicebox & fax: +49-(0)-12125-10226517
public key 86326EC9, http://ibormuth.efil.de/contact


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
@ 2007-06-06  3:10 Xu CanHao
  2007-06-06 12:16 ` Ingo Bormuth
  0 siblings, 1 reply; 89+ messages in thread
From: Xu CanHao @ 2007-06-06  3:10 UTC (permalink / raw)
  To: reiserfs-list

So maybe I'd suggest anybody take the _official_ reiser4 patch-set and
_vanilla_ kernel source, these things should provide the maximum
stability. My root filesystem with reiser4 never loses data.

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem corruption
  2007-06-06  3:10 Filesystem corruption Xu CanHao
@ 2007-06-06 12:16 ` Ingo Bormuth
  0 siblings, 0 replies; 89+ messages in thread
From: Ingo Bormuth @ 2007-06-06 12:16 UTC (permalink / raw)
  To: reiserfs-list

On 2007-06-06 11:10, Xu CanHao wrote:
> So maybe I'd suggest anybody take the _official_ reiser4 patch-set and
> _vanilla_ kernel source, these things should provide the maximum
> stability. My root filesystem with reiser4 never loses data.

I fully agree, as long as there _exists_ a current official patch.
That was not always the case in the recent past. No wonder people 
started to get their own hands dirty from time to time. 

Btw: It's also fun to read / mess with the code ...


-- 
Ingo Bormuth, voicebox & fax: +49-(0)-12125-10226517
public key 86326EC9, http://ibormuth.efil.de/contact


^ permalink raw reply	[flat|nested] 89+ messages in thread

* filesystem corruption
@ 2011-01-03  1:58 Patrick H.
  2011-01-03  3:16 ` Neil Brown
  0 siblings, 1 reply; 89+ messages in thread
From: Patrick H. @ 2011-01-03  1:58 UTC (permalink / raw)
  To: linux-raid

I've been trying to track down an issue for a while now and from digging 
around it appears (though not certain) the issue lies with the md raid 
device.
Whats happening is that after improperly shutting down a raid-5 array, 
upon reassembly, a few files on the filesystem will be corrupt. I dont 
think this is normal filesystem corruption from files being modified 
during the shut down because some of the files that end up corrupted are 
several hours old.

The exact details of what I'm doing:
I have a 3-node test cluster I'm doing integrity testing on. Each node 
in the cluster is exporting a couple of disks via ATAoE.
I have the first disk of all 3 nodes in a raid-1 that is holding the 
journal data for the ext3 filesystem. The array is running with an 
internal bitmap as well.
The second disk of all 3 nodes is a raid-5 array holding the ext3 
filesystem itself. This is also running with an internal bitmap.
The ext3 filesystem is mounted with 'data=journal,barrier=1,sync'.
When I power down the node which is actively running both md raid 
devices, another node in the cluster takes over and starts both arrays 
up (in degraded mode of course).
Once the original node comes back up, the new master re-adds its disks 
back into the raid arrays and re-syncs them.
During all this, the filesystem is exported through nfs (nfs also has 
sync turned on) and a client is randomly creating, removing, and 
verifying checksums on the files in the filesystem (nfs is hard mounted 
so operations always retry). The client script averages about 30 
creations/s, 30 deletes/s, and 30 checksums/s.

So, as stated above, every now and then (1 in 50 chance or so), when the 
master is hard-rebooted, the client will detect a few files with invalid 
md5 checksums. These files could be hours old so they were not being 
actively modified.
Another key point that leads me to believe its a md raid issue is that 
before I had the ext3 journal running internally on the raid-5 array 
(part of the filesystem itself). When I did this, there would 
occasionally be massive corruption. As in file modification times in the 
future, lots of corrupt files, thousands of files put in the 
'lost+found' dir upon fsck, etc. After I put it on a separate raid-1, 
there are no more invalid modification times, there hasnt been a single 
file added to 'lost+found', and the number of corrupt files dropped 
significantly. This would seem to indicate that the journal was getting 
corrupted, and when it was played back, it went horribly wrong.

So it would seem there's something wrong with the raid-5 array, but I 
dont know what it could be. Any ideas or input would be much 
appreciated. I can modify the clustering scripts to obtain whatever 
information is needed when they start the arrays.

-Patrick

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: filesystem corruption
  2011-01-03  1:58 filesystem corruption Patrick H.
@ 2011-01-03  3:16 ` Neil Brown
       [not found]   ` <4D214B5C.3010103@feystorm.net>
                     ` (2 more replies)
  0 siblings, 3 replies; 89+ messages in thread
From: Neil Brown @ 2011-01-03  3:16 UTC (permalink / raw)
  To: Patrick H.; +Cc: linux-raid

On Sun, 02 Jan 2011 18:58:34 -0700 "Patrick H." <linux-raid@feystorm.net>
wrote:

> I've been trying to track down an issue for a while now and from digging 
> around it appears (though not certain) the issue lies with the md raid 
> device.
> Whats happening is that after improperly shutting down a raid-5 array, 
> upon reassembly, a few files on the filesystem will be corrupt. I dont 
> think this is normal filesystem corruption from files being modified 
> during the shut down because some of the files that end up corrupted are 
> several hours old.
> 
> The exact details of what I'm doing:
> I have a 3-node test cluster I'm doing integrity testing on. Each node 
> in the cluster is exporting a couple of disks via ATAoE.
> I have the first disk of all 3 nodes in a raid-1 that is holding the 
> journal data for the ext3 filesystem. The array is running with an 
> internal bitmap as well.
> The second disk of all 3 nodes is a raid-5 array holding the ext3 
> filesystem itself. This is also running with an internal bitmap.
> The ext3 filesystem is mounted with 'data=journal,barrier=1,sync'.
> When I power down the node which is actively running both md raid 
> devices, another node in the cluster takes over and starts both arrays 
> up (in degraded mode of course).
> Once the original node comes back up, the new master re-adds its disks 
> back into the raid arrays and re-syncs them.
> During all this, the filesystem is exported through nfs (nfs also has 
> sync turned on) and a client is randomly creating, removing, and 
> verifying checksums on the files in the filesystem (nfs is hard mounted 
> so operations always retry). The client script averages about 30 
> creations/s, 30 deletes/s, and 30 checksums/s.
> 
> So, as stated above, every now and then (1 in 50 chance or so), when the 
> master is hard-rebooted, the client will detect a few files with invalid 
> md5 checksums. These files could be hours old so they were not being 
> actively modified.
> Another key point that leads me to believe its a md raid issue is that 
> before I had the ext3 journal running internally on the raid-5 array 
> (part of the filesystem itself). When I did this, there would 
> occasionally be massive corruption. As in file modification times in the 
> future, lots of corrupt files, thousands of files put in the 
> 'lost+found' dir upon fsck, etc. After I put it on a separate raid-1, 
> there are no more invalid modification times, there hasnt been a single 
> file added to 'lost+found', and the number of corrupt files dropped 
> significantly. This would seem to indicate that the journal was getting 
> corrupted, and when it was played back, it went horribly wrong.
> 
> So it would seem there's something wrong with the raid-5 array, but I 
> dont know what it could be. Any ideas or input would be much 
> appreciated. I can modify the clustering scripts to obtain whatever 
> information is needed when they start the arrays.

What you are doing cannot work reliably.

If a RAID5 suffers an unclean shutdown and is restarted without a full
complement of devices, then it can corrupt data that has not been changed
recently, just as you are seeing.
This is why mdadm will not assemble that array unless you provide the --force
flag which essentially says "I know what I am doing and accept the risk".

When md needs to update a block in your 3-drive RAID5, it will read the other
block in the same stripe (if that isn't in the cache or being written at the
same time) and then write out the data block (or blocks) and the newly
computed parity block.

If you crash after one of those writes has completed, but before all of the
writes have completed, then the parity block will not match the data blocks
on disk.

When you re-assemble the array with one device missing, md will compute the
data that was on the device using the other data block and the parity block.
As the parity and data blocks could be inconsistent, the result could easily
be wrong.

With RAID1 there is no similar problem.  When you read after a crash you will
always get "correct" data.  It maybe from before the last write that was
attempted, or after, but if the data was not written recently you will read
exactly the right data.

This is why the situation improved substantially when you moved the journal
to RAID1.

The get full improvement, you need to move the data to RAID1 (or RAID10) as
well.

NeilBrown


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: filesystem corruption
       [not found]   ` <4D214B5C.3010103@feystorm.net>
@ 2011-01-03  4:56     ` Neil Brown
  2011-01-03  5:05       ` Patrick H.
  0 siblings, 1 reply; 89+ messages in thread
From: Neil Brown @ 2011-01-03  4:56 UTC (permalink / raw)
  To: Patrick H.; +Cc: linux-raid

On Sun, 02 Jan 2011 21:06:52 -0700 "Patrick H." <linux-raid@feystorm.net>
wrote:

> That makes sense assuming that MD acknowleges the write once the data is 
> written to the data disks but not necessarily the parity disk, which is 
> what I gather you were saying is what happens. Is there any option that 
> can change the behavior so that md wont ack the write until its been 
> committed to all disks (I'm guessing no since you didnt mention it)?
> Also does raid6 suffer this problem? Is it smart enough to use both 
> parity disks when calculating replacement, or will it just use one?
> 

md/raid5 doesn't acknowledge the write until both the data and the parity
have been written.  But that doesn't make any difference.
If you schedule a number of interdependent writes (data and parity) and then
allow some to complete but not all, then you have inconsistency.
Recovery from losing a single device requires consistency of parity and data.

RAID6 suffers equally from this problem.  Even if it used both parity disks
to recover (which it doesn't) how would that help?  It would then have two
possible value for the data and no way to know which was correct, and every
possibility that both are incorrect.  This would happen if a single data
block was successfully written, but neither parity blocks were.

The only way you can avoid this 'write hole' is by journalling in multiples
of whole stripes.  No current filesystems that I know of can do this as they
journal in blocks, and the maximum block size is less than the minimum stripe
size.  So you would need journalling integrated with md/raid, or you would
need a filesystem which was designed to understand this problem and write
whole stripes at a time, always to an area of the device which did not
contain live data.

NeilBrown

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: filesystem corruption
  2011-01-03  4:56     ` Neil Brown
@ 2011-01-03  5:05       ` Patrick H.
  2011-01-04  5:33         ` NeilBrown
  0 siblings, 1 reply; 89+ messages in thread
From: Patrick H. @ 2011-01-03  5:05 UTC (permalink / raw)
  To: linux-raid

Sent: Sun Jan 02 2011 21:56:30 GMT-0700 (Mountain Standard Time)
From: Neil Brown <neilb@suse.de>
To: Patrick H. <linux-raid@feystorm.net> linux-raid@vger.kernel.org
Subject: Re: filesystem corruption
> On Sun, 02 Jan 2011 21:06:52 -0700 "Patrick H." <linux-raid@feystorm.net>
> wrote:
>
>
>   
>> That makes sense assuming that MD acknowleges the write once the data is 
>> written to the data disks but not necessarily the parity disk, which is 
>> what I gather you were saying is what happens. Is there any option that 
>> can change the behavior so that md wont ack the write until its been 
>> committed to all disks (I'm guessing no since you didnt mention it)?
>> Also does raid6 suffer this problem? Is it smart enough to use both 
>> parity disks when calculating replacement, or will it just use one?
>>
>>     
>
> md/raid5 doesn't acknowledge the write until both the data and the parity
> have been written.  But that doesn't make any difference.
> If you schedule a number of interdependent writes (data and parity) and then
> allow some to complete but not all, then you have inconsistency.
> Recovery from losing a single device requires consistency of parity and data.
>
> RAID6 suffers equally from this problem.  Even if it used both parity disks
> to recover (which it doesn't) how would that help?  It would then have two
> possible value for the data and no way to know which was correct, and every
> possibility that both are incorrect.  This would happen if a single data
> block was successfully written, but neither parity blocks were.
>
> The only way you can avoid this 'write hole' is by journalling in multiples
> of whole stripes.  No current filesystems that I know of can do this as they
> journal in blocks, and the maximum block size is less than the minimum stripe
> size.  So you would need journalling integrated with md/raid, or you would
> need a filesystem which was designed to understand this problem and write
> whole stripes at a time, always to an area of the device which did not
> contain live data.
>
> NeilBrown
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>   

Ok, thanks for the info.
I think I'll solve it by creating 2 dedicated hosts for running the 
array, but not actually export any disks themselves. This way if a 
master dies, all the raid disks are still there and can be picked up by 
the other master.

-Patrick

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: filesystem corruption
  2011-01-03  5:05       ` Patrick H.
@ 2011-01-04  5:33         ` NeilBrown
  2011-01-04  7:50           ` Patrick H.
  0 siblings, 1 reply; 89+ messages in thread
From: NeilBrown @ 2011-01-04  5:33 UTC (permalink / raw)
  To: Patrick H.; +Cc: linux-raid

On Sun, 02 Jan 2011 22:05:06 -0700 "Patrick H." <linux-raid@feystorm.net>
wrote:

> Ok, thanks for the info.
> I think I'll solve it by creating 2 dedicated hosts for running the 
> array, but not actually export any disks themselves. This way if a 
> master dies, all the raid disks are still there and can be picked up by 
> the other master.
> 

That sounds like it should work OK.

NeilBrown


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: filesystem corruption
  2011-01-04  5:33         ` NeilBrown
@ 2011-01-04  7:50           ` Patrick H.
  2011-01-04 17:31             ` Patrick H.
  0 siblings, 1 reply; 89+ messages in thread
From: Patrick H. @ 2011-01-04  7:50 UTC (permalink / raw)
  To: linux-raid

Sent: Mon Jan 03 2011 22:33:24 GMT-0700 (Mountain Standard Time)
From: NeilBrown <neilb@suse.de>
To: Patrick H. <linux-raid@feystorm.net> linux-raid@vger.kernel.org
Subject: Re: filesystem corruption
> On Sun, 02 Jan 2011 22:05:06 -0700 "Patrick H." <linux-raid@feystorm.net>
> wrote:
>
>   
>> Ok, thanks for the info.
>> I think I'll solve it by creating 2 dedicated hosts for running the 
>> array, but not actually export any disks themselves. This way if a 
>> master dies, all the raid disks are still there and can be picked up by 
>> the other master.
>>
>>     
>
> That sounds like it should work OK.
>
> NeilBrown
>   
Well, it didnt solve it. if I power the entire cluster down and start it 
back up, I get corruption, on old files that werent being modified 
still. If I power off just a single node, it seems to handle it fine, 
just not the whole cluster.

It also seems to happen fairly frequently now. In the previous setup it 
was probably 1 in 50 failures that there was corruption. Now its pretty 
much a guarantee there will be corruption if I kill it.
On the last failure I did, when it came back up, it re-assembled the 
entire raid-5 array with all disks active and none of them needing any 
sort of re-sync. The disk controller is battery backed, so even if it 
was re-ordering the writes, the battery should ensure that it all gets 
committed.

Any other ideas?

-Patrick

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: filesystem corruption
  2011-01-04  7:50           ` Patrick H.
@ 2011-01-04 17:31             ` Patrick H.
  2011-01-05  1:22               ` Patrick H.
  0 siblings, 1 reply; 89+ messages in thread
From: Patrick H. @ 2011-01-04 17:31 UTC (permalink / raw)
  To: linux-raid

Sent: Tue Jan 04 2011 00:50:39 GMT-0700 (Mountain Standard Time)
From: Patrick H. <linux-raid@feystorm.net>
To: linux-raid@vger.kernel.org
Subject: Re: filesystem corruption
> Sent: Mon Jan 03 2011 22:33:24 GMT-0700 (Mountain Standard Time)
> From: NeilBrown <neilb@suse.de>
> To: Patrick H. <linux-raid@feystorm.net> linux-raid@vger.kernel.org
> Subject: Re: filesystem corruption
>> On Sun, 02 Jan 2011 22:05:06 -0700 "Patrick H." 
>> <linux-raid@feystorm.net>
>> wrote:
>>
>>  
>>> Ok, thanks for the info.
>>> I think I'll solve it by creating 2 dedicated hosts for running the 
>>> array, but not actually export any disks themselves. This way if a 
>>> master dies, all the raid disks are still there and can be picked up 
>>> by the other master.
>>>
>>>     
>>
>> That sounds like it should work OK.
>>
>> NeilBrown
>>   
> Well, it didnt solve it. if I power the entire cluster down and start 
> it back up, I get corruption, on old files that werent being modified 
> still. If I power off just a single node, it seems to handle it fine, 
> just not the whole cluster.
>
> It also seems to happen fairly frequently now. In the previous setup 
> it was probably 1 in 50 failures that there was corruption. Now its 
> pretty much a guarantee there will be corruption if I kill it.
> On the last failure I did, when it came back up, it re-assembled the 
> entire raid-5 array with all disks active and none of them needing any 
> sort of re-sync. The disk controller is battery backed, so even if it 
> was re-ordering the writes, the battery should ensure that it all gets 
> committed.
>
> Any other ideas?
>
> -Patrick
Here is some info from my most recent failure simulation. This one 
resulted in about 50 corrupt files, another 40 or so that cant even be 
opened, and one stale nfs file handle.
I had the cluster script dump out a bunch of info before and after 
assembling the array.

= = = = = = = = = =
# mdadm -E /dev/etherd/e1.1p1
/dev/etherd/e1.1p1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 9cd9ae9b:39454845:62f2b08d:a4a1ac6c
Name : dm01:126  (local to host dm01)
Creation Time : Tue Jan  4 04:45:50 2011
Raid Level : raid5
Raid Devices : 3

Avail Dev Size : 2119520 (1035.10 MiB 1085.19 MB)
Array Size : 4238848 (2.02 GiB 2.17 GB)
Used Dev Size : 2119424 (1035.05 MiB 1085.15 MB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : a20adb76:af00f276:5be79a36:b4ff3a8b

Internal Bitmap : 2 sectors from superblock
Update Time : Tue Jan  4 16:45:56 2011
Checksum : 361041f6 - correct
Events : 486

Layout : left-symmetric
Chunk Size : 64K

Device Role : Active device 0
Array State : AAA ('A' == active, '.' == missing)



# mdadm -X /dev/etherd/e1.1p1
Filename : /dev/etherd/e1.1p1
Magic : 6d746962
Version : 4
UUID : 9cd9ae9b:39454845:62f2b08d:a4a1ac6c
Events : 486
Events Cleared : 486
State : OK
Chunksize : 64 KB
Daemon : 5s flush period
Write Mode : Normal
Sync Size : 1059712 (1035.05 MiB 1085.15 MB)
Bitmap : 16558 bits (chunks), 189 dirty (1.1%)
= = = = = = = = = =


= = = = = = = = = =
# mdadm -E /dev/etherd/e2.1p1
/dev/etherd/e2.1p1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 9cd9ae9b:39454845:62f2b08d:a4a1ac6c
Name : dm01:126  (local to host dm01)
Creation Time : Tue Jan  4 04:45:50 2011
Raid Level : raid5
Raid Devices : 3

Avail Dev Size : 2119520 (1035.10 MiB 1085.19 MB)
Array Size : 4238848 (2.02 GiB 2.17 GB)
Used Dev Size : 2119424 (1035.05 MiB 1085.15 MB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : f9205ace:0796ecf5:2cca363c:c2873816

Internal Bitmap : 2 sectors from superblock
Update Time : Tue Jan  4 16:45:56 2011
Checksum : 9d235885 - correct
Events : 486

Layout : left-symmetric
Chunk Size : 64K

Device Role : Active device 1
Array State : AAA ('A' == active, '.' == missing)



# mdadm -X /dev/etherd/e2.1p1
Filename : /dev/etherd/e2.1p1
Magic : 6d746962
Version : 4
UUID : 9cd9ae9b:39454845:62f2b08d:a4a1ac6c
Events : 486
Events Cleared : 486
State : OK
Chunksize : 64 KB
Daemon : 5s flush period
Write Mode : Normal
Sync Size : 1059712 (1035.05 MiB 1085.15 MB)
Bitmap : 16558 bits (chunks), 189 dirty (1.1%)
= = = = = = = = = =


= = = = = = = = = =
# mdadm -E /dev/etherd/e3.1p1
/dev/etherd/e3.1p1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 9cd9ae9b:39454845:62f2b08d:a4a1ac6c
Name : dm01:126  (local to host dm01)
Creation Time : Tue Jan  4 04:45:50 2011
Raid Level : raid5
Raid Devices : 3

Avail Dev Size : 2119520 (1035.10 MiB 1085.19 MB)
Array Size : 4238848 (2.02 GiB 2.17 GB)
Used Dev Size : 2119424 (1035.05 MiB 1085.15 MB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : 7f90958d:22de5c08:88750ecb:5f376058

Internal Bitmap : 2 sectors from superblock
Update Time : Tue Jan  4 16:46:13 2011
Checksum : 3fce6b33 - correct
Events : 487

Layout : left-symmetric
Chunk Size : 64K

Device Role : Active device 2
Array State : AAA ('A' == active, '.' == missing)



# mdadm -X /dev/etherd/e3.1p1
Filename : /dev/etherd/e3.1p1
Magic : 6d746962
Version : 4
UUID : 9cd9ae9b:39454845:62f2b08d:a4a1ac6c
Events : 487
Events Cleared : 486
State : OK
Chunksize : 64 KB
Daemon : 5s flush period
Write Mode : Normal
Sync Size : 1059712 (1035.05 MiB 1085.15 MB)
Bitmap : 16558 bits (chunks), 249 dirty (1.5%)
= = = = = = = = = =



- - - - - - - - - - -
# mdadm -D /dev/md/fs01
/dev/md/fs01:
Version : 1.2
Creation Time : Tue Jan  4 04:45:50 2011
Raid Level : raid5
Array Size : 2119424 (2.02 GiB 2.17 GB)
Used Dev Size : 1059712 (1035.05 MiB 1085.15 MB)
Raid Devices : 3
Total Devices : 3
Persistence : Superblock is persistent

Intent Bitmap : Internal

Update Time : Tue Jan  4 16:46:13 2011
State : active, resyncing
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 64K

Rebuild Status : 1% complete

Name : dm01:126  (local to host dm01)
UUID : 9cd9ae9b:39454845:62f2b08d:a4a1ac6c
Events : 486

Number   Major   Minor   RaidDevice State
0     152      273        0      active sync   /dev/block/152:273
1     152      529        1      active sync   /dev/block/152:529
3     152      785        2      active sync   /dev/block/152:785
- - - - - - - - - - -



The old method *never* resulted in this much corruption, and never 
generated stale nfs file handles. Why is this so much worse now when it 
was supposed to be better?

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: filesystem corruption
  2011-01-04 17:31             ` Patrick H.
@ 2011-01-05  1:22               ` Patrick H.
  0 siblings, 0 replies; 89+ messages in thread
From: Patrick H. @ 2011-01-05  1:22 UTC (permalink / raw)
  To: linux-raid

I think I may have found something on this. I was messing around with it 
more (switched to iSCSI instead of ATAoE), and managed to create a 
situation where 2 of the 3 raid-5 disks had failed, yet the MD device 
was still active, and it was letting me use it. This is bad.

mdadm -D /dev/md/fs01
/dev/md/fs01:
        Version : 1.2
  Creation Time : Tue Jan  4 04:45:50 2011
     Raid Level : raid5
     Array Size : 2119424 (2.02 GiB 2.17 GB)
  Used Dev Size : 1059712 (1035.05 MiB 1085.15 MB)
   Raid Devices : 3
  Total Devices : 1
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Tue Jan  4 22:58:44 2011
          State : active, FAILED
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : dm01:125  (local to host dm01)
           UUID : 9cd9ae9b:39454845:62f2b08d:a4a1ac6c
         Events : 2980

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       80        1      active sync   /dev/sdf
       2       0        0        2      removed


Notice, there's only one disk in the array, the other 2 failed and were 
removed. Yet state is still saying active. The filesystem is still up 
and running, and I can even read and write to it, though it spits out 
tons of IO errors.
I then stopped the array and tried to reassemble it, and now it wont 
reassemble.


# mdadm -A /dev/md/fs01 --uuid 9cd9ae9b:39454845:62f2b08d:a4a1ac6c -vv
mdadm: looking for devices for /dev/md/fs01
mdadm: no recogniseable superblock on /dev/md/fs01_journal
mdadm: /dev/md/fs01_journal has wrong uuid.
mdadm: cannot open device /dev/sdg: Device or resource busy
mdadm: /dev/sdg has wrong uuid.
mdadm: cannot open device /dev/sdd: Device or resource busy
mdadm: /dev/sdd has wrong uuid.
mdadm: cannot open device /dev/sdb: Device or resource busy
mdadm: /dev/sdb has wrong uuid.
mdadm: cannot open device /dev/sda2: Device or resource busy
mdadm: /dev/sda2 has wrong uuid.
mdadm: cannot open device /dev/sda1: Device or resource busy
mdadm: /dev/sda1 has wrong uuid.
mdadm: cannot open device /dev/sda: Device or resource busy
mdadm: /dev/sda has wrong uuid.
mdadm: /dev/sde is identified as a member of /dev/md/fs01, slot 2.
mdadm: /dev/sdc is identified as a member of /dev/md/fs01, slot 0.
mdadm: /dev/sdf is identified as a member of /dev/md/fs01, slot 1.
mdadm: added /dev/sdc to /dev/md/fs01 as 0
mdadm: added /dev/sde to /dev/md/fs01 as 2
mdadm: added /dev/sdf to /dev/md/fs01 as 1
mdadm: /dev/md/fs01 assembled from 1 drive - not enough to start the array.


# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md125 : inactive sdf[1](S) sde[3](S) sdc[0](S)
      3179280 blocks super 1.2

md126 : active raid1 sdg[0] sdb[2] sdd[1]
      265172 blocks super 1.2 [3/3] [UUU]
      bitmap: 0/3 pages [0KB], 64KB chunk

unused devices: <none>


md126 is the ext3 journal for the filesystem
Below is mdadm info on all the devices in the array

# mdadm -E /dev/sdc
/dev/sdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 9cd9ae9b:39454845:62f2b08d:a4a1ac6c
           Name : dm01:125  (local to host dm01)
  Creation Time : Tue Jan  4 04:45:50 2011
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 2119520 (1035.10 MiB 1085.19 MB)
     Array Size : 4238848 (2.02 GiB 2.17 GB)
  Used Dev Size : 2119424 (1035.05 MiB 1085.15 MB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : a20adb76:af00f276:5be79a36:b4ff3a8b

Internal Bitmap : 2 sectors from superblock
    Update Time : Tue Jan  4 22:44:20 2011
       Checksum : 350c988f - correct
         Events : 1150

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 0
   Array State : AA. ('A' == active, '.' == missing)

# mdadm -X /dev/sdc
        Filename : /dev/sdc
           Magic : 6d746962
         Version : 4
            UUID : 9cd9ae9b:39454845:62f2b08d:a4a1ac6c
          Events : 1150
  Events Cleared : 1144
           State : OK
       Chunksize : 64 KB
          Daemon : 5s flush period
      Write Mode : Normal
       Sync Size : 1059712 (1035.05 MiB 1085.15 MB)
          Bitmap : 16558 bits (chunks), 93 dirty (0.6%)

# mdadm -E /dev/sdf
/dev/sdf:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 9cd9ae9b:39454845:62f2b08d:a4a1ac6c
           Name : dm01:125  (local to host dm01)
  Creation Time : Tue Jan  4 04:45:50 2011
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 2119520 (1035.10 MiB 1085.19 MB)
     Array Size : 4238848 (2.02 GiB 2.17 GB)
  Used Dev Size : 2119424 (1035.05 MiB 1085.15 MB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : f9205ace:0796ecf5:2cca363c:c2873816

Internal Bitmap : 2 sectors from superblock
    Update Time : Tue Jan  4 23:00:49 2011
       Checksum : 9c20ba71 - correct
         Events : 3062

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 1
   Array State : .A. ('A' == active, '.' == missing)

# mdadm -X /dev/sdf
        Filename : /dev/sdf
           Magic : 6d746962
         Version : 4
            UUID : 9cd9ae9b:39454845:62f2b08d:a4a1ac6c
          Events : 3062
  Events Cleared : 1144
           State : OK
       Chunksize : 64 KB
          Daemon : 5s flush period
      Write Mode : Normal
       Sync Size : 1059712 (1035.05 MiB 1085.15 MB)
          Bitmap : 16558 bits (chunks), 150 dirty (0.9%)

# mdadm -E /dev/sde
/dev/sde:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 9cd9ae9b:39454845:62f2b08d:a4a1ac6c
           Name : dm01:125  (local to host dm01)
  Creation Time : Tue Jan  4 04:45:50 2011
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 2119520 (1035.10 MiB 1085.19 MB)
     Array Size : 4238848 (2.02 GiB 2.17 GB)
  Used Dev Size : 2119424 (1035.05 MiB 1085.15 MB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 7f90958d:22de5c08:88750ecb:5f376058

Internal Bitmap : 2 sectors from superblock
    Update Time : Tue Jan  4 22:43:53 2011
       Checksum : 3ecec198 - correct
         Events : 1144

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 2
   Array State : AAA ('A' == active, '.' == missing)

# mdadm -X /dev/sde
        Filename : /dev/sde
           Magic : 6d746962
         Version : 4
            UUID : 9cd9ae9b:39454845:62f2b08d:a4a1ac6c
          Events : 1144
  Events Cleared : 1143
           State : OK
       Chunksize : 64 KB
          Daemon : 5s flush period
      Write Mode : Normal
       Sync Size : 1059712 (1035.05 MiB 1085.15 MB)
          Bitmap : 16558 bits (chunks), 38 dirty (0.2%)








^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: filesystem corruption
  2011-01-03  3:16 ` Neil Brown
       [not found]   ` <4D214B5C.3010103@feystorm.net>
@ 2011-01-05  7:02   ` CoolCold
       [not found]   ` <AANLkTinL_nz58f8rSPuhYvVwGY5jdu1XVkNLC1ky5A65@mail.gmail.com>
  2 siblings, 0 replies; 89+ messages in thread
From: CoolCold @ 2011-01-05  7:02 UTC (permalink / raw)
  To: Neil Brown; +Cc: Patrick H., linux-raid

On Mon, Jan 3, 2011 at 6:16 AM, Neil Brown <neilb@suse.de> wrote:
> On Sun, 02 Jan 2011 18:58:34 -0700 "Patrick H." <linux-raid@feystorm.net>
> wrote:
>
>> I've been trying to track down an issue for a while now and from digging
>> around it appears (though not certain) the issue lies with the md raid
>> device.
>> Whats happening is that after improperly shutting down a raid-5 array,
>> upon reassembly, a few files on the filesystem will be corrupt. I dont
>> think this is normal filesystem corruption from files being modified
>> during the shut down because some of the files that end up corrupted are
>> several hours old.
>>
>> The exact details of what I'm doing:
>> I have a 3-node test cluster I'm doing integrity testing on. Each node
>> in the cluster is exporting a couple of disks via ATAoE.
>> I have the first disk of all 3 nodes in a raid-1 that is holding the
>> journal data for the ext3 filesystem. The array is running with an
>> internal bitmap as well.
>> The second disk of all 3 nodes is a raid-5 array holding the ext3
>> filesystem itself. This is also running with an internal bitmap.
>> The ext3 filesystem is mounted with 'data=journal,barrier=1,sync'.
>> When I power down the node which is actively running both md raid
>> devices, another node in the cluster takes over and starts both arrays
>> up (in degraded mode of course).
>> Once the original node comes back up, the new master re-adds its disks
>> back into the raid arrays and re-syncs them.
>> During all this, the filesystem is exported through nfs (nfs also has
>> sync turned on) and a client is randomly creating, removing, and
>> verifying checksums on the files in the filesystem (nfs is hard mounted
>> so operations always retry). The client script averages about 30
>> creations/s, 30 deletes/s, and 30 checksums/s.
>>
>> So, as stated above, every now and then (1 in 50 chance or so), when the
>> master is hard-rebooted, the client will detect a few files with invalid
>> md5 checksums. These files could be hours old so they were not being
>> actively modified.
>> Another key point that leads me to believe its a md raid issue is that
>> before I had the ext3 journal running internally on the raid-5 array
>> (part of the filesystem itself). When I did this, there would
>> occasionally be massive corruption. As in file modification times in the
>> future, lots of corrupt files, thousands of files put in the
>> 'lost+found' dir upon fsck, etc. After I put it on a separate raid-1,
>> there are no more invalid modification times, there hasnt been a single
>> file added to 'lost+found', and the number of corrupt files dropped
>> significantly. This would seem to indicate that the journal was getting
>> corrupted, and when it was played back, it went horribly wrong.
>>
>> So it would seem there's something wrong with the raid-5 array, but I
>> dont know what it could be. Any ideas or input would be much
>> appreciated. I can modify the clustering scripts to obtain whatever
>> information is needed when they start the arrays.
>
> What you are doing cannot work reliably.
>
> If a RAID5 suffers an unclean shutdown and is restarted without a full
> complement of devices, then it can corrupt data that has not been changed
> recently, just as you are seeing.
> This is why mdadm will not assemble that array unless you provide the --force
> flag which essentially says "I know what I am doing and accept the risk".
>
> When md needs to update a block in your 3-drive RAID5, it will read the other
> block in the same stripe (if that isn't in the cache or being written at the
> same time) and then write out the data block (or blocks) and the newly
> computed parity block.
>
> If you crash after one of those writes has completed, but before all of the
> writes have completed, then the parity block will not match the data blocks
> on disk.
Am I understanding right, that in case of hardware controller with
bbu, data and parity gonna be written properly ( for locally connected
 drives of course ) even in case of powerloss and this is the only
feature which hardware raid controllers can do and softraid can't ?
(well, except some nice features like maxiq - cache on ssd for adaptec
controllers and overall write performance expansion because of
ram/bbu)

>
> When you re-assemble the array with one device missing, md will compute the
> data that was on the device using the other data block and the parity block.
> As the parity and data blocks could be inconsistent, the result could easily
> be wrong.
>
> With RAID1 there is no similar problem.  When you read after a crash you will
> always get "correct" data.  It maybe from before the last write that was
> attempted, or after, but if the data was not written recently you will read
> exactly the right data.
>
> This is why the situation improved substantially when you moved the journal
> to RAID1.
>
> The get full improvement, you need to move the data to RAID1 (or RAID10) as
> well.
>
> NeilBrown
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Best regards,
[COOLCOLD-RIPN]
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: filesystem corruption
       [not found]   ` <AANLkTinL_nz58f8rSPuhYvVwGY5jdu1XVkNLC1ky5A65@mail.gmail.com>
@ 2011-01-05 14:28     ` Patrick H.
  2011-01-05 15:52       ` Spelic
  0 siblings, 1 reply; 89+ messages in thread
From: Patrick H. @ 2011-01-05 14:28 UTC (permalink / raw)
  To: linux-raid

Sent: Wed Jan 05 2011 00:00:48 GMT-0700 (Mountain Standard Time)
From: CoolCold <coolthecold@gmail.com>
To: Neil Brown <neilb@suse.de> "Patrick H." <linux-raid@feystorm.net>, 
linux-raid@vger.kernel.org
Subject: Re: filesystem corruption
>
> Am I understanding right, that in case of hardware controller with 
> bbu, data and parity gonna be written properly ( for locally 
> connected  drives of course ) even in case of powerloss and this is 
> the only feature which hardware raid controllers can do and softraid 
> can't ? (well, except some nice features like maxiq - cache on ssd for 
> adaptec controllers and overall write performance expansion because of 
> ram/bbu)
>
>
No, my drives are battery backed as well.

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: filesystem corruption
  2011-01-05 14:28     ` Patrick H.
@ 2011-01-05 15:52       ` Spelic
  2011-01-05 15:55         ` Patrick H.
  0 siblings, 1 reply; 89+ messages in thread
From: Spelic @ 2011-01-05 15:52 UTC (permalink / raw)
  To: Patrick H.; +Cc: linux-raid

On 01/05/2011 03:28 PM, Patrick H. wrote:
> No, my drives are battery backed as well.

what drives are they, if I can ask? OCZ SSDs with supercapacitor maybe?

Do you know if they will really flush the whole write cache on sudden 
power off? I read smoky sentences about this for the OCZ drives. In 
certain points it seemed like the supercapacitor was only able to 
provide the same guarantees of a HDD, that is, no further data loss due 
to erase-then-rewrite-32K and flash wear levelling stuff, but was not 
able to flush the write cache.
Did you try with e.g. a stream of simple databases transactions then 
disconnecting the cable suddenly like this test
http://www.mysqlperformanceblog.com/2009/03/02/ssd-xfs-lvm-fsync-write-cache-barrier-and-lost-transactions/
?

Thank you

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: filesystem corruption
  2011-01-05 15:52       ` Spelic
@ 2011-01-05 15:55         ` Patrick H.
  0 siblings, 0 replies; 89+ messages in thread
From: Patrick H. @ 2011-01-05 15:55 UTC (permalink / raw)
  To: linux-raid

HP DL360-G6. SAS controller with battery backed write accelerator.
I havent been focusing on the reliability of the drives as this is proof 
of concept testing. If we decide to use it, the drives will be replaced 
with 2TB SSD PCIe cards.

-Patrick

Sent: Wed Jan 05 2011 08:52:04 GMT-0700 (Mountain Standard Time)
From: Spelic <spelic@shiftmail.org>
To: Patrick H. <linux-raid@feystorm.net> linux-raid 
<linux-raid@vger.kernel.org>
Subject: Re: filesystem corruption
> On 01/05/2011 03:28 PM, Patrick H. wrote:
>> No, my drives are battery backed as well.
>
> what drives are they, if I can ask? OCZ SSDs with supercapacitor maybe?
>
> Do you know if they will really flush the whole write cache on sudden 
> power off? I read smoky sentences about this for the OCZ drives. In 
> certain points it seemed like the supercapacitor was only able to 
> provide the same guarantees of a HDD, that is, no further data loss 
> due to erase-then-rewrite-32K and flash wear levelling stuff, but was 
> not able to flush the write cache.
> Did you try with e.g. a stream of simple databases transactions then 
> disconnecting the cable suddenly like this test
> http://www.mysqlperformanceblog.com/2009/03/02/ssd-xfs-lvm-fsync-write-cache-barrier-and-lost-transactions/ 
>
> ?
>
> Thank you

^ permalink raw reply	[flat|nested] 89+ messages in thread

* filesystem corruption
@ 2014-10-31  0:29 Tobias Holst
  2014-10-31  1:02 ` Tobias Holst
  0 siblings, 1 reply; 89+ messages in thread
From: Tobias Holst @ 2014-10-31  0:29 UTC (permalink / raw)
  To: linux-btrfs@vger.kernel.org

Hi

I was using a btrfs RAID1 with two disks under Ubuntu 14.04, kernel
3.13 and btrfs-tools 3.14.1 for weeks without issues.

Now I updated to kernel 3.17.1 and btrfs-tools 3.17. After a reboot
everything looked fine and I started some tests. While running
duperemover (just scanning, not doing anything) and a balance at the
same time the load suddenly went up to >30 and the system was not
responding anymore. Everyhting working with the filesystem stopped
responding. So I did a hard reset.

I was able to reboot, but on the login prompt nothing happened but a
kernel bug. Same back in kernel 3.13.

Now I started a live system (Ubuntu 14.10, kernel 3.16.x, btrfs-tools
3.14.1), and mounted the btrfs filesystem. I can browse through the
files but sometimes, especially when accessing my snapshots or trying
to create a new snapshot, the kernel bug appears and the filesystem
hangs.

It shows this:
Oct 31 00:09:14 ubuntu kernel: [  187.661731] ------------[ cut here
]------------
Oct 31 00:09:14 ubuntu kernel: [  187.661770] WARNING: CPU: 1 PID:
4417 at /build/buildd/linux-3.16.0/fs/btrfs/relocation.c:924
build_backref_tree+0xcab/0x1240 [btrfs]()
Oct 31 00:09:14 ubuntu kernel: [  187.661772] Modules linked in:
nls_iso8859_1 dm_crypt gpio_ich coretemp lpc_ich kvm_intel kvm
dm_multipath scsi_dh serio_raw xgifb(C) bnep rfcomm bluetooth
6lowpan_iphc i3000_edac edac_core parport_pc mac_hid ppdev shpchp lp
parport squashfs overlayfs nls_utf8 isofs btrfs xor raid6_pq dm_mirror
dm_region_hash dm_log hid_generic usbhid hid uas usb_storage ahci
e1000e libahci ptp pps_core
Oct 31 00:09:14 ubuntu kernel: [  187.661800] CPU: 1 PID: 4417 Comm:
btrfs-balance Tainted: G         C    3.16.0-23-generic #31-Ubuntu
Oct 31 00:09:14 ubuntu kernel: [  187.661802] Hardware name:
Supermicro PDSML/PDSML+, BIOS 6.00 03/06/2009
Oct 31 00:09:14 ubuntu kernel: [  187.661804]  0000000000000009
ffff8800a0ae7a00 ffffffff8177fcbc 0000000000000000
Oct 31 00:09:14 ubuntu kernel: [  187.661807]  ffff8800a0ae7a38
ffffffff8106fd8d ffff8800a1440750 ffff8800a1440b48
Oct 31 00:09:14 ubuntu kernel: [  187.661809]  ffff88020a8ce000
0000000000000001 ffff88020b6b0d00 ffff8800a0ae7a48
Oct 31 00:09:14 ubuntu kernel: [  187.661812] Call Trace:
Oct 31 00:09:14 ubuntu kernel: [  187.661820]  [<ffffffff8177fcbc>]
dump_stack+0x45/0x56
Oct 31 00:09:14 ubuntu kernel: [  187.661825]  [<ffffffff8106fd8d>]
warn_slowpath_common+0x7d/0xa0
Oct 31 00:09:14 ubuntu kernel: [  187.661827]  [<ffffffff8106fe6a>]
warn_slowpath_null+0x1a/0x20
Oct 31 00:09:14 ubuntu kernel: [  187.661842]  [<ffffffffc01b734b>]
build_backref_tree+0xcab/0x1240 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661857]  [<ffffffffc01b7ae1>]
relocate_tree_blocks+0x201/0x600 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661872]  [<ffffffffc01b88d8>] ?
add_data_references+0x268/0x2a0 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661887]  [<ffffffffc01b96fd>]
relocate_block_group+0x25d/0x6b0 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661902]  [<ffffffffc01b9d36>]
btrfs_relocate_block_group+0x1e6/0x2f0 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661916]  [<ffffffffc0190988>]
btrfs_relocate_chunk.isra.27+0x58/0x720 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661926]  [<ffffffffc0140dc1>] ?
btrfs_set_path_blocking+0x41/0x80 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661935]  [<ffffffffc0145dfd>] ?
btrfs_search_slot+0x48d/0xa40 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661950]  [<ffffffffc018b49b>] ?
release_extent_buffer+0x2b/0xd0 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661964]  [<ffffffffc018b95f>] ?
free_extent_buffer+0x4f/0xa0 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661979]  [<ffffffffc01936c3>]
__btrfs_balance+0x4d3/0x8d0 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661993]  [<ffffffffc0193d48>]
btrfs_balance+0x288/0x600 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.662008]  [<ffffffffc019411d>]
balance_kthread+0x5d/0x80 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.662022]  [<ffffffffc01940c0>] ?
btrfs_balance+0x600/0x600 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.662026]  [<ffffffff81094aeb>]
kthread+0xdb/0x100
Oct 31 00:09:14 ubuntu kernel: [  187.662029]  [<ffffffff81094a10>] ?
kthread_create_on_node+0x1c0/0x1c0
Oct 31 00:09:14 ubuntu kernel: [  187.662032]  [<ffffffff81787c3c>]
ret_from_fork+0x7c/0xb0
Oct 31 00:09:14 ubuntu kernel: [  187.662035]  [<ffffffff81094a10>] ?
kthread_create_on_node+0x1c0/0x1c0
Oct 31 00:09:14 ubuntu kernel: [  187.662037] ---[ end trace
fb7849e4a6f20424 ]---

end this:
Oct 31 00:09:14 ubuntu kernel: [  187.682629] ------------[ cut here
]------------
Oct 31 00:09:14 ubuntu kernel: [  187.682635] kernel BUG at
/build/buildd/linux-3.16.0/fs/btrfs/extent-tree.c:868!
Oct 31 00:09:14 ubuntu kernel: [  187.682638] invalid opcode: 0000 [#1] SMP
Oct 31 00:09:14 ubuntu kernel: [  187.682642] Modules linked in:
nls_iso8859_1 dm_crypt gpio_ich coretemp lpc_ich kvm_intel kvm
dm_multipath scsi_dh serio_raw xgifb(C) bnep rfcomm bluetooth
6lowpan_iphc i3000_edac edac_core parport_pc mac_hid ppdev shpchp lp
parport squashfs overlayfs nls_utf8 isofs btrfs xor raid6_pq dm_mirror
dm_region_hash dm_log hid_generic usbhid hid uas usb_storage ahci
e1000e libahci ptp pps_core
Oct 31 00:09:14 ubuntu kernel: [  187.682686] CPU: 1 PID: 4417 Comm:
btrfs-balance Tainted: G        WC    3.16.0-23-generic #31-Ubuntu
Oct 31 00:09:14 ubuntu kernel: [  187.682688] Hardware name:
Supermicro PDSML/PDSML+, BIOS 6.00 03/06/2009
Oct 31 00:09:14 ubuntu kernel: [  187.682690] task: ffff8801bb5728c0
ti: ffff8800a0ae4000 task.ti: ffff8800a0ae4000
Oct 31 00:09:14 ubuntu kernel: [  187.682691] RIP:
0010:[<ffffffffc0150609>]  [<ffffffffc0150609>]
btrfs_lookup_extent_info+0x469/0x4a0 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.682704] RSP:
0018:ffff8800a0ae7810  EFLAGS: 00010246
Oct 31 00:09:14 ubuntu kernel: [  187.682706] RAX: 0000000000000000
RBX: ffff8800a1440b40 RCX: 000000129457c000
Oct 31 00:09:14 ubuntu kernel: [  187.682708] RDX: ffff8801ab1be3c0
RSI: 000000129457c000 RDI: ffff8801ab1be428
Oct 31 00:09:14 ubuntu kernel: [  187.682709] RBP: ffff8800a0ae7898
R08: ffff8801ab1be3c0 R09: 0000160000000000
Oct 31 00:09:14 ubuntu kernel: [  187.682711] R10: 0000000000000000
R11: 000000000000003a R12: ffff8801ab1be428
Oct 31 00:09:14 ubuntu kernel: [  187.682713] R13: 000000129457c000
R14: ffff8801b8800be0 R15: 0000000000000000
Oct 31 00:09:14 ubuntu kernel: [  187.682715] FS:
0000000000000000(0000) GS:ffff880217c80000(0000)
knlGS:0000000000000000
Oct 31 00:09:14 ubuntu kernel: [  187.682717] CS:  0010 DS: 0000 ES:
0000 CR0: 000000008005003b
Oct 31 00:09:14 ubuntu kernel: [  187.682718] CR2: 0000000000ed3970
CR3: 0000000208e63000 CR4: 00000000000007e0
Oct 31 00:09:14 ubuntu kernel: [  187.682720] Stack:
Oct 31 00:09:14 ubuntu kernel: [  187.682721]  ffff8800a0ae78c0
0000000000000000 0000000000000000 ffff8801ab1be3c0
Oct 31 00:09:14 ubuntu kernel: [  187.682724]  ffff8801b88be1b0
ffff8801ab1be3c0 ffff8801ab1be400 c0008801b8a45720
Oct 31 00:09:14 ubuntu kernel: [  187.682727]  00a8000000129457
ff00000000000040 ffffffffc01570d1 0000000000000001
Oct 31 00:09:14 ubuntu kernel: [  187.682730] Call Trace:
Oct 31 00:09:14 ubuntu kernel: [  187.682742]  [<ffffffffc01570d1>] ?
btrfs_alloc_free_block+0x3a1/0x470 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.682751]  [<ffffffffc01416f4>]
update_ref_for_cow+0x174/0x360 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.682761]  [<ffffffffc0141afd>]
__btrfs_cow_block+0x21d/0x510 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.682770]  [<ffffffffc0141f86>]
btrfs_cow_block+0x116/0x1b0 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.682779]  [<ffffffffc0145b44>]
btrfs_search_slot+0x1d4/0xa40 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.682791]  [<ffffffffc01677ad>] ?
record_root_in_trans+0xad/0x120 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.682807]  [<ffffffffc01b64f3>]
do_relocation+0x3c3/0x570 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.682817]  [<ffffffffc0152878>] ?
btrfs_block_rsv_refill+0x48/0xa0 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.682832]  [<ffffffffc01b7e35>]
relocate_tree_blocks+0x555/0x600 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.682847]  [<ffffffffc01b88d8>] ?
add_data_references+0x268/0x2a0 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.682862]  [<ffffffffc01b96fd>]
relocate_block_group+0x25d/0x6b0 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.682876]  [<ffffffffc01b9d36>]
btrfs_relocate_block_group+0x1e6/0x2f0 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.682891]  [<ffffffffc0190988>]
btrfs_relocate_chunk.isra.27+0x58/0x720 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.682900]  [<ffffffffc0140dc1>] ?
btrfs_set_path_blocking+0x41/0x80 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.682909]  [<ffffffffc0145dfd>] ?
btrfs_search_slot+0x48d/0xa40 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.682924]  [<ffffffffc018b49b>] ?
release_extent_buffer+0x2b/0xd0 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.682938]  [<ffffffffc018b95f>] ?
free_extent_buffer+0x4f/0xa0 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.682953]  [<ffffffffc01936c3>]
__btrfs_balance+0x4d3/0x8d0 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.682968]  [<ffffffffc0193d48>]
btrfs_balance+0x288/0x600 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.682982]  [<ffffffffc019411d>]
balance_kthread+0x5d/0x80 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.682997]  [<ffffffffc01940c0>] ?
btrfs_balance+0x600/0x600 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.683001]  [<ffffffff81094aeb>]
kthread+0xdb/0x100
Oct 31 00:09:14 ubuntu kernel: [  187.683004]  [<ffffffff81094a10>] ?
kthread_create_on_node+0x1c0/0x1c0
Oct 31 00:09:14 ubuntu kernel: [  187.683007]  [<ffffffff81787c3c>]
ret_from_fork+0x7c/0xb0
Oct 31 00:09:14 ubuntu kernel: [  187.683010]  [<ffffffff81094a10>] ?
kthread_create_on_node+0x1c0/0x1c0
Oct 31 00:09:14 ubuntu kernel: [  187.683011] Code: be b0 00 00 00 48
c7 c7 90 77 1e c0 48 89 55 a8 e8 5d f8 f1 c0 48 8b 55 a8 e9 2e fe ff
ff 0f 0b 48 83 7d 88 00 0f 85 8d fe ff ff <0f> 0b 31 c0 e9 de fe ff ff
be 6c 03 00 00 48 c7 c7 28 77 1e c0
Oct 31 00:09:14 ubuntu kernel: [  187.683040] RIP
[<ffffffffc0150609>] btrfs_lookup_extent_info+0x469/0x4a0 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.683050]  RSP <ffff8800a0ae7810>
Oct 31 00:09:14 ubuntu kernel: [  187.683052] ---[ end trace
fb7849e4a6f20425 ]---

Then it keeps repeating this:
Oct 31 00:10:07 ubuntu kernel: [  240.100001] BUG: soft lockup - CPU#2
stuck for 22s! [btrfs-transacti:4416]
Oct 31 00:10:07 ubuntu kernel: [  240.100001] Modules linked in:
nls_iso8859_1 dm_crypt gpio_ich coretemp lpc_ich kvm_intel kvm
dm_multipath scsi_dh serio_raw xgifb(C) bnep rfcomm bluetooth
6lowpan_iphc i3000_edac edac_core parport_pc mac_hid ppdev shpchp lp
parport squashfs overlayfs nls_utf8 isofs btrfs xor raid6_pq dm_mirror
dm_region_hash dm_log hid_generic usbhid hid uas usb_storage ahci
e1000e libahci ptp pps_core
Oct 31 00:10:07 ubuntu kernel: [  240.100001] CPU: 2 PID: 4416 Comm:
btrfs-transacti Tainted: G      D WC    3.16.0-23-generic #31-Ubuntu
Oct 31 00:10:07 ubuntu kernel: [  240.100001] Hardware name:
Supermicro PDSML/PDSML+, BIOS 6.00 03/06/2009
Oct 31 00:10:07 ubuntu kernel: [  240.100001] task: ffff8800a23b1460
ti: ffff8801ba8f8000 task.ti: ffff8801ba8f8000
Oct 31 00:10:07 ubuntu kernel: [  240.100001] RIP:
0010:[<ffffffff81787712>]  [<ffffffff81787712>]
_raw_spin_lock+0x32/0x50
Oct 31 00:10:07 ubuntu kernel: [  240.100001] RSP:
0018:ffff8801ba8fbcc8  EFLAGS: 00000202
Oct 31 00:10:07 ubuntu kernel: [  240.100001] RAX: 0000000000004a52
RBX: 0000000000014800 RCX: 0000000000008c82
Oct 31 00:10:07 ubuntu kernel: [  240.100001] RDX: 0000000000008c84
RSI: 0000000000008c84 RDI: ffff8801b88be1b0
Oct 31 00:10:07 ubuntu kernel: [  240.100001] RBP: ffff8801ba8fbcc8
R08: 00000000008dd0e4 R09: 000000002ac4f29b
Oct 31 00:10:07 ubuntu kernel: [  240.100001] R10: 000000929da8c524
R11: 0000000000000020 R12: ffff88020c32c800
Oct 31 00:10:07 ubuntu kernel: [  240.100001] R13: ffff88020c32c808
R14: 0000000200000003 R15: ffff880217d8e4e0
Oct 31 00:10:07 ubuntu kernel: [  240.100001] FS:
0000000000000000(0000) GS:ffff880217d00000(0000)
knlGS:0000000000000000
Oct 31 00:10:07 ubuntu kernel: [  240.100001] CS:  0010 DS: 0000 ES:
0000 CR0: 000000008005003b
Oct 31 00:10:07 ubuntu kernel: [  240.100001] CR2: 00007fffa496afd8
CR3: 00000002084dd000 CR4: 00000000000007e0
Oct 31 00:10:07 ubuntu kernel: [  240.100001] Stack:
Oct 31 00:10:07 ubuntu kernel: [  240.100001]  ffff8801ba8fbdf0
ffffffffc0153e02 ffffffff810abb55 ffff8800e14532f0
Oct 31 00:10:07 ubuntu kernel: [  240.100001]  ffff8800e1453358
ffff8800a23b14c8 ffff8801ba8fbd60 ffff8801ba8fbd50
Oct 31 00:10:07 ubuntu kernel: [  240.100001]  ffffffff81011661
0000000000014800 ffff880217d11c40 ffff8800a23b1a50
Oct 31 00:10:07 ubuntu kernel: [  240.100001] Call Trace:
Oct 31 00:10:07 ubuntu kernel: [  240.100001]  [<ffffffffc0153e02>]
__btrfs_run_delayed_refs+0x1e2/0x11e0 [btrfs]
Oct 31 00:10:07 ubuntu kernel: [  240.100001]  [<ffffffff810abb55>] ?
set_next_entity+0x95/0xb0
Oct 31 00:10:07 ubuntu kernel: [  240.100001]  [<ffffffff81011661>] ?
__switch_to+0x191/0x5e0
Oct 31 00:10:07 ubuntu kernel: [  240.100001]  [<ffffffff8107dd8a>] ?
del_timer_sync+0x4a/0x60
Oct 31 00:10:07 ubuntu kernel: [  240.100001]  [<ffffffffc0158df3>]
btrfs_run_delayed_refs.part.64+0x73/0x270 [btrfs]
Oct 31 00:10:07 ubuntu kernel: [  240.100001]  [<ffffffffc0159007>]
btrfs_run_delayed_refs+0x17/0x20 [btrfs]
Oct 31 00:10:07 ubuntu kernel: [  240.100001]  [<ffffffffc0169269>]
btrfs_commit_transaction+0x29/0x80 [btrfs]
Oct 31 00:10:07 ubuntu kernel: [  240.100001]  [<ffffffffc016527d>]
transaction_kthread+0x1ed/0x260 [btrfs]
Oct 31 00:10:07 ubuntu kernel: [  240.100001]  [<ffffffffc0165090>] ?
btrfs_cleanup_transaction+0x540/0x540 [btrfs]
Oct 31 00:10:07 ubuntu kernel: [  240.100001]  [<ffffffff81094aeb>]
kthread+0xdb/0x100
Oct 31 00:10:07 ubuntu kernel: [  240.100001]  [<ffffffff81094a10>] ?
kthread_create_on_node+0x1c0/0x1c0
Oct 31 00:10:07 ubuntu kernel: [  240.100001]  [<ffffffff81787c3c>]
ret_from_fork+0x7c/0xb0
Oct 31 00:10:07 ubuntu kernel: [  240.100001]  [<ffffffff81094a10>] ?
kthread_create_on_node+0x1c0/0x1c0
Oct 31 00:10:07 ubuntu kernel: [  240.100001] Code: 89 e5 b8 00 00 02
00 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 75 04 5d c3 66 90 83 e2 fe 0f
b7 f2 b8 00 80 00 00 eb 0a 0f 1f 00 f3 90 <83> e8 01 74 0a 0f b7 0f 66
39 ca 75 f1 5d c3 66 66 66 90 66 66


Any ideas how to fix this filesystem? I do have backups, but I am
interested in finding out what happened and what to do.

Regards
Tobias

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: filesystem corruption
  2014-10-31  0:29 filesystem corruption Tobias Holst
@ 2014-10-31  1:02 ` Tobias Holst
  2014-10-31  2:41   ` Rich Freeman
  0 siblings, 1 reply; 89+ messages in thread
From: Tobias Holst @ 2014-10-31  1:02 UTC (permalink / raw)
  To: linux-btrfs@vger.kernel.org

Addition:
I found some posts here about a general file system corruption in 3.17
and 3.17.1 - is this the cause?
Additionally I am using ro-snapshots - maybe this is the cause, too?

Anyway: Can I fix that or do I have to reinstall? Haven't touched the
filesystem, just did a scrub (found 0 errors).

Regards
Tobias


2014-10-31 1:29 GMT+01:00 Tobias Holst <tobby@tobby.eu>:
> Hi
>
> I was using a btrfs RAID1 with two disks under Ubuntu 14.04, kernel
> 3.13 and btrfs-tools 3.14.1 for weeks without issues.
>
> Now I updated to kernel 3.17.1 and btrfs-tools 3.17. After a reboot
> everything looked fine and I started some tests. While running
> duperemover (just scanning, not doing anything) and a balance at the
> same time the load suddenly went up to >30 and the system was not
> responding anymore. Everyhting working with the filesystem stopped
> responding. So I did a hard reset.
>
> I was able to reboot, but on the login prompt nothing happened but a
> kernel bug. Same back in kernel 3.13.
>
> Now I started a live system (Ubuntu 14.10, kernel 3.16.x, btrfs-tools
> 3.14.1), and mounted the btrfs filesystem. I can browse through the
> files but sometimes, especially when accessing my snapshots or trying
> to create a new snapshot, the kernel bug appears and the filesystem
> hangs.
>
> It shows this:
> Oct 31 00:09:14 ubuntu kernel: [  187.661731] ------------[ cut here
> ]------------
> Oct 31 00:09:14 ubuntu kernel: [  187.661770] WARNING: CPU: 1 PID:
> 4417 at /build/buildd/linux-3.16.0/fs/btrfs/relocation.c:924
> build_backref_tree+0xcab/0x1240 [btrfs]()
> Oct 31 00:09:14 ubuntu kernel: [  187.661772] Modules linked in:
> nls_iso8859_1 dm_crypt gpio_ich coretemp lpc_ich kvm_intel kvm
> dm_multipath scsi_dh serio_raw xgifb(C) bnep rfcomm bluetooth
> 6lowpan_iphc i3000_edac edac_core parport_pc mac_hid ppdev shpchp lp
> parport squashfs overlayfs nls_utf8 isofs btrfs xor raid6_pq dm_mirror
> dm_region_hash dm_log hid_generic usbhid hid uas usb_storage ahci
> e1000e libahci ptp pps_core
> Oct 31 00:09:14 ubuntu kernel: [  187.661800] CPU: 1 PID: 4417 Comm:
> btrfs-balance Tainted: G         C    3.16.0-23-generic #31-Ubuntu
> Oct 31 00:09:14 ubuntu kernel: [  187.661802] Hardware name:
> Supermicro PDSML/PDSML+, BIOS 6.00 03/06/2009
> Oct 31 00:09:14 ubuntu kernel: [  187.661804]  0000000000000009
> ffff8800a0ae7a00 ffffffff8177fcbc 0000000000000000
> Oct 31 00:09:14 ubuntu kernel: [  187.661807]  ffff8800a0ae7a38
> ffffffff8106fd8d ffff8800a1440750 ffff8800a1440b48
> Oct 31 00:09:14 ubuntu kernel: [  187.661809]  ffff88020a8ce000
> 0000000000000001 ffff88020b6b0d00 ffff8800a0ae7a48
> Oct 31 00:09:14 ubuntu kernel: [  187.661812] Call Trace:
> Oct 31 00:09:14 ubuntu kernel: [  187.661820]  [<ffffffff8177fcbc>]
> dump_stack+0x45/0x56
> Oct 31 00:09:14 ubuntu kernel: [  187.661825]  [<ffffffff8106fd8d>]
> warn_slowpath_common+0x7d/0xa0
> Oct 31 00:09:14 ubuntu kernel: [  187.661827]  [<ffffffff8106fe6a>]
> warn_slowpath_null+0x1a/0x20
> Oct 31 00:09:14 ubuntu kernel: [  187.661842]  [<ffffffffc01b734b>]
> build_backref_tree+0xcab/0x1240 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.661857]  [<ffffffffc01b7ae1>]
> relocate_tree_blocks+0x201/0x600 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.661872]  [<ffffffffc01b88d8>] ?
> add_data_references+0x268/0x2a0 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.661887]  [<ffffffffc01b96fd>]
> relocate_block_group+0x25d/0x6b0 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.661902]  [<ffffffffc01b9d36>]
> btrfs_relocate_block_group+0x1e6/0x2f0 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.661916]  [<ffffffffc0190988>]
> btrfs_relocate_chunk.isra.27+0x58/0x720 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.661926]  [<ffffffffc0140dc1>] ?
> btrfs_set_path_blocking+0x41/0x80 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.661935]  [<ffffffffc0145dfd>] ?
> btrfs_search_slot+0x48d/0xa40 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.661950]  [<ffffffffc018b49b>] ?
> release_extent_buffer+0x2b/0xd0 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.661964]  [<ffffffffc018b95f>] ?
> free_extent_buffer+0x4f/0xa0 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.661979]  [<ffffffffc01936c3>]
> __btrfs_balance+0x4d3/0x8d0 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.661993]  [<ffffffffc0193d48>]
> btrfs_balance+0x288/0x600 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.662008]  [<ffffffffc019411d>]
> balance_kthread+0x5d/0x80 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.662022]  [<ffffffffc01940c0>] ?
> btrfs_balance+0x600/0x600 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.662026]  [<ffffffff81094aeb>]
> kthread+0xdb/0x100
> Oct 31 00:09:14 ubuntu kernel: [  187.662029]  [<ffffffff81094a10>] ?
> kthread_create_on_node+0x1c0/0x1c0
> Oct 31 00:09:14 ubuntu kernel: [  187.662032]  [<ffffffff81787c3c>]
> ret_from_fork+0x7c/0xb0
> Oct 31 00:09:14 ubuntu kernel: [  187.662035]  [<ffffffff81094a10>] ?
> kthread_create_on_node+0x1c0/0x1c0
> Oct 31 00:09:14 ubuntu kernel: [  187.662037] ---[ end trace
> fb7849e4a6f20424 ]---
>
> end this:
> Oct 31 00:09:14 ubuntu kernel: [  187.682629] ------------[ cut here
> ]------------
> Oct 31 00:09:14 ubuntu kernel: [  187.682635] kernel BUG at
> /build/buildd/linux-3.16.0/fs/btrfs/extent-tree.c:868!
> Oct 31 00:09:14 ubuntu kernel: [  187.682638] invalid opcode: 0000 [#1] SMP
> Oct 31 00:09:14 ubuntu kernel: [  187.682642] Modules linked in:
> nls_iso8859_1 dm_crypt gpio_ich coretemp lpc_ich kvm_intel kvm
> dm_multipath scsi_dh serio_raw xgifb(C) bnep rfcomm bluetooth
> 6lowpan_iphc i3000_edac edac_core parport_pc mac_hid ppdev shpchp lp
> parport squashfs overlayfs nls_utf8 isofs btrfs xor raid6_pq dm_mirror
> dm_region_hash dm_log hid_generic usbhid hid uas usb_storage ahci
> e1000e libahci ptp pps_core
> Oct 31 00:09:14 ubuntu kernel: [  187.682686] CPU: 1 PID: 4417 Comm:
> btrfs-balance Tainted: G        WC    3.16.0-23-generic #31-Ubuntu
> Oct 31 00:09:14 ubuntu kernel: [  187.682688] Hardware name:
> Supermicro PDSML/PDSML+, BIOS 6.00 03/06/2009
> Oct 31 00:09:14 ubuntu kernel: [  187.682690] task: ffff8801bb5728c0
> ti: ffff8800a0ae4000 task.ti: ffff8800a0ae4000
> Oct 31 00:09:14 ubuntu kernel: [  187.682691] RIP:
> 0010:[<ffffffffc0150609>]  [<ffffffffc0150609>]
> btrfs_lookup_extent_info+0x469/0x4a0 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.682704] RSP:
> 0018:ffff8800a0ae7810  EFLAGS: 00010246
> Oct 31 00:09:14 ubuntu kernel: [  187.682706] RAX: 0000000000000000
> RBX: ffff8800a1440b40 RCX: 000000129457c000
> Oct 31 00:09:14 ubuntu kernel: [  187.682708] RDX: ffff8801ab1be3c0
> RSI: 000000129457c000 RDI: ffff8801ab1be428
> Oct 31 00:09:14 ubuntu kernel: [  187.682709] RBP: ffff8800a0ae7898
> R08: ffff8801ab1be3c0 R09: 0000160000000000
> Oct 31 00:09:14 ubuntu kernel: [  187.682711] R10: 0000000000000000
> R11: 000000000000003a R12: ffff8801ab1be428
> Oct 31 00:09:14 ubuntu kernel: [  187.682713] R13: 000000129457c000
> R14: ffff8801b8800be0 R15: 0000000000000000
> Oct 31 00:09:14 ubuntu kernel: [  187.682715] FS:
> 0000000000000000(0000) GS:ffff880217c80000(0000)
> knlGS:0000000000000000
> Oct 31 00:09:14 ubuntu kernel: [  187.682717] CS:  0010 DS: 0000 ES:
> 0000 CR0: 000000008005003b
> Oct 31 00:09:14 ubuntu kernel: [  187.682718] CR2: 0000000000ed3970
> CR3: 0000000208e63000 CR4: 00000000000007e0
> Oct 31 00:09:14 ubuntu kernel: [  187.682720] Stack:
> Oct 31 00:09:14 ubuntu kernel: [  187.682721]  ffff8800a0ae78c0
> 0000000000000000 0000000000000000 ffff8801ab1be3c0
> Oct 31 00:09:14 ubuntu kernel: [  187.682724]  ffff8801b88be1b0
> ffff8801ab1be3c0 ffff8801ab1be400 c0008801b8a45720
> Oct 31 00:09:14 ubuntu kernel: [  187.682727]  00a8000000129457
> ff00000000000040 ffffffffc01570d1 0000000000000001
> Oct 31 00:09:14 ubuntu kernel: [  187.682730] Call Trace:
> Oct 31 00:09:14 ubuntu kernel: [  187.682742]  [<ffffffffc01570d1>] ?
> btrfs_alloc_free_block+0x3a1/0x470 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.682751]  [<ffffffffc01416f4>]
> update_ref_for_cow+0x174/0x360 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.682761]  [<ffffffffc0141afd>]
> __btrfs_cow_block+0x21d/0x510 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.682770]  [<ffffffffc0141f86>]
> btrfs_cow_block+0x116/0x1b0 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.682779]  [<ffffffffc0145b44>]
> btrfs_search_slot+0x1d4/0xa40 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.682791]  [<ffffffffc01677ad>] ?
> record_root_in_trans+0xad/0x120 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.682807]  [<ffffffffc01b64f3>]
> do_relocation+0x3c3/0x570 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.682817]  [<ffffffffc0152878>] ?
> btrfs_block_rsv_refill+0x48/0xa0 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.682832]  [<ffffffffc01b7e35>]
> relocate_tree_blocks+0x555/0x600 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.682847]  [<ffffffffc01b88d8>] ?
> add_data_references+0x268/0x2a0 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.682862]  [<ffffffffc01b96fd>]
> relocate_block_group+0x25d/0x6b0 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.682876]  [<ffffffffc01b9d36>]
> btrfs_relocate_block_group+0x1e6/0x2f0 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.682891]  [<ffffffffc0190988>]
> btrfs_relocate_chunk.isra.27+0x58/0x720 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.682900]  [<ffffffffc0140dc1>] ?
> btrfs_set_path_blocking+0x41/0x80 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.682909]  [<ffffffffc0145dfd>] ?
> btrfs_search_slot+0x48d/0xa40 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.682924]  [<ffffffffc018b49b>] ?
> release_extent_buffer+0x2b/0xd0 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.682938]  [<ffffffffc018b95f>] ?
> free_extent_buffer+0x4f/0xa0 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.682953]  [<ffffffffc01936c3>]
> __btrfs_balance+0x4d3/0x8d0 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.682968]  [<ffffffffc0193d48>]
> btrfs_balance+0x288/0x600 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.682982]  [<ffffffffc019411d>]
> balance_kthread+0x5d/0x80 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.682997]  [<ffffffffc01940c0>] ?
> btrfs_balance+0x600/0x600 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.683001]  [<ffffffff81094aeb>]
> kthread+0xdb/0x100
> Oct 31 00:09:14 ubuntu kernel: [  187.683004]  [<ffffffff81094a10>] ?
> kthread_create_on_node+0x1c0/0x1c0
> Oct 31 00:09:14 ubuntu kernel: [  187.683007]  [<ffffffff81787c3c>]
> ret_from_fork+0x7c/0xb0
> Oct 31 00:09:14 ubuntu kernel: [  187.683010]  [<ffffffff81094a10>] ?
> kthread_create_on_node+0x1c0/0x1c0
> Oct 31 00:09:14 ubuntu kernel: [  187.683011] Code: be b0 00 00 00 48
> c7 c7 90 77 1e c0 48 89 55 a8 e8 5d f8 f1 c0 48 8b 55 a8 e9 2e fe ff
> ff 0f 0b 48 83 7d 88 00 0f 85 8d fe ff ff <0f> 0b 31 c0 e9 de fe ff ff
> be 6c 03 00 00 48 c7 c7 28 77 1e c0
> Oct 31 00:09:14 ubuntu kernel: [  187.683040] RIP
> [<ffffffffc0150609>] btrfs_lookup_extent_info+0x469/0x4a0 [btrfs]
> Oct 31 00:09:14 ubuntu kernel: [  187.683050]  RSP <ffff8800a0ae7810>
> Oct 31 00:09:14 ubuntu kernel: [  187.683052] ---[ end trace
> fb7849e4a6f20425 ]---
>
> Then it keeps repeating this:
> Oct 31 00:10:07 ubuntu kernel: [  240.100001] BUG: soft lockup - CPU#2
> stuck for 22s! [btrfs-transacti:4416]
> Oct 31 00:10:07 ubuntu kernel: [  240.100001] Modules linked in:
> nls_iso8859_1 dm_crypt gpio_ich coretemp lpc_ich kvm_intel kvm
> dm_multipath scsi_dh serio_raw xgifb(C) bnep rfcomm bluetooth
> 6lowpan_iphc i3000_edac edac_core parport_pc mac_hid ppdev shpchp lp
> parport squashfs overlayfs nls_utf8 isofs btrfs xor raid6_pq dm_mirror
> dm_region_hash dm_log hid_generic usbhid hid uas usb_storage ahci
> e1000e libahci ptp pps_core
> Oct 31 00:10:07 ubuntu kernel: [  240.100001] CPU: 2 PID: 4416 Comm:
> btrfs-transacti Tainted: G      D WC    3.16.0-23-generic #31-Ubuntu
> Oct 31 00:10:07 ubuntu kernel: [  240.100001] Hardware name:
> Supermicro PDSML/PDSML+, BIOS 6.00 03/06/2009
> Oct 31 00:10:07 ubuntu kernel: [  240.100001] task: ffff8800a23b1460
> ti: ffff8801ba8f8000 task.ti: ffff8801ba8f8000
> Oct 31 00:10:07 ubuntu kernel: [  240.100001] RIP:
> 0010:[<ffffffff81787712>]  [<ffffffff81787712>]
> _raw_spin_lock+0x32/0x50
> Oct 31 00:10:07 ubuntu kernel: [  240.100001] RSP:
> 0018:ffff8801ba8fbcc8  EFLAGS: 00000202
> Oct 31 00:10:07 ubuntu kernel: [  240.100001] RAX: 0000000000004a52
> RBX: 0000000000014800 RCX: 0000000000008c82
> Oct 31 00:10:07 ubuntu kernel: [  240.100001] RDX: 0000000000008c84
> RSI: 0000000000008c84 RDI: ffff8801b88be1b0
> Oct 31 00:10:07 ubuntu kernel: [  240.100001] RBP: ffff8801ba8fbcc8
> R08: 00000000008dd0e4 R09: 000000002ac4f29b
> Oct 31 00:10:07 ubuntu kernel: [  240.100001] R10: 000000929da8c524
> R11: 0000000000000020 R12: ffff88020c32c800
> Oct 31 00:10:07 ubuntu kernel: [  240.100001] R13: ffff88020c32c808
> R14: 0000000200000003 R15: ffff880217d8e4e0
> Oct 31 00:10:07 ubuntu kernel: [  240.100001] FS:
> 0000000000000000(0000) GS:ffff880217d00000(0000)
> knlGS:0000000000000000
> Oct 31 00:10:07 ubuntu kernel: [  240.100001] CS:  0010 DS: 0000 ES:
> 0000 CR0: 000000008005003b
> Oct 31 00:10:07 ubuntu kernel: [  240.100001] CR2: 00007fffa496afd8
> CR3: 00000002084dd000 CR4: 00000000000007e0
> Oct 31 00:10:07 ubuntu kernel: [  240.100001] Stack:
> Oct 31 00:10:07 ubuntu kernel: [  240.100001]  ffff8801ba8fbdf0
> ffffffffc0153e02 ffffffff810abb55 ffff8800e14532f0
> Oct 31 00:10:07 ubuntu kernel: [  240.100001]  ffff8800e1453358
> ffff8800a23b14c8 ffff8801ba8fbd60 ffff8801ba8fbd50
> Oct 31 00:10:07 ubuntu kernel: [  240.100001]  ffffffff81011661
> 0000000000014800 ffff880217d11c40 ffff8800a23b1a50
> Oct 31 00:10:07 ubuntu kernel: [  240.100001] Call Trace:
> Oct 31 00:10:07 ubuntu kernel: [  240.100001]  [<ffffffffc0153e02>]
> __btrfs_run_delayed_refs+0x1e2/0x11e0 [btrfs]
> Oct 31 00:10:07 ubuntu kernel: [  240.100001]  [<ffffffff810abb55>] ?
> set_next_entity+0x95/0xb0
> Oct 31 00:10:07 ubuntu kernel: [  240.100001]  [<ffffffff81011661>] ?
> __switch_to+0x191/0x5e0
> Oct 31 00:10:07 ubuntu kernel: [  240.100001]  [<ffffffff8107dd8a>] ?
> del_timer_sync+0x4a/0x60
> Oct 31 00:10:07 ubuntu kernel: [  240.100001]  [<ffffffffc0158df3>]
> btrfs_run_delayed_refs.part.64+0x73/0x270 [btrfs]
> Oct 31 00:10:07 ubuntu kernel: [  240.100001]  [<ffffffffc0159007>]
> btrfs_run_delayed_refs+0x17/0x20 [btrfs]
> Oct 31 00:10:07 ubuntu kernel: [  240.100001]  [<ffffffffc0169269>]
> btrfs_commit_transaction+0x29/0x80 [btrfs]
> Oct 31 00:10:07 ubuntu kernel: [  240.100001]  [<ffffffffc016527d>]
> transaction_kthread+0x1ed/0x260 [btrfs]
> Oct 31 00:10:07 ubuntu kernel: [  240.100001]  [<ffffffffc0165090>] ?
> btrfs_cleanup_transaction+0x540/0x540 [btrfs]
> Oct 31 00:10:07 ubuntu kernel: [  240.100001]  [<ffffffff81094aeb>]
> kthread+0xdb/0x100
> Oct 31 00:10:07 ubuntu kernel: [  240.100001]  [<ffffffff81094a10>] ?
> kthread_create_on_node+0x1c0/0x1c0
> Oct 31 00:10:07 ubuntu kernel: [  240.100001]  [<ffffffff81787c3c>]
> ret_from_fork+0x7c/0xb0
> Oct 31 00:10:07 ubuntu kernel: [  240.100001]  [<ffffffff81094a10>] ?
> kthread_create_on_node+0x1c0/0x1c0
> Oct 31 00:10:07 ubuntu kernel: [  240.100001] Code: 89 e5 b8 00 00 02
> 00 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 75 04 5d c3 66 90 83 e2 fe 0f
> b7 f2 b8 00 80 00 00 eb 0a 0f 1f 00 f3 90 <83> e8 01 74 0a 0f b7 0f 66
> 39 ca 75 f1 5d c3 66 66 66 90 66 66
>
>
> Any ideas how to fix this filesystem? I do have backups, but I am
> interested in finding out what happened and what to do.
>
> Regards
> Tobias

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: filesystem corruption
  2014-10-31  1:02 ` Tobias Holst
@ 2014-10-31  2:41   ` Rich Freeman
  2014-10-31 17:34     ` Tobias Holst
  0 siblings, 1 reply; 89+ messages in thread
From: Rich Freeman @ 2014-10-31  2:41 UTC (permalink / raw)
  To: Tobias Holst; +Cc: linux-btrfs@vger.kernel.org

On Thu, Oct 30, 2014 at 9:02 PM, Tobias Holst <tobby@tobby.eu> wrote:
> Addition:
> I found some posts here about a general file system corruption in 3.17
> and 3.17.1 - is this the cause?
> Additionally I am using ro-snapshots - maybe this is the cause, too?
>
> Anyway: Can I fix that or do I have to reinstall? Haven't touched the
> filesystem, just did a scrub (found 0 errors).
>

Yup - ro-snapshots is a big problem in 3.17.  You can probably recover now by:
1.  Update your kernel to 3.17.2 - that takes care of all the big
known 3.16/17 issues in general.
2.  Run btrfs check using btrfs-tools 3.17.  That can clean up the
broken snapshots in your filesystem.

That is fairly likely to get your filesystem working normally again.
It worked for me.  I was getting some balance issues when trying to
add another device and I'm not sure if 3.17.2 totally fixed that - I
ended up cancelling the balance and it will be a while before I have
to balance this particular filesystem again, so I'll just hold off and
hope things stabilize.

--
Rich

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: filesystem corruption
  2014-10-31  2:41   ` Rich Freeman
@ 2014-10-31 17:34     ` Tobias Holst
  2014-11-02  4:49       ` Robert White
  0 siblings, 1 reply; 89+ messages in thread
From: Tobias Holst @ 2014-10-31 17:34 UTC (permalink / raw)
  To: Rich Freeman; +Cc: linux-btrfs@vger.kernel.org

I am now using another system with kernel 3.17.2 and btrfs-tools 3.17
and inserted one of the two HDDs of my btrfs-RAID1 to it. I can't add
the second one as there are only two slots in that server.

This is what I got:

 tobby@ubuntu: sudo btrfs check /dev/sdb1
warning, device 2 is missing
warning devid 2 not found already
root item for root 1746, current bytenr 80450240512, current gen
163697, current level 2, new bytenr 40074067968, new gen 163707, new
level 2
Found 1 roots with an outdated root item.
Please run a filesystem check with the option --repair to fix them.

 tobby@ubuntu: sudo btrfs check --repair /dev/sdb1
enabling repair mode
warning, device 2 is missing
warning devid 2 not found already
Unable to find block group for 0
extent-tree.c:289: find_search_start: Assertion `1` failed.
btrfs[0x42bd62]
btrfs[0x42ffe5]
btrfs[0x430211]
btrfs[0x4246ec]
btrfs[0x424d11]
btrfs[0x426af3]
btrfs[0x41b18c]
btrfs[0x40b46a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7ffca1119ec5]
btrfs[0x40b497]

This can be repeated as often as I want ;) Nothing changed.

Regards
Tobias

2014-10-31 3:41 GMT+01:00 Rich Freeman <r-btrfs@thefreemanclan.net>:
> On Thu, Oct 30, 2014 at 9:02 PM, Tobias Holst <tobby@tobby.eu> wrote:
>> Addition:
>> I found some posts here about a general file system corruption in 3.17
>> and 3.17.1 - is this the cause?
>> Additionally I am using ro-snapshots - maybe this is the cause, too?
>>
>> Anyway: Can I fix that or do I have to reinstall? Haven't touched the
>> filesystem, just did a scrub (found 0 errors).
>>
>
> Yup - ro-snapshots is a big problem in 3.17.  You can probably recover now by:
> 1.  Update your kernel to 3.17.2 - that takes care of all the big
> known 3.16/17 issues in general.
> 2.  Run btrfs check using btrfs-tools 3.17.  That can clean up the
> broken snapshots in your filesystem.
>
> That is fairly likely to get your filesystem working normally again.
> It worked for me.  I was getting some balance issues when trying to
> add another device and I'm not sure if 3.17.2 totally fixed that - I
> ended up cancelling the balance and it will be a while before I have
> to balance this particular filesystem again, so I'll just hold off and
> hope things stabilize.
>
> --
> Rich

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: filesystem corruption
  2014-10-31 17:34     ` Tobias Holst
@ 2014-11-02  4:49       ` Robert White
  2014-11-02 21:57         ` Chris Murphy
  2014-11-03  2:55         ` Tobias Holst
  0 siblings, 2 replies; 89+ messages in thread
From: Robert White @ 2014-11-02  4:49 UTC (permalink / raw)
  To: Tobias Holst, Rich Freeman; +Cc: linux-btrfs@vger.kernel.org

On 10/31/2014 10:34 AM, Tobias Holst wrote:
> I am now using another system with kernel 3.17.2 and btrfs-tools 3.17
> and inserted one of the two HDDs of my btrfs-RAID1 to it. I can't add
> the second one as there are only two slots in that server.
>
> This is what I got:
>
>   tobby@ubuntu: sudo btrfs check /dev/sdb1
> warning, device 2 is missing
> warning devid 2 not found already
> root item for root 1746, current bytenr 80450240512, current gen
> 163697, current level 2, new bytenr 40074067968, new gen 163707, new
> level 2
> Found 1 roots with an outdated root item.
> Please run a filesystem check with the option --repair to fix them.
>
>   tobby@ubuntu: sudo btrfs check --repair /dev/sdb1
> enabling repair mode
> warning, device 2 is missing
> warning devid 2 not found already
> Unable to find block group for 0
> extent-tree.c:289: find_search_start: Assertion `1` failed.

The read-only snapshots taken under 3.17.1 are your core problem.

Now btrfsck is refusing to operate on the degraded RAID because degraded 
RAID is degraded so it's read-only. (this is an educated guess). Since 
btrfsck is _not_ a mount type of operation its got no "degraded mode" 
that would let you deal with half a RAID as far as I know.

In your case...

It is _known_ that you need to be _not_ running 3.17.0 or 3.17.1 if you 
are going to make read-only snapshots safely.
It is _known_ that you need to be running 3.17.2 to get a number of 
fixes that impact your circumstance.
It is _known_ that you need to be running btrfs-progs 3.17 to repair the 
read-only snapshot that are borked up, and that you must _not_ have 
previously tried to repair the problme with an older btrfsck.

Were I you, I would...

Put the two disks back in the same computer before something bad happens.

Upgrade that computer to 3.17.2 and 3.17 respectively.

Take a backup (because I am paranoid like that, though current threat 
seems negligible).

btrfsck your raid with --repair.

Alternately, if you previously tried to btrfsck the raid with a version 
prior to 3.17 tools after the read-only snapshot(s) problem, you will 
need to resort to mkfs.btrfs to solve the problem. But Hey! you have two 
disks, so break the RAID, then mkfs one of them, then copy the data, 
then re-make the RAID such that the new FS rules.

Enjoy your system no longer taking racy read-only snapshots... 8-)

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: filesystem corruption
  2014-11-02  4:49       ` Robert White
@ 2014-11-02 21:57         ` Chris Murphy
  2014-11-03  3:43           ` Zygo Blaxell
  2014-11-03  2:55         ` Tobias Holst
  1 sibling, 1 reply; 89+ messages in thread
From: Chris Murphy @ 2014-11-02 21:57 UTC (permalink / raw)
  Cc: Btrfs BTRFS

On Nov 1, 2014, at 10:49 PM, Robert White <rwhite@pobox.com> wrote:

> On 10/31/2014 10:34 AM, Tobias Holst wrote:
>> I am now using another system with kernel 3.17.2 and btrfs-tools 3.17
>> and inserted one of the two HDDs of my btrfs-RAID1 to it. I can't add
>> the second one as there are only two slots in that server.
>> 
>> This is what I got:
>> 
>>  tobby@ubuntu: sudo btrfs check /dev/sdb1
>> warning, device 2 is missing
>> warning devid 2 not found already
>> root item for root 1746, current bytenr 80450240512, current gen
>> 163697, current level 2, new bytenr 40074067968, new gen 163707, new
>> level 2
>> Found 1 roots with an outdated root item.
>> Please run a filesystem check with the option --repair to fix them.
>> 
>>  tobby@ubuntu: sudo btrfs check --repair /dev/sdb1
>> enabling repair mode
>> warning, device 2 is missing
>> warning devid 2 not found already
>> Unable to find block group for 0
>> extent-tree.c:289: find_search_start: Assertion `1` failed.
> 
> The read-only snapshots taken under 3.17.1 are your core problem.
> 
> Now btrfsck is refusing to operate on the degraded RAID because degraded RAID is degraded so it's read-only. (this is an educated guess).

Degradedness and writability are orthogonal. If there's some problem with the fs that prevents it from being mountable rw, then that'd apply for both normal and degraded operation. If the fs is OK, it should permit writable degraded mounts.

> Since btrfsck is _not_ a mount type of operation its got no "degraded mode" that would let you deal with half a RAID as far as I know.

That's a problem. I can see why a repair might need an additional flag (maybe force) to repair a volume that has the minimum number of devices for degraded mounting, but not all are present. Maybe we wouldn't want it to be easy to accidentally run a repair that changes the file system when a device happens to be missing inadvertently that could be found and connected later.

I think related to this is a btrfs equivalent of a bitmap. The metadata already has this information in it, but possibly right now btrfs lacks the equivalent behavior of mdadm readd when a previously missing device is reconnected. If it has a bitmap then it doesn't have to be completely rebuilt, the bitmap contains information telling md how to "catch up" the readded device, i.e. only that which is different needs to be written upon a readd.

For example if I have a two device Btrfs raid1 for both data and metadata, and one device is removed and I mount -o degraded,rw one of them and make some small changes, unmount, then reconnect the missing device and mount NOT degraded - what happens? I haven't tried this. And I also don't know if a full balance (hours) is needed to "catch up" the formerly missing device. With md this is very fast - seconds/minutes depending on how much has been changed.

Chris Murphy

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: filesystem corruption
  2014-11-02  4:49       ` Robert White
  2014-11-02 21:57         ` Chris Murphy
@ 2014-11-03  2:55         ` Tobias Holst
  2014-11-03  3:49           ` Robert White
  1 sibling, 1 reply; 89+ messages in thread
From: Tobias Holst @ 2014-11-03  2:55 UTC (permalink / raw)
  To: Robert White; +Cc: Rich Freeman, linux-btrfs@vger.kernel.org

Thank you for your reply.

I'll answer in-line.

2014-11-02 5:49 GMT+01:00 Robert White <rwhite@pobox.com>:
> On 10/31/2014 10:34 AM, Tobias Holst wrote:
>>
>> I am now using another system with kernel 3.17.2 and btrfs-tools 3.17
>> and inserted one of the two HDDs of my btrfs-RAID1 to it. I can't add
>> the second one as there are only two slots in that server.
>>
>> This is what I got:
>>
>>   tobby@ubuntu: sudo btrfs check /dev/sdb1
>> warning, device 2 is missing
>> warning devid 2 not found already
>> root item for root 1746, current bytenr 80450240512, current gen
>> 163697, current level 2, new bytenr 40074067968, new gen 163707, new
>> level 2
>> Found 1 roots with an outdated root item.
>> Please run a filesystem check with the option --repair to fix them.
>>
>>   tobby@ubuntu: sudo btrfs check --repair /dev/sdb1
>> enabling repair mode
>> warning, device 2 is missing
>> warning devid 2 not found already
>> Unable to find block group for 0
>> extent-tree.c:289: find_search_start: Assertion `1` failed.
>
>
> The read-only snapshots taken under 3.17.1 are your core problem.

OK

>
> Now btrfsck is refusing to operate on the degraded RAID because degraded
> RAID is degraded so it's read-only. (this is an educated guess). Since
> btrfsck is _not_ a mount type of operation its got no "degraded mode" that
> would let you deal with half a RAID as far as I know.

OK, good to know.

>
> In your case...
>
> It is _known_ that you need to be _not_ running 3.17.0 or 3.17.1 if you are
> going to make read-only snapshots safely.
> It is _known_ that you need to be running 3.17.2 to get a number of fixes
> that impact your circumstance.
> It is _known_ that you need to be running btrfs-progs 3.17 to repair the
> read-only snapshot that are borked up, and that you must _not_ have
> previously tried to repair the problme with an older btrfsck.

No, I didn't try to repair it with older kernels/btrfs-tools.

>
> Were I you, I would...
>
> Put the two disks back in the same computer before something bad happens.
>
> Upgrade that computer to 3.17.2 and 3.17 respectively.

As I mentioned before I only have two slots and my system on this
btrfs-raid1 is not working anymore. Not just when accessing
ro-snapshots - it crashes everytime at the login prompt. So now I
installed Ubuntu 14.04 to an USB stick (so I can readd both btrfs
HDDs) and upgraded the kernel to 3.17.2 and btrfs-tools to 3.17.

>
> Take a backup (because I am paranoid like that, though current threat seems
> negligible).

I already have a backup. :)

>
> btrfsck your raid with --repair.

OK. And this is what I get now:

tobby@ubuntu: sudo btrfs check /dev/sda1
root item for root 1746, current bytenr 80450240512, current gen
163697, current level 2, new bytenr 40074067968, new gen 163707, new
level 2
Found 1 roots with an outdated root item.
Please run a filesystem check with the option --repair to fix them.

tobby@ubuntu: sudo btrfs check /dev/sda1 --repair
enabling repair mode
fixing root item for root 1746, current bytenr 80450240512, current
gen 163697, current level 2, new bytenr 40074067968, new gen 163707,
new level 2
Fixed 1 roots.
Checking filesystem on /dev/sda1
UUID: 3ad065be-2525-4547-87d3-0e195497f9cf
checking extents
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
root 18446744073709551607 inode 258 errors 1000, some csum missing
found 36031450184 bytes used err is 1
total csum bytes: 59665716
total tree bytes: 3523330048
total fs tree bytes: 3234054144
total extent tree bytes: 202358784
btree space waste bytes: 755547262
file data blocks allocated: 122274091008
 referenced 211741990912
Btrfs v3.17

>
> Alternately, if you previously tried to btrfsck the raid with a version
> prior to 3.17 tools after the read-only snapshot(s) problem, you will need
> to resort to mkfs.btrfs to solve the problem. But Hey! you have two disks,
> so break the RAID, then mkfs one of them, then copy the data, then re-make
> the RAID such that the new FS rules.
>
> Enjoy your system no longer taking racy read-only snapshots... 8-)
>
>

Aaaaand this worked! :) Server is back online without restoring any
files from the backup. Looks good to me!

But I can't do a balance anymore?

root@t-mon:~# btrfs balance start /dev/sda1
ERROR: can't access '/dev/sda1'

Regards
Tobias

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: filesystem corruption
  2014-11-02 21:57         ` Chris Murphy
@ 2014-11-03  3:43           ` Zygo Blaxell
  2014-11-03 17:11             ` Chris Murphy
  0 siblings, 1 reply; 89+ messages in thread
From: Zygo Blaxell @ 2014-11-03  3:43 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 5661 bytes --]

On Sun, Nov 02, 2014 at 02:57:22PM -0700, Chris Murphy wrote:
> On Nov 1, 2014, at 10:49 PM, Robert White <rwhite@pobox.com> wrote:
> 
> > On 10/31/2014 10:34 AM, Tobias Holst wrote:
> >> I am now using another system with kernel 3.17.2 and btrfs-tools 3.17
> >> and inserted one of the two HDDs of my btrfs-RAID1 to it. I can't add
> >> the second one as there are only two slots in that server.
> >> 
> >> This is what I got:
> >> 
> >>  tobby@ubuntu: sudo btrfs check /dev/sdb1
> >> warning, device 2 is missing
> >> warning devid 2 not found already
> >> root item for root 1746, current bytenr 80450240512, current gen
> >> 163697, current level 2, new bytenr 40074067968, new gen 163707, new
> >> level 2
> >> Found 1 roots with an outdated root item.
> >> Please run a filesystem check with the option --repair to fix them.
> >> 
> >>  tobby@ubuntu: sudo btrfs check --repair /dev/sdb1
> >> enabling repair mode
> >> warning, device 2 is missing
> >> warning devid 2 not found already
> >> Unable to find block group for 0
> >> extent-tree.c:289: find_search_start: Assertion `1` failed.
> > 
> > The read-only snapshots taken under 3.17.1 are your core problem.
> > 
> > Now btrfsck is refusing to operate on the degraded RAID because
> > degraded RAID is degraded so it's read-only. (this is an educated
> > guess).
> 
> Degradedness and writability are orthogonal. If there's some problem
> with the fs that prevents it from being mountable rw, then that'd
> apply for both normal and degraded operation. If the fs is OK, it
> should permit writable degraded mounts.
> 
> > Since btrfsck is _not_ a mount type of operation its got no "degraded
> > mode" that would let you deal with half a RAID as far as I know.
> 
> That's a problem. I can see why a repair might need an additional flag
> (maybe force) to repair a volume that has the minimum number of devices
> for degraded mounting, but not all are present. Maybe we wouldn't want
> it to be easy to accidentally run a repair that changes the file system
> when a device happens to be missing inadvertently that could be found
> and connected later.
> 
> I think related to this is a btrfs equivalent of a bitmap. The metadata
> already has this information in it, but possibly right now btrfs
> lacks the equivalent behavior of mdadm readd when a previously missing
> device is reconnected. If it has a bitmap then it doesn't have to be
> completely rebuilt, the bitmap contains information telling md how to
> "catch up" the readded device, i.e. only that which is different needs
> to be written upon a readd.
> 
> For example if I have a two device Btrfs raid1 for both data and
> metadata, and one device is removed and I mount -o degraded,rw one
> of them and make some small changes, unmount, then reconnect the
> missing device and mount NOT degraded - what happens?  I haven't tried
> this. 

I have.  It's a filesystem-destroying disaster.  Never do it, never let
it happen accidentally.  Make sure that if a disk gets temporarily
disconnected, you either never mount it degraded, or never let it come
back (i.e. take the disk to another machine and wipefs it).  Don't ever,
ever put 'degraded' in /etc/fstab mount options.  Nope.  No.

btrfs seems to assume the data is correct on both disks (the generation
numbers and checksums are OK) but gets confused by equally plausible but
different metadata on each disk.  It doesn't take long before the
filesystem becomes data soup or crashes the kernel.

There is more than one way to get to this point.  Take LVM snapshots of
the devices in a btrfs RAID1 array, and 'btrfs device scan' will see two
different versions of each btrfs device in a btrfs filesystem (one for
the origin LV and one for the snapshot).  btrfs then assembles LVs of
different vintages randomly (e.g. one from the mount command line, one
from an earlier LVM snapshot of the second disk) with disastrous results
similar to the above.  IMHO if btrfs sees multiple devices with the same
UUIDs, it should reject all of them and require an explicit device list;
however, mdadm has a way to deal with this that would also work.

mdadm puts event counters and timestamps in the device superblocks to
prevent any such accidental disjoint assembly and modification of members
of an array.  If disks go temporarily offline with separate modifications
then mdadm refuses to accept disks with different counter+timestamp data
(so you'll get all the disks but one rejected, or only one disk with all
others rejected).  The rejected disk(s) has to go through full device
recovery before rejoining the array--someone has to use mdadm to add
the rejected disk as if it was a new, blank one.

Currently btrfs won't mount a degraded array by default, which prevents
unrecoverable inconsistency.  That's a safe behavior for now, but sooner
or later btrfs will need to be able to safely boot unattended on a
degraded RAID1 root filesystem.

> And I also don't know if a full balance (hours) is needed to
> "catch up" the formerly missing device. With md this is very fast -
> seconds/minutes depending on how much has been changed.

I schedule a scrub immediately after boot, assuming that it will resolve
any data differences (and also assuming that the reboot was caused by
a disk-related glitch, which it usually is for me).  That might not
be enough for metadata differences, and it's certainly not enough for
modifications in degraded mode.  Full balance is out of my reach--it
takes weeks on even my medium-sized filesystems, and mkfs + rsync from
backup is much faster.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: filesystem corruption
  2014-11-03  2:55         ` Tobias Holst
@ 2014-11-03  3:49           ` Robert White
  0 siblings, 0 replies; 89+ messages in thread
From: Robert White @ 2014-11-03  3:49 UTC (permalink / raw)
  To: Tobias Holst; +Cc: Rich Freeman, linux-btrfs@vger.kernel.org

On 11/02/2014 06:55 PM, Tobias Holst wrote:
> But I can't do a balance anymore?
>
> root@t-mon:~# btrfs balance start /dev/sda1
> ERROR: can't access '/dev/sda1'

Balance takes place on a mounted filesystem not a native block device.

So...

mount -t btrfs /dev/sda1 /some/path/somewhere
btrfs balance start /some/path/somewhere



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: filesystem corruption
  2014-11-03  3:43           ` Zygo Blaxell
@ 2014-11-03 17:11             ` Chris Murphy
  2014-11-04  4:31               ` Zygo Blaxell
  0 siblings, 1 reply; 89+ messages in thread
From: Chris Murphy @ 2014-11-03 17:11 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: Btrfs BTRFS

On Nov 2, 2014, at 8:43 PM, Zygo Blaxell <zblaxell@furryterror.org> wrote:

> On Sun, Nov 02, 2014 at 02:57:22PM -0700, Chris Murphy wrote:
>> 
>> For example if I have a two device Btrfs raid1 for both data and
>> metadata, and one device is removed and I mount -o degraded,rw one
>> of them and make some small changes, unmount, then reconnect the
>> missing device and mount NOT degraded - what happens?  I haven't tried
>> this. 
> 
> I have.  It's a filesystem-destroying disaster.  Never do it, never let
> it happen accidentally.  Make sure that if a disk gets temporarily
> disconnected, you either never mount it degraded, or never let it come
> back (i.e. take the disk to another machine and wipefs it).  Don't ever,
> ever put 'degraded' in /etc/fstab mount options.  Nope.  No.

Well I guess I now see why opensuse's plan for Btrfs by default proscribes multiple device Btrfs volumes. The described scenario is really common with users, I see it often on linux-raid@. And md doesn't have this problem. The worst case scenario is if devices don't have bitmaps, and then a whole device rebuild has to happen rather than just a quick "catchup".

> 
> btrfs seems to assume the data is correct on both disks (the generation
> numbers and checksums are OK) but gets confused by equally plausible but
> different metadata on each disk.  It doesn't take long before the
> filesystem becomes data soup or crashes the kernel.

This is a pretty significant problem to still be present, honestly. I can understand the "catchup" mechanism is probably not built yet, but clearly the two devices don't have the same generation. The lower generation device should probably be booted/ignored or declared missing in the meantime to prevent trashing the file system.

Chris Murphy

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: filesystem corruption
  2014-11-03 17:11             ` Chris Murphy
@ 2014-11-04  4:31               ` Zygo Blaxell
  2014-11-04  8:25                 ` Duncan
  2014-11-04 18:28                 ` Chris Murphy
  0 siblings, 2 replies; 89+ messages in thread
From: Zygo Blaxell @ 2014-11-04  4:31 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 2004 bytes --]

On Mon, Nov 03, 2014 at 10:11:18AM -0700, Chris Murphy wrote:
> 
> On Nov 2, 2014, at 8:43 PM, Zygo Blaxell <zblaxell@furryterror.org> wrote:
> > btrfs seems to assume the data is correct on both disks (the generation
> > numbers and checksums are OK) but gets confused by equally plausible but
> > different metadata on each disk.  It doesn't take long before the
> > filesystem becomes data soup or crashes the kernel.
> 
> This is a pretty significant problem to still be present, honestly. I
> can understand the "catchup" mechanism is probably not built yet,
> but clearly the two devices don't have the same generation. The lower
> generation device should probably be booted/ignored or declared missing
> in the meantime to prevent trashing the file system.

The problem with generation numbers is when both devices get divergent
generation numbers but we can't tell them apart, e.g.

	1.  sda generation = 5, sdb generation = 5

	2.  sdb temporarily disconnects, so we are degraded on just sda

	3.  sda gets more generations 6..9

	4.  sda temporarily disconnects, so we have no disks at all.

	5.  the machine reboots, gets sdb back but not sda

If we allow degraded here, then:

	6.  sdb gets more generations 6..9

	7.  sdb disconnects, no disks so no filesystem

	8.  the machine reboots again, this time with sda and sdb present

Now we have two disks with equal generation numbers.  Generations 6..9
on sda are not the same as generations 6..9 on sdb, so if we mix the
two disks' metadata we get bad confusion.

It needs to be more than a sequential number.  If one of the disks
disappears we need to record this fact on the surviving disks, and also
cope with _both_ disks claiming to be the "surviving" one.



> 
> Chris Murphy
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: filesystem corruption
  2014-11-04  4:31               ` Zygo Blaxell
@ 2014-11-04  8:25                 ` Duncan
  2014-11-04 18:28                 ` Chris Murphy
  1 sibling, 0 replies; 89+ messages in thread
From: Duncan @ 2014-11-04  8:25 UTC (permalink / raw)
  To: linux-btrfs

Zygo Blaxell posted on Mon, 03 Nov 2014 23:31:45 -0500 as excerpted:

> On Mon, Nov 03, 2014 at 10:11:18AM -0700, Chris Murphy wrote:
>> 
>> On Nov 2, 2014, at 8:43 PM, Zygo Blaxell <zblaxell@furryterror.org>
>> wrote:
>> > btrfs seems to assume the data is correct on both disks (the
>> > generation numbers and checksums are OK) but gets confused by equally
>> > plausible but different metadata on each disk.  It doesn't take long
>> > before the filesystem becomes data soup or crashes the kernel.
>> 
>> This is a pretty significant problem to still be present, honestly. I
>> can understand the "catchup" mechanism is probably not built yet,
>> but clearly the two devices don't have the same generation. The lower
>> generation device should probably be booted/ignored or declared missing
>> in the meantime to prevent trashing the file system.
> 
> The problem with generation numbers is when both devices get divergent
> generation numbers but we can't tell them apart

[snip very reasonable scenario]

> Now we have two disks with equal generation numbers. 
> Generations 6..9 on sda are not the same as generations 6..9 on sdb, so
> if we mix the two disks' metadata we get bad confusion.
> 
> It needs to be more than a sequential number.  If one of the disks
> disappears we need to record this fact on the surviving disks, and also
> cope with _both_ disks claiming to be the "surviving" one.

Zygo's absolutely correct.  There is an existing catchup mechanism, but 
the tracking is /purely/ sequential generation number based, and if the 
two generation sequences diverge, "Welcome to the (data) Twilight Zone!"

I noted this in my own early pre-deployment raid1 mode testing as well, 
except that I didn't at that point know about sequence numbers and never 
got as far as letting the filesystem make data soup of itself.

What I did was this:

1) Create a two-device raid1 data and metadata filesystem, mount it and 
stick some data on it.

2) Unmount, pull a device, mount degraded the remaining device.

3) Change a file.

4) Unmount, switch devices, mount degraded the other device.

5) Change the same file in an different/incompatible way.

6) Unmount, plug both devices in again, mount (not degraded).

7) Wait for the sync I was used to from mdraid, which of course didn't 
occur.

8) Check the file to see which version showed up.  I don't recall which 
version it was, but it wasn't the common pre-change version.

9) Unmount, pull each device one at a time, mounting the other one 
degraded and checking the file again.

10) The file on each device remained different, without a warning or 
indication of any problem at all when I mounted undegraded in 6/7.

Had I initiated a scrub, presumably it would have seen the difference and 
if one was a newer generation, it would have taken it, overwriting the 
other.  I don't know what it would have done if both were the same 
generation, tho the file being small (just a few line text file, big 
enough to test the effect of differing edits), I guess it would take one 
version or the other.  If the file was large enough to be multiple 
extents, however, I've no idea whether it'd take one or the other, or 
possibly combine the two, picking extents where they differed more or 
less randomly.

By that time the lack of warning and absolute resolution to one version 
or the other even after mounting undegraded and accessing the file with 
incompatible versions on each of the two devices was bothering me 
sufficiently that I didn't test any further.

Being just me I have to worry about (unlike a multi-admin corporate 
scenario where you can never be /sure/ what the other admins will do 
regardless of agreed procedure), I simply set myself a set of rules very 
similar to what Zygo proposed:

1) If for whatever reason I ever split a btrfs raid1 with the intent or 
even the possibility of bringing the pieces back together again, if at 
all possible, never mount the split pieces writable -- mount read-only.

2) If a writable mount is required, keep the writable mounts to one 
device of the split.  As long as the other device is never mounted 
writable, it will have an older generation when they're reunited and a 
scrub should take care of things, reliably resolving to the updated 
written device, rewriting the older generation on the other device.

What I'd do here is physically put the removed side of the raid1 in 
storage, far enough from the remaining side that I couldn't possibly get 
them mixed up.  I'd clearly label it as well, creating a "defense in 
depth" of at least two, the labeling and the physical separation and 
storage of the read-only device.

3) If for whatever reason the originally read-only side must be mounted 
writable, very clearly mark the originally mounted-writable device 
POISONED/TOXIC!!  *NEVER* *EVER* let such a POISONED device anywhere near 
its original raid1 mate, until it is wiped, such that there's no 
possibility of btrfs getting confused and contaminated with the poisoned 
data.

Given how unimpressed I was with btrfs' ability to do the right thing in 
such cases, I'd be tempted to wipefs the device, then dd from 
/dev/zero to it, then badblocks write-pattern test a couple patterns, 
then (if it was a full physical device not just a partition) hardware 
secure-erase it, then mkfs it to ext4 or vfat, then dd from /dev/zero it 
again and again hardware secure-erase it, then FINALLY mkfs.btrfs it 
again.  Of course being ssd, a single mkfs.btrfs would issue a trim and 
that should suffice, but I was really REALLY not impressed with btrfs' 
ability to reliably do the right thing, and would effectively be tearing 
up the schoolbooks (at least the workbooks, since they couldn't be bought 
back) and feeding them to the furnace at the end of the year, as I used 
to do when I was a kid, not because it made a difference, but because it 
was so emotionally rewarding! =:^)

Or maybe I'd make that an excuse to try dban[1].

But I'd probably just dd from /dev/zero or secure-erase it, or badblocks-
write-test a couple patterns if I wanted to badblocks-test it anyway, or 
mkfs.btrfs it to get the trim from that.

But I'd have fun doing it. =:^)

And then I'd plug it back in and btrfs replace the missing device.

Anyway, the point is, either don't reintroduce absent devices once split 
out of a btrfs raid1, or ensure they don't get written and immediately do 
a scrub to update them when reintroduced, or if they were written and the 
other device was too, separately, be sure the one is wiped (Destroy them 
with Lasers![2]) before using a full btrfs replace, to keep the remaining 
device(s) and the data on them healthy. =:^)

---
[1] https://www.google.com/search?q=dban

[2] Destroy them with Lazers! by Knife Party
https://www.google.com/search?q=destroy+them+with+lazers

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: filesystem corruption
  2014-11-04  4:31               ` Zygo Blaxell
  2014-11-04  8:25                 ` Duncan
@ 2014-11-04 18:28                 ` Chris Murphy
  2014-11-04 21:44                   ` Duncan
                                     ` (2 more replies)
  1 sibling, 3 replies; 89+ messages in thread
From: Chris Murphy @ 2014-11-04 18:28 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: Btrfs BTRFS


On Nov 3, 2014, at 9:31 PM, Zygo Blaxell <zblaxell@furryterror.org> wrote:

> On Mon, Nov 03, 2014 at 10:11:18AM -0700, Chris Murphy wrote:
>> 
>> On Nov 2, 2014, at 8:43 PM, Zygo Blaxell <zblaxell@furryterror.org> wrote:
>>> btrfs seems to assume the data is correct on both disks (the generation
>>> numbers and checksums are OK) but gets confused by equally plausible but
>>> different metadata on each disk.  It doesn't take long before the
>>> filesystem becomes data soup or crashes the kernel.
>> 
>> This is a pretty significant problem to still be present, honestly. I
>> can understand the "catchup" mechanism is probably not built yet,
>> but clearly the two devices don't have the same generation. The lower
>> generation device should probably be booted/ignored or declared missing
>> in the meantime to prevent trashing the file system.
> 
> The problem with generation numbers is when both devices get divergent
> generation numbers but we can't tell them apart, e.g.
> 
> 	1.  sda generation = 5, sdb generation = 5
> 
> 	2.  sdb temporarily disconnects, so we are degraded on just sda
> 
> 	3.  sda gets more generations 6..9
> 
> 	4.  sda temporarily disconnects, so we have no disks at all.
> 
> 	5.  the machine reboots, gets sdb back but not sda
> 
> If we allow degraded here, then:
> 
> 	6.  sdb gets more generations 6..9
> 
> 	7.  sdb disconnects, no disks so no filesystem
> 
> 	8.  the machine reboots again, this time with sda and sdb present
> 
> Now we have two disks with equal generation numbers.  Generations 6..9
> on sda are not the same as generations 6..9 on sdb, so if we mix the
> two disks' metadata we get bad confusion.
> 
> It needs to be more than a sequential number.  If one of the disks
> disappears we need to record this fact on the surviving disks, and also
> cope with _both_ disks claiming to be the "surviving" one.

I agree this is also a problem. But the most common case is where we know that sda generation is newer (larger value) and most recently modified, and sdb has not since been modified but needs to be caught up. As far as I know the only way to do that on Btrfs right now is a full balance, it doesn't catch up just be being reconnected with a normal mount.


Chris Murphy

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: filesystem corruption
  2014-11-04 18:28                 ` Chris Murphy
@ 2014-11-04 21:44                   ` Duncan
  2014-11-04 22:19                   ` Robert White
  2014-11-04 22:34                   ` Zygo Blaxell
  2 siblings, 0 replies; 89+ messages in thread
From: Duncan @ 2014-11-04 21:44 UTC (permalink / raw)
  To: linux-btrfs

Chris Murphy posted on Tue, 04 Nov 2014 11:28:39 -0700 as excerpted:

>> It needs to be more than a sequential number.  If one of the disks
>> disappears we need to record this fact on the surviving disks, and also
>> cope with _both_ disks claiming to be the "surviving" one.
> 
> I agree this is also a problem. But the most common case is where we
> know that sda generation is newer (larger value) and most recently
> modified, and sdb has not since been modified but needs to be caught up.
> As far as I know the only way to do that on Btrfs right now is a full
> balance, it doesn't catch up just be being reconnected with a normal
> mount.

I thought it was a scrub that would take care of that, not a balance?

(Maybe do both to be sure?)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: filesystem corruption
  2014-11-04 18:28                 ` Chris Murphy
  2014-11-04 21:44                   ` Duncan
@ 2014-11-04 22:19                   ` Robert White
  2014-11-04 22:34                   ` Zygo Blaxell
  2 siblings, 0 replies; 89+ messages in thread
From: Robert White @ 2014-11-04 22:19 UTC (permalink / raw)
  To: Chris Murphy, Zygo Blaxell; +Cc: Btrfs BTRFS

On 11/04/2014 10:28 AM, Chris Murphy wrote:
> On Nov 3, 2014, at 9:31 PM, Zygo Blaxell <zblaxell@furryterror.org> wrote:
>> Now we have two disks with equal generation numbers.  Generations 6..9
>> on sda are not the same as generations 6..9 on sdb, so if we mix the
>> two disks' metadata we get bad confusion.
>>
>> It needs to be more than a sequential number.  If one of the disks
>> disappears we need to record this fact on the surviving disks, and also
>> cope with _both_ disks claiming to be the "surviving" one.
>
> I agree this is also a problem. But the most common case is where we know that sda generation is newer (larger value) and most recently modified, and sdb has not since been modified but needs to be caught up. As far as I know the only way to do that on Btrfs right now is a full balance, it doesn't catch up just be being reconnected with a normal mount.

I would think that any time any system or fraction thereof is mounted 
with both a "degraded" and "rw", status a degraded flag should be set 
somewhere/somehow in the superblock etc.

The only way to clear this flag would be to reach a "reconciled" state. 
That state could be reached in one of several ways. Removing the missing 
mirror element would be a fast reconcile, doing a balance or scrub would 
be a slow reconcile for a filessytem where all the media are returned to 
service (e.g. the missing volume of a RAID 1 etc is returned.)

Generation numbers are pretty good, but I'd put on a rider that any 
generation number or equivelant incremented while the system is degraded 
should have a unique quanta (say a GUID) generated and stored along with 
the generation number. The mere existence of this quanta would act as 
the degraded flag.

Any check/compare/access related to the generation number would know to 
notice that the GUID is in place and do the necessary resolution. If 
successful the GUID would be discarded.

As to how this could be implemented, I'm not fully conversant on the 
internal layout.

One possibility would be to add a block reference, or, indeed replace 
the current storage for generation numbers completely with block 
reference to a block containing the generation number and the potential 
GUID. The main value of having an out-of-structure reference is that its 
content is less space constrained, and it could be shared by multiple 
usages. In the case, for instance, where the block is added (as opposed 
to replacing the generation number) only one such block would be needed 
per degraded,rw mount, and it could be attached to as many filesystem 
structures as needed.

Just as metadata under DUP is divergent after a degraded mount, a 
generation block wold be divergent, and likely in a different location 
than its peers on a subsequent restored geometry.

A gerenation block could have other nicities like the date/time and the 
devices present (or absent); such information could conceivably be used 
to intellegently disambiguate references. For instance if one degraded 
mount had sda and sdb, and second had sdb and sdc, then itd be known 
that sdb was dominant for having been present every time.

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: filesystem corruption
  2014-11-04 18:28                 ` Chris Murphy
  2014-11-04 21:44                   ` Duncan
  2014-11-04 22:19                   ` Robert White
@ 2014-11-04 22:34                   ` Zygo Blaxell
  2 siblings, 0 replies; 89+ messages in thread
From: Zygo Blaxell @ 2014-11-04 22:34 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 1414 bytes --]

On Tue, Nov 04, 2014 at 11:28:39AM -0700, Chris Murphy wrote:
> On Nov 3, 2014, at 9:31 PM, Zygo Blaxell <zblaxell@furryterror.org> wrote:
> > It needs to be more than a sequential number.  If one of the disks
> > disappears we need to record this fact on the surviving disks, and also
> > cope with _both_ disks claiming to be the "surviving" one.
> 
> I agree this is also a problem. But the most common case is where we
> know that sda generation is newer (larger value) and most recently
> modified, and sdb has not since been modified but needs to be caught
> up. As far as I know the only way to do that on Btrfs right now is
> a full balance, it doesn't catch up just be being reconnected with a
> normal mount.

The data on the disks might be inconistent, so resynchronization must
read from only the "good" copy.  A balance could just spread corruption
around if it reads from two out-of-sync mirrors.  (Maybe it already does
the right thing if sdb was not modified...?).

The full resync operation is more like btrfs device replace, except that
it's replacing a disk in-place (i.e. without removing it first), and it
would not read from the non-"good" disk.

> 
> Chris Murphy--
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Filesystem Corruption
@ 2018-12-03  9:31 Stefan Malte Schumacher
  2018-12-03 11:34 ` Qu Wenruo
  2018-12-03 16:29 ` remi
  0 siblings, 2 replies; 89+ messages in thread
From: Stefan Malte Schumacher @ 2018-12-03  9:31 UTC (permalink / raw)
  To: Btrfs BTRFS

Hello,

I have noticed an unusual amount of crc-errors in downloaded rars,
beginning about a week ago. But lets start with the preliminaries. I
am using Debian Stretch.
Kernel: Linux mars 4.9.0-8-amd64 #1 SMP Debian 4.9.110-3+deb9u4
(2018-08-21) x86_64 GNU/Linux
BTRFS-Tools btrfs-progs  4.7.3-1
Smartctl shows no errors for any of the drives in the filesystem.

Btrfs /dev/stats shows zero errors, but dmesg gives me a lot of
filesystem related error messages.

[5390748.884929] Buffer I/O error on dev dm-0, logical block
976701312, async page read
This errors is shown a lot of time in the log.

This seems to affect just newly written files. This is the output of
btrfs scrub status:
scrub status for 1609e4e1-4037-4d31-bf12-f84a691db5d8
        scrub started at Tue Nov 27 06:02:04 2018 and finished after 07:34:16
        total bytes scrubbed: 17.29TiB with 0 errors

What is the probable cause of these errors? How can I fix this?

Thanks in advance for your advice
Stefan

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem Corruption
  2018-12-03  9:31 Filesystem Corruption Stefan Malte Schumacher
@ 2018-12-03 11:34 ` Qu Wenruo
  2018-12-03 16:29 ` remi
  1 sibling, 0 replies; 89+ messages in thread
From: Qu Wenruo @ 2018-12-03 11:34 UTC (permalink / raw)
  To: Stefan Malte Schumacher, Btrfs BTRFS


[-- Attachment #1.1: Type: text/plain, Size: 1387 bytes --]



On 2018/12/3 下午5:31, Stefan Malte Schumacher wrote:
> Hello,
> 
> I have noticed an unusual amount of crc-errors in downloaded rars,
> beginning about a week ago. But lets start with the preliminaries. I
> am using Debian Stretch.
> Kernel: Linux mars 4.9.0-8-amd64 #1 SMP Debian 4.9.110-3+deb9u4
> (2018-08-21) x86_64 GNU/Linux
> BTRFS-Tools btrfs-progs  4.7.3-1
> Smartctl shows no errors for any of the drives in the filesystem.
> 
> Btrfs /dev/stats shows zero errors, but dmesg gives me a lot of
> filesystem related error messages.
> 
> [5390748.884929] Buffer I/O error on dev dm-0, logical block
> 976701312, async page read
> This errors is shown a lot of time in the log.

No "btrfs:" prefix, looks more like an error message from block level,
no wonder btrfs shows no error at all.

What is the underlying device mapper?

And further more, is there any kernel message with "btrfs"
(case-insensitive) in it?

Thanks,
Qu
> 
> This seems to affect just newly written files. This is the output of
> btrfs scrub status:
> scrub status for 1609e4e1-4037-4d31-bf12-f84a691db5d8
>         scrub started at Tue Nov 27 06:02:04 2018 and finished after 07:34:16
>         total bytes scrubbed: 17.29TiB with 0 errors
> 
> What is the probable cause of these errors? How can I fix this?
> 
> Thanks in advance for your advice
> Stefan
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Filesystem Corruption
  2018-12-03  9:31 Filesystem Corruption Stefan Malte Schumacher
  2018-12-03 11:34 ` Qu Wenruo
@ 2018-12-03 16:29 ` remi
  1 sibling, 0 replies; 89+ messages in thread
From: remi @ 2018-12-03 16:29 UTC (permalink / raw)
  To: Stefan Malte Schumacher, Btrfs BTRFS

On Mon, Dec 3, 2018, at 4:31 AM, Stefan Malte Schumacher wrote:

> I have noticed an unusual amount of crc-errors in downloaded rars,
> beginning about a week ago. But lets start with the preliminaries. I
> am using Debian Stretch.
> Kernel: Linux mars 4.9.0-8-amd64 #1 SMP Debian 4.9.110-3+deb9u4
> (2018-08-21) x86_64 GNU/Linux
> 
> [5390748.884929] Buffer I/O error on dev dm-0, logical block
> 976701312, async page read

Excuse me for butting when there are *many* more qualified people on this list.

But assuming the rar crc errors are related to your unexplained buffer I/O errors, (and not some weird coincidence of simply bad downloads.), I would start, immediately, by testing the Memory.  Ram corruption can wreak havok with btrfs, (any filesystem but I think BTRFS has special challenges in this regard.)  and this looks like memory error to me.

^ permalink raw reply	[flat|nested] 89+ messages in thread

end of thread, other threads:[~2018-12-03 16:29 UTC | newest]

Thread overview: 89+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-10-31  0:29 filesystem corruption Tobias Holst
2014-10-31  1:02 ` Tobias Holst
2014-10-31  2:41   ` Rich Freeman
2014-10-31 17:34     ` Tobias Holst
2014-11-02  4:49       ` Robert White
2014-11-02 21:57         ` Chris Murphy
2014-11-03  3:43           ` Zygo Blaxell
2014-11-03 17:11             ` Chris Murphy
2014-11-04  4:31               ` Zygo Blaxell
2014-11-04  8:25                 ` Duncan
2014-11-04 18:28                 ` Chris Murphy
2014-11-04 21:44                   ` Duncan
2014-11-04 22:19                   ` Robert White
2014-11-04 22:34                   ` Zygo Blaxell
2014-11-03  2:55         ` Tobias Holst
2014-11-03  3:49           ` Robert White
  -- strict thread matches above, loose matches on Subject: below --
2018-12-03  9:31 Filesystem Corruption Stefan Malte Schumacher
2018-12-03 11:34 ` Qu Wenruo
2018-12-03 16:29 ` remi
2011-01-03  1:58 filesystem corruption Patrick H.
2011-01-03  3:16 ` Neil Brown
     [not found]   ` <4D214B5C.3010103@feystorm.net>
2011-01-03  4:56     ` Neil Brown
2011-01-03  5:05       ` Patrick H.
2011-01-04  5:33         ` NeilBrown
2011-01-04  7:50           ` Patrick H.
2011-01-04 17:31             ` Patrick H.
2011-01-05  1:22               ` Patrick H.
2011-01-05  7:02   ` CoolCold
     [not found]   ` <AANLkTinL_nz58f8rSPuhYvVwGY5jdu1XVkNLC1ky5A65@mail.gmail.com>
2011-01-05 14:28     ` Patrick H.
2011-01-05 15:52       ` Spelic
2011-01-05 15:55         ` Patrick H.
2007-06-06  3:10 Filesystem corruption Xu CanHao
2007-06-06 12:16 ` Ingo Bormuth
2007-05-30 20:13 devsk
2007-05-30 17:22 devsk
2007-05-30 19:24 ` Toby Thain
2007-05-30 20:03 ` David Masover
2007-05-31  0:11   ` Ingo Bormuth
2007-06-02 23:10     ` Edward Shishkin
2007-06-04  2:55       ` Ingo Bormuth
2007-06-04  9:41         ` Edward Shishkin
2007-06-05 23:20           ` Ingo Bormuth
2007-05-27 13:18 Laurent CARON
2007-05-28 12:23 ` Vladimir V. Saveliev
2007-05-28 14:10   ` Laurent CARON
2007-05-28 17:13     ` Vladimir V. Saveliev
2007-05-28 17:27       ` Laurent CARON
     [not found] ` <Pine.LNX.4.64.0705280025570.10429@sheep.housecafe.de>
2007-05-28 17:31   ` Christian Kujau
2007-05-28 18:16     ` Laurent CARON
2007-05-28 23:19       ` Christian Kujau
2007-05-29  8:39       ` Vladimir V. Saveliev
     [not found] ` <465BA9AC.8040805@ultraviolet.org>
2007-05-29  8:15   ` Vladimir V. Saveliev
2007-05-29 12:36     ` Toby Thain
2007-05-30 13:25       ` David Masover
2007-05-30 16:02         ` Vladimir V. Saveliev
2007-05-30 20:06           ` David Masover
2007-05-30 16:42         ` Toby Thain
2007-05-30 19:42           ` David Masover
2007-05-30 16:08       ` Vladimir V. Saveliev
2003-08-13 16:05 Locke
2003-08-14  7:49 ` Oleg Drokin
2002-09-05 15:57 Filesystem Corruption Brian Tinsley
2002-06-06 18:00 Kurt
2002-06-06 18:00 Kurt
2002-06-06 18:00 Kurt
2002-06-06 18:00 Kurt
2002-06-06 18:00 Kurt
2002-06-06 18:00 Kurt
2002-06-06 18:00 Kurt
2002-06-07  7:15 ` Oleg Drokin
2002-06-11 16:49   ` Kurt
2002-06-06 18:00 Kurt
2002-06-06 18:00 Kurt
2002-06-06 18:00 Kurt
2002-06-06 18:00 Kurt
2001-02-05 16:00 Filesystem corruption Ian Chilton
2001-02-05 13:16 Ian Chilton
2001-01-31 14:20 Carsten Langgaard
2001-01-31 15:52 ` Florian Lohoff
2001-01-31 16:24   ` Carsten Langgaard
2001-01-31 16:48     ` Florian Lohoff
2001-02-05 10:02 ` Ralf Baechle
2001-02-05 12:10   ` Alan Cox
2001-02-05 12:10     ` Alan Cox
2001-02-05 12:56     ` Geert Uytterhoeven
2001-02-05 13:01       ` Alan Cox
2001-02-05 13:01         ` Alan Cox
2001-02-05 22:01         ` Ralf Baechle
2001-02-05 22:01           ` Ralf Baechle

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.