Linux Btrfs filesystem development
* unexplainable corruptions 3.17.0
@ 2014-10-16  9:17 Tomasz Torcz
  2014-10-17  8:02 ` Liu Bo
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Tomasz Torcz @ 2014-10-16  9:17 UTC (permalink / raw)
  To: linux-btrfs

Hi,

  Recently I've observed some corruption in systemd's journal
files which is somewhat puzzling. This is especially worrying
as this is a btrfs raid1 setup and I expected auto-healing.

  System details: 3.17.0-301.fc21.x86_64
btrfs: raid1 over 2x dm-crypted 6TB HDDs.
mount opts: rw,relatime,seclabel,compress=lzo,space_cache

  Broken files are in /var/log/journal directory. This directory
is set NOCOW with chattr, all the files within too.

Example of broken file:
system@0005057fe87730cf-6d3d85ed59bd70ae.journal~

When the file is read with dd_rescue, many I/O errors are
reported; the summary looks like this (x = error):
>-..-..xxxxxxxxx---x.-..-..-...-..-..-...-< 100%

  Reads with cat and hexdump fail with:
read(4, 0x1001000, 65536)               = -1 EIO (Input/output error)

  But btrfs dev stat reports no errors!
$ btrfs dev stat .
[/dev/dm-0].write_io_errs   0
[/dev/dm-0].read_io_errs    0
[/dev/dm-0].flush_io_errs   0
[/dev/dm-0].corruption_errs 0
[/dev/dm-0].generation_errs 0
[/dev/dm-1].write_io_errs   0
[/dev/dm-1].read_io_errs    0
[/dev/dm-1].flush_io_errs   0
[/dev/dm-1].corruption_errs 0
[/dev/dm-1].generation_errs 0

  There are no hardware errors in dmesg.

  This is perplexing.  How can I find out what is causing the
brokenness, and how can I avoid it in the future?

-- 
Tomasz   .. oo o.   oo o. .o   .o o. o. oo o.   ..
Torcz    .. .o .o   .o .o oo   oo .o .. .. oo   oo
o.o.o.   .o .. o.   o. o. o.   o. o. oo .. ..   o.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: unexplainable corruptions 3.17.0
  2014-10-16  9:17 unexplainable corruptions 3.17.0 Tomasz Torcz
@ 2014-10-17  8:02 ` Liu Bo
  2014-10-17  8:10   ` Tomasz Torcz
                     ` (2 more replies)
  2014-10-17  8:17 ` Marc Dietrich
  2014-10-17 15:01 ` Chris Murphy
  2 siblings, 3 replies; 21+ messages in thread
From: Liu Bo @ 2014-10-17  8:02 UTC (permalink / raw)
  To: Tomasz Torcz, linux-btrfs

On Thu, Oct 16, 2014 at 11:17:26AM +0200, Tomasz Torcz wrote:
> Hi,
> 
>   Recently I've observed some corruptions to systemd's journal
> files which are somewhat puzzling. This is especially worrying
> as this is btrfs raid1 setup and I expected auto-healing.
> 
>   System details: 3.17.0-301.fc21.x86_64
> btrfs: raid1 over 2x dm-crypted 6TB HDDs.
> mount opts: rw,relatime,seclabel,compress=lzo,space_cache
> 
>   Broken files are in /var/log/journal directory. This directory
> is set NOCOW with chattr, all the files within too.
> 
> Example of broken file:
> system@0005057fe87730cf-6d3d85ed59bd70ae.journal~
> 
> When read with dd_rescue, there are many I/O errors
> reported, the summary looks like that (x = error):
> >-..-..xxxxxxxxx---x.-..-..-...-..-..-...-< 100%
> 
>   Reads with cat, hexdump fails with:
> read(4, 0x1001000, 65536)               = -1 EIO (Input/output error)
> 
>   But btrfs dev stat reports no errors!
> $ btrfs dev stat .
> [/dev/dm-0].write_io_errs   0
> [/dev/dm-0].read_io_errs    0
> [/dev/dm-0].flush_io_errs   0
> [/dev/dm-0].corruption_errs 0
> [/dev/dm-0].generation_errs 0
> [/dev/dm-1].write_io_errs   0
> [/dev/dm-1].read_io_errs    0
> [/dev/dm-1].flush_io_errs   0
> [/dev/dm-1].corruption_errs 0
> [/dev/dm-1].generation_errs 0
> 
>   There are no hardware errors in dmesg.
> 
>   This is perplexing.  How to find out what is causing the
> brokeness and howto avoid it in the future?

Does scrub work for you?

thanks,
-liubo


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: unexplainable corruptions 3.17.0
  2014-10-17  8:02 ` Liu Bo
@ 2014-10-17  8:10   ` Tomasz Torcz
  2014-10-17  8:17     ` Hugo Mills
  2014-10-17  8:29     ` Liu Bo
  2014-10-17 11:38   ` Duncan
  2014-10-17 17:29   ` Tomasz Torcz
  2 siblings, 2 replies; 21+ messages in thread
From: Tomasz Torcz @ 2014-10-17  8:10 UTC (permalink / raw)
  To: linux-btrfs

On Fri, Oct 17, 2014 at 04:02:03PM +0800, Liu Bo wrote:
> >   Recently I've observed some corruptions to systemd's journal
> > files which are somewhat puzzling. This is especially worrying
> > as this is btrfs raid1 setup and I expected auto-healing.
> > 
> >   System details: 3.17.0-301.fc21.x86_64
> > btrfs: raid1 over 2x dm-crypted 6TB HDDs.
> > mount opts: rw,relatime,seclabel,compress=lzo,space_cache
> >   Reads with cat, hexdump fails with:
> > read(4, 0x1001000, 65536)               = -1 EIO (Input/output error)
> > 
> Does scrub work for you?

  As there seems to be no way to scrub individual files, I've started
a scrub of the full volume.  It will take some hours to finish.

  Meanwhile, could you satisfy my curiosity: what would scrub do that
wouldn't be done by just reading the whole file?

-- 
Tomasz Torcz               "Never underestimate the bandwidth of a station
xmpp: zdzichubg@chrome.pl    wagon filled with backup tapes." -- Jim Gray


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: unexplainable corruptions 3.17.0
  2014-10-16  9:17 unexplainable corruptions 3.17.0 Tomasz Torcz
  2014-10-17  8:02 ` Liu Bo
@ 2014-10-17  8:17 ` Marc Dietrich
  2014-10-17 15:01 ` Chris Murphy
  2 siblings, 0 replies; 21+ messages in thread
From: Marc Dietrich @ 2014-10-17  8:17 UTC (permalink / raw)
  To: Tomasz Torcz; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 943 bytes --]

Am Donnerstag, 16. Oktober 2014, 11:17:26 schrieb Tomasz Torcz:
> Hi,
> 
>   Recently I've observed some corruptions to systemd's journal
> files which are somewhat puzzling. This is especially worrying
> as this is btrfs raid1 setup and I expected auto-healing.
> 
>   System details: 3.17.0-301.fc21.x86_64
> btrfs: raid1 over 2x dm-crypted 6TB HDDs.
> mount opts: rw,relatime,seclabel,compress=lzo,space_cache
> 
>   Broken files are in /var/log/journal directory. This directory
> is set NOCOW with chattr, all the files within too.
> 
> Example of broken file:
> system@0005057fe87730cf-6d3d85ed59bd70ae.journal~
> 
> When read with dd_rescue, there are many I/O errors
> reported, the summary looks like that (x = error):
> >-..-..xxxxxxxxx---x.-..-..-...-..-..-...-< 100%

sounds like
   https://patchwork.kernel.org/patch/4929981/
to me. We urgently need some stable patches or people will quickly corrupt 
their filesystems.

Marc

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: unexplainable corruptions 3.17.0
  2014-10-17  8:10   ` Tomasz Torcz
@ 2014-10-17  8:17     ` Hugo Mills
  2014-10-20 14:04       ` Zygo Blaxell
  2014-10-17  8:29     ` Liu Bo
  1 sibling, 1 reply; 21+ messages in thread
From: Hugo Mills @ 2014-10-17  8:17 UTC (permalink / raw)
  To: Tomasz Torcz, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1304 bytes --]

On Fri, Oct 17, 2014 at 10:10:09AM +0200, Tomasz Torcz wrote:
> On Fri, Oct 17, 2014 at 04:02:03PM +0800, Liu Bo wrote:
> > >   Recently I've observed some corruptions to systemd's journal
> > > files which are somewhat puzzling. This is especially worrying
> > > as this is btrfs raid1 setup and I expected auto-healing.
> > > 
> > >   System details: 3.17.0-301.fc21.x86_64
> > > btrfs: raid1 over 2x dm-crypted 6TB HDDs.
> > > mount opts: rw,relatime,seclabel,compress=lzo,space_cache
> > >   Reads with cat, hexdump fails with:
> > > read(4, 0x1001000, 65536)               = -1 EIO (Input/output error)
> > > 
> > Does scrub work for you?
> 
>   As there seem to be no way to scrub individual files, I've started
> scrub of full volume.  It will take some hours to finish.
> 
>   Meanwhile, could you satisfy my curiosity what would scrub do that
> wouldn't be done by just reading the whole file?

   It checks both copies. Reading the file will only read one of the
copies of any given block (so if that's good and the other copy is
bad, it won't fix anything).
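To make Hugo's distinction concrete, here is a minimal sketch assuming a toy RAID1 model: each block has two in-memory copies and one checksum recorded at write time, with zlib.crc32 standing in for the crc32c that btrfs uses for data. `Block`, `read_block` and `scrub_block` are hypothetical names, not btrfs functions. A read stops at the first copy that verifies, so a bad second copy is never looked at; a scrub verifies every copy.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Block:
    copies: list  # one bytes object per RAID1 mirror
    csum: int     # checksum recorded when the block was written


def read_block(block, preferred=0):
    """Return data from the first mirror (starting at `preferred`) whose checksum matches."""
    n = len(block.copies)
    for i in range(n):
        data = block.copies[(preferred + i) % n]
        if zlib.crc32(data) == block.csum:
            return data  # later mirrors are never examined
    raise IOError("EIO: no mirror has a matching checksum")


def scrub_block(block):
    """Verify every mirror; return the indices of copies that fail the checksum."""
    return [i for i, data in enumerate(block.copies) if zlib.crc32(data) != block.csum]


good = b"journal entry"
blk = Block(copies=[good, b"garbage bytes"], csum=zlib.crc32(good))
print(read_block(blk))   # b'journal entry' (mirror 1 is never read)
print(scrub_block(blk))  # [1] (scrub finds the bad second copy)
```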

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
              --- The future isn't what it used to be. ---               

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: unexplainable corruptions 3.17.0
  2014-10-17  8:10   ` Tomasz Torcz
  2014-10-17  8:17     ` Hugo Mills
@ 2014-10-17  8:29     ` Liu Bo
  2014-10-17  8:54       ` Tomasz Torcz
  1 sibling, 1 reply; 21+ messages in thread
From: Liu Bo @ 2014-10-17  8:29 UTC (permalink / raw)
  To: Tomasz Torcz, linux-btrfs

On Fri, Oct 17, 2014 at 10:10:09AM +0200, Tomasz Torcz wrote:
> On Fri, Oct 17, 2014 at 04:02:03PM +0800, Liu Bo wrote:
> > >   Recently I've observed some corruptions to systemd's journal
> > > files which are somewhat puzzling. This is especially worrying
> > > as this is btrfs raid1 setup and I expected auto-healing.
> > > 
> > >   System details: 3.17.0-301.fc21.x86_64
> > > btrfs: raid1 over 2x dm-crypted 6TB HDDs.
> > > mount opts: rw,relatime,seclabel,compress=lzo,space_cache
> > >   Reads with cat, hexdump fails with:
> > > read(4, 0x1001000, 65536)               = -1 EIO (Input/output error)
> > > 
> > Does scrub work for you?
> 
>   As there seem to be no way to scrub individual files, I've started
> scrub of full volume.  It will take some hours to finish.
> 
>   Meanwhile, could you satisfy my curiosity what would scrub do that
> wouldn't be done by just reading the whole file?

(Hugo has answered that in this thread.)

Well, I don't know exactly what the cause is, but since the file is NOCOW, data is
written in place. Have you experienced a hard reboot or something similar recently?

And are there any messages in the dmesg log when you get EIO reading the file?

thanks,
-liubo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: unexplainable corruptions 3.17.0
  2014-10-17  8:29     ` Liu Bo
@ 2014-10-17  8:54       ` Tomasz Torcz
  2014-10-17 12:53         ` Chris Mason
  0 siblings, 1 reply; 21+ messages in thread
From: Tomasz Torcz @ 2014-10-17  8:54 UTC (permalink / raw)
  To: linux-btrfs

On Fri, Oct 17, 2014 at 04:29:36PM +0800, Liu Bo wrote:
> On Fri, Oct 17, 2014 at 10:10:09AM +0200, Tomasz Torcz wrote:
> > On Fri, Oct 17, 2014 at 04:02:03PM +0800, Liu Bo wrote:
> > > >   Recently I've observed some corruptions to systemd's journal
> > > > files which are somewhat puzzling. This is especially worrying
> > > > as this is btrfs raid1 setup and I expected auto-healing.
> > > > read(4, 0x1001000, 65536)               = -1 EIO (Input/output error)
> 
> Well..I don't know exactly what's the cause, but as the file is NOCOW, it writes
> data in place, have you experienced a hard reboot or something recently?
 
  Nothing like that.  The server is on a UPS; there were a couple of normal
shutdowns this year (a few kernel upgrades).

> And any message in dmesg log while getting EIO by reading the file?

  Nothing in dmesg, no btrfs messages, no SCSI/SATA errors, nothing. That's
why I find those corruptions mysterious.
  Maybe there is some way to inspect the internal btrfs state and find out what
is causing the problems?  Or maybe this is related to the patch mentioned in this thread?

-- 
Tomasz Torcz               "Never underestimate the bandwidth of a station
xmpp: zdzichubg@chrome.pl    wagon filled with backup tapes." -- Jim Gray


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: unexplainable corruptions 3.17.0
  2014-10-17  8:02 ` Liu Bo
  2014-10-17  8:10   ` Tomasz Torcz
@ 2014-10-17 11:38   ` Duncan
  2014-10-17 15:07     ` Chris Murphy
  2014-10-17 17:29   ` Tomasz Torcz
  2 siblings, 1 reply; 21+ messages in thread
From: Duncan @ 2014-10-17 11:38 UTC (permalink / raw)
  To: linux-btrfs

Liu Bo posted on Fri, 17 Oct 2014 16:02:03 +0800 as excerpted:

> On Thu, Oct 16, 2014 at 11:17:26AM +0200, Tomasz Torcz wrote:
>> Hi,
>> 
>>   Recently I've observed some corruptions to systemd's journal
>> files which are somewhat puzzling. This is especially worrying as this
>> is btrfs raid1 setup and I expected auto-healing.
>> 
>>   System details: 3.17.0-301.fc21.x86_64
>> btrfs: raid1 over 2x dm-crypted 6TB HDDs.
>> mount opts: rw,relatime,seclabel,compress=lzo,space_cache
>> 
>>   Broken files are in /var/log/journal directory. This directory
>> is set NOCOW with chattr, all the files within too.
> 
> Does scrub work for you?

NOCOW implies no checksum, so scrub shouldn't be able to help.
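Duncan's point can be illustrated with a toy model (not btrfs code; `read_nocsum` and `scrub_errors` are hypothetical names, and zlib.crc32 stands in for crc32c). When no checksum was recorded for the data, as with NOCOW files, the read path cannot tell a good mirror from a bad one and returns whichever copy it picks, and a checksum-based scrub has nothing to verify, so it reports zero errors. The sketch does not explain the EIO seen in this thread; it only shows why scrub cannot help with NOCOW data.

```python
import zlib
from typing import Optional


def read_nocsum(copies, csum: Optional[int], preferred=0):
    """Read-path sketch: verify only if a checksum was recorded."""
    if csum is None:
        # NOCOW / no checksum: nothing to verify against, return the chosen copy as-is
        return copies[preferred]
    for data in copies[preferred:] + copies[:preferred]:
        if zlib.crc32(data) == csum:
            return data
    raise IOError("EIO: no copy matches the checksum")


def scrub_errors(copies, csum: Optional[int]):
    """Count checksum errors; with no checksum there is nothing to count."""
    if csum is None:
        return 0
    return sum(zlib.crc32(d) != csum for d in copies)


good = b"journal entry"
copies = [b"corrupted!!!", good]                 # mirror 0 silently differs (simulated)
print(read_nocsum(copies, None))                 # b'corrupted!!!'
print(scrub_errors(copies, None))                # 0
print(read_nocsum(copies, zlib.crc32(good)))     # b'journal entry'
print(scrub_errors(copies, zlib.crc32(good)))    # 1
```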

Some time back people were reporting problems with corrupted journald 
journal files, but I've seen no such reports in a long time.

This isn't likely much help for your (OP's) use-case, but FWIW, here's 
what I did with journald.

When I switched to systemd here, I set it to volatile storage only, and 
kept syslog-ng set up for longer-term storage.  I arranged things so 
journald's volatile logs had enough room to grow for a normal single 
session in the /run/log tmpfs.  That gives me the nice journald/systemd 
integration, systemctl status reporting the last few log entries for a 
specific service, etc.

But everything still gets passed to syslog-ng as well (which, being on 
Gentoo, I built with the systemd USE flag so it integrates nicely), and that 
spits out my normal text logs just as I had it set up to do long before 
systemd ever came along.  It's those that I keep on non-volatile storage 
so they stick around through a reboot, and they play nicely with btrfs, so 
I've not had to worry about what journald's binary files might do.

Btw, unless you have a need for relatime, noatime is strongly recommended 
for btrfs.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: unexplainable corruptions 3.17.0
  2014-10-17  8:54       ` Tomasz Torcz
@ 2014-10-17 12:53         ` Chris Mason
  2014-10-17 18:09           ` Rich Freeman
  2014-10-20 19:09           ` Tomasz Torcz
  0 siblings, 2 replies; 21+ messages in thread
From: Chris Mason @ 2014-10-17 12:53 UTC (permalink / raw)
  To: Tomasz Torcz; +Cc: linux-btrfs

On Fri, Oct 17, 2014 at 4:54 AM, Tomasz Torcz <tomek@pipebreaker.pl> wrote:
> On Fri, Oct 17, 2014 at 04:29:36PM +0800, Liu Bo wrote:
>>  On Fri, Oct 17, 2014 at 10:10:09AM +0200, Tomasz Torcz wrote:
>>  > On Fri, Oct 17, 2014 at 04:02:03PM +0800, Liu Bo wrote:
>>  > > >   Recently I've observed some corruptions to systemd's journal
>>  > > > files which are somewhat puzzling. This is especially worrying
>>  > > > as this is btrfs raid1 setup and I expected auto-healing.
>>  > > > read(4, 0x1001000, 65536)               = -1 EIO (Input/output error)
>> 
>>  Well..I don't know exactly what's the cause, but as the file is NOCOW, it writes
>>  data in place, have you experienced a hard reboot or something recently?
> 
>   Nothing like that.  Server is on an UPS, there were couple normal shutdowns
> this year (few kernel upgrades).
> 
>>  And any message in dmesg log while getting EIO by reading the file?
> 
>   Nothing in dmesg, no btrfs messages, no SCSI/SATA errors, nothing. That's
> why I find those corruptions mysterious.
>   Maybe there is some way to inspect internal btrfs state and find out what
> causing the problems?  Or maybe this is related to patch mentioned in this thread?

This sounds like the problem fixed by some patches to our extent 
mapping code that went in during the merge window.  I've cherry-picked a 
few for stable and I'm running them through tests now.  They are in my 
stable-3.17 branch, and I'll send them to Greg once Linus grabs the revert 
for the last one.

But, if you want to try that branch out, it may fix this EIO.  
Otherwise we'll start sending you debugging.

-chris




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: unexplainable corruptions 3.17.0
  2014-10-16  9:17 unexplainable corruptions 3.17.0 Tomasz Torcz
  2014-10-17  8:02 ` Liu Bo
  2014-10-17  8:17 ` Marc Dietrich
@ 2014-10-17 15:01 ` Chris Murphy
  2014-10-20 19:10   ` Tomasz Torcz
  2 siblings, 1 reply; 21+ messages in thread
From: Chris Murphy @ 2014-10-17 15:01 UTC (permalink / raw)
  To: Btrfs BTRFS


On Oct 16, 2014, at 5:17 AM, Tomasz Torcz <tomek@pipebreaker.pl> wrote:
> 
>  Broken files are in /var/log/journal directory. This directory
> is set NOCOW with chattr, all the files within too.
> 
> Example of broken file:
> system@0005057fe87730cf-6d3d85ed59bd70ae.journal~

What do you get for 'journalctl --verify'? I'm curious whether any journal files are considered corrupt by journalctl, and whether journalctl and dd_rescue agree on which journals are good and which are bad.

> 
> When read with dd_rescue, there are many I/O errors
> reported, the summary looks like that (x = error):
> >-..-..xxxxxxxxx---x.-..-..-...-..-..-...-< 100%
> 
>  Reads with cat, hexdump fails with:
> read(4, 0x1001000, 65536)               = -1 EIO (Input/output error)

Yeah, weird. In either case I'd expect a kernel message, whether it's a Btrfs problem or a hardware problem.

Chris Murphy


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: unexplainable corruptions 3.17.0
  2014-10-17 11:38   ` Duncan
@ 2014-10-17 15:07     ` Chris Murphy
  0 siblings, 0 replies; 21+ messages in thread
From: Chris Murphy @ 2014-10-17 15:07 UTC (permalink / raw)
  To: Btrfs BTRFS


On Oct 17, 2014, at 7:38 AM, Duncan <1i5t5.duncan@cox.net> wrote:
> 
> When I switched to systemd here, I set it to volatile storage only, and 
> kept syslog-ng setup for longer term storage.  I arranged things so 
> journald's volatile logs had enough room to grow for a normal single 
> session in the /run/log tmpfs.  That gives me the nice journald systemd 
> integration, systemctl status reporting the last few log entries for a 
> specific service, etc.
> 
> But everything still gets passed to syslog-ng 

For the uninitiated: to do the above, delete /var/log/journal and install the syslog daemon of your choice (one that is systemd-journald compatible, of course). That's it. With /var/log/journal gone, systemd-journald writes its logs to /run/log/journal instead.


Chris Murphy


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: unexplainable corruptions 3.17.0
  2014-10-17  8:02 ` Liu Bo
  2014-10-17  8:10   ` Tomasz Torcz
  2014-10-17 11:38   ` Duncan
@ 2014-10-17 17:29   ` Tomasz Torcz
  2 siblings, 0 replies; 21+ messages in thread
From: Tomasz Torcz @ 2014-10-17 17:29 UTC (permalink / raw)
  To: linux-btrfs

On Fri, Oct 17, 2014 at 04:02:03PM +0800, Liu Bo wrote:
> 
> Does scrub work for you?
> 

  Scrub ended with no errors:
scrub status for a4f339d4-c129-4485-acc1-1233d29c665d
        scrub started at Fri Oct 17 10:04:24 2014 and finished after 31992 seconds
        total bytes scrubbed: 6.03TiB with 0 errors

I guess I'll have to check the patch Marc pointed out.

-- 
Tomasz Torcz               "Never underestimate the bandwidth of a station
xmpp: zdzichubg@chrome.pl    wagon filled with backup tapes." -- Jim Gray


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: unexplainable corruptions 3.17.0
  2014-10-17 12:53         ` Chris Mason
@ 2014-10-17 18:09           ` Rich Freeman
  2014-10-18  7:32             ` Chris Samuel
  2014-10-20 19:09           ` Tomasz Torcz
  1 sibling, 1 reply; 21+ messages in thread
From: Rich Freeman @ 2014-10-17 18:09 UTC (permalink / raw)
  To: Chris Mason; +Cc: Tomasz Torcz, Btrfs BTRFS

On Fri, Oct 17, 2014 at 8:53 AM, Chris Mason <clm@fb.com> wrote:
> This sounds like the problem fixed with some patches to our extent mapping
> code  that went in with the merge window.  I've cherry picked a few for
> stable and I'm running them through tests now.  They are in my stable-3.17
> branch, and I'll send to Greg once Linus grabs the revert for the last one.

Just for clarity: when can we expect to see these in the kernel?  I
wasn't sure which merge window you're referring to.  I take it that
3.17.1 is still unpatched (for this and for the read-only snapshot issue,
which requires reverting 9c3b306e1c9e6be4be09e99a8fe2227d1005effc).

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: unexplainable corruptions 3.17.0
  2014-10-17 18:09           ` Rich Freeman
@ 2014-10-18  7:32             ` Chris Samuel
  2014-10-19  3:01               ` Chris Samuel
  2014-10-20  8:01               ` Marc Dietrich
  0 siblings, 2 replies; 21+ messages in thread
From: Chris Samuel @ 2014-10-18  7:32 UTC (permalink / raw)
  To: linux-btrfs

On Fri, 17 Oct 2014 02:09:30 PM Rich Freeman wrote:

> Just for clarity - when can we expect to see these in the kernel?

The stable kernel rules say:

https://www.kernel.org/doc/Documentation/stable_kernel_rules.txt

#  - It or an equivalent fix must already exist in Linus' tree (upstream).

So until Linus merges the revert into the mainline kernel, it cannot go into a 
stable release, and he hasn't merged it yet.

cheers,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: unexplainable corruptions 3.17.0
  2014-10-18  7:32             ` Chris Samuel
@ 2014-10-19  3:01               ` Chris Samuel
  2014-10-20  8:01               ` Marc Dietrich
  1 sibling, 0 replies; 21+ messages in thread
From: Chris Samuel @ 2014-10-19  3:01 UTC (permalink / raw)
  To: linux-btrfs

On Sat, 18 Oct 2014 06:32:49 PM Chris Samuel wrote:

> So until Linus merges the revert into the mainline kernel it cannot go into
> a  stable release, and he's not merged it yet.

It was merged last night.

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: unexplainable corruptions 3.17.0
  2014-10-18  7:32             ` Chris Samuel
  2014-10-19  3:01               ` Chris Samuel
@ 2014-10-20  8:01               ` Marc Dietrich
  2014-10-20  9:14                 ` Chris Samuel
  1 sibling, 1 reply; 21+ messages in thread
From: Marc Dietrich @ 2014-10-20  8:01 UTC (permalink / raw)
  To: Chris Samuel; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 910 bytes --]

Am Samstag, 18. Oktober 2014, 18:32:49 schrieb Chris Samuel:
> On Fri, 17 Oct 2014 02:09:30 PM Rich Freeman wrote:
> > Just for clarity - when can we expect to see these in the kernel?
> 
> The stable kernel rules say:
> 
> https://www.kernel.org/doc/Documentation/stable_kernel_rules.txt
> 
> #  - It or an equivalent fix must already exist in Linus' tree (upstream).
> 
> So until Linus merges the revert into the mainline kernel it cannot go into
> a stable release, and he's not merged it yet.

it also says a few lines below:

- To have the patch automatically included in the stable tree, add the tag
     Cc: stable@vger.kernel.org
   in the sign-off area. Once the patch is merged it will be applied to
   the stable tree without anything else needing to be done by the author
   or subsystem maintainer.

So fixes could be tagged this way ahead of time and then merged automatically.

Marc

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: unexplainable corruptions 3.17.0
  2014-10-20  8:01               ` Marc Dietrich
@ 2014-10-20  9:14                 ` Chris Samuel
  0 siblings, 0 replies; 21+ messages in thread
From: Chris Samuel @ 2014-10-20  9:14 UTC (permalink / raw)
  To: linux-btrfs

On Mon, 20 Oct 2014 10:01:56 AM Marc Dietrich wrote:

> so fixes would be tagged earlier this way and merged automaticly.

I don't think there's much that's automatic about stable; Greg K-H merges patches
into a git tree here:

http://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git

As you can see, since last night he has pulled a bunch of btrfs fixes into it,
based on what Chris Mason emailed out yesterday.


commit 2792dbfd1e02a70a8eef7e0cc3f44cb77d6c100f
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Mon Oct 20 07:08:43 2014 +0800

    3.17-stable patches
    
    added patches:
        btrfs-add-missing-compression-property-remove-in-btrfs_ioctl_setflags.patch
        btrfs-cleanup-error-handling-in-build_backref_tree.patch
        btrfs-don-t-do-async-reclaim-during-log-replay.patch
        btrfs-don-t-go-readonly-on-existing-qgroup-items.patch
        btrfs-fix-a-deadlock-in-btrfs_dev_replace_finishing.patch
        btrfs-fix-and-enhance-merge_extent_mapping-to-insert-best-fitted-extent-map.patch
        btrfs-fix-build_backref_tree-issue-with-multiple-shared-blocks.patch
        btrfs-fix-race-in-wait_sync-ioctl.patch
        btrfs-fix-the-wrong-condition-judgment-about-subset-extent-map.patch
        btrfs-fix-up-bounds-checking-in-lseek.patch
        btrfs-try-not-to-enospc-on-log-replay.patch
        btrfs-wake-up-transaction-thread-from-sync_fs-ioctl.patch
        revert-btrfs-race-free-update-of-commit-root-for-ro-snapshots.patch

(a bunch are also going in for 3.10, 3.14 and 3.16)

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: unexplainable corruptions 3.17.0
  2014-10-17  8:17     ` Hugo Mills
@ 2014-10-20 14:04       ` Zygo Blaxell
  2014-10-20 14:52         ` Rich Freeman
  0 siblings, 1 reply; 21+ messages in thread
From: Zygo Blaxell @ 2014-10-20 14:04 UTC (permalink / raw)
  To: Hugo Mills, Tomasz Torcz, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2249 bytes --]

On Fri, Oct 17, 2014 at 08:17:37AM +0000, Hugo Mills wrote:
> On Fri, Oct 17, 2014 at 10:10:09AM +0200, Tomasz Torcz wrote:
> > On Fri, Oct 17, 2014 at 04:02:03PM +0800, Liu Bo wrote:
> > > >   Recently I've observed some corruptions to systemd's journal
> > > > files which are somewhat puzzling. This is especially worrying
> > > > as this is btrfs raid1 setup and I expected auto-healing.
> > > > 
> > > >   System details: 3.17.0-301.fc21.x86_64
> > > > btrfs: raid1 over 2x dm-crypted 6TB HDDs.
> > > > mount opts: rw,relatime,seclabel,compress=lzo,space_cache
> > > >   Reads with cat, hexdump fails with:
> > > > read(4, 0x1001000, 65536)               = -1 EIO (Input/output error)
> > > > 
> > > Does scrub work for you?
> > 
> >   As there seem to be no way to scrub individual files, I've started
> > scrub of full volume.  It will take some hours to finish.
> > 
> >   Meanwhile, could you satisfy my curiosity what would scrub do that
> > wouldn't be done by just reading the whole file?
> 
>    It checks both copies. Reading the file will only read one of the
> copies of any given block (so if that's good and the other copy is
> bad, it won't fix anything).

Really?  One of my earliest btrfs tests was to run a loop of 'sha1sum
-c' on a gigabyte or two of files in one window while I used dd to
write random data in random locations directly to one of the filesystem
mirror partitions in the other.  I did this test *specifically* to
watch the automatic checksumming and self-healing features of btrfs
in action.  A complete 'sha1sum' verification of the filesystem contents
passed even though the kernel log was showing checksum errors scrolling
by faster than I could read, which strongly implies that read() normally
does check both mirrors before returning EIO.  This was on kernel version
3.12.21 or so, so it should be working on 3.17 too.

Tomasz reports using 'nocow', which disables the data integrity checks.
I'd expect read() to return success and provide garbage data, but the
observed behavior is EIO instead.  The underlying device doesn't seem
to be generating the I/O errors, so it's probably metadata corruption
of some kind.  Are there btrfs kernel messages in dmesg?


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: unexplainable corruptions 3.17.0
  2014-10-20 14:04       ` Zygo Blaxell
@ 2014-10-20 14:52         ` Rich Freeman
  0 siblings, 0 replies; 21+ messages in thread
From: Rich Freeman @ 2014-10-20 14:52 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: Hugo Mills, Tomasz Torcz, Btrfs BTRFS

On Mon, Oct 20, 2014 at 10:04 AM, Zygo Blaxell <zblaxell@furryterror.org> wrote:
> On Fri, Oct 17, 2014 at 08:17:37AM +0000, Hugo Mills wrote:
>> On Fri, Oct 17, 2014 at 10:10:09AM +0200, Tomasz Torcz wrote:
>> > On Fri, Oct 17, 2014 at 04:02:03PM +0800, Liu Bo wrote:
>> > > >   Recently I've observed some corruptions to systemd's journal
>> > > > files which are somewhat puzzling. This is especially worrying
>> > > > as this is btrfs raid1 setup and I expected auto-healing.
>> > > >
>> > > >   System details: 3.17.0-301.fc21.x86_64
>> > > > btrfs: raid1 over 2x dm-crypted 6TB HDDs.
>> > > > mount opts: rw,relatime,seclabel,compress=lzo,space_cache
>> > > >   Reads with cat, hexdump fails with:
>> > > > read(4, 0x1001000, 65536)               = -1 EIO (Input/output error)
>> > > >
>> > > Does scrub work for you?
>> >
>> >   As there seem to be no way to scrub individual files, I've started
>> > scrub of full volume.  It will take some hours to finish.
>> >
>> >   Meanwhile, could you satisfy my curiosity what would scrub do that
>> > wouldn't be done by just reading the whole file?
>>
>>    It checks both copies. Reading the file will only read one of the
>> copies of any given block (so if that's good and the other copy is
>> bad, it won't fix anything).
>
> Really?  One of my earliest btrfs tests was to run a loop of 'sha1sum
> -c' on a gigabyte or two of files in one window while I used dd to
> write random data in random locations directly to one of the filesystem
> mirror partitions in the other.  I did this test *specifically* to
> watch the automatic checksumming and self-healing features of btrfs
> in action.  A complete 'sha1sum' verification of the filesystem contents
> passed even though the kernel log was showing checksum errors scrolling
> by faster than I could read, which strongly implies that read() normally
> does check both mirrors before returning EIO.

I think you misread the earlier post.  It sounds like the algorithm is:
1.  Receive a request to read a block from a file.
2.  Determine which mirrored block to read it from (it sounds like
this is sub-optimal today; presumably you'd want to use the least busy
disk, or the disk whose head is closest to the right cylinder).
3.  Read the block.  Verify the checksum.  If it matches, return the data.
4.  If not, find another mirrored block to read it from, if one exists.
Verify the checksum.  If it matches, return the data and update all
other mirrored copies with it.
5.  Repeat step 4 until you run out of mirrored copies.  If you do, return an error.
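Steps 3 to 5 above can be sketched as follows. This is a toy model of the algorithm as described here, not the actual btrfs read path: mirrors are in-memory bytearrays, zlib.crc32 stands in for crc32c, the step 2 mirror choice is reduced to a `first` parameter, and `read_with_repair` is a hypothetical name. Only the copies that were actually read and failed get rewritten; with two mirrors, that is the one other copy.

```python
import zlib


def read_with_repair(mirrors, csum, first=0):
    """Read one logical block from RAID1 mirrors (a list of bytearrays).

    Try mirrors in turn starting at `first`; return the first copy whose
    checksum matches, after rewriting the copies that failed with the good data.
    """
    n = len(mirrors)
    bad = []
    for i in range(n):
        idx = (first + i) % n
        data = bytes(mirrors[idx])
        if zlib.crc32(data) == csum:
            for b in bad:          # repair the copies that were read and found bad
                mirrors[b][:] = data
            return data, bad
        bad.append(idx)
    raise IOError("EIO: every mirror failed checksum verification")


good = b"journal entry"
mirrors = [bytearray(b"random junk!!"), bytearray(good)]
data, repaired = read_with_repair(mirrors, zlib.crc32(good))
print(data, repaired)     # b'journal entry' [0]
print(bytes(mirrors[0]))  # b'journal entry' (mirror 0 has been rewritten)
```

Note that if the first mirror verifies, the second is never read, so a bad second copy survives a normal read; that is the gap a scrub closes.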

So, doing random reads is NOT equivalent to scrubbing the disks,
because with a scrub you want to check that ALL copies are good, and
the algorithm above only establishes that at least one copy is good.

When you used dd to overwrite blocks, you didn't get errors because
when the first copy failed the filesystem just read the second copy as
intended.  That isn't a scrub - it is a recovery.

An actual scrub isn't file-focused, but device focused.  It starts
reading at the start of the device, and verifies each logical unit of
data sequentially.  This can be done asynchronously since btrfs stores
checksums, as opposed to a traditional RAID where the reads need to be
synchronous since the validity of a mirror/stripe can only be
ascertained by comparing it to all the other devices in that
mirror/stripe (and even then, unless you're using something like RAID6+, you
couldn't determine which copy is bad without a checksum).  In theory
I'd expect a scrub with btrfs to be less detrimental to performance as
a result - a read request could halt the scrub on one device without
delaying the scrub on the other devices.  Writes in RAID1 mode
necessarily disrupt two devices, but others would not be impacted.
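Below is a toy sketch of the device-focused scrub described above. It uses the same assumptions as a plain model: the names are hypothetical, zlib.crc32 stands in for crc32c, and there is one stored checksum per logical block. Each call walks a single device from its first block to its last. It checks every block on that device against the stored checksum and writes only to that device when repairing. This is why, in principle, one device's scrub could be paused without delaying the others.

```python
import zlib


def checksum(data: bytes) -> int:
    # Stand-in only: btrfs defaults to crc32c, zlib.crc32 is plain CRC-32.
    return zlib.crc32(data)


def scrub_device(mirrors, csums, dev):
    """Hypothetical scrub of one device in a toy RAID1 model.

    mirrors: list of per-device block lists (same length on every device)
    csums:   one stored checksum per logical block
    dev:     index of the device to scrub
    Returns (errors_found, errors_repaired) for this device.
    """
    found = repaired = 0
    for n, expected in enumerate(csums):  # sequential walk over the device
        if checksum(mirrors[dev][n]) == expected:
            continue
        found += 1
        # Look for a good copy of this block on any other device.
        for other, blocks in enumerate(mirrors):
            if other != dev and checksum(blocks[n]) == expected:
                mirrors[dev][n] = blocks[n]  # repair only the device being scrubbed
                repaired += 1
                break
    return found, repaired
```

To scrub the whole toy volume you would call `scrub_device` once per device, for example `[scrub_device(mirrors, csums, d) for d in range(len(mirrors))]`. Unlike the read path, this finds a bad copy even when the other copy would have satisfied every read.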

--
Rich


* Re: unexplainable corruptions 3.17.0
  2014-10-17 12:53         ` Chris Mason
  2014-10-17 18:09           ` Rich Freeman
@ 2014-10-20 19:09           ` Tomasz Torcz
  1 sibling, 0 replies; 21+ messages in thread
From: Tomasz Torcz @ 2014-10-20 19:09 UTC
  To: linux-btrfs

On Fri, Oct 17, 2014 at 08:53:06AM -0400, Chris Mason wrote:
> On Fri, Oct 17, 2014 at 4:54 AM, Tomasz Torcz <tomek@pipebreaker.pl> wrote:
> >On Fri, Oct 17, 2014 at 04:29:36PM +0800, Liu Bo wrote:
> >> On Fri, Oct 17, 2014 at 10:10:09AM +0200, Tomasz Torcz wrote:
> >> > On Fri, Oct 17, 2014 at 04:02:03PM +0800, Liu Bo wrote:
> >> > > >   Recently I've observed some corruptions to systemd's journal
> >> > > > files which are somewhat puzzling. This is especially worrying
> >> > > > as this is btrfs raid1 setup and I expected auto-healing.
> >> > > > read(4, 0x1001000, 65536)               = -1 EIO (Input/output error)
> >>
> >> Well..I don't know exactly what's the cause, but as the file is NOCOW,
> >> it writes data in place, have you experienced a hard reboot or something
> >> recently?
> >
> >  Nothing like that.  Server is on an UPS, there were couple normal
> > shutdowns this year (few kernel upgrades).
> >
> >> And any message in dmesg log while getting EIO by reading the file?
> >
> >  Nothing in dmesg, no btrfs messages, no SCSI/SATA errors, nothing.
> > That's why I find those corruptions mysterious.
> >  Maybe there is some way to inspect internal btrfs state and find out
> > what causing the problems?  Or maybe this is related to patch mentioned
> > in this thread?
> 
> This sounds like the problem fixed with some patches to our extent mapping
> code  that went in with the merge window.  I've cherry picked a few for
> stable and I'm running them through tests now.  They are in my stable-3.17
> branch, and I'll send to Greg once Linus grabs the revert for the last one.
> 
> But, if you want to try that branch out, it may fix this EIO.  Otherwise
> we'll start sending you debugging.

  Good shot.  The Fedora kernel maintainer was kind enough to include those
patches and build a kernel for F21.  With this kernel the EIO no longer shows
up and the files are readable.  Thanks!

-- 
Tomasz Torcz              ,,If you try to upissue this patchset I shall be seeking
xmpp: zdzichubg@chrome.pl   an IP-routable hand grenade.'' -- Andrew Morton (LKML)



* Re: unexplainable corruptions 3.17.0
  2014-10-17 15:01 ` Chris Murphy
@ 2014-10-20 19:10   ` Tomasz Torcz
  0 siblings, 0 replies; 21+ messages in thread
From: Tomasz Torcz @ 2014-10-20 19:10 UTC
  To: Btrfs BTRFS

On Fri, Oct 17, 2014 at 11:01:51AM -0400, Chris Murphy wrote:
> 
> On Oct 16, 2014, at 5:17 AM, Tomasz Torcz <tomek@pipebreaker.pl> wrote:
> > 
> >  Broken files are in /var/log/journal directory. This directory
> > is set NOCOW with chattr, all the files within too.
> > 
> > Example of broken file:
> > system@0005057fe87730cf-6d3d85ed59bd70ae.journal~
> 
> What do you get for 'journalctl --verify' ? I'm curious if any journal files are considered corrupt by journalctl, and if there's parity between journalctl and dd_rescue when it comes to good/bad journals.

  journalctl fails with "bus error" on them.


-- 
Tomasz Torcz              ,,If you try to upissue this patchset I shall be seeking
xmpp: zdzichubg@chrome.pl   an IP-routable hand grenade.'' -- Andrew Morton (LKML)



end of thread, other threads:[~2014-10-20 19:10 UTC | newest]

Thread overview: 21+ messages
2014-10-16  9:17 unexplainable corruptions 3.17.0 Tomasz Torcz
2014-10-17  8:02 ` Liu Bo
2014-10-17  8:10   ` Tomasz Torcz
2014-10-17  8:17     ` Hugo Mills
2014-10-20 14:04       ` Zygo Blaxell
2014-10-20 14:52         ` Rich Freeman
2014-10-17  8:29     ` Liu Bo
2014-10-17  8:54       ` Tomasz Torcz
2014-10-17 12:53         ` Chris Mason
2014-10-17 18:09           ` Rich Freeman
2014-10-18  7:32             ` Chris Samuel
2014-10-19  3:01               ` Chris Samuel
2014-10-20  8:01               ` Marc Dietrich
2014-10-20  9:14                 ` Chris Samuel
2014-10-20 19:09           ` Tomasz Torcz
2014-10-17 11:38   ` Duncan
2014-10-17 15:07     ` Chris Murphy
2014-10-17 17:29   ` Tomasz Torcz
2014-10-17  8:17 ` Marc Dietrich
2014-10-17 15:01 ` Chris Murphy
2014-10-20 19:10   ` Tomasz Torcz
