* scrub "correcting" tons of errors ?
@ 2013-03-29 9:50 Swâmi Petaramesh
2013-03-29 12:58 ` Josef Bacik
0 siblings, 1 reply; 12+ messages in thread
From: Swâmi Petaramesh @ 2013-03-29 9:50 UTC (permalink / raw)
To: linux-btrfs@vger.kernel.org
Hi there,
I've started "btrfs scrub start /" on one of my machines (Kernel
3.8.0-15 Ubuntu AMD64), which typically "behaves well", so I didn't
suspect any disk issue.
After running for only 165 seconds, "scrub status" claims to have
found and corrected 22926 CSUM errors ??!?!?!?!!???
This is a rather new HDD, in perfect shape (SMART all OK, never
reallocated a single sector, less than 200 hours total runtime...)
WTF ?!?
I've cancelled scrub for now, until I better understand what
might be happening...
--
Swâmi Petaramesh <swami@petaramesh.org> http://petaramesh.org PGP 9076E32E
Ne cherchez pas : Je ne suis pas sur Facebook.
* Re: scrub "correcting" tons of errors ?
2013-03-29 9:50 scrub "correcting" tons of errors ? Swâmi Petaramesh
@ 2013-03-29 12:58 ` Josef Bacik
2013-03-29 13:06 ` Swâmi Petaramesh
2013-03-29 13:06 ` Harald Glatt
0 siblings, 2 replies; 12+ messages in thread
From: Josef Bacik @ 2013-03-29 12:58 UTC (permalink / raw)
To: Swâmi Petaramesh; +Cc: linux-btrfs@vger.kernel.org
On Fri, Mar 29, 2013 at 03:50:15AM -0600, Swâmi Petaramesh wrote:
> Hi there,
>
> I've started "btrfs scrub start /" on one of my machines (Kernel
> 3.8.0-15 Ubuntu AMD64), which typically "behaves well", so I didn't
> suspect any disk issue.
>
> After running for only 165 seconds, "scrub status" claims to have
> found and corrected 22926 CSUM errors ??!?!?!?!!???
>
> This is a rather new HDD, in perfect shape (SMART all OK, never
> reallocated a single sector, less than 200 hours total runtime...)
>
> WTF ?!?
>
> I've cancelled scrub for now, until I get further understanding of what
> can be happening...
>
So this is probably because of the extent tree corruption you had, it's just
cleaning things up and you should be fine once it finishes. Thanks,
Josef
* Re: scrub "correcting" tons of errors ?
2013-03-29 12:58 ` Josef Bacik
@ 2013-03-29 13:06 ` Swâmi Petaramesh
2013-03-29 13:12 ` Josef Bacik
2013-03-29 13:26 ` Josef Bacik
2013-03-29 13:06 ` Harald Glatt
1 sibling, 2 replies; 12+ messages in thread
From: Swâmi Petaramesh @ 2013-03-29 13:06 UTC (permalink / raw)
To: Josef Bacik; +Cc: linux-btrfs@vger.kernel.org
Hi Josef,
Le 29/03/2013 13:58, Josef Bacik a écrit :
> So this is probably because of the extent tree corruption you had, it's just
> cleaning things up and you should be fine once it finishes. Thanks,
Er... It's on a different machine !
Current (at the time I write) status is :
# btrfs scrub status /
scrub status for 346b81b2-0735-4c4d-a137-1995bc78ad70
scrub resumed at Fri Mar 29 11:52:43 2013 and finished after 7470 seconds
total bytes scrubbed: 231.96GB with 149691 errors
error details: csum=149691
corrected errors: 149691, uncorrectable errors: 0, unverified errors: 0
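
For anyone scripting around output like the above, here is a minimal sketch (not part of the original report) that pulls out the uncorrectable-error count. It assumes the text layout shown in this thread; other btrfs-progs versions may print the summary differently, and the helper name scrub_uncorrectable is made up for illustration.

```shell
# Hypothetical helper: reads "btrfs scrub status" text on stdin and prints
# the number that follows "uncorrectable errors:".
# Assumes the status layout quoted in this thread.
scrub_uncorrectable() {
    grep -o 'uncorrectable errors: [0-9]*' | awk '{print $3}'
}
```

Usage would be something like "btrfs scrub status / | scrub_uncorrectable" at a root prompt. A non-zero result would mean scrub found damage it could not repair from another copy.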
I have to say that scrub completely froze the machine at least 4 times
(disk had ceased activity and any command that would imply a disk access
would hang forever), but at least after a (quite brutal) reboot it could
be resumed...
The only thing about this FS is that it had been imaged, then restored,
using partclone.btrfs (which itself is supposed to use the BTRFS libraries).
I have a screenshot of the last thing I saw when it hung; I can upload
it somewhere, should it be relevant...
Kind regards.
--
Swâmi Petaramesh <swami@petaramesh.org> http://petaramesh.org PGP 9076E32E
Ne cherchez pas : Je ne suis pas sur Facebook.
* Re: scrub "correcting" tons of errors ?
2013-03-29 12:58 ` Josef Bacik
2013-03-29 13:06 ` Swâmi Petaramesh
@ 2013-03-29 13:06 ` Harald Glatt
2013-03-29 13:11 ` Hugo Mills
1 sibling, 1 reply; 12+ messages in thread
From: Harald Glatt @ 2013-03-29 13:06 UTC (permalink / raw)
To: Josef Bacik; +Cc: Swâmi Petaramesh, linux-btrfs@vger.kernel.org
On that note, is btrfs doing automatic background scrubs of its own or
do I have to use crontab to schedule scrubs?
Thanks!
* Re: scrub "correcting" tons of errors ?
2013-03-29 13:06 ` Harald Glatt
@ 2013-03-29 13:11 ` Hugo Mills
0 siblings, 0 replies; 12+ messages in thread
From: Hugo Mills @ 2013-03-29 13:11 UTC (permalink / raw)
To: Harald Glatt
Cc: Josef Bacik, Swâmi Petaramesh, linux-btrfs@vger.kernel.org
On Fri, Mar 29, 2013 at 02:06:39PM +0100, Harald Glatt wrote:
> On that note, is btrfs doing automatic background scrubs of its own or
> do I have to use crontab to schedule scrubs?
If you want a full-disk scrub, you'll need to schedule it yourself
with cron (I run mine once a month). However, if a problem is detected
during normal operation -- e.g. you read a piece of data and it's got
bad checksums -- then the FS will fix it if it can, in the same way
that it would with a scrub.
Hugo.
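
A minimal sketch of what such a scheduled job might look like, assuming a distribution that runs executable scripts from /etc/cron.monthly (the file name btrfs-scrub, the function name scrub_all and the BTRFS override variable are illustrative, not something from this thread):

```shell
#!/bin/sh
# Sketch of a monthly scrub job, e.g. saved as /etc/cron.monthly/btrfs-scrub
# (hypothetical path) and made executable.
# BTRFS can be overridden for a dry run, e.g. BTRFS=echo.
BTRFS=${BTRFS:-btrfs}

# Scrub each mount point given as an argument, one after the other.
scrub_all() {
    for mnt in "$@"; do
        # -B keeps the scrub in the foreground, so the job only ends
        # when the scrub has finished and its exit status is visible.
        "$BTRFS" scrub start -B "$mnt" || echo "scrub failed on $mnt" >&2
    done
}
```

Ending the script with a call such as "scrub_all /" would scrub the root filesystem; further mount points can be added as extra arguments.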
--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Eighth Army Push Bottles Up Germans -- WWII newspaper ---
headline (possibly apocryphal)
* Re: scrub "correcting" tons of errors ?
2013-03-29 13:06 ` Swâmi Petaramesh
@ 2013-03-29 13:12 ` Josef Bacik
2013-03-29 13:23 ` Swâmi Petaramesh
2013-03-29 13:35 ` Swâmi Petaramesh
2013-03-29 13:26 ` Josef Bacik
1 sibling, 2 replies; 12+ messages in thread
From: Josef Bacik @ 2013-03-29 13:12 UTC (permalink / raw)
To: Swâmi Petaramesh; +Cc: Josef Bacik, linux-btrfs@vger.kernel.org
On Fri, Mar 29, 2013 at 07:06:33AM -0600, Swâmi Petaramesh wrote:
> Hi Josef,
>
> Le 29/03/2013 13:58, Josef Bacik a écrit :
> > So this is probably because of the extent tree corruption you had, it's just
> > cleaning things up and you should be fine once it finishes. Thanks,
>
> Er... It's on a different machine !
>
> Current (at the time I write) status is :
>
> # btrfs scrub status /
> scrub status for 346b81b2-0735-4c4d-a137-1995bc78ad70
> scrub resumed at Fri Mar 29 11:52:43 2013 and finished after 7470 seconds
> total bytes scrubbed: 231.96GB with 149691 errors
> error details: csum=149691
> corrected errors: 149691, uncorrectable errors: 0, unverified errors: 0
>
> I have to say that scrub completely froze the machine at least 4 times
> (disk had ceased activity and any command that would imply a disk access
> would hang forever), but at least after a (quite brutal) reboot it could
> be resumed...
>
> The only thing about this FS is that it had been imaged, then restored,
> using partclone.btrfs (which itself is supposed to use the BTRFS libraries).
>
This is where I go "AHA!" and just assume that it wasn't our fault ;).
> I have a screenshot of "last thing I saw when it hanged", I can upload
> it somewhere, should it be relevant...
>
Screenshots are welcome. I have no doubt scrub is fixing actual problems, but it
definitely shouldn't be hanging the box, so I'd like to get those hangs fixed if
possible. Sysrq+w output during hangs is very useful but may be too much for
screenshots; netconsole works very nicely for this:
http://fedoraproject.org/wiki/Netconsole
Thanks,
Josef
* Re: scrub "correcting" tons of errors ?
2013-03-29 13:12 ` Josef Bacik
@ 2013-03-29 13:23 ` Swâmi Petaramesh
2013-03-29 13:35 ` Swâmi Petaramesh
1 sibling, 0 replies; 12+ messages in thread
From: Swâmi Petaramesh @ 2013-03-29 13:23 UTC (permalink / raw)
To: Josef Bacik; +Cc: linux-btrfs@vger.kernel.org
Le 29/03/2013 14:12, Josef Bacik a écrit :
> Screenshots are welcome,
I posted one to http://dl.free.fr/jsRQ8JXZh (use your email address or
the list's one to fetch it)
It may or may not be very interesting, but that's all I got.
> I have no doubt scrub is fixing actual problems
Looks like it actually is. The first few times it hung, I restarted it from
the start and it was no longer finding errors in the first GBs, so
I assumed it had fixed them in the previous pass (even though it
eventually crashed the disk subsystem).
> , but it
> definitely shouldn't be hanging the box so I'd like to get those fixed if
> possible. Sysrq+w during hangs are very useful but may be too much output for
> screenshots, netconsole works very nicely for this
I'll restart the complete scrub (now everything is supposedly fixed...?)
and let you know if it hangs again and I can get my hands on something.
Kind regards.
--
Swâmi Petaramesh <swami@petaramesh.org> http://petaramesh.org PGP 9076E32E
Ne cherchez pas : Je ne suis pas sur Facebook.
* Re: scrub "correcting" tons of errors ?
2013-03-29 13:06 ` Swâmi Petaramesh
2013-03-29 13:12 ` Josef Bacik
@ 2013-03-29 13:26 ` Josef Bacik
2013-03-29 13:29 ` cwillu
2013-03-29 13:39 ` Swâmi Petaramesh
1 sibling, 2 replies; 12+ messages in thread
From: Josef Bacik @ 2013-03-29 13:26 UTC (permalink / raw)
To: Swâmi Petaramesh; +Cc: Josef Bacik, linux-btrfs@vger.kernel.org
Actually instead of netconsole we have an awesome service provided by Carey, you
can just do
nc cwillu.com 10101 < /dev/kmsg
after you've run sysrq+w and then reply with the URL it spits out. Thanks,
Josef
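
For a machine without a usable keyboard SysRq combination, a rough sketch of triggering the same blocked-task dump from a root shell and reading it back locally. The function name dump_blocked_tasks and the SYSRQ_TRIGGER override are made up for illustration; writing "w" to /proc/sysrq-trigger is the standard kernel interface, but this assumes the box is still responsive enough to run a shell.

```shell
# Hypothetical helper, to be run as root.
# SYSRQ_TRIGGER defaults to the real kernel interface and can be pointed
# at a scratch file for a dry run.
dump_blocked_tasks() {
    trigger=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}
    # "w" asks the kernel to log all tasks in uninterruptible (blocked) state.
    echo w > "$trigger" || return 1
    # Show the tail of the kernel log; the first argument sets how many lines.
    dmesg | tail -n "${1:-200}"
}
```

Calling "dump_blocked_tasks 500 > hang.txt" would keep a local copy of the last 500 kernel log lines, which could then be attached to a reply.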
* Re: scrub "correcting" tons of errors ?
2013-03-29 13:26 ` Josef Bacik
@ 2013-03-29 13:29 ` cwillu
2013-03-29 13:39 ` Swâmi Petaramesh
1 sibling, 0 replies; 12+ messages in thread
From: cwillu @ 2013-03-29 13:29 UTC (permalink / raw)
To: Josef Bacik, Swâmi Petaramesh; +Cc: linux-btrfs@vger.kernel.org
> Actually instead of netconsole we have an awesome service provided by Carey, you
> can just do
>
> nc cwillu.com 10101 < /dev/kmsg
... at a root prompt.
> after you've run sysrq+w and then reply with the URL it spits out. Thanks,
* Re: scrub "correcting" tons of errors ?
2013-03-29 13:12 ` Josef Bacik
2013-03-29 13:23 ` Swâmi Petaramesh
@ 2013-03-29 13:35 ` Swâmi Petaramesh
1 sibling, 0 replies; 12+ messages in thread
From: Swâmi Petaramesh @ 2013-03-29 13:35 UTC (permalink / raw)
To: Josef Bacik; +Cc: linux-btrfs@vger.kernel.org
Le 29/03/2013 14:12, Josef Bacik a écrit :
> Screenshots are welcome
This time I got a really nice kernel oops during scrub...
http://dl.free.fr/hjAdOH3mG
(use your email address or the list's one to fetch it)
--
Swâmi Petaramesh <swami@petaramesh.org> http://petaramesh.org PGP 9076E32E
Ne cherchez pas : Je ne suis pas sur Facebook.
* Re: scrub "correcting" tons of errors ?
2013-03-29 13:26 ` Josef Bacik
2013-03-29 13:29 ` cwillu
@ 2013-03-29 13:39 ` Swâmi Petaramesh
2013-03-29 13:57 ` Josef Bacik
1 sibling, 1 reply; 12+ messages in thread
From: Swâmi Petaramesh @ 2013-03-29 13:39 UTC (permalink / raw)
To: Josef Bacik; +Cc: linux-btrfs@vger.kernel.org
Le 29/03/2013 14:26, Josef Bacik a écrit :
> after you've run sysrq+w and then reply with the URL it spits out. Thanks,
I'm afraid I won't be able to do this this afternoon: I also need to
work on my machine ;-) so for now I will avoid restarting a scrub that
could crash it once more...
I'll hopefully be back on this soon.
Kind regards.
--
Swâmi Petaramesh <swami@petaramesh.org> http://petaramesh.org PGP 9076E32E
Ne cherchez pas : Je ne suis pas sur Facebook.
* Re: scrub "correcting" tons of errors ?
2013-03-29 13:39 ` Swâmi Petaramesh
@ 2013-03-29 13:57 ` Josef Bacik
0 siblings, 0 replies; 12+ messages in thread
From: Josef Bacik @ 2013-03-29 13:57 UTC (permalink / raw)
To: Swâmi Petaramesh; +Cc: Josef Bacik, linux-btrfs@vger.kernel.org
On Fri, Mar 29, 2013 at 07:39:06AM -0600, Swâmi Petaramesh wrote:
> Le 29/03/2013 14:26, Josef Bacik a écrit :
> > after you've run sysrq+w and then reply with the URL it spits out. Thanks,
> I'm afraid I won't be able to do this this afternoon : I also need to
> work on my machine ;-) so for now I will avoid to restart a scrub that
> would possibly crash it once more...
>
> I'll hopefully be back on this soon.
>
Yeah that picture was enough, I see what's going on, I'll send a patch. Thanks,
Josef
end of thread, other threads:[~2013-03-29 13:57 UTC | newest]
Thread overview: 12+ messages
2013-03-29 9:50 scrub "correcting" tons of errors ? Swâmi Petaramesh
2013-03-29 12:58 ` Josef Bacik
2013-03-29 13:06 ` Swâmi Petaramesh
2013-03-29 13:12 ` Josef Bacik
2013-03-29 13:23 ` Swâmi Petaramesh
2013-03-29 13:35 ` Swâmi Petaramesh
2013-03-29 13:26 ` Josef Bacik
2013-03-29 13:29 ` cwillu
2013-03-29 13:39 ` Swâmi Petaramesh
2013-03-29 13:57 ` Josef Bacik
2013-03-29 13:06 ` Harald Glatt
2013-03-29 13:11 ` Hugo Mills