File system stuck in scrub

linux-btrfs.vger.kernel.org archive mirror

* File system stuck in scrub
@ 2014-08-11 15:12 Nikolaus Rath
  2014-08-11 15:37 ` Hugo Mills
  2014-08-11 15:45 ` Calvin Walton
  0 siblings, 2 replies; 4+ messages in thread
From: Nikolaus Rath @ 2014-08-11 15:12 UTC
  To: linux-btrfs

Hello,

I started a scrub of one of my btrfs filesystems and then had to restart
the system. `systemctl restart` seemed to terminate all processes, but
then got stuck at the end. The disk activity LED was still flashing
rapidly at that point, so I assume that the active scrub was preventing
the reboot (is that a bug or a feature?).

In any case, I could not wait for that so I power cycled. But now my
file system seems to be stuck in a scrub that can neither be completed
nor cancelled:

$ sudo btrfs scrub status /home/nikratio/
scrub status for 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
        scrub started at Sun Aug 10 18:36:43 2014, running for 1562 seconds
        total bytes scrubbed: 209.97GiB with 0 errors

$ date
Sun Aug 10 22:00:44 PDT 2014

$ sudo btrfs scrub cancel /home/nikratio/
ERROR: scrub cancel failed on /home/nikratio/: not running

$ sudo btrfs scrub start /home/nikratio/
ERROR: scrub is already running.
To cancel use 'btrfs scrub cancel /home/nikratio/'.
To see the status use 'btrfs scrub status [-d] /home/nikratio/'.

Note that the scrub was started more than 3 hours ago, but claims to
have been running for only 1562 seconds.
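
The mismatch can be checked with a little arithmetic. This is a minimal sketch, assuming both timestamps are in the same time zone and fall on the same day; the helper name hms_to_seconds is made up for illustration.

```shell
#!/bin/sh
# Convert an HH:MM:SS wall-clock time to seconds since midnight.
hms_to_seconds() {
    h=${1%%:*}; rest=${1#*:}
    m=${rest%%:*}; s=${rest#*:}
    # Strip one leading zero so "08" is not parsed as an invalid octal number.
    h=${h#0}; m=${m#0}; s=${s#0}
    echo $(( ${h:-0} * 3600 + ${m:-0} * 60 + ${s:-0} ))
}

started=$(hms_to_seconds 18:36:43)   # "scrub started at" from the status output
now=$(hms_to_seconds 22:00:44)       # output of `date`
elapsed=$(( now - started ))
echo "wall-clock time since start: ${elapsed} s"
echo "reported running time:       1562 s"
```

The wall-clock difference comes out at 12241 seconds (about 3 hours 24 minutes), so the reported 1562 seconds would be consistent with a counter that stopped advancing when the machine was power cycled.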

I then figured that maybe I need to run btrfsck. This gave the following
output:

checking extents
checking free space cache
checking fs roots
root 5 inode 3149791 errors 400, nbytes wrong
root 5 inode 3150233 errors 400, nbytes wrong
root 5 inode 3150238 errors 400, nbytes wrong
[102 similar lines]
Checking filesystem on /dev/mapper/vg0-nikratio_crypt
UUID: 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
free space inode generation (0) did not match free space cache generation (161262)
free space inode generation (0) did not match free space cache generation (75485)
free space inode generation (0) did not match free space cache generation (79599)
free space inode generation (0) did not match free space cache generation (72280)
free space inode generation (0) did not match free space cache generation (79599)
free space inode generation (0) did not match free space cache generation (25866)
free space inode generation (0) did not match free space cache generation (12255)
free space inode generation (0) did not match free space cache generation (72521)
free space inode generation (0) did not match free space cache generation (161286)
free space inode generation (0) did not match free space cache generation (28716)
free space inode generation (0) did not match free space cache generation (161481)
found 216444746042 bytes used err is 1
total csum bytes: 383160676
total tree bytes: 875753472
total fs tree bytes: 284246016
total extent tree bytes: 69320704
btree space waste bytes: 205021777
file data blocks allocated: 3701556121600
 referenced 388107321344
Btrfs v3.14.1

So nothing about the scrub, but apparently some other errors.

Can someone tell me:

 * Should I be able to restart while a scrub is in progress, or is that
   deliberately prevented by btrfs?

 * How can I resume or cancel the scrub?

 * Is it more risky to leave the above errors uncorrected, or to run
   btrfsck with --repair?

I'm using kernel 3.14.

Thanks!
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«

* Re: File system stuck in scrub
  2014-08-11 15:12 File system stuck in scrub Nikolaus Rath
@ 2014-08-11 15:37 ` Hugo Mills
  2014-08-11 15:45 ` Calvin Walton
  1 sibling, 0 replies; 4+ messages in thread
From: Hugo Mills @ 2014-08-11 15:37 UTC
  To: linux-btrfs

On Mon, Aug 11, 2014 at 08:12:46AM -0700, Nikolaus Rath wrote:
> I started a scrub of one of my btrfs filesystem and then had to restart
> the system. `systemctl restart` seemed to terminate all processes, but
> then got stuck at the end. The disk activity led was still flashing
> rapidly at that point, so I assume that the active scrub was preventing
> the reboot (is that a bug or a feature?).

   The scrub shouldn't have stopped the reboot.

> In any case, I could not wait for that so I power cycled. But now my
> file system seems to be stuck in a scrub that can neither be completed
> nor cancelled:
> 
> $ sudo btrfs scrub status /home/nikratio/
> scrub status for 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
>         scrub started at Sun Aug 10 18:36:43 2014, running for 1562 seconds
>         total bytes scrubbed: 209.97GiB with 0 errors
> 
> $ date
> Sun Aug 10 22:00:44 PDT 2014
> 
> $ sudo btrfs scrub cancel /home/nikratio/
> ERROR: scrub cancel failed on /home/nikratio/: not running
> 
> $ sudo btrfs scrub start /home/nikratio/
> ERROR: scrub is already running.
> To cancel use 'btrfs scrub cancel /home/nikratio/'.
> To see the status use 'btrfs scrub status [-d] /home/nikratio/'.
> 
> Note that the scrub was started more than 3 hours ago, but claims to
> have been running for only 1562 seconds.

   This is a regrettably common problem -- fortunately with a simple
solution. The userspace scrub monitor died in the reboot, leaving the
status file present. If you delete the status file, which is in
/var/lib/btrfs/, that should allow you to start a new scrub.

> I then figured that maybe I need to run btrfsck. This gave the following
> output:
> 
> checking extents
> checking free space cache
> checking fs roots
> root 5 inode 3149791 errors 400, nbytes wrong
> root 5 inode 3150233 errors 400, nbytes wrong
> root 5 inode 3150238 errors 400, nbytes wrong
> [102 similar lines]
> Checking filesystem on /dev/mapper/vg0-nikratio_crypt
> UUID: 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
> free space inode generation (0) did not match free space cache generation (161262)
[snip]
> found 216444746042 bytes used err is 1
> total csum bytes: 383160676
> total tree bytes: 875753472
> total fs tree bytes: 284246016
> total extent tree bytes: 69320704
> btree space waste bytes: 205021777
> file data blocks allocated: 3701556121600
>  referenced 388107321344
> Btrfs v3.14.1
> 
> So nothing about the scrub, but apparently some other errors.

   The free space inode generation errors are harmless. The wrong
nbytes is probably not horrifically damaging, but I don't know so much
about that one.

> Can someone tell me:
> 
>  * Should I be able to restart while a scrub is in progress, or is that
>    deliberately prevented by btrfs?

   Restart the machine? Yes.

>  * How can I resume or cancel the scrub?

   It's probably simply not running -- see above.

>  * Is it more risky to leave the above errors uncorrected, or to run
>    btrfsck with --repair?

   I would, I think, leave them.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- We are all lying in the gutter,  but some of us are looking ---   
                              at the stars.                              

* Re: File system stuck in scrub
  2014-08-11 15:12 File system stuck in scrub Nikolaus Rath
  2014-08-11 15:37 ` Hugo Mills
@ 2014-08-11 15:45 ` Calvin Walton
  2014-08-11 15:53   ` Marc MERLIN
  1 sibling, 1 reply; 4+ messages in thread
From: Calvin Walton @ 2014-08-11 15:45 UTC
  To: Nikolaus Rath; +Cc: linux-btrfs

Hi,

On Mon, 2014-08-11 at 08:12 -0700, Nikolaus Rath wrote:
> Hello,
> 
> I started a scrub of one of my btrfs filesystem and then had to restart
> the system. `systemctl restart` seemed to terminate all processes, but
> then got stuck at the end. The disk activity led was still flashing
> rapidly at that point, so I assume that the active scrub was preventing
> the reboot (is that a bug or a feature?).
This sounds like a bug - I know that e.g. the rebalance operation is 
designed so that you can shutdown/reboot during the operation, and it 
will complete following a reboot. But I'm not familiar with the code 
in question.

> In any case, I could not wait for that so I power cycled. But now my
> file system seems to be stuck in a scrub that can neither be completed
> nor cancelled:
> 
> $ sudo btrfs scrub status /home/nikratio/
> scrub status for 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
>         scrub started at Sun Aug 10 18:36:43 2014, running for 1562 seconds
>         total bytes scrubbed: 209.97GiB with 0 errors
> 
> $ date
> Sun Aug 10 22:00:44 PDT 2014
> 
> $ sudo btrfs scrub cancel /home/nikratio/
> ERROR: scrub cancel failed on /home/nikratio/: not running
> 
> $ sudo btrfs scrub start /home/nikratio/
> ERROR: scrub is already running.
> To cancel use 'btrfs scrub cancel /home/nikratio/'.
> To see the status use 'btrfs scrub status [-d] /home/nikratio/'.
My guess is that this is a mismatch between some state stored by the 
userspace tools and the state in the kernel. One of the things you can 
try is to delete the files /var/lib/btrfs/scrub.status.* - that will 
force the btrfs tools to get the current status from the kernel (you 
will lose some statistics and scrub history.)

Running 'btrfs scrub status /home/nikratio/' after this should simply 
say 'no stats available', and you can start a new scrub later if you 
like.
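
A minimal sketch of that cleanup, for illustration only. It moves the status files aside instead of deleting them, so the old statistics are not lost for good. The function name clear_scrub_status and the backup directory are hypothetical; only the /var/lib/btrfs/scrub.status.* pattern comes from this thread. It is only meant for the case shown above, where 'btrfs scrub cancel' already reports that nothing is running.

```shell
#!/bin/sh
# Move stale scrub status files out of the way.
# $1: status directory (this thread names /var/lib/btrfs)
# $2: backup directory (hypothetical; any writable location)
# Prints the number of files moved.
clear_scrub_status() {
    status_dir=$1
    backup_dir=$2
    mkdir -p "$backup_dir" || return 1
    moved=0
    for f in "$status_dir"/scrub.status.*; do
        [ -e "$f" ] || continue          # the glob matched nothing
        mv -- "$f" "$backup_dir"/ || return 1
        moved=$((moved + 1))
    done
    echo "$moved"
}
```

Run as root, a call might look like 'clear_scrub_status /var/lib/btrfs /root/scrub-status-backup', followed by 'btrfs scrub status /home/nikratio/' to confirm that no stale state is left.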

> I then figured that maybe I need to run btrfsck. This gave the following
> output:
As long as you didn't use --repair, this shouldn't break anything...
Note that btrfsck has to be run on an *unmounted* filesystem to give
useful results.

>  * Is it more risky to leave the above errors uncorrected, or to run
>    btrfsck with --repair?
There probably aren't any issues on the filesystem that the runtime 
btrfs code can't handle. Don't run with --repair, at least not yet.

> 
> 
> I'm using kernel 3.14.
> 
> Thanks!
> -Nikolaus
> 
> 

-- 
Calvin Walton <calvin.walton@kepstin.ca>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: File system stuck in scrub
  2014-08-11 15:45 ` Calvin Walton
@ 2014-08-11 15:53   ` Marc MERLIN
  0 siblings, 0 replies; 4+ messages in thread
From: Marc MERLIN @ 2014-08-11 15:53 UTC
  To: Calvin Walton; +Cc: Nikolaus Rath, linux-btrfs

On Mon, Aug 11, 2014 at 11:45:45AM -0400, Calvin Walton wrote:
> > $ sudo btrfs scrub start /home/nikratio/
> > ERROR: scrub is already running.
> > To cancel use 'btrfs scrub cancel /home/nikratio/'.
> > To see the status use 'btrfs scrub status [-d] /home/nikratio/'.
> My guess is that this is a mismatch between some state stored by the 
> userspace tools and the state in the kernel. One of the things you can 
> try is to delete the files /var/lib/btrfs/scrub.status.* - that will 
> force the btrfs tools to get the current status from the kernel (you 
> will lose some statistics and scrub history.)

No need to really delete it, just changing one character will do :)

http://marc.merlins.org/perso/btrfs/post_2014-04-26_Btrfs-Tips_-Cancel-A-Btrfs-Scrub-That-Is-Already-Stopped.html
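
This message does not say which character to change. The sketch below ASSUMES that the status file records a field of the form "finished:0" for an unfinished scrub and that flipping it to "finished:1" is the edit meant; check the linked post and the actual file contents before relying on it. The function name mark_scrub_finished is hypothetical.

```shell
#!/bin/sh
# Flip an assumed "finished:0" field to "finished:1" in every scrub status
# file under the given directory (this thread names /var/lib/btrfs).
# ASSUMPTION: the field name and values are not confirmed by this thread.
mark_scrub_finished() {
    status_dir=$1
    tmp="$status_dir/.scrub-status-edit.$$"
    for f in "$status_dir"/scrub.status.*; do
        [ -e "$f" ] || continue              # the glob matched nothing
        sed 's/finished:0/finished:1/' "$f" > "$tmp" || return 1
        cat "$tmp" > "$f" || return 1        # keep the file's owner and mode
    done
    rm -f "$tmp"
}
```

If the assumption holds, running 'mark_scrub_finished /var/lib/btrfs' as root should let 'btrfs scrub start' proceed while keeping the recorded statistics.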

Cheers,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901
