* btrfs ops hang indefinitely (process in D state)
From: Eugene Crosser @ 2016-07-02 9:49 UTC (permalink / raw)
To: linux-btrfs
Hello,
This may be the same problem as "btrfs lockup".
I have two systems that have been using btrfs for several years. One is my home
desktop; it has root+home ext4 fs on a PCI SSD, and "big stuff" on a btrfs using
two hard disks in RAID1 configuration:
root@pccross:/export# uname -a
Linux pccross 4.7.0-rc2-custom #2 SMP Sat Jun 11 01:13:59 MSK 2016 x86_64 x86_64
x86_64 GNU/Linux # -- Was earlier 4.x version when the problem happened
root@pccross:/export# btrfs --version
btrfs-progs v4.4
root@pccross:/export# btrfs fi show
Label: 'export' uuid: c94c3ef6-394e-4441-8992-d7033332bdff
Total devices 2 FS bytes used 1.26TiB
devid 1 size 3.64TiB used 1.26TiB path /dev/sda
devid 2 size 3.64TiB used 1.26TiB path /dev/sdb
root@pccross:/export# btrfs fi df /export
Data, RAID1: total=1.26TiB, used=1.25TiB
System, RAID1: total=32.00MiB, used=208.00KiB
Metadata, RAID1: total=5.00GiB, used=3.82GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
A month ago, I moved a directory containing a few GB from home (ext4) to btrfs
with the `mv` command. The command took some minutes and eventually finished
without error. After some hours, a cron job that uses files on btrfs did not
run. I logged in to investigate and realized that its process was in 'D' state,
and any command that I tried that would use btrfs (ls, ...) would enter 'D'
state and stay there indefinitely. There was nothing interesting (that I
remember) in dmesg. Reboot did not help and indeed could not complete, because
some of the startup jobs use files on btrfs, and they hang.
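For reference, tasks stuck in 'D' state like this can be listed from a shell
that does not itself touch the affected filesystem. The following is only a
minimal sketch, assuming procps `ps` and `awk` are available; `d_state_only` is
a hypothetical helper name, not part of any tool. The WCHAN column shows the
kernel function in which each task is sleeping.

```shell
# Sketch: keep only tasks in uninterruptible sleep (state D) from ps output.
# d_state_only is a hypothetical helper name used for illustration.
d_state_only() {
    # Keep the header line and any row whose STAT column starts with "D".
    awk 'NR == 1 || $2 ~ /^D/'
}

# Usage on the affected machine:
#   ps -eo pid,stat,wchan:32,comm | d_state_only
```

The filter reads from standard input, so it can also be applied to `ps` output
that was saved earlier.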
I rebooted without mounting btrfs and ran `btrfsck`. It found and fixed some
inconsistencies (no log, sorry), after which I could mount the filesystem. Since
then everything has worked, except that the directory I had moved disappeared
altogether (I had a backup, so I could restore it). No debugging material is
left, so this is just for background.
=====
Enter the second system. It is a rented physical server in a datacenter with two
hard disks, joined into a single root btrfs (/dev/sd[ab]1 are swap partitions):
root@dehost:~# uname -a
Linux dehost 3.13.0-91-generic #138-Ubuntu SMP Fri Jun 24 17:00:34 UTC 2016
x86_64 x86_64 x86_64 GNU/Linux
root@dehost:~# btrfs --version
Btrfs v3.12
root@dehost:~# btrfs fi show
Label: none uuid: 67a2708c-f039-4783-a699-6f6be0dac318
Total devices 2 FS bytes used 442.58GiB
devid 1 size 2.72TiB used 444.04GiB path /dev/sda2
devid 2 size 2.72TiB used 444.03GiB path /dev/sdb2
Btrfs v3.12
root@dehost:~# btrfs fi df /
Data, RAID1: total=440.00GiB, used=439.51GiB
System, RAID1: total=32.00MiB, used=72.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, RAID1: total=4.00GiB, used=3.07GiB
A week ago, the system started to become unresponsive every day. The kernel
still works (it responds to ping), but no processes can start. Looking at the
logs after reboot, I noticed that activity stops some time after the start of a
backup cron job that covers a set of directories (/etc, /home, /var/mail and
some more). I disabled the backup job, and in the several days since then the
system has not hung.
=====
My question to the developers: what can I do to (1) recover the filesystem while
it is mounted (I can use a recovery netboot system and run `btrfs check` as a
last resort), and (2) provide any useful debugging information to the developers?
Thank you,
Eugene
* Re: btrfs ops hang indefinitely (process in D state)
From: Duncan @ 2016-07-02 10:54 UTC (permalink / raw)
To: linux-btrfs
Eugene Crosser posted on Sat, 02 Jul 2016 12:49:53 +0300 as excerpted:
> Enter the second system. It is a rented physical server in a datacenter
> with two hard disks, joined into a single root btrfs (/dev/sd[ab]1 are
> swap partitions):
>
> root@dehost:~# uname -a
> Linux dehost 3.13.0-91-generic [...]
> root@dehost:~# btrfs --version
> Btrfs v3.12
> root@dehost:~#
v3.12 userspace and v3.13 kernel are both ancient history in btrfs terms,
far too old to provide anything useful in terms of debugging info.
In general, btrfs is not yet fully stable. Running it on production
systems, where a kernel and userspace that old might be chosen for
stability reasons, is at odds with that interest in stability over new
features, because btrfs itself is nowhere near that level of stable. So
the general recommendation is to choose one: either the still-stabilizing
btrfs on a more current system, if you want btrfs, or something truly
stable, if you really need that sort of years-outdated stability.
That said, while this list does tend to focus on mainline and the last
two release series of the current and LTS kernels (so ATM 4.6 and 4.5
for current, and 4.4 and 4.1 for LTS), and not really much earlier, we
recognize that various distros do backporting and support much further
back. But this list tracks mainline, not those distro kernels, and
specifically, we don't track what they've backported vs. what they
haven't. So if you wish to use your distro's old kernels, that's fine,
but you're going to be better off going to them for support then, because
they'll know what they've backported and what they haven't and are thus
in a better position to provide that support.
Meanwhile, I do recognize that you had something similar happen on a much
newer kernel as well, but that was on a different system, and you don't
have the details or logs left for that one, so that's not of much help
either.
Unless of course you can duplicate the behavior once again with a
reasonably current kernel, within the two-release-series range of either
LTS or current as specified above, and can provide the logs, etc, from
it...
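One form such logs could take is the kernel stack of each blocked task. The
following is only a minimal sketch, assuming root access and a kernel that
exposes /proc/<pid>/stack; `dump_d_stacks` and the `PROC_ROOT` override are
hypothetical names used for illustration.

```shell
# Sketch: print the kernel stack of every task currently in state D.
# Reading /proc/<pid>/stack normally requires root. PROC_ROOT is a
# hypothetical override so the function can be tried on a fake tree.
dump_d_stacks() {
    proc_root=${PROC_ROOT:-/proc}
    for dir in "$proc_root"/[0-9]*; do
        [ -r "$dir/status" ] || continue
        # The State: line looks like "State:  D (disk sleep)".
        state=$(awk '/^State:/ { print $2; exit }' "$dir/status" 2>/dev/null)
        [ "$state" = "D" ] || continue
        echo "== pid ${dir##*/} =="
        # The task may exit between the two reads; ignore that error.
        cat "$dir/stack" 2>/dev/null
    done
}

# Usage during a hang, as root:
#   dump_d_stacks > d-state-stacks.txt
```

If SysRq is enabled, `echo w > /proc/sysrq-trigger` asks the kernel to write
the blocked tasks to the kernel log instead, where `dmesg` can pick them up.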
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: btrfs ops hang indefinitely (process in D state)
From: Eugene Crosser @ 2016-07-02 13:32 UTC (permalink / raw)
To: Duncan, linux-btrfs
Hi Duncan,
I pretty much understand the risks and do not need them to be explained to me.
When I installed the remote system, the versions were pretty close to the
cutting edge. And the problem looks as if it *could* be the same in 3.13 and
in 4.4 kernels.
I wrote here to ask advice about "live" recovery *if* you have any, and to offer
debug information *if* you are interested.
If you do not have advice for me, and are not interested in the sort of debug
data that I *can* provide, so be it...
Regards,
Eugene