* Filesystem hang on kernel 4.2.0 with copy reflink
From: Mark Zealey @ 2016-01-04 8:35 UTC
To: linux-btrfs
Hi there, I've run into a very strange hang with btrfs. I was trying to
restore a directory (a postgres database) from a read-only snapshot. To do
this I used the command `cp -ar --reflink=always`. This worked fine for
hundreds of files; however, when it got to a particular file, 16 kworker
threads (I have 8 processors in this system) were marked as being in D
state (with 0 CPU or disk usage) and I could no longer access the btrfs
file system. I can't see any kernel message or OOPS. Can you
please let me know what additional debug information I can provide to
help track this issue down in the kernel?
The system is the latest Ubuntu 14.04 LTS with a backported Wily kernel
(package linux-image-4.2.0-22-generic):
4.2.0-22-generic #27~14.04.1-Ubuntu SMP Fri Dec 18 10:57:53 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Thanks
Mark
* Re: Filesystem hang on kernel 4.2.0 with copy reflink
From: Jack Wang @ 2016-01-04 10:41 UTC
To: Mark Zealey; +Cc: linux-btrfs
Hi Mark,
Could you run the command below when the hang happens, and post the dmesg output?
echo w > /proc/sysrq-trigger
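For anyone following along, here is a minimal sketch of that capture step, assuming root access and a kernel built with magic SysRq support. The function name `capture_blocked_tasks` and its two optional arguments are made up for illustration; `/proc/sysrq-trigger` and `dmesg` are the only real interfaces used.

```shell
# Hypothetical helper: ask the kernel to dump blocked (D state) tasks,
# then save the kernel log to a file.
#   $1: output file (default: blocked-tasks.txt)
#   $2: sysrq trigger path (default: /proc/sysrq-trigger; overridable for dry runs)
capture_blocked_tasks() {
    local out=${1:-blocked-tasks.txt}
    local trigger=${2:-/proc/sysrq-trigger}

    # 'w' requests a dump of tasks in uninterruptible (blocked) state.
    echo w > "$trigger" || return 1

    # The traces land in the kernel ring buffer; save a copy of it.
    dmesg > "$out" 2>/dev/null || echo "dmesg failed; check permissions" >&2
}
```

Run as root, `capture_blocked_tasks /tmp/blocked-tasks.txt` would leave the traces in a file on tmpfs, which may still be writable while the btrfs root is hung.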
* Re: Filesystem hang on kernel 4.2.0 with copy reflink
From: Mark Zealey @ 2016-01-04 12:11 UTC
To: Jack Wang; +Cc: linux-btrfs
It overflowed the dmesg buffer but hopefully contains enough cores -
https://mark.zealey.org/download/btrfs_crash.txt
Some other output:
# mount
/dev/sdb1 on / type btrfs (rw,noatime,skip_balance,subvol=@)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
none on /sys/fs/cgroup type tmpfs (rw)
none on /sys/fs/fuse/connections type fusectl (rw)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)
none on /sys/firmware/efi/efivars type efivarfs (rw)
udev on /dev type devtmpfs (rw,mode=0755)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
none on /run/shm type tmpfs (rw,nosuid,nodev)
none on /run/user type tmpfs (rw,noexec,nosuid,nodev,size=104857600,mode=0755)
none on /sys/fs/pstore type pstore (rw)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu type cgroup (rw,relatime,cpu)
cgroup on /sys/fs/cgroup/cpuacct type cgroup (rw,relatime,cpuacct)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,relatime,blkio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,relatime,freezer,release_agent=/run/cgmanager/agents/cgm-release-agent.freezer)
cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,relatime,net_cls,release_agent=/run/cgmanager/agents/cgm-release-agent.net_cls)
/dev/sdb1 on /home type btrfs (rw,noatime,skip_balance,subvol=@home)
/dev/sdb3 on /boot/efi type vfat (rw)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,noexec,nosuid,nodev)
rpc_pipefs on /run/rpc_pipefs type rpc_pipefs (rw)
systemd on /sys/fs/cgroup/systemd type cgroup (rw,noexec,nosuid,nodev,none,name=systemd)
# ps auwx|grep ' D'
root 275 0.0 0.0 0 0 ? D Jan02 2:29 [btrfs-transacti]
root 361 0.0 0.0 0 0 ? D 13:30 0:00 [kworker/u16:5]
root 404 0.0 0.0 0 0 ? D 13:31 0:00 [kworker/u16:7]
root 1127 0.0 0.0 0 0 ? D 13:54 0:00 [kworker/u16:0]
root 1137 0.0 0.0 0 0 ? D 13:54 0:00 [kworker/u16:2]
root 1189 2.3 0.0 25932 2216 pts/7 D+ 13:55 0:02 cp -vax --reflink=always /.snapshots/psql/var/lib/postgresql/ .
root 1191 0.0 0.0 0 0 ? D 13:55 0:00 [kworker/u16:3]
root 1197 0.0 0.0 0 0 ? D 13:55 0:00 [kworker/u16:4]
root 1200 0.0 0.0 0 0 ? D 13:55 0:00 [kworker/u16:8]
root 1201 0.0 0.0 0 0 ? D 13:55 0:00 [kworker/u16:10]
root 1230 0.0 0.0 0 0 ? D 13:55 0:00 [kworker/u16:15]
root 1231 0.0 0.0 0 0 ? D 13:55 0:00 [kworker/u16:16]
root 14569 0.0 0.0 0 0 ? D 12:18 0:00 [kworker/u16:9]
root 14572 0.0 0.0 0 0 ? D 12:19 0:00 [kworker/u16:11]
root 14573 0.0 0.0 0 0 ? D 12:19 0:00 [kworker/u16:12]
root 14582 0.0 0.0 0 0 ? D 12:19 0:00 [kworker/u16:13]
root 32228 0.0 0.0 0 0 ? D 13:17 0:00 [kworker/u16:1]
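Since `grep ' D'` can also match unrelated lines, a slightly stricter way to list the blocked tasks is to filter on the state column itself. This is only a sketch: the function names `filter_d_state` and `list_blocked_tasks` are hypothetical, and it assumes a procps `ps` that accepts `-eo` with the `pid`, `stat`, `wchan` and `comm` fields.

```shell
# Keep only rows whose second column (STAT) starts with D, i.e. tasks in
# uninterruptible sleep. Expects "PID STAT ..." rows on stdin.
filter_d_state() {
    awk '$2 ~ /^D/'
}

# List blocked tasks along with the kernel function they are waiting in (WCHAN).
list_blocked_tasks() {
    ps -eo pid,stat,wchan:32,comm | filter_d_state
}
```

The WCHAN column may hint at where each kworker is stuck. As root, `cat /proc/<pid>/stack` for one of the listed PIDs may show the full kernel stack as well, if the kernel exposes it.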
The last output of the cp:
‘/.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25009’ -> ‘./postgresql/9.5/main/base/16385/25009’
‘/.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25011’ -> ‘./postgresql/9.5/main/base/16385/25011’
‘/.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25012’ -> ‘./postgresql/9.5/main/base/16385/25012’
‘/.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25243’ -> ‘./postgresql/9.5/main/base/16385/25243’
‘/.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25246’ -> ‘./postgresql/9.5/main/base/16385/25246’
‘/.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25248’ -> ‘./postgresql/9.5/main/base/16385/25248’
‘/.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25249’ -> ‘./postgresql/9.5/main/base/16385/25249’
‘/.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25251’ -> ‘./postgresql/9.5/main/base/16385/25251’
‘/.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25254’ -> ‘./postgresql/9.5/main/base/16385/25254’
‘/.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25256’ -> ‘./postgresql/9.5/main/base/16385/25256’
‘/.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25257’ -> ‘./postgresql/9.5/main/base/16385/25257’
‘/.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25283’ -> ‘./postgresql/9.5/main/base/16385/25283’
And those (and other files) that it would have copied:
-rw------- 1 postgres postgres 0 Dec 30 18:11 /.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25243
-rw------- 1 postgres postgres 0 Dec 30 18:11 /.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25246
-rw------- 1 postgres postgres 8192 Dec 30 18:11 /.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25248
-rw------- 1 postgres postgres 8192 Dec 30 18:11 /.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25249
-rw------- 1 postgres postgres 0 Dec 30 18:11 /.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25251
-rw------- 1 postgres postgres 0 Dec 30 18:11 /.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25254
-rw------- 1 postgres postgres 8192 Dec 30 18:11 /.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25256
-rw------- 1 postgres postgres 8192 Dec 30 18:11 /.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25257
-rw------- 1 postgres postgres 409624576 Dec 30 19:10 /.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25283
-rw------- 1 postgres postgres 122880 Dec 30 18:29 /.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25283_fsm
-rw------- 1 postgres postgres 0 Dec 30 18:22 /.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25284
-rw------- 1 postgres postgres 8192 Dec 30 18:22 /.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25285
Also I have quota tracking enabled on the btrfs volume if that makes any
difference.
Mark
* Re: Filesystem hang on kernel 4.2.0 with copy reflink
From: Duncan @ 2016-01-09 9:28 UTC
To: linux-btrfs
Mark Zealey posted on Mon, 04 Jan 2016 14:11:00 +0200 as excerpted:
> Also I have quota tracking enabled on the btrfs volume if that makes any
> difference.
I'm not sure whether it makes a difference for this particular hang, but
I do know that btrfs quotas are simply not stable thru at least 4.3. I'm
not sure what 4.4 status is, but my general recommendation regarding
btrfs quotas is...
If you need quotas, use a more mature filesystem where they work
reliably. If you don't, then turn them off for now, and don't turn them
on again until at least two complete kernel cycles have passed without a
known quota bug or instability. If 4.4 is indeed known-bug-free, and it
and 4.5 remain so thru 4.6, then 4.6 would be the earliest release at
which I could recommend turning them on, and that's only if there are no
known quota bugs in 4.4, 4.5 or 4.6 by the time of the 4.6 release.
Unless of course you're deliberately and specifically testing btrfs
quotas and working with the devs on fixing quota specific issues, in
which case, thank you. =:^)
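For reference, checking and turning off quotas as suggested above might look like the sketch below. `btrfs qgroup show` and `btrfs quota disable` are the btrfs-progs subcommands; the wrapper function `disable_btrfs_quota`, its default mount point `/`, and the `DRY_RUN` variable are assumptions made purely for illustration.

```shell
# Hypothetical wrapper: show the current qgroups, then disable quota tracking.
#   $1: mount point of the btrfs filesystem (default: /)
# Set DRY_RUN=1 to print the commands instead of running them.
disable_btrfs_quota() {
    local mnt=${1:-/}
    local run=${DRY_RUN:+echo}

    # Show the existing qgroups first, so there is a record of what was set.
    $run btrfs qgroup show "$mnt"

    # Turn quota tracking off for the whole filesystem.
    $run btrfs quota disable "$mnt"
}
```

Running `DRY_RUN=1 disable_btrfs_quota /` first prints the two commands without touching the filesystem.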
Meanwhile, the rather long history of problems with quotas on btrfs is
both the reason I'm suggesting two complete cycles without quota issues
before enabling them, and a good reason to be skeptical as to current
releases' quota code or the likelihood of that two-releases-quota-bug-
free happening any time soon. There's definitely a lot of work going
into the feature and there's gotta be a point where it actually works,
but they've rewritten the code three times and are still dealing with
bugs, and it has been in a "try back in a couple kernel cycles" state for
years, now, so... who knows? Kinda reminds me of how long the raid56
code took, only quota was being worked on before that, and is still being
worked on, so...
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman