From: Martin Steigerwald <Martin@lichtvoll.de>
To: linux-btrfs@vger.kernel.org
Cc: Arne Jansen <sensille@gmx.net>
Subject: Re: 3.2-rc4: scrubbing locks up the kernel, then hung tasks on boot
Date: Thu, 15 Mar 2012 18:32:19 +0100 [thread overview]
Message-ID: <201203151832.19703.Martin@lichtvoll.de> (raw)
In-Reply-To: <4F48986D.4090703@gmx.net>
Am Samstag, 25. Februar 2012 schrieb Arne Jansen:
> On 02/24/12 16:51, Martin Steigerwald wrote:
> > Am Samstag, 21. Januar 2012 schrieb Martin Steigerwald:
> >> Am Samstag, 21. Januar 2012 schrieb Martin Steigerwald:
> >>> I still have this with 3.2.0-1-pae - which is a debian kernel bas=
ed
> >>> on 3.2.1.
> >>>=20
> >>> When I do btrfs scrub start / the machine locks immediately up
> >>> hard.
> >>>=20
> >>> Then usually on next boot it stops on space_cache enabled message=
,
> >>> but not the one for /, but the one for /home which is mounted
> >>> later.
> >>>=20
> >>> When I then boot with 3.1 it works. BTRFS redos the space_cache
> >>> then while the machine takes ages to boot - I mean ages - 10
> >>> minutes till KDM prompt is no problem there.
> >>=20
> >> I now tested scrubbing /home which is a different BTRFS filesystem
> >> on the same machine.
> >>=20
> >> Then the scrub is started, scrub status tells me so, but nothing
> >> happens, no block in/out activity in vmstat, no CPU related activi=
ty
> >> in top.
> >>=20
> >> btrfs scrub cancel then hangs, but not the complete machine, only
> >> the process.
> >>=20
> >> I had this once on my T520 with the internal Intel SSD 320 as well=
=2E
> >> The other time it worked.
> >>=20
> >> Well maybe that is due to BTRFS doing something else on my T23 now=
:
> >>=20
> >> deepdance:~> ps aux | grep ino-cache | grep -v grep
> >> root 1992 5.5 0.0 0 0 ? D 12:15 0:09
> >> [btrfs- ino-cache]
> >>=20
> >> Hmmm, so I just let it sit for a while, maybe eventually it will
> >> scrub /home.
> >>=20
> >> At least it doesn=C2=B4t lock up hard, so there might really be so=
mething
> >> strange with /.
> >=20
> > FWIW a btrfs filesystem balance / does work. After this a btrfs scr=
ub
> > start / still locks the kernel.
>
> Hi Martin,
>=20
> I just sent 2 patches to the list. Could you please test if these
> fix your problem with scrub?
I didn=C2=B4t yet test it but I tried the first balance then scrub stuf=
f again:
deepdance:~> btrfs filesystem balance /
This time scrub didn=C2=B4t lock up hard:
deepdance:~> btrfs scrub start /
scrub started on /, fsid 2bf5b1dc-1d89-4f0d-a561-1a5551a27275 (pid=3D33=
47)
deepdance:~> btrfs scrub status /
scrub status for 2bf5b1dc-1d89-4f0d-a561-1a5551a27275
scrub started at Thu Mar 15 18:10:52 2012, running for 10 secon=
ds
total bytes scrubbed: 272.59MB with 0 errors
deepdance:~> cat /proc/version
Linux version 3.2.0-1-686-pae (Debian 3.2.6-1) (ben@decadent.org.uk)
(gcc version 4.6.2 (Debian 4.6.2-14) ) #1 SMP Fri Feb 17 06:27:21 UTC 2=
012
deepdance:~> btrfs scrub status /
scrub status for 2bf5b1dc-1d89-4f0d-a561-1a5551a27275
scrub started at Thu Mar 15 18:10:52 2012, running for 55 secon=
ds
total bytes scrubbed: 1.29GB with 0 errors
deepdance:~> btrfs scrub status /
scrub status for 2bf5b1dc-1d89-4f0d-a561-1a5551a27275
scrub started at Thu Mar 15 18:10:52 2012, running for 175 seco=
nds
total bytes scrubbed: 2.58GB with 0 errors
But it seems to be stuck now:
deepdance:~> btrfs scrub status /
scrub status for 2bf5b1dc-1d89-4f0d-a561-1a5551a27275
scrub started at Thu Mar 15 18:10:52 2012, running for 295 seco=
nds
total bytes scrubbed: 2.58GB with 0 errors
deepdance:~> btrfs scrub status /
scrub status for 2bf5b1dc-1d89-4f0d-a561-1a5551a27275
scrub started at Thu Mar 15 18:10:52 2012, running for 395 seco=
nds
total bytes scrubbed: 2.58GB with 0 errors
deepdance:~> btrfs scrub status /
scrub status for 2bf5b1dc-1d89-4f0d-a561-1a5551a27275
scrub started at Thu Mar 15 18:10:52 2012, running for 636 seco=
nds
total bytes scrubbed: 2.58GB with 0 errors
There is almost no activity in vmstat 5:
2 0 108812 63812 36 412716 0 22 0 437 515 1723 47 13=
37 3
2 0 108812 63320 36 413168 0 0 81 0 453 1600 44 12=
43 1
2 0 108812 68056 36 413512 0 0 64 0 480 1589 51 11=
38 0
2 0 108812 68112 36 413516 0 0 0 0 460 1454 46 11=
43 0
1 0 108812 68112 36 413536 0 0 0 1 454 1456 45 11=
44 0
1 1 108812 67924 36 413692 0 0 0 330 491 1578 49 13=
37 1
2 0 108812 67808 36 413792 0 0 0 6 473 1556 48 11=
41 0
2 0 108812 67436 36 414124 0 0 0 424 523 1922 47 13=
39 2
1 0 108812 67460 36 414136 0 0 0 0 453 1578 46 9=
45 0
2 0 108812 67476 36 414136 0 0 0 0 465 1493 46 12=
41 0
procs -----------memory---------- ---swap-- -----io---- -system-- ----c=
pu----
r b swpd free buff cache si so bi bo in cs us sy=
id wa
3 0 108812 67492 36 414140 0 0 0 0 467 1609 48 10=
42 0
2 0 108812 67500 36 414148 0 0 0 0 470 1686 46 12=
41 0
1 0 108812 66872 36 414152 0 0 0 1 463 1585 46 11=
43 0
1 0 108812 67492 36 414156 0 0 0 1 483 1612 48 13=
39 0
2 0 108812 67428 36 414156 0 0 0 325 497 1654 48 9=
41 1
0 0 108812 67452 36 414160 0 0 0 3 478 1588 46 15=
39 0
2 0 108812 65840 36 415228 0 0 214 0 465 1408 54 10=
36 0
0 0 108812 66124 36 415352 0 0 24 0 510 1476 62 14=
22 2
Given the general slowness of the machine, I waited about 15 seconds on=
a
tail -f /var/log/syslog
while just a Konsole and Amarok was running and wait about that time or
even longer when logging via SSH at times, I think next I try newest
Debian kernel which should be up to 3.2.10.
If that doesn=C2=B4t work, I consider trying your patches.
At least its a step forward: The machine doesn=C2=B4t lock up hard anym=
ore. Its
still playing music.
=46WIW a cancel hangs:
deepdance:~> btrfs scrub cancel /
martin@deepdance:~> ps aux | grep btrfs | grep cancel
root 3429 0.0 0.0 2128 304 pts/5 D+ 18:24 0:00 btrfs =
scrub cancel /
Ah, now there we go, from dmesg:
16453.391004] btrfs: relocating block group 41393586176 flags 1
[16568.504524] btrfs: found 9891 extents
[16660.810000] btrfs: found 9891 extents
[16664.818271] btrfs: relocating block group 40319844352 flags 1
[16729.029078] btrfs: found 3593 extents
[16755.771257] btrfs: found 3593 extents
[16757.816472] btrfs: relocating block group 39246102528 flags 1
[16843.042862] btrfs: found 13147 extents
[16898.078594] btrfs: found 13147 extents
[16905.444650] btrfs: relocating block group 39237713920 flags 34
[16910.579963] btrfs: found 1 extents
[16915.946553] btrfs: relocating block group 39103496192 flags 36
[16997.308374] btrfs: found 14552 extents
[17002.239868] btrfs: relocating block group 38969278464 flags 36
[17165.253567] btrfs: found 22792 extents
[17166.522513] btrfs: relocating block group 38835060736 flags 36
[17376.519168] btrfs: found 22857 extents
[17377.526508] btrfs: relocating block group 38700843008 flags 36
[17559.413243] btrfs: found 24810 extents
[17563.767590] btrfs: relocating block group 37627101184 flags 1
[17646.272226] btrfs: found 10900 extents
[17694.323897] btrfs: found 10900 extents
[17697.171187] btrfs: relocating block group 36553359360 flags 1
[17779.364231] btrfs: found 12149 extents
[17828.395574] btrfs: found 12149 extents
[17831.593055] btrfs: relocating block group 35479617536 flags 1
[17923.352624] btrfs: found 22174 extents
[18014.147595] btrfs: found 22174 extents
[18015.821756] btrfs: relocating block group 35345399808 flags 36
[18055.201043] btrfs: found 9166 extents
[18061.692337] btrfs: relocating block group 34271657984 flags 1
[18173.986306] btrfs: found 29124 extents
[18328.733328] btrfs: found 29124 extents
[18333.141061] btrfs: relocating block group 33197916160 flags 1
[18427.523898] btrfs: found 26881 extents
[18587.015489] btrfs: found 26880 extents
[18590.183797] btrfs: relocating block group 33063698432 flags 36
[18590.183797] btrfs: relocating block group 33063698432 flags 36
[18624.883576] btrfs: found 6522 extents
[18628.618415] btrfs: relocating block group 32929480704 flags 36
[18670.147756] btrfs: found 9876 extents
[18671.426925] btrfs: relocating block group 32795262976 flags 36
[18863.313381] btrfs: found 27324 extents
[18864.392712] btrfs: relocating block group 31721521152 flags 1
[18945.941315] btrfs: found 21748 extents
[19076.164135] btrfs: found 21721 extents
[20880.564211] INFO: task btrfs:3348 blocked for more than 120 seconds.
[20880.564225] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disab=
les this message.
[20880.564234] btrfs D 000012d2 0 3348 1 0x00000000
[20880.564251] c0b72080 00000086 2cae1a17 000012d2 ec3ae5a0 00000000 0=
00012d2 c14769c0
[20880.564275] c0b7222c c14769c0 00008050 e0b96ce0 ee000120 d1d5dae0 d=
1d5db20 00000028
[20880.564298] c105d451 00000092 c12b949b 00000092 f0b7037c 00000246 e=
ee6073c c105d451
[20880.564321] Call Trace:
[20880.564348] [<c105d451>] ? arch_local_irq_save+0xf/0x14
[20880.564368] [<c12b949b>] ? _raw_spin_lock_irqsave+0x8/0x21
[20880.564482] [<f0b7037c>] ? btrfs_queue_worker+0x1cd/0x1fc [btrfs]
[20880.564497] [<c105d451>] ? arch_local_irq_save+0xf/0x14
[20880.564511] [<c12b949b>] ? _raw_spin_lock_irqsave+0x8/0x21
[20880.564526] [<c104d402>] ? prepare_to_wait+0x12/0x50
[20880.564601] [<f0b8a8c8>] ? btrfs_reada_wait+0x5b/0x84 [btrfs]
[20880.564615] [<c104d4d0>] ? add_wait_queue+0x30/0x30
[20880.564687] [<f0b87e33>] ? scrub_stripe+0x34c/0xbda [btrfs]
[20880.564729] [<f0a4da1f>] ? scsi_alloc_sgtable+0x1a/0x38 [scsi_mod]
[20880.564786] [<f0b37a7f>] ? comp_keys+0x2f/0x34 [btrfs]
[20880.564842] [<f0b37b4d>] ? generic_bin_search.constprop.31+0xc9/0x1=
04 [btrfs]
[20880.564856] [<c104d4d0>] ? add_wait_queue+0x30/0x30
[20880.564912] [<f0b37bbf>] ? bin_search+0x37/0x3d [btrfs]
[20880.564987] [<f0b5e3d4>] ? __lookup_extent_mapping+0xe5/0x101 [btrf=
s]
[20880.565061] [<f0b88781>] ? scrub_chunk.isra.13+0xc0/0xef [btrfs]
[20880.565134] [<f0b889a3>] ? scrub_enumerate_chunks+0x1f3/0x23a [btrf=
s]
[20880.565209] [<f0b89486>] ? btrfs_scrub_dev+0x215/0x34e [btrfs]
[20880.565230] [<c10c061a>] ? __kmalloc_track_caller+0x9b/0xa7
[20880.565248] [<c102a280>] ? should_resched+0x5/0x1e
[20880.565320] [<f0b74b14>] ? btrfs_ioctl+0xc55/0xda1 [btrfs]
[20880.565335] [<c1029188>] ? kmap_atomic_prot+0x2f/0xe0
[20880.565353] [<c10ad3d5>] ? handle_mm_fault+0x1ee/0x1fd
[20880.565426] [<f0b73ebf>] ? btrfs_ioctl_trans_end+0x49/0x49 [btrfs]
[20880.565442] [<c10d7023>] ? do_vfs_ioctl+0x459/0x48f
[20880.565457] [<c12bc3d7>] ? do_page_fault+0x2e0/0x2fc
[20880.565469] [<c12bc3c4>] ? do_page_fault+0x2cd/0x2fc
[20880.565484] [<c10d709d>] ? sys_ioctl+0x44/0x67
[20880.565499] [<c12bdd1f>] ? sysenter_do_call+0x12/0x28
[21000.564277] INFO: task btrfs:3348 blocked for more than 120 seconds.
[21000.564292] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disab=
les this message.
[21000.564302] btrfs D 000012d2 0 3348 1 0x00000000
[21000.564319] c0b72080 00000086 2cae1a17 000012d2 ec3ae5a0 00000000 0=
00012d2 c14769c0
[21000.564344] c0b7222c c14769c0 00008050 e0b96ce0 ee000120 d1d5dae0 d=
1d5db20 00000028
[21000.564366] c105d451 00000092 c12b949b 00000092 f0b7037c 00000246 e=
ee6073c c105d451
[21000.564390] Call Trace:
[21000.564422] [<c105d451>] ? arch_local_irq_save+0xf/0x14
[21000.564442] [<c12b949b>] ? _raw_spin_lock_irqsave+0x8/0x21
[21000.564582] [<f0b7037c>] ? btrfs_queue_worker+0x1cd/0x1fc [btrfs]
[21000.564600] [<c105d451>] ? arch_local_irq_save+0xf/0x14
[21000.564614] [<c12b949b>] ? _raw_spin_lock_irqsave+0x8/0x21
[21000.564630] [<c104d402>] ? prepare_to_wait+0x12/0x50
[21000.564707] [<f0b8a8c8>] ? btrfs_reada_wait+0x5b/0x84 [btrfs]
[21000.564721] [<c104d4d0>] ? add_wait_queue+0x30/0x30
[21000.564793] [<f0b87e33>] ? scrub_stripe+0x34c/0xbda [btrfs]
[21000.564840] [<f0a4da1f>] ? scsi_alloc_sgtable+0x1a/0x38 [scsi_mod]
[21000.564899] [<f0b37a7f>] ? comp_keys+0x2f/0x34 [btrfs]
[21000.564954] [<f0b37b4d>] ? generic_bin_search.constprop.31+0xc9/0x1=
04 [btrfs]
[21000.564968] [<c104d4d0>] ? add_wait_queue+0x30/0x30
[21000.565024] [<f0b37bbf>] ? bin_search+0x37/0x3d [btrfs]
[21000.565099] [<f0b5e3d4>] ? __lookup_extent_mapping+0xe5/0x101 [btrf=
s]
[21000.565173] [<f0b88781>] ? scrub_chunk.isra.13+0xc0/0xef [btrfs]
[21000.565246] [<f0b889a3>] ? scrub_enumerate_chunks+0x1f3/0x23a [btrf=
s]
[21000.565322] [<f0b89486>] ? btrfs_scrub_dev+0x215/0x34e [btrfs]
[21000.565345] [<c10c061a>] ? __kmalloc_track_caller+0x9b/0xa7
[21000.565363] [<c102a280>] ? should_resched+0x5/0x1e
[21000.565436] [<f0b74b14>] ? btrfs_ioctl+0xc55/0xda1 [btrfs]
[21000.565451] [<c1029188>] ? kmap_atomic_prot+0x2f/0xe0
[21000.565471] [<c10ad3d5>] ? handle_mm_fault+0x1ee/0x1fd
[21000.565544] [<f0b73ebf>] ? btrfs_ioctl_trans_end+0x49/0x49 [btrfs]
[21000.565561] [<c10d7023>] ? do_vfs_ioctl+0x459/0x48f
[21000.565577] [<c12bc3d7>] ? do_page_fault+0x2e0/0x2fc
[21000.565589] [<c12bc3c4>] ? do_page_fault+0x2cd/0x2fc
[21000.565603] [<c10d709d>] ? sys_ioctl+0x44/0x67
[21000.565619] [<c12bdd1f>] ? sysenter_do_call+0x12/0x28
[=E2=80=A6 it seems to go on this way =E2=80=A6]
Maybe something of that is helpful?
I leave the machine running for a while, but I think it won=C2=B4t hibe=
rnate
with a process stuck in D state so eventually at some time I will shut =
it
down and restart. OTOH I could leave it running till tomorrow or what.
I will put log files aside for further diagnostics in any case.
Ciao,
--=20
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" =
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2012-03-15 17:32 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-12-17 17:33 3.2-rc4: scrubbing locks up the kernel, then hung tasks on boot Martin Steigerwald
2011-12-17 18:11 ` Martin Steigerwald
2011-12-20 20:46 ` Martin Steigerwald
2012-01-21 10:49 ` Martin Steigerwald
2012-01-21 11:19 ` Martin Steigerwald
2012-02-24 15:51 ` Martin Steigerwald
2012-02-25 8:14 ` Arne Jansen
2012-02-25 20:14 ` Martin Steigerwald
2012-03-15 17:32 ` Martin Steigerwald [this message]
2012-03-15 17:39 ` Martin Steigerwald
2012-03-15 17:42 ` Chris Mason
2012-03-15 18:03 ` Martin Steigerwald
2012-03-15 18:08 ` Chris Mason
2012-03-16 15:05 ` Martin Steigerwald
2012-03-16 15:37 ` Arne Jansen
2012-03-17 9:43 ` Martin Steigerwald
2012-03-18 0:37 ` Greg Kroah-Hartman
2012-03-19 8:31 ` Martin Steigerwald
2012-03-19 15:48 ` Greg Kroah-Hartman
2012-03-19 16:03 ` Martin Steigerwald
2012-01-24 9:28 ` Arne Jansen
2012-01-24 10:16 ` Martin Steigerwald
2012-01-24 10:29 ` Arne Jansen
2012-01-24 10:39 ` Martin Steigerwald
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=201203151832.19703.Martin@lichtvoll.de \
--to=martin@lichtvoll.de \
--cc=linux-btrfs@vger.kernel.org \
--cc=sensille@gmx.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.