* [bcachefs] BUG: soft lockup - CPU#0 stuck for 22s! [bch_copygc_read:5328]
@ 2016-09-03 21:29 Marcin
2016-09-03 21:57 ` Marcin
2016-09-04 0:17 ` Kent Overstreet
0 siblings, 2 replies; 7+ messages in thread
From: Marcin @ 2016-09-03 21:29 UTC (permalink / raw)
To: linux-bcache
Hi!
Kernel at commit c820493652e830dc050e1418301e1bdec5691a1e
I createt to devices, fast has size
# blockdev --getsz /dev/sde1
20971520
and slower device:
# blockdev --getsz /dev/sdd1
2930209551
I was copying files from one disk to bcache, after some time I got:
BUG: soft lockup - CPU#0 stuck for 22s! [bch_copygc_read:5328]
# cat
/sys/fs/bcache/5ad3d751-cb40-4bb7-9546-3e5b48ccd3cf/cache0/dirty_bytes
/sys/fs/bcache/5ad3d751-cb40-4bb7-9546-3e5b48ccd3cf/cache1/dirty_bytes
9269264384
2540163072
dmesg says:
[ 9088.110001] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s!
[bch_copygc_read:5328]
[ 9088.110001] Modules linked in: bcache lz4_compress zlib_deflate
libcrc32c lz4_decompress sha1_generic cpufreq_ondemand msr af_packet
fuse pcrypt nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack ipv6
coretemp hwmon kvm_intel kvm irqbypass i2c_i801 i2c_core sky2 hpsa
acpi_cpufreq scsi_transport_sas processor button raid1 md_mod
pata_jmicron ehci_pci ehci_hcd dm_mod
[ 9088.110001] CPU: 0 PID: 5328 Comm: bch_copygc_read Not tainted 4.7.0+
#2
[ 9088.110001] Hardware name: Gigabyte Technology Co., Ltd.
965P-DS3/965P-DS3, BIOS F14 06/25/2009
[ 9088.110001] task: ffff88001b561a80 ti: ffff880002810000 task.ti:
ffff880002810000
[ 9088.110001] RIP: 0010:[<ffffffffc028795b>] [<ffffffffc028795b>]
bch_moving_gc+0xcb/0xd90 [bcache]
[ 9088.110001] RSP: 0018:ffff880002813d20 EFLAGS: 00000287
[ 9088.110001] RAX: ffffc900001f6e78 RBX: ffff8800102c5000 RCX:
000000000003c000
[ 9088.110001] RDX: ffffc9000027f000 RSI: ffff8800102c5000 RDI:
ffff8800102c5608
[ 9088.110001] RBP: ffff880002813e40 R08: ffff880002813d90 R09:
0000000002000000
[ 9088.110001] R10: 0000000000000bc6 R11: 0000000000064000 R12:
ffff8800014d0000
[ 9088.110001] R13: ffff8800102c5000 R14: 0000000000027800 R15:
0000000001b6c6e0
[ 9088.110001] FS: 0000000000000000(0000) GS:ffff88003fc00000(0000)
knlGS:0000000000000000
[ 9088.110001] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9088.110001] CR2: 000056048ce06f62 CR3: 000000003d0ce000 CR4:
00000000000006f0
[ 9088.110001] Stack:
[ 9088.110001] 0000000000015f00 0000000000015f00 ffffffff8107bd3f
ffff8800102c5000
[ 9088.110001] ffff880002813de0 0000000000000000 ffff880002813e68
ffff880002813e40
[ 9088.110001] 0000000000000002 0000000000064000 ffff8800102c5608
000000000000000f
[ 9088.110001] Call Trace:
[ 9088.110001] [<ffffffff8107bd3f>] ? sched_clock_cpu+0x7f/0xa0
[ 9088.110001] [<ffffffffc028878c>] bch_moving_gc_thread+0x16c/0x180
[bcache]
[ 9088.110001] [<ffffffffc0288620>] ? bch_moving_gc+0xd90/0xd90
[bcache]
[ 9088.110001] [<ffffffff8106e874>] kthread+0xc4/0xe0
[ 9088.110001] [<ffffffff8146b0bf>] ret_from_fork+0x1f/0x40
[ 9088.110001] [<ffffffff8106e7b0>] ? kthread_worker_fn+0x150/0x150
[ 9088.110001] Code: 00 48 8b 8b a8 02 00 00 48 8b 93 a8 05 00 00 48 8d
04 40 48 8d 0c 49 48 8d 04 82 48 8d 14 8a 48 39 d0 0f 83 7e 08 00 00 48
89 de <44> 8b 48 04 c6 40 09 00 45 85 c9 74 2e f7 40 04 01 00 01 00 75
Compression wasn't turned on.
Marcin
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [bcachefs] BUG: soft lockup - CPU#0 stuck for 22s! [bch_copygc_read:5328]
2016-09-03 21:29 [bcachefs] BUG: soft lockup - CPU#0 stuck for 22s! [bch_copygc_read:5328] Marcin
@ 2016-09-03 21:57 ` Marcin
2016-09-04 0:17 ` Kent Overstreet
1 sibling, 0 replies; 7+ messages in thread
From: Marcin @ 2016-09-03 21:57 UTC (permalink / raw)
To: linux-bcache
W dniu 2016-09-03 23:29, Marcin napisał(a):
> Hi!
> Kernel at commit c820493652e830dc050e1418301e1bdec5691a1e
It looks that I was messing with bcache-encryption and this bug appears
on bcache-encryption branch.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [bcachefs] BUG: soft lockup - CPU#0 stuck for 22s! [bch_copygc_read:5328]
2016-09-03 21:29 [bcachefs] BUG: soft lockup - CPU#0 stuck for 22s! [bch_copygc_read:5328] Marcin
2016-09-03 21:57 ` Marcin
@ 2016-09-04 0:17 ` Kent Overstreet
2016-09-04 18:21 ` Marcin
1 sibling, 1 reply; 7+ messages in thread
From: Kent Overstreet @ 2016-09-04 0:17 UTC (permalink / raw)
To: Marcin; +Cc: linux-bcache
On Sat, Sep 03, 2016 at 11:29:49PM +0200, Marcin wrote:
> Hi!
> Kernel at commit c820493652e830dc050e1418301e1bdec5691a1e
>
> I createt to devices, fast has size
> # blockdev --getsz /dev/sde1
> 20971520
> and slower device:
> # blockdev --getsz /dev/sdd1
> 2930209551
>
> I was copying files from one disk to bcache, after some time I got:
> BUG: soft lockup - CPU#0 stuck for 22s! [bch_copygc_read:5328]
Thanks for the report - can you run addr2line with your vmlinux file, and the
RIP?
addr2line -i -e vmlinux ffffffffc028795b
>
> # cat /sys/fs/bcache/5ad3d751-cb40-4bb7-9546-3e5b48ccd3cf/cache0/dirty_bytes
> /sys/fs/bcache/5ad3d751-cb40-4bb7-9546-3e5b48ccd3cf/cache1/dirty_bytes
> 9269264384
> 2540163072
>
> dmesg says:
> [ 9088.110001] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s!
> [bch_copygc_read:5328]
> [ 9088.110001] Modules linked in: bcache lz4_compress zlib_deflate libcrc32c
> lz4_decompress sha1_generic cpufreq_ondemand msr af_packet fuse pcrypt
> nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack ipv6 coretemp hwmon
> kvm_intel kvm irqbypass i2c_i801 i2c_core sky2 hpsa acpi_cpufreq
> scsi_transport_sas processor button raid1 md_mod pata_jmicron ehci_pci
> ehci_hcd dm_mod
> [ 9088.110001] CPU: 0 PID: 5328 Comm: bch_copygc_read Not tainted 4.7.0+ #2
> [ 9088.110001] Hardware name: Gigabyte Technology Co., Ltd.
> 965P-DS3/965P-DS3, BIOS F14 06/25/2009
> [ 9088.110001] task: ffff88001b561a80 ti: ffff880002810000 task.ti:
> ffff880002810000
> [ 9088.110001] RIP: 0010:[<ffffffffc028795b>] [<ffffffffc028795b>]
> bch_moving_gc+0xcb/0xd90 [bcache]
> [ 9088.110001] RSP: 0018:ffff880002813d20 EFLAGS: 00000287
> [ 9088.110001] RAX: ffffc900001f6e78 RBX: ffff8800102c5000 RCX:
> 000000000003c000
> [ 9088.110001] RDX: ffffc9000027f000 RSI: ffff8800102c5000 RDI:
> ffff8800102c5608
> [ 9088.110001] RBP: ffff880002813e40 R08: ffff880002813d90 R09:
> 0000000002000000
> [ 9088.110001] R10: 0000000000000bc6 R11: 0000000000064000 R12:
> ffff8800014d0000
> [ 9088.110001] R13: ffff8800102c5000 R14: 0000000000027800 R15:
> 0000000001b6c6e0
> [ 9088.110001] FS: 0000000000000000(0000) GS:ffff88003fc00000(0000)
> knlGS:0000000000000000
> [ 9088.110001] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 9088.110001] CR2: 000056048ce06f62 CR3: 000000003d0ce000 CR4:
> 00000000000006f0
> [ 9088.110001] Stack:
> [ 9088.110001] 0000000000015f00 0000000000015f00 ffffffff8107bd3f
> ffff8800102c5000
> [ 9088.110001] ffff880002813de0 0000000000000000 ffff880002813e68
> ffff880002813e40
> [ 9088.110001] 0000000000000002 0000000000064000 ffff8800102c5608
> 000000000000000f
> [ 9088.110001] Call Trace:
> [ 9088.110001] [<ffffffff8107bd3f>] ? sched_clock_cpu+0x7f/0xa0
> [ 9088.110001] [<ffffffffc028878c>] bch_moving_gc_thread+0x16c/0x180
> [bcache]
> [ 9088.110001] [<ffffffffc0288620>] ? bch_moving_gc+0xd90/0xd90 [bcache]
> [ 9088.110001] [<ffffffff8106e874>] kthread+0xc4/0xe0
> [ 9088.110001] [<ffffffff8146b0bf>] ret_from_fork+0x1f/0x40
> [ 9088.110001] [<ffffffff8106e7b0>] ? kthread_worker_fn+0x150/0x150
> [ 9088.110001] Code: 00 48 8b 8b a8 02 00 00 48 8b 93 a8 05 00 00 48 8d 04
> 40 48 8d 0c 49 48 8d 04 82 48 8d 14 8a 48 39 d0 0f 83 7e 08 00 00 48 89 de
> <44> 8b 48 04 c6 40 09 00 45 85 c9 74 2e f7 40 04 01 00 01 00 75
>
>
> Compression wasn't turned on.
>
> Marcin
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [bcachefs] BUG: soft lockup - CPU#0 stuck for 22s! [bch_copygc_read:5328]
2016-09-04 0:17 ` Kent Overstreet
@ 2016-09-04 18:21 ` Marcin
2016-09-06 2:24 ` Kent Overstreet
0 siblings, 1 reply; 7+ messages in thread
From: Marcin @ 2016-09-04 18:21 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-bcache
W dniu 2016-09-04 02:17, Kent Overstreet napisał(a):
Hi!
> On Sat, Sep 03, 2016 at 11:29:49PM +0200, Marcin wrote:
>> Hi!
>> Kernel at commit c820493652e830dc050e1418301e1bdec5691a1e
>>
>> I createt to devices, fast has size
>> # blockdev --getsz /dev/sde1
>> 20971520
>> and slower device:
>> # blockdev --getsz /dev/sdd1
>> 2930209551
>>
>> I was copying files from one disk to bcache, after some time I got:
>> BUG: soft lockup - CPU#0 stuck for 22s! [bch_copygc_read:5328]
>
> Thanks for the report - can you run addr2line with your vmlinux file,
> and the
> RIP?
>
> addr2line -i -e vmlinux ffffffffc028795b
It returned:
??:0
Probably due to I'm using bcache as module.
<long story>
As I mentioned before I wasn't sure which branch I used to test. So I
started to reproduce problem again and watch which branch is used. I
encountered strange behaviour, random BUGs in kernel, even random
reboots. It looked like a hardware problem but I never got such problems
using this box as backup & samba server. Memtest86+ didn't found any
problem, cpuburn didn't trigger anything suspicious. Gcc compiles
without any strange failures. Nothing suspicious happened when I didn't
use bcachefs. So i spent a couple of hours trying to find out what is
going on. I tried bcachefs without tiering. When I formated /dev/sde1
then everything was ok. But when I tried to use /dev/sdd1 something
happened:
# bcache format /dev/sdd1
/dev/sdd1 contains a bcache filesystem
Proceed anyway? (y,n) y
UUID: 78a90acf-0247-4edf-a56f-f4f82a51e13e
Set UUID: a0a6e9fa-d7bb-4cec-8c42-b7e4bbccb2d8
version: 6
nbuckets: 3815462
block_size: 1
bucket_size: 768
nr_in_set: 1
nr_this_dev: 0
first_bucket: 3
# LC_ALL=en_US.utf-8 mount -t bcache /dev/sdd1 /mnt/test/
mount: mount(2) failed: No such file or directory
Kernel throws:
zserwerek kernel: [ 5832.483306] bcache: bch_open_as_blockdevs()
register_cache_set err Bad bucket size
Hmmm. So I resized /dev/sdd1 from 1.4TB to 100GB. Formated again, and
mounted without problem. Hmm. I created tiered bcachefs with smaller
/dev/sdd1 and it works stable.
<conclucion>
Please look at line with "bucket size":
bucket_size: 768
If bucket size is higher than (probably) 512 then I can't mount simple
(without tiering) bcachefs filesystem. If I use such big device in
tiered bcachefs I'm expieriencing random problems with stability of box.
I think that bug in mail's subject is only random symptom of problem
when device is formated with bucket size >512.
What is going inside kernel in this case, is overwittem memory of other
processes?
Marcin
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [bcachefs] BUG: soft lockup - CPU#0 stuck for 22s! [bch_copygc_read:5328]
2016-09-04 18:21 ` Marcin
@ 2016-09-06 2:24 ` Kent Overstreet
2016-09-06 11:08 ` Marcin Mirosław
0 siblings, 1 reply; 7+ messages in thread
From: Kent Overstreet @ 2016-09-06 2:24 UTC (permalink / raw)
To: Marcin; +Cc: linux-bcache
On Sun, Sep 04, 2016 at 08:21:17PM +0200, Marcin wrote:
> W dniu 2016-09-04 02:17, Kent Overstreet napisał(a):
>
> Hi!
>
> > On Sat, Sep 03, 2016 at 11:29:49PM +0200, Marcin wrote:
> > > Hi!
> > > Kernel at commit c820493652e830dc050e1418301e1bdec5691a1e
> > >
> > > I createt to devices, fast has size
> > > # blockdev --getsz /dev/sde1
> > > 20971520
> > > and slower device:
> > > # blockdev --getsz /dev/sdd1
> > > 2930209551
> > >
> > > I was copying files from one disk to bcache, after some time I got:
> > > BUG: soft lockup - CPU#0 stuck for 22s! [bch_copygc_read:5328]
> >
> > Thanks for the report - can you run addr2line with your vmlinux file,
> > and the
> > RIP?
> >
> > addr2line -i -e vmlinux ffffffffc028795b
>
> It returned:
> ??:0
>
> Probably due to I'm using bcache as module.
> <long story>
> As I mentioned before I wasn't sure which branch I used to test.
In case I didn't mention before - bcache-dev. This bug in the bcache-encryption
branch is a bit disconcerting though since my tests never hit it, but don't
worry about it - I'll chase it down.
> Please look at line with "bucket size":
> bucket_size: 768
> If bucket size is higher than (probably) 512 then I can't mount simple
> (without tiering) bcachefs filesystem. If I use such big device in tiered
> bcachefs I'm expieriencing random problems with stability of box.
> I think that bug in mail's subject is only random symptom of problem when
> device is formated with bucket size >512.
> What is going inside kernel in this case, is overwittem memory of other
> processes?
Whoops - that one is a bug in bcache-tools, non power of two bucket sizes aren't
supported (might be someday, but aren't currently). I just pushed a fix for that
to bcache-tools.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [bcachefs] BUG: soft lockup - CPU#0 stuck for 22s! [bch_copygc_read:5328]
2016-09-06 2:24 ` Kent Overstreet
@ 2016-09-06 11:08 ` Marcin Mirosław
2016-09-06 13:08 ` Kent Overstreet
0 siblings, 1 reply; 7+ messages in thread
From: Marcin Mirosław @ 2016-09-06 11:08 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-bcache
W dniu 06.09.2016 o 04:24, Kent Overstreet pisze:
Hi!
> On Sun, Sep 04, 2016 at 08:21:17PM +0200, Marcin wrote:
>> W dniu 2016-09-04 02:17, Kent Overstreet napisał(a):
>>
>> Hi!
>>
>>> On Sat, Sep 03, 2016 at 11:29:49PM +0200, Marcin wrote:
>>>> Hi!
>>>> Kernel at commit c820493652e830dc050e1418301e1bdec5691a1e
>>>>
>>>> I createt to devices, fast has size
>>>> # blockdev --getsz /dev/sde1
>>>> 20971520
>>>> and slower device:
>>>> # blockdev --getsz /dev/sdd1
>>>> 2930209551
>>>>
>>>> I was copying files from one disk to bcache, after some time I got:
>>>> BUG: soft lockup - CPU#0 stuck for 22s! [bch_copygc_read:5328]
>>>
>>> Thanks for the report - can you run addr2line with your vmlinux file,
>>> and the
>>> RIP?
>>>
>>> addr2line -i -e vmlinux ffffffffc028795b
>>
>> It returned:
>> ??:0
>>
>> Probably due to I'm using bcache as module.
>> <long story>
>> As I mentioned before I wasn't sure which branch I used to test.
>
> In case I didn't mention before - bcache-dev. This bug in the bcache-encryption
> branch is a bit disconcerting though since my tests never hit it, but don't
> worry about it - I'll chase it down.
I think that bug "BUG: soft lockup" is due to problem with bucket size.
I saw many random, different bugs when second tiered device had bucket
size equal to 768.
>> Please look at line with "bucket size":
>> bucket_size: 768
>> If bucket size is higher than (probably) 512 then I can't mount simple
>> (without tiering) bcachefs filesystem. If I use such big device in tiered
>> bcachefs I'm expieriencing random problems with stability of box.
>> I think that bug in mail's subject is only random symptom of problem when
>> device is formated with bucket size >512.
>> What is going inside kernel in this case, is overwittem memory of other
>> processes?
>
> Whoops - that one is a bug in bcache-tools, non power of two bucket sizes aren't
> supported (might be someday, but aren't currently). I just pushed a fix for that
> to bcache-tools.
One mor thing, when I tested tiering with one device formated with
unsupported bucket sizethis command worked:
# mount /dev/sde1:/dev/sdd1 /mnt/test
but this one didn't:
# mount /dev/sdd1:/dev/sde1 /mnt/test
so:
<low priority wish>
it could be good to check if on disk format of every device is correct
and supported.
Thank you,
Marcin
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [bcachefs] BUG: soft lockup - CPU#0 stuck for 22s! [bch_copygc_read:5328]
2016-09-06 11:08 ` Marcin Mirosław
@ 2016-09-06 13:08 ` Kent Overstreet
0 siblings, 0 replies; 7+ messages in thread
From: Kent Overstreet @ 2016-09-06 13:08 UTC (permalink / raw)
To: Marcin Mirosław; +Cc: linux-bcache
On Tue, Sep 06, 2016 at 01:08:57PM +0200, Marcin Mirosław wrote:
> <low priority wish>
> it could be good to check if on disk format of every device is correct
> and supported.
Whoops - that should be fixed now.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2016-09-06 13:09 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-09-03 21:29 [bcachefs] BUG: soft lockup - CPU#0 stuck for 22s! [bch_copygc_read:5328] Marcin
2016-09-03 21:57 ` Marcin
2016-09-04 0:17 ` Kent Overstreet
2016-09-04 18:21 ` Marcin
2016-09-06 2:24 ` Kent Overstreet
2016-09-06 11:08 ` Marcin Mirosław
2016-09-06 13:08 ` Kent Overstreet
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).