kernel BUG at fs/ext4/inode.c:2428!

linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* kernel BUG at fs/ext4/inode.c:2428!
@ 2016-11-14 10:15 Nikolay Borisov
  2016-11-14 13:49 ` Jan Kara
  0 siblings, 1 reply; 12+ messages in thread
From: Nikolay Borisov @ 2016-11-14 10:15 UTC (permalink / raw)
  To: linux-ext4, Theodore Ts'o, Jan Kara, Andreas Dilger

Hello, 

So I hit the following BUG_ON on 3 separate servers: 

[1387898.597939] sh (14886): drop_caches: 3
[1387945.259613] ------------[ cut here ]------------
[1387945.259791] kernel BUG at fs/ext4/inode.c:2428!
[1387945.259964] invalid opcode: 0000 [#1] SMP 
[1387945.263921] CPU: 9 PID: 8987 Comm: kworker/u24:23 Tainted: P           O    4.4.26-clouder1 #3
[1387945.264213] Hardware name: Supermicro X9DRD-iF/LF/X9DRD-iF, BIOS 3.2 01/16/2015
[1387945.264512] Workqueue: writeback wb_workfn (flush-253:8)
[1387945.264780] task: ffff880287ca5280 ti: ffff8800064dc000 task.ti: ffff8800064dc000
[1387945.265073] RIP: 0010:[<ffffffff8122a3ac>]  [<ffffffff8122a3ac>] mpage_prepare_extent_to_map+0x29c/0x2d0
[1387945.265426] RSP: 0018:ffff8800064df960  EFLAGS: 00010246
[1387945.265596] RAX: 02fffc0000030039 RBX: ffff8800064dfad0 RCX: 0000000000000537
[1387945.265881] RDX: 000000000000231b RSI: 0000000000000000 RDI: ffffffff81a052a6
[1387945.266165] RBP: ffff8800064dfa28 R08: 0000000000000000 R09: 0000000000000000
[1387945.266450] R10: 0000000000000001 R11: 0000000000000100 R12: ffff8800064df980
[1387945.266734] R13: 0000000000003400 R14: ffffffffffffffff R15: ffffea000686fbc0
[1387945.267024] FS:  0000000000000000(0000) GS:ffff88047fd20000(0000) knlGS:0000000000000000
[1387945.267315] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1387945.267487] CR2: ffffffffff600400 CR3: 00000004555ff000 CR4: 00000000000406e0
[1387945.267765] Stack:
[1387945.267925]  0000000000000000 ffff88015b3c2be0 ffff8800064df980 0000000000000538
[1387945.268386]  0000000000000001 0000000000000000 ffffea000686fbc0 ffff88046bce1800
[1387945.268850]  ffff8800064df9e8 ffffffff81274777 ffffffff02400040 0fd000086bce6800
[1387945.269319] Call Trace:
[1387945.269489]  [<ffffffff81274777>] ? jbd2__journal_start+0xe7/0x200
[1387945.269663]  [<ffffffff8122e581>] ? ext4_writepages+0x3a1/0xcd0
[1387945.269839]  [<ffffffff8125ba8d>] ? __ext4_journal_start_sb+0x6d/0x100
[1387945.270013]  [<ffffffff8122e5b2>] ext4_writepages+0x3d2/0xcd0
[1387945.270207]  [<ffffffffa07f878b>] ? leaf_space_used+0xcb/0x100 [btrfs]
[1387945.270382]  [<ffffffff810823a1>] ? sched_clock_cpu+0xa1/0xb0
[1387945.270556]  [<ffffffff8107beb4>] ? check_preempt_curr+0x54/0x90
[1387945.270730]  [<ffffffff8107c07d>] ? ttwu_do_activate.constprop.92+0x5d/0x70
[1387945.270905]  [<ffffffff8113735e>] do_writepages+0x1e/0x30
[1387945.271076]  [<ffffffff811c6c85>] __writeback_single_inode+0x45/0x370
[1387945.271250]  [<ffffffff811c74d2>] writeback_sb_inodes+0x252/0x570
[1387945.271423]  [<ffffffff811c7879>] __writeback_inodes_wb+0x89/0xc0
[1387945.271596]  [<ffffffff811c7bb8>] wb_writeback+0x268/0x300
[1387945.271766]  [<ffffffff811c83e6>] wb_workfn+0x2d6/0x400
[1387945.271938]  [<ffffffff81614ea8>] ? _raw_spin_lock_irqsave+0x18/0x50
[1387945.272112]  [<ffffffff8106bf89>] process_one_work+0x159/0x450
[1387945.272285]  [<ffffffff8106c639>] worker_thread+0x69/0x490
[1387945.272456]  [<ffffffff8106c5d0>] ? rescuer_thread+0x350/0x350
[1387945.272630]  [<ffffffff810717bf>] kthread+0xef/0x110
[1387945.272803]  [<ffffffff810716d0>] ? kthread_park+0x60/0x60
[1387945.272975]  [<ffffffff816156bf>] ret_from_fork+0x3f/0x70
[1387945.273146]  [<ffffffff810716d0>] ? kthread_park+0x60/0x60
[1387945.273322] Code: d2 e4 ff e8 67 6f 3e 00 48 8b 85 50 ff ff ff 49 39 c6 0f 83 15 fe ff ff 31 c0 eb a7 4c 89 ff e8 3b e8 ef ff e9 b8 fe ff ff 0f 0b <0f> 0b 48 8d bd 58 ff ff ff 89 85 48 ff ff ff e8 50 f8 f0 ff 8b 
[1387945.276751] RIP  [<ffffffff8122a3ac>] mpage_prepare_extent_to_map+0x29c/0x2d0
[1387945.277089]  RSP <ffff8800064df960>

So a user triggers drop_caches and ext4 crashes due to it trying to 
write a page that isn't fs-owned. ffffffff8122a3ac is : head = page_buffers(page);
which has this: BUG_ON(!PagePrivate(page)); 

page.flags is flags = 216168384067469369 (in binary: 0000001011111111111111000000000000000000000000110000000000111001) 
so the 11th bit (PG_private) is not set, triggering the BUG_ON. The flags are 
(PG_LOCKED|PG_UPTODATE|PG_DIRTY|MAPPEDTODISK|PG_RECLAIM). Do these flags seem 
corrupt - uptodate and dirty being set at the same time? Maybe the page struct
is being corrupted?

page.private is actually NULL. The page does have an associated address_space 
mapping. I've validated this since address_space.host is the same as the 
inode member of the passed mpd. 

I have a complete crash dump so am able to dig for additional info, 
if it is required. 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: kernel BUG at fs/ext4/inode.c:2428!
  2016-11-14 10:15 kernel BUG at fs/ext4/inode.c:2428! Nikolay Borisov
@ 2016-11-14 13:49 ` Jan Kara
  2016-11-14 14:46   ` Nikolay Borisov
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2016-11-14 13:49 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-ext4, Theodore Ts'o, Jan Kara, Andreas Dilger

Hello,

On Mon 14-11-16 12:15:16, Nikolay Borisov wrote:
> So I hit the following BUG_ON on 3 separate servers: 
> 
> [1387898.597939] sh (14886): drop_caches: 3
> [1387945.259613] ------------[ cut here ]------------
> [1387945.259791] kernel BUG at fs/ext4/inode.c:2428!
> [1387945.259964] invalid opcode: 0000 [#1] SMP 
> [1387945.263921] CPU: 9 PID: 8987 Comm: kworker/u24:23 Tainted: P           O    4.4.26-clouder1 #3
> [1387945.264213] Hardware name: Supermicro X9DRD-iF/LF/X9DRD-iF, BIOS 3.2 01/16/2015
> [1387945.264512] Workqueue: writeback wb_workfn (flush-253:8)
> [1387945.264780] task: ffff880287ca5280 ti: ffff8800064dc000 task.ti: ffff8800064dc000
> [1387945.265073] RIP: 0010:[<ffffffff8122a3ac>]  [<ffffffff8122a3ac>] mpage_prepare_extent_to_map+0x29c/0x2d0
> [1387945.265426] RSP: 0018:ffff8800064df960  EFLAGS: 00010246
> [1387945.265596] RAX: 02fffc0000030039 RBX: ffff8800064dfad0 RCX: 0000000000000537
> [1387945.265881] RDX: 000000000000231b RSI: 0000000000000000 RDI: ffffffff81a052a6
> [1387945.266165] RBP: ffff8800064dfa28 R08: 0000000000000000 R09: 0000000000000000
> [1387945.266450] R10: 0000000000000001 R11: 0000000000000100 R12: ffff8800064df980
> [1387945.266734] R13: 0000000000003400 R14: ffffffffffffffff R15: ffffea000686fbc0
> [1387945.267024] FS:  0000000000000000(0000) GS:ffff88047fd20000(0000) knlGS:0000000000000000
> [1387945.267315] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [1387945.267487] CR2: ffffffffff600400 CR3: 00000004555ff000 CR4: 00000000000406e0
> [1387945.267765] Stack:
> [1387945.267925]  0000000000000000 ffff88015b3c2be0 ffff8800064df980 0000000000000538
> [1387945.268386]  0000000000000001 0000000000000000 ffffea000686fbc0 ffff88046bce1800
> [1387945.268850]  ffff8800064df9e8 ffffffff81274777 ffffffff02400040 0fd000086bce6800
> [1387945.269319] Call Trace:
> [1387945.269489]  [<ffffffff81274777>] ? jbd2__journal_start+0xe7/0x200
> [1387945.269663]  [<ffffffff8122e581>] ? ext4_writepages+0x3a1/0xcd0
> [1387945.269839]  [<ffffffff8125ba8d>] ? __ext4_journal_start_sb+0x6d/0x100
> [1387945.270013]  [<ffffffff8122e5b2>] ext4_writepages+0x3d2/0xcd0
> [1387945.270207]  [<ffffffffa07f878b>] ? leaf_space_used+0xcb/0x100 [btrfs]
> [1387945.270382]  [<ffffffff810823a1>] ? sched_clock_cpu+0xa1/0xb0
> [1387945.270556]  [<ffffffff8107beb4>] ? check_preempt_curr+0x54/0x90
> [1387945.270730]  [<ffffffff8107c07d>] ? ttwu_do_activate.constprop.92+0x5d/0x70
> [1387945.270905]  [<ffffffff8113735e>] do_writepages+0x1e/0x30
> [1387945.271076]  [<ffffffff811c6c85>] __writeback_single_inode+0x45/0x370
> [1387945.271250]  [<ffffffff811c74d2>] writeback_sb_inodes+0x252/0x570
> [1387945.271423]  [<ffffffff811c7879>] __writeback_inodes_wb+0x89/0xc0
> [1387945.271596]  [<ffffffff811c7bb8>] wb_writeback+0x268/0x300
> [1387945.271766]  [<ffffffff811c83e6>] wb_workfn+0x2d6/0x400
> [1387945.271938]  [<ffffffff81614ea8>] ? _raw_spin_lock_irqsave+0x18/0x50
> [1387945.272112]  [<ffffffff8106bf89>] process_one_work+0x159/0x450
> [1387945.272285]  [<ffffffff8106c639>] worker_thread+0x69/0x490
> [1387945.272456]  [<ffffffff8106c5d0>] ? rescuer_thread+0x350/0x350
> [1387945.272630]  [<ffffffff810717bf>] kthread+0xef/0x110
> [1387945.272803]  [<ffffffff810716d0>] ? kthread_park+0x60/0x60
> [1387945.272975]  [<ffffffff816156bf>] ret_from_fork+0x3f/0x70
> [1387945.273146]  [<ffffffff810716d0>] ? kthread_park+0x60/0x60
> [1387945.273322] Code: d2 e4 ff e8 67 6f 3e 00 48 8b 85 50 ff ff ff 49 39 c6 0f 83 15 fe ff ff 31 c0 eb a7 4c 89 ff e8 3b e8 ef ff e9 b8 fe ff ff 0f 0b <0f> 0b 48 8d bd 58 ff ff ff 89 85 48 ff ff ff e8 50 f8 f0 ff 8b 
> [1387945.276751] RIP  [<ffffffff8122a3ac>] mpage_prepare_extent_to_map+0x29c/0x2d0
> [1387945.277089]  RSP <ffff8800064df960>
> 
> So a user triggers drop_caches and ext4 crashes due to it trying to 
> write a page that isn't fs-owned. ffffffff8122a3ac is : head = page_buffers(page);
> which has this: BUG_ON(!PagePrivate(page)); 
> 
> page.flags is flags = 216168384067469369 (in binary: 0000001011111111111111000000000000000000000000110000000000111001) 
> so the 11th bit (PG_private) is not set, triggering the BUG_ON. The flags are 
> (PG_LOCKED|PG_UPTODATE|PG_DIRTY|MAPPEDTODISK|PG_RECLAIM). Do these flags seem 
> corrupt - uptodate and dirty being set at the same time? Maybe the page struct
> is being corrupted?
> 
> page.private is actually NULL. The page does have an associated address_space 
> mapping. I've validated this since address_space.host is the same as the 
> inode member of the passed mpd. 

Interesting. I didn't see this yet. What mount options does the filesystem
use? The file where this happened is a regular file I assume, right? What
is blocksize and page size for the filesystem?

The page flags actually look rather consistent. The only thing that is
strange is how a dirty page can be without buffers because whenever we
write to a page we create buffers for the page and then mark (some of
those) buffers dirty. ext4_releasepage() ends up calling
try_to_free_buffers() which does not remove dirty buffers. But apparently
the buffers attached to a page got somehow clean but the page remained
dirty, then ext4_releasepage() removed those buffers and then we crashed
when trying to write that page. If blocksize < pagesize, I could imagine
some subtle bugs in our code e.g. for pages straddling EOF...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: kernel BUG at fs/ext4/inode.c:2428!
  2016-11-14 13:49 ` Jan Kara
@ 2016-11-14 14:46   ` Nikolay Borisov
  2016-11-21 14:27     ` Jan Kara
  0 siblings, 1 reply; 12+ messages in thread
From: Nikolay Borisov @ 2016-11-14 14:46 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, Theodore Ts'o, Jan Kara, Andreas Dilger



On 11/14/2016 03:49 PM, Jan Kara wrote:
> Hello,
> 
> On Mon 14-11-16 12:15:16, Nikolay Borisov wrote:
>> So I hit the following BUG_ON on 3 separate servers: 
>>
>> [1387898.597939] sh (14886): drop_caches: 3
>> [1387945.259613] ------------[ cut here ]------------
>> [1387945.259791] kernel BUG at fs/ext4/inode.c:2428!
>> [1387945.259964] invalid opcode: 0000 [#1] SMP 
>> [1387945.263921] CPU: 9 PID: 8987 Comm: kworker/u24:23 Tainted: P           O    4.4.26-clouder1 #3
>> [1387945.264213] Hardware name: Supermicro X9DRD-iF/LF/X9DRD-iF, BIOS 3.2 01/16/2015
>> [1387945.264512] Workqueue: writeback wb_workfn (flush-253:8)
>> [1387945.264780] task: ffff880287ca5280 ti: ffff8800064dc000 task.ti: ffff8800064dc000
>> [1387945.265073] RIP: 0010:[<ffffffff8122a3ac>]  [<ffffffff8122a3ac>] mpage_prepare_extent_to_map+0x29c/0x2d0
>> [1387945.265426] RSP: 0018:ffff8800064df960  EFLAGS: 00010246
>> [1387945.265596] RAX: 02fffc0000030039 RBX: ffff8800064dfad0 RCX: 0000000000000537
>> [1387945.265881] RDX: 000000000000231b RSI: 0000000000000000 RDI: ffffffff81a052a6
>> [1387945.266165] RBP: ffff8800064dfa28 R08: 0000000000000000 R09: 0000000000000000
>> [1387945.266450] R10: 0000000000000001 R11: 0000000000000100 R12: ffff8800064df980
>> [1387945.266734] R13: 0000000000003400 R14: ffffffffffffffff R15: ffffea000686fbc0
>> [1387945.267024] FS:  0000000000000000(0000) GS:ffff88047fd20000(0000) knlGS:0000000000000000
>> [1387945.267315] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [1387945.267487] CR2: ffffffffff600400 CR3: 00000004555ff000 CR4: 00000000000406e0
>> [1387945.267765] Stack:
>> [1387945.267925]  0000000000000000 ffff88015b3c2be0 ffff8800064df980 0000000000000538
>> [1387945.268386]  0000000000000001 0000000000000000 ffffea000686fbc0 ffff88046bce1800
>> [1387945.268850]  ffff8800064df9e8 ffffffff81274777 ffffffff02400040 0fd000086bce6800
>> [1387945.269319] Call Trace:
>> [1387945.269489]  [<ffffffff81274777>] ? jbd2__journal_start+0xe7/0x200
>> [1387945.269663]  [<ffffffff8122e581>] ? ext4_writepages+0x3a1/0xcd0
>> [1387945.269839]  [<ffffffff8125ba8d>] ? __ext4_journal_start_sb+0x6d/0x100
>> [1387945.270013]  [<ffffffff8122e5b2>] ext4_writepages+0x3d2/0xcd0
>> [1387945.270207]  [<ffffffffa07f878b>] ? leaf_space_used+0xcb/0x100 [btrfs]
>> [1387945.270382]  [<ffffffff810823a1>] ? sched_clock_cpu+0xa1/0xb0
>> [1387945.270556]  [<ffffffff8107beb4>] ? check_preempt_curr+0x54/0x90
>> [1387945.270730]  [<ffffffff8107c07d>] ? ttwu_do_activate.constprop.92+0x5d/0x70
>> [1387945.270905]  [<ffffffff8113735e>] do_writepages+0x1e/0x30
>> [1387945.271076]  [<ffffffff811c6c85>] __writeback_single_inode+0x45/0x370
>> [1387945.271250]  [<ffffffff811c74d2>] writeback_sb_inodes+0x252/0x570
>> [1387945.271423]  [<ffffffff811c7879>] __writeback_inodes_wb+0x89/0xc0
>> [1387945.271596]  [<ffffffff811c7bb8>] wb_writeback+0x268/0x300
>> [1387945.271766]  [<ffffffff811c83e6>] wb_workfn+0x2d6/0x400
>> [1387945.271938]  [<ffffffff81614ea8>] ? _raw_spin_lock_irqsave+0x18/0x50
>> [1387945.272112]  [<ffffffff8106bf89>] process_one_work+0x159/0x450
>> [1387945.272285]  [<ffffffff8106c639>] worker_thread+0x69/0x490
>> [1387945.272456]  [<ffffffff8106c5d0>] ? rescuer_thread+0x350/0x350
>> [1387945.272630]  [<ffffffff810717bf>] kthread+0xef/0x110
>> [1387945.272803]  [<ffffffff810716d0>] ? kthread_park+0x60/0x60
>> [1387945.272975]  [<ffffffff816156bf>] ret_from_fork+0x3f/0x70
>> [1387945.273146]  [<ffffffff810716d0>] ? kthread_park+0x60/0x60
>> [1387945.273322] Code: d2 e4 ff e8 67 6f 3e 00 48 8b 85 50 ff ff ff 49 39 c6 0f 83 15 fe ff ff 31 c0 eb a7 4c 89 ff e8 3b e8 ef ff e9 b8 fe ff ff 0f 0b <0f> 0b 48 8d bd 58 ff ff ff 89 85 48 ff ff ff e8 50 f8 f0 ff 8b 
>> [1387945.276751] RIP  [<ffffffff8122a3ac>] mpage_prepare_extent_to_map+0x29c/0x2d0
>> [1387945.277089]  RSP <ffff8800064df960>
>>
>> So a user triggers drop_caches and ext4 crashes due to it trying to 
>> write a page that isn't fs-owned. ffffffff8122a3ac is : head = page_buffers(page);
>> which has this: BUG_ON(!PagePrivate(page)); 
>>
>> page.flags is flags = 216168384067469369 (in binary: 0000001011111111111111000000000000000000000000110000000000111001) 
>> so the 11th bit (PG_private) is not set, triggering the BUG_ON. The flags are 
>> (PG_LOCKED|PG_UPTODATE|PG_DIRTY|MAPPEDTODISK|PG_RECLAIM). Do these flags seem 
>> corrupt - uptodate and dirty being set at the same time? Maybe the page struct
>> is being corrupted?
>>
>> page.private is actually NULL. The page does have an associated address_space 
>> mapping. I've validated this since address_space.host is the same as the 
>> inode member of the passed mpd. 
> 
> Interesting. I didn't see this yet. What mount options does the filesystem
> use? The file where this happened is a regular file I assume, right? What
> is blocksize and page size for the filesystem?

s_blocksize_bits = 12,
s_blocksize = 4096,

And this is x86_64 so pagesize is also 4k. Unfortunately I cannot
re-mount the file system since it has been converted to btrfs. However
here are the options: rw,relatime,discard,stripe=32,data=ordered, these
have been taken from an analogous mount. This is how it's supposed to
have been created: mkfs.ext4 -b 4096 -K -E nodiscard,lazy_itable_init=1
-O has_journal,large_file,resize_inode,sparse_super,uninit_bg

Since those values are from the system which supposedly created those,
here are the raw values form the ext4_sb_info->s_mount_opt* members:
s_mount_opt = 3892496400,
s_mount_opt2 = 2,

And from ext4_super_block->s_feature_*:

  s_feature_compat = 0,
  s_feature_incompat = 61267,
  s_feature_ro_compat = 0,





> The page flags actually look rather consistent. The only thing that is

Why is UPTODATE and DIRTY set simultaneously. Don't they contradict each
other?

> strange is how a dirty page can be without buffers because whenever we
> write to a page we create buffers for the page and then mark (some of
> those) buffers dirty. ext4_releasepage() ends up calling
> try_to_free_buffers() which does not remove dirty buffers. But apparently
> the buffers attached to a page got somehow clean but the page remained
> dirty, then ext4_releasepage() removed those buffers and then we crashed
> when trying to write that page. If blocksize < pagesize, I could imagine
> some subtle bugs in our code e.g. for pages straddling EOF...

In this case blocksize == pagesize  :(

> 
> 								Honza
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: kernel BUG at fs/ext4/inode.c:2428!
  2016-11-14 14:46   ` Nikolay Borisov
@ 2016-11-21 14:27     ` Jan Kara
  2016-11-21 14:42       ` Nikolay Borisov
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2016-11-21 14:27 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Jan Kara, linux-ext4, Theodore Ts'o, Jan Kara, Andreas Dilger

On Mon 14-11-16 16:46:51, Nikolay Borisov wrote:
> 
> 
> On 11/14/2016 03:49 PM, Jan Kara wrote:
> > Hello,
> > 
> > On Mon 14-11-16 12:15:16, Nikolay Borisov wrote:
> >> So I hit the following BUG_ON on 3 separate servers: 
> >>
> >> [1387898.597939] sh (14886): drop_caches: 3
> >> [1387945.259613] ------------[ cut here ]------------
> >> [1387945.259791] kernel BUG at fs/ext4/inode.c:2428!
> >> [1387945.259964] invalid opcode: 0000 [#1] SMP 
> >> [1387945.263921] CPU: 9 PID: 8987 Comm: kworker/u24:23 Tainted: P           O    4.4.26-clouder1 #3
> >> [1387945.264213] Hardware name: Supermicro X9DRD-iF/LF/X9DRD-iF, BIOS 3.2 01/16/2015
> >> [1387945.264512] Workqueue: writeback wb_workfn (flush-253:8)
> >> [1387945.264780] task: ffff880287ca5280 ti: ffff8800064dc000 task.ti: ffff8800064dc000
> >> [1387945.265073] RIP: 0010:[<ffffffff8122a3ac>]  [<ffffffff8122a3ac>] mpage_prepare_extent_to_map+0x29c/0x2d0
> >> [1387945.265426] RSP: 0018:ffff8800064df960  EFLAGS: 00010246
> >> [1387945.265596] RAX: 02fffc0000030039 RBX: ffff8800064dfad0 RCX: 0000000000000537
> >> [1387945.265881] RDX: 000000000000231b RSI: 0000000000000000 RDI: ffffffff81a052a6
> >> [1387945.266165] RBP: ffff8800064dfa28 R08: 0000000000000000 R09: 0000000000000000
> >> [1387945.266450] R10: 0000000000000001 R11: 0000000000000100 R12: ffff8800064df980
> >> [1387945.266734] R13: 0000000000003400 R14: ffffffffffffffff R15: ffffea000686fbc0
> >> [1387945.267024] FS:  0000000000000000(0000) GS:ffff88047fd20000(0000) knlGS:0000000000000000
> >> [1387945.267315] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [1387945.267487] CR2: ffffffffff600400 CR3: 00000004555ff000 CR4: 00000000000406e0
> >> [1387945.267765] Stack:
> >> [1387945.267925]  0000000000000000 ffff88015b3c2be0 ffff8800064df980 0000000000000538
> >> [1387945.268386]  0000000000000001 0000000000000000 ffffea000686fbc0 ffff88046bce1800
> >> [1387945.268850]  ffff8800064df9e8 ffffffff81274777 ffffffff02400040 0fd000086bce6800
> >> [1387945.269319] Call Trace:
> >> [1387945.269489]  [<ffffffff81274777>] ? jbd2__journal_start+0xe7/0x200
> >> [1387945.269663]  [<ffffffff8122e581>] ? ext4_writepages+0x3a1/0xcd0
> >> [1387945.269839]  [<ffffffff8125ba8d>] ? __ext4_journal_start_sb+0x6d/0x100
> >> [1387945.270013]  [<ffffffff8122e5b2>] ext4_writepages+0x3d2/0xcd0
> >> [1387945.270207]  [<ffffffffa07f878b>] ? leaf_space_used+0xcb/0x100 [btrfs]
> >> [1387945.270382]  [<ffffffff810823a1>] ? sched_clock_cpu+0xa1/0xb0
> >> [1387945.270556]  [<ffffffff8107beb4>] ? check_preempt_curr+0x54/0x90
> >> [1387945.270730]  [<ffffffff8107c07d>] ? ttwu_do_activate.constprop.92+0x5d/0x70
> >> [1387945.270905]  [<ffffffff8113735e>] do_writepages+0x1e/0x30
> >> [1387945.271076]  [<ffffffff811c6c85>] __writeback_single_inode+0x45/0x370
> >> [1387945.271250]  [<ffffffff811c74d2>] writeback_sb_inodes+0x252/0x570
> >> [1387945.271423]  [<ffffffff811c7879>] __writeback_inodes_wb+0x89/0xc0
> >> [1387945.271596]  [<ffffffff811c7bb8>] wb_writeback+0x268/0x300
> >> [1387945.271766]  [<ffffffff811c83e6>] wb_workfn+0x2d6/0x400
> >> [1387945.271938]  [<ffffffff81614ea8>] ? _raw_spin_lock_irqsave+0x18/0x50
> >> [1387945.272112]  [<ffffffff8106bf89>] process_one_work+0x159/0x450
> >> [1387945.272285]  [<ffffffff8106c639>] worker_thread+0x69/0x490
> >> [1387945.272456]  [<ffffffff8106c5d0>] ? rescuer_thread+0x350/0x350
> >> [1387945.272630]  [<ffffffff810717bf>] kthread+0xef/0x110
> >> [1387945.272803]  [<ffffffff810716d0>] ? kthread_park+0x60/0x60
> >> [1387945.272975]  [<ffffffff816156bf>] ret_from_fork+0x3f/0x70
> >> [1387945.273146]  [<ffffffff810716d0>] ? kthread_park+0x60/0x60
> >> [1387945.273322] Code: d2 e4 ff e8 67 6f 3e 00 48 8b 85 50 ff ff ff 49 39 c6 0f 83 15 fe ff ff 31 c0 eb a7 4c 89 ff e8 3b e8 ef ff e9 b8 fe ff ff 0f 0b <0f> 0b 48 8d bd 58 ff ff ff 89 85 48 ff ff ff e8 50 f8 f0 ff 8b 
> >> [1387945.276751] RIP  [<ffffffff8122a3ac>] mpage_prepare_extent_to_map+0x29c/0x2d0
> >> [1387945.277089]  RSP <ffff8800064df960>
> >>
> >> So a user triggers drop_caches and ext4 crashes due to it trying to 
> >> write a page that isn't fs-owned. ffffffff8122a3ac is : head = page_buffers(page);
> >> which has this: BUG_ON(!PagePrivate(page)); 
> >>
> >> page.flags is flags = 216168384067469369 (in binary: 0000001011111111111111000000000000000000000000110000000000111001) 
> >> so the 11th bit (PG_private) is not set, triggering the BUG_ON. The flags are 
> >> (PG_LOCKED|PG_UPTODATE|PG_DIRTY|MAPPEDTODISK|PG_RECLAIM). Do these flags seem 
> >> corrupt - uptodate and dirty being set at the same time? Maybe the page struct
> >> is being corrupted?
> >>
> >> page.private is actually NULL. The page does have an associated address_space 
> >> mapping. I've validated this since address_space.host is the same as the 
> >> inode member of the passed mpd. 
> > 
> > Interesting. I didn't see this yet. What mount options does the filesystem
> > use? The file where this happened is a regular file I assume, right? What
> > is blocksize and page size for the filesystem?
> 
> s_blocksize_bits = 12,
> s_blocksize = 4096,
> 
> And this is x86_64 so pagesize is also 4k. Unfortunately I cannot
> re-mount the file system since it has been converted to btrfs. However
> here are the options: rw,relatime,discard,stripe=32,data=ordered, these
> have been taken from an analogous mount. This is how it's supposed to
> have been created: mkfs.ext4 -b 4096 -K -E nodiscard,lazy_itable_init=1
> -O has_journal,large_file,resize_inode,sparse_super,uninit_bg
> 
> Since those values are from the system which supposedly created those,
> here are the raw values form the ext4_sb_info->s_mount_opt* members:
> s_mount_opt = 3892496400,
> s_mount_opt2 = 2,
> 
> And from ext4_super_block->s_feature_*:
> 
>   s_feature_compat = 0,
>   s_feature_incompat = 61267,
>   s_feature_ro_compat = 0,

This looks strange - COMPAT and RO_COMPAT features definitely should not be
zero (e.g. has_journal and sparse_super should be set), also INCOMPAT
features look weird - e.g. bit 0x800 is not used but it is set in incompat
features. Did you get the sb pointer right?

> > The page flags actually look rather consistent. The only thing that is
> 
> Why is UPTODATE and DIRTY set simultaneously. Don't they contradict each
> other?

They can be set simultaneously - 'uptodate' means page has at least as new
data as is on disk, 'dirty' means page has strictly newer data than on
disk.

What is page->index and how large is the inode (inode->i_size)?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: kernel BUG at fs/ext4/inode.c:2428!
  2016-11-21 14:27     ` Jan Kara
@ 2016-11-21 14:42       ` Nikolay Borisov
  2016-11-21 15:20         ` Theodore Ts'o
  2016-11-22 13:34         ` Jan Kara
  0 siblings, 2 replies; 12+ messages in thread
From: Nikolay Borisov @ 2016-11-21 14:42 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, Theodore Ts'o, Jan Kara, Andreas Dilger



On 11/21/2016 04:27 PM, Jan Kara wrote:
> On Mon 14-11-16 16:46:51, Nikolay Borisov wrote:
>>
>>
>> On 11/14/2016 03:49 PM, Jan Kara wrote:
>>> Hello,
>>>
>>> On Mon 14-11-16 12:15:16, Nikolay Borisov wrote:
>>>> So I hit the following BUG_ON on 3 separate servers: 
>>>>
>>>> [1387898.597939] sh (14886): drop_caches: 3
>>>> [1387945.259613] ------------[ cut here ]------------
>>>> [1387945.259791] kernel BUG at fs/ext4/inode.c:2428!
>>>> [1387945.259964] invalid opcode: 0000 [#1] SMP 
>>>> [1387945.263921] CPU: 9 PID: 8987 Comm: kworker/u24:23 Tainted: P           O    4.4.26-clouder1 #3
>>>> [1387945.264213] Hardware name: Supermicro X9DRD-iF/LF/X9DRD-iF, BIOS 3.2 01/16/2015
>>>> [1387945.264512] Workqueue: writeback wb_workfn (flush-253:8)
>>>> [1387945.264780] task: ffff880287ca5280 ti: ffff8800064dc000 task.ti: ffff8800064dc000
>>>> [1387945.265073] RIP: 0010:[<ffffffff8122a3ac>]  [<ffffffff8122a3ac>] mpage_prepare_extent_to_map+0x29c/0x2d0
>>>> [1387945.265426] RSP: 0018:ffff8800064df960  EFLAGS: 00010246
>>>> [1387945.265596] RAX: 02fffc0000030039 RBX: ffff8800064dfad0 RCX: 0000000000000537
>>>> [1387945.265881] RDX: 000000000000231b RSI: 0000000000000000 RDI: ffffffff81a052a6
>>>> [1387945.266165] RBP: ffff8800064dfa28 R08: 0000000000000000 R09: 0000000000000000
>>>> [1387945.266450] R10: 0000000000000001 R11: 0000000000000100 R12: ffff8800064df980
>>>> [1387945.266734] R13: 0000000000003400 R14: ffffffffffffffff R15: ffffea000686fbc0
>>>> [1387945.267024] FS:  0000000000000000(0000) GS:ffff88047fd20000(0000) knlGS:0000000000000000
>>>> [1387945.267315] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [1387945.267487] CR2: ffffffffff600400 CR3: 00000004555ff000 CR4: 00000000000406e0
>>>> [1387945.267765] Stack:
>>>> [1387945.267925]  0000000000000000 ffff88015b3c2be0 ffff8800064df980 0000000000000538
>>>> [1387945.268386]  0000000000000001 0000000000000000 ffffea000686fbc0 ffff88046bce1800
>>>> [1387945.268850]  ffff8800064df9e8 ffffffff81274777 ffffffff02400040 0fd000086bce6800
>>>> [1387945.269319] Call Trace:
>>>> [1387945.269489]  [<ffffffff81274777>] ? jbd2__journal_start+0xe7/0x200
>>>> [1387945.269663]  [<ffffffff8122e581>] ? ext4_writepages+0x3a1/0xcd0
>>>> [1387945.269839]  [<ffffffff8125ba8d>] ? __ext4_journal_start_sb+0x6d/0x100
>>>> [1387945.270013]  [<ffffffff8122e5b2>] ext4_writepages+0x3d2/0xcd0
>>>> [1387945.270207]  [<ffffffffa07f878b>] ? leaf_space_used+0xcb/0x100 [btrfs]
>>>> [1387945.270382]  [<ffffffff810823a1>] ? sched_clock_cpu+0xa1/0xb0
>>>> [1387945.270556]  [<ffffffff8107beb4>] ? check_preempt_curr+0x54/0x90
>>>> [1387945.270730]  [<ffffffff8107c07d>] ? ttwu_do_activate.constprop.92+0x5d/0x70
>>>> [1387945.270905]  [<ffffffff8113735e>] do_writepages+0x1e/0x30
>>>> [1387945.271076]  [<ffffffff811c6c85>] __writeback_single_inode+0x45/0x370
>>>> [1387945.271250]  [<ffffffff811c74d2>] writeback_sb_inodes+0x252/0x570
>>>> [1387945.271423]  [<ffffffff811c7879>] __writeback_inodes_wb+0x89/0xc0
>>>> [1387945.271596]  [<ffffffff811c7bb8>] wb_writeback+0x268/0x300
>>>> [1387945.271766]  [<ffffffff811c83e6>] wb_workfn+0x2d6/0x400
>>>> [1387945.271938]  [<ffffffff81614ea8>] ? _raw_spin_lock_irqsave+0x18/0x50
>>>> [1387945.272112]  [<ffffffff8106bf89>] process_one_work+0x159/0x450
>>>> [1387945.272285]  [<ffffffff8106c639>] worker_thread+0x69/0x490
>>>> [1387945.272456]  [<ffffffff8106c5d0>] ? rescuer_thread+0x350/0x350
>>>> [1387945.272630]  [<ffffffff810717bf>] kthread+0xef/0x110
>>>> [1387945.272803]  [<ffffffff810716d0>] ? kthread_park+0x60/0x60
>>>> [1387945.272975]  [<ffffffff816156bf>] ret_from_fork+0x3f/0x70
>>>> [1387945.273146]  [<ffffffff810716d0>] ? kthread_park+0x60/0x60
>>>> [1387945.273322] Code: d2 e4 ff e8 67 6f 3e 00 48 8b 85 50 ff ff ff 49 39 c6 0f 83 15 fe ff ff 31 c0 eb a7 4c 89 ff e8 3b e8 ef ff e9 b8 fe ff ff 0f 0b <0f> 0b 48 8d bd 58 ff ff ff 89 85 48 ff ff ff e8 50 f8 f0 ff 8b 
>>>> [1387945.276751] RIP  [<ffffffff8122a3ac>] mpage_prepare_extent_to_map+0x29c/0x2d0
>>>> [1387945.277089]  RSP <ffff8800064df960>
>>>>
>>>> So a user triggers drop_caches and ext4 crashes due to it trying to 
>>>> write a page that isn't fs-owned. ffffffff8122a3ac is : head = page_buffers(page);
>>>> which has this: BUG_ON(!PagePrivate(page)); 
>>>>
>>>> page.flags is flags = 216168384067469369 (in binary: 0000001011111111111111000000000000000000000000110000000000111001) 
>>>> so the 11th bit (PG_private) is not set, triggering the BUG_ON. The flags are 
>>>> (PG_LOCKED|PG_UPTODATE|PG_DIRTY|MAPPEDTODISK|PG_RECLAIM). Do these flags seem 
>>>> corrupt - uptodate and dirty being set at the same time? Maybe the page struct
>>>> is being corrupted?
>>>>
>>>> page.private is actually NULL. The page does have an associated address_space 
>>>> mapping. I've validated this since address_space.host is the same as the 
>>>> inode member of the passed mpd. 
>>>
>>> Interesting. I didn't see this yet. What mount options does the filesystem
>>> use? The file where this happened is a regular file I assume, right? What
>>> is blocksize and page size for the filesystem?
>>
>> s_blocksize_bits = 12,
>> s_blocksize = 4096,
>>
>> And this is x86_64 so pagesize is also 4k. Unfortunately I cannot
>> re-mount the file system since it has been converted to btrfs. However
>> here are the options: rw,relatime,discard,stripe=32,data=ordered, these
>> have been taken from an analogous mount. This is how it's supposed to
>> have been created: mkfs.ext4 -b 4096 -K -E nodiscard,lazy_itable_init=1
>> -O has_journal,large_file,resize_inode,sparse_super,uninit_bg
>>
>> Since those values are from the system which supposedly created those,
>> here are the raw values form the ext4_sb_info->s_mount_opt* members:
>> s_mount_opt = 3892496400,
>> s_mount_opt2 = 2,
>>
>> And from ext4_super_block->s_feature_*:
>>
>>   s_feature_compat = 0,
>>   s_feature_incompat = 61267,
>>   s_feature_ro_compat = 0,
> 
> This looks strange - COMPAT and RO_COMPAT features definitely should not be
> zero (e.g. has_journal and sparse_super should be set), also INCOMPAT
> features look weird - e.g. bit 0x800 is not used but it is set in incompat
> features. Did you get the sb pointer right?
> 
>>> The page flags actually look rather consistent. The only thing that is
>>
>> Why is UPTODATE and DIRTY set simultaneously. Don't they contradict each
>> other?
> 
> They can be set simultaneously - 'uptodate' means page has at least as new
> data as is on disk, 'dirty' means page has strictly newer data than on
> disk.
> 
> What is page->index and how large is the inode (inode->i_size)?

So here is the full page struct for reference:

struct page {
  flags = 216168384067469369,
  {
    mapping = 0xffff88015b3c2be0,
    s_mem = 0xffff88015b3c2be0
  },
  {
    {
      index = 1335,
      freelist = 0x537
    },
    {
      counters = 12884901887,
      {
        {
          _mapcount = {
            counter = -1
          },
          {
            inuse = 65535,
            objects = 32767,
            frozen = 1
          },
          units = -1
        },
        _count = {
          counter = 2
        }
      },
      active = 4294967295
    }
  },
  {
    lru = {
      next = 0xffffea000686caa0,
      prev = 0xffffea000686ca60
    },
    {
      next = 0xffffea000686caa0,
      pages = 109496928,
      pobjects = -5632
    },
    callback_head = {
      next = 0xffffea000686caa0,
      func = 0xffffea000686ca60
    },
    {
      compound_head = 18446719884563237536,
      compound_dtor = 109496928,
      compound_order = 4294961664
    },
    {
      __pad = 18446719884563237536,
      pmd_huge_pte = 0xffffea000686ca60
    }
  },
  {
    private = 0,
    ptl = {
      {
        rlock = {
          raw_lock = {
            val = {
              counter = 0
            }
          }


And here is the inode size: i_size = 6366887

And here is the mpage_da_data struct being used:

struct mpage_da_data {
  inode = 0xffff88015b3c2a78,
  wbc = 0xffff8800064dfc00,
  first_page = 1335,
  next_page = 1336,
  last_page = 18446744073709551615,
  map = {
    m_pblk = 18446612145925311300,
    m_lblk = 105773856,
    m_len = 0,
    m_flags = 2164768893
  },
  io_submit = {
    io_wbc = 0xffff8800064dfc00,
    io_bio = 0x0,
    io_end = 0xffff88010ce72510,
    io_next_block = 18446612132419992536
  }
}

Also I just had the same issue happen on 2 more physical servers. This
excludes the possibility of a random memory corruption.

> 
> 								Honza
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: kernel BUG at fs/ext4/inode.c:2428!
  2016-11-21 14:42       ` Nikolay Borisov
@ 2016-11-21 15:20         ` Theodore Ts'o
  2016-11-21 15:32           ` Nikolay Borisov
  2016-11-22 13:34         ` Jan Kara
  1 sibling, 1 reply; 12+ messages in thread
From: Theodore Ts'o @ 2016-11-21 15:20 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Jan Kara, linux-ext4, Jan Kara, Andreas Dilger

Can you try getting the inode number and then trying to get a pathname
to the inode?  And then get the on-disk stat information?  You can do
this using debugfs, for example:

debugfs /dev/sda3
debugfs:  ncheck 2621442
Inode	Pathname
2621442	/u1/Doctor-Who-medley.flac
debugfs:  stat /u1/Doctor-Who-medley.flac
Inode: 2621442   Type: regular    Mode:  0644   Flags: 0x80000
Generation: 3942406670    Version: 0x00000000:00000001
User: 18209   Group: 18209   Project:     0   Size: 500591580
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 977608
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x57e99a99:57d3bfbc -- Mon Sep 26 18:00:57 2016
 atime: 0x57eef0af:64fe6c84 -- Fri Sep 30 19:09:35 2016
 mtime: 0x56b18898:00000000 -- Tue Feb  2 23:56:56 2016
crtime: 0x57e99a95:91002508 -- Mon Sep 26 18:00:53 2016
Size of extra inode fields: 32
Inode checksum: 0xfe12a620
EXTENTS:
(ETB0):10543077, (0-2):649216-649218, (18-6143):649234-655359, (6144-16383):7106
56-720895, (16384-24575):696320-704511, (24576-36863):806912-819199, (36864-4505
5):794624-802815, (45056-47103):882688-884735, (47104-71679):892928-917503, (716
80-75775):937984-942079, (75776-77823):1003520-1005567, (77824-83967):1009664-10
15807, (83968-88063):991232-995327, (88064-92159):999424-1003519, (92160-94207):
1021952-1023999, (94208-96255):1028096-1030143, (96256-98303):1040384-1042431, (
98304-106495):1032192-1040383, (106496-120831):1095680-1110015, (120832-122214):
1085440-1086822

And then can you see if there's any commonality in terms of the
workload associated with that inode, or whether or not that inode was
likely to be in the process of being modified?

Thanks,

					- Ted

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: kernel BUG at fs/ext4/inode.c:2428!
  2016-11-21 15:20         ` Theodore Ts'o
@ 2016-11-21 15:32           ` Nikolay Borisov
  0 siblings, 0 replies; 12+ messages in thread
From: Nikolay Borisov @ 2016-11-21 15:32 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Jan Kara, linux-ext4, Jan Kara, Andreas Dilger



On 11/21/2016 05:20 PM, Theodore Ts'o wrote:
> Can you try getting the inode number and then trying to get a pathname
> to the inode?  And then get the on-disk stat information?  You can do
> this using debugfs, for example:
> 
> debugfs /dev/sda3
> debugfs:  ncheck 2621442
> Inode	Pathname
> 2621442	/u1/Doctor-Who-medley.flac
> debugfs:  stat /u1/Doctor-Who-medley.flac
> Inode: 2621442   Type: regular    Mode:  0644   Flags: 0x80000
> Generation: 3942406670    Version: 0x00000000:00000001
> User: 18209   Group: 18209   Project:     0   Size: 500591580
> File ACL: 0    Directory ACL: 0
> Links: 1   Blockcount: 977608
> Fragment:  Address: 0    Number: 0    Size: 0
>  ctime: 0x57e99a99:57d3bfbc -- Mon Sep 26 18:00:57 2016
>  atime: 0x57eef0af:64fe6c84 -- Fri Sep 30 19:09:35 2016
>  mtime: 0x56b18898:00000000 -- Tue Feb  2 23:56:56 2016
> crtime: 0x57e99a95:91002508 -- Mon Sep 26 18:00:53 2016
> Size of extra inode fields: 32
> Inode checksum: 0xfe12a620
> EXTENTS:
> (ETB0):10543077, (0-2):649216-649218, (18-6143):649234-655359, (6144-16383):7106
> 56-720895, (16384-24575):696320-704511, (24576-36863):806912-819199, (36864-4505
> 5):794624-802815, (45056-47103):882688-884735, (47104-71679):892928-917503, (716
> 80-75775):937984-942079, (75776-77823):1003520-1005567, (77824-83967):1009664-10
> 15807, (83968-88063):991232-995327, (88064-92159):999424-1003519, (92160-94207):
> 1021952-1023999, (94208-96255):1028096-1030143, (96256-98303):1040384-1042431, (
> 98304-106495):1032192-1040383, (106496-120831):1095680-1110015, (120832-122214):
> 1085440-1086822
> 
> And then can you see if there's any commonality in terms of the
> workload associated with that inode, or whether or not that inode was
> likely to be in the process of being modified?
> 


Unfortunately the files involved in those workloads have since been
migrated to BTRFS. However, I have full crashdumps and can extract any
information from the in-memory data structures. I'd really like to get
this resolved and hope the in-memory information will suffice.

> Thanks,
> 
> 					- Ted
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: kernel BUG at fs/ext4/inode.c:2428!
  2016-11-21 14:42       ` Nikolay Borisov
  2016-11-21 15:20         ` Theodore Ts'o
@ 2016-11-22 13:34         ` Jan Kara
  2016-11-22 14:00           ` Nikolay Borisov
  1 sibling, 1 reply; 12+ messages in thread
From: Jan Kara @ 2016-11-22 13:34 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Jan Kara, linux-ext4, Theodore Ts'o, Jan Kara, Andreas Dilger

[-- Attachment #1: Type: text/plain, Size: 7866 bytes --]

On Mon 21-11-16 16:42:09, Nikolay Borisov wrote:
> 
> 
> On 11/21/2016 04:27 PM, Jan Kara wrote:
> > On Mon 14-11-16 16:46:51, Nikolay Borisov wrote:
> >>
> >>
> >> On 11/14/2016 03:49 PM, Jan Kara wrote:
> >>> Hello,
> >>>
> >>> On Mon 14-11-16 12:15:16, Nikolay Borisov wrote:
> >>>> So I hit the following BUG_ON on 3 separate servers: 
> >>>>
> >>>> [1387898.597939] sh (14886): drop_caches: 3
> >>>> [1387945.259613] ------------[ cut here ]------------
> >>>> [1387945.259791] kernel BUG at fs/ext4/inode.c:2428!
> >>>> [1387945.259964] invalid opcode: 0000 [#1] SMP 
> >>>> [1387945.263921] CPU: 9 PID: 8987 Comm: kworker/u24:23 Tainted: P           O    4.4.26-clouder1 #3
> >>>> [1387945.264213] Hardware name: Supermicro X9DRD-iF/LF/X9DRD-iF, BIOS 3.2 01/16/2015
> >>>> [1387945.264512] Workqueue: writeback wb_workfn (flush-253:8)
> >>>> [1387945.264780] task: ffff880287ca5280 ti: ffff8800064dc000 task.ti: ffff8800064dc000
> >>>> [1387945.265073] RIP: 0010:[<ffffffff8122a3ac>]  [<ffffffff8122a3ac>] mpage_prepare_extent_to_map+0x29c/0x2d0
> >>>> [1387945.265426] RSP: 0018:ffff8800064df960  EFLAGS: 00010246
> >>>> [1387945.265596] RAX: 02fffc0000030039 RBX: ffff8800064dfad0 RCX: 0000000000000537
> >>>> [1387945.265881] RDX: 000000000000231b RSI: 0000000000000000 RDI: ffffffff81a052a6
> >>>> [1387945.266165] RBP: ffff8800064dfa28 R08: 0000000000000000 R09: 0000000000000000
> >>>> [1387945.266450] R10: 0000000000000001 R11: 0000000000000100 R12: ffff8800064df980
> >>>> [1387945.266734] R13: 0000000000003400 R14: ffffffffffffffff R15: ffffea000686fbc0
> >>>> [1387945.267024] FS:  0000000000000000(0000) GS:ffff88047fd20000(0000) knlGS:0000000000000000
> >>>> [1387945.267315] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>> [1387945.267487] CR2: ffffffffff600400 CR3: 00000004555ff000 CR4: 00000000000406e0
> >>>> [1387945.267765] Stack:
> >>>> [1387945.267925]  0000000000000000 ffff88015b3c2be0 ffff8800064df980 0000000000000538
> >>>> [1387945.268386]  0000000000000001 0000000000000000 ffffea000686fbc0 ffff88046bce1800
> >>>> [1387945.268850]  ffff8800064df9e8 ffffffff81274777 ffffffff02400040 0fd000086bce6800
> >>>> [1387945.269319] Call Trace:
> >>>> [1387945.269489]  [<ffffffff81274777>] ? jbd2__journal_start+0xe7/0x200
> >>>> [1387945.269663]  [<ffffffff8122e581>] ? ext4_writepages+0x3a1/0xcd0
> >>>> [1387945.269839]  [<ffffffff8125ba8d>] ? __ext4_journal_start_sb+0x6d/0x100
> >>>> [1387945.270013]  [<ffffffff8122e5b2>] ext4_writepages+0x3d2/0xcd0
> >>>> [1387945.270207]  [<ffffffffa07f878b>] ? leaf_space_used+0xcb/0x100 [btrfs]
> >>>> [1387945.270382]  [<ffffffff810823a1>] ? sched_clock_cpu+0xa1/0xb0
> >>>> [1387945.270556]  [<ffffffff8107beb4>] ? check_preempt_curr+0x54/0x90
> >>>> [1387945.270730]  [<ffffffff8107c07d>] ? ttwu_do_activate.constprop.92+0x5d/0x70
> >>>> [1387945.270905]  [<ffffffff8113735e>] do_writepages+0x1e/0x30
> >>>> [1387945.271076]  [<ffffffff811c6c85>] __writeback_single_inode+0x45/0x370
> >>>> [1387945.271250]  [<ffffffff811c74d2>] writeback_sb_inodes+0x252/0x570
> >>>> [1387945.271423]  [<ffffffff811c7879>] __writeback_inodes_wb+0x89/0xc0
> >>>> [1387945.271596]  [<ffffffff811c7bb8>] wb_writeback+0x268/0x300
> >>>> [1387945.271766]  [<ffffffff811c83e6>] wb_workfn+0x2d6/0x400
> >>>> [1387945.271938]  [<ffffffff81614ea8>] ? _raw_spin_lock_irqsave+0x18/0x50
> >>>> [1387945.272112]  [<ffffffff8106bf89>] process_one_work+0x159/0x450
> >>>> [1387945.272285]  [<ffffffff8106c639>] worker_thread+0x69/0x490
> >>>> [1387945.272456]  [<ffffffff8106c5d0>] ? rescuer_thread+0x350/0x350
> >>>> [1387945.272630]  [<ffffffff810717bf>] kthread+0xef/0x110
> >>>> [1387945.272803]  [<ffffffff810716d0>] ? kthread_park+0x60/0x60
> >>>> [1387945.272975]  [<ffffffff816156bf>] ret_from_fork+0x3f/0x70
> >>>> [1387945.273146]  [<ffffffff810716d0>] ? kthread_park+0x60/0x60
> >>>> [1387945.273322] Code: d2 e4 ff e8 67 6f 3e 00 48 8b 85 50 ff ff ff 49 39 c6 0f 83 15 fe ff ff 31 c0 eb a7 4c 89 ff e8 3b e8 ef ff e9 b8 fe ff ff 0f 0b <0f> 0b 48 8d bd 58 ff ff ff 89 85 48 ff ff ff e8 50 f8 f0 ff 8b 
> >>>> [1387945.276751] RIP  [<ffffffff8122a3ac>] mpage_prepare_extent_to_map+0x29c/0x2d0
> >>>> [1387945.277089]  RSP <ffff8800064df960>
> >>>>
> >>>> So a user triggers drop_caches and ext4 crashes due to it trying to 
> >>>> write a page that isn't fs-owned. ffffffff8122a3ac is : head = page_buffers(page);
> >>>> which has this: BUG_ON(!PagePrivate(page)); 
> >>>>
> >>>> page.flags is flags = 216168384067469369 (in binary: 0000001011111111111111000000000000000000000000110000000000111001) 
> >>>> so the 11th bit (PG_private) is not set, triggering the BUG_ON. The flags are 
> >>>> (PG_LOCKED|PG_UPTODATE|PG_DIRTY|MAPPEDTODISK|PG_RECLAIM). Do these flags seem 
> >>>> corrupt - uptodate and dirty being set at the same time? Maybe the page struct
> >>>> is being corrupted?
> >>>>
> >>>> page.private is actually NULL. The page does have an associated address_space 
> >>>> mapping. I've validated this since address_space.host is the same as the 
> >>>> inode member of the passed mpd. 
> >>>
> >>> Interesting. I didn't see this yet. What mount options does the filesystem
> >>> use? The file where this happened is a regular file I assume, right? What
> >>> is blocksize and page size for the filesystem?
> >>
> >> s_blocksize_bits = 12,
> >> s_blocksize = 4096,
> >>
> >> And this is x86_64 so pagesize is also 4k. Unfortunately I cannot
> >> re-mount the file system since it has been converted to btrfs. However
> >> here are the options: rw,relatime,discard,stripe=32,data=ordered, these
> >> have been taken from an analogous mount. This is how it's supposed to
> >> have been created: mkfs.ext4 -b 4096 -K -E nodiscard,lazy_itable_init=1
> >> -O has_journal,large_file,resize_inode,sparse_super,uninit_bg
> >>
> >> Since those values are from the system which supposedly created those,
> >> here are the raw values form the ext4_sb_info->s_mount_opt* members:
> >> s_mount_opt = 3892496400,
> >> s_mount_opt2 = 2,
> >>
> >> And from ext4_super_block->s_feature_*:
> >>
> >>   s_feature_compat = 0,
> >>   s_feature_incompat = 61267,
> >>   s_feature_ro_compat = 0,
> > 
> > This looks strange - COMPAT and RO_COMPAT features definitely should not be
> > zero (e.g. has_journal and sparse_super should be set), also INCOMPAT
> > features look weird - e.g. bit 0x800 is not used but it is set in incompat
> > features. Did you get the sb pointer right?

You did not respond to this... Can you show full ext4_super_block contents
as well?

> >>> The page flags actually look rather consistent. The only thing that is
> >>
> >> Why is UPTODATE and DIRTY set simultaneously. Don't they contradict each
> >> other?
> > 
> > They can be set simultaneously - 'uptodate' means page has at least as new
> > data as is on disk, 'dirty' means page has strictly newer data than on
> > disk.
> > 
> > What is page->index and how large is the inode (inode->i_size)?
> 
> So here is the full page struct for reference:

<snip>

All the structures look sane and consistent except for page->private being
NULL...

> Also I just had the same issue happen on 2 more physical servers. This
> excludes the possibility of a random memory corruption.

OK, any idea how reproducible it is? Attached debug patch could tell us
more and it can be easily used in production as well, just I'm not sure to
how many machined you'd have to deploy this to catch the problem...

Also can you check whether there are not any warnings from ext4 before this
fatal failure in the kernel logs? Because my current suspicion is that the
machines are getting close to OOM, some allocation in the writeback path
fails somewhere and we do not properly recover from the failed writeback
(i.e. we redirty the page but not corresponding buffers) or something like
that.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-ext4-Warn-when-page-is-dirtied-without-buffers.patch --]
[-- Type: text/x-patch, Size: 1532 bytes --]

>From 50f4bc8f67f7c035a88a551851292ac2beafbee6 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Tue, 22 Nov 2016 14:27:58 +0100
Subject: [PATCH] ext4: Warn when page is dirtied without buffers

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 9c064727ed62..5b86edbc5ebd 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3615,6 +3615,12 @@ static int ext4_journalled_set_page_dirty(struct page *page)
 	return __set_page_dirty_nobuffers(page);
 }
 
+static int ext4_debug_set_page_dirty(struct page *page)
+{
+	WARN_ON_ONCE(!page_has_buffers(page));
+	return __set_page_dirty_buffers(page);
+}
+
 static const struct address_space_operations ext4_aops = {
 	.readpage		= ext4_readpage,
 	.readpages		= ext4_readpages,
@@ -3622,6 +3628,7 @@ static const struct address_space_operations ext4_aops = {
 	.writepages		= ext4_writepages,
 	.write_begin		= ext4_write_begin,
 	.write_end		= ext4_write_end,
+	.set_page_dirty		= ext4_debug_set_page_dirty,
 	.bmap			= ext4_bmap,
 	.invalidatepage		= ext4_invalidatepage,
 	.releasepage		= ext4_releasepage,
@@ -3654,6 +3661,7 @@ static const struct address_space_operations ext4_da_aops = {
 	.writepages		= ext4_writepages,
 	.write_begin		= ext4_da_write_begin,
 	.write_end		= ext4_da_write_end,
+	.set_page_dirty		= ext4_debug_set_page_dirty,
 	.bmap			= ext4_bmap,
 	.invalidatepage		= ext4_da_invalidatepage,
 	.releasepage		= ext4_releasepage,
-- 
2.6.6


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: kernel BUG at fs/ext4/inode.c:2428!
  2016-11-22 13:34         ` Jan Kara
@ 2016-11-22 14:00           ` Nikolay Borisov
  2016-11-23  8:36             ` Jan Kara
  0 siblings, 1 reply; 12+ messages in thread
From: Nikolay Borisov @ 2016-11-22 14:00 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, Theodore Ts'o, Jan Kara, Andreas Dilger



On 11/22/2016 03:34 PM, Jan Kara wrote:
> On Mon 21-11-16 16:42:09, Nikolay Borisov wrote:
>>
>>
>> On 11/21/2016 04:27 PM, Jan Kara wrote:
>>> On Mon 14-11-16 16:46:51, Nikolay Borisov wrote:
>>>>
>>>>
>>>> On 11/14/2016 03:49 PM, Jan Kara wrote:
>>>>> Hello,
>>>>>
>>>>> On Mon 14-11-16 12:15:16, Nikolay Borisov wrote:
>>>>>> So I hit the following BUG_ON on 3 separate servers: 
>>>>>>
>>>>>> [1387898.597939] sh (14886): drop_caches: 3
>>>>>> [1387945.259613] ------------[ cut here ]------------
>>>>>> [1387945.259791] kernel BUG at fs/ext4/inode.c:2428!
>>>>>> [1387945.259964] invalid opcode: 0000 [#1] SMP 
>>>>>> [1387945.263921] CPU: 9 PID: 8987 Comm: kworker/u24:23 Tainted: P           O    4.4.26-clouder1 #3
>>>>>> [1387945.264213] Hardware name: Supermicro X9DRD-iF/LF/X9DRD-iF, BIOS 3.2 01/16/2015
>>>>>> [1387945.264512] Workqueue: writeback wb_workfn (flush-253:8)
>>>>>> [1387945.264780] task: ffff880287ca5280 ti: ffff8800064dc000 task.ti: ffff8800064dc000
>>>>>> [1387945.265073] RIP: 0010:[<ffffffff8122a3ac>]  [<ffffffff8122a3ac>] mpage_prepare_extent_to_map+0x29c/0x2d0
>>>>>> [1387945.265426] RSP: 0018:ffff8800064df960  EFLAGS: 00010246
>>>>>> [1387945.265596] RAX: 02fffc0000030039 RBX: ffff8800064dfad0 RCX: 0000000000000537
>>>>>> [1387945.265881] RDX: 000000000000231b RSI: 0000000000000000 RDI: ffffffff81a052a6
>>>>>> [1387945.266165] RBP: ffff8800064dfa28 R08: 0000000000000000 R09: 0000000000000000
>>>>>> [1387945.266450] R10: 0000000000000001 R11: 0000000000000100 R12: ffff8800064df980
>>>>>> [1387945.266734] R13: 0000000000003400 R14: ffffffffffffffff R15: ffffea000686fbc0
>>>>>> [1387945.267024] FS:  0000000000000000(0000) GS:ffff88047fd20000(0000) knlGS:0000000000000000
>>>>>> [1387945.267315] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> [1387945.267487] CR2: ffffffffff600400 CR3: 00000004555ff000 CR4: 00000000000406e0
>>>>>> [1387945.267765] Stack:
>>>>>> [1387945.267925]  0000000000000000 ffff88015b3c2be0 ffff8800064df980 0000000000000538
>>>>>> [1387945.268386]  0000000000000001 0000000000000000 ffffea000686fbc0 ffff88046bce1800
>>>>>> [1387945.268850]  ffff8800064df9e8 ffffffff81274777 ffffffff02400040 0fd000086bce6800
>>>>>> [1387945.269319] Call Trace:
>>>>>> [1387945.269489]  [<ffffffff81274777>] ? jbd2__journal_start+0xe7/0x200
>>>>>> [1387945.269663]  [<ffffffff8122e581>] ? ext4_writepages+0x3a1/0xcd0
>>>>>> [1387945.269839]  [<ffffffff8125ba8d>] ? __ext4_journal_start_sb+0x6d/0x100
>>>>>> [1387945.270013]  [<ffffffff8122e5b2>] ext4_writepages+0x3d2/0xcd0
>>>>>> [1387945.270207]  [<ffffffffa07f878b>] ? leaf_space_used+0xcb/0x100 [btrfs]
>>>>>> [1387945.270382]  [<ffffffff810823a1>] ? sched_clock_cpu+0xa1/0xb0
>>>>>> [1387945.270556]  [<ffffffff8107beb4>] ? check_preempt_curr+0x54/0x90
>>>>>> [1387945.270730]  [<ffffffff8107c07d>] ? ttwu_do_activate.constprop.92+0x5d/0x70
>>>>>> [1387945.270905]  [<ffffffff8113735e>] do_writepages+0x1e/0x30
>>>>>> [1387945.271076]  [<ffffffff811c6c85>] __writeback_single_inode+0x45/0x370
>>>>>> [1387945.271250]  [<ffffffff811c74d2>] writeback_sb_inodes+0x252/0x570
>>>>>> [1387945.271423]  [<ffffffff811c7879>] __writeback_inodes_wb+0x89/0xc0
>>>>>> [1387945.271596]  [<ffffffff811c7bb8>] wb_writeback+0x268/0x300
>>>>>> [1387945.271766]  [<ffffffff811c83e6>] wb_workfn+0x2d6/0x400
>>>>>> [1387945.271938]  [<ffffffff81614ea8>] ? _raw_spin_lock_irqsave+0x18/0x50
>>>>>> [1387945.272112]  [<ffffffff8106bf89>] process_one_work+0x159/0x450
>>>>>> [1387945.272285]  [<ffffffff8106c639>] worker_thread+0x69/0x490
>>>>>> [1387945.272456]  [<ffffffff8106c5d0>] ? rescuer_thread+0x350/0x350
>>>>>> [1387945.272630]  [<ffffffff810717bf>] kthread+0xef/0x110
>>>>>> [1387945.272803]  [<ffffffff810716d0>] ? kthread_park+0x60/0x60
>>>>>> [1387945.272975]  [<ffffffff816156bf>] ret_from_fork+0x3f/0x70
>>>>>> [1387945.273146]  [<ffffffff810716d0>] ? kthread_park+0x60/0x60
>>>>>> [1387945.273322] Code: d2 e4 ff e8 67 6f 3e 00 48 8b 85 50 ff ff ff 49 39 c6 0f 83 15 fe ff ff 31 c0 eb a7 4c 89 ff e8 3b e8 ef ff e9 b8 fe ff ff 0f 0b <0f> 0b 48 8d bd 58 ff ff ff 89 85 48 ff ff ff e8 50 f8 f0 ff 8b 
>>>>>> [1387945.276751] RIP  [<ffffffff8122a3ac>] mpage_prepare_extent_to_map+0x29c/0x2d0
>>>>>> [1387945.277089]  RSP <ffff8800064df960>
>>>>>>
>>>>>> So a user triggers drop_caches and ext4 crashes due to it trying to 
>>>>>> write a page that isn't fs-owned. ffffffff8122a3ac is : head = page_buffers(page);
>>>>>> which has this: BUG_ON(!PagePrivate(page)); 
>>>>>>
>>>>>> page.flags is flags = 216168384067469369 (in binary: 0000001011111111111111000000000000000000000000110000000000111001) 
>>>>>> so the 11th bit (PG_private) is not set, triggering the BUG_ON. The flags are 
>>>>>> (PG_LOCKED|PG_UPTODATE|PG_DIRTY|MAPPEDTODISK|PG_RECLAIM). Do these flags seem 
>>>>>> corrupt - uptodate and dirty being set at the same time? Maybe the page struct
>>>>>> is being corrupted?
>>>>>>
>>>>>> page.private is actually NULL. The page does have an associated address_space 
>>>>>> mapping. I've validated this since address_space.host is the same as the 
>>>>>> inode member of the passed mpd. 
>>>>>
>>>>> Interesting. I didn't see this yet. What mount options does the filesystem
>>>>> use? The file where this happened is a regular file I assume, right? What
>>>>> is blocksize and page size for the filesystem?
>>>>
>>>> s_blocksize_bits = 12,
>>>> s_blocksize = 4096,
>>>>
>>>> And this is x86_64 so pagesize is also 4k. Unfortunately I cannot
>>>> re-mount the file system since it has been converted to btrfs. However
>>>> here are the options: rw,relatime,discard,stripe=32,data=ordered, these
>>>> have been taken from an analogous mount. This is how it's supposed to
>>>> have been created: mkfs.ext4 -b 4096 -K -E nodiscard,lazy_itable_init=1
>>>> -O has_journal,large_file,resize_inode,sparse_super,uninit_bg
>>>>
>>>> Since those values are from the system which supposedly created those,
>>>> here are the raw values form the ext4_sb_info->s_mount_opt* members:
>>>> s_mount_opt = 3892496400,
>>>> s_mount_opt2 = 2,
>>>>
>>>> And from ext4_super_block->s_feature_*:
>>>>
>>>>   s_feature_compat = 0,
>>>>   s_feature_incompat = 61267,
>>>>   s_feature_ro_compat = 0,
>>>
>>> This looks strange - COMPAT and RO_COMPAT features definitely should not be
>>> zero (e.g. has_journal and sparse_super should be set), also INCOMPAT
>>> features look weird - e.g. bit 0x800 is not used but it is set in incompat
>>> features. Did you get the sb pointer right?
> 
> You did not respond to this... Can you show full ext4_super_block contents
> as well?

crash> struct mpage_da_data ffff8800064dfad0
struct mpage_da_data {
  inode = 0xffff88015b3c2a78,
  wbc = 0xffff8800064dfc00,
  first_page = 1335,
  next_page = 1336,
  last_page = 18446744073709551615,
  map = {
    m_pblk = 18446612145925311300,
    m_lblk = 105773856,
    m_len = 0,
    m_flags = 2164768893
  },
  io_submit = {
    io_wbc = 0xffff8800064dfc00,
    io_bio = 0x0,
    io_end = 0xffff88010ce72510,
    io_next_block = 18446612132419992536
  }
}
crash> struct inode.i_sb 0xffff88015b3c2a78
  i_sb = 0xffff88046bce6800
crash> struct super_block.s_fs_info 0xffff88046bce6800
  s_fs_info = 0xffff88046bce7800


Here is the output of 'struct ext4_super_block 0xffff88046bce7800':
http://sprunge.us/DdEA

And the flags now look different than what I have posted previously,
very strange. I must have misread something somewhere.

Unfortunately I cannot apply the patch  you sent since the workload has
already been migrated to btrfs. I hoped the info the crash dump would be
enough to track it and fix it. The actual workload was just a bunch of
ext4 filesystems created inside a loop file and doing rsyncs into those
loop devices. So nothing special really.

> 
>>>>> The page flags actually look rather consistent. The only thing that is
>>>>
>>>> Why is UPTODATE and DIRTY set simultaneously. Don't they contradict each
>>>> other?
>>>
>>> They can be set simultaneously - 'uptodate' means page has at least as new
>>> data as is on disk, 'dirty' means page has strictly newer data than on
>>> disk.
>>>
>>> What is page->index and how large is the inode (inode->i_size)?
>>
>> So here is the full page struct for reference:
> 
> <snip>
> 
> All the structures look sane and consistent except for page->private being
> NULL...
> 
>> Also I just had the same issue happen on 2 more physical servers. This
>> excludes the possibility of a random memory corruption.
> 
> OK, any idea how reproducible it is? Attached debug patch could tell us
> more and it can be easily used in production as well, just I'm not sure to
> how many machined you'd have to deploy this to catch the problem...
> 
> Also can you check whether there are not any warnings from ext4 before this
> fatal failure in the kernel logs? Because my current suspicion is that the
> machines are getting close to OOM, some allocation in the writeback path
> fails somewhere and we do not properly recover from the failed writeback
> (i.e. we redirty the page but not corresponding buffers) or something like
> that.
> 
> 								Honza
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: kernel BUG at fs/ext4/inode.c:2428!
  2016-11-22 14:00           ` Nikolay Borisov
@ 2016-11-23  8:36             ` Jan Kara
  2016-11-23  8:55               ` Nikolay Borisov
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2016-11-23  8:36 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Jan Kara, linux-ext4, Theodore Ts'o, Jan Kara, Andreas Dilger

On Tue 22-11-16 16:00:15, Nikolay Borisov wrote:
> On 11/22/2016 03:34 PM, Jan Kara wrote:
> > On Mon 21-11-16 16:42:09, Nikolay Borisov wrote:
> >> On 11/21/2016 04:27 PM, Jan Kara wrote:
> >>> On Mon 14-11-16 16:46:51, Nikolay Borisov wrote:
> >>>> On 11/14/2016 03:49 PM, Jan Kara wrote:
> >>>>> Hello,
> >>>>>
> >>>>> On Mon 14-11-16 12:15:16, Nikolay Borisov wrote:
> >>>>>> So I hit the following BUG_ON on 3 separate servers: 
> >>>>>>
> >>>>>> [1387898.597939] sh (14886): drop_caches: 3
> >>>>>> [1387945.259613] ------------[ cut here ]------------
> >>>>>> [1387945.259791] kernel BUG at fs/ext4/inode.c:2428!
> >>>>>> [1387945.259964] invalid opcode: 0000 [#1] SMP 
> >>>>>> [1387945.263921] CPU: 9 PID: 8987 Comm: kworker/u24:23 Tainted: P           O    4.4.26-clouder1 #3
> >>>>>> [1387945.264213] Hardware name: Supermicro X9DRD-iF/LF/X9DRD-iF, BIOS 3.2 01/16/2015
> >>>>>> [1387945.264512] Workqueue: writeback wb_workfn (flush-253:8)
> >>>>>> [1387945.264780] task: ffff880287ca5280 ti: ffff8800064dc000 task.ti: ffff8800064dc000
> >>>>>> [1387945.265073] RIP: 0010:[<ffffffff8122a3ac>]  [<ffffffff8122a3ac>] mpage_prepare_extent_to_map+0x29c/0x2d0
> >>>>>> [1387945.265426] RSP: 0018:ffff8800064df960  EFLAGS: 00010246
> >>>>>> [1387945.265596] RAX: 02fffc0000030039 RBX: ffff8800064dfad0 RCX: 0000000000000537
> >>>>>> [1387945.265881] RDX: 000000000000231b RSI: 0000000000000000 RDI: ffffffff81a052a6
> >>>>>> [1387945.266165] RBP: ffff8800064dfa28 R08: 0000000000000000 R09: 0000000000000000
> >>>>>> [1387945.266450] R10: 0000000000000001 R11: 0000000000000100 R12: ffff8800064df980
> >>>>>> [1387945.266734] R13: 0000000000003400 R14: ffffffffffffffff R15: ffffea000686fbc0
> >>>>>> [1387945.267024] FS:  0000000000000000(0000) GS:ffff88047fd20000(0000) knlGS:0000000000000000
> >>>>>> [1387945.267315] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>>> [1387945.267487] CR2: ffffffffff600400 CR3: 00000004555ff000 CR4: 00000000000406e0
> >>>>>> [1387945.267765] Stack:
> >>>>>> [1387945.267925]  0000000000000000 ffff88015b3c2be0 ffff8800064df980 0000000000000538
> >>>>>> [1387945.268386]  0000000000000001 0000000000000000 ffffea000686fbc0 ffff88046bce1800
> >>>>>> [1387945.268850]  ffff8800064df9e8 ffffffff81274777 ffffffff02400040 0fd000086bce6800
> >>>>>> [1387945.269319] Call Trace:
> >>>>>> [1387945.269489]  [<ffffffff81274777>] ? jbd2__journal_start+0xe7/0x200
> >>>>>> [1387945.269663]  [<ffffffff8122e581>] ? ext4_writepages+0x3a1/0xcd0
> >>>>>> [1387945.269839]  [<ffffffff8125ba8d>] ? __ext4_journal_start_sb+0x6d/0x100
> >>>>>> [1387945.270013]  [<ffffffff8122e5b2>] ext4_writepages+0x3d2/0xcd0
> >>>>>> [1387945.270207]  [<ffffffffa07f878b>] ? leaf_space_used+0xcb/0x100 [btrfs]
> >>>>>> [1387945.270382]  [<ffffffff810823a1>] ? sched_clock_cpu+0xa1/0xb0
> >>>>>> [1387945.270556]  [<ffffffff8107beb4>] ? check_preempt_curr+0x54/0x90
> >>>>>> [1387945.270730]  [<ffffffff8107c07d>] ? ttwu_do_activate.constprop.92+0x5d/0x70
> >>>>>> [1387945.270905]  [<ffffffff8113735e>] do_writepages+0x1e/0x30
> >>>>>> [1387945.271076]  [<ffffffff811c6c85>] __writeback_single_inode+0x45/0x370
> >>>>>> [1387945.271250]  [<ffffffff811c74d2>] writeback_sb_inodes+0x252/0x570
> >>>>>> [1387945.271423]  [<ffffffff811c7879>] __writeback_inodes_wb+0x89/0xc0
> >>>>>> [1387945.271596]  [<ffffffff811c7bb8>] wb_writeback+0x268/0x300
> >>>>>> [1387945.271766]  [<ffffffff811c83e6>] wb_workfn+0x2d6/0x400
> >>>>>> [1387945.271938]  [<ffffffff81614ea8>] ? _raw_spin_lock_irqsave+0x18/0x50
> >>>>>> [1387945.272112]  [<ffffffff8106bf89>] process_one_work+0x159/0x450
> >>>>>> [1387945.272285]  [<ffffffff8106c639>] worker_thread+0x69/0x490
> >>>>>> [1387945.272456]  [<ffffffff8106c5d0>] ? rescuer_thread+0x350/0x350
> >>>>>> [1387945.272630]  [<ffffffff810717bf>] kthread+0xef/0x110
> >>>>>> [1387945.272803]  [<ffffffff810716d0>] ? kthread_park+0x60/0x60
> >>>>>> [1387945.272975]  [<ffffffff816156bf>] ret_from_fork+0x3f/0x70
> >>>>>> [1387945.273146]  [<ffffffff810716d0>] ? kthread_park+0x60/0x60
> >>>>>> [1387945.273322] Code: d2 e4 ff e8 67 6f 3e 00 48 8b 85 50 ff ff ff 49 39 c6 0f 83 15 fe ff ff 31 c0 eb a7 4c 89 ff e8 3b e8 ef ff e9 b8 fe ff ff 0f 0b <0f> 0b 48 8d bd 58 ff ff ff 89 85 48 ff ff ff e8 50 f8 f0 ff 8b 
> >>>>>> [1387945.276751] RIP  [<ffffffff8122a3ac>] mpage_prepare_extent_to_map+0x29c/0x2d0
> >>>>>> [1387945.277089]  RSP <ffff8800064df960>
> >>>>>>
> >>>>>> So a user triggers drop_caches and ext4 crashes due to it trying to 
> >>>>>> write a page that isn't fs-owned. ffffffff8122a3ac is : head = page_buffers(page);
> >>>>>> which has this: BUG_ON(!PagePrivate(page)); 
> >>>>>>
> >>>>>> page.flags is flags = 216168384067469369 (in binary: 0000001011111111111111000000000000000000000000110000000000111001) 
> >>>>>> so the 11th bit (PG_private) is not set, triggering the BUG_ON. The flags are 
> >>>>>> (PG_LOCKED|PG_UPTODATE|PG_DIRTY|MAPPEDTODISK|PG_RECLAIM). Do these flags seem 
> >>>>>> corrupt - uptodate and dirty being set at the same time? Maybe the page struct
> >>>>>> is being corrupted?
> >>>>>>
> >>>>>> page.private is actually NULL. The page does have an associated address_space 
> >>>>>> mapping. I've validated this since address_space.host is the same as the 
> >>>>>> inode member of the passed mpd. 
> >>>>>
> >>>>> Interesting. I didn't see this yet. What mount options does the filesystem
> >>>>> use? The file where this happened is a regular file I assume, right? What
> >>>>> is blocksize and page size for the filesystem?
> >>>>
> >>>> s_blocksize_bits = 12,
> >>>> s_blocksize = 4096,
> >>>>
> >>>> And this is x86_64 so pagesize is also 4k. Unfortunately I cannot
> >>>> re-mount the file system since it has been converted to btrfs. However
> >>>> here are the options: rw,relatime,discard,stripe=32,data=ordered, these
> >>>> have been taken from an analogous mount. This is how it's supposed to
> >>>> have been created: mkfs.ext4 -b 4096 -K -E nodiscard,lazy_itable_init=1
> >>>> -O has_journal,large_file,resize_inode,sparse_super,uninit_bg
> >>>>
> >>>> Since those values are from the system which supposedly created those,
> >>>> here are the raw values form the ext4_sb_info->s_mount_opt* members:
> >>>> s_mount_opt = 3892496400,
> >>>> s_mount_opt2 = 2,
> >>>>
> >>>> And from ext4_super_block->s_feature_*:
> >>>>
> >>>>   s_feature_compat = 0,
> >>>>   s_feature_incompat = 61267,
> >>>>   s_feature_ro_compat = 0,
> >>>
> >>> This looks strange - COMPAT and RO_COMPAT features definitely should not be
> >>> zero (e.g. has_journal and sparse_super should be set), also INCOMPAT
> >>> features look weird - e.g. bit 0x800 is not used but it is set in incompat
> >>> features. Did you get the sb pointer right?
> > 
> > You did not respond to this... Can you show full ext4_super_block contents
> > as well?
> 
> crash> struct mpage_da_data ffff8800064dfad0
> struct mpage_da_data {
>   inode = 0xffff88015b3c2a78,
>   wbc = 0xffff8800064dfc00,
>   first_page = 1335,
>   next_page = 1336,
>   last_page = 18446744073709551615,
>   map = {
>     m_pblk = 18446612145925311300,
>     m_lblk = 105773856,
>     m_len = 0,
>     m_flags = 2164768893
>   },
>   io_submit = {
>     io_wbc = 0xffff8800064dfc00,
>     io_bio = 0x0,
>     io_end = 0xffff88010ce72510,
>     io_next_block = 18446612132419992536
>   }
> }
> crash> struct inode.i_sb 0xffff88015b3c2a78
>   i_sb = 0xffff88046bce6800
> crash> struct super_block.s_fs_info 0xffff88046bce6800
>   s_fs_info = 0xffff88046bce7800
> 
> 
> Here is the output of 'struct ext4_super_block 0xffff88046bce7800':
> http://sprunge.us/DdEA

Ah, s_fs_info points to struct ext4_sb_info, not to struct
ext4_super_block. That's why the contents does not make sense. Can you dump
0xffff88046bce7800 as ext4_sb_info and then s_fs_info->s_es as
ext4_super_block? Thanks.

> Unfortunately I cannot apply the patch  you sent since the workload has
> already been migrated to btrfs. I hoped the info the crash dump would be
> enough to track it and fix it. The actual workload was just a bunch of
> ext4 filesystems created inside a loop file and doing rsyncs into those
> loop devices. So nothing special really.

OK.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: kernel BUG at fs/ext4/inode.c:2428!
  2016-11-23  8:36             ` Jan Kara
@ 2016-11-23  8:55               ` Nikolay Borisov
  2016-11-23  9:27                 ` Jan Kara
  0 siblings, 1 reply; 12+ messages in thread
From: Nikolay Borisov @ 2016-11-23  8:55 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, Theodore Ts'o, Jan Kara, Andreas Dilger



On 11/23/2016 10:36 AM, Jan Kara wrote:
> On Tue 22-11-16 16:00:15, Nikolay Borisov wrote:
>> On 11/22/2016 03:34 PM, Jan Kara wrote:
>>> On Mon 21-11-16 16:42:09, Nikolay Borisov wrote:
>>>> On 11/21/2016 04:27 PM, Jan Kara wrote:
>>>>> On Mon 14-11-16 16:46:51, Nikolay Borisov wrote:
>>>>>> On 11/14/2016 03:49 PM, Jan Kara wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> On Mon 14-11-16 12:15:16, Nikolay Borisov wrote:
>>>>>>>> So I hit the following BUG_ON on 3 separate servers: 
>>>>>>>>
>>>>>>>> [1387898.597939] sh (14886): drop_caches: 3
>>>>>>>> [1387945.259613] ------------[ cut here ]------------
>>>>>>>> [1387945.259791] kernel BUG at fs/ext4/inode.c:2428!
>>>>>>>> [1387945.259964] invalid opcode: 0000 [#1] SMP 
>>>>>>>> [1387945.263921] CPU: 9 PID: 8987 Comm: kworker/u24:23 Tainted: P           O    4.4.26-clouder1 #3
>>>>>>>> [1387945.264213] Hardware name: Supermicro X9DRD-iF/LF/X9DRD-iF, BIOS 3.2 01/16/2015
>>>>>>>> [1387945.264512] Workqueue: writeback wb_workfn (flush-253:8)
>>>>>>>> [1387945.264780] task: ffff880287ca5280 ti: ffff8800064dc000 task.ti: ffff8800064dc000
>>>>>>>> [1387945.265073] RIP: 0010:[<ffffffff8122a3ac>]  [<ffffffff8122a3ac>] mpage_prepare_extent_to_map+0x29c/0x2d0
>>>>>>>> [1387945.265426] RSP: 0018:ffff8800064df960  EFLAGS: 00010246
>>>>>>>> [1387945.265596] RAX: 02fffc0000030039 RBX: ffff8800064dfad0 RCX: 0000000000000537
>>>>>>>> [1387945.265881] RDX: 000000000000231b RSI: 0000000000000000 RDI: ffffffff81a052a6
>>>>>>>> [1387945.266165] RBP: ffff8800064dfa28 R08: 0000000000000000 R09: 0000000000000000
>>>>>>>> [1387945.266450] R10: 0000000000000001 R11: 0000000000000100 R12: ffff8800064df980
>>>>>>>> [1387945.266734] R13: 0000000000003400 R14: ffffffffffffffff R15: ffffea000686fbc0
>>>>>>>> [1387945.267024] FS:  0000000000000000(0000) GS:ffff88047fd20000(0000) knlGS:0000000000000000
>>>>>>>> [1387945.267315] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>> [1387945.267487] CR2: ffffffffff600400 CR3: 00000004555ff000 CR4: 00000000000406e0
>>>>>>>> [1387945.267765] Stack:
>>>>>>>> [1387945.267925]  0000000000000000 ffff88015b3c2be0 ffff8800064df980 0000000000000538
>>>>>>>> [1387945.268386]  0000000000000001 0000000000000000 ffffea000686fbc0 ffff88046bce1800
>>>>>>>> [1387945.268850]  ffff8800064df9e8 ffffffff81274777 ffffffff02400040 0fd000086bce6800
>>>>>>>> [1387945.269319] Call Trace:
>>>>>>>> [1387945.269489]  [<ffffffff81274777>] ? jbd2__journal_start+0xe7/0x200
>>>>>>>> [1387945.269663]  [<ffffffff8122e581>] ? ext4_writepages+0x3a1/0xcd0
>>>>>>>> [1387945.269839]  [<ffffffff8125ba8d>] ? __ext4_journal_start_sb+0x6d/0x100
>>>>>>>> [1387945.270013]  [<ffffffff8122e5b2>] ext4_writepages+0x3d2/0xcd0
>>>>>>>> [1387945.270207]  [<ffffffffa07f878b>] ? leaf_space_used+0xcb/0x100 [btrfs]
>>>>>>>> [1387945.270382]  [<ffffffff810823a1>] ? sched_clock_cpu+0xa1/0xb0
>>>>>>>> [1387945.270556]  [<ffffffff8107beb4>] ? check_preempt_curr+0x54/0x90
>>>>>>>> [1387945.270730]  [<ffffffff8107c07d>] ? ttwu_do_activate.constprop.92+0x5d/0x70
>>>>>>>> [1387945.270905]  [<ffffffff8113735e>] do_writepages+0x1e/0x30
>>>>>>>> [1387945.271076]  [<ffffffff811c6c85>] __writeback_single_inode+0x45/0x370
>>>>>>>> [1387945.271250]  [<ffffffff811c74d2>] writeback_sb_inodes+0x252/0x570
>>>>>>>> [1387945.271423]  [<ffffffff811c7879>] __writeback_inodes_wb+0x89/0xc0
>>>>>>>> [1387945.271596]  [<ffffffff811c7bb8>] wb_writeback+0x268/0x300
>>>>>>>> [1387945.271766]  [<ffffffff811c83e6>] wb_workfn+0x2d6/0x400
>>>>>>>> [1387945.271938]  [<ffffffff81614ea8>] ? _raw_spin_lock_irqsave+0x18/0x50
>>>>>>>> [1387945.272112]  [<ffffffff8106bf89>] process_one_work+0x159/0x450
>>>>>>>> [1387945.272285]  [<ffffffff8106c639>] worker_thread+0x69/0x490
>>>>>>>> [1387945.272456]  [<ffffffff8106c5d0>] ? rescuer_thread+0x350/0x350
>>>>>>>> [1387945.272630]  [<ffffffff810717bf>] kthread+0xef/0x110
>>>>>>>> [1387945.272803]  [<ffffffff810716d0>] ? kthread_park+0x60/0x60
>>>>>>>> [1387945.272975]  [<ffffffff816156bf>] ret_from_fork+0x3f/0x70
>>>>>>>> [1387945.273146]  [<ffffffff810716d0>] ? kthread_park+0x60/0x60
>>>>>>>> [1387945.273322] Code: d2 e4 ff e8 67 6f 3e 00 48 8b 85 50 ff ff ff 49 39 c6 0f 83 15 fe ff ff 31 c0 eb a7 4c 89 ff e8 3b e8 ef ff e9 b8 fe ff ff 0f 0b <0f> 0b 48 8d bd 58 ff ff ff 89 85 48 ff ff ff e8 50 f8 f0 ff 8b 
>>>>>>>> [1387945.276751] RIP  [<ffffffff8122a3ac>] mpage_prepare_extent_to_map+0x29c/0x2d0
>>>>>>>> [1387945.277089]  RSP <ffff8800064df960>
>>>>>>>>
>>>>>>>> So a user triggers drop_caches and ext4 crashes due to it trying to 
>>>>>>>> write a page that isn't fs-owned. ffffffff8122a3ac is : head = page_buffers(page);
>>>>>>>> which has this: BUG_ON(!PagePrivate(page)); 
>>>>>>>>
>>>>>>>> page.flags is flags = 216168384067469369 (in binary: 0000001011111111111111000000000000000000000000110000000000111001) 
>>>>>>>> so the 11th bit (PG_private) is not set, triggering the BUG_ON. The flags are 
>>>>>>>> (PG_LOCKED|PG_UPTODATE|PG_DIRTY|MAPPEDTODISK|PG_RECLAIM). Do these flags seem 
>>>>>>>> corrupt - uptodate and dirty being set at the same time? Maybe the page struct
>>>>>>>> is being corrupted?
>>>>>>>>
>>>>>>>> page.private is actually NULL. The page does have an associated address_space 
>>>>>>>> mapping. I've validated this since address_space.host is the same as the 
>>>>>>>> inode member of the passed mpd. 
>>>>>>>
>>>>>>> Interesting. I didn't see this yet. What mount options does the filesystem
>>>>>>> use? The file where this happened is a regular file I assume, right? What
>>>>>>> is blocksize and page size for the filesystem?
>>>>>>
>>>>>> s_blocksize_bits = 12,
>>>>>> s_blocksize = 4096,
>>>>>>
>>>>>> And this is x86_64 so pagesize is also 4k. Unfortunately I cannot
>>>>>> re-mount the file system since it has been converted to btrfs. However
>>>>>> here are the options: rw,relatime,discard,stripe=32,data=ordered, these
>>>>>> have been taken from an analogous mount. This is how it's supposed to
>>>>>> have been created: mkfs.ext4 -b 4096 -K -E nodiscard,lazy_itable_init=1
>>>>>> -O has_journal,large_file,resize_inode,sparse_super,uninit_bg
>>>>>>
>>>>>> Since those values are from the system which supposedly created those,
>>>>>> here are the raw values form the ext4_sb_info->s_mount_opt* members:
>>>>>> s_mount_opt = 3892496400,
>>>>>> s_mount_opt2 = 2,
>>>>>>
>>>>>> And from ext4_super_block->s_feature_*:
>>>>>>
>>>>>>   s_feature_compat = 0,
>>>>>>   s_feature_incompat = 61267,
>>>>>>   s_feature_ro_compat = 0,
>>>>>
>>>>> This looks strange - COMPAT and RO_COMPAT features definitely should not be
>>>>> zero (e.g. has_journal and sparse_super should be set), also INCOMPAT
>>>>> features look weird - e.g. bit 0x800 is not used but it is set in incompat
>>>>> features. Did you get the sb pointer right?
>>>
>>> You did not respond to this... Can you show full ext4_super_block contents
>>> as well?
>>
>> crash> struct mpage_da_data ffff8800064dfad0
>> struct mpage_da_data {
>>   inode = 0xffff88015b3c2a78,
>>   wbc = 0xffff8800064dfc00,
>>   first_page = 1335,
>>   next_page = 1336,
>>   last_page = 18446744073709551615,
>>   map = {
>>     m_pblk = 18446612145925311300,
>>     m_lblk = 105773856,
>>     m_len = 0,
>>     m_flags = 2164768893
>>   },
>>   io_submit = {
>>     io_wbc = 0xffff8800064dfc00,
>>     io_bio = 0x0,
>>     io_end = 0xffff88010ce72510,
>>     io_next_block = 18446612132419992536
>>   }
>> }
>> crash> struct inode.i_sb 0xffff88015b3c2a78
>>   i_sb = 0xffff88046bce6800
>> crash> struct super_block.s_fs_info 0xffff88046bce6800
>>   s_fs_info = 0xffff88046bce7800
>>
>>
>> Here is the output of 'struct ext4_super_block 0xffff88046bce7800':
>> http://sprunge.us/DdEA
> 
> Ah, s_fs_info points to struct ext4_sb_info, not to struct
> ext4_super_block. That's why the contents does not make sense. Can you dump
> 0xffff88046bce7800 as ext4_sb_info and then s_fs_info->s_es as
> ext4_super_block? Thanks.

Ah, silly me. However no luck: 

crash> struct ext4_sb_info.s_es 0xffff88046bce7800
  s_es = 0xffff8800072cd400
crash> struct ext4_super_block 0xffff8800072cd400
struct ext4_super_block struct: page excluded: kernel virtual address: ffff8800072cd400  type: "gdb_readmem_callback"
Cannot access memory at address 0xffff8800072cd400

My kdump filtering flags are set to 31 so that I get only core kernel memory, 
i don't see a reason why this address should be excluded. I also checked 
on the other 2 machines experiencing this problem and the s_es memory is 
being excluded. I guess I'm out of luck :( 


> 
>> Unfortunately I cannot apply the patch  you sent since the workload has
>> already been migrated to btrfs. I hoped the info the crash dump would be
>> enough to track it and fix it. The actual workload was just a bunch of
>> ext4 filesystems created inside a loop file and doing rsyncs into those
>> loop devices. So nothing special really.
> 
> OK.
> 
> 								Honza
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: kernel BUG at fs/ext4/inode.c:2428!
  2016-11-23  8:55               ` Nikolay Borisov
@ 2016-11-23  9:27                 ` Jan Kara
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Kara @ 2016-11-23  9:27 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Jan Kara, linux-ext4, Theodore Ts'o, Jan Kara, Andreas Dilger

On Wed 23-11-16 10:55:02, Nikolay Borisov wrote:
> 
> 
> On 11/23/2016 10:36 AM, Jan Kara wrote:
> > On Tue 22-11-16 16:00:15, Nikolay Borisov wrote:
> >> On 11/22/2016 03:34 PM, Jan Kara wrote:
> >>> On Mon 21-11-16 16:42:09, Nikolay Borisov wrote:
> >>>> On 11/21/2016 04:27 PM, Jan Kara wrote:
> >>>>> On Mon 14-11-16 16:46:51, Nikolay Borisov wrote:
> >>>>>> On 11/14/2016 03:49 PM, Jan Kara wrote:
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> On Mon 14-11-16 12:15:16, Nikolay Borisov wrote:
> >>>>>>>> So I hit the following BUG_ON on 3 separate servers: 
> >>>>>>>>
> >>>>>>>> [1387898.597939] sh (14886): drop_caches: 3
> >>>>>>>> [1387945.259613] ------------[ cut here ]------------
> >>>>>>>> [1387945.259791] kernel BUG at fs/ext4/inode.c:2428!
> >>>>>>>> [1387945.259964] invalid opcode: 0000 [#1] SMP 
> >>>>>>>> [1387945.263921] CPU: 9 PID: 8987 Comm: kworker/u24:23 Tainted: P           O    4.4.26-clouder1 #3
> >>>>>>>> [1387945.264213] Hardware name: Supermicro X9DRD-iF/LF/X9DRD-iF, BIOS 3.2 01/16/2015
> >>>>>>>> [1387945.264512] Workqueue: writeback wb_workfn (flush-253:8)
> >>>>>>>> [1387945.264780] task: ffff880287ca5280 ti: ffff8800064dc000 task.ti: ffff8800064dc000
> >>>>>>>> [1387945.265073] RIP: 0010:[<ffffffff8122a3ac>]  [<ffffffff8122a3ac>] mpage_prepare_extent_to_map+0x29c/0x2d0
> >>>>>>>> [1387945.265426] RSP: 0018:ffff8800064df960  EFLAGS: 00010246
> >>>>>>>> [1387945.265596] RAX: 02fffc0000030039 RBX: ffff8800064dfad0 RCX: 0000000000000537
> >>>>>>>> [1387945.265881] RDX: 000000000000231b RSI: 0000000000000000 RDI: ffffffff81a052a6
> >>>>>>>> [1387945.266165] RBP: ffff8800064dfa28 R08: 0000000000000000 R09: 0000000000000000
> >>>>>>>> [1387945.266450] R10: 0000000000000001 R11: 0000000000000100 R12: ffff8800064df980
> >>>>>>>> [1387945.266734] R13: 0000000000003400 R14: ffffffffffffffff R15: ffffea000686fbc0
> >>>>>>>> [1387945.267024] FS:  0000000000000000(0000) GS:ffff88047fd20000(0000) knlGS:0000000000000000
> >>>>>>>> [1387945.267315] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>>>>> [1387945.267487] CR2: ffffffffff600400 CR3: 00000004555ff000 CR4: 00000000000406e0
> >>>>>>>> [1387945.267765] Stack:
> >>>>>>>> [1387945.267925]  0000000000000000 ffff88015b3c2be0 ffff8800064df980 0000000000000538
> >>>>>>>> [1387945.268386]  0000000000000001 0000000000000000 ffffea000686fbc0 ffff88046bce1800
> >>>>>>>> [1387945.268850]  ffff8800064df9e8 ffffffff81274777 ffffffff02400040 0fd000086bce6800
> >>>>>>>> [1387945.269319] Call Trace:
> >>>>>>>> [1387945.269489]  [<ffffffff81274777>] ? jbd2__journal_start+0xe7/0x200
> >>>>>>>> [1387945.269663]  [<ffffffff8122e581>] ? ext4_writepages+0x3a1/0xcd0
> >>>>>>>> [1387945.269839]  [<ffffffff8125ba8d>] ? __ext4_journal_start_sb+0x6d/0x100
> >>>>>>>> [1387945.270013]  [<ffffffff8122e5b2>] ext4_writepages+0x3d2/0xcd0
> >>>>>>>> [1387945.270207]  [<ffffffffa07f878b>] ? leaf_space_used+0xcb/0x100 [btrfs]
> >>>>>>>> [1387945.270382]  [<ffffffff810823a1>] ? sched_clock_cpu+0xa1/0xb0
> >>>>>>>> [1387945.270556]  [<ffffffff8107beb4>] ? check_preempt_curr+0x54/0x90
> >>>>>>>> [1387945.270730]  [<ffffffff8107c07d>] ? ttwu_do_activate.constprop.92+0x5d/0x70
> >>>>>>>> [1387945.270905]  [<ffffffff8113735e>] do_writepages+0x1e/0x30
> >>>>>>>> [1387945.271076]  [<ffffffff811c6c85>] __writeback_single_inode+0x45/0x370
> >>>>>>>> [1387945.271250]  [<ffffffff811c74d2>] writeback_sb_inodes+0x252/0x570
> >>>>>>>> [1387945.271423]  [<ffffffff811c7879>] __writeback_inodes_wb+0x89/0xc0
> >>>>>>>> [1387945.271596]  [<ffffffff811c7bb8>] wb_writeback+0x268/0x300
> >>>>>>>> [1387945.271766]  [<ffffffff811c83e6>] wb_workfn+0x2d6/0x400
> >>>>>>>> [1387945.271938]  [<ffffffff81614ea8>] ? _raw_spin_lock_irqsave+0x18/0x50
> >>>>>>>> [1387945.272112]  [<ffffffff8106bf89>] process_one_work+0x159/0x450
> >>>>>>>> [1387945.272285]  [<ffffffff8106c639>] worker_thread+0x69/0x490
> >>>>>>>> [1387945.272456]  [<ffffffff8106c5d0>] ? rescuer_thread+0x350/0x350
> >>>>>>>> [1387945.272630]  [<ffffffff810717bf>] kthread+0xef/0x110
> >>>>>>>> [1387945.272803]  [<ffffffff810716d0>] ? kthread_park+0x60/0x60
> >>>>>>>> [1387945.272975]  [<ffffffff816156bf>] ret_from_fork+0x3f/0x70
> >>>>>>>> [1387945.273146]  [<ffffffff810716d0>] ? kthread_park+0x60/0x60
> >>>>>>>> [1387945.273322] Code: d2 e4 ff e8 67 6f 3e 00 48 8b 85 50 ff ff ff 49 39 c6 0f 83 15 fe ff ff 31 c0 eb a7 4c 89 ff e8 3b e8 ef ff e9 b8 fe ff ff 0f 0b <0f> 0b 48 8d bd 58 ff ff ff 89 85 48 ff ff ff e8 50 f8 f0 ff 8b 
> >>>>>>>> [1387945.276751] RIP  [<ffffffff8122a3ac>] mpage_prepare_extent_to_map+0x29c/0x2d0
> >>>>>>>> [1387945.277089]  RSP <ffff8800064df960>
> >>>>>>>>
> >>>>>>>> So a user triggers drop_caches and ext4 crashes due to it trying to 
> >>>>>>>> write a page that isn't fs-owned. ffffffff8122a3ac is : head = page_buffers(page);
> >>>>>>>> which has this: BUG_ON(!PagePrivate(page)); 
> >>>>>>>>
> >>>>>>>> page.flags is flags = 216168384067469369 (in binary: 0000001011111111111111000000000000000000000000110000000000111001) 
> >>>>>>>> so the 11th bit (PG_private) is not set, triggering the BUG_ON. The flags are 
> >>>>>>>> (PG_LOCKED|PG_UPTODATE|PG_DIRTY|MAPPEDTODISK|PG_RECLAIM). Do these flags seem 
> >>>>>>>> corrupt - uptodate and dirty being set at the same time? Maybe the page struct
> >>>>>>>> is being corrupted?
> >>>>>>>>
> >>>>>>>> page.private is actually NULL. The page does have an associated address_space 
> >>>>>>>> mapping. I've validated this since address_space.host is the same as the 
> >>>>>>>> inode member of the passed mpd. 
> >>>>>>>
> >>>>>>> Interesting. I didn't see this yet. What mount options does the filesystem
> >>>>>>> use? The file where this happened is a regular file I assume, right? What
> >>>>>>> is blocksize and page size for the filesystem?
> >>>>>>
> >>>>>> s_blocksize_bits = 12,
> >>>>>> s_blocksize = 4096,
> >>>>>>
> >>>>>> And this is x86_64 so pagesize is also 4k. Unfortunately I cannot
> >>>>>> re-mount the file system since it has been converted to btrfs. However
> >>>>>> here are the options: rw,relatime,discard,stripe=32,data=ordered, these
> >>>>>> have been taken from an analogous mount. This is how it's supposed to
> >>>>>> have been created: mkfs.ext4 -b 4096 -K -E nodiscard,lazy_itable_init=1
> >>>>>> -O has_journal,large_file,resize_inode,sparse_super,uninit_bg
> >>>>>>
> >>>>>> Since those values are from the system which supposedly created those,
> >>>>>> here are the raw values form the ext4_sb_info->s_mount_opt* members:
> >>>>>> s_mount_opt = 3892496400,
> >>>>>> s_mount_opt2 = 2,
> >>>>>>
> >>>>>> And from ext4_super_block->s_feature_*:
> >>>>>>
> >>>>>>   s_feature_compat = 0,
> >>>>>>   s_feature_incompat = 61267,
> >>>>>>   s_feature_ro_compat = 0,
> >>>>>
> >>>>> This looks strange - COMPAT and RO_COMPAT features definitely should not be
> >>>>> zero (e.g. has_journal and sparse_super should be set), also INCOMPAT
> >>>>> features look weird - e.g. bit 0x800 is not used but it is set in incompat
> >>>>> features. Did you get the sb pointer right?
> >>>
> >>> You did not respond to this... Can you show full ext4_super_block contents
> >>> as well?
> >>
> >> crash> struct mpage_da_data ffff8800064dfad0
> >> struct mpage_da_data {
> >>   inode = 0xffff88015b3c2a78,
> >>   wbc = 0xffff8800064dfc00,
> >>   first_page = 1335,
> >>   next_page = 1336,
> >>   last_page = 18446744073709551615,
> >>   map = {
> >>     m_pblk = 18446612145925311300,
> >>     m_lblk = 105773856,
> >>     m_len = 0,
> >>     m_flags = 2164768893
> >>   },
> >>   io_submit = {
> >>     io_wbc = 0xffff8800064dfc00,
> >>     io_bio = 0x0,
> >>     io_end = 0xffff88010ce72510,
> >>     io_next_block = 18446612132419992536
> >>   }
> >> }
> >> crash> struct inode.i_sb 0xffff88015b3c2a78
> >>   i_sb = 0xffff88046bce6800
> >> crash> struct super_block.s_fs_info 0xffff88046bce6800
> >>   s_fs_info = 0xffff88046bce7800
> >>
> >>
> >> Here is the output of 'struct ext4_super_block 0xffff88046bce7800':
> >> http://sprunge.us/DdEA
> > 
> > Ah, s_fs_info points to struct ext4_sb_info, not to struct
> > ext4_super_block. That's why the contents does not make sense. Can you dump
> > 0xffff88046bce7800 as ext4_sb_info and then s_fs_info->s_es as
> > ext4_super_block? Thanks.
> 
> Ah, silly me. However no luck: 
> 
> crash> struct ext4_sb_info.s_es 0xffff88046bce7800
>   s_es = 0xffff8800072cd400
> crash> struct ext4_super_block 0xffff8800072cd400
> struct ext4_super_block struct: page excluded: kernel virtual address: ffff8800072cd400  type: "gdb_readmem_callback"
> Cannot access memory at address 0xffff8800072cd400
> 
> My kdump filtering flags are set to 31 so that I get only core kernel memory, 
> i don't see a reason why this address should be excluded. I also checked 
> on the other 2 machines experiencing this problem and the s_es memory is 
> being excluded. I guess I'm out of luck :( 

Yeah, s_es actually points to the on-disk super block in buffer cache so it
makes sense that with your filtering flags it got excluded. Bad luck. I'm
trying some test runs on my test machine with some debugging added. Maybe
it will show something...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-11-23  9:27 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-11-14 10:15 kernel BUG at fs/ext4/inode.c:2428! Nikolay Borisov
2016-11-14 13:49 ` Jan Kara
2016-11-14 14:46   ` Nikolay Borisov
2016-11-21 14:27     ` Jan Kara
2016-11-21 14:42       ` Nikolay Borisov
2016-11-21 15:20         ` Theodore Ts'o
2016-11-21 15:32           ` Nikolay Borisov
2016-11-22 13:34         ` Jan Kara
2016-11-22 14:00           ` Nikolay Borisov
2016-11-23  8:36             ` Jan Kara
2016-11-23  8:55               ` Nikolay Borisov
2016-11-23  9:27                 ` Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).