From mboxrd@z Thu Jan 1 00:00:00 1970
From: Nikolay Borisov
Subject: Re: ext4 crash in 4.4.10
Date: Mon, 4 Jul 2016 11:49:27 +0300
Message-ID: <577A2317.2070609@kyup.com>
References: <57513FAF.5030800@kyup.com> <20160603091936.GA2470@quack2.suse.cz>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Cc: linux-ext4 , Theodore Ts'o , Jan Kara , SiteGround Operations
To: Jan Kara
Return-path:
Received: from mail-wm0-f47.google.com ([74.125.82.47]:36911 "EHLO mail-wm0-f47.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932189AbcGDItb (ORCPT ); Mon, 4 Jul 2016 04:49:31 -0400
Received: by mail-wm0-f47.google.com with SMTP id a66so105692211wme.0 for ; Mon, 04 Jul 2016 01:49:30 -0700 (PDT)
In-Reply-To: <20160603091936.GA2470@quack2.suse.cz>
Sender: linux-ext4-owner@vger.kernel.org
List-ID:

Hello again Jan,

On 06/03/2016 12:19 PM, Jan Kara wrote:
> Hi,
>
> On Fri 03-06-16 11:28:31, Nikolay Borisov wrote:
>> Recently the following crash was brought to my attention:
>> [SNIP]
>
> Hum, this looks most likely like a memory corruption. The value
> ffffffffd9c01f11 doesn't look like a valid pointer to any dynamically
> allocated data (it is not aligned to multiple of 4, it does not point to
> data segment ffff88..........). It is close to a pointer to kernel code
> (modules start at ffffffffa.......) so if it really points to some kernel
> code it may be interesting to find out where. I have no clue how such
> number could get to ei->i_dquot[0]. Usually what I do in such cases is
> search kernel memory whether something unusual points to that place,
> whether previous struct members didn't get corrupted as well or whether
> that value is not also somewhere else in memory. But it's a search for a
> needle in a haystack.
>
> Honza

So I got this exact same crash on a different machine, with the exact same value.
This rules out random corruption:

[2455521.848677] BUG: unable to handle kernel paging request at ffffffffd9c01fb1
[2455521.849025] IP: [] dquot_free_inode+0xa2/0x230
[2455521.849315] PGD 1c0b067 PUD 1c0d067 PMD 0
[2455521.849720] Oops: 0000 [#1] SMP
[2455521.850062] Modules linked in:
[2455521.856549] ipv6 [last unloaded: nf_conntrack_ftp]
[2455521.856904] CPU: 8 PID: 2955 Comm: rm Tainted: G O 4.4.10-clouder1 #73
[2455521.857286] Hardware name: Supermicro X10DRi/X10DRi, BIOS 2.0 12/28/2015
[2455521.857517] task: ffff883506658000 ti: ffff881d50198000 task.ti: ffff881d50198000
[2455521.857898] RIP: 0010:[] [] dquot_free_inode+0xa2/0x230
[2455521.858353] RSP: 0018:ffff881d5019bc48 EFLAGS: 00010286
[2455521.858581] RAX: ffffffffd9c01f11 RBX: ffff881d5019bc48 RCX: 000000000000fb20
[2455521.858962] RDX: ffff881d5019bc58 RSI: ffff880996894680 RDI: ffffffff81c09540
[2455521.859343] RBP: ffff881d5019bcc8 R08: 0000000000000001 R09: ffff881d5019bc58
[2455521.859724] R10: ffff881d5019bca0 R11: 0000000100000000 R12: ffff880996894680
[2455521.860105] R13: 0000000000000000 R14: 0000000000000008 R15: ffff881d5019be68
[2455521.860486] FS: 00007f6ad2fe9700(0000) GS:ffff881fffb00000(0000) knlGS:0000000000000000
[2455521.860868] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2455521.861096] CR2: ffffffffd9c01fb1 CR3: 0000000151007000 CR4: 00000000001406e0
[2455521.861476] Stack:
[2455521.861696] ffff881fa0388c00 ffff880996894368 0000000000000000 0000000000000000
[2455521.862335] 0000000000000000 ffffffff8123949c ffff881d5019bd28 ffffffff812351c8
[2455521.862972] ffff881d5019bcb8 ffff883fb9a4d800 ffff881ff093a810 ffff883fb9a4d800
[2455521.863611] Call Trace:
[2455521.863838] [] ? ext4_evict_inode+0x26c/0x4c0
[2455521.864069] [] ? ext4_mark_iloc_dirty+0x518/0x770
[2455521.864304] [] ext4_free_inode+0x83/0x5a0
[2455521.864534] [] ? ext4_evict_inode+0x26c/0x4c0
[2455521.864765] [] ? ext4_mark_inode_dirty+0x7b/0x260
[2455521.864999] [] ext4_evict_inode+0x4b5/0x4c0
[2455521.865233] [] evict+0xc6/0x1c0
[2455521.865466] [] iput+0x1ec/0x260
[2455521.865696] [] ? vfs_unlink+0x128/0x130
[2455521.865928] [] do_unlinkat+0x186/0x2c0
[2455521.866158] [] SyS_unlinkat+0x22/0x40
[2455521.866390] [] entry_SYSCALL_64_fastpath+0x12/0x6a
[2455521.866620] Code: 80 41 be 08 00 00 00 65 ff 0d cf 60 e0 7e e8 f6 0d 43 00 48 8d 53 10 4c 89 e6 4c 8d 55 d8 66 c7 02 00 00 48 8b 06 48 85 c0 74 61 <48> 8b 88 a0 00 00 00 4c 8d 80 a0 00 00 00 83 e1 08 0f 84 a5 00
[2455521.871376] RIP [] dquot_free_inode+0xa2/0x230
[2455521.871674] RSP
[2455521.871897] CR2: ffffffffd9c01fb1

The crash again points to test_bit in info_idq_free. I followed your
advice to search for the address and here is what I got:

crash> search -m ffffffff00000000 d9c01f11
ffff88000181e030: d9c01927d9c01f11
ffff880996894680: ffffffffd9c01f11
ffff881d5019b858: ffffffffd9c01f11
ffff881d5019b998: ffffffffd9c01f11 -
ffff881d5019bbe8: ffffffffd9c01f11 -