From: Bob Liu <bob.liu@oracle.com>
To: Evgenii Shatokhin <eshatokhin@virtuozzo.com>
Cc: Juergen Gross <jgross@suse.com>,
Dario Faggioli <dario.faggioli@citrix.com>,
George Dunlap <George.Dunlap@citrix.com>,
xen-devel@lists.xen.org, David Vrabel <david.vrabel@citrix.com>,
Konstantin Khorenko <khorenko@virtuozzo.com>,
Roger Pau Monne <roger.paumonne@citrix.com>
Subject: Re: [BUG] kernel BUG at drivers/block/xen-blkfront.c:1711
Date: Wed, 10 Aug 2016 20:49:40 +0800
Message-ID: <57AB22E4.1040208@oracle.com>
In-Reply-To: <57AB1F2B.5080403@virtuozzo.com>
On 08/10/2016 08:33 PM, Evgenii Shatokhin wrote:
> On 14.07.2016 15:04, Bob Liu wrote:
>>
>> On 07/14/2016 07:49 PM, Evgenii Shatokhin wrote:
>>> On 11.07.2016 15:04, Bob Liu wrote:
>>>>
>>>>
>>>> On 07/11/2016 04:50 PM, Evgenii Shatokhin wrote:
>>>>> On 06.06.2016 11:42, Dario Faggioli wrote:
>>>>>> Just Cc-ing some Linux, block, and Xen on CentOS people...
>>>>>>
>>>>>
>>>>> Ping.
>>>>>
>>>>> Any suggestions on how to debug this or what might cause the problem?
>>>>>
>>>>> Obviously, we cannot control Xen on Amazon's servers. But perhaps there is something we can do on the kernel side?
>>>>>
>>>>>> On Mon, 2016-06-06 at 11:24 +0300, Evgenii Shatokhin wrote:
>>>>>>> (Resending this bug report because the message I sent last week did not
>>>>>>> make it to the mailing list somehow.)
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> One of our users gets kernel panics from time to time when he tries to
>>>>>>> use his Amazon EC2 instance with CentOS7 x64 in it [1]. Kernel panic
>>>>>>> happens within minutes from the moment the instance starts. The problem
>>>>>>> does not show up every time, however.
>>>>>>>
>>>>>>> The user first observed the problem with a custom kernel, but it was
>>>>>>> found later that the stock kernel 3.10.0-327.18.2.el7.x86_64 from
>>>>>>> CentOS7 was affected as well.
>>>>
>>>> Please try this patch:
>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7b0767502b5db11cb1f0daef2d01f6d71b1192dc
>>>>
>>>> Regards,
>>>> Bob
>>>>
>>>
>>> Unfortunately, it did not help. The same BUG_ON() in blkfront_setup_indirect() still triggers in our kernel based on RHEL's 3.10.0-327.18.2, where I added the patch.
>>>
>>> As far as I can see, the patch makes sure the indirect pages are added to the list only if (!info->feature_persistent) holds. I suppose it holds in our case and the pages are added to the list because the triggered BUG_ON() is here:
>>>
>>> if (!info->feature_persistent && info->max_indirect_segments) {
>>>         <...>
>>>         BUG_ON(!list_empty(&info->indirect_pages));
>>>         <...>
>>> }
>>>
>>
>> That's odd.
>> Could you please try to reproduce this issue with a recent upstream kernel?
>>
>> Thanks,
>> Bob
>
> No luck with the upstream kernel 4.7.0 so far due to unrelated issues (bad initrd, I suppose, so the system does not even boot).
>
> However, the problem reproduced with the stable upstream kernel 3.14.74. After the system booted the second time with this kernel, that BUG_ON triggered:
> kernel BUG at drivers/block/xen-blkfront.c:1701
>
Could you please provide more details on how to reproduce this bug? I'd like to test it myself.
Thanks!
Bob
>>
>>> So the problem is still out there somewhere, it seems.
>>>
>>> Regards,
>>> Evgenii
>>>
>>>>>>>
>>>>>>> The part of the system log he was able to retrieve is attached. Here is
>>>>>>> the bug info, for convenience:
>>>>>>>
>>>>>>> ------------------------------------
>>>>>>> [ 2.246912] kernel BUG at drivers/block/xen-blkfront.c:1711!
>>>>>>> [ 2.246912] invalid opcode: 0000 [#1] SMP
>>>>>>> [ 2.246912] Modules linked in: ata_generic pata_acpi crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel xen_netfront xen_blkfront(+) aesni_intel lrw ata_piix gf128mul glue_helper ablk_helper cryptd libata serio_raw floppy sunrpc dm_mirror dm_region_hash dm_log dm_mod scsi_transport_iscsi
>>>>>>> [ 2.246912] CPU: 1 PID: 50 Comm: xenwatch Not tainted 3.10.0-327.18.2.el7.x86_64 #1
>>>>>>> [ 2.246912] Hardware name: Xen HVM domU, BIOS 4.2.amazon 12/07/2015
>>>>>>> [ 2.246912] task: ffff8800e9fcb980 ti: ffff8800e98bc000 task.ti: ffff8800e98bc000
>>>>>>> [ 2.246912] RIP: 0010:[<ffffffffa015584f>] [<ffffffffa015584f>] blkfront_setup_indirect+0x41f/0x430 [xen_blkfront]
>>>>>>> [ 2.246912] RSP: 0018:ffff8800e98bfcd0 EFLAGS: 00010283
>>>>>>> [ 2.246912] RAX: ffff8800353e15c0 RBX: ffff8800e98c52c8 RCX: 0000000000000020
>>>>>>> [ 2.246912] RDX: ffff8800353e15b0 RSI: ffff8800e98c52b8 RDI: ffff8800353e15d0
>>>>>>> [ 2.246912] RBP: ffff8800e98bfd20 R08: ffff8800353e15b0 R09: ffff8800eb403c00
>>>>>>> [ 2.246912] R10: ffffffffa0155532 R11: ffffffffffffffe8 R12: ffff8800e98c4000
>>>>>>> [ 2.246912] R13: ffff8800e98c52b8 R14: 0000000000000020 R15: ffff8800353e15c0
>>>>>>> [ 2.246912] FS: 0000000000000000(0000) GS:ffff8800efc20000(0000) knlGS:0000000000000000
>>>>>>> [ 2.246912] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>> [ 2.246912] CR2: 00007f1b615ef000 CR3: 00000000e2b44000 CR4: 00000000001406e0
>>>>>>> [ 2.246912] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>> [ 2.246912] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>>>>> [ 2.246912] Stack:
>>>>>>> [ 2.246912] 0000000000000020 0000000000000001 00000020a0157217 00000100e98bfdbc
>>>>>>> [ 2.246912] 0000000027efa3ef ffff8800e98bfdbc ffff8800e98ce000 ffff8800e98c4000
>>>>>>> [ 2.246912] ffff8800e98ce040 0000000000000001 ffff8800e98bfe08 ffffffffa0155d4c
>>>>>>> [ 2.246912] Call Trace:
>>>>>>> [ 2.246912] [<ffffffffa0155d4c>] blkback_changed+0x4ec/0xfc8 [xen_blkfront]
>>>>>>> [ 2.246912] [<ffffffff813a6fd0>] ? xenbus_gather+0x170/0x190
>>>>>>> [ 2.246912] [<ffffffff816322f5>] ? __slab_free+0x10e/0x277
>>>>>>> [ 2.246912] [<ffffffff813a805d>] xenbus_otherend_changed+0xad/0x110
>>>>>>> [ 2.246912] [<ffffffff813a7257>] ? xenwatch_thread+0x77/0x180
>>>>>>> [ 2.246912] [<ffffffff813a9ba3>] backend_changed+0x13/0x20
>>>>>>> [ 2.246912] [<ffffffff813a7246>] xenwatch_thread+0x66/0x180
>>>>>>> [ 2.246912] [<ffffffff810a6ae0>] ? wake_up_atomic_t+0x30/0x30
>>>>>>> [ 2.246912] [<ffffffff813a71e0>] ? unregister_xenbus_watch+0x1f0/0x1f0
>>>>>>> [ 2.246912] [<ffffffff810a5aef>] kthread+0xcf/0xe0
>>>>>>> [ 2.246912] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
>>>>>>> [ 2.246912] [<ffffffff81646118>] ret_from_fork+0x58/0x90
>>>>>>> [ 2.246912] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
>>>>>>> [ 2.246912] Code: e1 48 85 c0 75 ce 49 8d 84 24 40 01 00 00 48 89 45 b8 e9 91 fd ff ff 4c 89 ff e8 8d ae 06 e1 e9 f2 fc ff ff 31 c0 e9 2e fe ff ff <0f> 0b e8 9a 57 f2 e0 0f 0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00
>>>>>>> [ 2.246912] RIP [<ffffffffa015584f>] blkfront_setup_indirect+0x41f/0x430 [xen_blkfront]
>>>>>>> [ 2.246912] RSP <ffff8800e98bfcd0>
>>>>>>> [ 2.491574] ---[ end trace 8a9b992812627c71 ]---
>>>>>>> [ 2.495618] Kernel panic - not syncing: Fatal exception
>>>>>>> ------------------------------------
>>>>>>>
>>>>>>> Xen version 4.2.
>>>>>>>
>>>>>>> EC2 instance type: c3.large with EBS magnetic storage, if that matters.
>>>>>>>
>>>>>>> Here is the code where the BUG_ON triggers (drivers/block/xen-blkfront.c):
>>>>>>> ------------------------------------
>>>>>>> if (!info->feature_persistent && info->max_indirect_segments) {
>>>>>>>         /*
>>>>>>>          * We are using indirect descriptors but not persistent
>>>>>>>          * grants, we need to allocate a set of pages that can be
>>>>>>>          * used for mapping indirect grefs
>>>>>>>          */
>>>>>>>         int num = INDIRECT_GREFS(segs) * BLK_RING_SIZE;
>>>>>>>
>>>>>>>         BUG_ON(!list_empty(&info->indirect_pages)); // << This one hits.
>>>>>>>         for (i = 0; i < num; i++) {
>>>>>>>                 struct page *indirect_page = alloc_page(GFP_NOIO);
>>>>>>>                 if (!indirect_page)
>>>>>>>                         goto out_of_memory;
>>>>>>>                 list_add(&indirect_page->lru, &info->indirect_pages);
>>>>>>>         }
>>>>>>> }
>>>>>>> ------------------------------------
>>>>>>>
>>>>>>> As we checked, 'info->indirect_pages' list indeed contained around 30
>>>>>>> elements at that point.
>>>>>>>
>>>>>>> Any ideas what may cause this and how to fix it?
>>>>>>>
>>>>>>> If any other data are needed, please let me know.
>>>>>>>
>>>>>>> References:
>>>>>>> [1] https://bugs.openvz.org/browse/OVZ-6718
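
For anyone following the thread, the invariant behind the quoted BUG_ON() can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions, not the driver's code: struct model_info, model_setup_indirect(), model_free_pages() and model_count_pages() are hypothetical names, a singly linked list stands in for the kernel's list_head, and malloc() stands in for alloc_page(). It shows that running the setup step a second time without draining the list first lands on the condition the kernel treats as fatal. Whether that is what actually happens on the EC2 instance is not established in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for one indirect page on the list. */
struct model_page {
	struct model_page *next;
};

/* Hypothetical stand-in for the fields of the driver state used in the quoted code. */
struct model_info {
	bool feature_persistent;
	unsigned int max_indirect_segments;
	struct model_page *indirect_pages;	/* NULL means the list is empty */
};

/* Count the entries currently parked on the list. */
static size_t model_count_pages(const struct model_info *info)
{
	size_t n = 0;

	assert(info != NULL);
	for (const struct model_page *p = info->indirect_pages; p; p = p->next)
		n++;
	return n;
}

/* Drain the list, as a teardown path would be expected to do. */
static void model_free_pages(struct model_info *info)
{
	assert(info != NULL);
	while (info->indirect_pages) {
		struct model_page *p = info->indirect_pages;

		info->indirect_pages = p->next;
		free(p);
	}
}

/*
 * Model of the quoted setup step.
 * Returns 0 on success, -1 where the kernel would hit the BUG_ON()
 * (list not empty on entry), and -2 on allocation failure.
 */
static int model_setup_indirect(struct model_info *info, size_t num)
{
	assert(info != NULL);
	if (info->feature_persistent || !info->max_indirect_segments)
		return 0;	/* the quoted branch is not taken */

	if (info->indirect_pages != NULL)
		return -1;	/* BUG_ON(!list_empty(...)) would fire here */

	for (size_t i = 0; i < num; i++) {
		struct model_page *p = malloc(sizeof(*p));

		if (!p) {
			model_free_pages(info);
			return -2;
		}
		p->next = info->indirect_pages;
		info->indirect_pages = p;
	}
	return 0;
}
```

Calling model_setup_indirect() twice on the same struct returns -1 the second time; calling model_free_pages() in between lets the second call succeed. A leftover list such as the roughly 30 entries reported above would correspond to a non-zero model_count_pages() on entry to the setup step.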
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel