All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nikolay Borisov <kernel@kyup.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org
Subject: Re: BUG_ON in rcu_sync_func triggered
Date: Wed, 14 Sep 2016 09:30:26 +0300	[thread overview]
Message-ID: <57D8EE82.3090502@kyup.com> (raw)
In-Reply-To: <20160913152042.GA30160@redhat.com>



On 09/13/2016 06:20 PM, Oleg Nesterov wrote:
> On 09/13, Nikolay Borisov wrote:
>>
>> On 09/13/2016 05:35 PM, Nikolay Borisov wrote:
>>>
>>> On 09/13/2016 04:43 PM, Oleg Nesterov wrote:
>>>> On 09/13, Oleg Nesterov wrote:
>>>>>
>>>>> OK... perhaps the unbalanced up_write... I'll try to look at freeze/thaw code,
>>>>
>>>> Heh, yes, it looks racy or I am totally confused.
>>>>
>>>>> could test the debugging patch below meanwhile?
>>>>
>>>> Yes please. I'll send you another patch (hopefully fix) later, but it
>>>> would be nice if you can test this patch to get more info.
>>>
>>> I've already started testing with this patch on 4.4.20 this time
> 
> I think it would be better to stay with the same kernel version to
> debug the problem...
> 
>> Actually forget that, here is a warning that this triggered:
>>
>> [  844.290454] WARNING: CPU: 2 PID: 1900 at kernel/rcu/sync.c:160 rcu_sync_func+0xc8/0x150()
>> ...
>> [  844.754708] XXX: ffff88047527da78 gp=2 cnt=0 cb=1
> 
> Hmm. Thanks. Please show us all the warnings you get.

What I showed you yesterday was the only warning that I got. On the same kernel another run produced the following warnings: 

[  556.869624] ------------[ cut here ]------------
[  556.875128] WARNING: CPU: 6 PID: 20210 at kernel/rcu/sync.c:111 rcu_sync_enter+0x148/0x1a0()
[  556.885142] Modules linked in: xt_state act_police cls_basic sch_ingress veth rbd libceph openvswitch nf_defrag_ipv6 nf_nat_ftp nf_conntrack_ftp xt_owner iptable_mangle xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_CT iptable_raw nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ip6table_filter ip6_tables rdma_ucm ib_ucm ib_uverbs rdma_cm iw_cm dm_mirror dm_region_hash dm_log ib_umad ib_ipoib ib_cm ib_sa ib_mad ib_core ib_addr ipv6 x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32_pclmul ixgbe mdio ipmi_devintf ipmi_si ipmi_msghandler igb i2c_algo_bit sb_edac edac_core i2c_i801 lpc_ich mfd_core ioatdma dca shpchp dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio
[  556.960028] CPU: 6 PID: 20210 Comm: fsfreeze Not tainted 4.4.20-clouder1 #9
[  556.968143] Hardware name: Supermicro X9DRW/X9DRW, BIOS 1.0b 10/11/2012
[  556.975963]  0000000000000000 ffff880256b43cf8 ffffffff81307a9b 0000000000004ef2
[  556.985182]  0000000000000000 0000000000000000 000000000000006f ffff880256b43d38
[  556.994383]  ffffffff81054a85 ffff880473326b10 ffff880473326b08 ffff880473326b10
[  557.003589] Call Trace:
[  557.006656]  [<ffffffff81307a9b>] dump_stack+0x6b/0xa0
[  557.012737]  [<ffffffff81054a85>] warn_slowpath_common+0x95/0xe0
[  557.019781]  [<ffffffff81054aea>] warn_slowpath_null+0x1a/0x20
[  557.026645]  [<ffffffff810ab9a8>] rcu_sync_enter+0x148/0x1a0
[  557.033309]  [<ffffffff8109c9be>] percpu_down_write+0x1e/0xf0
[  557.040074]  [<ffffffff81315683>] ? call_rwsem_down_write_failed+0x13/0x20
[  557.048092]  [<ffffffff811a868b>] freeze_super+0xab/0x1b0
[  557.054456]  [<ffffffff811b7c0d>] do_vfs_ioctl+0x29d/0x560
[  557.060920]  [<ffffffff811aae7e>] ? SYSC_newfstat+0x2e/0x40
[  557.067480]  [<ffffffff811b7f62>] SyS_ioctl+0x92/0xa0
[  557.073465]  [<ffffffff8163c357>] entry_SYSCALL_64_fastpath+0x12/0x6a
[  557.081015] ---[ end trace fc087420ac1d8f16 ]---
[  557.086507] XXX: ffff880473326b08 gp=2 cnt=-1 cb=1
[  557.092326] rbd: rbd19: added with size 0x500000000

This is: if (WARN_ON(rsp->gp_count < 0)) xxx(rsp);



[  557.094835] ------------[ cut here ]------------
[  557.094841] WARNING: CPU: 1 PID: 0 at kernel/rcu/sync.c:159 rcu_sync_func+0xf9/0x150()
[  557.094873] Modules linked in: xt_state act_police cls_basic sch_ingress veth rbd libceph openvswitch nf_defrag_ipv6 nf_nat_ftp nf_conntrack_ftp xt_owner iptable_mangle xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_CT iptable_raw nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ip6table_filter ip6_tables rdma_ucm ib_ucm ib_uverbs rdma_cm iw_cm dm_mirror dm_region_hash dm_log ib_umad ib_ipoib ib_cm ib_sa ib_mad ib_core ib_addr ipv6 x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32_pclmul ixgbe mdio ipmi_devintf ipmi_si ipmi_msghandler igb i2c_algo_bit sb_edac edac_core i2c_i801 lpc_ich mfd_core ioatdma dca shpchp dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio
[  557.094876] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W       4.4.20-clouder1 #9
[  557.094877] Hardware name: Supermicro X9DRW/X9DRW, BIOS 1.0b 10/11/2012
[  557.094879]  0000000000000000 ffff880277c83d78 ffffffff81307a9b 0000000000000000
[  557.094881]  0000000000000000 0000000000000000 000000000000009f ffff880277c83db8
[  557.094882]  ffffffff81054a85 0000000000000000 ffff880473326bc0 ffff880473326b98
[  557.094882] Call Trace:
[  557.094889]  <IRQ>  [<ffffffff81307a9b>] dump_stack+0x6b/0xa0
[  557.094892]  [<ffffffff81054a85>] warn_slowpath_common+0x95/0xe0
[  557.094893]  [<ffffffff81054aea>] warn_slowpath_null+0x1a/0x20
[  557.094894]  [<ffffffff810ab7b9>] rcu_sync_func+0xf9/0x150
[  557.094897]  [<ffffffff810b0620>] rcu_process_callbacks+0x290/0x740
[  557.094900]  [<ffffffff810590f3>] __do_softirq+0x113/0x330
[  557.094902]  [<ffffffff810593e5>] irq_exit+0x75/0x80
[  557.094905]  [<ffffffff8163ea16>] smp_apic_timer_interrupt+0x46/0x55
[  557.094907]  [<ffffffff8163d069>] apic_timer_interrupt+0x89/0x90
[  557.094911]  <EOI>  [<ffffffff81530cbc>] ? cpuidle_enter_state+0x14c/0x2e0
[  557.094912]  [<ffffffff81530cb1>] ? cpuidle_enter_state+0x141/0x2e0
[  557.094914]  [<ffffffff81530e67>] cpuidle_enter+0x17/0x20
[  557.094916]  [<ffffffff81096d70>] cpu_startup_entry+0x310/0x370
[  557.094919]  [<ffffffff810c2aac>] ? clockevents_config_and_register+0x2c/0x40
[  557.094923]  [<ffffffff81035359>] start_secondary+0xf9/0x100
[  557.094924] ---[ end trace fc087420ac1d8f17 ]---
[  557.094925] XXX: ffff880473326b98 gp=2 cnt=-1 cb=1

This is: 
if (WARN_ON(rsp->gp_count < 0)) xxx(rsp);

I'd say the warnings are pretty consistent between 4.4.14 and 4.4.20. 
Another run on the same kernel (no reboots between tries) produced the following: 

[  700.981151] ------------[ cut here ]------------
[  700.986643] WARNING: CPU: 3 PID: 2428 at kernel/rcu/sync.c:111 rcu_sync_enter+0x148/0x1a0()
[  700.996541] Modules linked in: xt_state act_police cls_basic sch_ingress veth rbd libceph openvswitch nf_defrag_ipv6 nf_nat_ftp nf_conntrack_ftp xt_owner iptable_mangle xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_CT iptable_raw nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ip6table_filter ip6_tables rdma_ucm ib_ucm ib_uverbs rdma_cm iw_cm dm_mirror dm_region_hash dm_log ib_umad ib_ipoib ib_cm ib_sa ib_mad ib_core ib_addr ipv6 x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32_pclmul ixgbe mdio ipmi_devintf ipmi_si ipmi_msghandler igb i2c_algo_bit sb_edac edac_core i2c_i801 lpc_ich mfd_core ioatdma dca shpchp dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio
[  701.071212] CPU: 3 PID: 2428 Comm: fsfreeze Tainted: G        W       4.4.20-clouder1 #9
[  701.080816] Hardware name: Supermicro X9DRW/X9DRW, BIOS 1.0b 10/11/2012
[  701.088530]  0000000000000000 ffff88006b5efcf8 ffffffff81307a9b 000000000000097c
[  701.097711]  0000000000000000 0000000000000000 000000000000006f ffff88006b5efd38
[  701.106870]  ffffffff81054a85 ffff880079ab9310 ffff880079ab9308 ffff880079ab9310
[  701.116039] Call Trace:
[  701.119084]  [<ffffffff81307a9b>] dump_stack+0x6b/0xa0
[  701.125149]  [<ffffffff81054a85>] warn_slowpath_common+0x95/0xe0
[  701.132184]  [<ffffffff81054aea>] warn_slowpath_null+0x1a/0x20
[  701.139027]  [<ffffffff810ab9a8>] rcu_sync_enter+0x148/0x1a0
[  701.145674]  [<ffffffff8107b0b2>] ? __might_sleep+0x52/0x90
[  701.152221]  [<ffffffff8109c9be>] percpu_down_write+0x1e/0xf0
[  701.158962]  [<ffffffff8107b0b2>] ? __might_sleep+0x52/0x90
[  701.165511]  [<ffffffff811a868b>] freeze_super+0xab/0x1b0
[  701.171867]  [<ffffffff811b7c0d>] do_vfs_ioctl+0x29d/0x560
[  701.178319]  [<ffffffff811aae7e>] ? SYSC_newfstat+0x2e/0x40
[  701.184959]  [<ffffffff811b7f62>] SyS_ioctl+0x92/0xa0
[  701.190926]  [<ffffffff8163c357>] entry_SYSCALL_64_fastpath+0x12/0x6a
[  701.198446] ---[ end trace fc087420ac1d8f18 ]---
[  701.203915] XXX: ffff880079ab9308 gp=0 cnt=-1 cb=0

[  701.209624] ------------[ cut here ]------------
[  701.215193] WARNING: CPU: 3 PID: 2428 at kernel/rcu/sync.c:118 rcu_sync_enter+0x7e/0x1a0()
[  701.224996] Modules linked in: xt_state act_police cls_basic sch_ingress veth rbd libceph openvswitch nf_defrag_ipv6 nf_nat_ftp nf_conntrack_ftp xt_owner iptable_mangle xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_CT iptable_raw nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ip6table_filter ip6_tables rdma_ucm ib_ucm ib_uverbs rdma_cm iw_cm dm_mirror dm_region_hash dm_log ib_umad ib_ipoib ib_cm ib_sa ib_mad ib_core ib_addr ipv6 x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32_pclmul ixgbe mdio ipmi_devintf ipmi_si ipmi_msghandler igb i2c_algo_bit sb_edac edac_core i2c_i801 lpc_ich mfd_core ioatdma dca shpchp dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio
[  701.299721] CPU: 3 PID: 2428 Comm: fsfreeze Tainted: G        W       4.4.20-clouder1 #9
[  701.309325] Hardware name: Supermicro X9DRW/X9DRW, BIOS 1.0b 10/11/2012
[  701.317038]  0000000000000000 ffff88006b5efcf8 ffffffff81307a9b 000000000000097c
[  701.326203]  0000000000000000 0000000000000000 0000000000000076 ffff88006b5efd38
[  701.335382]  ffffffff81054a85 0000000000000007 ffff880079ab9308 ffff880079ab9310
[  701.344556] Call Trace:
[  701.347610]  [<ffffffff81307a9b>] dump_stack+0x6b/0xa0
[  701.353674]  [<ffffffff81054a85>] warn_slowpath_common+0x95/0xe0
[  701.360705]  [<ffffffff81054aea>] warn_slowpath_null+0x1a/0x20
[  701.367543]  [<ffffffff810ab8de>] rcu_sync_enter+0x7e/0x1a0
[  701.374090]  [<ffffffff8107b0b2>] ? __might_sleep+0x52/0x90
[  701.380637]  [<ffffffff8109c9be>] percpu_down_write+0x1e/0xf0
[  701.387379]  [<ffffffff8107b0b2>] ? __might_sleep+0x52/0x90
[  701.393928]  [<ffffffff811a868b>] freeze_super+0xab/0x1b0
[  701.400281]  [<ffffffff811b7c0d>] do_vfs_ioctl+0x29d/0x560
[  701.406737]  [<ffffffff811aae7e>] ? SYSC_newfstat+0x2e/0x40
[  701.413283]  [<ffffffff811b7f62>] SyS_ioctl+0x92/0xa0
[  701.419249]  [<ffffffff8163c357>] entry_SYSCALL_64_fastpath+0x12/0x6a
[  701.426773] ---[ end trace fc087420ac1d8f19 ]---
[  701.432273] XXX: ffff880079ab9308 gp=1 cnt=0 cb=0

Regards, 
Nikolay 


> 
> Oleg.
> 

  reply	other threads:[~2016-09-14  6:30 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <57D69CEC.5010103@kyup.com>
2016-09-12 13:01 ` BUG_ON in rcu_sync_func triggered Oleg Nesterov
2016-09-13  8:21   ` Nikolay Borisov
2016-09-13 13:18     ` Oleg Nesterov
2016-09-13 13:43       ` Oleg Nesterov
2016-09-13 14:35         ` Nikolay Borisov
2016-09-13 14:38           ` Nikolay Borisov
2016-09-13 14:39             ` Nikolay Borisov
2016-09-13 15:20             ` Oleg Nesterov
2016-09-14  6:30               ` Nikolay Borisov [this message]
2016-09-14 12:58                 ` Oleg Nesterov
2016-09-23 13:35                   ` Nikolay Borisov
2016-09-23 13:52                     ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57D8EE82.3090502@kyup.com \
    --to=kernel@kyup.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oleg@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.