From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:57928) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YyLyB-0000li-7w for qemu-devel@nongnu.org; Fri, 29 May 2015 11:13:12 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YyLy4-0007D0-Tp for qemu-devel@nongnu.org; Fri, 29 May 2015 11:13:11 -0400
Received: from mx1.redhat.com ([209.132.183.28]:56870) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YyLy4-0007Cm-MX for qemu-devel@nongnu.org; Fri, 29 May 2015 11:13:04 -0400
Date: Fri, 29 May 2015 16:12:56 +0100
From: "Dr. David Alan Gilbert"
Message-ID: <20150529151256.GA20215@work-vm>
References: <1432196001-10352-1-git-send-email-zhang.zhanghailiang@huawei.com> <20150528162402.GE2127@work-vm> <5567C104.3070805@cn.fujitsu.com> <55681DF5.50206@huawei.com> <20150529084249.GC2127@work-vm> <55685CCA.2010604@cn.fujitsu.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <55685CCA.2010604@cn.fujitsu.com>
Subject: Re: [Qemu-devel] [PATCH COLO-Frame v5 00/29] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service
To: Wen Congyang
Cc: zhanghailiang , lizhijian@cn.fujitsu.com, quintela@redhat.com, yunhong.jiang@intel.com, eddie.dong@intel.com, qemu-devel@nongnu.org, peter.huangpeng@huawei.com, arei.gonglei@huawei.com, netfilter-devel@vger.kernel.org, amit.shah@redhat.com, david@gibson.dropbear.id.au

* Wen Congyang (wency@cn.fujitsu.com) wrote:
> On 05/29/2015 04:42 PM, Dr. David Alan Gilbert wrote:
> > * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> >> On 2015/5/29 9:29, Wen Congyang wrote:
> >>> On 05/29/2015 12:24 AM, Dr. David Alan Gilbert wrote:
> >>>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> >>>> The colo-proxy rcu problem I hit shows as rcu-stalls in both primary and secondary
> >>>> after the qemu quits; the backtrace of the qemu stack is:
> >>>
> >>> How to reproduce it? Use monitor command quit to quit qemu? Or kill the qemu?
> >>>
> >>>>
> >>>> [] wait_rcu_gp+0x5c/0x80
> >>>> [] synchronize_rcu+0x45/0xd0
> >>>> [] colo_node_release+0x35/0x50 [nfnetlink_colo]
> >>>> [] colonl_close_event+0xe5/0x160 [nfnetlink_colo]
> >>>> [] notifier_call_chain+0x66/0x90
> >>>> [] atomic_notifier_call_chain+0x6c/0x110
> >>>> [] netlink_release+0x5b7/0x7f0
> >>>> [] sock_release+0x1f/0x90
> >>>> [] sock_close+0x12/0x20
> >>>> [] __fput+0xd3/0x210
> >>>> [] ____fput+0xe/0x10
> >>>> [] task_work_run+0xb7/0xf0
> >>>> [] do_notify_resume+0x8d/0xa0
> >>>> [] int_signal+0x12/0x17
> >>>> [] 0xffffffffffffffff
> >>>
> >>> Thanks for your test. The backtrace is very useful, and we will fix it soon.
> >>>
> >>
> >> Yes, it is a bug, the callback function colonl_close_event() is called when holding
> >> rcu lock:
> >> netlink_release
> >>     ->atomic_notifier_call_chain
> >>          ->rcu_read_lock();
> >>          ->notifier_call_chain
> >>             ->ret = nb->notifier_call(nb, val, v);
> >> And here it is wrong to call synchronize_rcu which will lead to sleep.
> >> Besides, there is another function might lead to sleep, kthread_stop which is called
> >> in destroy_notify_cb.
> >>
> >>>>
> >>>> that's with both the 423a8e268acbe3e644a16c15bc79603cfe9eb084 from yesterday and
> >>>> older e58e5152b74945871b00a88164901c0d46e6365e tags on colo-proxy.
> >>>> I'm not sure of the right fix; perhaps it might be possible to replace the
> >>>> synchronize_rcu in colo_node_release by a call_rcu that does the kfree later?
> >>>
> >>> I agree with it.
> >>
> >> That is a good solution, i will fix both of the above problems.
> >
> > Thanks,
>
> We have fix this problem, and test it. The patch is pushed to github, please try it.

Yes, that works. Thank you very much for the quick fix.

Dave

>
> Thanks
> Wen Congyang
>
> >
> > Dave
> >
> >>
> >> Thanks,
> >> zhanghailiang
> >>
> >>>
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Dave
> >>>>
> >>>>>
> >>>
> >>>
> >>> .
> >>>
> >>
> >>
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > --
> > To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > .
> >
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
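
For readers following the thread, the suggestion quoted above (replace the synchronize_rcu in colo_node_release with a call_rcu that does the kfree later) can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the patch that was pushed to github: every name ending in _model, the colo_node_model layout, and colo_node_free_cb are hypothetical stand-ins, and the "grace period" is simulated by an explicit function call rather than by the kernel's RCU machinery. The point it tries to show is that the release path only queues a callback and returns, so it never blocks inside the atomic notifier chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's struct rcu_head (hypothetical model). */
struct rcu_head_model {
    struct rcu_head_model *next;
    void (*func)(struct rcu_head_model *head);
};

/* Hypothetical node; the real colo_node layout is not shown in the thread. */
struct colo_node_model {
    int index;
    struct rcu_head_model rcu;
};

/* Same idea as the kernel's container_of(): embedded member -> enclosing struct. */
#define container_of_model(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static struct rcu_head_model *pending; /* callbacks waiting for the "grace period" */
static int freed_count;                /* number of nodes freed so far */

/* Models call_rcu(): queue the callback and return at once, never sleeping. */
static void call_rcu_model(struct rcu_head_model *head,
                           void (*func)(struct rcu_head_model *))
{
    assert(head != NULL && func != NULL);
    head->func = func;
    head->next = pending;
    pending = head;
}

/* Models the end of a grace period: run every queued callback. */
static void grace_period_elapsed_model(void)
{
    while (pending) {
        struct rcu_head_model *head = pending;

        pending = head->next;
        head->func(head);
    }
}

/* Deferred free: recover the enclosing node from its embedded head. */
static void colo_node_free_cb(struct rcu_head_model *head)
{
    struct colo_node_model *node =
        container_of_model(head, struct colo_node_model, rcu);

    free(node);
    freed_count++;
}

/* Release path as it might look when invoked from an atomic notifier:
 * it only queues the free, so it is usable where sleeping is forbidden. */
static void colo_node_release_model(struct colo_node_model *node)
{
    call_rcu_model(&node->rcu, colo_node_free_cb);
}
```

In this model, calling colo_node_release_model() on a heap-allocated node leaves the node allocated and queued; it is freed only once grace_period_elapsed_model() runs. The blocking variant would instead wait for the grace period inside the release function, which is what the backtrace shows stalling.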