From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: Re: [PATCH] blkfront: Move blkif_interrupt into a tasklet.
Date: Mon, 29 Aug 2011 15:46:33 -0400
Message-ID: <20110829194633.GB16530@dumpdata.com>
References: <4C802934.2000305@goop.org> <4C9B7B69.7080705@redhat.com>
 <4C9B7F1A.2040302@goop.org> <4C9B826B.10302@redhat.com>
 <4C9B9E1D.2040501@goop.org> <1313494014833-4704111.post@n5.nabble.com>
 <20110817023827.GA21468@dumpdata.com> <4E4B84C7.9000507@redhat.com>
 <20110824153605.GB8311@dumpdata.com> <4E5528AA.2090804@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4E5528AA.2090804@redhat.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Igor Mammedov
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

On Wed, Aug 24, 2011 at 06:36:58PM +0200, Igor Mammedov wrote:
> On 08/24/2011 05:36 PM, Konrad Rzeszutek Wilk wrote:
> >On Wed, Aug 17, 2011 at 11:07:19AM +0200, Igor Mammedov wrote:
> >>On 08/17/2011 04:38 AM, Konrad Rzeszutek Wilk wrote:
> >>>On Tue, Aug 16, 2011 at 04:26:55AM -0700, imammedo wrote:
> >>>>
> >>>>Jeremy Fitzhardinge wrote:
> >>>>>
> >>>>>Have you tried bisecting to see when this particular problem appeared?
> >>>>>It looks to me like something is accidentally re-enabling interrupts -
> >>>>>perhaps a stack overrun is corrupting the "flags" argument between a
> >>>>>spin_lock_irqsave()/restore pair.
> >>>>>
> >>>>>Is it only on 32-bit kernels?
> >>>>>
> >>>> ------------[ cut here ]------------
> >>>>[604001.659925] WARNING: at block/blk-core.c:239 blk_start_queue+0x70/0x80()
> >>>>[604001.659964] Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl
> >>>>sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4
> >>>>nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables xen_netfront
> >>>>pcspkr [last unloaded: scsi_wait_scan]
> >>>>[604001.660147] Pid: 336, comm: udevd Tainted: G W 3.0.0+ #50
> >>>>[604001.660181] Call Trace:
> >>>>[604001.660209] [] warn_slowpath_common+0x72/0xa0
> >>>>[604001.660243] [] ? blk_start_queue+0x70/0x80
> >>>>[604001.660275] [] ? blk_start_queue+0x70/0x80
> >>>>[604001.660310] [] warn_slowpath_null+0x22/0x30
> >>>>[604001.660343] [] blk_start_queue+0x70/0x80
> >>>>[604001.660379] [] kick_pending_request_queues+0x21/0x30
> >>>>[604001.660417] [] blkif_interrupt+0x19f/0x2b0
> >>>>...
> >>>> ------------[ cut here ]------------
> >>>>
> >>>>I've debugged a bit blk-core warning and can say:
> >>>> - Yes, It is 32-bit PAE kernel and happens only with it so far.
> >>>> - Affects PV xen guest, bare-metal and kvm configs are not affected.
> >>>> - Upstream kernel is affected as well.
> >>>> - Reproduces on xen 4.1.1 and 3.1.2 hosts
> >>>
> >>>And the dom0 is 2.6.18 right? This problem is not present
> >>>when you use a 3.0 dom0?
> >>
> >>For xen 4.1.1 testing, I've used as dom0 Jeremy's 2.6.32.43
> >
> >Jeremy pointed me to this:
> >https://patchwork.kernel.org/patch/1091772/
> >(and http://groups.google.com/group/linux.kernel/browse_thread/thread/39a397566cafc979)
> >which looks to have a similar backtrack.
> >
> >Perhaps Peter's fix solves the issue?
> >
> I've applied patches:
> sched-separate-the-scheduler-entry-for-preemption.patch
> sched-move-blk_schedule_flush_plug-out-of-__schedule.patch
> block-shorten-interrupt-disabled-regions.patch
>
> Unfortunately these patches don't help, the problem is still there.

Those patches were a bit fresh.
Both Peter and Thomas have some updated ones:

http://git.kernel.org/?p=linux/kernel/git/x86/linux-2.6-tip.git;a=commitdiff;h=9c40cef2b799f9b5e7fa5de4d2ad3a0168ba118c
http://git.kernel.org/?p=linux/kernel/git/x86/linux-2.6-tip.git;a=commitdiff;h=c259e01a1ec90063042f758e409cd26b2a0963c8

Please try those out.