From mboxrd@z Thu Jan  1 00:00:00 1970
From: Eli Cohen
Subject: Re: Re: Possible process deadlock in RMPP flow
Date: Tue, 20 Oct 2009 09:48:59 +0200
Message-ID: <20091020074859.GA27129@mtls03>
References: <20090923150454.GA26150@mtls03> <7A32EEE20DF5432CADB60B8F8B1E0093@amr.corp.intel.com> <20090923172532.GA32223@mtls03> <4ABB13F3.1060702@Voltaire.com> <20090924073601.GA28876@mtls03> <4AC848E9.6040909@voltaire.com> <4AC8BE74.3050200@mellanox.co.il>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
To: Sean Hefty
Cc: Linux RDMA list, ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org, Roland Dreier
List-Id: linux-rdma@vger.kernel.org

On Mon, Oct 19, 2009 at 01:30:47PM -0700, Sean Hefty wrote:
>
> I can't find anything off in the code for this.  It's odd, since
> unregister_mad_agent() does:
>
>         flush_workqueue(port_priv->wq);
>         ib_cancel_rmpp_recvs(mad_agent_priv);
>
> and ib_cancel_rmpp_recvs() does:
>
>         spin_lock_irqsave(&agent->lock, flags);
>         list_for_each_entry(rmpp_recv, &agent->rmpp_list, list) {
>                 cancel_delayed_work(&rmpp_recv->timeout_work);
>                 cancel_delayed_work(&rmpp_recv->cleanup_work);
>         }
>         spin_unlock_irqrestore(&agent->lock, flags);
>
>         flush_workqueue(agent->qp_info->port_priv->wq);
>
> which basically just flushes the same work queue.
>
> I haven't been able to reproduce the problem, but I'm running the latest kernel
> - not sure that matters in this case.  Does ibnetdiscover just hang forever at
> the end of the test when this occurs?  Is there any more information available?
>

We are checking whether the problem is a firmware bug; it looks like it is.
Once we verify this, I will send an update.