From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Miller
Subject: Re: [PATCH] mlx4: prevent the device from being removed concurrently
Date: Tue, 28 Feb 2012 15:46:57 -0500 (EST)
Message-ID: <20120228.154657.1817512578346429850.davem@davemloft.net>
References: <1330454176-17768-1-git-send-email-cascardo@linux.vnet.ibm.com>
 <20120228.143051.352474620462899753.davem@davemloft.net>
 <20120228203438.GA12028@oc1711230544.ibm.com>
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: yevgenyp-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org,
 netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org,
 jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org
To: cascardo-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org
Return-path:
In-Reply-To: <20120228203438.GA12028-/9mL1TZGaJOu3CHPIDa7bVaTQe2KTcn/@public.gmane.org>
Sender: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
List-Id: netdev.vger.kernel.org

From: Thadeu Lima de Souza Cascardo
Date: Tue, 28 Feb 2012 17:34:38 -0300

> On Tue, Feb 28, 2012 at 02:30:51PM -0500, David Miller wrote:
>> From: Thadeu Lima de Souza Cascardo
>> Date: Tue, 28 Feb 2012 15:36:16 -0300
>>
>> > When a EEH happens, the catas poll code will try to restart the device,
>> > removing it and adding it back again. The EEH code will try to do the
>> > same. One of the threads ends up accessing memory that was freed by the
>> > other thread and we get a crash.
>>
>> Stop adding bandaids to the locking.
>>
>> If the EEH infrastructure doesn't synchronize parallel operations
>> on the same device, that is the real bug, and that's where the real
>> fix belongs.
>>
>> I refuse to apply this patch.
>>
>
> It's not EEH that does not synchronize removal. The problem is that the
> driver itself calls the driver remove function through mlx4_restart_one.

Then reuse the existing intf_mutex this driver has, export it to main.c
and add a new __mlx4_unregister_device that can be called with the
intf_mutex held already.
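
The suggestion above is the usual split between a public entry point that takes the lock and an inner variant that expects the caller to hold it. What follows is a minimal userspace sketch of that pattern, not the mlx4 code: struct demo_dev, every demo_* function and the pthread mutex are hypothetical stand-ins, and the _locked suffix plays the role of the kernel's __ prefix (avoided here because such names are reserved in userspace C).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's per-device state. */
struct demo_dev {
    bool registered;
    int restarts;
};

/* Plays the role of intf_mutex: serializes every add/remove of a device. */
static pthread_mutex_t intf_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Caller must hold intf_mutex. */
static void demo_register_device_locked(struct demo_dev *dev)
{
    assert(!dev->registered);   /* registering twice would be a bug */
    dev->registered = true;
}

/* Caller must hold intf_mutex. A second removal is a harmless no-op. */
static void demo_unregister_device_locked(struct demo_dev *dev)
{
    if (!dev->registered)
        return;
    dev->registered = false;
}

/* Public entry points: take the lock, then call the locked variant. */
void demo_register_device(struct demo_dev *dev)
{
    pthread_mutex_lock(&intf_mutex);
    demo_register_device_locked(dev);
    pthread_mutex_unlock(&intf_mutex);
}

void demo_unregister_device(struct demo_dev *dev)
{
    pthread_mutex_lock(&intf_mutex);
    demo_unregister_device_locked(dev);
    pthread_mutex_unlock(&intf_mutex);
}

/*
 * Restart path: hold the lock across remove + add, so a concurrent
 * remover cannot run between the two steps and free state still in use.
 */
void demo_restart_one(struct demo_dev *dev)
{
    pthread_mutex_lock(&intf_mutex);
    demo_unregister_device_locked(dev);
    demo_register_device_locked(dev);
    dev->restarts++;
    pthread_mutex_unlock(&intf_mutex);
}
```

With this shape, the restart path and any other remover contend on the same mutex, and whichever runs second finds the device in a consistent state rather than half torn down.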