From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751494AbbJBIZs (ORCPT ); Fri, 2 Oct 2015 04:25:48 -0400
Received: from mx2.suse.de ([195.135.220.15]:44024 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750809AbbJBIZq (ORCPT ); Fri, 2 Oct 2015 04:25:46 -0400
Message-ID: <560E3F88.7060400@suse.de>
Date: Fri, 02 Oct 2015 10:25:44 +0200
From: Hannes Reinecke
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: Johannes Thumshirn , James Bottomley , Christoph Hellwig
CC: linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] SCSI: Fix hard lockup in scsi_remove_target()
References: <1443774062-15638-1-git-send-email-jthumshirn@suse.de>
In-Reply-To: <1443774062-15638-1-git-send-email-jthumshirn@suse.de>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 10/02/2015 10:21 AM, Johannes Thumshirn wrote:
> Removing a SCSI target via scsi_remove_target() is suspected to be racy.
> When a sibling gets removed from the list it can occasionally happen that
> one CPU is stuck endlessly looping around this code block
>
>     list_for_each_entry(starget, &shost->__targets, siblings) {
>             if (starget->state == STARGET_DEL)
>                     continue;
>
> Resulting in the following hard lockup.
>
> Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
> [...]
> Call Trace:
> [] dump_trace+0x7d/0x2d0
> [] show_stack_log_lvl+0x94/0x170
> [] show_stack+0x21/0x50
> [] dump_stack+0x41/0x51
> [] panic+0xc8/0x1d7
> [] watchdog_overflow_callback+0xba/0xc0
> [] __perf_event_overflow+0x88/0x240
> [] intel_pmu_handle_irq+0x1fa/0x3e0
> [] perf_event_nmi_handler+0x26/0x40
> [] nmi_handle.isra.2+0x8d/0x180
> [] do_nmi+0x126/0x3c0
> [] end_repeat_nmi+0x1a/0x1e
> [] scsi_remove_target+0x68/0x240 [scsi_mod]
> [] process_one_work+0x172/0x420
> [] worker_thread+0x11a/0x3c0
> [] kthread+0xb4/0xc0
> [] ret_from_fork+0x58/0x90
>
> This patch decouples the list traversal for targets and the reaping of SCSI
> targets by moving to-be-removed targets to a separate reap list. Entries in
> this list can then be removed by the SCSI layer in a lockless manner.
>
> This was discovered by a partner in a 24h stress test.
>
> Signed-off-by: Johannes Thumshirn
> ---
>  drivers/scsi/scsi_sysfs.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
Reviewed-by: Hannes Reinecke

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   zSeries & Storage
hare@suse.de			   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
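
[Illustration, not part of the original message.] The quoted description
separates the locked list traversal from the actual teardown. A rough
user-space sketch of that two-phase pattern might look as follows. It is not
the patch (the diff is not included in this mail) and makes several
assumptions: struct host, struct target, collect_targets(), reap_targets()
and remove_targets() are hypothetical names, a pthread mutex stands in for
the host lock, and a hand-rolled singly linked list stands in for the
kernel's list_head.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the SCSI target and host structures. */
enum target_state { TARGET_RUNNING, TARGET_DEL };

struct target {
	int id;                   /* stands in for "belongs to this device" */
	enum target_state state;
	struct target *next;      /* link in the host list or the reap list */
};

struct host {
	pthread_mutex_t lock;     /* stands in for the host lock */
	struct target *targets;   /* shared list, protected by lock */
};

/*
 * Phase 1: walk the shared list exactly once with the lock held and move
 * every matching target onto a private reap list. The lock is never
 * dropped in the middle of the walk, so the iterator cannot be left
 * pointing at an entry that another CPU has unlinked in the meantime.
 */
struct target *collect_targets(struct host *h, int id)
{
	struct target *reap = NULL;
	struct target **pp;

	pthread_mutex_lock(&h->lock);
	pp = &h->targets;
	while (*pp) {
		struct target *t = *pp;

		if (t->state != TARGET_DEL && t->id == id) {
			t->state = TARGET_DEL;
			*pp = t->next;    /* unlink from the shared list */
			t->next = reap;   /* push onto the private list */
			reap = t;
		} else {
			pp = &t->next;
		}
	}
	pthread_mutex_unlock(&h->lock);
	return reap;
}

/*
 * Phase 2: the reap list is private to the caller, so it can be processed
 * without the lock. Real code would tear down and release each target
 * here; this sketch only detaches the entries and counts them.
 */
int reap_targets(struct target *reap)
{
	int n = 0;

	while (reap) {
		struct target *t = reap;

		reap = t->next;
		t->next = NULL;
		assert(t->state == TARGET_DEL);
		n++;
	}
	return n;
}

/* Convenience wrapper: returns the number of targets removed. */
int remove_targets(struct host *h, int id)
{
	return reap_targets(collect_targets(h, id));
}
```

The point of the split is that the potentially slow per-target teardown no
longer happens while other threads can modify the list being iterated; the
cost is that removed targets disappear from the shared list before their
teardown has finished.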