From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Wei Fang <fangwei1@huawei.com>,
	James Bottomley <jejb@linux.vnet.ibm.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>
Subject: [PATCH 3.14 48/53] scsi: fix race between simultaneous decrements of ->host_failed
Date: Mon, 25 Jul 2016 13:55:30 -0700
Message-ID: <20160725203516.490330178@linuxfoundation.org>
In-Reply-To: <20160725203514.202312855@linuxfoundation.org>

3.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Wei Fang <fangwei1@huawei.com>

commit 72d8c36ec364c82bf1bf0c64dfa1041cfaf139f7 upstream.

sas_ata_strategy_handler() queues the work items of the ata error
handler on system_unbound_wq. This workqueue runs work items
asynchronously, so the ata error handler can run concurrently on
different CPUs. In this case, ->host_failed is decremented
simultaneously in scsi_eh_finish_cmd() on different CPUs without any
locking, and can end up with a wrong value.

This leads to a permanent inequality between ->host_failed and
->host_busy, so the scsi error handler thread never starts running
again. IO errors after that are not handled.

Since all scmds must have been handled in the strategy handler, just
remove the decrement in scsi_eh_finish_cmd() and zero ->host_failed
after the strategy handler to fix this race.

Fixes: 50824d6c5657 ("[SCSI] libsas: async ata-eh")
Signed-off-by: Wei Fang <fangwei1@huawei.com>
Reviewed-by: James Bottomley <jejb@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 Documentation/scsi/scsi_eh.txt |    8 ++++++--
 drivers/ata/libata-eh.c        |    2 +-
 drivers/scsi/scsi_error.c      |    4 +++-
 3 files changed, 10 insertions(+), 4 deletions(-)
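
The approach described in the changelog can be illustrated with a small
userspace sketch. This is not kernel code: it uses pthreads in place of
system_unbound_wq, and every demo_* name below is hypothetical. It
assumes each worker owns a private work queue, modelled here as a plain
count. Workers never touch the shared counter, and the caller zeroes it
once after all of them have finished, which mirrors what the hunk in
scsi_error_handler() does after the strategy handler returns.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define DEMO_MAX_WORKERS 16

/* Hypothetical stand-in for the one Scsi_Host field that matters here. */
struct demo_host {
	unsigned int host_failed;
};

/* Per-worker state, analogous to a private eh_work_q/eh_done_q pair. */
struct demo_worker {
	unsigned int nr_cmds;	/* commands waiting on this worker */
	unsigned int nr_done;	/* commands this worker has finished */
};

/* Analogue of the patched scsi_eh_finish_cmd(): no shared-counter access. */
static void demo_finish_cmd(struct demo_worker *w)
{
	w->nr_done++;
}

/* One concurrent error-handler instance, run in its own thread. */
static void *demo_port_eh(void *arg)
{
	struct demo_worker *w = arg;
	unsigned int i;

	for (i = 0; i < w->nr_cmds; i++)
		demo_finish_cmd(w);
	return NULL;
}

/*
 * Run nr_workers handlers concurrently, then reset host_failed once.
 * Returns the number of commands finished, or 0 if nr_workers is too
 * large (in which case *host is left untouched).
 */
unsigned int demo_run_eh(struct demo_host *host, unsigned int nr_workers,
			 unsigned int cmds_per_worker)
{
	struct demo_worker workers[DEMO_MAX_WORKERS];
	pthread_t threads[DEMO_MAX_WORKERS];
	unsigned int i, done = 0;
	int rc;

	if (nr_workers > DEMO_MAX_WORKERS)
		return 0;

	host->host_failed = nr_workers * cmds_per_worker;

	for (i = 0; i < nr_workers; i++) {
		workers[i].nr_cmds = cmds_per_worker;
		workers[i].nr_done = 0;
		rc = pthread_create(&threads[i], NULL, demo_port_eh,
				    &workers[i]);
		assert(rc == 0);
	}

	for (i = 0; i < nr_workers; i++) {
		rc = pthread_join(threads[i], NULL);
		assert(rc == 0);
		done += workers[i].nr_done;
	}

	/* All commands have been handled: a single, unraced reset. */
	host->host_failed = 0;
	return done;
}
```

Calling demo_run_eh(&host, 4, 1000) from a small main() and building
with -pthread should finish 4000 commands and leave host.host_failed at
zero. The sketch has no shared read-modify-write for the workers to race
on, which is the property the patch relies on.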

--- a/Documentation/scsi/scsi_eh.txt
+++ b/Documentation/scsi/scsi_eh.txt
@@ -263,19 +263,23 @@ scmd->allowed.
 
  3. scmd recovered
     ACTION: scsi_eh_finish_cmd() is invoked to EH-finish scmd
-	- shost->host_failed--
 	- clear scmd->eh_eflags
 	- scsi_setup_cmd_retry()
 	- move from local eh_work_q to local eh_done_q
     LOCKING: none
+    CONCURRENCY: at most one thread per separate eh_work_q to
+		 keep queue manipulation lockless
 
  4. EH completes
     ACTION: scsi_eh_flush_done_q() retries scmds or notifies upper
-	    layer of failure.
+	    layer of failure. May be called concurrently but must have
+	    a no more than one thread per separate eh_work_q to
+	    manipulate the queue locklessly
 	- scmd is removed from eh_done_q and scmd->eh_entry is cleared
 	- if retry is necessary, scmd is requeued using
           scsi_queue_insert()
 	- otherwise, scsi_finish_command() is invoked for scmd
+	- zero shost->host_failed
     LOCKING: queue or finish function performs appropriate locking
 
 
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -604,7 +604,7 @@ void ata_scsi_error(struct Scsi_Host *ho
 	ata_scsi_port_error_handler(host, ap);
 
 	/* finish or retry handled scmd's and clean up */
-	WARN_ON(host->host_failed || !list_empty(&eh_work_q));
+	WARN_ON(!list_empty(&eh_work_q));
 
 	DPRINTK("EXIT\n");
 }
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1111,7 +1111,6 @@ static int scsi_eh_action(struct scsi_cm
  */
 void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q)
 {
-	scmd->device->host->host_failed--;
 	scmd->eh_eflags = 0;
 	list_move_tail(&scmd->eh_entry, done_q);
 }
@@ -2193,6 +2192,9 @@ int scsi_error_handler(void *data)
 		else
 			scsi_unjam_host(shost);
 
+		/* All scmds have been handled */
+		shost->host_failed = 0;
+
 		/*
 		 * Note - if the above fails completely, the action is to take
 		 * individual devices offline and flush the queue of any


