linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Rob Millner <rlm@daterainc.com>,
	Vaibhav Tandon <vst@datera.io>,
	"Bryant G. Ly" <bryantly@linux.vnet.ibm.com>,
	Nicholas Bellinger <nab@linux-iscsi.org>
Subject: [PATCH 4.4 20/36] target: Fix NULL dereference during LUN lookup + active I/O shutdown
Date: Mon, 13 Mar 2017 16:39:20 +0800	[thread overview]
Message-ID: <20170313083353.746407930@linuxfoundation.org> (raw)
In-Reply-To: <20170313083352.550085638@linuxfoundation.org>

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Nicholas Bellinger <nab@linux-iscsi.org>

commit bd4e2d2907fa23a11d46217064ecf80470ddae10 upstream.

When transport_clear_lun_ref() is shutting down a se_lun via
configfs with new I/O in-flight, it's possible to trigger a
NULL pointer dereference in transport_lookup_cmd_lun() due
to the fact percpu_ref_get() doesn't do any __PERCPU_REF_DEAD
checking before incrementing lun->lun_ref.count after
lun->lun_ref has switched to atomic_t mode.

This results in a NULL pointer dereference as LUN shutdown
code in core_tpg_remove_lun() continues running after the
existing ->release() -> core_tpg_lun_ref_release() callback
completes, and clears the RCU protected se_lun->lun_se_dev
pointer.

During the OOPs, the state of lun->lun_ref in the process
which triggered the NULL pointer dereference looks like
the following on v4.1.y stable code:

struct se_lun {
  lun_link_magic = 4294932337,
  lun_status = TRANSPORT_LUN_STATUS_FREE,

  .....

  lun_se_dev = 0x0,
  lun_sep = 0x0,

  .....

  lun_ref = {
    count = {
      counter = 1
    },
    percpu_count_ptr = 3,
    release = 0xffffffffa02fa1e0 <core_tpg_lun_ref_release>,
    confirm_switch = 0x0,
    force_atomic = false,
    rcu = {
      next = 0xffff88154fa1a5d0,
      func = 0xffffffff8137c4c0 <percpu_ref_switch_to_atomic_rcu>
    }
  }
}

To address this bug, use percpu_ref_tryget_live() to ensure
once __PERCPU_REF_DEAD is visable on all CPUs and ->lun_ref
has switched to atomic_t, all new I/Os will fail to obtain
a new lun->lun_ref reference.

Also use an explicit percpu_ref_kill_and_confirm() callback
to block on ->lun_ref_comp to allow the first stage and
associated RCU grace period to complete, and then block on
->lun_ref_shutdown waiting for the final percpu_ref_put()
to drop the last reference via transport_lun_remove_cmd()
before continuing with core_tpg_remove_lun() shutdown.

Reported-by: Rob Millner <rlm@daterainc.com>
Tested-by: Rob Millner <rlm@daterainc.com>
Cc: Rob Millner <rlm@daterainc.com>
Tested-by: Vaibhav Tandon <vst@datera.io>
Cc: Vaibhav Tandon <vst@datera.io>
Tested-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/target/target_core_device.c    |   10 ++++++++--
 drivers/target/target_core_tpg.c       |    3 ++-
 drivers/target/target_core_transport.c |   31 ++++++++++++++++++++++++++++++-
 include/target/target_core_base.h      |    1 +
 4 files changed, 41 insertions(+), 4 deletions(-)

--- a/drivers/target/target_core_device.c
+++ b/drivers/target/target_core_device.c
@@ -77,12 +77,16 @@ transport_lookup_cmd_lun(struct se_cmd *
 					&deve->read_bytes);
 
 		se_lun = rcu_dereference(deve->se_lun);
+
+		if (!percpu_ref_tryget_live(&se_lun->lun_ref)) {
+			se_lun = NULL;
+			goto out_unlock;
+		}
+
 		se_cmd->se_lun = rcu_dereference(deve->se_lun);
 		se_cmd->pr_res_key = deve->pr_res_key;
 		se_cmd->orig_fe_lun = unpacked_lun;
 		se_cmd->se_cmd_flags |= SCF_SE_LUN_CMD;
-
-		percpu_ref_get(&se_lun->lun_ref);
 		se_cmd->lun_ref_active = true;
 
 		if ((se_cmd->data_direction == DMA_TO_DEVICE) &&
@@ -96,6 +100,7 @@ transport_lookup_cmd_lun(struct se_cmd *
 			goto ref_dev;
 		}
 	}
+out_unlock:
 	rcu_read_unlock();
 
 	if (!se_lun) {
@@ -826,6 +831,7 @@ struct se_device *target_alloc_device(st
 	xcopy_lun = &dev->xcopy_lun;
 	rcu_assign_pointer(xcopy_lun->lun_se_dev, dev);
 	init_completion(&xcopy_lun->lun_ref_comp);
+	init_completion(&xcopy_lun->lun_shutdown_comp);
 	INIT_LIST_HEAD(&xcopy_lun->lun_deve_list);
 	INIT_LIST_HEAD(&xcopy_lun->lun_dev_link);
 	mutex_init(&xcopy_lun->lun_tg_pt_md_mutex);
--- a/drivers/target/target_core_tpg.c
+++ b/drivers/target/target_core_tpg.c
@@ -539,7 +539,7 @@ static void core_tpg_lun_ref_release(str
 {
 	struct se_lun *lun = container_of(ref, struct se_lun, lun_ref);
 
-	complete(&lun->lun_ref_comp);
+	complete(&lun->lun_shutdown_comp);
 }
 
 int core_tpg_register(
@@ -666,6 +666,7 @@ struct se_lun *core_tpg_alloc_lun(
 	lun->lun_link_magic = SE_LUN_LINK_MAGIC;
 	atomic_set(&lun->lun_acl_count, 0);
 	init_completion(&lun->lun_ref_comp);
+	init_completion(&lun->lun_shutdown_comp);
 	INIT_LIST_HEAD(&lun->lun_deve_list);
 	INIT_LIST_HEAD(&lun->lun_dev_link);
 	atomic_set(&lun->lun_tg_pt_secondary_offline, 0);
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -2680,10 +2680,39 @@ void target_wait_for_sess_cmds(struct se
 }
 EXPORT_SYMBOL(target_wait_for_sess_cmds);
 
+static void target_lun_confirm(struct percpu_ref *ref)
+{
+	struct se_lun *lun = container_of(ref, struct se_lun, lun_ref);
+
+	complete(&lun->lun_ref_comp);
+}
+
 void transport_clear_lun_ref(struct se_lun *lun)
 {
-	percpu_ref_kill(&lun->lun_ref);
+	/*
+	 * Mark the percpu-ref as DEAD, switch to atomic_t mode, drop
+	 * the initial reference and schedule confirm kill to be
+	 * executed after one full RCU grace period has completed.
+	 */
+	percpu_ref_kill_and_confirm(&lun->lun_ref, target_lun_confirm);
+	/*
+	 * The first completion waits for percpu_ref_switch_to_atomic_rcu()
+	 * to call target_lun_confirm after lun->lun_ref has been marked
+	 * as __PERCPU_REF_DEAD on all CPUs, and switches to atomic_t
+	 * mode so that percpu_ref_tryget_live() lookup of lun->lun_ref
+	 * fails for all new incoming I/O.
+	 */
 	wait_for_completion(&lun->lun_ref_comp);
+	/*
+	 * The second completion waits for percpu_ref_put_many() to
+	 * invoke ->release() after lun->lun_ref has switched to
+	 * atomic_t mode, and lun->lun_ref.count has reached zero.
+	 *
+	 * At this point all target-core lun->lun_ref references have
+	 * been dropped via transport_lun_remove_cmd(), and it's safe
+	 * to proceed with the remaining LUN shutdown.
+	 */
+	wait_for_completion(&lun->lun_shutdown_comp);
 }
 
 static bool
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -740,6 +740,7 @@ struct se_lun {
 	struct config_group	lun_group;
 	struct se_port_stat_grps port_stat_grps;
 	struct completion	lun_ref_comp;
+	struct completion	lun_shutdown_comp;
 	struct percpu_ref	lun_ref;
 	struct list_head	lun_dev_link;
 	struct hlist_node	link;

  parent reply	other threads:[~2017-03-13  9:22 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-13  8:39 [PATCH 4.4 00/36] 4.4.54-stable review Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.4 01/36] TTY: n_hdlc, fix lockdep false positive Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.4 02/36] tty: n_hdlc: get rid of racy n_hdlc.tbuf Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.4 03/36] serial: 8250_pci: Add MKS Tenta SCOM-0800 and SCOM-0801 cards Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.4 04/36] KVM: s390: Disable dirty log retrieval for UCONTROL guests Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.4 05/36] KVM: VMX: use correct vmcs_read/write for guest segment selector/base Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.4 06/36] Bluetooth: Add another AR3012 04ca:3018 device Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.4 07/36] s390/qdio: clear DSCI prior to scanning multiple input queues Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.4 08/36] s390/dcssblk: fix device size calculation in dcssblk_direct_access() Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.4 09/36] s390: TASK_SIZE for kernel threads Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.4 10/36] s390: make setup_randomness work Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.4 11/36] s390: use correct input data address for setup_randomness Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.4 12/36] net: mvpp2: fix DMA address calculation in mvpp2_txq_inc_put() Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.4 13/36] mnt: Tuck mounts under others instead of creating shadow/side mounts Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.4 14/36] IB/ipoib: Fix deadlock between rmmod and set_mode Greg Kroah-Hartman
2017-03-17  2:24   ` Ben Hutchings
2017-03-13  8:39 ` [PATCH 4.4 15/36] IB/IPoIB: Add destination address when re-queue packet Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.4 16/36] IB/srp: Avoid that duplicate responses trigger a kernel bug Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.4 17/36] IB/srp: Fix race conditions related to task management Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.4 18/36] ktest: Fix child exit code processing Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.4 19/36] ceph: remove req from unsafe list when unregistering it Greg Kroah-Hartman
2017-03-13  8:39 ` Greg Kroah-Hartman [this message]
2017-03-13  8:39 ` [PATCH 4.4 21/36] nlm: Ensure callback code also checks that the files match Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.4 22/36] pwm: pca9685: Fix period change with same duty cycle Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.4 23/36] xtensa: move parse_tag_fdt out of #ifdef CONFIG_BLK_DEV_INITRD Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.4 24/36] mac80211: flush delayed work when entering suspend Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.4 26/36] drm/ast: Fix test for VGA enabled Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.4 27/36] drm/ast: Call open_key before enable_mmio in POST code Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.4 28/36] drm/ast: Fix AST2400 POST failure without BMC FW or VBIOS Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.4 31/36] drm/atomic: fix an error code in mode_fixup() Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.4 32/36] fakelb: fix schedule while atomic Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.4 34/36] libceph: use BUG() instead of BUG_ON(1) Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.4 35/36] fat: fix using uninitialized fields of fat_inode/fsinfo_inode Greg Kroah-Hartman
2017-03-13  8:39 ` [PATCH 4.4 36/36] drivers: hv: Turn off write permission on the hypercall page Greg Kroah-Hartman
2017-03-13 22:36 ` [PATCH 4.4 00/36] 4.4.54-stable review Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170313083353.746407930@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=bryantly@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nab@linux-iscsi.org \
    --cc=rlm@daterainc.com \
    --cc=stable@vger.kernel.org \
    --cc=vst@datera.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).