From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Feras Daoud <ferasda@mellanox.com>,
	Leon Romanovsky <leon@kernel.org>,
	Doug Ledford <dledford@redhat.com>,
	Sasha Levin <alexander.levin@microsoft.com>
Subject: [PATCH 4.4 31/97] IB/ipoib: Fix deadlock between ipoib_stop and mcast join flow
Date: Fri, 23 Mar 2018 10:54:18 +0100
Message-ID: <20180323094159.381404262@linuxfoundation.org>
In-Reply-To: <20180323094157.535925724@linuxfoundation.org>

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Feras Daoud <ferasda@mellanox.com>


[ Upstream commit 3e31a490e01a6e67cbe9f6e1df2f3ff0fbf48972 ]

ipoib_stop is called with rtnl_lock held. The flow then clears
the IPOIB_FLAG_ADMIN_UP and IPOIB_FLAG_OPER_UP flags, and waits
for mcast completion if IPOIB_MCAST_FLAG_BUSY is set.

The multicast join task, on the other hand, initializes the
mcast completion, sets IPOIB_MCAST_FLAG_BUSY and calls
ipoib_mcast_join. If the IPOIB_FLAG_OPER_UP flag is not set,
this call returns -EINVAL without ever signaling the mcast
completion, which leads to a deadlock.

    ipoib_stop                          |
        |                               |
    clear_bit(IPOIB_FLAG_ADMIN_UP)      |
        |                               |
    Context Switch                      |
        |                       ipoib_mcast_join_task
        |                               |
        |                       spin_lock_irq(lock)
        |                               |
        |                       init_completion(mcast)
        |                               |
        |                       set_bit(IPOIB_MCAST_FLAG_BUSY)
        |                               |
        |                       Context Switch
        |                               |
    clear_bit(IPOIB_FLAG_OPER_UP)       |
        |                               |
    spin_lock_irqsave(lock)             |
        |                               |
    Context Switch                      |
        |                       ipoib_mcast_join
        |                       return (-EINVAL)
        |                               |
        |                       spin_unlock_irq(lock)
        |                               |
        |                       Context Switch
        |                               |
    ipoib_mcast_dev_flush               |
    wait_for_completion(mcast)          |

ipoib_stop will wait for the mcast completion forever and will
not release the rtnl_lock. As a result, a panic occurs with the
following trace:

    [13441.639268] Call Trace:
    [13441.640150]  [<ffffffff8168b579>] schedule+0x29/0x70
    [13441.641038]  [<ffffffff81688fc9>] schedule_timeout+0x239/0x2d0
    [13441.641914]  [<ffffffff810bc017>] ? complete+0x47/0x50
    [13441.642765]  [<ffffffff810a690d>] ? flush_workqueue_prep_pwqs+0x16d/0x200
    [13441.643580]  [<ffffffff8168b956>] wait_for_completion+0x116/0x170
    [13441.644434]  [<ffffffff810c4ec0>] ? wake_up_state+0x20/0x20
    [13441.645293]  [<ffffffffa05af170>] ipoib_mcast_dev_flush+0x150/0x190 [ib_ipoib]
    [13441.646159]  [<ffffffffa05ac967>] ipoib_ib_dev_down+0x37/0x60 [ib_ipoib]
    [13441.647013]  [<ffffffffa05a4805>] ipoib_stop+0x75/0x150 [ib_ipoib]

Fixes: 08bc327629cb ("IB/ipoib: fix for rare multicast join race condition")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |   11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -473,6 +473,9 @@ static int ipoib_mcast_join(struct net_d
 	    !test_bit(IPOIB_FLAG_OPER_UP, &priv->flags))
 		return -EINVAL;
 
+	init_completion(&mcast->done);
+	set_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
+
 	ipoib_dbg_mcast(priv, "joining MGID %pI6\n", mcast->mcmember.mgid.raw);
 
 	rec.mgid     = mcast->mcmember.mgid;
@@ -631,8 +634,6 @@ void ipoib_mcast_join_task(struct work_s
 			if (mcast->backoff == 1 ||
 			    time_after_eq(jiffies, mcast->delay_until)) {
 				/* Found the next unjoined group */
-				init_completion(&mcast->done);
-				set_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
 				if (ipoib_mcast_join(dev, mcast)) {
 					spin_unlock_irq(&priv->lock);
 					return;
@@ -652,11 +653,9 @@ out:
 		queue_delayed_work(priv->wq, &priv->mcast_task,
 				   delay_until - jiffies);
 	}
-	if (mcast) {
-		init_completion(&mcast->done);
-		set_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
+	if (mcast)
 		ipoib_mcast_join(dev, mcast);
-	}
+
 	spin_unlock_irq(&priv->lock);
 }
 

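The effect of the reordering in the diff above can be sketched in
user space. This is only a single-threaded model under stated
assumptions, not driver code: struct model_mcast, model_join_check,
join_old, join_new, flush_must_wait and model_demo are all
hypothetical names, the BUSY flag and the completion are reduced to
booleans, and locking is ignored. It shows that with the old
ordering a failed join leaves BUSY set with nobody left to signal
the completion, while with the new ordering a failed join leaves no
state behind for the flush to wait on.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the per-group state. */
struct model_mcast {
	bool busy;      /* models IPOIB_MCAST_FLAG_BUSY */
	bool completed; /* models whether the completion was signaled */
};

/* Models the early exit in ipoib_mcast_join(): when the interface is
 * not operationally up, no join request is issued, so no callback
 * will ever signal the completion. */
static int model_join_check(bool oper_up)
{
	return oper_up ? 0 : -EINVAL;
}

/* Old ordering: BUSY is set before the precondition check. */
static int join_old(struct model_mcast *m, bool oper_up)
{
	m->completed = false; /* stands in for init_completion() */
	m->busy = true;       /* stands in for set_bit(BUSY) */
	return model_join_check(oper_up);
}

/* New ordering: BUSY is set only once the check has passed. */
static int join_new(struct model_mcast *m, bool oper_up)
{
	int ret = model_join_check(oper_up);

	if (ret)
		return ret;
	m->completed = false;
	m->busy = true;
	return 0;
}

/* Models the flush side: it blocks only when BUSY is set and the
 * completion has not been signaled yet. */
static bool flush_must_wait(const struct model_mcast *m)
{
	return m->busy && !m->completed;
}

/* Runs both orderings with the interface down; returns 0 on success. */
int model_demo(void)
{
	struct model_mcast old_state = { false, false };
	struct model_mcast new_state = { false, false };

	assert(join_old(&old_state, false) == -EINVAL);
	assert(flush_must_wait(&old_state));  /* would wait forever */

	assert(join_new(&new_state, false) == -EINVAL);
	assert(!flush_must_wait(&new_state)); /* nothing to wait for */
	return 0;
}
```

Calling model_demo() from a small main() is enough to exercise both
orderings. In the successful-join case both orderings leave BUSY
set, which is expected, since a join callback is then pending and
will eventually signal the completion.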