All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Ivan Vecera <ivecera@redhat.com>,
	Petr Oros <poros@redhat.com>,
	Dave Ertman <david.m.ertman@intel.com>,
	Jakub Kicinski <kuba@kernel.org>
Subject: [PATCH 5.16 28/28] ice: Fix race condition during interface enslave
Date: Thu, 17 Mar 2022 13:46:19 +0100	[thread overview]
Message-ID: <20220317124527.562546441@linuxfoundation.org> (raw)
In-Reply-To: <20220317124526.768423926@linuxfoundation.org>

From: Ivan Vecera <ivecera@redhat.com>

commit 5cb1ebdbc4342b1c2ce89516e19808d64417bdbc upstream.

Commit 5dbbbd01cbba83 ("ice: Avoid RTNL lock when re-creating
auxiliary device") changes a process of re-creation of aux device
so ice_plug_aux_dev() is called from ice_service_task() context.
This unfortunately opens a race window that can result in dead-lock
when interface has left LAG and immediately enters LAG again.

Reproducer:
```
#!/bin/sh

ip link add lag0 type bond mode 1 miimon 100
ip link set lag0

for n in {1..10}; do
        echo Cycle: $n
        ip link set ens7f0 master lag0
        sleep 1
        ip link set ens7f0 nomaster
done
```

This results in:
[20976.208697] Workqueue: ice ice_service_task [ice]
[20976.213422] Call Trace:
[20976.215871]  __schedule+0x2d1/0x830
[20976.219364]  schedule+0x35/0xa0
[20976.222510]  schedule_preempt_disabled+0xa/0x10
[20976.227043]  __mutex_lock.isra.7+0x310/0x420
[20976.235071]  enum_all_gids_of_dev_cb+0x1c/0x100 [ib_core]
[20976.251215]  ib_enum_roce_netdev+0xa4/0xe0 [ib_core]
[20976.256192]  ib_cache_setup_one+0x33/0xa0 [ib_core]
[20976.261079]  ib_register_device+0x40d/0x580 [ib_core]
[20976.266139]  irdma_ib_register_device+0x129/0x250 [irdma]
[20976.281409]  irdma_probe+0x2c1/0x360 [irdma]
[20976.285691]  auxiliary_bus_probe+0x45/0x70
[20976.289790]  really_probe+0x1f2/0x480
[20976.298509]  driver_probe_device+0x49/0xc0
[20976.302609]  bus_for_each_drv+0x79/0xc0
[20976.306448]  __device_attach+0xdc/0x160
[20976.310286]  bus_probe_device+0x9d/0xb0
[20976.314128]  device_add+0x43c/0x890
[20976.321287]  __auxiliary_device_add+0x43/0x60
[20976.325644]  ice_plug_aux_dev+0xb2/0x100 [ice]
[20976.330109]  ice_service_task+0xd0c/0xed0 [ice]
[20976.342591]  process_one_work+0x1a7/0x360
[20976.350536]  worker_thread+0x30/0x390
[20976.358128]  kthread+0x10a/0x120
[20976.365547]  ret_from_fork+0x1f/0x40
...
[20976.438030] task:ip              state:D stack:    0 pid:213658 ppid:213627 flags:0x00004084
[20976.446469] Call Trace:
[20976.448921]  __schedule+0x2d1/0x830
[20976.452414]  schedule+0x35/0xa0
[20976.455559]  schedule_preempt_disabled+0xa/0x10
[20976.460090]  __mutex_lock.isra.7+0x310/0x420
[20976.464364]  device_del+0x36/0x3c0
[20976.467772]  ice_unplug_aux_dev+0x1a/0x40 [ice]
[20976.472313]  ice_lag_event_handler+0x2a2/0x520 [ice]
[20976.477288]  notifier_call_chain+0x47/0x70
[20976.481386]  __netdev_upper_dev_link+0x18b/0x280
[20976.489845]  bond_enslave+0xe05/0x1790 [bonding]
[20976.494475]  do_setlink+0x336/0xf50
[20976.502517]  __rtnl_newlink+0x529/0x8b0
[20976.543441]  rtnl_newlink+0x43/0x60
[20976.546934]  rtnetlink_rcv_msg+0x2b1/0x360
[20976.559238]  netlink_rcv_skb+0x4c/0x120
[20976.563079]  netlink_unicast+0x196/0x230
[20976.567005]  netlink_sendmsg+0x204/0x3d0
[20976.570930]  sock_sendmsg+0x4c/0x50
[20976.574423]  ____sys_sendmsg+0x1eb/0x250
[20976.586807]  ___sys_sendmsg+0x7c/0xc0
[20976.606353]  __sys_sendmsg+0x57/0xa0
[20976.609930]  do_syscall_64+0x5b/0x1a0
[20976.613598]  entry_SYSCALL_64_after_hwframe+0x65/0xca

1. Command 'ip link ... set nomaster' causes that ice_plug_aux_dev()
   is called from ice_service_task() context, aux device is created
   and associated device->lock is taken.
2. Command 'ip link ... set master...' calls ice's notifier under
   RTNL lock and that notifier calls ice_unplug_aux_dev(). That
   function tries to take aux device->lock but this is already taken
   by ice_plug_aux_dev() in step 1
3. Later ice_plug_aux_dev() tries to take RTNL lock but this is already
   taken in step 2
4. Dead-lock

The patch fixes this issue by following changes:
- Bit ICE_FLAG_PLUG_AUX_DEV is kept to be set during ice_plug_aux_dev()
  call in ice_service_task()
- The bit is checked in ice_clear_rdma_cap() and only if it is not set
  then ice_unplug_aux_dev() is called. If it is set (in other words
  plugging of aux device was requested and ice_plug_aux_dev() is
  potentially running) then the function only clears the bit
- Once ice_plug_aux_dev() call (in ice_service_task) is finished
  the bit ICE_FLAG_PLUG_AUX_DEV is cleared but it is also checked
  whether it was already cleared by ice_clear_rdma_cap(). If so then
  aux device is unplugged.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Co-developed-by: Petr Oros <poros@redhat.com>
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Dave Ertman <david.m.ertman@intel.com>
Link: https://lore.kernel.org/r/20220310171641.3863659-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/intel/ice/ice.h      |   11 ++++++++++-
 drivers/net/ethernet/intel/ice/ice_main.c |   12 +++++++++++-
 2 files changed, 21 insertions(+), 2 deletions(-)

--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -891,7 +891,16 @@ static inline void ice_set_rdma_cap(stru
  */
 static inline void ice_clear_rdma_cap(struct ice_pf *pf)
 {
-	ice_unplug_aux_dev(pf);
+	/* We can directly unplug aux device here only if the flag bit
+	 * ICE_FLAG_PLUG_AUX_DEV is not set because ice_unplug_aux_dev()
+	 * could race with ice_plug_aux_dev() called from
+	 * ice_service_task(). In this case we only clear that bit now and
+	 * aux device will be unplugged later once ice_plug_aux_device()
+	 * called from ice_service_task() finishes (see ice_service_task()).
+	 */
+	if (!test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))
+		ice_unplug_aux_dev(pf);
+
 	clear_bit(ICE_FLAG_RDMA_ENA, pf->flags);
 	clear_bit(ICE_FLAG_AUX_ENA, pf->flags);
 }
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -2237,9 +2237,19 @@ static void ice_service_task(struct work
 		return;
 	}
 
-	if (test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))
+	if (test_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) {
+		/* Plug aux device per request */
 		ice_plug_aux_dev(pf);
 
+		/* Mark plugging as done but check whether unplug was
+		 * requested during ice_plug_aux_dev() call
+		 * (e.g. from ice_clear_rdma_cap()) and if so then
+		 * plug aux device.
+		 */
+		if (!test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))
+			ice_unplug_aux_dev(pf);
+	}
+
 	if (test_and_clear_bit(ICE_FLAG_MTU_CHANGED, pf->flags)) {
 		struct iidc_event *event;
 



  parent reply	other threads:[~2022-03-17 12:58 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-17 12:45 [PATCH 5.16 00/28] 5.16.16-rc1 review Greg Kroah-Hartman
2022-03-17 12:45 ` [PATCH 5.16 01/28] Revert "xfrm: state and policy should fail if XFRMA_IF_ID 0" Greg Kroah-Hartman
2022-03-17 12:45 ` [PATCH 5.16 02/28] arm64: dts: rockchip: fix dma-controller node names on rk356x Greg Kroah-Hartman
2022-03-17 12:45 ` [PATCH 5.16 03/28] arm64: dts: rockchip: fix rk3399-puma-haikou USB OTG mode Greg Kroah-Hartman
2022-03-17 12:45 ` [PATCH 5.16 04/28] xfrm: Check if_id in xfrm_migrate Greg Kroah-Hartman
2022-03-17 12:45 ` [PATCH 5.16 05/28] xfrm: Fix xfrm migrate issues when address family changes Greg Kroah-Hartman
2022-03-17 12:45 ` [PATCH 5.16 06/28] arm64: dts: rockchip: fix rk3399-puma eMMC HS400 signal integrity Greg Kroah-Hartman
2022-03-17 12:45 ` [PATCH 5.16 07/28] arm64: dts: rockchip: align pl330 node name with dtschema Greg Kroah-Hartman
2022-03-17 12:45 ` [PATCH 5.16 08/28] arm64: dts: rockchip: reorder rk3399 hdmi clocks Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 09/28] arm64: dts: agilex: use the compatible "intel,socfpga-agilex-hsotg" Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 10/28] ARM: dts: rockchip: reorder rk322x hmdi clocks Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 11/28] ARM: dts: rockchip: fix a typo on rk3288 crypto-controller Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 12/28] mac80211: refuse aggregations sessions before authorized Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 13/28] MIPS: smp: fill in sibling and core maps earlier Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 14/28] ARM: 9178/1: fix unmet dependency on BITREVERSE for HAVE_ARCH_BITREVERSE Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 15/28] Bluetooth: hci_core: Fix leaking sent_cmd skb Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 16/28] can: rcar_canfd: rcar_canfd_channel_probe(): register the CAN device when fully ready Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 17/28] atm: firestream: check the return value of ioremap() in fs_init() Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 18/28] netfilter: egress: silence egress hook lockdep splats Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 19/28] Input: goodix - use the new soc_intel_is_byt() helper Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 20/28] Input: goodix - workaround Cherry Trail devices with a bogus ACPI Interrupt() resource Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 21/28] iwlwifi: dont advertise TWT support Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 22/28] drm/vrr: Set VRR capable prop only if it is attached to connector Greg Kroah-Hartman
2022-03-17 12:46   ` Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 23/28] nl80211: Update bss channel on channel switch for P2P_CLIENT Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 24/28] tcp: make tcp_read_sock() more robust Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 25/28] sfc: extend the locking on mcdi->seqno Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 26/28] bnx2: Fix an error message Greg Kroah-Hartman
2022-03-17 12:46 ` [PATCH 5.16 27/28] kselftest/vm: fix tests build with old libc Greg Kroah-Hartman
2022-03-17 12:46 ` Greg Kroah-Hartman [this message]
2022-03-17 17:01 ` [PATCH 5.16 00/28] 5.16.16-rc1 review Fox Chen
2022-03-17 22:17 ` Florian Fainelli
2022-03-18  2:20 ` Guenter Roeck
2022-03-18  9:10 ` Naresh Kamboju
2022-03-18 11:09 ` Bagas Sanjaya
2022-03-18 13:14 ` Jon Hunter
2022-03-18 13:48 ` Ron Economos
2022-03-18 13:52 ` Rudi Heitbaum
2022-03-18 22:56 ` Justin Forbes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220317124527.562546441@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=david.m.ertman@intel.com \
    --cc=ivecera@redhat.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=poros@redhat.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.