From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
To: davem@davemloft.net
Cc: Jacob Keller <jacob.e.keller@intel.com>,
netdev@vger.kernel.org, nhorman@redhat.com, sassmann@redhat.com,
jogreene@redhat.com, Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Subject: [net-next 13/13] fm10k: prevent race condition of __FM10K_SERVICE_SCHED
Date: Mon, 2 Oct 2017 08:42:36 -0700 [thread overview]
Message-ID: <20171002154236.84043-14-jeffrey.t.kirsher@intel.com> (raw)
In-Reply-To: <20171002154236.84043-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
Although very unlikely, it is possible that cancel_work_sync() may stop
the service_task before it actually started. In this case, the
__FM10K_SERVICE_SCHED bit will never be cleared. This results in the
service task being unable to reschedule in the future. Add a helper
function which sets the service disable bit, waits for the service task
to stop and clears the schedule bit, thus avoiding the race condition.
We know the schedule bit is safe to clear because the cancel_work_sync()
guarantees the service task is not running.
Add a helper function also to restart the service task, for symmetry.
This is not strictly needed but helps the mental model of how to stop
and start the service task.
This race could only happen in fm10k_suspend/fm10k_resume as this is the
only place where the service task is actually restarted. Thus,
suspend/resume testing would be ideal. However, note that the chance of
this happening is very slim as the service event is scheduled for
immediate execution, and you would have to trigger a suspend at almost
the exact same time as the service task was scheduled.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 32 ++++++++++++++++++++++------
1 file changed, 25 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 41335154d6b1..9575f7c1862d 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -118,6 +118,27 @@ static void fm10k_service_event_complete(struct fm10k_intfc *interface)
fm10k_service_event_schedule(interface);
}
+static void fm10k_stop_service_event(struct fm10k_intfc *interface)
+{
+ set_bit(__FM10K_SERVICE_DISABLE, interface->state);
+ cancel_work_sync(&interface->service_task);
+
+ /* It's possible that cancel_work_sync stopped the service task from
+ * running before it could actually start. In this case the
+ * __FM10K_SERVICE_SCHED bit will never be cleared. Since we know that
+ * the service task cannot be running at this point, we need to clear
+ * the scheduled bit, as otherwise the service task may never be
+ * restarted.
+ */
+ clear_bit(__FM10K_SERVICE_SCHED, interface->state);
+}
+
+static void fm10k_start_service_event(struct fm10k_intfc *interface)
+{
+ clear_bit(__FM10K_SERVICE_DISABLE, interface->state);
+ fm10k_service_event_schedule(interface);
+}
+
/**
* fm10k_service_timer - Timer Call-back
* @data: pointer to interface cast into an unsigned long
@@ -2116,8 +2137,7 @@ static void fm10k_remove(struct pci_dev *pdev)
del_timer_sync(&interface->service_timer);
- set_bit(__FM10K_SERVICE_DISABLE, interface->state);
- cancel_work_sync(&interface->service_task);
+ fm10k_stop_service_event(interface);
/* free netdev, this may bounce the interrupts due to setup_tc */
if (netdev->reg_state == NETREG_REGISTERED)
@@ -2155,8 +2175,7 @@ static void fm10k_prepare_suspend(struct fm10k_intfc *interface)
* stopped. We stop the watchdog task until after we resume software
* activity.
*/
- set_bit(__FM10K_SERVICE_DISABLE, interface->state);
- cancel_work_sync(&interface->service_task);
+ fm10k_stop_service_event(interface);
fm10k_prepare_for_reset(interface);
}
@@ -2183,9 +2202,8 @@ static int fm10k_handle_resume(struct fm10k_intfc *interface)
interface->link_down_event = jiffies + (HZ);
set_bit(__FM10K_LINK_DOWN, interface->state);
- /* clear the service task disable bit to allow service task to start */
- clear_bit(__FM10K_SERVICE_DISABLE, interface->state);
- fm10k_service_event_schedule(interface);
+ /* restart the service task */
+ fm10k_start_service_event(interface);
return err;
}
--
2.14.2
next prev parent reply other threads:[~2017-10-02 15:43 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-10-02 15:42 [net-next 00/13][pull request] 100GbE Intel Wired LAN Driver Updates 2017-10-02 Jeff Kirsher
2017-10-02 15:42 ` [net-next 01/13] fm10k: ensure we process SM mbx when processing VF mbx Jeff Kirsher
2017-10-02 15:42 ` [net-next 02/13] fm10k: reschedule service event if we stall the PF<->SM mailbox Jeff Kirsher
2017-10-02 15:42 ` [net-next 03/13] fm10k: Use seq_putc() in fm10k_dbg_desc_break() Jeff Kirsher
2017-10-02 15:42 ` [net-next 04/13] fm10k: stop spurious link down messages when Tx FIFO is full Jeff Kirsher
2017-10-02 15:42 ` [net-next 05/13] fm10k: fix typos on fall through comments Jeff Kirsher
2017-10-02 15:42 ` [net-next 06/13] fm10k: avoid possible truncation of q_vector->name Jeff Kirsher
2017-10-02 15:42 ` [net-next 07/13] fm10k: add missing fall through comment Jeff Kirsher
2017-10-02 15:42 ` [net-next 08/13] fm10k: avoid needless delay when loading driver Jeff Kirsher
2017-10-02 15:42 ` [net-next 09/13] fm10k: simplify reading PFVFLRE register Jeff Kirsher
2017-10-02 15:42 ` [net-next 10/13] fm10k: don't loop while resetting VFs due to VFLR event Jeff Kirsher
2017-10-02 15:42 ` [net-next 11/13] fm10k: avoid divide by zero in rare cases when device is resetting Jeff Kirsher
2017-10-02 15:42 ` [net-next 12/13] fm10k: move fm10k_prepare_for_reset and fm10k_handle_reset Jeff Kirsher
2017-10-02 15:42 ` Jeff Kirsher [this message]
2017-10-02 18:59 ` [net-next 00/13][pull request] 100GbE Intel Wired LAN Driver Updates 2017-10-02 David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171002154236.84043-14-jeffrey.t.kirsher@intel.com \
--to=jeffrey.t.kirsher@intel.com \
--cc=davem@davemloft.net \
--cc=jacob.e.keller@intel.com \
--cc=jogreene@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=nhorman@redhat.com \
--cc=sassmann@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox