All of lore.kernel.org
 help / color / mirror / Atom feed
* Firmware loading race condition (+fix)
@ 2010-10-08 20:13 Duncan Simpson
  0 siblings, 0 replies; only message in thread
From: Duncan Simpson @ 2010-10-08 20:13 UTC (permalink / raw)
  To: linux-kernel

There is a nasty race condition in base/firmware_class.c which
*definitely* affects the bnx2 drivers, and almost certainly also
affects the bnx2x driver. When you load the firmware the first 5
events which happen are:

1. _request_firmware generates uevent which starts hotplug
2. The hotplug script loads the mips firmware
3. unregistering the device generates another uevent
4. _request_firmware removes the device
5. The hotplug script tries to reload the *same* firmware again

Step 5 is the wrong thing. Effects range from kernel null pointer
deference via the device not working due to incorrect firmware to
annoying but harmless error messages. Let it suffice to say I had
some serious hardware with 4 on board bnx2 devices of which at
one reliably does not work (assumming, as was usually the case,
the kernel null pointer dererefence timing was avoided).

My employer wants the fix below to be sent upstream from an email
address which does not allow them to be identified. The fix below
solves the problem by adding DEVSTATUS to the environment which is
0 at step 2 and either 2 or 4, depending on whether the firmware
loading worked or not, at step 5.

The hotplug script has to be changed to only load the firmware is
DEVSTATUS is 0. The patch fixes the simple hotplug script in the
Documentation directory to do this. Please Cc: responses to me
because I do not have sufficient bandwidth to read LKML in addition
to the mail I currently receive.

If your hardware only has one piece of firmware or you use built
in firmware or never experience unfortunate timing you wont
experience this particular bug, which I suspect has been present
for a long time. I think the only multiple firmware drivers are
currently bnx2 and bnx2x.

When the fix is applied all 4 bnx2 interfaces on the box which had
problems all work and can be relied upon to work.

Duncan (-:

diff -ur linux-2.6.35.7/Documentation/firmware_class/hotplug-script linux-2.6.35.7.new/Documentation/firmware_class/hotplug-script
--- linux-2.6.35.7/Documentation/firmware_class/hotplug-script	2010-09-29 02:09:08.000000000 +0100
+++ linux-2.6.35.7.new/Documentation/firmware_class/hotplug-script	2010-10-08 17:20:12.989065696 +0100
@@ -2,13 +2,15 @@

 # Simple hotplug script sample:
 #
-# Both $DEVPATH and $FIRMWARE are already provided in the environment.
-
+# $DEVSTATUS, $DEVPATH and $FIRMWARE are already in the environment.
+	
 HOTPLUG_FW_DIR=/usr/lib/hotplug/firmware/

-echo 1 > /sys/$DEVPATH/loading
-cat $HOTPLUG_FW_DIR/$FIRMWARE > /sys/$DEVPATH/data
-echo 0 > /sys/$DEVPATH/loading
+if [ $DEVSTATUS -eq 0 ]; then
+    echo 1 > /sys/$DEVPATH/loading
+    cat $HOTPLUG_FW_DIR/$FIRMWARE > /sysfs/$DEVPATH/data
+    echo 0 > /sys/$DEVPATH/loading
+fi

 # To cancel the load in case of error:
 #
diff -ur linux-2.6.35.7/Documentation/firmware_class/README linux-2.6.35.7.new/Documentation/firmware_class/README
--- linux-2.6.35.7/Documentation/firmware_class/README	2010-09-29 02:09:08.000000000 +0100
+++ linux-2.6.35.7.new/Documentation/firmware_class/README	2010-10-08 17:18:40.078685400 +0100
@@ -56,13 +56,15 @@
  Sample/simple hotplug script:
  ============================

-	# Both $DEVPATH and $FIRMWARE are already provided in the environment.
-
+	# $DEVSTATUS, $DEVPATH and $FIRMWARE are already in the environment.
+	
 	HOTPLUG_FW_DIR=/usr/lib/hotplug/firmware/

-	echo 1 > /sys/$DEVPATH/loading
-	cat $HOTPLUG_FW_DIR/$FIRMWARE > /sysfs/$DEVPATH/data
-	echo 0 > /sys/$DEVPATH/loading
+	if [ $DEVSTATUS -eq 0 ]; then
+	    echo 1 > /sys/$DEVPATH/loading
+	    cat $HOTPLUG_FW_DIR/$FIRMWARE > /sysfs/$DEVPATH/data
+	    echo 0 > /sys/$DEVPATH/loading
+	fi

  Random notes:
  ============
@@ -80,6 +82,11 @@
    user contexts to request firmware asynchronously, but can't be called
    in atomic contexts.

+ - after the load has finished or been aborted the script is called
+   with DEVSTATUS either 2 (success) or 4 (aborted/failed). Bad things
+   can happen if you try loading firmware in either case, especially
+   if a device has 2 or more distinct pieces of firmware.
+

  about in-kernel persistence:
  ---------------------------
diff -ur linux-2.6.35.7/drivers/base/firmware_class.c linux-2.6.35.7.new/drivers/base/firmware_class.c
--- linux-2.6.35.7/drivers/base/firmware_class.c	2010-09-29 02:09:08.000000000 +0100
+++ linux-2.6.35.7.new/drivers/base/firmware_class.c	2010-10-08 17:32:16.298766180 +0100
@@ -168,6 +168,8 @@
 		return -ENOMEM;
 	if (add_uevent_var(env, "ASYNC=%d", fw_priv->nowait))
 		return -ENOMEM;
+	if (add_uevent_var(env, "DEVSTATUS=%d", fw_priv->status))
+		return -ENOMEM;

 	return 0;
 }
@@ -227,6 +229,12 @@
 	case 1:
 		mutex_lock(&fw_lock);
 		if (!fw_priv->fw) {
+			dev_err(dev, "%s: unexpected value (1) after loading %s completed.\n", __func__, fw_priv->fw_id);
+			mutex_unlock(&fw_lock);
+			break;
+		}
+		if (test_bit(FW_STATUS_LOADING, &fw_priv->status)!=0) {
+			dev_err(dev, "%s: unexpected value (1) after while %s loading.\n", __func__, fw_priv->fw_id);
 			mutex_unlock(&fw_lock);
 			break;
 		}
@@ -259,6 +267,7 @@
 			fw_priv->page_array_size = 0;
 			fw_priv->nr_pages = 0;
 			complete(&fw_priv->completion);
+			set_bit(FW_STATUS_DONE, &fw_priv->status);
 			clear_bit(FW_STATUS_LOADING, &fw_priv->status);
 			break;
 		}
@@ -565,7 +574,6 @@

 		kobject_uevent(&f_dev->kobj, KOBJ_ADD);
 		wait_for_completion(&fw_priv->completion);
-		set_bit(FW_STATUS_DONE, &fw_priv->status);
 		del_timer_sync(&fw_priv->timeout);
 	} else
 		wait_for_completion(&fw_priv->completion);

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2010-10-08 20:23 UTC | newest]

Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-10-08 20:13 Firmware loading race condition (+fix) Duncan Simpson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.