* [PATCH 1/4] firewire: fw-sbp2: use device generation, not card generation
2008-01-24 0:52 ` [PATCH 0/4] firewire: order of memory accesses (bus generation vs. node ID) Stefan Richter
@ 2008-01-24 0:53 ` Stefan Richter
2008-01-24 16:04 ` Jarod Wilson
2008-01-24 0:53 ` [PATCH 2/4] firewire: fw-cdev: " Stefan Richter
` (2 subsequent siblings)
3 siblings, 1 reply; 23+ messages in thread
From: Stefan Richter @ 2008-01-24 0:53 UTC (permalink / raw)
To: linux1394-devel
Cc: linux-kernel, Kristian Høgsberg, Jarod Wilson, Nick Piggin
There was a small window where a login or reconnect job could use an
already updated card generation with an outdated node ID. We have to
use the fw_device.generation here, not the fw_card.generation, because
the generation must never be newer than the node ID when we emit a
transaction. This cannot be guaranteed with fw_card.generation.
Furthermore, the target's and initiator's node IDs can be obtained from
fw_device and fw_card. Dereferencing their underlying topology objects
is not necessary.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
Rework of patch "firewire: fw-sbp2: enforce read order of device
generation and node ID" from November 1 2007.
This code also needs barriers to work precisely as intended. They will
be added by a subsequent patch which consistently updates readers and
writers of .generation and .node_id.
drivers/firewire/fw-sbp2.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
Index: linux/drivers/firewire/fw-sbp2.c
===================================================================
--- linux.orig/drivers/firewire/fw-sbp2.c
+++ linux/drivers/firewire/fw-sbp2.c
@@ -671,9 +671,9 @@ static void sbp2_login(struct work_struc
struct sbp2_login_response response;
int generation, node_id, local_node_id;
- generation = device->card->generation;
- node_id = device->node->node_id;
- local_node_id = device->card->local_node->node_id;
+ generation = device->generation;
+ node_id = device->node_id;
+ local_node_id = device->card->node_id;
if (sbp2_send_management_orb(lu, node_id, generation,
SBP2_LOGIN_REQUEST, lu->lun, &response) < 0) {
@@ -921,9 +921,9 @@ static void sbp2_reconnect(struct work_s
struct fw_device *device = fw_device(unit->device.parent);
int generation, node_id, local_node_id;
- generation = device->card->generation;
- node_id = device->node->node_id;
- local_node_id = device->card->local_node->node_id;
+ generation = device->generation;
+ node_id = device->node_id;
+ local_node_id = device->card->node_id;
if (sbp2_send_management_orb(lu, node_id, generation,
SBP2_RECONNECT_REQUEST,
--
Stefan Richter
-=====-==--- ---= ==---
http://arcgraph.de/sr/
^ permalink raw reply [flat|nested] 23+ messages in thread* Re: [PATCH 1/4] firewire: fw-sbp2: use device generation, not card generation
2008-01-24 0:53 ` [PATCH 1/4] firewire: fw-sbp2: use device generation, not card generation Stefan Richter
@ 2008-01-24 16:04 ` Jarod Wilson
0 siblings, 0 replies; 23+ messages in thread
From: Jarod Wilson @ 2008-01-24 16:04 UTC (permalink / raw)
To: Stefan Richter
Cc: linux1394-devel, linux-kernel, Kristian Høgsberg,
Nick Piggin
Stefan Richter wrote:
> There was a small window where a login or reconnect job could use an
> already updated card generation with an outdated node ID. We have to
> use the fw_device.generation here, not the fw_card.generation, because
> the generation must never be newer than the node ID when we emit a
> transaction. This cannot be guaranteed with fw_card.generation.
>
> Furthermore, the target's and initiator's node IDs can be obtained from
> fw_device and fw_card. Dereferencing their underlying topology objects
> is not necessary.
>
> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Verified in concert with parts 2 and 3 as well as with 2, 3 and 4, to fix
'giving up on config rom' issues on multiple system and drive combinations
that were previously affected.
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
--
Jarod Wilson
jwilson@redhat.com
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 2/4] firewire: fw-cdev: use device generation, not card generation
2008-01-24 0:52 ` [PATCH 0/4] firewire: order of memory accesses (bus generation vs. node ID) Stefan Richter
2008-01-24 0:53 ` [PATCH 1/4] firewire: fw-sbp2: use device generation, not card generation Stefan Richter
@ 2008-01-24 0:53 ` Stefan Richter
2008-01-24 16:05 ` Jarod Wilson
2008-01-24 0:54 ` [PATCH 3/4] firewire: enforce access order between generation and node ID Stefan Richter
2008-01-24 0:55 ` [PATCH 4/4] firewire: fw-core: react on bus resets while the config ROM is being fetched Stefan Richter
3 siblings, 1 reply; 23+ messages in thread
From: Stefan Richter @ 2008-01-24 0:53 UTC (permalink / raw)
To: linux1394-devel
Cc: linux-kernel, Kristian Høgsberg, Jarod Wilson, Nick Piggin
We have to use the fw_device.generation here, not the fw_card.generation,
because the generation must never be newer than the node ID when we emit
a transaction. This cannot be guaranteed with fw_card.generation.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
This code also needs barriers to work precisely as intended. They will
be added by a subsequent patch which consistently updates readers and
writers of .generation and .node_id.
drivers/firewire/fw-cdev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: linux/drivers/firewire/fw-cdev.c
===================================================================
--- linux.orig/drivers/firewire/fw-cdev.c
+++ linux/drivers/firewire/fw-cdev.c
@@ -206,12 +206,12 @@ fill_bus_reset_event(struct fw_cdev_even
event->closure = client->bus_reset_closure;
event->type = FW_CDEV_EVENT_BUS_RESET;
+ event->generation = client->device->generation;
event->node_id = client->device->node_id;
event->local_node_id = card->local_node->node_id;
event->bm_node_id = 0; /* FIXME: We don't track the BM. */
event->irm_node_id = card->irm_node->node_id;
event->root_node_id = card->root_node->node_id;
- event->generation = card->generation;
}
static void
--
Stefan Richter
-=====-==--- ---= ==---
http://arcgraph.de/sr/
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/4] firewire: fw-cdev: use device generation, not card generation
2008-01-24 0:53 ` [PATCH 2/4] firewire: fw-cdev: " Stefan Richter
@ 2008-01-24 16:05 ` Jarod Wilson
0 siblings, 0 replies; 23+ messages in thread
From: Jarod Wilson @ 2008-01-24 16:05 UTC (permalink / raw)
To: Stefan Richter
Cc: linux1394-devel, linux-kernel, Kristian Høgsberg,
Nick Piggin
Stefan Richter wrote:
> We have to use the fw_device.generation here, not the fw_card.generation,
> because the generation must never be newer than the node ID when we emit
> a transaction. This cannot be guaranteed with fw_card.generation.
>
> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Verified in concert with parts 1 and 3 as well as with 1, 3 and 4, to fix
'giving up on config rom' issues on multiple system and drive combinations
that were previously affected.
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
--
Jarod Wilson
jwilson@redhat.com
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 3/4] firewire: enforce access order between generation and node ID
2008-01-24 0:52 ` [PATCH 0/4] firewire: order of memory accesses (bus generation vs. node ID) Stefan Richter
2008-01-24 0:53 ` [PATCH 1/4] firewire: fw-sbp2: use device generation, not card generation Stefan Richter
2008-01-24 0:53 ` [PATCH 2/4] firewire: fw-cdev: " Stefan Richter
@ 2008-01-24 0:54 ` Stefan Richter
2008-01-24 4:55 ` Nick Piggin
2008-01-24 16:06 ` Jarod Wilson
2008-01-24 0:55 ` [PATCH 4/4] firewire: fw-core: react on bus resets while the config ROM is being fetched Stefan Richter
3 siblings, 2 replies; 23+ messages in thread
From: Stefan Richter @ 2008-01-24 0:54 UTC (permalink / raw)
To: linux1394-devel
Cc: linux-kernel, Kristian Høgsberg, Jarod Wilson, Nick Piggin
fw_device.node_id and fw_device.generation are accessed without mutexes.
We have to ensure that all readers will get to see node_id updates
before generation updates.
An earlier incarnation of this patch fixes an inability to recognize
devices after "giving up on config rom",
https://bugzilla.redhat.com/show_bug.cgi?id=429950
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
Rework of patches
firewire: fw-core: enforce write order when updating fw_device.generation
and parts of
firewire: fw-core: react on bus resets while the config ROM is being fetched
firewire: fw-sbp2: enforce read order of device generation and node ID
from November 1 2007.
Update:
- write site and read sites folded into one patch
- added fix to fw_device_enable_phys_dma() and fill_bus_reset_event()
- smp_ barriers are sufficient
- comments, changelog
drivers/firewire/fw-cdev.c | 1 +
drivers/firewire/fw-device.c | 13 +++++++++++--
drivers/firewire/fw-device.h | 12 ++++++++++++
drivers/firewire/fw-sbp2.c | 2 ++
drivers/firewire/fw-topology.c | 5 +++++
5 files changed, 31 insertions(+), 2 deletions(-)
Index: linux/drivers/firewire/fw-device.c
===================================================================
--- linux.orig/drivers/firewire/fw-device.c
+++ linux/drivers/firewire/fw-device.c
@@ -182,9 +182,13 @@ static void fw_device_release(struct dev
int fw_device_enable_phys_dma(struct fw_device *device)
{
+ int generation = device->generation;
+
+ /* device->node_id, accessed below, must not be older than generation */
+ smp_rmb();
return device->card->driver->enable_phys_dma(device->card,
device->node_id,
- device->generation);
+ generation);
}
EXPORT_SYMBOL(fw_device_enable_phys_dma);
@@ -389,12 +393,16 @@ static int read_rom(struct fw_device *de
struct read_quadlet_callback_data callback_data;
struct fw_transaction t;
u64 offset;
+ int generation = device->generation;
+
+ /* device->node_id, accessed below, must not be older than generation */
+ smp_rmb();
init_completion(&callback_data.done);
offset = 0xfffff0000400ULL + index * 4;
fw_send_request(device->card, &t, TCODE_READ_QUADLET_REQUEST,
- device->node_id, device->generation, device->max_speed,
+ device->node_id, generation, device->max_speed,
offset, NULL, 4, complete_transaction, &callback_data);
wait_for_completion(&callback_data.done);
@@ -801,6 +809,7 @@ void fw_node_event(struct fw_card *card,
device = node->data;
device->node_id = node->node_id;
+ smp_wmb(); /* update node_id before generation */
device->generation = card->generation;
if (atomic_read(&device->state) == FW_DEVICE_RUNNING) {
PREPARE_DELAYED_WORK(&device->work, fw_device_update);
Index: linux/drivers/firewire/fw-topology.c
===================================================================
--- linux.orig/drivers/firewire/fw-topology.c
+++ linux/drivers/firewire/fw-topology.c
@@ -518,6 +518,11 @@ fw_core_handle_bus_reset(struct fw_card
card->bm_retries = 0;
card->node_id = node_id;
+ /*
+ * Update node_id before generation to prevent anybody from using
+ * a stale node_id togeher with a current generation.
+ */
+ smp_wmb();
card->generation = generation;
card->reset_jiffies = jiffies;
schedule_delayed_work(&card->work, 0);
Index: linux/drivers/firewire/fw-sbp2.c
===================================================================
--- linux.orig/drivers/firewire/fw-sbp2.c
+++ linux/drivers/firewire/fw-sbp2.c
@@ -672,6 +672,7 @@ static void sbp2_login(struct work_struc
int generation, node_id, local_node_id;
generation = device->generation;
+ smp_rmb(); /* node_id must not be older than generation */
node_id = device->node_id;
local_node_id = device->card->node_id;
@@ -922,6 +923,7 @@ static void sbp2_reconnect(struct work_s
int generation, node_id, local_node_id;
generation = device->generation;
+ smp_rmb(); /* node_id must not be older than generation */
node_id = device->node_id;
local_node_id = device->card->node_id;
Index: linux/drivers/firewire/fw-device.h
===================================================================
--- linux.orig/drivers/firewire/fw-device.h
+++ linux/drivers/firewire/fw-device.h
@@ -35,6 +35,18 @@ struct fw_attribute_group {
struct attribute *attrs[11];
};
+/*
+ * Note, fw_device.generation always has to be read before fw_device.node_id.
+ * Use SMP memory barriers to ensure this. Otherwise requests will be sent
+ * to an outdated node_id if the generation was updated in the meantime due
+ * to a bus reset.
+ *
+ * Likewise, fw-core will take care to update .node_id before .generation so
+ * that whenever fw_device.generation is current WRT the actual bus generation,
+ * fw_device.node_id is guaranteed to be current too.
+ *
+ * The same applies to fw_device.card->node_id vs. fw_device.generation.
+ */
struct fw_device {
atomic_t state;
struct fw_node *node;
Index: linux/drivers/firewire/fw-cdev.c
===================================================================
--- linux.orig/drivers/firewire/fw-cdev.c
+++ linux/drivers/firewire/fw-cdev.c
@@ -207,6 +207,7 @@ fill_bus_reset_event(struct fw_cdev_even
event->closure = client->bus_reset_closure;
event->type = FW_CDEV_EVENT_BUS_RESET;
event->generation = client->device->generation;
+ smp_rmb(); /* node_id must not be older than generation */
event->node_id = client->device->node_id;
event->local_node_id = card->local_node->node_id;
event->bm_node_id = 0; /* FIXME: We don't track the BM. */
--
Stefan Richter
-=====-==--- ---= ==---
http://arcgraph.de/sr/
^ permalink raw reply [flat|nested] 23+ messages in thread* Re: [PATCH 3/4] firewire: enforce access order between generation and node ID
2008-01-24 0:54 ` [PATCH 3/4] firewire: enforce access order between generation and node ID Stefan Richter
@ 2008-01-24 4:55 ` Nick Piggin
2008-01-24 16:06 ` Jarod Wilson
1 sibling, 0 replies; 23+ messages in thread
From: Nick Piggin @ 2008-01-24 4:55 UTC (permalink / raw)
To: Stefan Richter
Cc: linux1394-devel, linux-kernel, Kristian Høgsberg,
Jarod Wilson
On Thursday 24 January 2008 11:54, Stefan Richter wrote:
> fw_device.node_id and fw_device.generation are accessed without mutexes.
> We have to ensure that all readers will get to see node_id updates
> before generation updates.
>
> An earlier incarnation of this patch fixes an inability to recognize
> devices after "giving up on config rom",
> https://bugzilla.redhat.com/show_bug.cgi?id=429950
>
> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
> ---
>
> Rework of patches
> firewire: fw-core: enforce write order when updating
> fw_device.generation and parts of
> firewire: fw-core: react on bus resets while the config ROM is being
> fetched firewire: fw-sbp2: enforce read order of device generation and node
> ID from November 1 2007.
>
> Update:
> - write site and read sites folded into one patch
> - added fix to fw_device_enable_phys_dma() and fill_bus_reset_event()
> - smp_ barriers are sufficient
> - comments, changelog
I don't know the firewire subsystem at all, but the barriers seem
right (in that they match your description of the problem), and
comments for them are really good.
Thanks,
Nick
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/4] firewire: enforce access order between generation and node ID
2008-01-24 0:54 ` [PATCH 3/4] firewire: enforce access order between generation and node ID Stefan Richter
2008-01-24 4:55 ` Nick Piggin
@ 2008-01-24 16:06 ` Jarod Wilson
2008-01-25 16:35 ` Stefan Richter
1 sibling, 1 reply; 23+ messages in thread
From: Jarod Wilson @ 2008-01-24 16:06 UTC (permalink / raw)
To: Stefan Richter
Cc: linux1394-devel, linux-kernel, Kristian Høgsberg,
Nick Piggin
Stefan Richter wrote:
> fw_device.node_id and fw_device.generation are accessed without mutexes.
> We have to ensure that all readers will get to see node_id updates
> before generation updates.
>
> An earlier incarnation of this patch fixes an inability to recognize
> devices after "giving up on config rom",
> https://bugzilla.redhat.com/show_bug.cgi?id=429950
>
> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Verified in concert with parts 1 and 2 as well as with 1, 2 and 4, to fix
'giving up on config rom' issues on multiple system and drive combinations
that were previously affected.
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
--
Jarod Wilson
jwilson@redhat.com
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/4] firewire: enforce access order between generation and node ID
2008-01-24 16:06 ` Jarod Wilson
@ 2008-01-25 16:35 ` Stefan Richter
2008-01-25 17:57 ` Stefan Richter
0 siblings, 1 reply; 23+ messages in thread
From: Stefan Richter @ 2008-01-25 16:35 UTC (permalink / raw)
To: Jarod Wilson
Cc: Kristian Høgsberg, Nick Piggin, linux1394-devel,
linux-kernel
I'm going to commit it with a whitespace adjustment and a comment typo
fixed. Jarod, thanks for the rigorous testing; Nick, thanks for
helping in making this a solid improvement over the initial submission.
--
Stefan Richter
-=====-==--- ---= ==--=
http://arcgraph.de/sr/
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/4] firewire: enforce access order between generation and node ID
2008-01-25 16:35 ` Stefan Richter
@ 2008-01-25 17:57 ` Stefan Richter
[not found] ` <59ad55d30801251024j6ff43953tb86aaa52fdea5ec9@mail.gmail.com>
0 siblings, 1 reply; 23+ messages in thread
From: Stefan Richter @ 2008-01-25 17:57 UTC (permalink / raw)
To: Jarod Wilson
Cc: Kristian Høgsberg, Nick Piggin, linux1394-devel,
linux-kernel
Stefan Richter wrote:
> I'm going to commit it with a whitespace adjustment and a comment typo
> fixed.
Also forgot to #include <asm/system.h> in some of the files. Here is
the final product, for the record.
From: Stefan Richter <stefanr@s5r6.in-berlin.de>
Subject: firewire: enforce access order between generation and node ID, fix "giving up on config rom"
fw_device.node_id and fw_device.generation are accessed without mutexes.
We have to ensure that all readers will get to see node_id updates
before generation updates.
Fixes an inability to recognize devices after "giving up on config rom",
https://bugzilla.redhat.com/show_bug.cgi?id=429950
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Reviewed by Nick Piggin <nickpiggin@yahoo.com.au>.
Verified to fix 'giving up on config rom' issues on multiple system and
drive combinations that were previously affected.
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
---
drivers/firewire/fw-cdev.c | 1 +
drivers/firewire/fw-device.c | 15 +++++++++++++--
drivers/firewire/fw-device.h | 12 ++++++++++++
drivers/firewire/fw-sbp2.c | 3 +++
drivers/firewire/fw-topology.c | 6 ++++++
5 files changed, 35 insertions(+), 2 deletions(-)
Index: linux/drivers/firewire/fw-device.c
===================================================================
--- linux.orig/drivers/firewire/fw-device.c
+++ linux/drivers/firewire/fw-device.c
@@ -27,6 +27,7 @@
#include <linux/idr.h>
#include <linux/rwsem.h>
#include <asm/semaphore.h>
+#include <asm/system.h>
#include <linux/ctype.h>
#include "fw-transaction.h"
#include "fw-topology.h"
@@ -182,9 +183,14 @@ static void fw_device_release(struct dev
int fw_device_enable_phys_dma(struct fw_device *device)
{
+ int generation = device->generation;
+
+ /* device->node_id, accessed below, must not be older than generation */
+ smp_rmb();
+
return device->card->driver->enable_phys_dma(device->card,
device->node_id,
- device->generation);
+ generation);
}
EXPORT_SYMBOL(fw_device_enable_phys_dma);
@@ -389,12 +395,16 @@ static int read_rom(struct fw_device *de
struct read_quadlet_callback_data callback_data;
struct fw_transaction t;
u64 offset;
+ int generation = device->generation;
+
+ /* device->node_id, accessed below, must not be older than generation */
+ smp_rmb();
init_completion(&callback_data.done);
offset = 0xfffff0000400ULL + index * 4;
fw_send_request(device->card, &t, TCODE_READ_QUADLET_REQUEST,
- device->node_id, device->generation, device->max_speed,
+ device->node_id, generation, device->max_speed,
offset, NULL, 4, complete_transaction, &callback_data);
wait_for_completion(&callback_data.done);
@@ -801,6 +811,7 @@ void fw_node_event(struct fw_card *card,
device = node->data;
device->node_id = node->node_id;
+ smp_wmb(); /* update node_id before generation */
device->generation = card->generation;
if (atomic_read(&device->state) == FW_DEVICE_RUNNING) {
PREPARE_DELAYED_WORK(&device->work, fw_device_update);
Index: linux/drivers/firewire/fw-topology.c
===================================================================
--- linux.orig/drivers/firewire/fw-topology.c
+++ linux/drivers/firewire/fw-topology.c
@@ -21,6 +21,7 @@
#include <linux/module.h>
#include <linux/wait.h>
#include <linux/errno.h>
+#include <asm/system.h>
#include "fw-transaction.h"
#include "fw-topology.h"
@@ -518,6 +519,11 @@ fw_core_handle_bus_reset(struct fw_card
card->bm_retries = 0;
card->node_id = node_id;
+ /*
+ * Update node_id before generation to prevent anybody from using
+ * a stale node_id together with a current generation.
+ */
+ smp_wmb();
card->generation = generation;
card->reset_jiffies = jiffies;
schedule_delayed_work(&card->work, 0);
Index: linux/drivers/firewire/fw-sbp2.c
===================================================================
--- linux.orig/drivers/firewire/fw-sbp2.c
+++ linux/drivers/firewire/fw-sbp2.c
@@ -40,6 +40,7 @@
#include <linux/stringify.h>
#include <linux/timer.h>
#include <linux/workqueue.h>
+#include <asm/system.h>
#include <scsi/scsi.h>
#include <scsi/scsi_cmnd.h>
@@ -662,6 +663,7 @@ static void sbp2_login(struct work_struc
int generation, node_id, local_node_id;
generation = device->generation;
+ smp_rmb(); /* node_id must not be older than generation */
node_id = device->node_id;
local_node_id = device->card->node_id;
@@ -912,6 +914,7 @@ static void sbp2_reconnect(struct work_s
int generation, node_id, local_node_id;
generation = device->generation;
+ smp_rmb(); /* node_id must not be older than generation */
node_id = device->node_id;
local_node_id = device->card->node_id;
Index: linux/drivers/firewire/fw-device.h
===================================================================
--- linux.orig/drivers/firewire/fw-device.h
+++ linux/drivers/firewire/fw-device.h
@@ -35,6 +35,18 @@ struct fw_attribute_group {
struct attribute *attrs[11];
};
+/*
+ * Note, fw_device.generation always has to be read before fw_device.node_id.
+ * Use SMP memory barriers to ensure this. Otherwise requests will be sent
+ * to an outdated node_id if the generation was updated in the meantime due
+ * to a bus reset.
+ *
+ * Likewise, fw-core will take care to update .node_id before .generation so
+ * that whenever fw_device.generation is current WRT the actual bus generation,
+ * fw_device.node_id is guaranteed to be current too.
+ *
+ * The same applies to fw_device.card->node_id vs. fw_device.generation.
+ */
struct fw_device {
atomic_t state;
struct fw_node *node;
Index: linux/drivers/firewire/fw-cdev.c
===================================================================
--- linux.orig/drivers/firewire/fw-cdev.c
+++ linux/drivers/firewire/fw-cdev.c
@@ -207,6 +207,7 @@ fill_bus_reset_event(struct fw_cdev_even
event->closure = client->bus_reset_closure;
event->type = FW_CDEV_EVENT_BUS_RESET;
event->generation = client->device->generation;
+ smp_rmb(); /* node_id must not be older than generation */
event->node_id = client->device->node_id;
event->local_node_id = card->local_node->node_id;
event->bm_node_id = 0; /* FIXME: We don't track the BM. */
--
Stefan Richter
-=====-==--- ---= ==--=
http://arcgraph.de/sr/
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 4/4] firewire: fw-core: react on bus resets while the config ROM is being fetched
2008-01-24 0:52 ` [PATCH 0/4] firewire: order of memory accesses (bus generation vs. node ID) Stefan Richter
` (2 preceding siblings ...)
2008-01-24 0:54 ` [PATCH 3/4] firewire: enforce access order between generation and node ID Stefan Richter
@ 2008-01-24 0:55 ` Stefan Richter
2008-01-24 16:11 ` Jarod Wilson
3 siblings, 1 reply; 23+ messages in thread
From: Stefan Richter @ 2008-01-24 0:55 UTC (permalink / raw)
To: linux1394-devel
Cc: linux-kernel, Kristian Høgsberg, Jarod Wilson, Nick Piggin
read_rom() obtained a fresh new fw_device.generation for each read
transaction. Hence it was able to continue reading in the middle of the
ROM even if a bus reset happened. However the device may have modified
the ROM during the reset. We would end up with a corrupt fetched ROM
image then.
Although all of this is quite unlikely, it is not impossible.
Therefore we now restart reading the ROM if the bus generation changed.
Side note: The barrier in read_rom(), inserted by patch "firewire:
enforce access order between generation and node ID" is not necessary
anymore because the sequence of calls
fw_device_init() ->
read_bus_info_block() ->
read_rom()
read_rom()
read_rom()
...
will take care that generation is read before node_id, won't it?
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
Refreshed version of the patch from November 1 2007.
drivers/firewire/fw-device.c | 28 +++++++++++++++++-----------
1 file changed, 17 insertions(+), 11 deletions(-)
Index: linux/drivers/firewire/fw-device.c
===================================================================
--- linux.orig/drivers/firewire/fw-device.c
+++ linux/drivers/firewire/fw-device.c
@@ -388,15 +388,12 @@ complete_transaction(struct fw_card *car
complete(&callback_data->done);
}
-static int read_rom(struct fw_device *device, int index, u32 * data)
+static int
+read_rom(struct fw_device *device, int generation, int index, u32 *data)
{
struct read_quadlet_callback_data callback_data;
struct fw_transaction t;
u64 offset;
- int generation = device->generation;
-
- /* device->node_id, accessed below, must not be older than generation */
- smp_rmb();
init_completion(&callback_data.done);
@@ -412,7 +409,14 @@ static int read_rom(struct fw_device *de
return callback_data.rcode;
}
-static int read_bus_info_block(struct fw_device *device)
+/*
+ * Read the bus info block, perform a speed probe, and read all of the rest of
+ * the config ROM. We do all this with a cached bus generation. If the bus
+ * generation changes under us, read_bus_info_block will fail and get retried.
+ * It's better to start all over in this case because the node from which we
+ * are reading the ROM may have changed the ROM during the reset.
+ */
+static int read_bus_info_block(struct fw_device *device, int generation)
{
static u32 rom[256];
u32 stack[16], sp, key;
@@ -422,7 +426,7 @@ static int read_bus_info_block(struct fw
/* First read the bus info block. */
for (i = 0; i < 5; i++) {
- if (read_rom(device, i, &rom[i]) != RCODE_COMPLETE)
+ if (read_rom(device, generation, i, &rom[i]) != RCODE_COMPLETE)
return -1;
/*
* As per IEEE1212 7.2, during power-up, devices can
@@ -457,7 +461,8 @@ static int read_bus_info_block(struct fw
device->max_speed = device->card->link_speed;
while (device->max_speed > SCODE_100) {
- if (read_rom(device, 0, &dummy) == RCODE_COMPLETE)
+ if (read_rom(device, generation, 0, &dummy) ==
+ RCODE_COMPLETE)
break;
device->max_speed--;
}
@@ -490,7 +495,7 @@ static int read_bus_info_block(struct fw
return -1;
/* Read header quadlet for the block to get the length. */
- if (read_rom(device, i, &rom[i]) != RCODE_COMPLETE)
+ if (read_rom(device, generation, i, &rom[i]) != RCODE_COMPLETE)
return -1;
end = i + (rom[i] >> 16) + 1;
i++;
@@ -509,7 +514,8 @@ static int read_bus_info_block(struct fw
* it references another block, and push it in that case.
*/
while (i < end) {
- if (read_rom(device, i, &rom[i]) != RCODE_COMPLETE)
+ if (read_rom(device, generation, i, &rom[i]) !=
+ RCODE_COMPLETE)
return -1;
if ((key >> 30) == 3 && (rom[i] >> 30) > 1 &&
sp < ARRAY_SIZE(stack))
@@ -656,7 +662,7 @@ static void fw_device_init(struct work_s
* device.
*/
- if (read_bus_info_block(device) < 0) {
+ if (read_bus_info_block(device, device->generation) < 0) {
if (device->config_rom_retries < MAX_RETRIES) {
device->config_rom_retries++;
schedule_delayed_work(&device->work, RETRY_DELAY);
--
Stefan Richter
-=====-==--- ---= ==---
http://arcgraph.de/sr/
^ permalink raw reply [flat|nested] 23+ messages in thread* Re: [PATCH 4/4] firewire: fw-core: react on bus resets while the config ROM is being fetched
2008-01-24 0:55 ` [PATCH 4/4] firewire: fw-core: react on bus resets while the config ROM is being fetched Stefan Richter
@ 2008-01-24 16:11 ` Jarod Wilson
2008-01-24 19:26 ` Jarod Wilson
0 siblings, 1 reply; 23+ messages in thread
From: Jarod Wilson @ 2008-01-24 16:11 UTC (permalink / raw)
To: Stefan Richter
Cc: linux1394-devel, linux-kernel, Kristian Høgsberg,
Nick Piggin
Stefan Richter wrote:
> read_rom() obtained a fresh new fw_device.generation for each read
> transaction. Hence it was able to continue reading in the middle of the
> ROM even if a bus reset happened. However the device may have modified
> the ROM during the reset. We would end up with a corrupt fetched ROM
> image then.
>
> Although all of this is quite unlikely, it is not impossible.
> Therefore we now restart reading the ROM if the bus generation changed.
>
> Side note: The barrier in read_rom(), inserted by patch "firewire:
> enforce access order between generation and node ID" is not necessary
> anymore because the sequence of calls
> fw_device_init() ->
> read_bus_info_block() ->
> read_rom()
> read_rom()
> read_rom()
> ...
> will take care that generation is read before node_id, won't it?
>
> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Based on a quick read through the code path, coupled with empirical evidence,
yes, it appears safe to remove the barrier in read_rom().
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
--
Jarod Wilson
jwilson@redhat.com
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 4/4] firewire: fw-core: react on bus resets while the config ROM is being fetched
2008-01-24 16:11 ` Jarod Wilson
@ 2008-01-24 19:26 ` Jarod Wilson
2008-01-25 16:53 ` [PATCH 4/4 update] " Stefan Richter
0 siblings, 1 reply; 23+ messages in thread
From: Jarod Wilson @ 2008-01-24 19:26 UTC (permalink / raw)
To: Stefan Richter
Cc: Kristian Høgsberg, Nick Piggin, linux1394-devel,
linux-kernel
Jarod Wilson wrote:
> Stefan Richter wrote:
>> read_rom() obtained a fresh new fw_device.generation for each read
>> transaction. Hence it was able to continue reading in the middle of the
>> ROM even if a bus reset happened. However the device may have modified
>> the ROM during the reset. We would end up with a corrupt fetched ROM
>> image then.
>>
>> Although all of this is quite unlikely, it is not impossible.
>> Therefore we now restart reading the ROM if the bus generation changed.
>>
>> Side note: The barrier in read_rom(), inserted by patch "firewire:
>> enforce access order between generation and node ID" is not necessary
>> anymore because the sequence of calls
>> fw_device_init() ->
>> read_bus_info_block() ->
>> read_rom()
>> read_rom()
>> read_rom()
>> ...
>> will take care that generation is read before node_id, won't it?
>>
>> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
>
> Based on a quick read through the code path, coupled with empirical evidence,
> yes, it appears safe to remove the barrier in read_rom().
Crap. My earlier 'empirical evidence' seems to have been happy coincidence.
After a fresh boot, I'm consistently hitting 'failed to read config rom'
failures with this patch applied. Simply putting the barrier back in gets
things working again though. Interestingly, subsequent reloading of firewire-*
modules less the barrier also tend to work until the system is rebooted.
--
Jarod Wilson
jwilson@redhat.com
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 4/4 update] firewire: fw-core: react on bus resets while the config ROM is being fetched
2008-01-24 19:26 ` Jarod Wilson
@ 2008-01-25 16:53 ` Stefan Richter
2008-01-25 17:16 ` Jarod Wilson
0 siblings, 1 reply; 23+ messages in thread
From: Stefan Richter @ 2008-01-25 16:53 UTC (permalink / raw)
To: Jarod Wilson
Cc: Kristian Høgsberg, Nick Piggin, linux1394-devel,
linux-kernel
read_rom() obtained a fresh new fw_device.generation for each read
transaction. Hence it was able to continue reading in the middle of the
ROM even if a bus reset happened. However the device may have modified
the ROM during the reset. We would end up with a corrupt fetched ROM
image then.
Although all of this is quite unlikely, it is not impossible.
Therefore we now restart reading the ROM if the bus generation changed.
Note, the memory barrier in read_rom() is still necessary according to
tests by Jarod Wilson, despite of the ->generation access being moved up
in the call chain.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
First posted on 2007-11-01. Update: Barrier stays.
drivers/firewire/fw-device.c | 25 +++++++++++++++++--------
1 file changed, 17 insertions(+), 8 deletions(-)
Index: linux-2.6.24/drivers/firewire/fw-device.c
===================================================================
--- linux-2.6.24.orig/drivers/firewire/fw-device.c
+++ linux-2.6.24/drivers/firewire/fw-device.c
@@ -389,12 +389,12 @@ complete_transaction(struct fw_card *car
complete(&callback_data->done);
}
-static int read_rom(struct fw_device *device, int index, u32 * data)
+static int
+read_rom(struct fw_device *device, int generation, int index, u32 *data)
{
struct read_quadlet_callback_data callback_data;
struct fw_transaction t;
u64 offset;
- int generation = device->generation;
/* device->node_id, accessed below, must not be older than generation */
smp_rmb();
@@ -413,7 +413,14 @@ static int read_rom(struct fw_device *de
return callback_data.rcode;
}
-static int read_bus_info_block(struct fw_device *device)
+/*
+ * Read the bus info block, perform a speed probe, and read all of the rest of
+ * the config ROM. We do all this with a cached bus generation. If the bus
+ * generation changes under us, read_bus_info_block will fail and get retried.
+ * It's better to start all over in this case because the node from which we
+ * are reading the ROM may have changed the ROM during the reset.
+ */
+static int read_bus_info_block(struct fw_device *device, int generation)
{
static u32 rom[256];
u32 stack[16], sp, key;
@@ -423,7 +430,7 @@ static int read_bus_info_block(struct fw
/* First read the bus info block. */
for (i = 0; i < 5; i++) {
- if (read_rom(device, i, &rom[i]) != RCODE_COMPLETE)
+ if (read_rom(device, generation, i, &rom[i]) != RCODE_COMPLETE)
return -1;
/*
* As per IEEE1212 7.2, during power-up, devices can
@@ -458,7 +465,8 @@ static int read_bus_info_block(struct fw
device->max_speed = device->card->link_speed;
while (device->max_speed > SCODE_100) {
- if (read_rom(device, 0, &dummy) == RCODE_COMPLETE)
+ if (read_rom(device, generation, 0, &dummy) ==
+ RCODE_COMPLETE)
break;
device->max_speed--;
}
@@ -491,7 +499,7 @@ static int read_bus_info_block(struct fw
return -1;
/* Read header quadlet for the block to get the length. */
- if (read_rom(device, i, &rom[i]) != RCODE_COMPLETE)
+ if (read_rom(device, generation, i, &rom[i]) != RCODE_COMPLETE)
return -1;
end = i + (rom[i] >> 16) + 1;
i++;
@@ -510,7 +518,8 @@ static int read_bus_info_block(struct fw
* it references another block, and push it in that case.
*/
while (i < end) {
- if (read_rom(device, i, &rom[i]) != RCODE_COMPLETE)
+ if (read_rom(device, generation, i, &rom[i]) !=
+ RCODE_COMPLETE)
return -1;
if ((key >> 30) == 3 && (rom[i] >> 30) > 1 &&
sp < ARRAY_SIZE(stack))
@@ -657,7 +666,7 @@ static void fw_device_init(struct work_s
* device.
*/
- if (read_bus_info_block(device) < 0) {
+ if (read_bus_info_block(device, device->generation) < 0) {
if (device->config_rom_retries < MAX_RETRIES) {
device->config_rom_retries++;
schedule_delayed_work(&device->work, RETRY_DELAY);
--
Stefan Richter
-=====-==--- ---= ==--=
http://arcgraph.de/sr/
^ permalink raw reply [flat|nested] 23+ messages in thread* Re: [PATCH 4/4 update] firewire: fw-core: react on bus resets while the config ROM is being fetched
2008-01-25 16:53 ` [PATCH 4/4 update] " Stefan Richter
@ 2008-01-25 17:16 ` Jarod Wilson
2008-01-25 17:21 ` Stefan Richter
0 siblings, 1 reply; 23+ messages in thread
From: Jarod Wilson @ 2008-01-25 17:16 UTC (permalink / raw)
To: Stefan Richter
Cc: Kristian Høgsberg, Nick Piggin, linux1394-devel,
linux-kernel
On Friday 25 January 2008 11:53:49 am Stefan Richter wrote:
> read_rom() obtained a fresh new fw_device.generation for each read
> transaction. Hence it was able to continue reading in the middle of the
> ROM even if a bus reset happened. However the device may have modified
> the ROM during the reset. We would end up with a corrupt fetched ROM
> image then.
>
> Although all of this is quite unlikely, it is not impossible.
> Therefore we now restart reading the ROM if the bus generation changed.
>
> Note, the memory barrier in read_rom() is still necessary according to
> tests by Jarod Wilson, despite of the ->generation access being moved up
> in the call chain.
>
> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
> ---
>
> First posted on 2007-11-01. Update: Barrier stays.
This is essentially what I've been beating on locally, and I've yet to hit
another config rom read failure with it.
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
--
Jarod Wilson
jwilson@redhat.com
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 4/4 update] firewire: fw-core: react on bus resets while the config ROM is being fetched
2008-01-25 17:16 ` Jarod Wilson
@ 2008-01-25 17:21 ` Stefan Richter
0 siblings, 0 replies; 23+ messages in thread
From: Stefan Richter @ 2008-01-25 17:21 UTC (permalink / raw)
To: Jarod Wilson
Cc: Kristian Høgsberg, Nick Piggin, linux1394-devel,
linux-kernel
Jarod Wilson wrote:
> On Friday 25 January 2008 11:53:49 am Stefan Richter wrote:
>> Update: Barrier stays.
>
> This is essentially what I've been beating on locally, and I've yet to hit
> another config rom read failure with it.
>
> Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Thanks. I already pushed to linux1394-2.6.git but I will add your
sign-off before I ask Linus to pull.
--
Stefan Richter
-=====-==--- ---= ==--=
http://arcgraph.de/sr/
^ permalink raw reply [flat|nested] 23+ messages in thread