[PATCH 2.6.18-rc4 0/5] drivers/ieee1394/sbp2: fixes for 2.6.18-rc

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 2.6.18-rc4 0/5] drivers/ieee1394/sbp2: fixes for 2.6.18-rc
@ 2006-08-27 11:25 Stefan Richter
  2006-08-27 11:26 ` [PATCH 2.6.18-rc4 1/5] ieee1394: sbp2: workaround for write protect bit of Initio firmware Stefan Richter
                   ` (7 more replies)
  0 siblings, 8 replies; 15+ messages in thread
From: Stefan Richter @ 2006-08-27 11:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ben Collins, linux-kernel

There are a few bug fixes for sbp2 in -mm. Here is a selection of those
which I consider important enough and/or simple enough for inclusion
into 2.6.18-rc:

1/5 ieee1394: sbp2: workaround for write protect bit of Initio firmware
    fix bug #6947 (workaround for a device bug, not a Linux bug per se)

2/5 ieee1394: sbp2: safer agent reset in error handlers
    simple and obvious improvement

3/5 ieee1394: sbp2: safer last_orb and next_ORB handling
    prerequisite for patch 5/5, and more

4/5 ieee1394: sbp2: discard return value of sbp2_link_orb_command
    prerequisite for patch 5/5

5/5 ieee1394: sbp2: handle "sbp2util_node_write_no_wait failed"
    fix bug #6948

combined diffstat:
 drivers/ieee1394/sbp2.c |  186 +++++++++++++++++++++++++++-------------
 drivers/ieee1394/sbp2.h |   15 +--
 2 files changed, 133 insertions(+), 68 deletions(-)
-- 
Stefan Richter
-=====-=-==- =--- ==-==
http://arcgraph.de/sr/


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2.6.18-rc4 1/5] ieee1394: sbp2: workaround for write protect bit of Initio firmware
  2006-08-27 11:25 [PATCH 2.6.18-rc4 0/5] drivers/ieee1394/sbp2: fixes for 2.6.18-rc Stefan Richter
@ 2006-08-27 11:26 ` Stefan Richter
  2006-08-27 20:17   ` Linus Torvalds
  2006-08-27 11:27 ` [PATCH 2.6.18-rc4 2/5] ieee1394: sbp2: safer agent reset in error handlers Stefan Richter
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: Stefan Richter @ 2006-08-27 11:26 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ben Collins, linux-kernel

Yet another mode pages related bug of Initio firmwares was seen.
INIC-1530 with a firmware by Initio responded with garbage to MODE SENSE
(10).  Some HDDs were therefore incorrectly marked as write protected:
http://bugzilla.kernel.org/show_bug.cgi?id=6947

Sbp2 now tells scsi_lib to use MODE SENSE (6) for the one known
defective model.  The workaround could be expanded to other or perhaps
even all model IDs of Initio firmwares if necessary.  At least it worked
OK with an INIC-2430 with different model ID and without the MS(10) bug.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
Index: linux-2.6.18-rc4/drivers/ieee1394/sbp2.h
===================================================================
--- linux-2.6.18-rc4.orig/drivers/ieee1394/sbp2.h	2006-08-27 12:27:55.000000000 +0200
+++ linux-2.6.18-rc4/drivers/ieee1394/sbp2.h	2006-08-27 12:35:40.000000000 +0200
@@ -237,8 +237,9 @@ struct sbp2_status_block {
 /* Flags for detected oddities and brokeness */
 #define SBP2_WORKAROUND_128K_MAX_TRANS	0x1
 #define SBP2_WORKAROUND_INQUIRY_36	0x2
-#define SBP2_WORKAROUND_MODE_SENSE_8	0x4
+#define SBP2_WORKAROUND_SKIP_PAGE_08	0x4
 #define SBP2_WORKAROUND_FIX_CAPACITY	0x8
+#define SBP2_WORKAROUND_MODE_SENSE_6	0x10
 #define SBP2_WORKAROUND_OVERRIDE	0x100
 
 /* This is the two dma types we use for cmd_dma below */
Index: linux-2.6.18-rc4/drivers/ieee1394/sbp2.c
===================================================================
--- linux-2.6.18-rc4.orig/drivers/ieee1394/sbp2.c	2006-08-27 12:27:59.000000000 +0200
+++ linux-2.6.18-rc4/drivers/ieee1394/sbp2.c	2006-08-27 12:35:40.000000000 +0200
@@ -168,8 +168,9 @@ module_param_named(workarounds, sbp2_def
 MODULE_PARM_DESC(workarounds, "Work around device bugs (default = 0"
 	", 128kB max transfer = " __stringify(SBP2_WORKAROUND_128K_MAX_TRANS)
 	", 36 byte inquiry = "    __stringify(SBP2_WORKAROUND_INQUIRY_36)
-	", skip mode page 8 = "   __stringify(SBP2_WORKAROUND_MODE_SENSE_8)
+	", skip mode page 08 = "  __stringify(SBP2_WORKAROUND_SKIP_PAGE_08)
 	", fix capacity = "       __stringify(SBP2_WORKAROUND_FIX_CAPACITY)
+	", use mode sense 6 = "   __stringify(SBP2_WORKAROUND_MODE_SENSE_6)
 	", override internal blacklist = " __stringify(SBP2_WORKAROUND_OVERRIDE)
 	", or a combination)");
 
@@ -301,6 +302,10 @@ static struct hpsb_protocol_driver sbp2_
  * The firmware_revision field, masked with 0xffff00, is the best indicator
  * for the type of bridge chip of a device.  It yields a few false positives
  * but this did not break correctly behaving devices so far.
+ *
+ * The order of table entries is from special to general, like for example
+ * the Initio entries.  This order is necessary because once an entry matches,
+ * the rest of the table is skipped.
  */
 static const struct {
 	u32 firmware_revision;
@@ -311,7 +316,12 @@ static const struct {
 		.firmware_revision	= 0x002800,
 		.model_id		= 0x001010,
 		.workarounds		= SBP2_WORKAROUND_INQUIRY_36 |
-					  SBP2_WORKAROUND_MODE_SENSE_8,
+					  SBP2_WORKAROUND_SKIP_PAGE_08,
+	},
+	/* Initio INIC-1530 with a firmware apparently from Initio */ {
+		.firmware_revision	= 0x000200,
+		.model_id		= 0x000540,
+		.workarounds		= SBP2_WORKAROUND_MODE_SENSE_6,
 	},
 	/* Initio bridges, actually only needed for some older ones */ {
 		.firmware_revision	= 0x000200,
@@ -2511,10 +2521,12 @@ static int sbp2scsi_slave_configure(stru
 	sdev->use_10_for_ms = 1;
 
 	if (sdev->type == TYPE_DISK &&
-	    scsi_id->workarounds & SBP2_WORKAROUND_MODE_SENSE_8)
+	    scsi_id->workarounds & SBP2_WORKAROUND_SKIP_PAGE_08)
 		sdev->skip_ms_page_8 = 1;
 	if (scsi_id->workarounds & SBP2_WORKAROUND_FIX_CAPACITY)
 		sdev->fix_capacity = 1;
+	if (scsi_id->workarounds & SBP2_WORKAROUND_MODE_SENSE_6)
+		sdev->use_10_for_ms = 0;
 	if (scsi_id->ne->guid_vendor_id == 0x0010b9 && /* Maxtor's OUI */
 	    (sdev->type == TYPE_DISK || sdev->type == TYPE_RBC))
 		sdev->allow_restart = 1;



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2.6.18-rc4 2/5] ieee1394: sbp2: safer agent reset in error handlers
  2006-08-27 11:25 [PATCH 2.6.18-rc4 0/5] drivers/ieee1394/sbp2: fixes for 2.6.18-rc Stefan Richter
  2006-08-27 11:26 ` [PATCH 2.6.18-rc4 1/5] ieee1394: sbp2: workaround for write protect bit of Initio firmware Stefan Richter
@ 2006-08-27 11:27 ` Stefan Richter
  2006-08-27 11:28 ` [PATCH 2.6.18-rc4 3/5] ieee1394: sbp2: safer last_orb and next_ORB handling Stefan Richter
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Stefan Richter @ 2006-08-27 11:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ben Collins, linux-kernel

The scsi_host_template's eh_abort_handler and eh_device_reset_handler
are allowed to sleep.  Use this to run sbp2_agent_reset in the more
reliable mode which returns _after_ its write transaction was finished.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
Index: linux-2.6.18-rc4/drivers/ieee1394/sbp2.c
===================================================================
--- linux-2.6.18-rc4.orig/drivers/ieee1394/sbp2.c	2006-08-27 12:35:40.000000000 +0200
+++ linux-2.6.18-rc4/drivers/ieee1394/sbp2.c	2006-08-27 12:36:50.000000000 +0200
@@ -2583,7 +2583,7 @@ static int sbp2scsi_abort(struct scsi_cm
 		/*
 		 * Initiate a fetch agent reset.
 		 */
-		sbp2_agent_reset(scsi_id, 0);
+		sbp2_agent_reset(scsi_id, 1);
 		sbp2scsi_complete_all_commands(scsi_id, DID_BUS_BUSY);
 	}
 
@@ -2602,7 +2602,7 @@ static int sbp2scsi_reset(struct scsi_cm
 
 	if (sbp2util_node_is_available(scsi_id)) {
 		SBP2_ERR("Generating sbp2 fetch agent reset");
-		sbp2_agent_reset(scsi_id, 0);
+		sbp2_agent_reset(scsi_id, 1);
 	}
 
 	return SUCCESS;



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2.6.18-rc4 3/5] ieee1394: sbp2: safer last_orb and next_ORB handling
  2006-08-27 11:25 [PATCH 2.6.18-rc4 0/5] drivers/ieee1394/sbp2: fixes for 2.6.18-rc Stefan Richter
  2006-08-27 11:26 ` [PATCH 2.6.18-rc4 1/5] ieee1394: sbp2: workaround for write protect bit of Initio firmware Stefan Richter
  2006-08-27 11:27 ` [PATCH 2.6.18-rc4 2/5] ieee1394: sbp2: safer agent reset in error handlers Stefan Richter
@ 2006-08-27 11:28 ` Stefan Richter
  2006-08-27 11:29 ` [PATCH 2.6.18-rc4 4/5] ieee1394: sbp2: discard return value of sbp2_link_orb_command Stefan Richter
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Stefan Richter @ 2006-08-27 11:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ben Collins, linux-kernel

The sbp2 initiator has two ways to tell a target's fetch agent about new
command ORBs:
 - Write the ORB's address to the ORB_POINTER register.  This must not
   be done while the fetch agent is active.
 - Put the ORB's address into the previously submitted ORB's next_ORB
   field and write to the DOORBELL register.  This may be done while the
   fetch agent is active or suspended.  It must not be done while the
   fetch agent is in reset state.
Sbp2 has a last_orb pointer which indicates in what way a new command
should be announced.  That pointer is concurrently accessed at various
occasions.  Furthermore, initiator and target are accessing the next_ORB
field of ORBs concurrently and asynchronously.

This patch does:
 - Protect all initiator accesses to last_orb by sbp2_command_orb_lock.
 - Add pci_dma_sync_single_for_device before a previously submitted
   ORB's next_ORB field is overwritten.
 - Insert a memory barrier between when next_ORB_lo and next_ORB_hi are
   overwritten.  Next_ORB_hi must not be updated before next_ORB_lo.
 - Remove the rather unspecific and now superfluous qualifier "volatile"
   from the next_ORB fields.
 - Add comments on how last_orb is connected with what is known about
   the target's fetch agent's state.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
Index: linux-2.6.18-rc4/drivers/ieee1394/sbp2.h
===================================================================
--- linux-2.6.18-rc4.orig/drivers/ieee1394/sbp2.h	2006-08-27 12:35:40.000000000 +0200
+++ linux-2.6.18-rc4/drivers/ieee1394/sbp2.h	2006-08-27 12:37:13.000000000 +0200
@@ -46,8 +46,8 @@
 #define ORB_SET_DIRECTION(value)		((value & 0x1) << 27)
 
 struct sbp2_command_orb {
-	volatile u32 next_ORB_hi;
-	volatile u32 next_ORB_lo;
+	u32 next_ORB_hi;
+	u32 next_ORB_lo;
 	u32 data_descriptor_hi;
 	u32 data_descriptor_lo;
 	u32 misc;
Index: linux-2.6.18-rc4/drivers/ieee1394/sbp2.c
===================================================================
--- linux-2.6.18-rc4.orig/drivers/ieee1394/sbp2.c	2006-08-27 12:36:50.000000000 +0200
+++ linux-2.6.18-rc4/drivers/ieee1394/sbp2.c	2006-08-27 12:37:13.000000000 +0200
@@ -1715,6 +1715,7 @@ static int sbp2_agent_reset(struct scsi_
 	quadlet_t data;
 	u64 addr;
 	int retval;
+	unsigned long flags;
 
 	SBP2_DEBUG_ENTER();
 
@@ -1734,7 +1735,9 @@ static int sbp2_agent_reset(struct scsi_
 	/*
 	 * Need to make sure orb pointer is written on next command
 	 */
+	spin_lock_irqsave(&scsi_id->sbp2_command_orb_lock, flags);
 	scsi_id->last_orb = NULL;
+	spin_unlock_irqrestore(&scsi_id->sbp2_command_orb_lock, flags);
 
 	return 0;
 }
@@ -1976,8 +1979,12 @@ static int sbp2_link_orb_command(struct 
 {
 	struct sbp2scsi_host_info *hi = scsi_id->hi;
 	struct sbp2_command_orb *command_orb = &command->command_orb;
-	struct node_entry *ne = scsi_id->ne;
-	u64 addr;
+	struct sbp2_command_orb *last_orb;
+	dma_addr_t last_orb_dma;
+	u64 addr = scsi_id->sbp2_command_block_agent_addr;
+	quadlet_t data[2];
+	size_t length;
+	unsigned long flags;
 
 	outstanding_orb_incr;
 	SBP2_ORB_DEBUG("sending command orb %p, total orbs = %x",
@@ -1992,64 +1999,50 @@ static int sbp2_link_orb_command(struct 
 	/*
 	 * Check to see if there are any previous orbs to use
 	 */
-	if (scsi_id->last_orb == NULL) {
-		quadlet_t data[2];
-
+	spin_lock_irqsave(&scsi_id->sbp2_command_orb_lock, flags);
+	last_orb = scsi_id->last_orb;
+	last_orb_dma = scsi_id->last_orb_dma;
+	if (!last_orb) {
 		/*
-		 * Ok, let's write to the target's management agent register
+		 * last_orb == NULL means: We know that the target's fetch agent
+		 * is not active right now.
 		 */
-		addr = scsi_id->sbp2_command_block_agent_addr + SBP2_ORB_POINTER_OFFSET;
+		addr += SBP2_ORB_POINTER_OFFSET;
 		data[0] = ORB_SET_NODE_ID(hi->host->node_id);
 		data[1] = command->command_orb_dma;
 		sbp2util_cpu_to_be32_buffer(data, 8);
-
-		SBP2_ORB_DEBUG("write command agent, command orb %p", command_orb);
-
-		if (sbp2util_node_write_no_wait(ne, addr, data, 8) < 0) {
-			SBP2_ERR("sbp2util_node_write_no_wait failed.\n");
-			return -EIO;
-		}
-
-		SBP2_ORB_DEBUG("write command agent complete");
-
-		scsi_id->last_orb = command_orb;
-		scsi_id->last_orb_dma = command->command_orb_dma;
-
+		length = 8;
 	} else {
-		quadlet_t data;
-
 		/*
-		 * We have an orb already sent (maybe or maybe not
-		 * processed) that we can append this orb to. So do so,
-		 * and ring the doorbell. Have to be very careful
-		 * modifying these next orb pointers, as they are accessed
-		 * both by the sbp2 device and us.
+		 * last_orb != NULL means: We know that the target's fetch agent
+		 * is (very probably) not dead or in reset state right now.
+		 * We have an ORB already sent that we can append a new one to.
+		 * The target's fetch agent may or may not have read this
+		 * previous ORB yet.
 		 */
-		scsi_id->last_orb->next_ORB_lo =
-		    cpu_to_be32(command->command_orb_dma);
+		pci_dma_sync_single_for_cpu(hi->host->pdev, last_orb_dma,
+					    sizeof(struct sbp2_command_orb),
+					    PCI_DMA_BIDIRECTIONAL);
+		last_orb->next_ORB_lo = cpu_to_be32(command->command_orb_dma);
+		wmb();
 		/* Tells hardware that this pointer is valid */
-		scsi_id->last_orb->next_ORB_hi = 0x0;
-		pci_dma_sync_single_for_device(hi->host->pdev,
-					       scsi_id->last_orb_dma,
+		last_orb->next_ORB_hi = 0;
+		pci_dma_sync_single_for_device(hi->host->pdev, last_orb_dma,
 					       sizeof(struct sbp2_command_orb),
 					       PCI_DMA_BIDIRECTIONAL);
+		addr += SBP2_DOORBELL_OFFSET;
+		data[0] = 0;
+		length = 4;
+	}
+	scsi_id->last_orb = command_orb;
+	scsi_id->last_orb_dma = command->command_orb_dma;
+	spin_unlock_irqrestore(&scsi_id->sbp2_command_orb_lock, flags);
 
-		/*
-		 * Ring the doorbell
-		 */
-		data = cpu_to_be32(command->command_orb_dma);
-		addr = scsi_id->sbp2_command_block_agent_addr + SBP2_DOORBELL_OFFSET;
-
-		SBP2_ORB_DEBUG("ring doorbell, command orb %p", command_orb);
-
-		if (sbp2util_node_write_no_wait(ne, addr, &data, 4) < 0) {
-			SBP2_ERR("sbp2util_node_write_no_wait failed");
-			return -EIO;
-		}
-
-		scsi_id->last_orb = command_orb;
-		scsi_id->last_orb_dma = command->command_orb_dma;
-
+	SBP2_ORB_DEBUG("write to %s register, command orb %p",
+			last_orb ? "DOORBELL" : "ORB_POINTER", command_orb);
+	if (sbp2util_node_write_no_wait(scsi_id->ne, addr, data, length) < 0) {
+		SBP2_ERR("sbp2util_node_write_no_wait failed.\n");
+		return -EIO;
 	}
 	return 0;
 }
@@ -2241,14 +2234,16 @@ static int sbp2_handle_status_write(stru
 		}
 
 		/*
-		 * Check here to see if there are no commands in-use. If there are none, we can
-		 * null out last orb so that next time around we write directly to the orb pointer...
-		 * Quick start saves one 1394 bus transaction.
+		 * Check here to see if there are no commands in-use. If there
+		 * are none, we know that the fetch agent left the active state
+		 * _and_ that we did not reactivate it yet. Therefore clear
+		 * last_orb so that next time we write directly to the
+		 * ORB_POINTER register. That way the fetch agent does not need
+		 * to refetch the next_ORB.
 		 */
 		spin_lock_irqsave(&scsi_id->sbp2_command_orb_lock, flags);
-		if (list_empty(&scsi_id->sbp2_command_orb_inuse)) {
+		if (list_empty(&scsi_id->sbp2_command_orb_inuse))
 			scsi_id->last_orb = NULL;
-		}
 		spin_unlock_irqrestore(&scsi_id->sbp2_command_orb_lock, flags);
 
 	} else {



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2.6.18-rc4 4/5] ieee1394: sbp2: discard return value of sbp2_link_orb_command
  2006-08-27 11:25 [PATCH 2.6.18-rc4 0/5] drivers/ieee1394/sbp2: fixes for 2.6.18-rc Stefan Richter
                   ` (2 preceding siblings ...)
  2006-08-27 11:28 ` [PATCH 2.6.18-rc4 3/5] ieee1394: sbp2: safer last_orb and next_ORB handling Stefan Richter
@ 2006-08-27 11:29 ` Stefan Richter
  2006-08-27 11:31 ` [PATCH 2.6.18-rc4 5/5] ieee1394: sbp2: handle "sbp2util_node_write_no_wait failed" Stefan Richter
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Stefan Richter @ 2006-08-27 11:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ben Collins, linux-kernel

Since sbp2 is at the moment unable to do anything with the return value
of sbp2_link_orb_command, just discard it.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
Index: linux-2.6.18-rc4/drivers/ieee1394/sbp2.c
===================================================================
--- linux-2.6.18-rc4.orig/drivers/ieee1394/sbp2.c	2006-08-27 12:37:13.000000000 +0200
+++ linux-2.6.18-rc4/drivers/ieee1394/sbp2.c	2006-08-27 12:37:44.000000000 +0200
@@ -1974,7 +1974,7 @@ static void sbp2_create_command_orb(stru
 /*
  * This function is called in order to begin a regular SBP-2 command.
  */
-static int sbp2_link_orb_command(struct scsi_id_instance_data *scsi_id,
+static void sbp2_link_orb_command(struct scsi_id_instance_data *scsi_id,
 				 struct sbp2_command_info *command)
 {
 	struct sbp2scsi_host_info *hi = scsi_id->hi;
@@ -2040,11 +2040,9 @@ static int sbp2_link_orb_command(struct 
 
 	SBP2_ORB_DEBUG("write to %s register, command orb %p",
 			last_orb ? "DOORBELL" : "ORB_POINTER", command_orb);
-	if (sbp2util_node_write_no_wait(scsi_id->ne, addr, data, length) < 0) {
+	if (sbp2util_node_write_no_wait(scsi_id->ne, addr, data, length))
 		SBP2_ERR("sbp2util_node_write_no_wait failed.\n");
-		return -EIO;
-	}
-	return 0;
+	/* We rely on SCSI EH to deal with _node_write_ failures. */
 }
 
 /*
Index: linux-2.6.18-rc4/drivers/ieee1394/sbp2.h
===================================================================
--- linux-2.6.18-rc4.orig/drivers/ieee1394/sbp2.h	2006-08-27 12:37:13.000000000 +0200
+++ linux-2.6.18-rc4/drivers/ieee1394/sbp2.h	2006-08-27 12:37:44.000000000 +0200
@@ -391,11 +391,6 @@ static int sbp2_logout_device(struct scs
 static int sbp2_handle_status_write(struct hpsb_host *host, int nodeid, int destid,
 				    quadlet_t *data, u64 addr, size_t length, u16 flags);
 static int sbp2_agent_reset(struct scsi_id_instance_data *scsi_id, int wait);
-static int sbp2_link_orb_command(struct scsi_id_instance_data *scsi_id,
-				 struct sbp2_command_info *command);
-static int sbp2_send_command(struct scsi_id_instance_data *scsi_id,
-			     struct scsi_cmnd *SCpnt,
-			     void (*done)(struct scsi_cmnd *));
 static unsigned int sbp2_status_to_sense_data(unchar *sbp2_status,
 					      unchar *sense_data);
 static void sbp2_parse_unit_directory(struct scsi_id_instance_data *scsi_id,



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2.6.18-rc4 5/5] ieee1394: sbp2: handle "sbp2util_node_write_no_wait failed"
  2006-08-27 11:25 [PATCH 2.6.18-rc4 0/5] drivers/ieee1394/sbp2: fixes for 2.6.18-rc Stefan Richter
                   ` (3 preceding siblings ...)
  2006-08-27 11:29 ` [PATCH 2.6.18-rc4 4/5] ieee1394: sbp2: discard return value of sbp2_link_orb_command Stefan Richter
@ 2006-08-27 11:31 ` Stefan Richter
  2006-08-27 11:34 ` [PATCH 2.6.18-rc4 3/5] ieee1394: sbp2: safer last_orb and next_ORB handling Stefan Richter
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Stefan Richter @ 2006-08-27 11:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ben Collins, linux-kernel

Fix for http://bugzilla.kernel.org/show_bug.cgi?id=6948

Because sbp2 writes to the target's fetch agent's registers from within
atomic context, it cannot sleep to guaranteedly get a free transaction
label.  This may repeatedly lead to "sbp2util_node_write_no_wait failed"
and consequently to SCSI command abortion after timeout.  A likely cause
is that many queue_command softirqs may occur before khpsbpkt (the
ieee1394 driver's thread which cleans up after finished transactions) is
woken up to recycle tlabels.

Sbp2 now schedules a workqueue job whenever sbp2_link_orb_command fails
in sbp2util_node_write_no_wait.  The job will reliably get a transaction
label because it can sleep.

We use the kernel-wide shared workqueue because it is unlikely that the
job itself actually needs to sleep.  In the improbable case that it has
to sleep, it doesn't need to sleep long since the standard transaction
timeout is 100ms.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
Index: linux-2.6.18-rc4/drivers/ieee1394/sbp2.h
===================================================================
--- linux-2.6.18-rc4.orig/drivers/ieee1394/sbp2.h	2006-08-27 12:37:44.000000000 +0200
+++ linux-2.6.18-rc4/drivers/ieee1394/sbp2.h	2006-08-27 12:38:02.000000000 +0200
@@ -345,6 +345,9 @@ struct scsi_id_instance_data {
 
 	/* Device specific workarounds/brokeness */
 	unsigned workarounds;
+
+	atomic_t unfinished_reset;
+	struct work_struct protocol_work;
 };
 
 /* Sbp2 host data structure (one per IEEE1394 host) */
Index: linux-2.6.18-rc4/drivers/ieee1394/sbp2.c
===================================================================
--- linux-2.6.18-rc4.orig/drivers/ieee1394/sbp2.c	2006-08-27 12:37:44.000000000 +0200
+++ linux-2.6.18-rc4/drivers/ieee1394/sbp2.c	2006-08-27 12:38:02.000000000 +0200
@@ -478,6 +478,44 @@ static int sbp2util_node_write_no_wait(s
 	return 0;
 }
 
+static void sbp2util_notify_fetch_agent(struct scsi_id_instance_data *scsi_id,
+					u64 offset, quadlet_t *data, size_t len)
+{
+	/*
+	 * There is a small window after a bus reset within which the node
+	 * entry's generation is current but the reconnect wasn't completed.
+	 */
+	if (atomic_read(&scsi_id->unfinished_reset))
+		return;
+
+	if (hpsb_node_write(scsi_id->ne,
+			    scsi_id->sbp2_command_block_agent_addr + offset,
+			    data, len))
+		SBP2_ERR("sbp2util_notify_fetch_agent failed.");
+	/*
+	 * Now accept new SCSI commands, unless a bus reset happended during
+	 * hpsb_node_write.
+	 */
+	if (!atomic_read(&scsi_id->unfinished_reset))
+		scsi_unblock_requests(scsi_id->scsi_host);
+}
+
+static void sbp2util_write_orb_pointer(void *p)
+{
+	quadlet_t data[2];
+
+	data[0] = ORB_SET_NODE_ID(
+			((struct scsi_id_instance_data *)p)->hi->host->node_id);
+	data[1] = ((struct scsi_id_instance_data *)p)->last_orb_dma;
+	sbp2util_cpu_to_be32_buffer(data, 8);
+	sbp2util_notify_fetch_agent(p, SBP2_ORB_POINTER_OFFSET, data, 8);
+}
+
+static void sbp2util_write_doorbell(void *p)
+{
+	sbp2util_notify_fetch_agent(p, SBP2_DOORBELL_OFFSET, NULL, 4);
+}
+
 /*
  * This function is called to create a pool of command orbs used for
  * command processing. It is called when a new sbp2 device is detected.
@@ -725,6 +763,7 @@ static int sbp2_remove(struct device *de
 			sbp2scsi_complete_all_commands(scsi_id, DID_NO_CONNECT);
 		/* scsi_remove_device() will trigger shutdown functions of SCSI
 		 * highlevel drivers which would deadlock if blocked. */
+		atomic_set(&scsi_id->unfinished_reset, 0);
 		scsi_unblock_requests(scsi_id->scsi_host);
 	}
 	sdev = scsi_id->sdev;
@@ -778,6 +817,7 @@ static int sbp2_update(struct unit_direc
 
 	/* Make sure we unblock requests (since this is likely after a bus
 	 * reset). */
+	atomic_set(&scsi_id->unfinished_reset, 0);
 	scsi_unblock_requests(scsi_id->scsi_host);
 
 	return 0;
@@ -809,6 +849,8 @@ static struct scsi_id_instance_data *sbp
 	INIT_LIST_HEAD(&scsi_id->sbp2_command_orb_completed);
 	INIT_LIST_HEAD(&scsi_id->scsi_list);
 	spin_lock_init(&scsi_id->sbp2_command_orb_lock);
+	atomic_set(&scsi_id->unfinished_reset, 0);
+	INIT_WORK(&scsi_id->protocol_work, NULL, NULL);
 
 	ud->device.driver_data = scsi_id;
 
@@ -893,8 +935,10 @@ static void sbp2_host_reset(struct hpsb_
 	hi = hpsb_get_hostinfo(&sbp2_highlevel, host);
 
 	if (hi) {
-		list_for_each_entry(scsi_id, &hi->scsi_ids, scsi_list)
+		list_for_each_entry(scsi_id, &hi->scsi_ids, scsi_list) {
+			atomic_set(&scsi_id->unfinished_reset, 1);
 			scsi_block_requests(scsi_id->scsi_host);
+		}
 	}
 }
 
@@ -1046,7 +1090,7 @@ static void sbp2_remove_device(struct sc
 		scsi_remove_host(scsi_id->scsi_host);
 		scsi_host_put(scsi_id->scsi_host);
 	}
-
+	flush_scheduled_work();
 	sbp2util_remove_command_orb_pool(scsi_id);
 
 	list_del(&scsi_id->scsi_list);
@@ -1719,6 +1763,10 @@ static int sbp2_agent_reset(struct scsi_
 
 	SBP2_DEBUG_ENTER();
 
+	cancel_delayed_work(&scsi_id->protocol_work);
+	if (wait)
+		flush_scheduled_work();
+
 	data = ntohl(SBP2_AGENT_RESET_DATA);
 	addr = scsi_id->sbp2_command_block_agent_addr + SBP2_AGENT_RESET_OFFSET;
 
@@ -2040,9 +2088,22 @@ static void sbp2_link_orb_command(struct
 
 	SBP2_ORB_DEBUG("write to %s register, command orb %p",
 			last_orb ? "DOORBELL" : "ORB_POINTER", command_orb);
-	if (sbp2util_node_write_no_wait(scsi_id->ne, addr, data, length))
-		SBP2_ERR("sbp2util_node_write_no_wait failed.\n");
-	/* We rely on SCSI EH to deal with _node_write_ failures. */
+	if (sbp2util_node_write_no_wait(scsi_id->ne, addr, data, length)) {
+		/*
+		 * sbp2util_node_write_no_wait failed. We certainly ran out
+		 * of transaction labels, perhaps just because there were no
+		 * context switches which gave khpsbpkt a chance to collect
+		 * free tlabels. Try again in non-atomic context. If necessary,
+		 * the workqueue job will sleep to guaranteedly get a tlabel.
+		 * We do not accept new commands until the job is over.
+		 */
+		scsi_block_requests(scsi_id->scsi_host);
+		PREPARE_WORK(&scsi_id->protocol_work,
+			     last_orb ? sbp2util_write_doorbell:
+					sbp2util_write_orb_pointer,
+			     scsi_id);
+		schedule_work(&scsi_id->protocol_work);
+	}
 }
 
 /*



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2.6.18-rc4 3/5] ieee1394: sbp2: safer last_orb and next_ORB handling
  2006-08-27 11:25 [PATCH 2.6.18-rc4 0/5] drivers/ieee1394/sbp2: fixes for 2.6.18-rc Stefan Richter
                   ` (4 preceding siblings ...)
  2006-08-27 11:31 ` [PATCH 2.6.18-rc4 5/5] ieee1394: sbp2: handle "sbp2util_node_write_no_wait failed" Stefan Richter
@ 2006-08-27 11:34 ` Stefan Richter
  2006-08-27 11:34 ` [PATCH 2.6.18-rc4 4/5] ieee1394: sbp2: discard return value of sbp2_link_orb_command Stefan Richter
  2006-08-27 11:35 ` [PATCH 2.6.18-rc4 5/5] ieee1394: sbp2: handle "sbp2util_node_write_no_wait failed" Stefan Richter
  7 siblings, 0 replies; 15+ messages in thread
From: Stefan Richter @ 2006-08-27 11:34 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ben Collins, linux-kernel

The sbp2 initiator has two ways to tell a target's fetch agent about new
command ORBs:
 - Write the ORB's address to the ORB_POINTER register.  This must not
   be done while the fetch agent is active.
 - Put the ORB's address into the previously submitted ORB's next_ORB
   field and write to the DOORBELL register.  This may be done while the
   fetch agent is active or suspended.  It must not be done while the
   fetch agent is in reset state.
Sbp2 has a last_orb pointer which indicates in what way a new command
should be announced.  That pointer is concurrently accessed at various
occasions.  Furthermore, initiator and target are accessing the next_ORB
field of ORBs concurrently and asynchronously.

This patch does:
 - Protect all initiator accesses to last_orb by sbp2_command_orb_lock.
 - Add pci_dma_sync_single_for_device before a previously submitted
   ORB's next_ORB field is overwritten.
 - Insert a memory barrier between when next_ORB_lo and next_ORB_hi are
   overwritten.  Next_ORB_hi must not be updated before next_ORB_lo.
 - Remove the rather unspecific and now superfluous qualifier "volatile"
   from the next_ORB fields.
 - Add comments on how last_orb is connected with what is known about
   the target's fetch agent's state.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
Index: linux-2.6.18-rc4/drivers/ieee1394/sbp2.h
===================================================================
--- linux-2.6.18-rc4.orig/drivers/ieee1394/sbp2.h	2006-08-27 12:35:40.000000000 +0200
+++ linux-2.6.18-rc4/drivers/ieee1394/sbp2.h	2006-08-27 12:37:13.000000000 +0200
@@ -46,8 +46,8 @@
 #define ORB_SET_DIRECTION(value)		((value & 0x1) << 27)
 
 struct sbp2_command_orb {
-	volatile u32 next_ORB_hi;
-	volatile u32 next_ORB_lo;
+	u32 next_ORB_hi;
+	u32 next_ORB_lo;
 	u32 data_descriptor_hi;
 	u32 data_descriptor_lo;
 	u32 misc;
Index: linux-2.6.18-rc4/drivers/ieee1394/sbp2.c
===================================================================
--- linux-2.6.18-rc4.orig/drivers/ieee1394/sbp2.c	2006-08-27 12:36:50.000000000 +0200
+++ linux-2.6.18-rc4/drivers/ieee1394/sbp2.c	2006-08-27 12:37:13.000000000 +0200
@@ -1715,6 +1715,7 @@ static int sbp2_agent_reset(struct scsi_
 	quadlet_t data;
 	u64 addr;
 	int retval;
+	unsigned long flags;
 
 	SBP2_DEBUG_ENTER();
 
@@ -1734,7 +1735,9 @@ static int sbp2_agent_reset(struct scsi_
 	/*
 	 * Need to make sure orb pointer is written on next command
 	 */
+	spin_lock_irqsave(&scsi_id->sbp2_command_orb_lock, flags);
 	scsi_id->last_orb = NULL;
+	spin_unlock_irqrestore(&scsi_id->sbp2_command_orb_lock, flags);
 
 	return 0;
 }
@@ -1976,8 +1979,12 @@ static int sbp2_link_orb_command(struct 
 {
 	struct sbp2scsi_host_info *hi = scsi_id->hi;
 	struct sbp2_command_orb *command_orb = &command->command_orb;
-	struct node_entry *ne = scsi_id->ne;
-	u64 addr;
+	struct sbp2_command_orb *last_orb;
+	dma_addr_t last_orb_dma;
+	u64 addr = scsi_id->sbp2_command_block_agent_addr;
+	quadlet_t data[2];
+	size_t length;
+	unsigned long flags;
 
 	outstanding_orb_incr;
 	SBP2_ORB_DEBUG("sending command orb %p, total orbs = %x",
@@ -1992,64 +1999,50 @@ static int sbp2_link_orb_command(struct 
 	/*
 	 * Check to see if there are any previous orbs to use
 	 */
-	if (scsi_id->last_orb == NULL) {
-		quadlet_t data[2];
-
+	spin_lock_irqsave(&scsi_id->sbp2_command_orb_lock, flags);
+	last_orb = scsi_id->last_orb;
+	last_orb_dma = scsi_id->last_orb_dma;
+	if (!last_orb) {
 		/*
-		 * Ok, let's write to the target's management agent register
+		 * last_orb == NULL means: We know that the target's fetch agent
+		 * is not active right now.
 		 */
-		addr = scsi_id->sbp2_command_block_agent_addr + SBP2_ORB_POINTER_OFFSET;
+		addr += SBP2_ORB_POINTER_OFFSET;
 		data[0] = ORB_SET_NODE_ID(hi->host->node_id);
 		data[1] = command->command_orb_dma;
 		sbp2util_cpu_to_be32_buffer(data, 8);
-
-		SBP2_ORB_DEBUG("write command agent, command orb %p", command_orb);
-
-		if (sbp2util_node_write_no_wait(ne, addr, data, 8) < 0) {
-			SBP2_ERR("sbp2util_node_write_no_wait failed.\n");
-			return -EIO;
-		}
-
-		SBP2_ORB_DEBUG("write command agent complete");
-
-		scsi_id->last_orb = command_orb;
-		scsi_id->last_orb_dma = command->command_orb_dma;
-
+		length = 8;
 	} else {
-		quadlet_t data;
-
 		/*
-		 * We have an orb already sent (maybe or maybe not
-		 * processed) that we can append this orb to. So do so,
-		 * and ring the doorbell. Have to be very careful
-		 * modifying these next orb pointers, as they are accessed
-		 * both by the sbp2 device and us.
+		 * last_orb != NULL means: We know that the target's fetch agent
+		 * is (very probably) not dead or in reset state right now.
+		 * We have an ORB already sent that we can append a new one to.
+		 * The target's fetch agent may or may not have read this
+		 * previous ORB yet.
 		 */
-		scsi_id->last_orb->next_ORB_lo =
-		    cpu_to_be32(command->command_orb_dma);
+		pci_dma_sync_single_for_cpu(hi->host->pdev, last_orb_dma,
+					    sizeof(struct sbp2_command_orb),
+					    PCI_DMA_BIDIRECTIONAL);
+		last_orb->next_ORB_lo = cpu_to_be32(command->command_orb_dma);
+		wmb();
 		/* Tells hardware that this pointer is valid */
-		scsi_id->last_orb->next_ORB_hi = 0x0;
-		pci_dma_sync_single_for_device(hi->host->pdev,
-					       scsi_id->last_orb_dma,
+		last_orb->next_ORB_hi = 0;
+		pci_dma_sync_single_for_device(hi->host->pdev, last_orb_dma,
 					       sizeof(struct sbp2_command_orb),
 					       PCI_DMA_BIDIRECTIONAL);
+		addr += SBP2_DOORBELL_OFFSET;
+		data[0] = 0;
+		length = 4;
+	}
+	scsi_id->last_orb = command_orb;
+	scsi_id->last_orb_dma = command->command_orb_dma;
+	spin_unlock_irqrestore(&scsi_id->sbp2_command_orb_lock, flags);
 
-		/*
-		 * Ring the doorbell
-		 */
-		data = cpu_to_be32(command->command_orb_dma);
-		addr = scsi_id->sbp2_command_block_agent_addr + SBP2_DOORBELL_OFFSET;
-
-		SBP2_ORB_DEBUG("ring doorbell, command orb %p", command_orb);
-
-		if (sbp2util_node_write_no_wait(ne, addr, &data, 4) < 0) {
-			SBP2_ERR("sbp2util_node_write_no_wait failed");
-			return -EIO;
-		}
-
-		scsi_id->last_orb = command_orb;
-		scsi_id->last_orb_dma = command->command_orb_dma;
-
+	SBP2_ORB_DEBUG("write to %s register, command orb %p",
+			last_orb ? "DOORBELL" : "ORB_POINTER", command_orb);
+	if (sbp2util_node_write_no_wait(scsi_id->ne, addr, data, length) < 0) {
+		SBP2_ERR("sbp2util_node_write_no_wait failed.\n");
+		return -EIO;
 	}
 	return 0;
 }
@@ -2241,14 +2234,16 @@ static int sbp2_handle_status_write(stru
 		}
 
 		/*
-		 * Check here to see if there are no commands in-use. If there are none, we can
-		 * null out last orb so that next time around we write directly to the orb pointer...
-		 * Quick start saves one 1394 bus transaction.
+		 * Check here to see if there are no commands in-use. If there
+		 * are none, we know that the fetch agent left the active state
+		 * _and_ that we did not reactivate it yet. Therefore clear
+		 * last_orb so that next time we write directly to the
+		 * ORB_POINTER register. That way the fetch agent does not need
+		 * to refetch the next_ORB.
 		 */
 		spin_lock_irqsave(&scsi_id->sbp2_command_orb_lock, flags);
-		if (list_empty(&scsi_id->sbp2_command_orb_inuse)) {
+		if (list_empty(&scsi_id->sbp2_command_orb_inuse))
 			scsi_id->last_orb = NULL;
-		}
 		spin_unlock_irqrestore(&scsi_id->sbp2_command_orb_lock, flags);
 
 	} else {



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2.6.18-rc4 4/5] ieee1394: sbp2: discard return value of sbp2_link_orb_command
  2006-08-27 11:25 [PATCH 2.6.18-rc4 0/5] drivers/ieee1394/sbp2: fixes for 2.6.18-rc Stefan Richter
                   ` (5 preceding siblings ...)
  2006-08-27 11:34 ` [PATCH 2.6.18-rc4 3/5] ieee1394: sbp2: safer last_orb and next_ORB handling Stefan Richter
@ 2006-08-27 11:34 ` Stefan Richter
  2006-08-27 11:35 ` [PATCH 2.6.18-rc4 5/5] ieee1394: sbp2: handle "sbp2util_node_write_no_wait failed" Stefan Richter
  7 siblings, 0 replies; 15+ messages in thread
From: Stefan Richter @ 2006-08-27 11:34 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ben Collins, linux-kernel

Since sbp2 is at the moment unable to do anything with the return value
of sbp2_link_orb_command, just discard it.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
Index: linux-2.6.18-rc4/drivers/ieee1394/sbp2.c
===================================================================
--- linux-2.6.18-rc4.orig/drivers/ieee1394/sbp2.c	2006-08-27 12:37:13.000000000 +0200
+++ linux-2.6.18-rc4/drivers/ieee1394/sbp2.c	2006-08-27 12:37:44.000000000 +0200
@@ -1974,7 +1974,7 @@ static void sbp2_create_command_orb(stru
 /*
  * This function is called in order to begin a regular SBP-2 command.
  */
-static int sbp2_link_orb_command(struct scsi_id_instance_data *scsi_id,
+static void sbp2_link_orb_command(struct scsi_id_instance_data *scsi_id,
 				 struct sbp2_command_info *command)
 {
 	struct sbp2scsi_host_info *hi = scsi_id->hi;
@@ -2040,11 +2040,9 @@ static int sbp2_link_orb_command(struct 
 
 	SBP2_ORB_DEBUG("write to %s register, command orb %p",
 			last_orb ? "DOORBELL" : "ORB_POINTER", command_orb);
-	if (sbp2util_node_write_no_wait(scsi_id->ne, addr, data, length) < 0) {
+	if (sbp2util_node_write_no_wait(scsi_id->ne, addr, data, length))
 		SBP2_ERR("sbp2util_node_write_no_wait failed.\n");
-		return -EIO;
-	}
-	return 0;
+	/* We rely on SCSI EH to deal with _node_write_ failures. */
 }
 
 /*
Index: linux-2.6.18-rc4/drivers/ieee1394/sbp2.h
===================================================================
--- linux-2.6.18-rc4.orig/drivers/ieee1394/sbp2.h	2006-08-27 12:37:13.000000000 +0200
+++ linux-2.6.18-rc4/drivers/ieee1394/sbp2.h	2006-08-27 12:37:44.000000000 +0200
@@ -391,11 +391,6 @@ static int sbp2_logout_device(struct scs
 static int sbp2_handle_status_write(struct hpsb_host *host, int nodeid, int destid,
 				    quadlet_t *data, u64 addr, size_t length, u16 flags);
 static int sbp2_agent_reset(struct scsi_id_instance_data *scsi_id, int wait);
-static int sbp2_link_orb_command(struct scsi_id_instance_data *scsi_id,
-				 struct sbp2_command_info *command);
-static int sbp2_send_command(struct scsi_id_instance_data *scsi_id,
-			     struct scsi_cmnd *SCpnt,
-			     void (*done)(struct scsi_cmnd *));
 static unsigned int sbp2_status_to_sense_data(unchar *sbp2_status,
 					      unchar *sense_data);
 static void sbp2_parse_unit_directory(struct scsi_id_instance_data *scsi_id,



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2.6.18-rc4 5/5] ieee1394: sbp2: handle "sbp2util_node_write_no_wait failed"
  2006-08-27 11:25 [PATCH 2.6.18-rc4 0/5] drivers/ieee1394/sbp2: fixes for 2.6.18-rc Stefan Richter
                   ` (6 preceding siblings ...)
  2006-08-27 11:34 ` [PATCH 2.6.18-rc4 4/5] ieee1394: sbp2: discard return value of sbp2_link_orb_command Stefan Richter
@ 2006-08-27 11:35 ` Stefan Richter
  7 siblings, 0 replies; 15+ messages in thread
From: Stefan Richter @ 2006-08-27 11:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ben Collins, linux-kernel

Fix for http://bugzilla.kernel.org/show_bug.cgi?id=6948

Because sbp2 writes to the target's fetch agent's registers from within
atomic context, it cannot sleep to guaranteedly get a free transaction
label.  This may repeatedly lead to "sbp2util_node_write_no_wait failed"
and consequently to SCSI command abortion after timeout.  A likely cause
is that many queue_command softirqs may occur before khpsbpkt (the
ieee1394 driver's thread which cleans up after finished transactions) is
woken up to recycle tlabels.

Sbp2 now schedules a workqueue job whenever sbp2_link_orb_command fails
in sbp2util_node_write_no_wait.  The job will reliably get a transaction
label because it can sleep.

We use the kernel-wide shared workqueue because it is unlikely that the
job itself actually needs to sleep.  In the improbable case that it has
to sleep, it doesn't need to sleep long since the standard transaction
timeout is 100ms.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
Index: linux-2.6.18-rc4/drivers/ieee1394/sbp2.h
===================================================================
--- linux-2.6.18-rc4.orig/drivers/ieee1394/sbp2.h	2006-08-27 12:37:44.000000000 +0200
+++ linux-2.6.18-rc4/drivers/ieee1394/sbp2.h	2006-08-27 12:38:02.000000000 +0200
@@ -345,6 +345,9 @@ struct scsi_id_instance_data {
 
 	/* Device specific workarounds/brokeness */
 	unsigned workarounds;
+
+	atomic_t unfinished_reset;
+	struct work_struct protocol_work;
 };
 
 /* Sbp2 host data structure (one per IEEE1394 host) */
Index: linux-2.6.18-rc4/drivers/ieee1394/sbp2.c
===================================================================
--- linux-2.6.18-rc4.orig/drivers/ieee1394/sbp2.c	2006-08-27 12:37:44.000000000 +0200
+++ linux-2.6.18-rc4/drivers/ieee1394/sbp2.c	2006-08-27 12:38:02.000000000 +0200
@@ -478,6 +478,44 @@ static int sbp2util_node_write_no_wait(s
 	return 0;
 }
 
+static void sbp2util_notify_fetch_agent(struct scsi_id_instance_data *scsi_id,
+					u64 offset, quadlet_t *data, size_t len)
+{
+	/*
+	 * There is a small window after a bus reset within which the node
+	 * entry's generation is current but the reconnect wasn't completed.
+	 */
+	if (atomic_read(&scsi_id->unfinished_reset))
+		return;
+
+	if (hpsb_node_write(scsi_id->ne,
+			    scsi_id->sbp2_command_block_agent_addr + offset,
+			    data, len))
+		SBP2_ERR("sbp2util_notify_fetch_agent failed.");
+	/*
+	 * Now accept new SCSI commands, unless a bus reset happended during
+	 * hpsb_node_write.
+	 */
+	if (!atomic_read(&scsi_id->unfinished_reset))
+		scsi_unblock_requests(scsi_id->scsi_host);
+}
+
+static void sbp2util_write_orb_pointer(void *p)
+{
+	quadlet_t data[2];
+
+	data[0] = ORB_SET_NODE_ID(
+			((struct scsi_id_instance_data *)p)->hi->host->node_id);
+	data[1] = ((struct scsi_id_instance_data *)p)->last_orb_dma;
+	sbp2util_cpu_to_be32_buffer(data, 8);
+	sbp2util_notify_fetch_agent(p, SBP2_ORB_POINTER_OFFSET, data, 8);
+}
+
+static void sbp2util_write_doorbell(void *p)
+{
+	sbp2util_notify_fetch_agent(p, SBP2_DOORBELL_OFFSET, NULL, 4);
+}
+
 /*
  * This function is called to create a pool of command orbs used for
  * command processing. It is called when a new sbp2 device is detected.
@@ -725,6 +763,7 @@ static int sbp2_remove(struct device *de
 			sbp2scsi_complete_all_commands(scsi_id, DID_NO_CONNECT);
 		/* scsi_remove_device() will trigger shutdown functions of SCSI
 		 * highlevel drivers which would deadlock if blocked. */
+		atomic_set(&scsi_id->unfinished_reset, 0);
 		scsi_unblock_requests(scsi_id->scsi_host);
 	}
 	sdev = scsi_id->sdev;
@@ -778,6 +817,7 @@ static int sbp2_update(struct unit_direc
 
 	/* Make sure we unblock requests (since this is likely after a bus
 	 * reset). */
+	atomic_set(&scsi_id->unfinished_reset, 0);
 	scsi_unblock_requests(scsi_id->scsi_host);
 
 	return 0;
@@ -809,6 +849,8 @@ static struct scsi_id_instance_data *sbp
 	INIT_LIST_HEAD(&scsi_id->sbp2_command_orb_completed);
 	INIT_LIST_HEAD(&scsi_id->scsi_list);
 	spin_lock_init(&scsi_id->sbp2_command_orb_lock);
+	atomic_set(&scsi_id->unfinished_reset, 0);
+	INIT_WORK(&scsi_id->protocol_work, NULL, NULL);
 
 	ud->device.driver_data = scsi_id;
 
@@ -893,8 +935,10 @@ static void sbp2_host_reset(struct hpsb_
 	hi = hpsb_get_hostinfo(&sbp2_highlevel, host);
 
 	if (hi) {
-		list_for_each_entry(scsi_id, &hi->scsi_ids, scsi_list)
+		list_for_each_entry(scsi_id, &hi->scsi_ids, scsi_list) {
+			atomic_set(&scsi_id->unfinished_reset, 1);
 			scsi_block_requests(scsi_id->scsi_host);
+		}
 	}
 }
 
@@ -1046,7 +1090,7 @@ static void sbp2_remove_device(struct sc
 		scsi_remove_host(scsi_id->scsi_host);
 		scsi_host_put(scsi_id->scsi_host);
 	}
-
+	flush_scheduled_work();
 	sbp2util_remove_command_orb_pool(scsi_id);
 
 	list_del(&scsi_id->scsi_list);
@@ -1719,6 +1763,10 @@ static int sbp2_agent_reset(struct scsi_
 
 	SBP2_DEBUG_ENTER();
 
+	cancel_delayed_work(&scsi_id->protocol_work);
+	if (wait)
+		flush_scheduled_work();
+
 	data = ntohl(SBP2_AGENT_RESET_DATA);
 	addr = scsi_id->sbp2_command_block_agent_addr + SBP2_AGENT_RESET_OFFSET;
 
@@ -2040,9 +2088,22 @@ static void sbp2_link_orb_command(struct
 
 	SBP2_ORB_DEBUG("write to %s register, command orb %p",
 			last_orb ? "DOORBELL" : "ORB_POINTER", command_orb);
-	if (sbp2util_node_write_no_wait(scsi_id->ne, addr, data, length))
-		SBP2_ERR("sbp2util_node_write_no_wait failed.\n");
-	/* We rely on SCSI EH to deal with _node_write_ failures. */
+	if (sbp2util_node_write_no_wait(scsi_id->ne, addr, data, length)) {
+		/*
+		 * sbp2util_node_write_no_wait failed. We certainly ran out
+		 * of transaction labels, perhaps just because there were no
+		 * context switches which gave khpsbpkt a chance to collect
+		 * free tlabels. Try again in non-atomic context. If necessary,
+		 * the workqueue job will sleep to guaranteedly get a tlabel.
+		 * We do not accept new commands until the job is over.
+		 */
+		scsi_block_requests(scsi_id->scsi_host);
+		PREPARE_WORK(&scsi_id->protocol_work,
+			     last_orb ? sbp2util_write_doorbell:
+					sbp2util_write_orb_pointer,
+			     scsi_id);
+		schedule_work(&scsi_id->protocol_work);
+	}
 }
 
 /*



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2.6.18-rc4 1/5] ieee1394: sbp2: workaround for write protect bit of Initio firmware
  2006-08-27 11:26 ` [PATCH 2.6.18-rc4 1/5] ieee1394: sbp2: workaround for write protect bit of Initio firmware Stefan Richter
@ 2006-08-27 20:17   ` Linus Torvalds
  2006-08-28  1:19     ` Jeff Garzik
  2006-08-28  9:24     ` Christoph Hellwig
  0 siblings, 2 replies; 15+ messages in thread
From: Linus Torvalds @ 2006-08-27 20:17 UTC (permalink / raw)
  To: Stefan Richter; +Cc: Ben Collins, linux-kernel

On Sun, 27 Aug 2006, Stefan Richter wrote:
>
> Yet another mode pages related bug of Initio firmwares was seen.
> INIC-1530 with a firmware by Initio responded with garbage to MODE SENSE
> (10).  Some HDDs were therefore incorrectly marked as write protected:
> http://bugzilla.kernel.org/show_bug.cgi?id=6947

Why does sbp2scsi_slave_configure() set "use_10_for_ms" in the first 
place?

Almost all the other SCSI code seems to default to using the 6-byte 
version, and then only uses the 10-byte version if they have some specific 
reason to do so.

I don't think there really is _ever_ any reason to use the 10-byte version 
if the 6-byte version is expected to work. Is there?

Examples:

 - scsi_add_lun() only sets the 10-byte MS for things that are 
   black-listed to set it (BLIST_USE_10_BYTE_MS)

   In fact, I don't see _anything_ in the source code that actually does 
   set that, so as far as I can see, scsi_add_lun() _never_ asks to use 
   the 10-byte modesense.

 - libata does set the 10-byte modesense for all SATA devices (but 
   scsi_lib.c might clear it if we get a bad status back).

 - USB devices set the 10-byte version _only_ if it's a non-disk device, 
   or if the subclass of the device is not SCSI.

Anyway, it would appear that sbp2 is the odd man out in setting up for a 
10-byte modesense, and then having to have strange magic rules for 
clearing it again.

Is there _really_ any reason to use the 10-byte version at all in the 
first place?

		Linus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2.6.18-rc4 1/5] ieee1394: sbp2: workaround for write protect bit of Initio firmware
  2006-08-27 20:17   ` Linus Torvalds
@ 2006-08-28  1:19     ` Jeff Garzik
  2006-08-28  9:24     ` Christoph Hellwig
  1 sibling, 0 replies; 15+ messages in thread
From: Jeff Garzik @ 2006-08-28  1:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Stefan Richter, Ben Collins, linux-kernel

Linus Torvalds wrote:
> I don't think there really is _ever_ any reason to use the 10-byte version 
> if the 6-byte version is expected to work. Is there?

Yes, but not really important:  10-byte allows a 16-bit allocation 
length (additional data section), where 6-byte only allows 8-bit.

However, the mode pages the kernel requests from the device are normally 
<= 256 bytes.

In general, your intuition is correct.  use_10_for_foo is largely meant 
for the subclass of "SCSI" devices which often don't bother to implement 
the 6-byte commands, forcing the software driver to emulate 6-byte. 
use_10_for_foo allows the software driver to eliminate that in-kernel 
emulation code.

	Jeff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2.6.18-rc4 1/5] ieee1394: sbp2: workaround for write protect bit of Initio firmware
  2006-08-27 20:17   ` Linus Torvalds
  2006-08-28  1:19     ` Jeff Garzik
@ 2006-08-28  9:24     ` Christoph Hellwig
  2006-08-28 11:44       ` Stefan Richter
  1 sibling, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2006-08-28  9:24 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Stefan Richter, Ben Collins, linux-kernel

On Sun, Aug 27, 2006 at 01:17:31PM -0700, Linus Torvalds wrote:
> 
> 
> On Sun, 27 Aug 2006, Stefan Richter wrote:
> >
> > Yet another mode pages related bug of Initio firmwares was seen.
> > INIC-1530 with a firmware by Initio responded with garbage to MODE SENSE
> > (10).  Some HDDs were therefore incorrectly marked as write protected:
> > http://bugzilla.kernel.org/show_bug.cgi?id=6947
> 
> Why does sbp2scsi_slave_configure() set "use_10_for_ms" in the first 
> place?

I suspect that's because the typical command set for ieee1394 devices is
RBC which only specifies the 10-byte commands.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2.6.18-rc4 1/5] ieee1394: sbp2: workaround for write protect bit of Initio firmware
  2006-08-28  9:24     ` Christoph Hellwig
@ 2006-08-28 11:44       ` Stefan Richter
       [not found]         ` <44FC8AAC.10609@s5r6.in-berlin.de>
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Richter @ 2006-08-28 11:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Linus Torvalds, Ben Collins, linux-kernel

Christoph Hellwig wrote:
> On Sun, Aug 27, 2006 at 01:17:31PM -0700, Linus Torvalds wrote:
>> Why does sbp2scsi_slave_configure() set "use_10_for_ms" in the first 
>> place?

Good question...

All I can say about it is that older versions of sbp2 enforced the usage
of MODE SENSE(10). If the SCSI stack passed down an MS(6), sbp2
converted it to MS(10) and converted returned data back into MS(6)
format. It did so for RBC disks, SBC disks, and CD-ROMs.
http://lxr.free-electrons.com/source/drivers/ieee1394/sbp2.c?v=2.4.32#2292
Some months ago all of these command conversions in sbp2 were removed
and the use_10_for_ms flag was switched on.

It would be a non-issue if devices simply returned "illegal request"
status on a wrong version of MODE SENSE. But these Initio INIC-1530
bridges return "good" status after either MODE SENSE(6) or (10) but
bogus data after the latter.

> I suspect that's because the typical command set for ieee1394 devices is
> RBC which only specifies the 10-byte commands.

RBC actually lists only MODE SENSE(6), not (10). This makes sbp2's old
behaviour even more questionable. Nevertheless, INIC-1530 is the very
first device that became known to break on MS(10). Everything else
seemed to work fine with MS(10) so far. Switching unconditionally to try
MS(6) before MS(10) could break any other devices.

I suppose I should set up an SBP-2 target (Oracle has a GPL'd userspace
implementation; somebody recently mentioned an as yet unpublished
kernelspace implementation) and use this to log which version of MODE
SENSE Windows and Mac OS prefer. Firmware coders certainly test their
works of art with some variants of these OSs.

PS: Initio's published data sheet mentions RBC support. But their
inquiry data indicate a peripheral device type of 0, which is associated
with SBC.

PPS: This patch here is not strictly needed. The INIC-1530 can be
corrected if the user directly asks Initio for a corrected firmware and
firmware uploader (they don't have it on their website) and has access
to a Windows PC to run the firmware uploader. But I think Linux users
are better served if our driver handles these devices out of the box.
-- 
Stefan Richter
-=====-=-==- =--- ==--=
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2.6.18-rc4 1/5] ieee1394: sbp2: workaround for write protect bit of Initio firmware
       [not found]         ` <44FC8AAC.10609@s5r6.in-berlin.de>
@ 2006-09-04 20:52           ` Linus Torvalds
  2006-09-04 21:26             ` Stefan Richter
  0 siblings, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2006-09-04 20:52 UTC (permalink / raw)
  To: Stefan Richter; +Cc: linux-kernel, Christoph Hellwig, Ben Collins

On Mon, 4 Sep 2006, Stefan Richter wrote:
> 
> I finally did my homework.

Thanks for looking into it.

> OS X' and Win XP's SBP-2 drivers send MODE SENSE (6) to RBC disks and to
> SBC disks. OS X sends MODE SENSE (10) to MMC devices, Win XP seems to
> send MODE SENSE (6).

I think that clinches it. If WinXP sends the 6-byte version, we should 
too, since that's effectively what pretty much everybody has tested with 
(yeah, Apple used to be the main firewire user, but I bet there have been 
more XP users over the last few years, it's not like it's unusual on PC's 
any more).

> So it should be safe to remove the use_10_for_ms flag in our sbp2 driver
> at least for RBC disks and SBC disks. I.e. the proposed "workaround for
> write protect bit of Initio firmware" will no longer be necessary.
> 
> Next steps for me to do:
>  - test all my devices with use_10_for_ms removed from sbp2,
>  - prepare patch as candidate for 2.6.19-rc.

Yeah, please send me a patch asap after 2.6.18, so that it can go in very 
early (regardless of anything else).

Btw, somebody should tell the Oracle Endpoint people to get their target 
client to support the 6-byte version too, no? Anybody who knows that team?

		Linus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2.6.18-rc4 1/5] ieee1394: sbp2: workaround for write protect bit of Initio firmware
  2006-09-04 20:52           ` Linus Torvalds
@ 2006-09-04 21:26             ` Stefan Richter
  0 siblings, 0 replies; 15+ messages in thread
From: Stefan Richter @ 2006-09-04 21:26 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Christoph Hellwig, Ben Collins

Linus Torvalds wrote:
> Btw, somebody should tell the Oracle Endpoint people to get their target 
> client to support the 6-byte version too, no? Anybody who knows that team?

At least OS X doesn't complain. I haven't tested Linux-to-Linux yet ---
but that's what Endpoint was obviously written for in the first place,
potentially with multiple 1394 cards on the target and multiple
concurrent initiators. Manish Singh is the author; he will get feedback
from me. I think he was around at linux1394-devel occasionally.
BTW, Endpoint could be reused to build a transport driver for the
upcoming generic SCSI target framework.
svn co http://oss.oracle.com/projects/endpoint/src/
-- 
Stefan Richter
-=====-=-==- =--= --=--
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2006-09-04 21:26 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-08-27 11:25 [PATCH 2.6.18-rc4 0/5] drivers/ieee1394/sbp2: fixes for 2.6.18-rc Stefan Richter
2006-08-27 11:26 ` [PATCH 2.6.18-rc4 1/5] ieee1394: sbp2: workaround for write protect bit of Initio firmware Stefan Richter
2006-08-27 20:17   ` Linus Torvalds
2006-08-28  1:19     ` Jeff Garzik
2006-08-28  9:24     ` Christoph Hellwig
2006-08-28 11:44       ` Stefan Richter
     [not found]         ` <44FC8AAC.10609@s5r6.in-berlin.de>
2006-09-04 20:52           ` Linus Torvalds
2006-09-04 21:26             ` Stefan Richter
2006-08-27 11:27 ` [PATCH 2.6.18-rc4 2/5] ieee1394: sbp2: safer agent reset in error handlers Stefan Richter
2006-08-27 11:28 ` [PATCH 2.6.18-rc4 3/5] ieee1394: sbp2: safer last_orb and next_ORB handling Stefan Richter
2006-08-27 11:29 ` [PATCH 2.6.18-rc4 4/5] ieee1394: sbp2: discard return value of sbp2_link_orb_command Stefan Richter
2006-08-27 11:31 ` [PATCH 2.6.18-rc4 5/5] ieee1394: sbp2: handle "sbp2util_node_write_no_wait failed" Stefan Richter
2006-08-27 11:34 ` [PATCH 2.6.18-rc4 3/5] ieee1394: sbp2: safer last_orb and next_ORB handling Stefan Richter
2006-08-27 11:34 ` [PATCH 2.6.18-rc4 4/5] ieee1394: sbp2: discard return value of sbp2_link_orb_command Stefan Richter
2006-08-27 11:35 ` [PATCH 2.6.18-rc4 5/5] ieee1394: sbp2: handle "sbp2util_node_write_no_wait failed" Stefan Richter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox