Linux SCSI subsystem development
* [PATCH] megaraid_sas: Enable shared host tag map
@ 2014-11-24 15:33 Hannes Reinecke
  2014-11-24 15:35 ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Hannes Reinecke @ 2014-11-24 15:33 UTC (permalink / raw)
  To: James Bottomley
  Cc: Sumit Saxena, Kashyap Desai, Christoph Hellwig, linux-scsi,
	Hannes Reinecke

The megaraid SAS driver uses a shared host tag map internally,
so we should be telling the block layer about it.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/megaraid/megaraid_sas_base.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index f05580e..7fb83d0 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -2757,6 +2757,7 @@ static struct scsi_host_template megasas_template = {
 	.use_clustering = ENABLE_CLUSTERING,
 	.change_queue_depth = scsi_change_queue_depth,
 	.no_write_same = 1,
+	.use_blk_tags = 1,
 };
 
 /**
@@ -4996,6 +4997,11 @@ static int megasas_io_attach(struct megasas_instance *instance)
 		host->hostt->eh_bus_reset_handler = NULL;
 	}
 
+	if (scsi_init_shared_tag_map(host, host->can_queue)) {
+		printk(KERN_INFO "megasas: failed to init shared tag map\n");
+		return -ENODEV;
+	}
+
 	/*
 	 * Notify the mid-layer about the new controller
 	 */
-- 
1.8.5.2



* Re: [PATCH] megaraid_sas: Enable shared host tag map
  2014-11-24 15:33 [PATCH] megaraid_sas: Enable shared host tag map Hannes Reinecke
@ 2014-11-24 15:35 ` Christoph Hellwig
  2014-11-24 15:51   ` Hannes Reinecke
  2014-11-24 15:52   ` Kashyap Desai
  0 siblings, 2 replies; 10+ messages in thread
From: Christoph Hellwig @ 2014-11-24 15:35 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: James Bottomley, Sumit Saxena, Kashyap Desai, Christoph Hellwig,
	linux-scsi

On Mon, Nov 24, 2014 at 04:33:55PM +0100, Hannes Reinecke wrote:
> The megaraid SAS driver uses a shared host tag map internally,
> so we should be telling the block layer about it.

But it doesn't make use of request->tag yet.  This would only be
useful if you got rid of the internal tag allocator.



* Re: [PATCH] megaraid_sas: Enable shared host tag map
  2014-11-24 15:35 ` Christoph Hellwig
@ 2014-11-24 15:51   ` Hannes Reinecke
  2014-11-24 15:59     ` Kashyap Desai
  2014-11-25 14:31     ` Christoph Hellwig
  2014-11-24 15:52   ` Kashyap Desai
  1 sibling, 2 replies; 10+ messages in thread
From: Hannes Reinecke @ 2014-11-24 15:51 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: James Bottomley, Sumit Saxena, Kashyap Desai, linux-scsi

On 11/24/2014 04:35 PM, Christoph Hellwig wrote:
> On Mon, Nov 24, 2014 at 04:33:55PM +0100, Hannes Reinecke wrote:
>> The megaraid SAS driver uses a shared host tag map internally,
>> so we should be telling the block layer about it.
> 
> But it doesn't make use of request->tag yet.  This would only be
> useful if you got rid of the internal tag allocator.
> 
It is useful as is, as we'll be getting prefixed logging output :-)

But yeah, it would be good to get rid of the internal tag allocator.

I haven't done that yet because the driver uses a larger tag map than
the one announced to the block layer.
This is to facilitate internal command submission, which should
always work independent of any tag starvation issues from the
upper layers.

So when moving to the generic tag allocation code I'd need:
- a block-layer helper for allocating a tag without a request
  (basically separating the existing blk-tag functionality, easy)
- 'emergency pools' for the tag map, allowing a tag to be
  allocated even under I/O pressure.

What I did was to:
- Split off blk_reserve_tag() functionality from blk_start_tag()
  (and similar with blk_end_tag)
- registered the overall size of the shared tag map
- called blk_resize_tags() to shrink it to ->can_queue
- Implemented a 'force' attribute to blk_reserve_tag() to
  allow it to dip into the excess size.

But then I didn't really like this misuse of the blk_resize_tags()
operation; I'd rather have it marked explicitly.

However, I'm sure megasas isn't the only driver requiring such a
feature (libata, e.g., would benefit from this, too).
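
The reserved-tag idea described above can be modelled in a few lines of
plain C. This is only a single-threaded userspace sketch, not block-layer
code: struct tag_map, tag_map_init(), tag_reserve() and tag_release() are
hypothetical names, and the 'force' argument stands in for the 'force'
attribute mentioned in the list. Ordinary callers see only the first
can_queue tags; forced callers may also use the excess.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TAG_NONE (-1)
#define MAX_TAGS 64

/* Hypothetical model of a shared tag map with an emergency pool. */
struct tag_map {
	bool in_use[MAX_TAGS];
	int can_queue;	/* tags announced to the upper layers */
	int total;	/* can_queue plus the reserved excess */
};

static void tag_map_init(struct tag_map *map, int can_queue, int reserved)
{
	assert(can_queue >= 0 && reserved >= 0);
	assert(can_queue + reserved <= MAX_TAGS);
	memset(map->in_use, 0, sizeof(map->in_use));
	map->can_queue = can_queue;
	map->total = can_queue + reserved;
}

/*
 * Return the lowest free tag, or TAG_NONE if none is available.
 * Without 'force' only the first can_queue tags are considered;
 * with 'force' the search may dip into the reserved excess.
 */
static int tag_reserve(struct tag_map *map, bool force)
{
	int limit = force ? map->total : map->can_queue;

	for (int i = 0; i < limit; i++) {
		if (!map->in_use[i]) {
			map->in_use[i] = true;
			return i;
		}
	}
	return TAG_NONE;
}

static void tag_release(struct tag_map *map, int tag)
{
	assert(tag >= 0 && tag < map->total && map->in_use[tag]);
	map->in_use[tag] = false;
}
```

With can_queue = 2 and one reserved tag, a third unforced request fails
while a forced one still succeeds, which is the behaviour an internal
command needs under I/O pressure. A real implementation would also need
locking or atomic bit operations, which this sketch leaves out.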

Ideas?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 21284 (AG Nürnberg)


* RE: [PATCH] megaraid_sas: Enable shared host tag map
  2014-11-24 15:35 ` Christoph Hellwig
  2014-11-24 15:51   ` Hannes Reinecke
@ 2014-11-24 15:52   ` Kashyap Desai
  1 sibling, 0 replies; 10+ messages in thread
From: Kashyap Desai @ 2014-11-24 15:52 UTC (permalink / raw)
  To: Christoph Hellwig, Hannes Reinecke
  Cc: James Bottomley, Sumit Saxena, linux-scsi

> -----Original Message-----
> From: Christoph Hellwig [mailto:hch@lst.de]
> Sent: Monday, November 24, 2014 9:06 PM
> To: Hannes Reinecke
> Cc: James Bottomley; Sumit Saxena; Kashyap Desai; Christoph Hellwig;
> linux-scsi@vger.kernel.org
> Subject: Re: [PATCH] megaraid_sas: Enable shared host tag map
>
> On Mon, Nov 24, 2014 at 04:33:55PM +0100, Hannes Reinecke wrote:
> > The megaraid SAS driver uses a shared host tag map internally, so we
> > should be telling the block layer about it.
>
> But it doesn't make use of request->tag yet.  This would only be useful
> if you got rid of the internal tag allocator.

I have a patch for this particular area; it is currently going through
our internal test cycle.

The next patch series will start using the shared blk tag for internal MPT
frame allocation. It gets rid of the MPT frame pool linked list completely
and also does a lot of refactoring related to the MFI/MPT frame corruption
issue. Using the shared blk tag gave a good performance improvement, because
the driver no longer needs the MPT frame spin lock used in
megasas_get_cmd_fusion()/megasas_return_cmd_fusion().
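
As a rough illustration of why the lock can go away: once the block layer
hands out a unique tag per outstanding request, the tag can index a
preallocated frame array directly, and no free list has to be protected.
The sketch below is a userspace model under that assumption; struct cmd,
struct cmd_pool, cmd_from_tag() and cmd_return() are hypothetical names,
not the driver's.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CMDS 8

/* Hypothetical stand-in for a preallocated command frame. */
struct cmd {
	unsigned int index;
	int busy;
};

struct cmd_pool {
	struct cmd cmds[MAX_CMDS];
};

static void cmd_pool_init(struct cmd_pool *pool)
{
	for (unsigned int i = 0; i < MAX_CMDS; i++) {
		pool->cmds[i].index = i;
		pool->cmds[i].busy = 0;
	}
}

/*
 * Map a tag to its command frame. The caller is assumed to own the
 * tag exclusively (the tag allocator guarantees uniqueness), so no
 * list manipulation and no pool lock is involved here.
 */
static struct cmd *cmd_from_tag(struct cmd_pool *pool, unsigned int tag)
{
	struct cmd *c;

	if (tag >= MAX_CMDS)
		return NULL;
	c = &pool->cmds[tag];
	c->busy = 1;
	return c;
}

/* Returning a frame is a plain store; the tag itself is freed elsewhere. */
static void cmd_return(struct cmd *c)
{
	assert(c != NULL);
	c->busy = 0;
}
```

The synchronisation has not disappeared; it has moved into the tag
allocator, which the block layer already has to run for every request.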

~ Kashyap


* RE: [PATCH] megaraid_sas: Enable shared host tag map
  2014-11-24 15:51   ` Hannes Reinecke
@ 2014-11-24 15:59     ` Kashyap Desai
  2014-11-25 14:31     ` Christoph Hellwig
  1 sibling, 0 replies; 10+ messages in thread
From: Kashyap Desai @ 2014-11-24 15:59 UTC (permalink / raw)
  To: Hannes Reinecke, Christoph Hellwig
  Cc: James Bottomley, Sumit Saxena, linux-scsi

[-- Attachment #1: Type: text/plain, Size: 3014 bytes --]

> -----Original Message-----
> From: Hannes Reinecke [mailto:hare@suse.de]
> Sent: Monday, November 24, 2014 9:21 PM
> To: Christoph Hellwig
> Cc: James Bottomley; Sumit Saxena; Kashyap Desai;
> linux-scsi@vger.kernel.org
> Subject: Re: [PATCH] megaraid_sas: Enable shared host tag map
>
> On 11/24/2014 04:35 PM, Christoph Hellwig wrote:
> > On Mon, Nov 24, 2014 at 04:33:55PM +0100, Hannes Reinecke wrote:
> >> The megaraid SAS driver uses a shared host tag map internally, so we
> >> should be telling the block layer about it.
> >
> > But it doesn't make use of request->tag yet.  This would only be
> > useful if you got rid of the internal tag allocator.
> >
> It is useful as is, as we'll be getting prefixed logging output :-)
>
> But yeah, it would be good to get rid of the internal tag allocator.
>
> I haven't done that yet because the driver uses a larger tag map than
> the one announced to the block layer.
> This is to facilitate internal command submission, which should always
> work independent of any tag starvation issues from the upper layers.
>
> So when moving to the generic tag allocation code I'd need:
> - a block-layer helper for allocating a tag without a request
>   (basically separating the existing blk-tag functionality, easy)
> - 'emergency pools' for the tag map, allowing a tag to be
>   allocated even under I/O pressure.
>
> What I did was to:
> - Split off blk_reserve_tag() functionality from blk_start_tag()
>   (and similar with blk_end_tag)
> - registered the overall size of the shared tag map
> - called blk_resize_tags() to shrink it to ->can_queue
> - Implemented a 'force' attribute to blk_reserve_tag() to
>   allow it to dip into the excess size.

Hannes,

I just received your reply; my response crossed with yours. I have
attached a preview patch for your reference, the one I referred to in my
earlier response.
The megaraid_sas driver will expose only the actual I/O command queue depth
to the SCSI mid-layer (SML) and will manage all internally used commands
without any help from the SML.

The driver will now expose max_scsi_cmds to the upper layer, which is
lower than the maximum number of commands the firmware can actually support.
E.g.:
+     instance->max_scsi_cmds = instance->max_fw_cmds -
+                               (MEGASAS_FUSION_INTERNAL_CMDS +
+                               MEGASAS_FUSION_IOCTL_CMDS);

We have to do this so that the driver also works on existing OS
distributions, which do not support blk_reserve_tag().
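
The split can be pictured with a small standalone sketch. The counts
(5 internal, 3 IOCTL) follow the fusion comment in the attached patch, but
the macro values themselves are not shown in this thread, so treat them as
assumptions; struct cmd_budget, cmd_budget_init() and internal_slot() are
hypothetical names. The slot calculation mirrors the
"blk_tags = instance->max_scsi_cmds + cmd->index" line in the attachment.

```c
#include <assert.h>

/* Assumed values, taken from the comment in the attached patch. */
enum {
	INTERNAL_CMDS = 5,	/* driver's internal DCMDs */
	IOCTL_CMDS = 3		/* commands kept back for IOCTLs */
};

struct cmd_budget {
	unsigned int max_fw_cmds;	/* what the firmware supports */
	unsigned int max_scsi_cmds;	/* what the upper layer is told */
};

/* Carve the reserved commands out of the firmware's total. */
static int cmd_budget_init(struct cmd_budget *b, unsigned int max_fw_cmds)
{
	unsigned int reserved = INTERNAL_CMDS + IOCTL_CMDS;

	if (max_fw_cmds <= reserved)
		return -1;	/* nothing would be left for SCSI I/O */
	b->max_fw_cmds = max_fw_cmds;
	b->max_scsi_cmds = max_fw_cmds - reserved;
	return 0;
}

/*
 * Internal command 'index' uses a slot placed after all SCSI slots,
 * so it can never collide with a tag handed out by the upper layer.
 */
static int internal_slot(const struct cmd_budget *b, unsigned int index,
			 unsigned int *slot)
{
	if (index >= b->max_fw_cmds - b->max_scsi_cmds)
		return -1;
	*slot = b->max_scsi_cmds + index;
	return 0;
}
```

With max_fw_cmds = 1008 the upper layer sees a queue depth of 1000, and
internal commands occupy slots 1000 through 1007.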

~ Kashyap
>
> But then I didn't really like this misuse of the blk_resize_tags()
> operation; I'd rather have it marked explicitly.
>
> However, I'm sure megasas isn't the only driver requiring such a feature
> (libata eg would benefit from this, too).
>
> Ideas?
>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke		      zSeries & Storage
> hare@suse.de			      +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: J. Hawn, J. Guild, F. Imendörffer, HRB 21284 (AG Nürnberg)

[-- Attachment #2: blk_shared.patch --]
[-- Type: application/octet-stream, Size: 42584 bytes --]

diff -arup megaraid_sas-06.807.01.01/distro/base//megaraid_sas.h megaraid_sas-06.807.02.00/distro/base//megaraid_sas.h
--- megaraid_sas-06.807.01.01/distro/base//megaraid_sas.h	2014-11-18 17:16:48.000000000 +0530
+++ megaraid_sas-06.807.02.00/distro/base//megaraid_sas.h	2014-11-24 21:52:28.000000000 +0530
@@ -33,7 +33,7 @@
 /*
  * MegaRAID SAS Driver meta data
  */
-#define MEGASAS_VERSION				"06.807.01.00"
+#define MEGASAS_VERSION				"06.807.02.00"
 #define MEGASAS_RELDATE				"Nov 13, 2014"
 #define MEGASAS_EXT_VERSION			"Nov 13, 17:00:00 PDT 2014"
 #define FALSE 0
@@ -156,6 +156,11 @@
 #define MFI_FRAME_DIR_BOTH			0x0018
 #define MFI_FRAME_IEEE                          0x0020
 
+/* Driver internal */
+#define DRV_DCMD_POLLED_MODE		0x1
+#define	BLK_TAG_DEBUG			0
+#define	BLK_TAG_REFCOUNT		0
+
 /*
  * Definition for cmd_status
  */
@@ -239,12 +244,6 @@
 							MEGASAS_MAX_DEV_PER_CHANNEL)
 
 
-typedef enum _MR_MFI_MPT_PTHR_FLAGS {
-        MFI_MPT_DETACHED =   0,
-        MFI_LIST_ADDED   =   1,
-        MFI_MPT_ATTACHED =   2,
-} MR_MFI_MPT_PTHR_FLAGS;
-
 typedef enum _MR_SCSI_CMD_TYPE {
         READ_WRITE_LDIO = 0,
         NON_READ_WRITE_LDIO = 1,
@@ -2222,13 +2221,10 @@ struct megasas_cmd {
 	u8 cmd_status;
 	u8 abort_aen;
         u8 retry_for_fw_reset;
+	u8 flags;
 
 	struct list_head list;
 	struct scsi_cmnd *scmd;
-	
-	void *mpt_pthr_cmd_blocked;
-	atomic_t mfi_mpt_pthr;
-	u8 is_wait_event;
 
 	struct megasas_instance *instance;
 	union {
diff -arup megaraid_sas-06.807.01.01/distro/base//megaraid_sas_base.c megaraid_sas-06.807.02.00/distro/base//megaraid_sas_base.c
--- megaraid_sas-06.807.01.01/distro/base//megaraid_sas_base.c	2014-11-18 17:16:48.000000000 +0530
+++ megaraid_sas-06.807.02.00/distro/base//megaraid_sas_base.c	2014-11-24 21:52:28.000000000 +0530
@@ -210,8 +210,8 @@ extern void megasas_free_host_crash_buff
 		(struct megasas_instance *instance);
 extern void
 megasas_return_cmd_fusion(struct megasas_instance *instance, struct megasas_cmd_fusion *cmd);
-extern void megasas_return_mfi_mpt_pthr(struct megasas_instance *instance, 
-		struct megasas_cmd *cmd_mfi, struct megasas_cmd_fusion *cmd_fusion);
+static int
+megasas_setup_irqs(struct megasas_instance *instance, u8 is_probe);
 
 void
 megasas_issue_dcmd(struct megasas_instance *instance, struct megasas_cmd *cmd)
@@ -238,7 +238,6 @@ struct megasas_cmd *megasas_get_cmd(stru
 		cmd = list_entry((&instance->cmd_pool)->next,
 				 struct megasas_cmd, list);
 		list_del_init(&cmd->list);
-		atomic_set(&cmd->mfi_mpt_pthr, MFI_MPT_DETACHED);
 	} else {
 		printk(KERN_ERR "megasas: Command pool empty!\n");
 	}
@@ -248,51 +247,46 @@ struct megasas_cmd *megasas_get_cmd(stru
 }
 
 /**
- * __megasas_return_cmd -	Return a cmd to free command pool
+ * megasas_return_cmd -	Return a cmd to free command pool
  * @instance:		Adapter soft state
  * @cmd:		Command packet to be returned to free command pool
  */
 inline void
-__megasas_return_cmd(struct megasas_instance *instance, struct megasas_cmd *cmd)
+megasas_return_cmd(struct megasas_instance *instance, struct megasas_cmd *cmd)
 {
-	unsigned long mfi_mpt_flag;
-  
-	/* Do not free MFI command which is used for MFI-MPT passthrough 
-	 * This check is common for MFI and Fusion Adapter
- 	 * as MFI controller only set MFI_MPT_DETACHED and MFI_MPT_ADDED flags.
+	unsigned long flags;
+	u32 blk_tags;
+	struct megasas_cmd_fusion *cmd_fusion;
+	struct fusion_context *fusion = instance->ctrl_context;
+
+	/* This flag is used only for fusion adapter.
+	 * Wait for Interrupt for Polled mode DCMD
 	 */
-	mfi_mpt_flag = atomic_read(&cmd->mfi_mpt_pthr);
-	if(mfi_mpt_flag)
+	if (cmd->flags & DRV_DCMD_POLLED_MODE)
 		return;
-	
+
+	spin_lock_irqsave(&instance->cmd_pool_lock, flags);
+
+	if (fusion) {
+		blk_tags = instance->max_scsi_cmds + cmd->index;
+#if BLK_TAG_DEBUG
+		dev_info(&instance->pdev->dev,
+			"%s : opcode (0x%08x) tag (%04d) cmd flags (0x%08x)\n",
+				__func__, le32_to_cpu(cmd->frame->dcmd.opcode),
+				blk_tags, cmd->flags);
+#endif
+		cmd_fusion = fusion->cmd_list[blk_tags];
+		megasas_return_cmd_fusion(instance, cmd_fusion);
+	}
 	cmd->scmd = NULL;
 	cmd->frame_count = 0;
-	cmd->is_wait_event = 0;
-	cmd->mpt_pthr_cmd_blocked = NULL;
-	
-	if ((instance->pdev->device != PCI_DEVICE_ID_LSI_FUSION) &&
-	    (instance->pdev->device != PCI_DEVICE_ID_LSI_PLASMA) &&
-	    (instance->pdev->device != PCI_DEVICE_ID_LSI_INVADER) &&
-	    (instance->pdev->device != PCI_DEVICE_ID_LSI_FURY) &&
-	    (reset_devices))
+	cmd->flags = 0;
+	if (!fusion && reset_devices)
 		cmd->frame->hdr.cmd = MFI_CMD_INVALID;
-	
-	atomic_set(&cmd->mfi_mpt_pthr, MFI_LIST_ADDED);
 	list_add(&cmd->list, (&instance->cmd_pool)->next);
-}
 
-/**
- * megasas_return_cmd -	Return a cmd to free command pool
- * @instance:		Adapter soft state
- * @cmd:		Command packet to be returned to free command pool
- */
-inline void
-megasas_return_cmd(struct megasas_instance *instance, struct megasas_cmd *cmd)
-{
-	unsigned long flags;
-	spin_lock_irqsave(&instance->cmd_pool_lock, flags);
-	__megasas_return_cmd(instance, cmd);	
- 	spin_unlock_irqrestore(&instance->cmd_pool_lock, flags);
+	spin_unlock_irqrestore(&instance->cmd_pool_lock, flags);
+
 }
   
 /**
@@ -975,7 +969,6 @@ megasas_issue_blocked_cmd(struct megasas
 	int ret = 0;
 	cmd->cmd_status = ENODATA;
 
-	cmd->is_wait_event = 1;
 	instance->instancet->issue_dcmd(instance, cmd);
 	if (timeout) {
 		ret = wait_event_timeout(instance->int_cmd_wait_q,
@@ -1050,7 +1043,6 @@ megasas_issue_blocked_abort_cmd(struct m
 
 	cmd->sync_cmd = 0;
 
-
 	megasas_return_cmd(instance, cmd);
 	return 0;
 }
@@ -1865,6 +1857,10 @@ static int megasas_slave_configure(struc
 	blk_queue_rq_timeout(sdev->request_queue,
 		MEGASAS_DEFAULT_CMD_TIMEOUT * HZ);
 
+	sdev_printk(KERN_INFO, sdev, "qdepth(%d), tagged(%d), "
+		"scsi_level(%d), cmd_que(%d)\n", sdev->queue_depth,
+		sdev->tagged_supported, sdev->scsi_level,
+		(sdev->inquiry[7] & 2) >> 1);
 	return 0;
 }
 
@@ -3696,10 +3692,7 @@ megasas_service_aen(struct megasas_insta
 
 	instance->aen_cmd = NULL;
 
-	if(instance->ctrl_context && cmd->mpt_pthr_cmd_blocked)
-		megasas_return_mfi_mpt_pthr(instance, cmd, cmd->mpt_pthr_cmd_blocked);
-	else
-		megasas_return_cmd(instance, cmd);
+	megasas_return_cmd(instance, cmd);
 
 	if ((instance->unload == 0) && ((instance->issuepend_done == 1))) {
 		struct megasas_aen_event *ev;
@@ -3729,10 +3722,15 @@ static int megasas_slave_alloc(struct sc
 			sdev->id;
 		if (instance->pd_list[pd_index].driveState ==
 					MR_PD_STATE_SYSTEM) {
-			return 0;
+			goto scan_target;
 		}
 		return -ENXIO;
 	}
+
+scan_target:
+	sdev->tagged_supported = 1;
+	scsi_activate_tcq(sdev, sdev->queue_depth);
+
 	return 0;
 }
 
@@ -4082,13 +4080,13 @@ megasas_complete_cmd(struct megasas_inst
 					/* LD MAP is only used for Fusion controller, 
 					 * so it is safe to call mfi_mpt free routine without checking controller type.
 					 */
-					megasas_return_mfi_mpt_pthr(instance, cmd, cmd->mpt_pthr_cmd_blocked);
+					megasas_return_cmd(instance, cmd);
 					spin_unlock_irqrestore(instance->host->host_lock, flags);
 					break;
 				}
 			} else
 				instance->map_id++;
-			megasas_return_mfi_mpt_pthr(instance, cmd, cmd->mpt_pthr_cmd_blocked);
+			megasas_return_cmd(instance, cmd);
 
 			// Set fast path io to ZERO. Validate Map will set proper value. Meanwhile all IOs will go 
 			// as LD IO.
@@ -4834,17 +4832,10 @@ int megasas_alloc_cmds(struct megasas_in
 		}
 	}
 
-	/*
-	 * Add all the commands to command pool (instance->cmd_pool)
-        	MFI_MPT_DETACHED =   0,
-	        MFI_LIST_ADDED   =   1,
-        	MFI_MPT_ATTACHED =   2,
-	 */
 	for (i = 0; i < max_cmd; i++) {
 		cmd = instance->cmd_list[i];
 		memset(cmd, 0, sizeof(struct megasas_cmd));
 		cmd->index = i;
-		atomic_set(&cmd->mfi_mpt_pthr, MFI_LIST_ADDED);
 		cmd->scmd = NULL;
 		cmd->instance = instance;
 
@@ -4951,10 +4942,7 @@ megasas_get_pd_list(struct megasas_insta
 				MEGASAS_MAX_PD * sizeof(struct MR_PD_LIST),
 				ci, ci_h);
 		
-	if(instance->ctrl_context && cmd->mpt_pthr_cmd_blocked)
-		megasas_return_mfi_mpt_pthr(instance, cmd, cmd->mpt_pthr_cmd_blocked);
-	else
-		megasas_return_cmd(instance, cmd);
+	megasas_return_cmd(instance, cmd);
 	
 	return ret;
 }
@@ -5038,11 +5026,7 @@ megasas_get_ld_list(struct megasas_insta
 
 	pci_free_consistent(instance->pdev, sizeof(struct MR_LD_LIST), ci, ci_h); 
 		
-
-	if(instance->ctrl_context && cmd->mpt_pthr_cmd_blocked)
-		megasas_return_mfi_mpt_pthr(instance, cmd, cmd->mpt_pthr_cmd_blocked);
-	else
-		megasas_return_cmd(instance, cmd);
+	megasas_return_cmd(instance, cmd);
 
 	return ret;
 }
@@ -5123,10 +5107,7 @@ megasas_ld_list_query(struct megasas_ins
 	pci_free_consistent(instance->pdev, sizeof(struct MR_LD_TARGETID_LIST),
 			    ci, ci_h);
 
-	if(instance->ctrl_context && cmd->mpt_pthr_cmd_blocked)
-		megasas_return_mfi_mpt_pthr(instance, cmd, cmd->mpt_pthr_cmd_blocked);
-	else
-		megasas_return_cmd(instance, cmd);
+	megasas_return_cmd(instance, cmd);
 
 	return ret;
 }
@@ -5161,10 +5142,12 @@ static void megasas_update_ext_vd_detail
 		instance->fw_supported_vd_count = MAX_LOGICAL_DRIVES;
 		instance->fw_supported_pd_count = MAX_PHYSICAL_DEVICES;
 	}
-	dev_info(&instance->pdev->dev, "Firmware supports %d VD %d PD\n",
+	dev_info(&instance->pdev->dev,
+		"firmware supports\t: %d VD %d PD\n",
 		instance->fw_supported_vd_count,
 		instance->fw_supported_pd_count);
-	dev_info(&instance->pdev->dev, "Driver supports %d VD  %d PD\n",
+	dev_info(&instance->pdev->dev,
+		"driver supports\t: %d VD %d PD\n",
 		instance->drv_supported_vd_count,
 		instance->drv_supported_pd_count);
 
@@ -5263,15 +5246,19 @@ megasas_get_ctrl_info(struct megasas_ins
 		 * in case of Firmware upgrade without system reboot.
 		 */
 		megasas_update_ext_vd_details(instance);
+
+		/*Check whether controller is iMR or MR */
+		instance->is_imr = (ctrl_info->memory_size ? 0 : 1);
+		dev_info(&instance->pdev->dev,
+				"controller type\t: %s(%dMB)\n",
+				instance->is_imr ? "iMR" : "MR",
+				le16_to_cpu(ctrl_info->memory_size));
 	}
 
 	pci_free_consistent(instance->pdev, sizeof(struct megasas_ctrl_info),
 			    ci, ci_h);
 
-	if(instance->ctrl_context && cmd->mpt_pthr_cmd_blocked)
-		megasas_return_mfi_mpt_pthr(instance, cmd, cmd->mpt_pthr_cmd_blocked);
-	else
-		megasas_return_cmd(instance, cmd);
+	megasas_return_cmd(instance, cmd);
 
 
 	return ret;
@@ -5327,10 +5314,7 @@ megasas_set_crash_dump_params(struct meg
 	else
 		ret = megasas_issue_polled(instance, cmd);
 
-	if(instance->ctrl_context && cmd->mpt_pthr_cmd_blocked)
-		megasas_return_mfi_mpt_pthr(instance, cmd, cmd->mpt_pthr_cmd_blocked);
-	else
-		megasas_return_cmd(instance, cmd);
+	megasas_return_cmd(instance, cmd);
 
 	return ret;
 }
@@ -5603,6 +5587,21 @@ megasas_init_adapter_mfi(struct megasas_
 	instance->max_num_sge = (instance->instancet->read_fw_status_reg(reg_set) & 0xFF0000) >> 
 					0x10;
 	/*
+	 * For MFI skinny adapters, MEGASAS_SKINNY_INT_CMDS commands
+	 * are reserved for IOCTL + driver's internal DCMDs.
+	 */
+	if ((instance->pdev->device == PCI_DEVICE_ID_LSI_SAS0073SKINNY) ||
+		(instance->pdev->device == PCI_DEVICE_ID_LSI_SAS0071SKINNY)) {
+		instance->max_scsi_cmds = (instance->max_fw_cmds -
+			MEGASAS_SKINNY_INT_CMDS);
+		sema_init(&instance->ioctl_sem, MEGASAS_SKINNY_INT_CMDS);
+	} else {
+		instance->max_scsi_cmds = (instance->max_fw_cmds -
+			MEGASAS_INT_CMDS);
+		sema_init(&instance->ioctl_sem, (MEGASAS_INT_CMDS - 5));
+	}
+
+	/*
 	 * Create a pool of commands
 	 */
 	if (megasas_alloc_cmds(instance))
@@ -5802,28 +5801,32 @@ static int megasas_init_fw(struct megasa
 		} else
 			instance->msix_vectors = 0;
 
-		printk(KERN_INFO "[scsi%d]: FW supports <%d> MSIX vector, Online"
-			"CPUs: <%d>, Current MSIX <%d>\n", instance->host->host_no,
-			fw_msix_count, (unsigned int)num_online_cpus(),
-			instance->msix_vectors);
+		dev_info(&instance->pdev->dev,
+			"firmware supports msix\t: (%d)", fw_msix_count);
+		dev_info(&instance->pdev->dev,
+			"current msix/online cpus\t: (%d/%d)\n",
+			instance->msix_vectors, (unsigned int)num_online_cpus());
+
+		if (megasas_setup_irqs(instance, 1))
+			goto fail_init_adapter;
 	}
-	
+
 	instance->ctrl_info = kzalloc(sizeof(struct megasas_ctrl_info), GFP_KERNEL);
 	if ( instance->ctrl_info == NULL)
 		goto fail_init_adapter;
-			
+
 	/*Below are defaul value for legacy Firmware. non-fusion based controllers*/
 	instance->fw_supported_vd_count = MAX_LOGICAL_DRIVES;
 	instance->fw_supported_pd_count = MAX_PHYSICAL_DEVICES;
-	/* Get operational params, sge flags, send init cmd to controller */
+	/* Do not send too many DCMD from init_adapter(), because interrupt
+	 * is still disabled, so DCMD will be send using polling method.
+	 * Current limit is - MEGASAS_FUSION_INTERNAL_CMDS +
+	 * MEGASAS_FUSION_IOCTL_CMDS.
+	 */
 	if (instance->instancet->init_adapter(instance))
 		goto fail_init_adapter;
 	
-	printk(KERN_ERR "megasas: INIT adapter done \n");
-
-	/** for passthrough
-	* the following function will get the PD LIST.
-	*/
+	instance->instancet->enable_intr(instance);
 
 	memset(instance->pd_list, 0, MEGASAS_MAX_PD * sizeof(struct megasas_pd_list));
 	megasas_get_pd_list(instance);
@@ -5852,32 +5855,16 @@ static int megasas_init_fw(struct megasa
 	tmp_sectors = min_t(u32, max_sectors_1 , max_sectors_2);
 	
 
-	printk(KERN_INFO "Controller Features:\n");
-	printk(KERN_INFO "Vendor ID: %u Device ID: %u\n",
-			ctrl_info->pci.vendor_id,
-			ctrl_info->pci.device_id);
-	printk(KERN_INFO "Sub Vendor ID: %u SubDevice ID: %u\n",
-			ctrl_info->pci.sub_vendor_id,
-			ctrl_info->pci.sub_device_id);
-
-	/*Check whether controller is iMR or MR */
-	if (ctrl_info->memory_size) {
-		instance->is_imr = 0;
-		printk("megaraid_sas: Controller type: MR, Memory size is: %dMB\n",
-			le16_to_cpu(ctrl_info->memory_size));
-	} else {
-		instance->is_imr = 1;
-		printk("megaraid_sas: Controller type: iMR\n");
-	}
-	instance->disableOnlineCtrlReset = ctrl_info->properties.OnOffProperties.disableOnlineCtrlReset;
-	dev_info(&instance->pdev->dev, "disableOnlineCtrlReset : %d\n",
-		instance->disableOnlineCtrlReset);
-	instance->UnevenSpanSupport = ctrl_info->adapterOperations2.supportUnevenSpans;
+
+	instance->disableOnlineCtrlReset =
+		ctrl_info->properties.OnOffProperties.disableOnlineCtrlReset;
+	instance->UnevenSpanSupport =
+		ctrl_info->adapterOperations2.supportUnevenSpans;
+	instance->secure_jbod_support =
+		ctrl_info->adapterOperations3.supportSecurityonJBOD;
 	instance->mpio = ctrl_info->adapterOperations2.mpio;
 	if(instance->UnevenSpanSupport) {
 		struct fusion_context *fusion = instance->ctrl_context;
-		printk("megaraid_sas: FW supports: UnevenSpanSupport=%x\n", 
-			instance->UnevenSpanSupport);
 		if (MR_ValidateMapInfo(instance))
 			fusion->fast_path_io = 1;
 		else
@@ -5904,11 +5891,9 @@ static int megasas_init_fw(struct megasa
 	instance->crash_dump_drv_support = (crashdump_enable && 
 				instance->crash_dump_fw_support &&
 				instance->crash_dump_buf);
-	if(instance->crash_dump_drv_support) { 
-		printk(KERN_INFO "megaraid_sas: FW Crash dump is supported\n");
+	if (instance->crash_dump_drv_support)
 		megasas_set_crash_dump_params(instance, MR_CRASH_BUF_TURN_OFF);
-		 
-	} else {
+	else {
 		if (instance->crash_dump_buf)
 			pci_free_consistent(instance->pdev, CRASH_DMA_BUF_SIZE,
 				instance->crash_dump_buf,
@@ -5916,38 +5901,26 @@ static int megasas_init_fw(struct megasa
 		instance->crash_dump_buf = NULL;
 	}
 	
-	instance->secure_jbod_support = ctrl_info->adapterOperations3.supportSecurityonJBOD;
-	if(instance->secure_jbod_support) 
-		printk(KERN_INFO "megaraid_sas: FW supports Secure JBOD\n");
-			
+	dev_info(&instance->pdev->dev,
+		"pci id\t\t: (0x%04x)/(0x%04x)/(0x%04x)/(0x%04x)\n",
+		le16_to_cpu(ctrl_info->pci.vendor_id),
+		le16_to_cpu(ctrl_info->pci.device_id),
+		le16_to_cpu(ctrl_info->pci.sub_vendor_id),
+		le16_to_cpu(ctrl_info->pci.sub_device_id));
+	dev_info(&instance->pdev->dev, "unevenspan support	: %s\n",
+		instance->UnevenSpanSupport ? "yes" : "no");
+	dev_info(&instance->pdev->dev, "disable ocr		: %s\n",
+		instance->disableOnlineCtrlReset ? "yes" : "no");
+	dev_info(&instance->pdev->dev, "firmware crash dump	: %s\n",
+		instance->crash_dump_drv_support ? "yes" : "no");
+	dev_info(&instance->pdev->dev, "secure jbod		: %s\n",
+		instance->secure_jbod_support ? "yes" : "no");
+
 	instance->max_sectors_per_req = instance->max_num_sge *
 						PAGE_SIZE / 512;
 	if (tmp_sectors && (instance->max_sectors_per_req > tmp_sectors))
 		instance->max_sectors_per_req = tmp_sectors;
 	
-	/* 
-	 * 1. For fusion adapters, 3 commands for IOCTL and 5 commands
-	 *    for driver's internal DCMDs.
-	 * 2. For MFI skinny adapters, 5 commands for IOCTL + driver's
-	 *    internal DCMDs.
-	 * 3. For rest of MFI adapters, 27 commands reserved for IOCTLs
-	 *    and 5 commands for drivers's internal DCMD.    
-	 */
-	if (instance->ctrl_context) {
-		instance->max_scsi_cmds = instance->max_fw_cmds -
-					(MEGASAS_FUSION_INTERNAL_CMDS +
-					MEGASAS_FUSION_IOCTL_CMDS);
-		sema_init(&instance->ioctl_sem, MEGASAS_FUSION_IOCTL_CMDS);
-	} else if ((instance->pdev->device == PCI_DEVICE_ID_LSI_SAS0073SKINNY) ||
-		(instance->pdev->device == PCI_DEVICE_ID_LSI_SAS0071SKINNY)) {
-		instance->max_scsi_cmds = instance->max_fw_cmds -
-                                          MEGASAS_SKINNY_INT_CMDS;
-		sema_init(&instance->ioctl_sem, MEGASAS_SKINNY_INT_CMDS);
-	} else {
-		instance->max_scsi_cmds = instance->max_fw_cmds -
-                                          MEGASAS_INT_CMDS;
-		sema_init(&instance->ioctl_sem, (MEGASAS_INT_CMDS - 5)); 
-	}
 
 	/* Check for valid throttlequeuedepth module parameter */
 	if (throttlequeuedepth && 
@@ -6076,10 +6049,7 @@ megasas_get_seq_num(struct megasas_insta
 	pci_free_consistent(instance->pdev, sizeof(struct megasas_evt_log_info),
 			    el_info, el_info_h);
 
-	if(instance->ctrl_context && cmd->mpt_pthr_cmd_blocked)
-		megasas_return_mfi_mpt_pthr(instance, cmd, cmd->mpt_pthr_cmd_blocked);
-	else
-		megasas_return_cmd(instance, cmd);
+	megasas_return_cmd(instance, cmd);
 
 	return 0;
 }
@@ -6323,12 +6293,21 @@ static int megasas_io_attach(struct mega
 		host->hostt->eh_device_reset_handler = NULL;
 		host->hostt->eh_bus_reset_handler = NULL;
 	}
+	error = scsi_init_shared_tag_map(host, host->can_queue);
+	if (error) {
+		dev_err(&instance->pdev->dev,
+			"Failed to shared tag from %s %d\n",
+			__func__, __LINE__);
+		return -ENODEV;
+	}
 
 	/*
 	 * Notify the mid-layer about the new controller
 	 */
 	if (scsi_add_host(host, &instance->pdev->dev)) {
-		printk(KERN_DEBUG "megasas: scsi_add_host failed\n");
+		dev_err(&instance->pdev->dev,
+			"Failed to add host from %s %d\n",
+			__func__, __LINE__);
 		return -ENODEV;
 	}
 
@@ -6387,6 +6366,71 @@ fail_set_dma_mask:
 }
 
 /**
+ * megasas_setup_irqs -	register interrupt with IRQ sub system.
+ * @instance:				Adapter soft state
+ * @is_probe:				Driver probe check
+ *
+ * Do not enable interrupt, only setup ISRs.
+ *
+ * Return 0 on success.
+ */
+static int
+megasas_setup_irqs(struct megasas_instance *instance, u8 is_probe)
+{
+	int i, j, cpu;
+	struct pci_dev *pdev;
+
+	pdev = instance->pdev;
+
+try_io_apic:
+	if (!instance->msix_vectors) {
+		instance->irq_context[0].instance = instance;
+		instance->irq_context[0].MSIxIndex = 0;
+		if (request_irq(pdev->irq, instance->instancet->service_isr,
+			IRQF_SHARED, "megasas", &instance->irq_context[0])) {
+			dev_err(&instance->pdev->dev,
+					"Failed to register IRQ from %s %d\n",
+					__func__, __LINE__);
+			return -1;
+		}
+		return 0;
+	}
+	/* Try MSI-x */
+	cpu = cpumask_first(cpu_online_mask);
+	for (i = 0; i < instance->msix_vectors; i++) {
+		instance->irq_context[i].instance = instance;
+		instance->irq_context[i].MSIxIndex = i;
+		if (request_irq(instance->msixentry[i].vector,
+			instance->instancet->service_isr, 0, "megasas",
+			&instance->irq_context[i])) {
+			dev_err(&instance->pdev->dev,
+				"Failed to register IRQ for vector %d.\n", i);
+			for (j = 0; j < i; j++) {
+				if (smp_affinity_enable)
+					irq_set_affinity_hint(
+						instance->msixentry[j].vector, NULL);
+				free_irq(instance->msixentry[j].vector,
+					&instance->irq_context[j]);
+			}
+			/* Retry irq register for IO_APIC*/
+			instance->msix_vectors = 0;
+			if (is_probe)
+				goto try_io_apic;
+			else
+				return -1;
+		}
+		if (smp_affinity_enable) {
+			if (irq_set_affinity_hint(instance->msixentry[i].vector,
+				get_cpu_mask(cpu)))
+				dev_err(&instance->pdev->dev,
+					"Failed to set affinity hint"
+					" for cpu %d\n", cpu);
+			cpu = cpumask_next(cpu, cpu_online_mask);
+		}
+	}
+	return 0;
+}
+/**
  * megasas_probe_one -	PCI hotplug entry point
  * @pdev:		PCI device structure
  * @id:			PCI ids of supported hotplugged adapter	
@@ -6394,7 +6438,7 @@ fail_set_dma_mask:
 static int __devinit
 megasas_probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
 {
-	int rval, pos, i, j, cpu;
+	int rval, pos, i;
 	struct Scsi_Host *host;
 	struct megasas_instance *instance;
 	u16 control = 0;
@@ -6413,16 +6457,6 @@ megasas_probe_one(struct pci_dev *pdev,
 	}
 
 	/*
-	 * Announce PCI information
-	 */
-	printk(KERN_INFO "megasas: %#4.04x:%#4.04x:%#4.04x:%#4.04x: ",
-	       pdev->vendor, pdev->device, pdev->subsystem_vendor,
-	       pdev->subsystem_device);
-
-	printk("bus %d:slot %d:func %d\n",
-	       pdev->bus->number, PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
-
-	/*
 	 * PCI prepping: enable device set bus mastering and dma mask
 	 */
 	rval = pci_enable_device(pdev);
@@ -6480,9 +6514,6 @@ megasas_probe_one(struct pci_dev *pdev,
 			memset(fusion, 0,
 				((1 << PAGE_SHIFT) << instance->ctrl_context_pages));
 
-			INIT_LIST_HEAD(&fusion->cmd_pool);
-			spin_lock_init(&fusion->cmd_pool_lock);
-			
 		}
 			break;
 		default: /* For all other supported controllers */
@@ -6612,47 +6643,6 @@ megasas_probe_one(struct pci_dev *pdev,
 		}
 	}
 
-
-retry_irq_register:
-	/*
-	 * Register IRQ
-	 */
-	if (instance->msix_vectors) {
-		cpu = cpumask_first(cpu_online_mask);
-		for (i = 0; i < instance->msix_vectors; i++) {
-			instance->irq_context[i].instance = instance;
-			instance->irq_context[i].MSIxIndex = i;
-			if (request_irq(instance->msixentry[i].vector, instance->instancet->service_isr, 0, "megasas", &instance->irq_context[i])) {
-				printk(KERN_DEBUG "megasas: Failed to register IRQ for vector %d.\n", i);
-				for (j = 0; j < i; j++) {
-					if (smp_affinity_enable)
-						irq_set_affinity_hint(
-							instance->msixentry[j].vector, NULL);
-					free_irq(instance->msixentry[j].vector, &instance->irq_context[j]);
-				}
-				/* Retry irq register for IO_APIC*/
-				instance->msix_vectors = 0;
-				goto retry_irq_register;
-			}
-			if (smp_affinity_enable) {
-				if (irq_set_affinity_hint(instance->msixentry[i].vector,
-					get_cpu_mask(cpu)))
-					dev_err(&instance->pdev->dev, "Error setting"
-						"affinity hint for cpu %d\n", cpu);
-				cpu = cpumask_next(cpu, cpu_online_mask);
-			}
-		}
-	} else {
-		instance->irq_context[0].instance = instance;
-		instance->irq_context[0].MSIxIndex = 0;
-		if (request_irq(pdev->irq, instance->instancet->service_isr, IRQF_SHARED, "megasas", &instance->irq_context[0])) {
-			printk(KERN_DEBUG "megasas: Failed to register IRQ\n");
-			goto fail_irq;
-		}
-	}
-
-	instance->instancet->enable_intr(instance);
-
 	/*
 	 * Store instance in PCI softstate
 	 */
@@ -6706,7 +6696,6 @@ retry_irq_register:
 		}
 	else
 		free_irq(instance->pdev->irq, &instance->irq_context[0]);
-      fail_irq:
 	if ((instance->pdev->device == PCI_DEVICE_ID_LSI_FUSION) ||
 	    (instance->pdev->device == PCI_DEVICE_ID_LSI_PLASMA) ||
 	    (instance->pdev->device == PCI_DEVICE_ID_LSI_INVADER) ||
@@ -6772,10 +6761,7 @@ static void megasas_flush_cache(struct m
 
 	megasas_issue_blocked_cmd(instance, cmd, 30);
 
-	if(instance->ctrl_context && cmd->mpt_pthr_cmd_blocked)
-		megasas_return_mfi_mpt_pthr(instance, cmd, cmd->mpt_pthr_cmd_blocked);
-	else
-		megasas_return_cmd(instance, cmd);
+	megasas_return_cmd(instance, cmd);
 
 	return;
 }
@@ -6818,10 +6804,7 @@ static void megasas_shutdown_controller(
 
 	megasas_issue_blocked_cmd(instance, cmd, 30);
 
-	if(instance->ctrl_context && cmd->mpt_pthr_cmd_blocked)
-		megasas_return_mfi_mpt_pthr(instance, cmd, cmd->mpt_pthr_cmd_blocked);
-	else
-		megasas_return_cmd(instance, cmd);
+	megasas_return_cmd(instance, cmd);
 
 	return;
 }
@@ -6890,7 +6873,7 @@ megasas_suspend(struct pci_dev *pdev, pm
 static int
 megasas_resume(struct pci_dev *pdev)
 {
-	int rval, i, j, cpu;
+	int rval;
 	struct Scsi_Host *host;
 	struct megasas_instance *instance;
 
@@ -6965,54 +6948,8 @@ megasas_resume(struct pci_dev *pdev)
 	tasklet_init(&instance->isr_tasklet, instance->instancet->tasklet,
                          (unsigned long)instance);	
 
-	/*
-	 * Register IRQ
-	 */
-	if (instance->msix_vectors) {
-		cpu = cpumask_first(cpu_online_mask);
-		for (i = 0 ; i < instance->msix_vectors; i++) {
-			instance->irq_context[i].instance = instance;
-			instance->irq_context[i].MSIxIndex = i;
-			if (request_irq(instance->msixentry[i].vector, instance->instancet->service_isr, 0, "megasas", &instance->irq_context[i])) {
-				printk(KERN_DEBUG "megasas: Failed to register IRQ for vector %d.\n", i);
-				for (j = 0; j < i; j++) {
-					if (smp_affinity_enable)
-						irq_set_affinity_hint(
-							instance->msixentry[j].vector, NULL);
-					free_irq(instance->msixentry[j].vector, &instance->irq_context[j]);
-				}
-				goto fail_irq;
-			}
-
-			if (smp_affinity_enable) {
-				if (irq_set_affinity_hint(instance->msixentry[i].vector,
-					get_cpu_mask(cpu)))
-					dev_err(&instance->pdev->dev, "Error setting"
-						"affinity hint for cpu %d\n", cpu);
-				cpu = cpumask_next(cpu, cpu_online_mask);
-			}
-		}
-	} else {
-		instance->irq_context[0].instance = instance;
-		instance->irq_context[0].MSIxIndex = 0;
-		if (request_irq(pdev->irq, instance->instancet->service_isr, IRQF_SHARED, "megasas", &instance->irq_context[0])) {
-			printk(KERN_DEBUG "megasas: Failed to register IRQ\n");
-			goto fail_irq;
-		}
-	}
-
-	/* Re-launch SR-IOV heartbeat timer */
-	if (instance->requestorId) {
-		if (!megasas_sriov_start_heartbeat(instance, 0))
-			megasas_start_timer(instance,
-					    &instance->sriov_heartbeat_timer,
-					    megasas_sriov_heartbeat_handler,
-					    MEGASAS_SRIOV_HEARTBEAT_INTERVAL_VF);
-		else {
-			instance->skip_heartbeat_timer_del = 1;
-			goto fail_irq;
-		}
-	}
+	if (megasas_setup_irqs(instance, 0))
+		goto fail_init_mfi;
 
 	instance->instancet->enable_intr(instance);
 	instance->unload = 0;
@@ -7025,7 +6962,6 @@ megasas_resume(struct pci_dev *pdev)
 
 	return 0;
 
-fail_irq:
 fail_init_mfi:
 	if (instance->evt_detail)
 		pci_free_consistent(pdev, sizeof(struct megasas_evt_detail),
@@ -7499,10 +7435,7 @@ megasas_mgmt_fw_ioctl(struct megasas_ins
 	/* Complete both MPT and MFI command from this context 
 	 * Applicable for Fusion Adapter only.
 	 */
-	if(instance->ctrl_context && cmd->mpt_pthr_cmd_blocked)
-		megasas_return_mfi_mpt_pthr(instance, cmd, cmd->mpt_pthr_cmd_blocked);
-	else
-		megasas_return_cmd(instance, cmd);
+	megasas_return_cmd(instance, cmd);
 	
 	return error;
 }
diff -arup megaraid_sas-06.807.01.01/distro/base//megaraid_sas_fusion.c megaraid_sas-06.807.02.00/distro/base//megaraid_sas_fusion.c
--- megaraid_sas-06.807.01.01/distro/base//megaraid_sas_fusion.c	2014-11-18 17:16:49.000000000 +0530
+++ megaraid_sas-06.807.02.00/distro/base//megaraid_sas_fusion.c	2014-11-24 21:52:28.000000000 +0530
@@ -56,6 +56,7 @@
 #include "megaraid_sas_fusion.h"
 #include "megaraid_sas.h"
 
+
 extern void megasas_free_cmds(struct megasas_instance *instance);
 extern struct megasas_cmd *megasas_get_cmd(struct megasas_instance
 					   *instance);
@@ -172,27 +173,29 @@ megasas_clear_intr_fusion(struct megasas
  * megasas_get_cmd_fusion -	Get a command from the free pool
  * @instance:		Adapter soft state
  *
- * Returns a free command from the pool
+ * Returns a blk_tag indexed mpt frame
  */
-struct megasas_cmd_fusion *megasas_get_cmd_fusion(struct megasas_instance
-						  *instance)
+inline struct megasas_cmd_fusion *megasas_get_cmd_fusion(struct megasas_instance
+						  *instance, u32 blk_tag)
 {
-	unsigned long flags;
-	struct fusion_context *fusion = (struct fusion_context *)instance->ctrl_context;
+	struct fusion_context *fusion =
+		(struct fusion_context *)instance->ctrl_context;
+#if BLK_TAG_REFCOUNT
 	struct megasas_cmd_fusion *cmd = NULL;
-
-	spin_lock_irqsave(&fusion->cmd_pool_lock, flags);
-
-	if (!list_empty(&fusion->cmd_pool)) {
-		cmd = list_entry((&fusion->cmd_pool)->next,
-				 struct megasas_cmd_fusion, list);
-		list_del_init(&cmd->list);
-	} else {
-		printk(KERN_ERR "megasas: Command pool (fusion) empty!\n");
-	}
-
-	spin_unlock_irqrestore(&fusion->cmd_pool_lock, flags);
-	return cmd;
+	if (blk_tag >= instance->max_fw_cmds) {
+		dev_err(&instance->pdev->dev,
+			"request tags (%04d) max_fw_cmds (%04d) "
+			"can_queue (%04d) max_scsi_cmds (%04d)\n",
+			blk_tag, instance->max_fw_cmds,
+			instance->host->can_queue, instance->max_scsi_cmds);
+		panic("Invalid blk tag");
+	}
+	cmd = fusion->cmd_list[blk_tag];
+	if (atomic_inc_return(&cmd->refcount) > 1)
+		dev_err(&instance->pdev->dev, "request tags %d from %s %d\n",
+			blk_tag, __func__, __LINE__);
+#endif
+	return fusion->cmd_list[blk_tag];
 }
 
 /**
@@ -203,36 +206,10 @@ struct megasas_cmd_fusion *megasas_get_c
 inline void
 megasas_return_cmd_fusion(struct megasas_instance *instance, struct megasas_cmd_fusion *cmd)
 {
-	unsigned long flags;
-	struct fusion_context *fusion = (struct fusion_context *)instance->ctrl_context;
-
-	spin_lock_irqsave(&fusion->cmd_pool_lock, flags);
-
 	cmd->scmd = NULL;
-	cmd->sync_cmd_idx = (u32)ULONG_MAX;
-	list_add(&cmd->list, (&fusion->cmd_pool)->next);
-
-	spin_unlock_irqrestore(&fusion->cmd_pool_lock, flags);
-}
-
-/**
- * megasas_return_mfi_mpt_pthr - Return a mfi and mpt to free command pool
- * @instance:		Adapter soft state
- * @cmd_mfi:		MFI Command packet to be returned to free command pool
- * @cmd_mpt:		MPT Command packet to be returned to free command pool
- */
-inline void
-megasas_return_mfi_mpt_pthr(struct megasas_instance *instance, 
-		struct megasas_cmd *cmd_mfi, struct megasas_cmd_fusion *cmd_fusion) 
-{
-	unsigned long flags;
-	spin_lock_irqsave(&instance->cmd_pool_lock, flags);
-	megasas_return_cmd_fusion(instance, cmd_fusion);
-	if( atomic_read(&cmd_mfi->mfi_mpt_pthr) != MFI_MPT_ATTACHED)
-		printk("LSI debug possible bug from %s %d \n", __func__, __LINE__);
-	atomic_set(&cmd_mfi->mfi_mpt_pthr, MFI_MPT_DETACHED);
-	__megasas_return_cmd(instance, cmd_mfi);
-	spin_unlock_irqrestore(&instance->cmd_pool_lock, flags);
+#if BLK_TAG_REFCOUNT
+	(void)atomic_dec_and_test(&cmd->refcount);
+#endif
 }
 
 /**
@@ -326,7 +303,6 @@ megasas_free_cmds_fusion(struct megasas_
 	kfree(fusion->cmd_list);
 	fusion->cmd_list = NULL;
 
-	INIT_LIST_HEAD(&fusion->cmd_pool);
 }
 
 /**
@@ -513,6 +489,10 @@ megasas_alloc_cmds_fusion(struct megasas
 	/*
 	 * Add all the commands to command pool (fusion->cmd_pool)
 	 */
+#if BLK_TAG_DEBUG
+	dev_info(&instance->pdev->dev,
+		"passthrough pool : mpt index\t mfi index\n");
+#endif
 
 	/* SMID 0 is reserved. Set SMID/index from 1 */
 	for (i = 0; i < max_cmd; i++) {
@@ -521,13 +501,20 @@ megasas_alloc_cmds_fusion(struct megasas
 		memset(cmd, 0, sizeof(struct megasas_cmd_fusion));
 		cmd->index = i + 1; 
 		cmd->scmd = NULL;
-		cmd->sync_cmd_idx = (u32)ULONG_MAX; /* Set to Invalid */
+		cmd->sync_cmd_idx = (i >= instance->max_scsi_cmds) ?
+				(i - instance->max_scsi_cmds) :
+				(u32)ULONG_MAX; /* Set to Invalid */
+#if BLK_TAG_DEBUG
+		if (cmd->sync_cmd_idx != (u32) ULONG_MAX)
+			dev_info(&instance->pdev->dev,
+				"\t\t\t[%d]\t[%d]\n",
+				i, cmd->sync_cmd_idx);
+#endif
 		cmd->instance = instance;
 		cmd->io_request = (MEGASAS_RAID_SCSI_IO_REQUEST *)(io_req_base + offset);
 		memset(cmd->io_request, 0, sizeof(MEGASAS_RAID_SCSI_IO_REQUEST));
 		cmd->io_request_phys_addr = io_req_base_phys + offset;
-
-		list_add_tail(&cmd->list, &fusion->cmd_pool);
+		atomic_set(&cmd->refcount, 0);
 	}
 
 	/*
@@ -590,8 +577,7 @@ wait_and_poll(struct megasas_instance *i
 
 	if (frame_hdr->cmd_status == 0xff) {
 		if(fusion)
-			megasas_return_mfi_mpt_pthr(instance, cmd,
-				cmd->mpt_pthr_cmd_blocked);
+			megasas_return_cmd(instance, cmd);
                 return -ETIME;
         }
 
@@ -811,10 +797,7 @@ megasas_get_ld_map_info(struct megasas_i
 	else
 		ret = megasas_issue_polled(instance, cmd);
 
-	if(instance->ctrl_context && cmd->mpt_pthr_cmd_blocked)
-		megasas_return_mfi_mpt_pthr(instance, cmd, cmd->mpt_pthr_cmd_blocked);
-	else
-		megasas_return_cmd(instance, cmd);
+	megasas_return_cmd(instance, cmd);
 
 	return ret;
 }
@@ -1041,6 +1024,15 @@ megasas_init_adapter_fusion(struct megas
 		fusion->last_reply_idx[i] = 0;
 
 	/*
+	 * For fusion adapters, 3 commands for IOCTL and 5 commands
+	 * for driver's internal DCMDs.
+	 */
+	instance->max_scsi_cmds = instance->max_fw_cmds -
+				(MEGASAS_FUSION_INTERNAL_CMDS +
+				MEGASAS_FUSION_IOCTL_CMDS);
+	sema_init(&instance->ioctl_sem, MEGASAS_FUSION_IOCTL_CMDS);
+
+	/*
 	 * Allocate memory for descriptors
 	 * Create a pool of commands
 	 */
@@ -1866,7 +1858,8 @@ megasas_build_and_issue_cmd_fusion(struc
 
 	fusion = instance->ctrl_context;
 
-	cmd = megasas_get_cmd_fusion(instance);
+	cmd = megasas_get_cmd_fusion(instance, scmd->request->tag);
+
 	if (!cmd)
 		return SCSI_MLQUEUE_HOST_BUSY;
 
@@ -1923,6 +1916,7 @@ complete_cmd_fusion(struct megasas_insta
 	union desc_value d_val;
 	PLD_LOAD_BALANCE_INFO lbinfo;
 	int threshold_reply_count = 0;
+	struct scsi_cmnd *scmd_local = NULL;
 
 	fusion = instance->ctrl_context;
 
@@ -1955,6 +1949,7 @@ complete_cmd_fusion(struct megasas_insta
 		if (cmd_fusion->scmd)
 			cmd_fusion->scmd->SCp.ptr = NULL;
 
+		scmd_local = cmd_fusion->scmd;
 		status = scsi_io_req->RaidContext.status;
 		extStatus = scsi_io_req->RaidContext.exStatus;
 
@@ -1962,7 +1957,7 @@ complete_cmd_fusion(struct megasas_insta
 		{
 		case MPI2_FUNCTION_SCSI_IO_REQUEST :  /*Fast Path IO.*/
 			/* Update load balancing info */
-			device_id = MEGASAS_DEV_INDEX(instance, cmd_fusion->scmd);
+			device_id = MEGASAS_DEV_INDEX(instance, scmd_local);
 			lbinfo = &fusion->load_balance_info[device_id];
 			if (cmd_fusion->scmd->SCp.Status & MEGASAS_LOAD_BALANCE_FLAG) {
 				atomic_dec(&lbinfo->scsi_pending_cmds[cmd_fusion->pd_r1_lb]);
@@ -1976,26 +1971,24 @@ complete_cmd_fusion(struct megasas_insta
 		case MEGASAS_MPI2_FUNCTION_LD_IO_REQUEST : /* LD-IO Path */
 			/* Map the FW Cmd Status */
 			map_cmd_status(fusion, cmd_fusion,status,extStatus);
-			scsi_dma_unmap(cmd_fusion->scmd);
-			cmd_fusion->scmd->scsi_done(cmd_fusion->scmd);
         		scsi_io_req->RaidContext.status = 0;
 		        scsi_io_req->RaidContext.exStatus = 0;
 			megasas_return_cmd_fusion(instance, cmd_fusion);
+			scsi_dma_unmap(scmd_local);
+			scmd_local->scsi_done(scmd_local);
 			atomic_dec(&instance->fw_outstanding);
 
 			break;
 		case MEGASAS_MPI2_FUNCTION_PASSTHRU_IO_REQUEST: /*MFI command */
 			cmd_mfi = instance->cmd_list[cmd_fusion->sync_cmd_idx];
-			
-			if(!cmd_mfi->mpt_pthr_cmd_blocked) {
-				if (megasas_dbg_lvl == 5)
-					printk("LSI Debug freeing mfi/mpt pass-through from %s %d \n",
-						__func__, __LINE__);
-				megasas_return_mfi_mpt_pthr(instance, cmd_mfi, cmd_fusion);
-			}
-		
-			megasas_complete_cmd(instance, cmd_mfi, DID_OK);
-			cmd_fusion->flags = 0;
+			/* Poll mode. Dummy free.
+			 * In case of Interrupt mode, caller has reverse check.
+			 */
+			if (cmd_mfi->flags & DRV_DCMD_POLLED_MODE) {
+				cmd_mfi->flags &= ~DRV_DCMD_POLLED_MODE;
+				megasas_return_cmd(instance, cmd_mfi);
+			} else
+				megasas_complete_cmd(instance, cmd_mfi, DID_OK);
 			break;
 		}
 
@@ -2149,27 +2142,15 @@ build_mpt_mfi_pass_thru(struct megasas_i
 	struct megasas_cmd_fusion *cmd;
 	struct fusion_context *fusion;
         struct megasas_header *frame_hdr = &mfi_cmd->frame->hdr;
-	u32 opcode;
+	fusion = instance->ctrl_context;
 
-	cmd = megasas_get_cmd_fusion(instance);
+	cmd = megasas_get_cmd_fusion(instance,
+			instance->max_scsi_cmds + mfi_cmd->index);
 	if (!cmd)
 		return 1;
 
 	/*  Save the smid. To be used for returning the cmd */
 	mfi_cmd->context.smid = cmd->index;
-	cmd->sync_cmd_idx = mfi_cmd->index;
-	
-	/* Set this only for Blocked commands */
- 	opcode = le32_to_cpu(mfi_cmd->frame->dcmd.opcode);
-	if ((opcode == MR_DCMD_LD_MAP_GET_INFO)
-                        && (mfi_cmd->frame->dcmd.mbox.b[1] == 1))
-		mfi_cmd->is_wait_event = 1;
-
-	if (opcode == MR_DCMD_CTRL_EVENT_WAIT)
-		mfi_cmd->is_wait_event = 1;
-
-	if(mfi_cmd->is_wait_event)
-		mfi_cmd->mpt_pthr_cmd_blocked = cmd;
 
 	/*
 	 * For cmds where the flag is set, store the flag and check
@@ -2178,9 +2159,16 @@ build_mpt_mfi_pass_thru(struct megasas_i
 	 */
 
 	if (frame_hdr->flags & cpu_to_le16(MFI_FRAME_DONT_POST_IN_REPLY_QUEUE))
-		cmd->flags = MFI_FRAME_DONT_POST_IN_REPLY_QUEUE;
+		mfi_cmd->flags |= DRV_DCMD_POLLED_MODE;
 
-	fusion = instance->ctrl_context;
+#if BLK_TAG_DEBUG
+	dev_info(&instance->pdev->dev,
+			"%s : opcode (0x%08x) poll_mode (%d) tag (%04d) mfi->index (%04d)\n",
+			__func__, le32_to_cpu(mfi_cmd->frame->dcmd.opcode),
+			mfi_cmd->flags,
+			instance->max_scsi_cmds + mfi_cmd->index,
+			cmd->sync_cmd_idx);
+#endif
 	io_req = cmd->io_request;
 
 	if ((instance->pdev->device == PCI_DEVICE_ID_LSI_INVADER) ||
@@ -2256,7 +2244,6 @@ megasas_issue_dcmd_fusion(struct megasas
 	}
 	
 
-	atomic_set(&cmd->mfi_mpt_pthr, MFI_MPT_ATTACHED);
 	instance->instancet->fire_cmd(instance,
 				req_desc->u.low, req_desc->u.high, instance->reg_set);
 }
@@ -2462,6 +2449,58 @@ out:
 	return retval;
 }
 
+
+/*
+ * megasas_refire_mgmt_cmd :	Re-fire management commands
+ * @instance:				Controller's soft instance
+*/
+void megasas_refire_mgmt_cmd(struct megasas_instance *instance)
+{
+	int j;
+	struct megasas_cmd_fusion *cmd_fusion;
+	struct fusion_context *fusion;
+	struct megasas_cmd *cmd_mfi;
+	MEGASAS_REQUEST_DESCRIPTOR_UNION *req_desc;
+	u16 smid;
+
+	fusion = instance->ctrl_context;
+
+	/* Re-fire management commands.
+	 * Do not traverse complete MPT frame pool. Start from max_scsi_cmds.
+	 */
+	for (j = instance->max_scsi_cmds ; j < instance->max_fw_cmds; j++) {
+		cmd_fusion = fusion->cmd_list[j];
+		cmd_mfi = instance->cmd_list[cmd_fusion->sync_cmd_idx];
+		smid = le16_to_cpu(cmd_mfi->context.smid);
+
+		if (!smid)
+			continue;
+		req_desc = megasas_get_request_descriptor
+					(instance, smid - 1);
+		if (req_desc && (cmd_mfi->frame->dcmd.opcode !=
+				cpu_to_le32(MR_DCMD_LD_MAP_GET_INFO))) {
+#if BLK_TAG_DEBUG
+			dev_info(&instance->pdev->dev,
+				"%s : refire sync_cmd_idx (%04d) tag (%04d) opcode (0x%08x)\n",
+				__func__, cmd_fusion->sync_cmd_idx,
+				(smid - 1),
+				le32_to_cpu(cmd_mfi->frame->dcmd.opcode));
+#endif
+			instance->instancet->fire_cmd(instance,
+				req_desc->u.low, req_desc->u.high,
+				instance->reg_set);
+		} else {
+#if BLK_TAG_DEBUG
+			dev_info(&instance->pdev->dev,
+				"%s : skip sync_cmd_idx (%04d) tag (%04d) opcode (0x%08x)\n",
+				__func__, cmd_fusion->sync_cmd_idx,
+				(smid - 1),
+				le32_to_cpu(cmd_mfi->frame->dcmd.opcode));
+#endif
+			megasas_return_cmd(instance, cmd_mfi);
+		}
+	}
+}
 /*
  * megasas_reset_fusion :	Core reset function for fusion adapters
  * shost	        :	SCSI host 	
@@ -2470,12 +2509,10 @@ out:
 /* Core fusion reset function */
 int megasas_reset_fusion(struct Scsi_Host *shost, int iotimeout)
 {
-	int retval = SUCCESS, i, j, retry = 0, convert = 0;
+	int retval = SUCCESS, i, retry = 0, convert = 0;
 	struct megasas_instance *instance;
 	struct megasas_cmd_fusion *cmd_fusion;
 	struct fusion_context *fusion;
-	struct megasas_cmd *cmd_mfi;
-	MEGASAS_REQUEST_DESCRIPTOR_UNION *req_desc;
 	u32 host_diag, abs_state, status_reg, reset_adapter;
 	u32 io_timeout_in_crash_mode = 0;
 
@@ -2703,27 +2740,7 @@ int megasas_reset_fusion(struct Scsi_Hos
 				continue;
 			}
 
-			/* Re-fire management commands */
-			for (j = 0 ; j < instance->max_fw_cmds; j++) {
-				cmd_fusion = fusion->cmd_list[j];
-				if (cmd_fusion->sync_cmd_idx != (u32)ULONG_MAX) {
-					cmd_mfi = instance->cmd_list[cmd_fusion->sync_cmd_idx];
-					if (cmd_mfi->frame->dcmd.opcode == cpu_to_le32(MR_DCMD_LD_MAP_GET_INFO)) {
-						megasas_return_mfi_mpt_pthr(instance, cmd_mfi, cmd_fusion);
-					} else  {
-						req_desc = megasas_get_request_descriptor(instance, cmd_mfi->context.smid - 1);
-						if (!req_desc) {
-							printk(KERN_WARNING "req_desc NULL for scsi%d\n", instance->host->host_no);
-							/* Return leaked MPT
-							   frame */
-							megasas_return_cmd_fusion(instance, cmd_fusion);
-						} else {
-							instance->instancet->fire_cmd(instance, 
-								req_desc->u.low, req_desc->u.high, instance->reg_set);
-						}
-					}
-				}
-			}
+			megasas_refire_mgmt_cmd(instance);
 			
  			if (megasas_get_ctrl_info(instance)) {
  				dev_info(&instance->pdev->dev,
@@ -2758,14 +2775,13 @@ int megasas_reset_fusion(struct Scsi_Hos
 			printk(KERN_WARNING "megaraid_sas: Reset successful "
 			       "for scsi%d.\n", instance->host->host_no);
 			
-			if (instance->crash_dump_drv_support) {
-				if (instance->crash_dump_app_support)
-					megasas_set_crash_dump_params(instance, 
-						MR_CRASH_BUF_TURN_ON);
-				else 
-					megasas_set_crash_dump_params(instance, 
-						MR_CRASH_BUF_TURN_OFF);
-			}
+			if (instance->crash_dump_drv_support &&
+				instance->crash_dump_app_support)
+				megasas_set_crash_dump_params(instance,
+					MR_CRASH_BUF_TURN_ON);
+			else
+				megasas_set_crash_dump_params(instance,
+					MR_CRASH_BUF_TURN_OFF);
 			
 			retval = SUCCESS;
 			goto out;
diff -arup megaraid_sas-06.807.01.01/distro/base//megaraid_sas_fusion.h megaraid_sas-06.807.02.00/distro/base//megaraid_sas_fusion.h
--- megaraid_sas-06.807.01.01/distro/base//megaraid_sas_fusion.h	2014-11-18 17:16:49.000000000 +0530
+++ megaraid_sas-06.807.02.00/distro/base//megaraid_sas_fusion.h	2014-11-24 21:52:28.000000000 +0530
@@ -854,8 +854,8 @@ struct megasas_cmd_fusion {
 	 */
 	u32 sync_cmd_idx;
 	u32 index;
-	u8 flags;
 	u8 pd_r1_lb; /*PD after R1 load balancing*/
+	atomic_t refcount;
 };
 
 typedef struct _LD_LOAD_BALANCE_INFO
@@ -890,9 +890,6 @@ typedef struct LOG_BLOCK_SPAN_INFO {
 struct fusion_context
 {
 	struct megasas_cmd_fusion **cmd_list;
-	struct list_head cmd_pool;
-
-	spinlock_t cmd_pool_lock;
 
 	dma_addr_t req_frames_desc_phys;
 	u8 *req_frames_desc;	


* Re: [PATCH] megaraid_sas: Enable shared host tag map
  2014-11-24 15:51   ` Hannes Reinecke
  2014-11-24 15:59     ` Kashyap Desai
@ 2014-11-25 14:31     ` Christoph Hellwig
  2014-11-25 14:47       ` Hannes Reinecke
  2014-11-25 15:03       ` Kashyap Desai
  1 sibling, 2 replies; 10+ messages in thread
From: Christoph Hellwig @ 2014-11-25 14:31 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: James Bottomley, Sumit Saxena, Kashyap Desai, linux-scsi,
	Webb Scales, Don Brace, Jens Axboe

On Mon, Nov 24, 2014 at 04:51:14PM +0100, Hannes Reinecke wrote:
> It is useful as is, as we'll be getting prefixed logging output :-)

Use the blk-mq code path if you care :)

> Which I didn't do yet as the driver is using a larger tag map than
> that one announced to the block layer.
> This is to facilitate internal command submission, which should
> always work independent on any tag starvation issues from the
> upper layers.

This is an "issue" for a lot of drivers.  blk-mq provides a reserved_tags
pool for that, which reserves a number of tags for internal use, those
must be allocated using blk_mq_alloc_request with the reserved argument
set to true.

The lockless hpsa patches expose this to SCSI, which I'm generally
fine with, but we need to find a way to transparently make this work
for the old code path, too.  This might be as simple as embedding a
second blk_queue_tag structure into the Scsi_Host, adding a constant
prefix to the tag and providing some wrappers in scsi that allow
allocating a struct request (or rather scsi_cmnd) for internal use.
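
To make the second-map idea concrete, here is a minimal user-space sketch,
not kernel code.  It assumes two small bitmaps, one for I/O tags and one
for internal tags, and a constant prefix equal to the I/O depth that is
added to every internal tag so both kinds share one host-wide number
space.  All names (tag_map, host_tags, INTERNAL_PREFIX and the helpers)
are made up for illustration and are not existing block or SCSI APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sizes, chosen only for the example. */
#define IO_TAGS         8        /* tags announced to the upper layers */
#define INTERNAL_TAGS   2        /* tags kept for driver-internal commands */
#define INTERNAL_PREFIX IO_TAGS  /* constant added to every internal tag */

struct tag_map {
	uint32_t busy;           /* bit n set => tag n is in use */
	int depth;               /* number of usable tags, at most 32 */
};

/* Find and claim the lowest free tag, or return -1 if the map is full. */
static int tag_map_get(struct tag_map *m)
{
	for (int i = 0; i < m->depth; i++) {
		if (!(m->busy & (1u << i))) {
			m->busy |= 1u << i;
			return i;
		}
	}
	return -1;
}

static void tag_map_put(struct tag_map *m, int tag)
{
	m->busy &= ~(1u << tag);
}

/* Two independent maps embedded in one (hypothetical) host structure. */
struct host_tags {
	struct tag_map io;
	struct tag_map internal;
};

static void host_tags_init(struct host_tags *h)
{
	h->io.busy = 0;
	h->io.depth = IO_TAGS;
	h->internal.busy = 0;
	h->internal.depth = INTERNAL_TAGS;
}

/* Returns a host-wide tag; internal tags start at INTERNAL_PREFIX. */
static int host_get_tag(struct host_tags *h, bool internal)
{
	if (internal) {
		int tag = tag_map_get(&h->internal);

		return tag < 0 ? -1 : tag + INTERNAL_PREFIX;
	}
	return tag_map_get(&h->io);
}

/* The prefix tells us which map a host-wide tag came from. */
static void host_put_tag(struct host_tags *h, int tag)
{
	if (tag >= INTERNAL_PREFIX)
		tag_map_put(&h->internal, tag - INTERNAL_PREFIX);
	else
		tag_map_put(&h->io, tag);
}
```

With this layout an internal command can still get a tag when the I/O map
is exhausted, at the price of keeping two maps instead of one.  The sketch
is single-threaded; real code would need atomic bit operations or a lock.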



* Re: [PATCH] megaraid_sas: Enable shared host tag map
  2014-11-25 14:31     ` Christoph Hellwig
@ 2014-11-25 14:47       ` Hannes Reinecke
  2014-11-25 16:30         ` Christoph Hellwig
  2014-11-25 15:03       ` Kashyap Desai
  1 sibling, 1 reply; 10+ messages in thread
From: Hannes Reinecke @ 2014-11-25 14:47 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: James Bottomley, Sumit Saxena, Kashyap Desai, linux-scsi,
	Webb Scales, Don Brace, Jens Axboe

[-- Attachment #1: Type: text/plain, Size: 1972 bytes --]

On 11/25/2014 03:31 PM, Christoph Hellwig wrote:
> On Mon, Nov 24, 2014 at 04:51:14PM +0100, Hannes Reinecke wrote:
>> It is useful as is, as we'll be getting prefixed logging output :-)
> 
> Use the blk-mq code path if you care :)
> 
>> Which I didn't do yet as the driver is using a larger tag map than
>> that one announced to the block layer.
>> This is to facilitate internal command submission, which should
>> always work independent on any tag starvation issues from the
>> upper layers.
> 
> This is an "issue" for a lot of drivers.  blk-mq provides a reserved_tags
> pool for that, which reserves a number of tags for internal use, those
> must be allocated using blk_mq_alloc_request with the reserved argument
> set to true.
> 
> The lockless hpsa patches expose this to SCSI, which I'm generally
> fine with, but we need to find a way to transparently make this work
> for the old code path, too.  This might be as simple as embedding a
> second blk_queue_tag structure into the Scsi_Host, adding a constant
> prefix to the tag and providing some wrappers in scsi that allow
> allocating a struct request (or rather scsi_cmnd) for internal use.
> 
I'd rather have a single map to get request/tags from; otherwise
we'd be arbitrarily starving internal requests even though the
'main' tag map is empty.
My plan was more to mark a certain range of tags as 'reserved',
and add another helper/argument that allows dipping into the reserved
pool, too.

A tentative patch is attached.
The idea is to call blk_queue_init_tags() with the actual tag size and
then blk_queue_resize_tags() to limit the number of tags for the request
queue.
The driver can then use blk_reserve_tag() with the appropriate max
depth to get tags from the range [max_depth:real_max_depth].
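
As a rough illustration of the single-map scheme, here is a user-space
sketch, not the attached patch.  It assumes one bitmap sized for the real
depth; ordinary callers search only below max_depth, while internal
callers pass real_max_depth and so may also take the tags above it.  The
field names max_depth and real_max_depth mirror struct blk_queue_tag;
single_map, single_map_get() and single_map_put() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct single_map {
	uint64_t busy;        /* bit n set => tag n is in use */
	int max_depth;        /* depth announced to the upper layers */
	int real_max_depth;   /* true size of the map, at most 64 here */
};

/*
 * Claim the lowest free tag below 'limit'.
 * Normal I/O passes max_depth; internal commands pass real_max_depth
 * and can therefore dip into the reserved range as well.
 * Returns -1 when no tag is free or the limit is out of range.
 */
static int single_map_get(struct single_map *m, int limit)
{
	if (limit < 0 || limit > m->real_max_depth)
		return -1;

	for (int tag = 0; tag < limit; tag++) {
		uint64_t bit = UINT64_C(1) << tag;

		if (!(m->busy & bit)) {
			m->busy |= bit;
			return tag;
		}
	}
	return -1;
}

/* Release a tag; returns -1 if it is out of range or was not busy. */
static int single_map_put(struct single_map *m, int tag)
{
	uint64_t bit;

	if (tag < 0 || tag >= m->real_max_depth)
		return -1;

	bit = UINT64_C(1) << tag;
	if (!(m->busy & bit))
		return -1;

	m->busy &= ~bit;
	return 0;
}
```

The sketch is single-threaded on purpose.  The kernel code gets its
ordering from test_and_set_bit_lock() and clear_bit_unlock() instead of
the plain bit operations used here.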

Cheers,

Hannes

-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 21284 (AG Nürnberg)

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-blk-tag-separate-out-blk_-allocate-release-_tag.patch --]
[-- Type: text/x-patch; name="0001-blk-tag-separate-out-blk_-allocate-release-_tag.patch", Size: 3252 bytes --]

From e872bb0c4c2b1f72982f4d31925d3b2f65317ffd Mon Sep 17 00:00:00 2001
From: Hannes Reinecke <hare@suse.de>
Date: Tue, 25 Nov 2014 15:43:07 +0100
Subject: [PATCH] blk-tag: separate out blk_(allocate|release)_tag

Separate out helper functions blk_allocate_tag() and
blk_release_tag().

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-tag.c | 74 ++++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 55 insertions(+), 19 deletions(-)

diff --git a/block/blk-tag.c b/block/blk-tag.c
index a185b86..6526fff 100644
--- a/block/blk-tag.c
+++ b/block/blk-tag.c
@@ -244,6 +244,24 @@ int blk_queue_resize_tags(struct request_queue *q, int new_depth)
 }
 EXPORT_SYMBOL(blk_queue_resize_tags);
 
+void blk_release_tag(struct blk_queue_tag *bqt, int tag)
+{
+	if (unlikely(!bqt))
+		return;
+
+	if (unlikely(!test_bit(tag, bqt->tag_map))) {
+		printk(KERN_ERR "%s: attempt to clear non-busy tag (%d)\n",
+		       __func__, tag);
+		return;
+	}
+	/*
+	 * The tag_map bit acts as a lock for tag_index[bit], so we need
+	 * unlock memory barrier semantics.
+	 */
+	clear_bit_unlock(tag, bqt->tag_map);
+}
+EXPORT_SYMBOL(blk_release_tag);
+
 /**
  * blk_queue_end_tag - end tag operations for a request
  * @q:  the request queue for the device
@@ -275,18 +293,43 @@ void blk_queue_end_tag(struct request_queue *q, struct request *rq)
 
 	bqt->tag_index[tag] = NULL;
 
-	if (unlikely(!test_bit(tag, bqt->tag_map))) {
-		printk(KERN_ERR "%s: attempt to clear non-busy tag (%d)\n",
-		       __func__, tag);
-		return;
-	}
+	blk_release_tag(bqt, tag);
+}
+EXPORT_SYMBOL(blk_queue_end_tag);
+
+/**
+ * blk_reserve_tag - lock and return the next free tag
+ * @bqt:  tag map
+ * @max_depth: max tag depth to use
+ *
+ * Description:
+ *   Lock and return the next free tag.
+ *   The tag needs to be freed up after usage with
+ *   blk_release_tag()
+ *
+ **/
+int blk_reserve_tag(struct blk_queue_tag *bqt, int max_depth)
+{
+	int tag = -1;
+
+	if (!bqt)
+		return tag;
+	if (max_depth >= bqt->real_max_depth)
+		return -1;
+
+	do {
+		tag = find_first_zero_bit(bqt->tag_map, max_depth);
+		if (tag >= max_depth)
+			return -1;
+
+	} while (test_and_set_bit_lock(tag, bqt->tag_map));
 	/*
-	 * The tag_map bit acts as a lock for tag_index[bit], so we need
-	 * unlock memory barrier semantics.
+	 * We need lock ordering semantics given by test_and_set_bit_lock.
+	 * See blk_queue_end_tag for details.
 	 */
-	clear_bit_unlock(tag, bqt->tag_map);
+	return tag;
 }
-EXPORT_SYMBOL(blk_queue_end_tag);
+EXPORT_SYMBOL(blk_reserve_tag);
 
 /**
  * blk_queue_start_tag - find a free tag and assign it
@@ -343,16 +386,9 @@ int blk_queue_start_tag(struct request_queue *q, struct request *rq)
 			return 1;
 	}
 
-	do {
-		tag = find_first_zero_bit(bqt->tag_map, max_depth);
-		if (tag >= max_depth)
-			return 1;
-
-	} while (test_and_set_bit_lock(tag, bqt->tag_map));
-	/*
-	 * We need lock ordering semantics given by test_and_set_bit_lock.
-	 * See blk_queue_end_tag for details.
-	 */
+	tag = blk_reserve_tag(bqt, max_depth);
+	if (tag == -1)
+		return 1;
 
 	rq->cmd_flags |= REQ_QUEUED;
 	rq->tag = tag;
-- 
1.8.5.2



* RE: [PATCH] megaraid_sas: Enable shared host tag map
  2014-11-25 14:31     ` Christoph Hellwig
  2014-11-25 14:47       ` Hannes Reinecke
@ 2014-11-25 15:03       ` Kashyap Desai
  2014-11-25 16:34         ` Christoph Hellwig
  1 sibling, 1 reply; 10+ messages in thread
From: Kashyap Desai @ 2014-11-25 15:03 UTC (permalink / raw)
  To: Christoph Hellwig, Hannes Reinecke
  Cc: James Bottomley, Sumit Saxena, linux-scsi, Webb Scales, Don Brace,
	Jens Axboe

> -----Original Message-----
> From: Christoph Hellwig [mailto:hch@lst.de]
> Sent: Tuesday, November 25, 2014 8:02 PM
> To: Hannes Reinecke
> Cc: James Bottomley; Sumit Saxena; Kashyap Desai;
> linux-scsi@vger.kernel.org; Webb Scales; Don Brace; Jens Axboe
> Subject: Re: [PATCH] megaraid_sas: Enable shared host tag map
>
> On Mon, Nov 24, 2014 at 04:51:14PM +0100, Hannes Reinecke wrote:
> > It is useful as is, as we'll be getting prefixed logging output :-)
>
> Use the blk-mq code path if you care :)
>
> > Which I didn't do yet as the driver is using a larger tag map than
> > that one announced to the block layer.
> > This is to facilitate internal command submission, which should always
> > work independent on any tag starvation issues from the upper layers.
>
> This is an "issue" for a lot of drivers.  blk-mq provides a reserved_tags
> pool for that, which reserves a number of tags for internal use, those
> must be allocated using blk_mq_alloc_request with the reserved argument
> set to true.
>
> The lockless hpsa patches expose this to SCSI, which I'm generally
> fine with, but we need to find a way to transparently make this work
> for the old code path, too.  This might be as simple as embedding a
> second blk_queue_tag structure into the Scsi_Host, adding a constant
> prefix to the tag and providing some wrappers in scsi that allow
> allocating a struct request (or rather scsi_cmnd) for internal use.

Just trying to understand your comment above. I see the host template has
a new field called "reserved_tags", which the mid layer normally uses to
reserve a block tag pool for the driver's internal use.
What if the driver is OK with managing its internal pool without any blk
tag dependency, and does not expose the actual maximum can_queue that the
FW supports? For example, in the megaraid_sas driver the FW exposes
max_fw_cmds = 1024, the driver keeps some in reserve (let's say 24
commands for internal use) and exposes only 1000 commands for blk tags.
This will work for both blk-mq and the legacy blk path. Let the driver
manage whatever internal reserve it keeps, and do not add any dependency
on blk tags from the layer above for that reserved pool.
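
The arithmetic can be sketched as below.  This is a stand-alone
illustration, not driver code: it assumes the reserved frames sit at the
top of the firmware range, the same layout the patch earlier in this
thread uses for sync_cmd_idx.  The struct and helper names are
hypothetical; only max_fw_cmds and max_scsi_cmds are names taken from the
driver.

```c
#include <assert.h>
#include <stdint.h>

#define INVALID_IDX UINT32_MAX   /* marks "no such frame / index" */

struct cmd_split {
	uint32_t max_fw_cmds;    /* what the firmware supports */
	uint32_t reserved_cmds;  /* kept back for internal commands */
	uint32_t max_scsi_cmds;  /* what is exposed as blk tags */
};

/* Split the firmware depth; fails if nothing would be left for I/O. */
static int cmd_split_init(struct cmd_split *s, uint32_t max_fw_cmds,
			  uint32_t reserved_cmds)
{
	if (reserved_cmds >= max_fw_cmds)
		return -1;

	s->max_fw_cmds = max_fw_cmds;
	s->reserved_cmds = reserved_cmds;
	s->max_scsi_cmds = max_fw_cmds - reserved_cmds;
	return 0;
}

/* A block-layer tag indexes the frame array directly. */
static uint32_t frame_for_blk_tag(const struct cmd_split *s, uint32_t blk_tag)
{
	return blk_tag < s->max_scsi_cmds ? blk_tag : INVALID_IDX;
}

/* Internal command n uses frame max_scsi_cmds + n. */
static uint32_t frame_for_internal(const struct cmd_split *s, uint32_t idx)
{
	return idx < s->reserved_cmds ? s->max_scsi_cmds + idx : INVALID_IDX;
}

/* Reverse mapping: which internal command owns this frame, if any. */
static uint32_t internal_for_frame(const struct cmd_split *s, uint32_t frame)
{
	if (frame < s->max_scsi_cmds || frame >= s->max_fw_cmds)
		return INVALID_IDX;
	return frame - s->max_scsi_cmds;
}
```

With max_fw_cmds = 1024 and 24 reserved commands, blk tags 0..999 map to
frames 0..999 and internal commands 0..23 map to frames 1000..1023, so
the two ranges never overlap.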

` Kashyap


* Re: [PATCH] megaraid_sas: Enable shared host tag map
  2014-11-25 14:47       ` Hannes Reinecke
@ 2014-11-25 16:30         ` Christoph Hellwig
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2014-11-25 16:30 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, James Bottomley, Sumit Saxena, Kashyap Desai,
	linux-scsi, Webb Scales, Don Brace, Jens Axboe

On Tue, Nov 25, 2014 at 03:47:40PM +0100, Hannes Reinecke wrote:
> I'd rather have a single map to get request/tags from; otherwise
> we'd be arbitrarily starving internal requests even though the
> 'main' tag map is empty.

At least in blk-mq the assumption is that a driver needs very few
internal tags, and it might need access to them in "emergency"
situations like resets or aborts.

> My plan was more to mark a certain range of tags as 'reserved',
> and add another helper/argument that allows dipping into the reserved
> pool, too.

We can add this as optional behavior, but I think the existing blk-mq
behavior is a good default.

> A tentative patch is attached.
> The idea is to call blk_queue_init_tags() with the actual tag size and
> then blk_queue_resize_tags() to limit the number of tags for the request
> queue.
> The driver can then use blk_reserve_tag() with the appropriate max
> depth to get tags from the range [max_depth:real_max_depth].

I'd much prefer the blk-mq approach with two maps.  Either way please
make sure whatever you come up with is compatible with blk-mq.
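
For comparison, the single-map variant quoted above might look roughly like
this in user space: one bitmap sized for the real depth, with regular
requests limited to [0, max_depth) and internal commands drawing from
[max_depth, real_max_depth). The function names and sizes are hypothetical
and are not the helpers from the tentative patch.

```c
#include <stdint.h>

#define REAL_MAX_DEPTH 8u /* size the map is initialised with (assumed) */
#define MAX_DEPTH      6u /* depth visible to the request queue (assumed) */

struct tag_map {
	uint32_t busy; /* bit i set: tag i is in use */
};

/* Take the lowest free tag in [lo, hi), or return -1 if none is free. */
static int tag_get_range(struct tag_map *m, unsigned int lo, unsigned int hi)
{
	for (unsigned int i = lo; i < hi; i++) {
		if (!(m->busy & (1u << i))) {
			m->busy |= 1u << i;
			return (int)i;
		}
	}
	return -1;
}

/* Regular I/O only sees the resized depth. */
int tag_get_normal(struct tag_map *m)
{
	return tag_get_range(m, 0, MAX_DEPTH);
}

/* Internal commands draw from the range above the resized depth. */
int tag_get_internal(struct tag_map *m)
{
	return tag_get_range(m, MAX_DEPTH, REAL_MAX_DEPTH);
}

/* Release any tag; one map means one release path. */
void tag_put(struct tag_map *m, int tag)
{
	m->busy &= ~(1u << tag);
}
```

Widening the lower bound in tag_get_internal() to 0 would let internal
commands dip into the whole map, which is the optional behavior discussed
above; whether that maps cleanly onto blk-mq is the open question.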

* Re: [PATCH] megaraid_sas: Enable shared host tag map
  2014-11-25 15:03       ` Kashyap Desai
@ 2014-11-25 16:34         ` Christoph Hellwig
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2014-11-25 16:34 UTC (permalink / raw)
  To: Kashyap Desai
  Cc: Hannes Reinecke, James Bottomley, Sumit Saxena, linux-scsi,
	Webb Scales, Don Brace, Jens Axboe

On Tue, Nov 25, 2014 at 08:33:28PM +0530, Kashyap Desai wrote:
> Just trying to understand your comment above. I see the host template has a
> new field called "reserved_tags", which the mid layer normally uses to
> reserve a block tag pool for the driver's internal use.
> What if the driver is OK managing its internal pool without any blk tag
> dependency, and does not expose the actual max can_queue the FW can support?
> For example, in the megaraid_sas driver the FW exposes max_fw_cmd = 1024; the
> driver keeps some in reserve (let's say 24 commands for internal use) and
> exposes only 1000 commands for blk tags.  This will work for both blk-mq and
> the legacy blk driver. Let the driver manage whatever internal reserve it has
> kept, and do not add any dependency on blk tags from the layer above for that
> reserved pool.

The driver can of course always just manage its internal tags, but that
means we need to duplicate a tag allocator in every driver.  In short
it works ok for now, but I'd rather solve the problem in a single place.

Thread overview: 10+ messages
2014-11-24 15:33 [PATCH] megaraid_sas: Enable shared host tag map Hannes Reinecke
2014-11-24 15:35 ` Christoph Hellwig
2014-11-24 15:51   ` Hannes Reinecke
2014-11-24 15:59     ` Kashyap Desai
2014-11-25 14:31     ` Christoph Hellwig
2014-11-25 14:47       ` Hannes Reinecke
2014-11-25 16:30         ` Christoph Hellwig
2014-11-25 15:03       ` Kashyap Desai
2014-11-25 16:34         ` Christoph Hellwig
2014-11-24 15:52   ` Kashyap Desai