linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ric Wheeler <rwheeler@redhat.com>
To: James Bottomley <jejb@linux.vnet.ibm.com>,
	Hannes Reinecke <hare@suse.de>,
	Sumit Saxena <sumit.saxena@broadcom.com>,
	linux-scsi@vger.kernel.org
Cc: martin.petersen@oracle.com, thenzl@redhat.com,
	kashyap.desai@broadcom.com, Christoph Hellwig <hch@infradead.org>,
	"Martin K. Petersen" <mkp@mkp.net>,
	Jeff Moyer <jmoyer@redhat.com>, Gris Ge <fge@redhat.com>,
	Ewan Milne <emilne@redhat.com>, Jens Axboe <axboe@kernel.dk>
Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE command to firmware
Date: Mon, 17 Oct 2016 12:27:00 -0400	[thread overview]
Message-ID: <14aa9905-b7fc-9695-dd19-1cc31cbee330@redhat.com> (raw)
In-Reply-To: <1476721205.2734.12.camel@linux.vnet.ibm.com>

On 10/17/2016 12:20 PM, James Bottomley wrote:
> On Mon, 2016-10-17 at 09:01 -0400, Ric Wheeler wrote:
>> On 10/17/2016 07:34 AM, Hannes Reinecke wrote:
>>> On 10/17/2016 12:24 PM, Sumit Saxena wrote:
>>>> megaraid_sas driver returns SYNCHRONIZE_CACHE command back to
>>>> SCSI layer without sending it to firmware as firmware takes care
>>>> of flushing cache.  This patch will change the driver behavior
>>>> wrt SYNCHRONIZE_CACHE handling. If underlying firmware has
>>>> support to handle the SYNCHORNIZE_CACHE, driver will send it for
>>>> firmware otherwise complete it back to SCSI layer with SUCCESS
>>>> immediately.  If  Firmware  handle SYNCHORNIZE_CACHE for both VD
>>>> and JBOD "canHandleSyncCache" bit in scratch pad register(offset
>>>> 0x00B4) will be set.
>>>>
>>>> This behavior can be controlled via module parameter and user can
>>>> fallback to old behavior of returning SYNCHRONIZE_CACHE by driver
>>>> only without sending it to firmware.
>>>>
>>>> Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
>>>> Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
>>>> ---
>>>>    drivers/scsi/megaraid/megaraid_sas.h        |  3 +++
>>>>    drivers/scsi/megaraid/megaraid_sas_base.c   | 14 ++++++--------
>>>>    drivers/scsi/megaraid/megaraid_sas_fusion.c |  3 +++
>>>>    drivers/scsi/megaraid/megaraid_sas_fusion.h |  2 ++
>>>>    4 files changed, 14 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas.h
>>>> b/drivers/scsi/megaraid/megaraid_sas.h
>>>> index ca86c88..43fd14f 100644
>>>> --- a/drivers/scsi/megaraid/megaraid_sas.h
>>>> +++ b/drivers/scsi/megaraid/megaraid_sas.h
>>>> @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT {
>>>>    #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT    14
>>>>    #define MR_MAX_MSIX_REG_ARRAY                   16
>>>>    #define MR_RDPQ_MODE_OFFSET			0X00800000
>>>> +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET		0X0100000
>>>> 0
>>>> +
>>>>    /*
>>>>    * register set for both 1068 and 1078 controllers
>>>>    * structure extended for 1078 registers
>>>> @@ -2140,6 +2142,7 @@ struct megasas_instance {
>>>>    	u8 is_imr;
>>>>    	u8 is_rdpq;
>>>>    	bool dev_handle;
>>>> +	bool fw_sync_cache_support;
>>>>    };
>>>>    struct MR_LD_VF_MAP {
>>>>    	u32 size;
>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c
>>>> b/drivers/scsi/megaraid/megaraid_sas_base.c
>>>> index 092387f..a4a8e2f 100644
>>>> --- a/drivers/scsi/megaraid/megaraid_sas_base.c
>>>> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
>>>> @@ -104,6 +104,10 @@ unsigned int scmd_timeout =
>>>> MEGASAS_DEFAULT_CMD_TIMEOUT;
>>>>    module_param(scmd_timeout, int, S_IRUGO);
>>>>    MODULE_PARM_DESC(scmd_timeout, "scsi command timeout (10-90s),
>>>> default 90s. See megasas_reset_timer.");
>>>>    
>>>> +bool block_sync_cache;
>>>> +module_param(block_sync_cache, bool, S_IRUGO);
>>>> +MODULE_PARM_DESC(block_sync_cache, "Block SYNC CACHE by driver
>>>> Default: 0(Send it to firmware)");
>>>> +
>>>>    MODULE_LICENSE("GPL");
>>>>    MODULE_VERSION(MEGASAS_VERSION);
>>>>    MODULE_AUTHOR("megaraidlinux.pdl@avagotech.com");
>>>> @@ -1700,16 +1704,10 @@ megasas_queue_command(struct Scsi_Host
>>>> *shost, struct scsi_cmnd *scmd)
>>>>    		goto out_done;
>>>>    	}
>>>>    
>>>> -	switch (scmd->cmnd[0]) {
>>>> -	case SYNCHRONIZE_CACHE:
>>>> -		/*
>>>> -		 * FW takes care of flush cache on its own
>>>> -		 * No need to send it down
>>>> -		 */
>>>> +	if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) &&
>>>> +		(!instance->fw_sync_cache_support)) {
>>>>    		scmd->result = DID_OK << 16;
>>>>    		goto out_done;
>>>> -	default:
>>>> -		break;
>>>>    	}
>>>>    
>>>>    	return instance->instancet
>>>> ->build_and_issue_cmd(instance, scmd);
>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c
>>>> b/drivers/scsi/megaraid/megaraid_sas_fusion.c
>>>> index 2159f6a..8237580 100644
>>>> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
>>>> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
>>>> @@ -747,6 +747,9 @@ megasas_ioc_init_fusion(struct
>>>> megasas_instance *instance)
>>>>    		ret = 1;
>>>>    		goto fail_fw_init;
>>>>    	}
>>>> +	if (!block_sync_cache)
>>>> +		instance->fw_sync_cache_support = (scratch_pad_2
>>>> &
>>>> +			MR_CAN_HANDLE_SYNC_CACHE_OFFSET) ? 1 :
>>>> 0;
>>>>    
>>>>    	IOCInitMessage =
>>>>    	  dma_alloc_coherent(&instance->pdev->dev,
>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h
>>>> b/drivers/scsi/megaraid/megaraid_sas_fusion.h
>>>> index e3bee04..2857154 100644
>>>> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.h
>>>> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h
>>>> @@ -72,6 +72,8 @@
>>>>    #define MPI2_SUP_REPLY_POST_HOST_INDEX_OFFSET   (0x0000030C)
>>>>    #define MPI2_REPLY_POST_HOST_INDEX_OFFSET	(0x0000006C)
>>>>    
>>>> +extern bool block_sync_cache;
>>>> +
>>>>    /*
>>>>     * Raid context flags
>>>>     */
>>>>
>>> Be extra careful with that.
>>>
>>> SYNCHRONIZE_CACHE has (potentially) a global scope, as there might
>>> be an array-wide cache, and a cache flush would affect the entire
>>> cache. Linux is using SYNCHRONIZE_CACHE as a per device setting, ie
>>> it assumes that the effects of a cache flush is restricted to the
>>> device in question.
>>>
>>> If this is _not_ the case I'd rather not enable it.
>>> Have you checked that enabling this functionality doesn't have
>>> negative performance impact?
>>>
>>> Cheers,
>>>
>>> Hannes
>> This must go in - without this fix, there is no data integrity for
>> any file system.
> That's not what I get from the change log.  What it says to me is that
> the caches are currently firmware managed.  Barring firmware bugs, that
> means that we currently don't have any integrity issues.

Your understanding (or the change log) is incorrect.

The issue here is for devices that are non-RAID or pass through - this is a real 
issue and has been seen in practice.

Ric

>
>> In effect, this driver by default has been throwing away
>> SYNCHRONIZE_CACHE commands even when acting in JBOD/non-RAID mode.
> That's not what the changelog says.  It says the cache flushing has
> been managed by the firmware only.  Meaning the firmware decides when
> to flush the cache.  Barring firmware bugs, this should mean that
> integrity is preserved in either mode.
>
>> Of course, actually doing a SYNCHRONIZE_CACHE to drives will be
>> slower than throwing it away, but this is not optional.
> What Hannes means is that we need to know that turning over cache
> management to Linux is a) safe and b) not a performance regression.
>   Given that there aren't any integrity issues, that's a reasonable
> request.
>
>> We really need to have some ways to validate that our IO stack is
>> properly and safely configured.
>>
>> I would love to see a couple of things:
>>
>> * having T10 & T13 report the existence of a volatile write cache -
>> this is different than WCE set, some devices have a write cache and
>> are battery/flash backed.
> That's the non-volatile cache log page.
>
> James
>
>
>> * having a robust tool to test over power failure/disconnect that our
>> assumptions are true - any write is durable after a SYNCHRONIZE_CACHE
>> or CACHE_FLUSH_EXT is sent and ack'ed
>>
>> Regards,
>>
>> Ric
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-scsi"
>> in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>


  reply	other threads:[~2016-10-17 16:27 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-17 10:24 [PATCH 0/7] megaraid_sas: Updates for scsi-next Sumit Saxena
2016-10-17 10:24 ` [PATCH 1/7] megaraid_sas: For SRIOV enabled firmware, ensure VF driver waits for 30secs before reset Sumit Saxena
2016-10-17 11:29   ` Hannes Reinecke
2016-10-17 12:43   ` Tomas Henzl
2016-10-17 10:24 ` [PATCH 2/7] megaraid_sas: Send correct PhysArm to FW for R1 VD downgrade Sumit Saxena
2016-10-17 11:29   ` Hannes Reinecke
2016-10-17 12:50   ` Tomas Henzl
2016-10-17 10:24 ` [PATCH 3/7] megaraid_sas: Do not fire DCMDs during PCI shutdown/detach Sumit Saxena
2016-10-17 11:31   ` Hannes Reinecke
2016-10-17 12:19     ` Sumit Saxena
2016-10-17 10:24 ` [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE command to firmware Sumit Saxena
2016-10-17 11:34   ` Hannes Reinecke
2016-10-17 12:13     ` Sumit Saxena
2016-10-17 13:01     ` Ric Wheeler
2016-10-17 13:31       ` Sumit Saxena
2016-10-17 15:55       ` Christoph Hellwig
2016-10-17 16:08         ` Ric Wheeler
2016-10-17 16:23           ` Ewan D. Milne
2016-10-17 16:20       ` James Bottomley
2016-10-17 16:27         ` Ric Wheeler [this message]
2016-10-17 17:19           ` James Bottomley
2016-10-17 17:30             ` Ric Wheeler
2016-10-17 17:36             ` Kashyap Desai
2016-10-17 17:51               ` James Bottomley
2016-10-18 13:56                 ` Tomas Henzl
2016-10-19  9:50                   ` Ching Huang
2016-10-19 12:55                     ` Tomas Henzl
2016-10-19 18:07                       ` Raghava Aditya Renukunta
2016-10-21 16:30                         ` Tomas Henzl
2016-10-25  2:02                       ` Martin K. Petersen
     [not found]                         ` <CAKiknE_MM88eVON1g_qy7wbb2fkxdAs6O0SRSzN-RbQqSvx1eA@mail.gmail.com>
2016-10-27  2:07                           ` Martin K. Petersen
2016-10-18 18:47                 ` Sumit Saxena
2016-10-17 16:29         ` Ric Wheeler
2016-10-17 13:13   ` Tomas Henzl
2016-10-17 13:28     ` Sumit Saxena
2016-10-17 13:57       ` Tomas Henzl
2016-10-17 14:25         ` Sumit Saxena
2016-10-18 13:07         ` Ric Wheeler
2016-10-18 13:22           ` Sumit Saxena
2016-10-17 10:24 ` [PATCH 5/7] megaraid_sas: driver version upgrade Sumit Saxena
2016-10-17 11:35   ` Hannes Reinecke
2016-10-17 12:20     ` Sumit Saxena
2016-10-17 10:24 ` [PATCH 6/7] MAINTAINERS: Update megaraid maintainers list Sumit Saxena
2016-10-17 11:37   ` Hannes Reinecke
2016-10-17 10:24 ` [PATCH 7/7] megaraid_sas: Do not set MPI2_TYPE_CUDA for JBOD FP path for FW which does not support JBOD sequence map Sumit Saxena
2016-10-17 11:37   ` Hannes Reinecke
2016-10-17 13:23   ` Tomas Henzl

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=14aa9905-b7fc-9695-dd19-1cc31cbee330@redhat.com \
    --to=rwheeler@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=emilne@redhat.com \
    --cc=fge@redhat.com \
    --cc=hare@suse.de \
    --cc=hch@infradead.org \
    --cc=jejb@linux.vnet.ibm.com \
    --cc=jmoyer@redhat.com \
    --cc=kashyap.desai@broadcom.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=mkp@mkp.net \
    --cc=sumit.saxena@broadcom.com \
    --cc=thenzl@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).