From: Ric Wheeler <rwheeler@redhat.com>
To: James Bottomley <jejb@linux.vnet.ibm.com>,
Hannes Reinecke <hare@suse.de>,
Sumit Saxena <sumit.saxena@broadcom.com>,
linux-scsi@vger.kernel.org
Cc: martin.petersen@oracle.com, thenzl@redhat.com,
kashyap.desai@broadcom.com, Christoph Hellwig <hch@infradead.org>,
"Martin K. Petersen" <mkp@mkp.net>,
Jeff Moyer <jmoyer@redhat.com>, Gris Ge <fge@redhat.com>,
Ewan Milne <emilne@redhat.com>, Jens Axboe <axboe@kernel.dk>
Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE command to firmware
Date: Mon, 17 Oct 2016 12:27:00 -0400 [thread overview]
Message-ID: <14aa9905-b7fc-9695-dd19-1cc31cbee330@redhat.com> (raw)
In-Reply-To: <1476721205.2734.12.camel@linux.vnet.ibm.com>
On 10/17/2016 12:20 PM, James Bottomley wrote:
> On Mon, 2016-10-17 at 09:01 -0400, Ric Wheeler wrote:
>> On 10/17/2016 07:34 AM, Hannes Reinecke wrote:
>>> On 10/17/2016 12:24 PM, Sumit Saxena wrote:
>>>> megaraid_sas driver returns SYNCHRONIZE_CACHE command back to
>>>> SCSI layer without sending it to firmware as firmware takes care
>>>> of flushing cache. This patch will change the driver behavior
>>>> wrt SYNCHRONIZE_CACHE handling. If underlying firmware has
>>>> support to handle the SYNCHORNIZE_CACHE, driver will send it for
>>>> firmware otherwise complete it back to SCSI layer with SUCCESS
>>>> immediately. If Firmware handle SYNCHORNIZE_CACHE for both VD
>>>> and JBOD "canHandleSyncCache" bit in scratch pad register(offset
>>>> 0x00B4) will be set.
>>>>
>>>> This behavior can be controlled via module parameter and user can
>>>> fallback to old behavior of returning SYNCHRONIZE_CACHE by driver
>>>> only without sending it to firmware.
>>>>
>>>> Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
>>>> Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
>>>> ---
>>>> drivers/scsi/megaraid/megaraid_sas.h | 3 +++
>>>> drivers/scsi/megaraid/megaraid_sas_base.c | 14 ++++++--------
>>>> drivers/scsi/megaraid/megaraid_sas_fusion.c | 3 +++
>>>> drivers/scsi/megaraid/megaraid_sas_fusion.h | 2 ++
>>>> 4 files changed, 14 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas.h
>>>> b/drivers/scsi/megaraid/megaraid_sas.h
>>>> index ca86c88..43fd14f 100644
>>>> --- a/drivers/scsi/megaraid/megaraid_sas.h
>>>> +++ b/drivers/scsi/megaraid/megaraid_sas.h
>>>> @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT {
>>>> #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT 14
>>>> #define MR_MAX_MSIX_REG_ARRAY 16
>>>> #define MR_RDPQ_MODE_OFFSET 0X00800000
>>>> +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET 0X0100000
>>>> 0
>>>> +
>>>> /*
>>>> * register set for both 1068 and 1078 controllers
>>>> * structure extended for 1078 registers
>>>> @@ -2140,6 +2142,7 @@ struct megasas_instance {
>>>> u8 is_imr;
>>>> u8 is_rdpq;
>>>> bool dev_handle;
>>>> + bool fw_sync_cache_support;
>>>> };
>>>> struct MR_LD_VF_MAP {
>>>> u32 size;
>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c
>>>> b/drivers/scsi/megaraid/megaraid_sas_base.c
>>>> index 092387f..a4a8e2f 100644
>>>> --- a/drivers/scsi/megaraid/megaraid_sas_base.c
>>>> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
>>>> @@ -104,6 +104,10 @@ unsigned int scmd_timeout =
>>>> MEGASAS_DEFAULT_CMD_TIMEOUT;
>>>> module_param(scmd_timeout, int, S_IRUGO);
>>>> MODULE_PARM_DESC(scmd_timeout, "scsi command timeout (10-90s),
>>>> default 90s. See megasas_reset_timer.");
>>>>
>>>> +bool block_sync_cache;
>>>> +module_param(block_sync_cache, bool, S_IRUGO);
>>>> +MODULE_PARM_DESC(block_sync_cache, "Block SYNC CACHE by driver
>>>> Default: 0(Send it to firmware)");
>>>> +
>>>> MODULE_LICENSE("GPL");
>>>> MODULE_VERSION(MEGASAS_VERSION);
>>>> MODULE_AUTHOR("megaraidlinux.pdl@avagotech.com");
>>>> @@ -1700,16 +1704,10 @@ megasas_queue_command(struct Scsi_Host
>>>> *shost, struct scsi_cmnd *scmd)
>>>> goto out_done;
>>>> }
>>>>
>>>> - switch (scmd->cmnd[0]) {
>>>> - case SYNCHRONIZE_CACHE:
>>>> - /*
>>>> - * FW takes care of flush cache on its own
>>>> - * No need to send it down
>>>> - */
>>>> + if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) &&
>>>> + (!instance->fw_sync_cache_support)) {
>>>> scmd->result = DID_OK << 16;
>>>> goto out_done;
>>>> - default:
>>>> - break;
>>>> }
>>>>
>>>> return instance->instancet
>>>> ->build_and_issue_cmd(instance, scmd);
>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c
>>>> b/drivers/scsi/megaraid/megaraid_sas_fusion.c
>>>> index 2159f6a..8237580 100644
>>>> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
>>>> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
>>>> @@ -747,6 +747,9 @@ megasas_ioc_init_fusion(struct
>>>> megasas_instance *instance)
>>>> ret = 1;
>>>> goto fail_fw_init;
>>>> }
>>>> + if (!block_sync_cache)
>>>> + instance->fw_sync_cache_support = (scratch_pad_2
>>>> &
>>>> + MR_CAN_HANDLE_SYNC_CACHE_OFFSET) ? 1 :
>>>> 0;
>>>>
>>>> IOCInitMessage =
>>>> dma_alloc_coherent(&instance->pdev->dev,
>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h
>>>> b/drivers/scsi/megaraid/megaraid_sas_fusion.h
>>>> index e3bee04..2857154 100644
>>>> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.h
>>>> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h
>>>> @@ -72,6 +72,8 @@
>>>> #define MPI2_SUP_REPLY_POST_HOST_INDEX_OFFSET (0x0000030C)
>>>> #define MPI2_REPLY_POST_HOST_INDEX_OFFSET (0x0000006C)
>>>>
>>>> +extern bool block_sync_cache;
>>>> +
>>>> /*
>>>> * Raid context flags
>>>> */
>>>>
>>> Be extra careful with that.
>>>
>>> SYNCHRONIZE_CACHE has (potentially) a global scope, as there might
>>> be an array-wide cache, and a cache flush would affect the entire
>>> cache. Linux is using SYNCHRONIZE_CACHE as a per device setting, ie
>>> it assumes that the effects of a cache flush is restricted to the
>>> device in question.
>>>
>>> If this is _not_ the case I'd rather not enable it.
>>> Have you checked that enabling this functionality doesn't have
>>> negative performance impact?
>>>
>>> Cheers,
>>>
>>> Hannes
>> This must go in - without this fix, there is no data integrity for
>> any file system.
> That's not what I get from the change log. What it says to me is that
> the caches are currently firmware managed. Barring firmware bugs, that
> means that we currently don't have any integrity issues.
Your understanding (or the change log) is incorrect.
The issue here is for devices that are non-RAID or pass through - this is a real
issue and has been seen in practice.
Ric
>
>> In effect, this driver by default has been throwing away
>> SYNCHRONIZE_CACHE commands even when acting in JBOD/non-RAID mode.
> That's not what the changelog says. It says the cache flushing has
> been managed by the firmware only. Meaning the firmware decides when
> to flush the cache. Barring firmware bugs, this should mean that
> integrity is preserved in either mode.
>
>> Of course, actually doing a SYNCHRONIZE_CACHE to drives will be
>> slower than throwing it away, but this is not optional.
> What Hannes means is that we need to know that turning over cache
> management to Linux is a) safe and b) not a performance regression.
> Given that there aren't any integrity issues, that's a reasonable
> request.
>
>> We really need to have some ways to validate that our IO stack is
>> properly and safely configured.
>>
>> I would love to see a couple of things:
>>
>> * having T10 & T13 report the existence of a volatile write cache -
>> this is different than WCE set, some devices have a write cache and
>> are battery/flash backed.
> That's the non-volatile cache log page.
>
> James
>
>
>> * having a robust tool to test over power failure/disconnect that our
>> assumptions are true - any write is durable after a SYNCHRONIZE_CACHE
>> or CACHE_FLUSH_EXT is sent and ack'ed
>>
>> Regards,
>>
>> Ric
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-scsi"
>> in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
next prev parent reply other threads:[~2016-10-17 16:27 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-10-17 10:24 [PATCH 0/7] megaraid_sas: Updates for scsi-next Sumit Saxena
2016-10-17 10:24 ` [PATCH 1/7] megaraid_sas: For SRIOV enabled firmware, ensure VF driver waits for 30secs before reset Sumit Saxena
2016-10-17 11:29 ` Hannes Reinecke
2016-10-17 12:43 ` Tomas Henzl
2016-10-17 10:24 ` [PATCH 2/7] megaraid_sas: Send correct PhysArm to FW for R1 VD downgrade Sumit Saxena
2016-10-17 11:29 ` Hannes Reinecke
2016-10-17 12:50 ` Tomas Henzl
2016-10-17 10:24 ` [PATCH 3/7] megaraid_sas: Do not fire DCMDs during PCI shutdown/detach Sumit Saxena
2016-10-17 11:31 ` Hannes Reinecke
2016-10-17 12:19 ` Sumit Saxena
2016-10-17 10:24 ` [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE command to firmware Sumit Saxena
2016-10-17 11:34 ` Hannes Reinecke
2016-10-17 12:13 ` Sumit Saxena
2016-10-17 13:01 ` Ric Wheeler
2016-10-17 13:31 ` Sumit Saxena
2016-10-17 15:55 ` Christoph Hellwig
2016-10-17 16:08 ` Ric Wheeler
2016-10-17 16:23 ` Ewan D. Milne
2016-10-17 16:20 ` James Bottomley
2016-10-17 16:27 ` Ric Wheeler [this message]
2016-10-17 17:19 ` James Bottomley
2016-10-17 17:30 ` Ric Wheeler
2016-10-17 17:36 ` Kashyap Desai
2016-10-17 17:51 ` James Bottomley
2016-10-18 13:56 ` Tomas Henzl
2016-10-19 9:50 ` Ching Huang
2016-10-19 12:55 ` Tomas Henzl
2016-10-19 18:07 ` Raghava Aditya Renukunta
2016-10-21 16:30 ` Tomas Henzl
2016-10-25 2:02 ` Martin K. Petersen
[not found] ` <CAKiknE_MM88eVON1g_qy7wbb2fkxdAs6O0SRSzN-RbQqSvx1eA@mail.gmail.com>
2016-10-27 2:07 ` Martin K. Petersen
2016-10-18 18:47 ` Sumit Saxena
2016-10-17 16:29 ` Ric Wheeler
2016-10-17 13:13 ` Tomas Henzl
2016-10-17 13:28 ` Sumit Saxena
2016-10-17 13:57 ` Tomas Henzl
2016-10-17 14:25 ` Sumit Saxena
2016-10-18 13:07 ` Ric Wheeler
2016-10-18 13:22 ` Sumit Saxena
2016-10-17 10:24 ` [PATCH 5/7] megaraid_sas: driver version upgrade Sumit Saxena
2016-10-17 11:35 ` Hannes Reinecke
2016-10-17 12:20 ` Sumit Saxena
2016-10-17 10:24 ` [PATCH 6/7] MAINTAINERS: Update megaraid maintainers list Sumit Saxena
2016-10-17 11:37 ` Hannes Reinecke
2016-10-17 10:24 ` [PATCH 7/7] megaraid_sas: Do not set MPI2_TYPE_CUDA for JBOD FP path for FW which does not support JBOD sequence map Sumit Saxena
2016-10-17 11:37 ` Hannes Reinecke
2016-10-17 13:23 ` Tomas Henzl
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=14aa9905-b7fc-9695-dd19-1cc31cbee330@redhat.com \
--to=rwheeler@redhat.com \
--cc=axboe@kernel.dk \
--cc=emilne@redhat.com \
--cc=fge@redhat.com \
--cc=hare@suse.de \
--cc=hch@infradead.org \
--cc=jejb@linux.vnet.ibm.com \
--cc=jmoyer@redhat.com \
--cc=kashyap.desai@broadcom.com \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=mkp@mkp.net \
--cc=sumit.saxena@broadcom.com \
--cc=thenzl@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).