From: Tomas Henzl <thenzl@redhat.com>
To: James Bottomley <jejb@linux.vnet.ibm.com>,
Kashyap Desai <kashyap.desai@broadcom.com>,
Ric Wheeler <rwheeler@redhat.com>, Hannes Reinecke <hare@suse.de>,
Sumit Saxena <sumit.saxena@broadcom.com>,
linux-scsi@vger.kernel.org
Cc: martin.petersen@oracle.com,
"Christoph Hellwig" <hch@infradead.org>,
"Martin K. Petersen" <mkp@mkp.net>,
"Jeff Moyer" <jmoyer@redhat.com>, "Gris Ge" <fge@redhat.com>,
"Ewan Milne" <emilne@redhat.com>, "Jens Axboe" <axboe@kernel.dk>,
黃清隆 <ching2048@areca.com.tw>,
RaghavaAditya.Renukunta@microsemi.com
Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE command to firmware
Date: Tue, 18 Oct 2016 15:56:48 +0200 [thread overview]
Message-ID: <b803c55c-23db-3bec-3d2f-58bc53d4ef37@redhat.com> (raw)
In-Reply-To: <1476726692.2734.36.camel@linux.vnet.ibm.com>
Hi,
similar suspicious code path can be found in the queuecommand functions in other drivers too
these are -
pmcraid.c
arcmsr_hba.c
cc-ing maintainers -
(but both drivers seem to be unmaintained for a while -
I've added Ching for arcmsr and Raghava for pmcraid)
please read this thread and verify that your driver and firmware is safe.
It is expected that a raid card controls the disk write cache (switches it off)
but this might not be true for all modes of operation a raid adapter handles
- pass through/non-RAID etc. In such case when disk write cache is enabled
and your driver skips the SYNCHRONIZE_CACHE command a FS corruption
is very much possible during unexpected power loss or even a clean shutdown.
tomash
On 17.10.2016 19:51, James Bottomley wrote:
> On Mon, 2016-10-17 at 23:06 +0530, Kashyap Desai wrote:
>>> -----Original Message-----
>>> From: James Bottomley [mailto:jejb@linux.vnet.ibm.com]
>>> Sent: Monday, October 17, 2016 10:50 PM
>>> To: Ric Wheeler; Hannes Reinecke; Sumit Saxena;
>>> linux-scsi@vger.kernel.org
>>> Cc: martin.petersen@oracle.com; thenzl@redhat.com;
>>> kashyap.desai@broadcom.com; Christoph Hellwig; Martin K. Petersen;
>>> Jeff
>>> Moyer; Gris Ge; Ewan Milne; Jens Axboe
>>> Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE
>>> command
>>> to firmware
>>>
>>> On Mon, 2016-10-17 at 12:27 -0400, Ric Wheeler wrote:
>>>> On 10/17/2016 12:20 PM, James Bottomley wrote:
>>>>> On Mon, 2016-10-17 at 09:01 -0400, Ric Wheeler wrote:
>>>>>> On 10/17/2016 07:34 AM, Hannes Reinecke wrote:
>>>>>>> On 10/17/2016 12:24 PM, Sumit Saxena wrote:
>>>>>>>> megaraid_sas driver returns SYNCHRONIZE_CACHE command
>>>>>>>> back to
>>>>>>>> SCSI layer without sending it to firmware as firmware
>>>>>>>> takes
>>>>>>>> care of flushing cache. This patch will change the
>>>>>>>> driver
>>>>>>>> behavior wrt SYNCHRONIZE_CACHE handling. If underlying
>>>>>>>> firmware has support to handle the SYNCHORNIZE_CACHE,
>>>>>>>> driver
>>>>>>>> will send it for firmware otherwise complete it back to
>>>>>>>> SCSI
>>>>>>>> layer with SUCCESS immediately. If Firmware handle
>>>>>>>> SYNCHORNIZE_CACHE for both VD and JBOD
>>>>>>>> "canHandleSyncCache"
>>>>>>>> bit in scratch pad register(offset
>>>>>>>> 0x00B4) will be set.
>>>>>>>>
>>>>>>>> This behavior can be controlled via module parameter and
>>>>>>>> user
>>>>>>>> can fallback to old behavior of returning
>>>>>>>> SYNCHRONIZE_CACHE by
>>>>>>>> driver only without sending it to firmware.
>>>>>>>>
>>>>>>>> Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
>>>>>>>> Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
>>>>>>>> ---
>>>>>>>> drivers/scsi/megaraid/megaraid_sas.h | 3 +++
>>>>>>>> drivers/scsi/megaraid/megaraid_sas_base.c | 14
>>>>>>>> ++++++---
>>>>>>>> -----
>>>>>>>> drivers/scsi/megaraid/megaraid_sas_fusion.c | 3 +++
>>>>>>>> drivers/scsi/megaraid/megaraid_sas_fusion.h | 2 ++
>>>>>>>> 4 files changed, 14 insertions(+), 8 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas.h
>>>>>>>> b/drivers/scsi/megaraid/megaraid_sas.h
>>>>>>>> index ca86c88..43fd14f 100644
>>>>>>>> --- a/drivers/scsi/megaraid/megaraid_sas.h
>>>>>>>> +++ b/drivers/scsi/megaraid/megaraid_sas.h
>>>>>>>> @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT {
>>>>>>>> #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT 14
>>>>>>>> #define MR_MAX_MSIX_REG_ARRAY 16
>>>>>>>> #define MR_RDPQ_MODE_OFFSET 0X0
>>>>>>>> 0800
>>>>>>>> 000
>>>>>>>> +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET 0
>>>>>>>> X010
>>>>>>>> 0000
>>>>>>>> 0
>>>>>>>> +
>>>>>>>> /*
>>>>>>>> * register set for both 1068 and 1078 controllers
>>>>>>>> * structure extended for 1078 registers @@ -2140,6
>>>>>>>> +2142,7
>>>>>>>> @@ struct megasas_instance {
>>>>>>>> u8 is_imr;
>>>>>>>> u8 is_rdpq;
>>>>>>>> bool dev_handle;
>>>>>>>> + bool fw_sync_cache_support;
>>>>>>>> };
>>>>>>>> struct MR_LD_VF_MAP {
>>>>>>>> u32 size;
>>>>>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c
>>>>>>>> b/drivers/scsi/megaraid/megaraid_sas_base.c
>>>>>>>> index 092387f..a4a8e2f 100644
>>>>>>>> --- a/drivers/scsi/megaraid/megaraid_sas_base.c
>>>>>>>> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
>>>>>>>> @@ -104,6 +104,10 @@ unsigned int scmd_timeout =
>>>>>>>> MEGASAS_DEFAULT_CMD_TIMEOUT;
>>>>>>>> module_param(scmd_timeout, int, S_IRUGO);
>>>>>>>> MODULE_PARM_DESC(scmd_timeout, "scsi command timeout
>>>>>>>> (10
>>>>>>>> -90s), default 90s. See megasas_reset_timer.");
>>>>>>>>
>>>>>>>> +bool block_sync_cache;
>>>>>>>> +module_param(block_sync_cache, bool, S_IRUGO);
>>>>>>>> +MODULE_PARM_DESC(block_sync_cache, "Block SYNC CACHE by
>>>>>>>> driver
>>>>>>>> Default: 0(Send it to firmware)");
>>>>>>>> +
>>>>>>>> MODULE_LICENSE("GPL");
>>>>>>>> MODULE_VERSION(MEGASAS_VERSION);
>>>>>>>> MODULE_AUTHOR("megaraidlinux.pdl@avagotech.com");
>>>>>>>> @@ -1700,16 +1704,10 @@ megasas_queue_command(struct
>>> Scsi_Host
>>>>>>>> *shost, struct scsi_cmnd *scmd)
>>>>>>>> goto out_done;
>>>>>>>> }
>>>>>>>>
>>>>>>>> - switch (scmd->cmnd[0]) {
>>>>>>>> - case SYNCHRONIZE_CACHE:
>>>>>>>> - /*
>>>>>>>> - * FW takes care of flush cache on its
>>>>>>>> own
>>>>>>>> - * No need to send it down
>>>>>>>> - */
>>>>>>>> + if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) &&
>>>>>>>> + (!instance->fw_sync_cache_support)) {
>>>>>>>> scmd->result = DID_OK << 16;
>>>>>>>> goto out_done;
>>>>>>>> - default:
>>>>>>>> - break;
>>>>>>>> }
>>>>>>>>
>>>>>>>> return instance->instancet
>>>>>>>> ->build_and_issue_cmd(instance, scmd);
>>>>>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c
>>>>>>>> b/drivers/scsi/megaraid/megaraid_sas_fusion.c
>>>>>>>> index 2159f6a..8237580 100644
>>>>>>>> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
>>>>>>>> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
>>>>>>>> @@ -747,6 +747,9 @@ megasas_ioc_init_fusion(struct
>>>>>>>> megasas_instance *instance)
>>>>>>>> ret = 1;
>>>>>>>> goto fail_fw_init;
>>>>>>>> }
>>>>>>>> + if (!block_sync_cache)
>>>>>>>> + instance->fw_sync_cache_support =
>>>>>>>> (scratch_pad_2
>>>>>>>> &
>>>>>>>> + MR_CAN_HANDLE_SYNC_CACHE_OFFSET)
>>>>>>>> ? 1
>>>>>>>> :
>>>>>>>> 0;
>>>>>>>>
>>>>>>>> IOCInitMessage =
>>>>>>>> dma_alloc_coherent(&instance->pdev->dev,
>>>>>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h
>>>>>>>> b/drivers/scsi/megaraid/megaraid_sas_fusion.h
>>>>>>>> index e3bee04..2857154 100644
>>>>>>>> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.h
>>>>>>>> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h
>>>>>>>> @@ -72,6 +72,8 @@
>>>>>>>> #define MPI2_SUP_REPLY_POST_HOST_INDEX_OFFSET
>>>>>>>> (0x0000030C)
>>>>>>>> #define MPI2_REPLY_POST_HOST_INDEX_OFFSET (0x00
>>>>>>>> 0000
>>>>>>>> 6C)
>>>>>>>>
>>>>>>>> +extern bool block_sync_cache;
>>>>>>>> +
>>>>>>>> /*
>>>>>>>> * Raid context flags
>>>>>>>> */
>>>>>>>>
>>>>>>> Be extra careful with that.
>>>>>>>
>>>>>>> SYNCHRONIZE_CACHE has (potentially) a global scope, as
>>>>>>> there
>>>>>>> might be an array-wide cache, and a cache flush would
>>>>>>> affect the
>>>>>>> entire cache. Linux is using SYNCHRONIZE_CACHE as a per
>>>>>>> device
>>>>>>> setting, ie it assumes that the effects of a cache flush is
>>>>>>> restricted to the device in question.
>>>>>>>
>>>>>>> If this is _not_ the case I'd rather not enable it.
>>>>>>> Have you checked that enabling this functionality doesn't
>>>>>>> have
>>>>>>> negative performance impact?
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Hannes
>>>>>> This must go in - without this fix, there is no data
>>>>>> integrity for
>>>>>> any file system.
>>>>> That's not what I get from the change log. What it says to me
>>>>> is
>>>>> that the caches are currently firmware managed. Barring
>>>>> firmware
>>>>> bugs, that means that we currently don't have any integrity
>>>>> issues.
>>>> Your understanding (or the change log) is incorrect.
>>> There's no current way in English of construing "as firmware takes
>>> care of
>>> flushing cache" to mean its inverse, so the changelog needs
>>> updating to
>>> explain
>>> that firmware does *not* take care of cache flushing, so
>>> effectively
>>> nothing
>>> does and we'll need a cc to stable if the problem is that nothing
>>> takes
>>> care of
>>> flushing the drive caches.
>>>
>>> James
>> Sorry for confusion. More accurate to say -
>>
>> Current megaraid_sas driver returns SYNCHRONIZE_CACHE command back to
>> SCSI layer without sending it down to firmware as firmware supposed
>> to takes care of flushing cache for all Virtual Disk at the time of
>> system reboot/shutdown. Because of above FW rule driver coded wrongly
>> and it added bug that SYNCHRONIZE_CACHE is not passed to FW for JBOD
>> as well. (MR Firmware wanted OS to manage SYNCHRONIZE_CACHE and pass
>> the same to the Firmware. ). We figure out that even for VD, why
>> driver should send back SYNC_CACHE command...let's have the same in
>> FW (giving all control in FW)
>>
>> New behavior described -
>>
>> IF application requests SYNCH_CACHE
>> IF any FW which supports new API bit called canHandleSyncCache
>> Driver sends SYNCH_CACHE command to the FW
>> IF 'VirtualDisk'
>> FW does not send SYNCH_CACHE to drives
>> FW returns SUCCESS
>> ELSE IF 'JBOD'
>> FW sends SYNCH_CACHE to drive
>> FW obtains status from drive and returns same status back to
>> driver
>> ENDIF
>>
>> Fixing this is robust if FW and driver changes are inline. See below
>> for more detail.
>>
>> Case - 1
>> This patch is attempt to fix one level problem where Virtual Disks
>> are not managed correctly in MR FW. There are some MR FW (E.a Gen2
>> and Gen2.5 FW), which set WCE bit for Virtual Disk but FW is not
>> reply correct for SYNCH_CACHE. This was handled in past and carry
>> forward (in driver + FW ) to all new generation products as well. We
>> tried to collect all possible details why it was handled such way to
>> provide better driver fix for this issue(mainly to avoid this FW
>> checks and module parameters etc..). But now it looks like to make
>> generic fix (Device ID based etc..)is even risky, so went with this
>> protective approach. It is very risky to find out which Device ID
>> and FW has capability to manage SYNC_CACHE, so we managed to figure
>> out which are the new FW using FW capability bit.
>>
>> E.g Liberator/Thunderbolt MR FW SYNC_CACHE is write Unsupported
>> command (this is a firmware bug) and immediately it will be failed to
>> SML. This will cause File system to go Read-only.
> That's a better description ... what you're saying is that the patch
> doesn't actually fix the bug Ric is worrying about.
>
>> Case -2
>> One more thing which we are trying and known driver can send
>> "SYNC_CACHE" unconditionally to all generation of FW. For this we
>> are doing more unit testing on various controllers/FW and planning to
>> provide another patch to fix JBOD path for any FW. (It will not be
>> based on FW capability/module parameter).
> OK, but we really need this ASAP. As Ric said, data integrity is at
> risk until this is fixed. Can you also reference the commit that
> introduced the problem so we know how far to do the sable backports?
>
>> If we remove module parameter, we can ask customer to disable WCE on
>> drive to get similar impact.
> Thanks,
>
> James
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2016-10-18 13:56 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-10-17 10:24 [PATCH 0/7] megaraid_sas: Updates for scsi-next Sumit Saxena
2016-10-17 10:24 ` [PATCH 1/7] megaraid_sas: For SRIOV enabled firmware, ensure VF driver waits for 30secs before reset Sumit Saxena
2016-10-17 11:29 ` Hannes Reinecke
2016-10-17 12:43 ` Tomas Henzl
2016-10-17 10:24 ` [PATCH 2/7] megaraid_sas: Send correct PhysArm to FW for R1 VD downgrade Sumit Saxena
2016-10-17 11:29 ` Hannes Reinecke
2016-10-17 12:50 ` Tomas Henzl
2016-10-17 10:24 ` [PATCH 3/7] megaraid_sas: Do not fire DCMDs during PCI shutdown/detach Sumit Saxena
2016-10-17 11:31 ` Hannes Reinecke
2016-10-17 12:19 ` Sumit Saxena
2016-10-17 10:24 ` [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE command to firmware Sumit Saxena
2016-10-17 11:34 ` Hannes Reinecke
2016-10-17 12:13 ` Sumit Saxena
2016-10-17 13:01 ` Ric Wheeler
2016-10-17 13:31 ` Sumit Saxena
2016-10-17 15:55 ` Christoph Hellwig
2016-10-17 16:08 ` Ric Wheeler
2016-10-17 16:23 ` Ewan D. Milne
2016-10-17 16:20 ` James Bottomley
2016-10-17 16:27 ` Ric Wheeler
2016-10-17 17:19 ` James Bottomley
2016-10-17 17:30 ` Ric Wheeler
2016-10-17 17:36 ` Kashyap Desai
2016-10-17 17:51 ` James Bottomley
2016-10-18 13:56 ` Tomas Henzl [this message]
2016-10-19 9:50 ` Ching Huang
2016-10-19 12:55 ` Tomas Henzl
2016-10-19 18:07 ` Raghava Aditya Renukunta
2016-10-21 16:30 ` Tomas Henzl
2016-10-25 2:02 ` Martin K. Petersen
[not found] ` <CAKiknE_MM88eVON1g_qy7wbb2fkxdAs6O0SRSzN-RbQqSvx1eA@mail.gmail.com>
2016-10-27 2:07 ` Martin K. Petersen
2016-10-18 18:47 ` Sumit Saxena
2016-10-17 16:29 ` Ric Wheeler
2016-10-17 13:13 ` Tomas Henzl
2016-10-17 13:28 ` Sumit Saxena
2016-10-17 13:57 ` Tomas Henzl
2016-10-17 14:25 ` Sumit Saxena
2016-10-18 13:07 ` Ric Wheeler
2016-10-18 13:22 ` Sumit Saxena
2016-10-17 10:24 ` [PATCH 5/7] megaraid_sas: driver version upgrade Sumit Saxena
2016-10-17 11:35 ` Hannes Reinecke
2016-10-17 12:20 ` Sumit Saxena
2016-10-17 10:24 ` [PATCH 6/7] MAINTAINERS: Update megaraid maintainers list Sumit Saxena
2016-10-17 11:37 ` Hannes Reinecke
2016-10-17 10:24 ` [PATCH 7/7] megaraid_sas: Do not set MPI2_TYPE_CUDA for JBOD FP path for FW which does not support JBOD sequence map Sumit Saxena
2016-10-17 11:37 ` Hannes Reinecke
2016-10-17 13:23 ` Tomas Henzl
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b803c55c-23db-3bec-3d2f-58bc53d4ef37@redhat.com \
--to=thenzl@redhat.com \
--cc=RaghavaAditya.Renukunta@microsemi.com \
--cc=axboe@kernel.dk \
--cc=ching2048@areca.com.tw \
--cc=emilne@redhat.com \
--cc=fge@redhat.com \
--cc=hare@suse.de \
--cc=hch@infradead.org \
--cc=jejb@linux.vnet.ibm.com \
--cc=jmoyer@redhat.com \
--cc=kashyap.desai@broadcom.com \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=mkp@mkp.net \
--cc=rwheeler@redhat.com \
--cc=sumit.saxena@broadcom.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).