From: Sumit Saxena <sumit.saxena@broadcom.com>
To: James Bottomley <jejb@linux.vnet.ibm.com>,
Kashyap Desai <kashyap.desai@broadcom.com>,
Ric Wheeler <rwheeler@redhat.com>, Hannes Reinecke <hare@suse.de>,
linux-scsi@vger.kernel.org
Cc: martin.petersen@oracle.com, thenzl@redhat.com,
Christoph Hellwig <hch@infradead.org>,
"Martin K. Petersen" <mkp@mkp.net>,
Jeff Moyer <jmoyer@redhat.com>, Gris Ge <fge@redhat.com>,
Ewan Milne <emilne@redhat.com>, Jens Axboe <axboe@kernel.dk>
Subject: RE: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE command to firmware
Date: Wed, 19 Oct 2016 00:17:06 +0530 [thread overview]
Message-ID: <a5ef18ed27db613b106ddbf4564320d5@mail.gmail.com> (raw)
In-Reply-To: <1476726692.2734.36.camel@linux.vnet.ibm.com>
>-----Original Message-----
>From: James Bottomley [mailto:jejb@linux.vnet.ibm.com]
>Sent: Monday, October 17, 2016 11:22 PM
>To: Kashyap Desai; Ric Wheeler; Hannes Reinecke; Sumit Saxena; linux-
>scsi@vger.kernel.org
>Cc: martin.petersen@oracle.com; thenzl@redhat.com; Christoph Hellwig;
>Martin
>K. Petersen; Jeff Moyer; Gris Ge; Ewan Milne; Jens Axboe
>Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE command to
>firmware
>
>On Mon, 2016-10-17 at 23:06 +0530, Kashyap Desai wrote:
>> > -----Original Message-----
>> > From: James Bottomley [mailto:jejb@linux.vnet.ibm.com]
>> > Sent: Monday, October 17, 2016 10:50 PM
>> > To: Ric Wheeler; Hannes Reinecke; Sumit Saxena;
>> > linux-scsi@vger.kernel.org
>> > Cc: martin.petersen@oracle.com; thenzl@redhat.com;
>> > kashyap.desai@broadcom.com; Christoph Hellwig; Martin K. Petersen;
>> > Jeff Moyer; Gris Ge; Ewan Milne; Jens Axboe
>> > Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE
>> > command to firmware
>> >
>> > On Mon, 2016-10-17 at 12:27 -0400, Ric Wheeler wrote:
>> > > On 10/17/2016 12:20 PM, James Bottomley wrote:
>> > > > On Mon, 2016-10-17 at 09:01 -0400, Ric Wheeler wrote:
>> > > > > On 10/17/2016 07:34 AM, Hannes Reinecke wrote:
>> > > > > > On 10/17/2016 12:24 PM, Sumit Saxena wrote:
>> > > > > > > megaraid_sas driver returns SYNCHRONIZE_CACHE command back
>> > > > > > > to SCSI layer without sending it to firmware as firmware
>> > > > > > > takes care of flushing cache. This patch will change the
>> > > > > > > driver behavior wrt SYNCHRONIZE_CACHE handling. If
>> > > > > > > underlying firmware has support to handle the
>> > > > > > > SYNCHORNIZE_CACHE, driver will send it for firmware
>> > > > > > > otherwise complete it back to SCSI layer with SUCCESS
>> > > > > > > immediately. If Firmware handle SYNCHORNIZE_CACHE for
>> > > > > > > both VD and JBOD "canHandleSyncCache"
>> > > > > > > bit in scratch pad register(offset
>> > > > > > > 0x00B4) will be set.
>> > > > > > >
>> > > > > > > This behavior can be controlled via module parameter and
>> > > > > > > user can fallback to old behavior of returning
>> > > > > > > SYNCHRONIZE_CACHE by driver only without sending it to
>> > > > > > > firmware.
>> > > > > > >
>> > > > > > > Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
>> > > > > > > Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
>> > > > > > > ---
>> > > > > > > drivers/scsi/megaraid/megaraid_sas.h | 3 +++
>> > > > > > > drivers/scsi/megaraid/megaraid_sas_base.c | 14
>> > > > > > > ++++++---
>> > > > > > > -----
>> > > > > > > drivers/scsi/megaraid/megaraid_sas_fusion.c | 3 +++
>> > > > > > > drivers/scsi/megaraid/megaraid_sas_fusion.h | 2 ++
>> > > > > > > 4 files changed, 14 insertions(+), 8 deletions(-)
>> > > > > > >
>> > > > > > > diff --git a/drivers/scsi/megaraid/megaraid_sas.h
>> > > > > > > b/drivers/scsi/megaraid/megaraid_sas.h
>> > > > > > > index ca86c88..43fd14f 100644
>> > > > > > > --- a/drivers/scsi/megaraid/megaraid_sas.h
>> > > > > > > +++ b/drivers/scsi/megaraid/megaraid_sas.h
>> > > > > > > @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT {
>> > > > > > > #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT 14
>> > > > > > > #define MR_MAX_MSIX_REG_ARRAY 16
>> > > > > > > #define MR_RDPQ_MODE_OFFSET 0X0
>> > > > > > > 0800
>> > > > > > > 000
>> > > > > > > +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET 0
>> > > > > > > X010
>> > > > > > > 0000
>> > > > > > > 0
>> > > > > > > +
>> > > > > > > /*
>> > > > > > > * register set for both 1068 and 1078 controllers
>> > > > > > > * structure extended for 1078 registers @@ -2140,6
>> > > > > > > +2142,7
>> > > > > > > @@ struct megasas_instance {
>> > > > > > > u8 is_imr;
>> > > > > > > u8 is_rdpq;
>> > > > > > > bool dev_handle;
>> > > > > > > + bool fw_sync_cache_support;
>> > > > > > > };
>> > > > > > > struct MR_LD_VF_MAP {
>> > > > > > > u32 size;
>> > > > > > > diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c
>> > > > > > > b/drivers/scsi/megaraid/megaraid_sas_base.c
>> > > > > > > index 092387f..a4a8e2f 100644
>> > > > > > > --- a/drivers/scsi/megaraid/megaraid_sas_base.c
>> > > > > > > +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
>> > > > > > > @@ -104,6 +104,10 @@ unsigned int scmd_timeout =
>> > > > > > > MEGASAS_DEFAULT_CMD_TIMEOUT;
>> > > > > > > module_param(scmd_timeout, int, S_IRUGO);
>> > > > > > > MODULE_PARM_DESC(scmd_timeout, "scsi command timeout
>> > > > > > > (10
>> > > > > > > -90s), default 90s. See megasas_reset_timer.");
>> > > > > > >
>> > > > > > > +bool block_sync_cache;
>> > > > > > > +module_param(block_sync_cache, bool, S_IRUGO);
>> > > > > > > +MODULE_PARM_DESC(block_sync_cache, "Block SYNC CACHE by
>> > > > > > > driver
>> > > > > > > Default: 0(Send it to firmware)");
>> > > > > > > +
>> > > > > > > MODULE_LICENSE("GPL");
>> > > > > > > MODULE_VERSION(MEGASAS_VERSION);
>> > > > > > > MODULE_AUTHOR("megaraidlinux.pdl@avagotech.com");
>> > > > > > > @@ -1700,16 +1704,10 @@ megasas_queue_command(struct
>> > Scsi_Host
>> > > > > > > *shost, struct scsi_cmnd *scmd)
>> > > > > > > goto out_done;
>> > > > > > > }
>> > > > > > >
>> > > > > > > - switch (scmd->cmnd[0]) {
>> > > > > > > - case SYNCHRONIZE_CACHE:
>> > > > > > > - /*
>> > > > > > > - * FW takes care of flush cache on its
>> > > > > > > own
>> > > > > > > - * No need to send it down
>> > > > > > > - */
>> > > > > > > + if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) &&
>> > > > > > > + (!instance->fw_sync_cache_support)) {
>> > > > > > > scmd->result = DID_OK << 16;
>> > > > > > > goto out_done;
>> > > > > > > - default:
>> > > > > > > - break;
>> > > > > > > }
>> > > > > > >
>> > > > > > > return instance->instancet
>> > > > > > > ->build_and_issue_cmd(instance, scmd);
>> > > > > > > diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c
>> > > > > > > b/drivers/scsi/megaraid/megaraid_sas_fusion.c
>> > > > > > > index 2159f6a..8237580 100644
>> > > > > > > --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
>> > > > > > > +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
>> > > > > > > @@ -747,6 +747,9 @@ megasas_ioc_init_fusion(struct
>> > > > > > > megasas_instance *instance)
>> > > > > > > ret = 1;
>> > > > > > > goto fail_fw_init;
>> > > > > > > }
>> > > > > > > + if (!block_sync_cache)
>> > > > > > > + instance->fw_sync_cache_support =
>> > > > > > > (scratch_pad_2
>> > > > > > > &
>> > > > > > > + MR_CAN_HANDLE_SYNC_CACHE_OFFSET)
>> > > > > > > ? 1
>> > > > > > > :
>> > > > > > > 0;
>> > > > > > >
>> > > > > > > IOCInitMessage =
>> > > > > > > dma_alloc_coherent(&instance->pdev->dev,
>> > > > > > > diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h
>> > > > > > > b/drivers/scsi/megaraid/megaraid_sas_fusion.h
>> > > > > > > index e3bee04..2857154 100644
>> > > > > > > --- a/drivers/scsi/megaraid/megaraid_sas_fusion.h
>> > > > > > > +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h
>> > > > > > > @@ -72,6 +72,8 @@
>> > > > > > > #define MPI2_SUP_REPLY_POST_HOST_INDEX_OFFSET
>> > > > > > > (0x0000030C)
>> > > > > > > #define MPI2_REPLY_POST_HOST_INDEX_OFFSET (0x00
>> > > > > > > 0000
>> > > > > > > 6C)
>> > > > > > >
>> > > > > > > +extern bool block_sync_cache;
>> > > > > > > +
>> > > > > > > /*
>> > > > > > > * Raid context flags
>> > > > > > > */
>> > > > > > >
>> > > > > > Be extra careful with that.
>> > > > > >
>> > > > > > SYNCHRONIZE_CACHE has (potentially) a global scope, as there
>> > > > > > might be an array-wide cache, and a cache flush would affect
>> > > > > > the entire cache. Linux is using SYNCHRONIZE_CACHE as a per
>> > > > > > device setting, ie it assumes that the effects of a cache
>> > > > > > flush is restricted to the device in question.
>> > > > > >
>> > > > > > If this is _not_ the case I'd rather not enable it.
>> > > > > > Have you checked that enabling this functionality doesn't
>> > > > > > have negative performance impact?
>> > > > > >
>> > > > > > Cheers,
>> > > > > >
>> > > > > > Hannes
>> > > > > This must go in - without this fix, there is no data integrity
>> > > > > for any file system.
>> > > > That's not what I get from the change log. What it says to me
>> > > > is that the caches are currently firmware managed. Barring
>> > > > firmware bugs, that means that we currently don't have any
>> > > > integrity issues.
>> > >
>> > > Your understanding (or the change log) is incorrect.
>> >
>> > There's no current way in English of construing "as firmware takes
>> > care of flushing cache" to mean its inverse, so the changelog needs
>> > updating to explain that firmware does *not* take care of cache
>> > flushing, so effectively nothing does and we'll need a cc to stable
>> > if the problem is that nothing takes care of flushing the drive
>> > caches.
>> >
>> > James
>>
>> Sorry for confusion. More accurate to say -
>>
>> Current megaraid_sas driver returns SYNCHRONIZE_CACHE command back to
>> SCSI layer without sending it down to firmware as firmware supposed to
>> takes care of flushing cache for all Virtual Disk at the time of
>> system reboot/shutdown. Because of above FW rule driver coded wrongly
>> and it added bug that SYNCHRONIZE_CACHE is not passed to FW for JBOD
>> as well. (MR Firmware wanted OS to manage SYNCHRONIZE_CACHE and pass
>> the same to the Firmware. ). We figure out that even for VD, why
>> driver should send back SYNC_CACHE command...let's have the same in FW
>> (giving all control in FW)
>>
>> New behavior described -
>>
>> IF application requests SYNCH_CACHE
>> IF any FW which supports new API bit called canHandleSyncCache
>> Driver sends SYNCH_CACHE command to the FW
>> IF 'VirtualDisk'
>> FW does not send SYNCH_CACHE to drives
>> FW returns SUCCESS
>> ELSE IF 'JBOD'
>> FW sends SYNCH_CACHE to drive
>> FW obtains status from drive and returns same status back to
>> driver
>> ENDIF
>>
>> Fixing this is robust if FW and driver changes are inline. See below
>> for more detail.
>>
>> Case - 1
>> This patch is attempt to fix one level problem where Virtual Disks are
>> not managed correctly in MR FW. There are some MR FW (E.a Gen2 and
>> Gen2.5 FW), which set WCE bit for Virtual Disk but FW is not reply
>> correct for SYNCH_CACHE. This was handled in past and carry forward
>> (in driver + FW ) to all new generation products as well. We tried to
>> collect all possible details why it was handled such way to provide
>> better driver fix for this issue(mainly to avoid this FW checks and
>> module parameters etc..). But now it looks like to make generic fix
>> (Device ID based etc..)is even risky, so went with this protective
>> approach. It is very risky to find out which Device ID and FW has
>> capability to manage SYNC_CACHE, so we managed to figure out which are
>> the new FW using FW capability bit.
>>
>> E.g Liberator/Thunderbolt MR FW SYNC_CACHE is write Unsupported
>> command (this is a firmware bug) and immediately it will be failed to
>> SML. This will cause File system to go Read-only.
>
>That's a better description ... what you're saying is that the patch
>doesn't actually
>fix the bug Ric is worrying about.
>
>> Case -2
>> One more thing which we are trying and known driver can send
>> "SYNC_CACHE" unconditionally to all generation of FW. For this we are
>> doing more unit testing on various controllers/FW and planning to
>> provide another patch to fix JBOD path for any FW. (It will not be
>> based on FW capability/module parameter).
>
>OK, but we really need this ASAP. As Ric said, data integrity is at risk
>until this is
>fixed. Can you also reference the commit that introduced the problem so we
>know how far to do the sable backports?
Commit- " 02b01e0 [SCSI] megaraid_sas: return sync cache call with success"
added the code
in driver to return SYNCHRONIZE_CACHE without sending it to firmware long
back in 2007.
This was then added for RAID volumes as then supported controller firmwares
did not have
support for SYNCHRONIZE_CACHE. This was mistakenly carried forward for non
RAID(JBODs)
which was wrong. I will be sending a consolidated patch which will address
all issues pertaining to SYNCHRONIZE_CACHE for RAID volumes and non
RAID(JBOD) disks.
>
>> If we remove module parameter, we can ask customer to disable WCE on
>> drive to get similar impact.
>
>Thanks,
>
>James
next prev parent reply other threads:[~2016-10-18 18:47 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-10-17 10:24 [PATCH 0/7] megaraid_sas: Updates for scsi-next Sumit Saxena
2016-10-17 10:24 ` [PATCH 1/7] megaraid_sas: For SRIOV enabled firmware, ensure VF driver waits for 30secs before reset Sumit Saxena
2016-10-17 11:29 ` Hannes Reinecke
2016-10-17 12:43 ` Tomas Henzl
2016-10-17 10:24 ` [PATCH 2/7] megaraid_sas: Send correct PhysArm to FW for R1 VD downgrade Sumit Saxena
2016-10-17 11:29 ` Hannes Reinecke
2016-10-17 12:50 ` Tomas Henzl
2016-10-17 10:24 ` [PATCH 3/7] megaraid_sas: Do not fire DCMDs during PCI shutdown/detach Sumit Saxena
2016-10-17 11:31 ` Hannes Reinecke
2016-10-17 12:19 ` Sumit Saxena
2016-10-17 10:24 ` [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE command to firmware Sumit Saxena
2016-10-17 11:34 ` Hannes Reinecke
2016-10-17 12:13 ` Sumit Saxena
2016-10-17 13:01 ` Ric Wheeler
2016-10-17 13:31 ` Sumit Saxena
2016-10-17 15:55 ` Christoph Hellwig
2016-10-17 16:08 ` Ric Wheeler
2016-10-17 16:23 ` Ewan D. Milne
2016-10-17 16:20 ` James Bottomley
2016-10-17 16:27 ` Ric Wheeler
2016-10-17 17:19 ` James Bottomley
2016-10-17 17:30 ` Ric Wheeler
2016-10-17 17:36 ` Kashyap Desai
2016-10-17 17:51 ` James Bottomley
2016-10-18 13:56 ` Tomas Henzl
2016-10-19 9:50 ` Ching Huang
2016-10-19 12:55 ` Tomas Henzl
2016-10-19 18:07 ` Raghava Aditya Renukunta
2016-10-21 16:30 ` Tomas Henzl
2016-10-25 2:02 ` Martin K. Petersen
[not found] ` <CAKiknE_MM88eVON1g_qy7wbb2fkxdAs6O0SRSzN-RbQqSvx1eA@mail.gmail.com>
2016-10-27 2:07 ` Martin K. Petersen
2016-10-18 18:47 ` Sumit Saxena [this message]
2016-10-17 16:29 ` Ric Wheeler
2016-10-17 13:13 ` Tomas Henzl
2016-10-17 13:28 ` Sumit Saxena
2016-10-17 13:57 ` Tomas Henzl
2016-10-17 14:25 ` Sumit Saxena
2016-10-18 13:07 ` Ric Wheeler
2016-10-18 13:22 ` Sumit Saxena
2016-10-17 10:24 ` [PATCH 5/7] megaraid_sas: driver version upgrade Sumit Saxena
2016-10-17 11:35 ` Hannes Reinecke
2016-10-17 12:20 ` Sumit Saxena
2016-10-17 10:24 ` [PATCH 6/7] MAINTAINERS: Update megaraid maintainers list Sumit Saxena
2016-10-17 11:37 ` Hannes Reinecke
2016-10-17 10:24 ` [PATCH 7/7] megaraid_sas: Do not set MPI2_TYPE_CUDA for JBOD FP path for FW which does not support JBOD sequence map Sumit Saxena
2016-10-17 11:37 ` Hannes Reinecke
2016-10-17 13:23 ` Tomas Henzl
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a5ef18ed27db613b106ddbf4564320d5@mail.gmail.com \
--to=sumit.saxena@broadcom.com \
--cc=axboe@kernel.dk \
--cc=emilne@redhat.com \
--cc=fge@redhat.com \
--cc=hare@suse.de \
--cc=hch@infradead.org \
--cc=jejb@linux.vnet.ibm.com \
--cc=jmoyer@redhat.com \
--cc=kashyap.desai@broadcom.com \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=mkp@mkp.net \
--cc=rwheeler@redhat.com \
--cc=thenzl@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).