From: Kashyap Desai <kashyap.desai@broadcom.com>
To: James Bottomley <jejb@linux.vnet.ibm.com>,
Ric Wheeler <rwheeler@redhat.com>, Hannes Reinecke <hare@suse.de>,
Sumit Saxena <sumit.saxena@broadcom.com>,
linux-scsi@vger.kernel.org
Cc: martin.petersen@oracle.com, thenzl@redhat.com,
Christoph Hellwig <hch@infradead.org>,
"Martin K. Petersen" <mkp@mkp.net>,
Jeff Moyer <jmoyer@redhat.com>, Gris Ge <fge@redhat.com>,
Ewan Milne <emilne@redhat.com>, Jens Axboe <axboe@kernel.dk>
Subject: RE: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE command to firmware
Date: Mon, 17 Oct 2016 23:06:09 +0530 [thread overview]
Message-ID: <01e727ba166a34a97efdd5e890e4bc22@mail.gmail.com> (raw)
In-Reply-To: <1476724793.2734.21.camel@linux.vnet.ibm.com>
> -----Original Message-----
> From: James Bottomley [mailto:jejb@linux.vnet.ibm.com]
> Sent: Monday, October 17, 2016 10:50 PM
> To: Ric Wheeler; Hannes Reinecke; Sumit Saxena; linux-scsi@vger.kernel.org
> Cc: martin.petersen@oracle.com; thenzl@redhat.com;
> kashyap.desai@broadcom.com; Christoph Hellwig; Martin K. Petersen; Jeff
> Moyer; Gris Ge; Ewan Milne; Jens Axboe
> Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE command
> to firmware
>
> On Mon, 2016-10-17 at 12:27 -0400, Ric Wheeler wrote:
> > On 10/17/2016 12:20 PM, James Bottomley wrote:
> > > On Mon, 2016-10-17 at 09:01 -0400, Ric Wheeler wrote:
> > > > On 10/17/2016 07:34 AM, Hannes Reinecke wrote:
> > > > > On 10/17/2016 12:24 PM, Sumit Saxena wrote:
> > > > > > megaraid_sas driver returns SYNCHRONIZE_CACHE command back to
> > > > > > SCSI layer without sending it to firmware as firmware takes
> > > > > > care of flushing cache. This patch will change the driver
> > > > > > behavior wrt SYNCHRONIZE_CACHE handling. If underlying
> > > > > > firmware has support to handle the SYNCHORNIZE_CACHE, driver
> > > > > > will send it for firmware otherwise complete it back to SCSI
> > > > > > layer with SUCCESS immediately. If Firmware handle
> > > > > > SYNCHORNIZE_CACHE for both VD and JBOD "canHandleSyncCache"
> > > > > > bit in scratch pad register(offset
> > > > > > 0x00B4) will be set.
> > > > > >
> > > > > > This behavior can be controlled via module parameter and user
> > > > > > can fallback to old behavior of returning SYNCHRONIZE_CACHE by
> > > > > > driver only without sending it to firmware.
> > > > > >
> > > > > > Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
> > > > > > Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
> > > > > > ---
> > > > > > drivers/scsi/megaraid/megaraid_sas.h | 3 +++
> > > > > > drivers/scsi/megaraid/megaraid_sas_base.c | 14 ++++++---
> > > > > > -----
> > > > > > drivers/scsi/megaraid/megaraid_sas_fusion.c | 3 +++
> > > > > > drivers/scsi/megaraid/megaraid_sas_fusion.h | 2 ++
> > > > > > 4 files changed, 14 insertions(+), 8 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/scsi/megaraid/megaraid_sas.h
> > > > > > b/drivers/scsi/megaraid/megaraid_sas.h
> > > > > > index ca86c88..43fd14f 100644
> > > > > > --- a/drivers/scsi/megaraid/megaraid_sas.h
> > > > > > +++ b/drivers/scsi/megaraid/megaraid_sas.h
> > > > > > @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT {
> > > > > > #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT 14
> > > > > > #define MR_MAX_MSIX_REG_ARRAY 16
> > > > > > #define MR_RDPQ_MODE_OFFSET 0X00800
> > > > > > 000
> > > > > > +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET 0X010
> > > > > > 0000
> > > > > > 0
> > > > > > +
> > > > > > /*
> > > > > > * register set for both 1068 and 1078 controllers
> > > > > > * structure extended for 1078 registers @@ -2140,6 +2142,7
> > > > > > @@ struct megasas_instance {
> > > > > > u8 is_imr;
> > > > > > u8 is_rdpq;
> > > > > > bool dev_handle;
> > > > > > + bool fw_sync_cache_support;
> > > > > > };
> > > > > > struct MR_LD_VF_MAP {
> > > > > > u32 size;
> > > > > > diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c
> > > > > > b/drivers/scsi/megaraid/megaraid_sas_base.c
> > > > > > index 092387f..a4a8e2f 100644
> > > > > > --- a/drivers/scsi/megaraid/megaraid_sas_base.c
> > > > > > +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
> > > > > > @@ -104,6 +104,10 @@ unsigned int scmd_timeout =
> > > > > > MEGASAS_DEFAULT_CMD_TIMEOUT;
> > > > > > module_param(scmd_timeout, int, S_IRUGO);
> > > > > > MODULE_PARM_DESC(scmd_timeout, "scsi command timeout (10
> > > > > > -90s), default 90s. See megasas_reset_timer.");
> > > > > >
> > > > > > +bool block_sync_cache;
> > > > > > +module_param(block_sync_cache, bool, S_IRUGO);
> > > > > > +MODULE_PARM_DESC(block_sync_cache, "Block SYNC CACHE by
> > > > > > driver
> > > > > > Default: 0(Send it to firmware)");
> > > > > > +
> > > > > > MODULE_LICENSE("GPL");
> > > > > > MODULE_VERSION(MEGASAS_VERSION);
> > > > > > MODULE_AUTHOR("megaraidlinux.pdl@avagotech.com");
> > > > > > @@ -1700,16 +1704,10 @@ megasas_queue_command(struct
> Scsi_Host
> > > > > > *shost, struct scsi_cmnd *scmd)
> > > > > > goto out_done;
> > > > > > }
> > > > > >
> > > > > > - switch (scmd->cmnd[0]) {
> > > > > > - case SYNCHRONIZE_CACHE:
> > > > > > - /*
> > > > > > - * FW takes care of flush cache on its own
> > > > > > - * No need to send it down
> > > > > > - */
> > > > > > + if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) &&
> > > > > > + (!instance->fw_sync_cache_support)) {
> > > > > > scmd->result = DID_OK << 16;
> > > > > > goto out_done;
> > > > > > - default:
> > > > > > - break;
> > > > > > }
> > > > > >
> > > > > > return instance->instancet
> > > > > > ->build_and_issue_cmd(instance, scmd);
> > > > > > diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c
> > > > > > b/drivers/scsi/megaraid/megaraid_sas_fusion.c
> > > > > > index 2159f6a..8237580 100644
> > > > > > --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
> > > > > > +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
> > > > > > @@ -747,6 +747,9 @@ megasas_ioc_init_fusion(struct
> > > > > > megasas_instance *instance)
> > > > > > ret = 1;
> > > > > > goto fail_fw_init;
> > > > > > }
> > > > > > + if (!block_sync_cache)
> > > > > > + instance->fw_sync_cache_support =
> > > > > > (scratch_pad_2
> > > > > > &
> > > > > > + MR_CAN_HANDLE_SYNC_CACHE_OFFSET) ? 1
> > > > > > :
> > > > > > 0;
> > > > > >
> > > > > > IOCInitMessage =
> > > > > > dma_alloc_coherent(&instance->pdev->dev,
> > > > > > diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h
> > > > > > b/drivers/scsi/megaraid/megaraid_sas_fusion.h
> > > > > > index e3bee04..2857154 100644
> > > > > > --- a/drivers/scsi/megaraid/megaraid_sas_fusion.h
> > > > > > +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h
> > > > > > @@ -72,6 +72,8 @@
> > > > > > #define MPI2_SUP_REPLY_POST_HOST_INDEX_OFFSET
> > > > > > (0x0000030C)
> > > > > > #define MPI2_REPLY_POST_HOST_INDEX_OFFSET (0x000000
> > > > > > 6C)
> > > > > >
> > > > > > +extern bool block_sync_cache;
> > > > > > +
> > > > > > /*
> > > > > > * Raid context flags
> > > > > > */
> > > > > >
> > > > > Be extra careful with that.
> > > > >
> > > > > SYNCHRONIZE_CACHE has (potentially) a global scope, as there
> > > > > might be an array-wide cache, and a cache flush would affect the
> > > > > entire cache. Linux is using SYNCHRONIZE_CACHE as a per device
> > > > > setting, ie it assumes that the effects of a cache flush is
> > > > > restricted to the device in question.
> > > > >
> > > > > If this is _not_ the case I'd rather not enable it.
> > > > > Have you checked that enabling this functionality doesn't have
> > > > > negative performance impact?
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Hannes
> > > > This must go in - without this fix, there is no data integrity for
> > > > any file system.
> > > That's not what I get from the change log. What it says to me is
> > > that the caches are currently firmware managed. Barring firmware
> > > bugs, that means that we currently don't have any integrity issues.
> >
> > Your understanding (or the change log) is incorrect.
>
> There's no current way in English of construing "as firmware takes care of
> flushing cache" to mean its inverse, so the changelog needs updating to
> explain
> that firmware does *not* take care of cache flushing, so effectively
> nothing
> does and we'll need a cc to stable if the problem is that nothing takes
> care of
> flushing the drive caches.
>
> James
Sorry for confusion. More accurate to say -
Current megaraid_sas driver returns SYNCHRONIZE_CACHE command back to SCSI
layer without sending it down to firmware as firmware supposed to takes care
of flushing cache for all Virtual Disk at the time of system
reboot/shutdown.
Because of above FW rule driver coded wrongly and it added bug that
SYNCHRONIZE_CACHE is not passed to FW for JBOD as well. (MR Firmware wanted
OS to manage SYNCHRONIZE_CACHE and pass the same to the Firmware. ).
We figure out that even for VD, why driver should send back SYNC_CACHE
command...let's have the same in FW (giving all control in FW)
New behavior described -
IF application requests SYNCH_CACHE
IF any FW which supports new API bit called canHandleSyncCache
Driver sends SYNCH_CACHE command to the FW
IF 'VirtualDisk'
FW does not send SYNCH_CACHE to drives
FW returns SUCCESS
ELSE IF 'JBOD'
FW sends SYNCH_CACHE to drive
FW obtains status from drive and returns same status back to driver
ENDIF
Fixing this is robust if FW and driver changes are inline. See below for
more detail.
Case - 1
This patch is attempt to fix one level problem where Virtual Disks are not
managed correctly in MR FW. There are some MR FW (E.a Gen2 and Gen2.5 FW),
which set WCE bit for Virtual Disk but FW is not reply correct for
SYNCH_CACHE.
This was handled in past and carry forward (in driver + FW ) to all new
generation products as well. We tried to collect all possible details why it
was handled such way to provide better driver fix for this issue( mainly to
avoid this FW checks and module
parameters etc..). But now it looks like to make generic fix (Device ID
based etc..)is even risky, so went with this protective approach. It is
very risky to find out which Device ID and FW has capability to manage
SYNC_CACHE, so we managed to figure out which are the new FW using FW
capability bit.
E.g Liberator/Thunderbolt MR FW SYNC_CACHE is write Unsupported command
(this is a firmware bug) and immediately it will be failed to SML. This will
cause File system to go Read-only.
Case -2
One more thing which we are trying and known driver can send "SYNC_CACHE"
unconditionally to all generation of FW. For this we are doing more unit
testing on various controllers/FW and planning to provide another patch to
fix JBOD path for any FW. (It will not be based on FW capability/module
parameter).
If we remove module parameter, we can ask customer to disable WCE on drive
to get similar impact.
` Kashyap
>
>
> > The issue here is for devices that are non-RAID or pass through - this
> > is a real issue and has been seen in practice.
>
>
next prev parent reply other threads:[~2016-10-17 17:36 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-10-17 10:24 [PATCH 0/7] megaraid_sas: Updates for scsi-next Sumit Saxena
2016-10-17 10:24 ` [PATCH 1/7] megaraid_sas: For SRIOV enabled firmware, ensure VF driver waits for 30secs before reset Sumit Saxena
2016-10-17 11:29 ` Hannes Reinecke
2016-10-17 12:43 ` Tomas Henzl
2016-10-17 10:24 ` [PATCH 2/7] megaraid_sas: Send correct PhysArm to FW for R1 VD downgrade Sumit Saxena
2016-10-17 11:29 ` Hannes Reinecke
2016-10-17 12:50 ` Tomas Henzl
2016-10-17 10:24 ` [PATCH 3/7] megaraid_sas: Do not fire DCMDs during PCI shutdown/detach Sumit Saxena
2016-10-17 11:31 ` Hannes Reinecke
2016-10-17 12:19 ` Sumit Saxena
2016-10-17 10:24 ` [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE command to firmware Sumit Saxena
2016-10-17 11:34 ` Hannes Reinecke
2016-10-17 12:13 ` Sumit Saxena
2016-10-17 13:01 ` Ric Wheeler
2016-10-17 13:31 ` Sumit Saxena
2016-10-17 15:55 ` Christoph Hellwig
2016-10-17 16:08 ` Ric Wheeler
2016-10-17 16:23 ` Ewan D. Milne
2016-10-17 16:20 ` James Bottomley
2016-10-17 16:27 ` Ric Wheeler
2016-10-17 17:19 ` James Bottomley
2016-10-17 17:30 ` Ric Wheeler
2016-10-17 17:36 ` Kashyap Desai [this message]
2016-10-17 17:51 ` James Bottomley
2016-10-18 13:56 ` Tomas Henzl
2016-10-19 9:50 ` Ching Huang
2016-10-19 12:55 ` Tomas Henzl
2016-10-19 18:07 ` Raghava Aditya Renukunta
2016-10-21 16:30 ` Tomas Henzl
2016-10-25 2:02 ` Martin K. Petersen
[not found] ` <CAKiknE_MM88eVON1g_qy7wbb2fkxdAs6O0SRSzN-RbQqSvx1eA@mail.gmail.com>
2016-10-27 2:07 ` Martin K. Petersen
2016-10-18 18:47 ` Sumit Saxena
2016-10-17 16:29 ` Ric Wheeler
2016-10-17 13:13 ` Tomas Henzl
2016-10-17 13:28 ` Sumit Saxena
2016-10-17 13:57 ` Tomas Henzl
2016-10-17 14:25 ` Sumit Saxena
2016-10-18 13:07 ` Ric Wheeler
2016-10-18 13:22 ` Sumit Saxena
2016-10-17 10:24 ` [PATCH 5/7] megaraid_sas: driver version upgrade Sumit Saxena
2016-10-17 11:35 ` Hannes Reinecke
2016-10-17 12:20 ` Sumit Saxena
2016-10-17 10:24 ` [PATCH 6/7] MAINTAINERS: Update megaraid maintainers list Sumit Saxena
2016-10-17 11:37 ` Hannes Reinecke
2016-10-17 10:24 ` [PATCH 7/7] megaraid_sas: Do not set MPI2_TYPE_CUDA for JBOD FP path for FW which does not support JBOD sequence map Sumit Saxena
2016-10-17 11:37 ` Hannes Reinecke
2016-10-17 13:23 ` Tomas Henzl
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=01e727ba166a34a97efdd5e890e4bc22@mail.gmail.com \
--to=kashyap.desai@broadcom.com \
--cc=axboe@kernel.dk \
--cc=emilne@redhat.com \
--cc=fge@redhat.com \
--cc=hare@suse.de \
--cc=hch@infradead.org \
--cc=jejb@linux.vnet.ibm.com \
--cc=jmoyer@redhat.com \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=mkp@mkp.net \
--cc=rwheeler@redhat.com \
--cc=sumit.saxena@broadcom.com \
--cc=thenzl@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).