From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tomas Henzl
Subject: Re: [PATCH 04/11] megaraid_sas : Firmware crash dump feature support
Date: Tue, 09 Sep 2014 17:54:10 +0200
Message-ID: <540F22A2.3040906@redhat.com>
References: <201409061328.s86DS6lO012975@palmhbs0.lsi.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:45596 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753234AbaIIPyv (ORCPT ); Tue, 9 Sep 2014 11:54:51 -0400
In-Reply-To: <201409061328.s86DS6lO012975@palmhbs0.lsi.com>
Sender: linux-scsi-owner@vger.kernel.org
List-Id: linux-scsi@vger.kernel.org
To: Sumit.Saxena@avagotech.com, linux-scsi@vger.kernel.org
Cc: martin.petersen@oracle.com, hch@infradead.org, jbottomley@parallels.com, kashyap.desai@avagotech.com, aradford@gmail.com

On 09/06/2014 03:25 PM, Sumit.Saxena@avagotech.com wrote:
> This feature will provide similar interface as kernel crash dump feature.
> When megaraid firmware encounter any crash, driver will collect the firmware raw image and
> dump it into pre-configured location.
>
> Driver will allocate two different segment of memory.
> #1 Non-DMA able large buffer (will be allocated on demand) to capture actual FW crash dump.
> #2 DMA buffer (persistence allocation) just to do a arbitrator job.
>
> Firmware will keep writing Crash dump data in chucks of DMA buffer size into #2,
> which will be copy back by driver to the host memory as described in #1.
>
> Driver-Firmware interface:
> ==================
> A.) Host driver can allocate maximum 512MB Host memory to store crash dump data.
>
> This memory will be internal to the host and will not be exposed to the Firmware.
> Driver may not be able to allocate 512 MB. In that case, driver will do possible memory
> (available at run time) allocation to store crash dump data.
>
> Let’s call this buffer as Host Crash Buffer.
>
> Host Crash buffer will not be contigious as a whole, but it will have multiple chunk of contigious memory.
> This will be internal to driver and firmware/application are unaware of it.
> Partial allocation of Host Crash buffer may have valid information to debug depending upon
> what was collected in that buffer and depending on nature of failure.
>
> Complete Crash dump is the best case, but we do want to capture partial buffer just to grab something rather than nothing.
> Host Crash buffer will be allocated only when FW Crash dump data is available,
> and will be deallocated once application copy Host Crash buffer to the file.
> Host Crash buffer size can be anything between 1MB to 512MB. (It will be multiple of 1MBs)
>
>
> B.) Irrespective of underlying Firmware capability of crash dump support,
> driver will allocate DMA buffer at start of the day for each MR controllers.
> Let’s call this buffer as “DMA Crash Buffer”.
>
> For this feature, size of DMA crash buffer will be 1MB.
> (We will not gain much even if DMA buffer size is increased.)
>
> C.) Driver will now read Controller Info sending existing dcmd “MR_DCMD_CTRL_GET_INFO”.
> Driver should extract the information from ctrl info provided by firmware and
> figure out if firmware support crash dump feature or not.
>
> Driver will enable crash dump feature only if
> “Firmware support Crash dump” +
> “Driver was able to create DMA Crash Buffer”.
>
> If either one from above is not set, Crash dump feature should be disable in driver.
> Firmware will enable crash dump feature only if “Driver Send DCMD- MR_DCMD_SET_CRASH_BUF_PARA with MR_CRASH_BUF_TURN_ON”
>
> Helper application/script should use sysfs parameter fw_crash_xxx to actually copy data from
> host memory to the filesystem.

Is it possible to store the crash dump data on a filesystem on the same
controller after that controller has crashed, or do you expect another
disk/controller to be used?
With several controllers in a system this may take a lot of memory. Could you
also lower it when a kdump kernel is running, by not using this feature there?

See other comments inline.

>
> Signed-off-by: Sumit Saxena
> Signed-off-by: Kashyap Desai
> ---
>  drivers/scsi/megaraid/megaraid_sas.h        |   58 +++++-
>  drivers/scsi/megaraid/megaraid_sas_base.c   |  292 +++++++++++++++++++++++++++-
>  drivers/scsi/megaraid/megaraid_sas_fusion.c |  172 +++++++++++++++-
>  3 files changed, 517 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/scsi/megaraid/megaraid_sas.h b/drivers/scsi/megaraid/megaraid_sas.h
> index bc7adcf..e0f03e2 100644
> --- a/drivers/scsi/megaraid/megaraid_sas.h
> +++ b/drivers/scsi/megaraid/megaraid_sas.h
> @@ -105,6 +105,9 @@
>  #define MFI_STATE_READY	0xB0000000
>  #define MFI_STATE_OPERATIONAL	0xC0000000
>  #define MFI_STATE_FAULT	0xF0000000
> +#define MFI_STATE_FORCE_OCR	0x00000080
> +#define MFI_STATE_DMADONE	0x00000008
> +#define MFI_STATE_CRASH_DUMP_DONE	0x00000004
>  #define MFI_RESET_REQUIRED	0x00000001
>  #define MFI_RESET_ADAPTER	0x00000002
>  #define MEGAMFI_FRAME_SIZE	64
> @@ -191,6 +194,9 @@
>  #define MR_DCMD_CLUSTER_RESET_LD	0x08010200
>  #define MR_DCMD_PD_LIST_QUERY	0x02010100
>
> +#define MR_DCMD_CTRL_SET_CRASH_DUMP_PARAMS	0x01190100
> +#define MR_DRIVER_SET_APP_CRASHDUMP_MODE	(0xF0010000 | 0x0600)
> +
>  /*
>   * Global functions
>   */
> @@ -264,6 +270,25 @@ enum MFI_STAT {
>  };
>
>  /*
> + * Crash dump related defines
> + */
> +#define MAX_CRASH_DUMP_SIZE 512
> +#define CRASH_DMA_BUF_SIZE (1024 * 1024)
> +
> +enum MR_FW_CRASH_DUMP_STATE {
> +	UNAVAILABLE = 0,
> +	AVAILABLE = 1,
> +	COPYING = 2,
> +	COPIED = 3,
> +	COPY_ERROR = 4,
> +};
> +
> +enum _MR_CRASH_BUF_STATUS {
> +	MR_CRASH_BUF_TURN_OFF = 0,
> +	MR_CRASH_BUF_TURN_ON = 1,
> +};
> +
> +/*
>   * Number of mailbox bytes in DCMD message frame
>   */
>  #define MFI_MBOX_SIZE	12
> @@ -933,7 +958,19 @@ struct megasas_ctrl_info {
>  		u8 reserved; /*0x7E7*/
>  	} iov;
>
> -	u8 pad[0x800-0x7E8]; /*0x7E8 pad to 2k */
> +	struct {
> +#if defined(__BIG_ENDIAN_BITFIELD)
> +		u32 reserved:25;
> +		u32 supportCrashDump:1;
> +		u32 reserved1:6;
> +#else
> +		u32 reserved1:6;
> +		u32 supportCrashDump:1;
> +		u32 reserved:25;
> +#endif
> +	} adapterOperations3;
> +
> +	u8 pad[0x800-0x7EC];
>  } __packed;
>
>  /*
> @@ -1559,6 +1596,20 @@ struct megasas_instance {
>  	u32 *reply_queue;
>  	dma_addr_t reply_queue_h;
>
> +	u32 *crash_dump_buf;
> +	dma_addr_t crash_dump_h;
> +	void *crash_buf[MAX_CRASH_DUMP_SIZE];
> +	u32 crash_buf_pages;
> +	unsigned int fw_crash_buffer_size;
> +	unsigned int fw_crash_state;
> +	unsigned int fw_crash_buffer_offset;
> +	u32 drv_buf_index;
> +	u32 drv_buf_alloc;
> +	u32 crash_dump_fw_support;
> +	u32 crash_dump_drv_support;
> +	u32 crash_dump_app_support;
> +	spinlock_t crashdump_lock;
> +
>  	struct megasas_register_set __iomem *reg_set;
>  	u32 *reply_post_host_index_addr[MR_MAX_MSIX_REG_ARRAY];
>  	struct megasas_pd_list pd_list[MEGASAS_MAX_PD];
> @@ -1606,6 +1657,7 @@ struct megasas_instance {
>  	struct megasas_instance_template *instancet;
>  	struct tasklet_struct isr_tasklet;
>  	struct work_struct work_init;
> +	struct work_struct crash_init;
>
>  	u8 flag;
>  	u8 unload;
> @@ -1830,4 +1882,8 @@ u16 MR_LdSpanArrayGet(u32 ld, u32 span, struct MR_FW_RAID_MAP_ALL *map);
>  u16 MR_PdDevHandleGet(u32 pd, struct MR_FW_RAID_MAP_ALL *map);
>  u16 MR_GetLDTgtId(u32 ld, struct MR_FW_RAID_MAP_ALL *map);
>
> +int megasas_set_crash_dump_params(struct megasas_instance *instance,
> +	u8 crash_buf_state);
> +void megasas_free_host_crash_buffer(struct megasas_instance *instance);
> +void megasas_fusion_crash_dump_wq(struct work_struct *work);
>  #endif /*LSI_MEGARAID_SAS_H */
> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
> index a894f13..5b58e39d 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_base.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
> @@ -2560,6 +2560,152 @@ static int megasas_change_queue_depth(struct scsi_device *sdev,
>  	return queue_depth;
>  }
>
> +static ssize_t
> +megasas_fw_crash_buffer_store(struct device *cdev,
> +	struct device_attribute *attr, const char *buf, size_t count)
> +{
> +	struct Scsi_Host *shost = class_to_shost(cdev);
> +	struct megasas_instance *instance =
> +		(struct megasas_instance *) shost->hostdata;
> +	int val = 0;
> +	unsigned long flags;
> +
> +	if (kstrtoint(buf, 0, &val) != 0)
> +		return -EINVAL;
> +
> +	spin_lock_irqsave(&instance->crashdump_lock, flags);
> +	instance->fw_crash_buffer_offset = val;
> +	spin_unlock_irqrestore(&instance->crashdump_lock, flags);

The access to fw_crash_buffer_offset is not protected in the function below.
Why is the spinlock needed here? Could it be removed?

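To illustrate the question: if the offset is only ever written and read as a
single word, the lock around the store does not buy much. What matters more is
that the show path reads the offset once and then works only on that local
copy. Below is a rough userspace sketch of that idea, not driver code.
struct crash_ctx, crash_offset_store(), crash_buffer_show() and the tiny
CHUNK_SIZE/OUT_MAX values are made up for the example, C11 atomics stand in
for a plain aligned u32 access in the kernel, and the clamp to the end of the
current chunk is an assumption added only to keep the sketch in bounds.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 8u	/* stand-in for CRASH_DMA_BUF_SIZE */
#define MAX_CHUNKS 4u	/* stand-in for MAX_CRASH_DUMP_SIZE */
#define OUT_MAX    5u	/* stand-in for PAGE_SIZE - 1 */

/* Hypothetical model of the crash dump fields in megasas_instance. */
struct crash_ctx {
	_Atomic unsigned int offset;	/* models fw_crash_buffer_offset */
	unsigned int nr_chunks;		/* models fw_crash_buffer_size */
	const char *chunk[MAX_CHUNKS];	/* models crash_buf[] */
};

/* Store side: one word-sized write, no lock taken. */
void crash_offset_store(struct crash_ctx *ctx, unsigned int val)
{
	atomic_store(&ctx->offset, val);
}

/* Show side: read the offset once, then use only the local copy. */
size_t crash_buffer_show(struct crash_ctx *ctx, char *out)
{
	unsigned int off = atomic_load(&ctx->offset);
	size_t total = (size_t)ctx->nr_chunks * CHUNK_SIZE;
	size_t len, left_in_chunk;

	if (off >= total)
		return 0;	/* nothing left to read */

	len = total - off;
	if (len > OUT_MAX)
		len = OUT_MAX;

	/* Assumption of this sketch: never copy across a chunk boundary. */
	left_in_chunk = CHUNK_SIZE - (off % CHUNK_SIZE);
	if (len > left_in_chunk)
		len = left_in_chunk;

	memcpy(out, ctx->chunk[off / CHUNK_SIZE] + (off % CHUNK_SIZE), len);
	return len;
}
```

With two 8-byte chunks "ABCDEFGH" and "IJKLMNOP", storing offset 6 and then
calling crash_buffer_show() copies "GH", and an offset of 16 returns 0.
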
> +	return strlen(buf);
> +}
> +
> +static ssize_t
> +megasas_fw_crash_buffer_show(struct device *cdev,
> +	struct device_attribute *attr, char *buf)
> +{
> +	struct Scsi_Host *shost = class_to_shost(cdev);
> +	struct megasas_instance *instance =
> +		(struct megasas_instance *) shost->hostdata;
> +	u32 size;
> +	unsigned long buff_addr;
> +	unsigned long dmachunk = CRASH_DMA_BUF_SIZE;
> +	unsigned long src_addr;
> +	unsigned long flags;
> +	u32 buff_offset;
> +
> +	buff_offset = instance->fw_crash_buffer_offset;
> +	spin_lock_irqsave(&instance->crashdump_lock, flags);
> +	if (!instance->crash_dump_buf &&
> +		!((instance->fw_crash_state == AVAILABLE) ||
> +		(instance->fw_crash_state == COPYING))) {
> +		dev_err(&instance->pdev->dev,
> +			"Firmware crash dump is not available\n");
> +		spin_unlock_irqrestore(&instance->crashdump_lock, flags);
> +		return -EINVAL;
> +	}
> +
> +	buff_addr = (unsigned long) buf;
> +
> +	if (buff_offset >
> +		(instance->fw_crash_buffer_size * dmachunk)) {
> +		dev_err(&instance->pdev->dev,
> +			"Firmware crash dump offset is out of range\n");
> +		spin_unlock_irqrestore(&instance->crashdump_lock, flags);
> +		return 0;
> +	}
> +
> +	size = (instance->fw_crash_buffer_size * dmachunk) - buff_offset;
> +	size = (size >= PAGE_SIZE) ? (PAGE_SIZE - 1) : size;
> +
> +	src_addr = (unsigned long)instance->crash_buf[buff_offset / dmachunk] +
> +		(buff_offset % dmachunk);
> +	memcpy(buf, (void *)src_addr, size);
> +	spin_unlock_irqrestore(&instance->crashdump_lock, flags);
> +
> +	return size;
> +}
> +
> +static ssize_t
> +megasas_fw_crash_buffer_size_show(struct device *cdev,
> +	struct device_attribute *attr, char *buf)
> +{
> +	struct Scsi_Host *shost = class_to_shost(cdev);
> +	struct megasas_instance *instance =
> +		(struct megasas_instance *) shost->hostdata;
> +
> +	return snprintf(buf, PAGE_SIZE, "%ld\n", (unsigned long)
> +		((instance->fw_crash_buffer_size) * 1024 * 1024)/PAGE_SIZE);
> +}
> +
> +static ssize_t
> +megasas_fw_crash_state_store(struct device *cdev,
> +	struct device_attribute *attr, const char *buf, size_t count)
> +{
> +	struct Scsi_Host *shost = class_to_shost(cdev);
> +	struct megasas_instance *instance =
> +		(struct megasas_instance *) shost->hostdata;
> +	int val = 0;
> +	unsigned long flags;
> +
> +	if (kstrtoint(buf, 0, &val) != 0)
> +		return -EINVAL;
> +
> +	if ((val <= AVAILABLE || val > COPY_ERROR)) {
> +		dev_err(&instance->pdev->dev, "application updates invalid "
> +			"firmware crash state\n");
> +		return -EINVAL;
> +	}
> +
> +	instance->fw_crash_state = val;
> +
> +	if ((val == COPIED) || (val == COPY_ERROR)) {
> +		spin_lock_irqsave(&instance->crashdump_lock, flags);
> +		megasas_free_host_crash_buffer(instance);
> +		spin_unlock_irqrestore(&instance->crashdump_lock, flags);
> +		if (val == COPY_ERROR)
> +			dev_info(&instance->pdev->dev, "application failed to "
> +				"copy Firmware crash dump\n");
> +		else
> +			dev_info(&instance->pdev->dev, "Firmware crash dump "
> +				"copied successfully\n");
> +	}
> +	return strlen(buf);
> +}
> +
> +static ssize_t
> +megasas_fw_crash_state_show(struct device *cdev,
> +	struct device_attribute *attr, char *buf)
> +{
> +	struct Scsi_Host *shost = class_to_shost(cdev);
> +	struct megasas_instance *instance =
> +		(struct megasas_instance *) shost->hostdata;
> +	return snprintf(buf, PAGE_SIZE, "%d\n", instance->fw_crash_state);
> +}
> +
> +static ssize_t
> +megasas_page_size_show(struct device *cdev,
> +	struct device_attribute *attr, char *buf)
> +{
> +	return snprintf(buf, PAGE_SIZE, "%ld\n", (unsigned long)PAGE_SIZE - 1);
> +}
> +
> +static DEVICE_ATTR(fw_crash_buffer, S_IRUGO | S_IWUSR,
> +	megasas_fw_crash_buffer_show, megasas_fw_crash_buffer_store);
> +static DEVICE_ATTR(fw_crash_buffer_size, S_IRUGO,
> +	megasas_fw_crash_buffer_size_show, NULL);
> +static DEVICE_ATTR(fw_crash_state, S_IRUGO | S_IWUSR,
> +	megasas_fw_crash_state_show, megasas_fw_crash_state_store);
> +static DEVICE_ATTR(page_size, S_IRUGO,
> +	megasas_page_size_show, NULL);
> +
> +struct device_attribute *megaraid_host_attrs[] = {
> +	&dev_attr_fw_crash_buffer_size,
> +	&dev_attr_fw_crash_buffer,
> +	&dev_attr_fw_crash_state,
> +	&dev_attr_page_size,
> +	NULL,
> +};
> +
>  /*
>   * Scsi host template for megaraid_sas driver
>   */
> @@ -2575,6 +2721,7 @@ static struct scsi_host_template megasas_template = {
>  	.eh_bus_reset_handler = megasas_reset_bus_host,
>  	.eh_host_reset_handler = megasas_reset_bus_host,
>  	.eh_timed_out = megasas_reset_timer,
> +	.shost_attrs = megaraid_host_attrs,
>  	.bios_param = megasas_bios_param,
>  	.use_clustering = ENABLE_CLUSTERING,
>  	.change_queue_depth = megasas_change_queue_depth,
> @@ -3887,6 +4034,59 @@ megasas_get_ctrl_info(struct megasas_instance *instance,
>  	return ret;
>  }
>
> +/*
> + * megasas_set_crash_dump_params - Sends address of crash dump DMA buffer
> + * to firmware
> + *
> + * @instance: Adapter soft state
> + * @crash_buf_state - tell FW to turn ON/OFF crash dump feature
> +			MR_CRASH_BUF_TURN_OFF = 0
> +			MR_CRASH_BUF_TURN_ON = 1
> + * @return 0 on success non-zero on failure.
> + * Issues an internal command (DCMD) to set parameters for crash dump feature.
> + * Driver will send address of crash dump DMA buffer and set mbox to tell FW
> + * that driver supports crash dump feature. This DCMD will be sent only if
> + * crash dump feature is supported by the FW.
> + *
> + */
> +int megasas_set_crash_dump_params(struct megasas_instance *instance,
> +	u8 crash_buf_state)
> +{
> +	int ret = 0;
> +	struct megasas_cmd *cmd;
> +	struct megasas_dcmd_frame *dcmd;
> +
> +	cmd = megasas_get_cmd(instance);
> +
> +	if (!cmd) {
> +		dev_err(&instance->pdev->dev, "Failed to get a free cmd\n");
> +		return -ENOMEM;
> +	}
> +
> +
> +	dcmd = &cmd->frame->dcmd;
> +
> +	memset(dcmd->mbox.b, 0, MFI_MBOX_SIZE);
> +	dcmd->mbox.b[0] = crash_buf_state;
> +	dcmd->cmd = MFI_CMD_DCMD;
> +	dcmd->cmd_status = 0xFF;
> +	dcmd->sge_count = 1;
> +	dcmd->flags = cpu_to_le16(MFI_FRAME_DIR_NONE);
> +	dcmd->timeout = 0;
> +	dcmd->pad_0 = 0;
> +	dcmd->data_xfer_len = cpu_to_le32(CRASH_DMA_BUF_SIZE);
> +	dcmd->opcode = cpu_to_le32(MR_DCMD_CTRL_SET_CRASH_DUMP_PARAMS);
> +	dcmd->sgl.sge32[0].phys_addr = cpu_to_le32(instance->crash_dump_h);
> +	dcmd->sgl.sge32[0].length = cpu_to_le32(CRASH_DMA_BUF_SIZE);
> +
> +	if (!megasas_issue_polled(instance, cmd))
> +		ret = 0;
> +	else
> +		ret = -1;
> +	megasas_return_cmd(instance, cmd);
> +	return ret;
> +}
> +
>  /**
>   * megasas_issue_init_mfi - Initializes the FW
>   * @instance: Adapter soft state
> @@ -4272,6 +4472,27 @@ static int megasas_init_fw(struct megasas_instance *instance)
>  			printk(KERN_WARNING "megaraid_sas: I am VF "
>  				"requestorId %d\n", instance->requestorId);
>  		}
> +
> +		le32_to_cpus((u32 *)&ctrl_info->adapterOperations3);
> +		instance->crash_dump_fw_support =
> +			ctrl_info->adapterOperations3.supportCrashDump;
> +		instance->crash_dump_drv_support =
> +			(instance->crash_dump_fw_support &&
> +			instance->crash_dump_buf);
> +		if (instance->crash_dump_drv_support) {
> +			dev_info(&instance->pdev->dev, "Firmware Crash dump "
> +				"feature is supported\n");
> +			megasas_set_crash_dump_params(instance,
> +				MR_CRASH_BUF_TURN_OFF);
> +
> +		} else {
> +			if (instance->crash_dump_buf)
> +				pci_free_consistent(instance->pdev,
> +					CRASH_DMA_BUF_SIZE,
> +					instance->crash_dump_buf,
> +					instance->crash_dump_h);
> +			instance->crash_dump_buf = NULL;
> +		}
>  	}
>  	instance->max_sectors_per_req = instance->max_num_sge *
>  		PAGE_SIZE / 512;
> @@ -4791,6 +5012,21 @@ static int megasas_probe_one(struct pci_dev *pdev,
>  		break;
>  	}
>
> +	/* Crash dump feature related initialisation*/
> +	instance->drv_buf_index = 0;
> +	instance->drv_buf_alloc = 0;
> +	instance->crash_dump_fw_support = 0;
> +	instance->crash_dump_app_support = 0;
> +	instance->fw_crash_state = UNAVAILABLE;
> +	spin_lock_init(&instance->crashdump_lock);
> +
> +	instance->crash_dump_buf = pci_alloc_consistent(pdev,
> +		CRASH_DMA_BUF_SIZE,
> +		&instance->crash_dump_h);
> +	if (!instance->crash_dump_buf)
> +		dev_err(&instance->pdev->dev, "Can't allocate Firmware "
> +			"crash dump DMA buffer\n");
> +
>  	megasas_poll_wait_aen = 0;
>  	instance->flag_ieee = 0;
>  	instance->ev = NULL;
> @@ -4852,9 +5088,10 @@ static int megasas_probe_one(struct pci_dev *pdev,
>  	if ((instance->pdev->device == PCI_DEVICE_ID_LSI_FUSION) ||
>  		(instance->pdev->device == PCI_DEVICE_ID_LSI_PLASMA) ||
>  		(instance->pdev->device == PCI_DEVICE_ID_LSI_INVADER) ||
> -		(instance->pdev->device == PCI_DEVICE_ID_LSI_FURY))
> +		(instance->pdev->device == PCI_DEVICE_ID_LSI_FURY)) {
>  		INIT_WORK(&instance->work_init, megasas_fusion_ocr_wq);
> -	else
> +		INIT_WORK(&instance->crash_init, megasas_fusion_crash_dump_wq);
> +	} else
>  		INIT_WORK(&instance->work_init, process_fw_state_change_wq);
>
>  	/*
> @@ -5342,6 +5579,8 @@ static void megasas_detach_one(struct pci_dev *pdev)
>  	if (instance->requestorId && !instance->skip_heartbeat_timer_del)
>  		del_timer_sync(&instance->sriov_heartbeat_timer);
>
> +	if (instance->fw_crash_state != UNAVAILABLE)
> +		megasas_free_host_crash_buffer(instance);
>  	scsi_remove_host(instance->host);
>  	megasas_flush_cache(instance);
>  	megasas_shutdown_controller(instance, MR_DCMD_CTRL_SHUTDOWN);
> @@ -5432,6 +5671,10 @@ static void megasas_detach_one(struct pci_dev *pdev)
>  			instance->hb_host_mem,
>  			instance->hb_host_mem_h);
>
> +	if (instance->crash_dump_buf)
> +		pci_free_consistent(pdev, CRASH_DMA_BUF_SIZE,
> +			instance->crash_dump_buf, instance->crash_dump_h);
> +
>  	scsi_host_put(host);
>
>  	pci_disable_device(pdev);
> @@ -5523,6 +5766,45 @@ static unsigned int megasas_mgmt_poll(struct file *file, poll_table *wait)
>  	return mask;
>  }
>
> +/*
> + * megasas_set_crash_dump_params_ioctl:
> + * Send CRASH_DUMP_MODE DCMD to all controllers
> + * @cmd: MFI command frame
> + */
> +
> +static int megasas_set_crash_dump_params_ioctl(
> +	struct megasas_cmd *cmd)
> +{
> +	struct megasas_instance *local_instance;
> +	int i, error = 0;
> +	int crash_support;
> +
> +	crash_support = cmd->frame->dcmd.mbox.w[0];
> +
> +	for (i = 0; i < megasas_mgmt_info.max_index; i++) {
> +		local_instance = megasas_mgmt_info.instance[i];
> +		if (local_instance && local_instance->crash_dump_drv_support) {
> +			if ((local_instance->adprecovery ==
> +				MEGASAS_HBA_OPERATIONAL) &&
> +				!megasas_set_crash_dump_params(local_instance,
> +					crash_support)) {
> +				local_instance->crash_dump_app_support =
> +					crash_support;
> +				dev_info(&local_instance->pdev->dev,
> +					"Application firmware crash "
> +					"dump mode set success\n");
> +				error = 0;
> +			} else {
> +				dev_info(&local_instance->pdev->dev,
> +					"Application firmware crash "
> +					"dump mode set failed\n");
> +				error = -1;
> +			}
> +		}
> +	}
> +	return error;
> +}
> +
>  /**
>   * megasas_mgmt_fw_ioctl - Issues management ioctls to FW
>   * @instance: Adapter soft state
> @@ -5569,6 +5851,12 @@ megasas_mgmt_fw_ioctl(struct megasas_instance *instance,
>  			MFI_FRAME_SGL64 |
>  			MFI_FRAME_SENSE64));
>
> +	if (cmd->frame->dcmd.opcode == MR_DRIVER_SET_APP_CRASHDUMP_MODE) {
> +		error = megasas_set_crash_dump_params_ioctl(cmd);
> +		megasas_return_cmd(instance, cmd);
> +		return error;
> +	}
> +
>  	/*
>  	 * The management interface between applications and the fw uses
>  	 * MFI frames. E.g, RAID configuration changes, LD property changes
> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
> index f30297d..aaba2a7 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
> @@ -91,6 +91,8 @@ void megasas_start_timer(struct megasas_instance *instance,
>  extern struct megasas_mgmt_info megasas_mgmt_info;
>  extern int resetwaittime;
>
> +
> +
>  /**
>   * megasas_enable_intr_fusion - Enables interrupts
>   * @regs: MFI register set
> @@ -2057,7 +2059,7 @@ irqreturn_t megasas_isr_fusion(int irq, void *devp)
>  {
>  	struct megasas_irq_context *irq_context = devp;
>  	struct megasas_instance *instance = irq_context->instance;
> -	u32 mfiStatus, fw_state;
> +	u32 mfiStatus, fw_state, dma_state;
>
>  	if (instance->mask_interrupts)
>  		return IRQ_NONE;
> @@ -2079,7 +2081,16 @@ irqreturn_t megasas_isr_fusion(int irq, void *devp)
>  		/* If we didn't complete any commands, check for FW fault */
>  		fw_state = instance->instancet->read_fw_status_reg(
>  			instance->reg_set) & MFI_STATE_MASK;
> -		if (fw_state == MFI_STATE_FAULT) {
> +		dma_state = instance->instancet->read_fw_status_reg
> +			(instance->reg_set) & MFI_STATE_DMADONE;
> +		if (instance->crash_dump_drv_support &&
> +			instance->crash_dump_app_support) {
> +			/* Start collecting crash, if DMA bit is done */
> +			if ((fw_state == MFI_STATE_FAULT) && dma_state)
> +				schedule_work(&instance->crash_init);
> +			else if (fw_state == MFI_STATE_FAULT)
> +				schedule_work(&instance->work_init);
> +		} else if (fw_state == MFI_STATE_FAULT) {
>  			printk(KERN_WARNING "megaraid_sas: Iop2SysDoorbellInt"
>  				"for scsi%d\n", instance->host->host_no);
>  			schedule_work(&instance->work_init);
> @@ -2232,6 +2243,49 @@ megasas_read_fw_status_reg_fusion(struct megasas_register_set __iomem *regs)
>  }
>
>  /**
> + * megasas_alloc_host_crash_buffer - Host buffers for Crash dump collection from Firmware
> + * @instance: Controller's soft instance
> + * return: Number of allocated host crash buffers
> + */
> +static void
> +megasas_alloc_host_crash_buffer(struct megasas_instance *instance)
> +{
> +	unsigned int i;
> +
> +	instance->crash_buf_pages = get_order(CRASH_DMA_BUF_SIZE);
> +	for (i = 0; i < MAX_CRASH_DUMP_SIZE; i++) {
> +		instance->crash_buf[i] = (void *)__get_free_pages(GFP_KERNEL,
> +			instance->crash_buf_pages);
> +		if (!instance->crash_buf[i]) {
> +			dev_info(&instance->pdev->dev, "Firmware crash dump "
> +				"memory allocation failed at index %d\n", i);
> +			break;
> +		}
> +	}
> +	instance->drv_buf_alloc = i;
> +}
> +
> +/**
> + * megasas_free_host_crash_buffer - Host buffers for Crash dump collection from Firmware
> + * @instance: Controller's soft instance
> + */
> +void
> +megasas_free_host_crash_buffer(struct megasas_instance *instance)
> +{
> +	unsigned int i
> +;
> +	for (i = 0; i < MAX_CRASH_DUMP_SIZE; i++) {

I'm not sure, but shouldn't this be changed to the following?
	for (i = 0; i < instance->drv_buf_alloc; i++) {

> +		if (instance->crash_buf[i])
> +			free_pages((ulong)instance->crash_buf[i],
> +				instance->crash_buf_pages);
> +	}
> +	instance->drv_buf_index = 0;
> +	instance->drv_buf_alloc = 0;
> +	instance->fw_crash_state = UNAVAILABLE;
> +	instance->fw_crash_buffer_size = 0;
> +}
> +
> +/**
>   * megasas_adp_reset_fusion - For controller reset
>   * @regs: MFI register set
>   */
> @@ -2374,6 +2428,7 @@ int megasas_reset_fusion(struct Scsi_Host *shost, int iotimeout)
>  	struct megasas_cmd *cmd_mfi;
>  	union MEGASAS_REQUEST_DESCRIPTOR_UNION *req_desc;
>  	u32 host_diag, abs_state, status_reg, reset_adapter;
> +	u32 io_timeout_in_crash_mode = 0;
>
>  	instance = (struct megasas_instance *)shost->hostdata;
>  	fusion = instance->ctrl_context;
> @@ -2387,6 +2442,42 @@ int megasas_reset_fusion(struct Scsi_Host *shost, int iotimeout)
>  		mutex_unlock(&instance->reset_mutex);
>  		return FAILED;
>  	}
> +	status_reg = instance->instancet->read_fw_status_reg(instance->reg_set);
> +	abs_state = status_reg & MFI_STATE_MASK;
> +
> +	/* IO timeout detected, forcibly put FW in FAULT state */
> +	if (abs_state != MFI_STATE_FAULT && instance->crash_dump_buf &&
> +		instance->crash_dump_app_support && iotimeout) {
> +		dev_info(&instance->pdev->dev, "IO timeout is detected, "
> +			"forcibly FAULT Firmware\n");
> +		instance->adprecovery = MEGASAS_ADPRESET_SM_INFAULT;
> +		status_reg = readl(&instance->reg_set->doorbell);
> +		writel(status_reg | MFI_STATE_FORCE_OCR,
> +			&instance->reg_set->doorbell);
> +		readl(&instance->reg_set->doorbell);
> +		mutex_unlock(&instance->reset_mutex);
> +		do {
> +			ssleep(3);
> +			io_timeout_in_crash_mode++;
> +			dev_dbg(&instance->pdev->dev, "waiting for [%d] "
> +				"seconds for crash dump collection and OCR "
> +				"to be done\n", (io_timeout_in_crash_mode * 3));
> +		} while ((instance->adprecovery != MEGASAS_HBA_OPERATIONAL) &&
> +			(io_timeout_in_crash_mode < 80));
> +
> +		if (instance->adprecovery == MEGASAS_HBA_OPERATIONAL) {
> +			dev_info(&instance->pdev->dev, "OCR done for IO "
> +				"timeout case\n");
> +			retval = SUCCESS;
> +		} else {
> +			dev_info(&instance->pdev->dev, "Controller is not "
> +				"operational after 240 seconds wait for IO "
> +				"timeout case in FW crash dump mode\n do "
> +				"OCR/kill adapter\n");
> +			retval = megasas_reset_fusion(shost, 0);
> +		}
> +		return retval;
> +	}
>
>  	if (instance->requestorId && !instance->skip_heartbeat_timer_del)
>  		del_timer_sync(&instance->sriov_heartbeat_timer);
> @@ -2653,6 +2744,15 @@ int megasas_reset_fusion(struct Scsi_Host *shost, int iotimeout)
>  			printk(KERN_WARNING "megaraid_sas: Reset "
>  				"successful for scsi%d.\n",
>  				instance->host->host_no);
> +
> +			if (instance->crash_dump_drv_support) {
> +				if (instance->crash_dump_app_support)
> +					megasas_set_crash_dump_params(instance,
> +						MR_CRASH_BUF_TURN_ON);
> +				else
> +					megasas_set_crash_dump_params(instance,
> +						MR_CRASH_BUF_TURN_OFF);
> +			}
>  			retval = SUCCESS;
>  			goto out;
>  		}
> @@ -2681,6 +2781,74 @@ out:
>  	return retval;
>  }
>
> +/* Fusion Crash dump collection work queue */
> +void megasas_fusion_crash_dump_wq(struct work_struct *work)
> +{
> +	struct megasas_instance *instance =
> +		container_of(work, struct megasas_instance, crash_init);
> +	u32 status_reg;
> +	u8 partial_copy = 0;
> +
> +
> +	status_reg = instance->instancet->read_fw_status_reg(instance->reg_set);
> +
> +	/*
> +	 * Allocate host crash buffers to copy data from 1 MB DMA crash buffer
> +	 * to host crash buffers
> +	 */
> +	if (instance->drv_buf_index == 0) {
> +		/* Buffer is already allocated for old Crash dump.
> +		 * Do OCR and do not wait for crash dump collection
> +		 */
> +		if (instance->drv_buf_alloc) {
> +			dev_info(&instance->pdev->dev, "earlier crash dump is "
> +				"not yet copied by application, ignoring this "
> +				"crash dump and initiating OCR\n");
> +			status_reg |= MFI_STATE_CRASH_DUMP_DONE;
> +			writel(status_reg,
> +				&instance->reg_set->outbound_scratch_pad);
> +			readl(&instance->reg_set->outbound_scratch_pad);
> +			return;
> +		}
> +		megasas_alloc_host_crash_buffer(instance);
> +		dev_info(&instance->pdev->dev, "Number of host crash buffers "
> +			"allocated: %d\n", instance->drv_buf_alloc);
> +	}
> +
> +	/*
> +	 * Driver has allocated max buffers, which can be allocated
> +	 * and FW has more crash dump data, then driver will
> +	 * ignore the data.
> +	 */
> +	if (instance->drv_buf_index >= (instance->drv_buf_alloc)) {
> +		dev_info(&instance->pdev->dev, "Driver is done copying "
> +			"the buffer: %d\n", instance->drv_buf_alloc);
> +		status_reg |= MFI_STATE_CRASH_DUMP_DONE;
> +		partial_copy = 1;
> +	} else {
> +		memcpy(instance->crash_buf[instance->drv_buf_index],
> +			instance->crash_dump_buf, CRASH_DMA_BUF_SIZE);
> +		instance->drv_buf_index++;
> +		status_reg &= ~MFI_STATE_DMADONE;
> +	}
> +
> +	if (status_reg & MFI_STATE_CRASH_DUMP_DONE) {
> +		dev_info(&instance->pdev->dev, "Crash Dump is available,number "
> +			"of copied buffers: %d\n", instance->drv_buf_index);
> +		instance->fw_crash_buffer_size = instance->drv_buf_index;
> +		instance->fw_crash_state = AVAILABLE;
> +		instance->drv_buf_index = 0;
> +		writel(status_reg, &instance->reg_set->outbound_scratch_pad);
> +		readl(&instance->reg_set->outbound_scratch_pad);
> +		if (!partial_copy)
> +			megasas_reset_fusion(instance->host, 0);
> +	} else {
> +		writel(status_reg, &instance->reg_set->outbound_scratch_pad);
> +		readl(&instance->reg_set->outbound_scratch_pad);
> +	}
> +}
> +
> +
>  /* Fusion OCR work queue */
>  void megasas_fusion_ocr_wq(struct work_struct *work)
>  {