From: Hannes Reinecke <hare@suse.de>
To: "Desai, Kashyap" <Kashyap.Desai@lsi.com>,
James Bottomley <jbottomley@parallels.com>
Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
Adam Radford <aradford@gmail.com>,
"Saxena, Sumit" <Sumit.Saxena@lsi.com>
Subject: Re: [PATCH 1/6] megaraid_sas: Do not wait forever
Date: Fri, 24 Jan 2014 11:04:15 +0100
Message-ID: <52E23A9F.4000904@suse.de>
In-Reply-To: <d02c587fc2dd4e83b6f87d957c2344e7@BN1PR07MB247.namprd07.prod.outlook.com>
On 01/24/2014 09:34 AM, Desai, Kashyap wrote:
>
>
>> -----Original Message-----
>> From: Hannes Reinecke [mailto:hare@suse.de]
>> Sent: Friday, January 24, 2014 1:54 PM
>> To: Desai, Kashyap; James Bottomley
>> Cc: linux-scsi@vger.kernel.org; Adam Radford; Saxena, Sumit
>> Subject: Re: [PATCH 1/6] megaraid_sas: Do not wait forever
>>
>> On 01/24/2014 08:46 AM, Desai, Kashyap wrote:
>>> Hannes:
>>>
>>> We have already worked on "wait_event" usage in "megasas_issue_blocked_cmd".
>>> That code will be posted by LSI once we received test result from LSI Q/A team.
>>>
>>> If you see the current OCR code in Linux Driver we do "re-send the IOCTL command".
>>> MR product does not want IOCTL timeout due to some reason. That is why
>>> even if FW faulted, Driver will do OCR and re-send all existing <Management commands>
>>> (IOCTL comes under management commands).
>>>
>>> Just for info. (see below snippet in OCR code)
>>>
>>> 	/* Re-fire management commands */
>>> 	for (j = 0 ; j < instance->max_fw_cmds; j++) {
>>> 		cmd_fusion = fusion->cmd_list[j];
>>> 		if (cmd_fusion->sync_cmd_idx != (u32)ULONG_MAX) {
>>> 			cmd_mfi = instance->cmd_list[cmd_fusion->sync_cmd_idx];
>>> 			if (cmd_mfi->frame->dcmd.opcode == MR_DCMD_LD_MAP_GET_INFO) {
>>> 				megasas_return_cmd(instance, cmd_mfi);
>>> 				megasas_return_cmd_fusion(instance, cmd_fusion);
>>>
>>>
>>>
>>> Current <MR> Driver is not designed to add <timeout> for DCMD and IOCTL path.
>>> [ I added timeout only for limited DCMDs, which are harmless to continue after timeout ]
>>>
>>> As of now, you can skip this patch and we will be submitting patch to fix similar issue.
>>> But note, we cannot add complete "wait_event_timeout" due to day-1
>>> design, but will try to cover wait_event_timeout for some valid cases.
>>>
>> Ouch.
>>
>> The reason I sent this patch is that I've got an Intel box here, which blocks
>> megaraid_sas initialisation when the IOMMU is turned on:
>>
>> [ 21.867264] megasas: io_request_frames ffff880800f50000
>> [ 21.867363] megasas: init frame 00000000fff57000
>> [ 22.223234] megasas: frame status 00
>> [ 22.223235] megasas: IOC Init cmd success
>> [ 22.223282] megasas: ld map ffff88080b600000
>> [ 22.223289] megasas: issue dcmd 05 opcode 300e101
>> [ 22.244184] dmar: DRHD: handling fault status reg 2
>> [ 22.244186] dmar: DMAR:[DMA Read] Request device [06:00.0] fault addr 6980000
>> [ 22.244186] DMAR:[fault reason 06] PTE Read access is not set
>> [ 22.247223] megasas: frame status 00
>> [ 22.247231] megasas: issue dcmd 05 opcode 300e101
>> [ 22.247231] megasas: INIT adapter done
>> [ 22.247237] megasas: pd list ffff88080cfd0000 size 8192
>> [ 22.247237] megasas: issue dcmd 05 opcode 2010100
>> [ 22.253516] dmar: DRHD: handling fault status reg 102
>> [ 22.253518] dmar: DMAR:[DMA Write] Request device [06:00.0] fault addr e3f0000
>> [ 22.253518] DMAR:[fault reason 05] PTE Write access is not set
>> [ 22.253521] dmar: DMAR:[DMA Write] Request device [06:00.0] fault addr e3f0000
>> [ 22.253521] DMAR:[fault reason 05] PTE Write access is not set
>> [ 22.253523] dmar: DMAR:[DMA Write] Request device [06:00.0] fault addr e3f0000
>>
>> [ Some more DMAR messages snipped ]
>>
>> [ 22.273199] dmar: DRHD: handling fault status reg 2
>> [ 22.273201] dmar: DMAR:[DMA Read] Request device [06:00.0] fault addr 6cef000
>> [ 22.273201] DMAR:[fault reason 06] PTE Read access is not set
>>
>> [ .. ]
>>
>> [ 94.222456] megasas: frame status ff
>> [ 94.240946] megasas: failed to get PD list
>>
>> (I've inserted some debugging messages :-)
>>
>> This is really weird. The 'write' faults do correspond with the number of
>> (megaraid) commands, reserved at the initial step.
>> (This is a 'Fury' card, btw).
>
> Fury card has iMR FW and we have seen issue with iMR FW if IOMMU is ON, but not like driver load failure.
> Is your OS driver behind Fury ? What is a Raid type used on your setup ?
>
It's SLES12 (alpha), basically plain 3.13
> Which system you are using ?
>
Pre-production hardware, admittedly. So it _might_ be a BIOS issue.
>> What is more puzzling is that the INIT command and the initial LD List
>> command goes through, but the PD List command gets blocked.
>>
>> Incidentally, this is not consistent; occasionally even the LD List command
>> gets blocked, and the DMAR messages occur earlier.
>
> LD command use megasas_issue_polled which is already timeout based mechanism.
> Below are list of DCMD command which use infinite timeout.
>
> megasas_get_seq_num
> megasas_flush_cache
> megasas_shutdown_controller
> megasas_mgmt_fw_ioctl
>
>
> We can convert all DCMD except IOCTL with timeout value. For you " megasas_get_seq_num"
> might be hang in FW. It cannot be " megasas_get_ld_list".
>
Ahh. Okay, will try to modify megasas_get_seq_num() and see if that
works.
Cheers,
Hannes
P.S.: I've also sent an earlier patch named 'megaraid_sas: disable
controller reset for PPC' to linux-scsi. Care to review it, too?
Thanks.
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)