From: Hannes Reinecke <hare@suse.de>
To: "Desai, Kashyap" <Kashyap.Desai@lsi.com>,
	James Bottomley <jbottomley@parallels.com>
Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	Adam Radford <aradford@gmail.com>,
	"Saxena, Sumit" <Sumit.Saxena@lsi.com>
Subject: Re: [PATCH 1/6] megaraid_sas: Do not wait forever
Date: Fri, 24 Jan 2014 09:24:00 +0100
Message-ID: <52E22320.4060603@suse.de>
In-Reply-To: <ed5d620b461a4d918b00ab18553618dd@BN1PR07MB247.namprd07.prod.outlook.com>

On 01/24/2014 08:46 AM, Desai, Kashyap wrote:
> Hannes:
> 
> We have already worked on "wait_event" usage in "megasas_issue_blocked_cmd".
> That code will be posted by LSI once we receive test results from the
> LSI Q/A team.
> 
> If you see the current OCR code in the Linux driver, we "re-send the IOCTL command".
> The MR product does not want IOCTLs to time out, for certain reasons. That is why, even if
> the FW has faulted, the driver will do OCR and re-send all existing <Management commands>
> (IOCTLs come under management commands).
> 
> Just for info. (see below snippet in  OCR code)
> 
> /* Re-fire management commands */
>                         for (j = 0 ; j < instance->max_fw_cmds; j++) {
>                                 cmd_fusion = fusion->cmd_list[j];
>                                 if (cmd_fusion->sync_cmd_idx != (u32)ULONG_MAX) {
>                                         cmd_mfi = instance->cmd_list[cmd_fusion->sync_cmd_idx];
>                                         if (cmd_mfi->frame->dcmd.opcode == MR_DCMD_LD_MAP_GET_INFO) {
>                                                 megasas_return_cmd(instance, cmd_mfi);
>                                                 megasas_return_cmd_fusion(instance, cmd_fusion);
> 
> 
> 
> The current <MR> driver is not designed to add a <timeout> for the DCMD and IOCTL paths.
> [ I added timeouts only for a limited set of DCMDs, which are harmless to continue after a timeout ]
> 
> As of now, you can skip this patch; we will be submitting a patch to fix a similar issue.
> But note, we cannot add a complete "wait_event_timeout" due to the day-1 design, but we will
> try to cover wait_event_timeout for some valid cases.
> 
Ouch.

The reason I sent this patch is that I've got an Intel box here,
which blocks megaraid_sas initialisation when the IOMMU is turned on:

[   21.867264] megasas: io_request_frames ffff880800f50000
[   21.867363] megasas: init frame 00000000fff57000
[   22.223234] megasas: frame status 00
[   22.223235] megasas: IOC Init cmd success
[   22.223282] megasas: ld map ffff88080b600000
[   22.223289] megasas: issue dcmd 05 opcode 300e101
[   22.244184] dmar: DRHD: handling fault status reg 2
[   22.244186] dmar: DMAR:[DMA Read] Request device [06:00.0] fault addr 6980000
[   22.244186] DMAR:[fault reason 06] PTE Read access is not set
[   22.247223] megasas: frame status 00
[   22.247231] megasas: issue dcmd 05 opcode 300e101
[   22.247231] megasas: INIT adapter done
[   22.247237] megasas: pd list ffff88080cfd0000 size 8192
[   22.247237] megasas: issue dcmd 05 opcode 2010100
[   22.253516] dmar: DRHD: handling fault status reg 102
[   22.253518] dmar: DMAR:[DMA Write] Request device [06:00.0] fault addr e3f0000
[   22.253518] DMAR:[fault reason 05] PTE Write access is not set
[   22.253521] dmar: DMAR:[DMA Write] Request device [06:00.0] fault addr e3f0000
[   22.253521] DMAR:[fault reason 05] PTE Write access is not set
[   22.253523] dmar: DMAR:[DMA Write] Request device [06:00.0] fault addr e3f0000

[ Some more DMAR messages snipped ]

[   22.273199] dmar: DRHD: handling fault status reg 2
[   22.273201] dmar: DMAR:[DMA Read] Request device [06:00.0] fault addr 6cef000
[   22.273201] DMAR:[fault reason 06] PTE Read access is not set

[ .. ]

[   94.222456] megasas: frame status ff
[   94.240946] megasas: failed to get PD list

(I've inserted some debugging messages :-)

This is really weird. The 'write' faults do correspond with the
number of (megaraid) commands reserved at the initial step.
(This is a 'Fury' card, btw).
What is more puzzling is that the INIT command and the initial
LD List command go through, but the PD List command gets blocked.

Incidentally, this is not consistent; occasionally even the LD List
command gets blocked, and the DMAR messages occur earlier.

Anyway. Point is, if we cannot time out these initial commands
the megaraid_sas driver will be stuck during initialisation (as the
loop _never_ terminates).
Which in turn means that the modprobe command hangs indefinitely,
and you cannot even unload the module.
The only way to recover here is a reboot.
Nasty.

Hence the patch for the timeout; when this triggers the HBA is
pretty much hosed anyway, so the state of the firmware is pretty
much irrelevant here. But at least you can continue to boot.
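
To illustrate the idea (this is not the patch itself, and not driver
code): a minimal userspace C sketch of a bounded wait. The names
fake_cmd, STATUS_PENDING and wait_for_cmd are made up for this example,
and using 0xff as the "not yet completed" marker is only an assumption
loosely based on the 'frame status ff' line in the log above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical "command not completed yet" marker (assumption). */
#define STATUS_PENDING 0xFF

/* Hypothetical stand-in for a command whose status the firmware updates. */
struct fake_cmd {
	volatile uint8_t status;
};

/* Monotonic clock in milliseconds, so wall-clock jumps cannot extend the wait. */
static int64_t now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Poll the command status until it changes or the deadline passes.
 * Returns 0 if the command completed, -ETIMEDOUT otherwise.
 */
static int wait_for_cmd(struct fake_cmd *cmd, int timeout_ms)
{
	int64_t deadline;

	assert(timeout_ms >= 0);
	deadline = now_ms() + timeout_ms;

	while (cmd->status == STATUS_PENDING) {
		struct timespec nap = { 0, 1000000 };	/* 1 ms */

		if (now_ms() >= deadline)
			return -ETIMEDOUT;	/* give up instead of looping forever */
		nanosleep(&nap, NULL);
	}
	return 0;
}
```

The caller can then fail initialisation cleanly on -ETIMEDOUT rather
than blocking modprobe indefinitely.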

(And OCR doesn't work at this point, either. But that's a different
story).

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

Thread overview: 15+ messages
2014-01-16 10:25 [PATCH 0/6] megaraid_sas: Fix system stall with iommu enabled Hannes Reinecke
2014-01-16 10:25 ` [PATCH 1/6] megaraid_sas: Do not wait forever Hannes Reinecke
2014-01-24  7:46   ` Desai, Kashyap
2014-01-24  8:24     ` Hannes Reinecke [this message]
2014-01-24  8:34       ` Desai, Kashyap
2014-01-24 10:04         ` Hannes Reinecke
2014-01-16 10:25 ` [PATCH 2/6] megaraid_sas_fusion: Fixup fire_cmd syntax Hannes Reinecke
2014-01-16 10:25 ` [PATCH 3/6] megaraid_sas_fusion: correctly pass queue info pointer Hannes Reinecke
2014-01-24  8:41   ` Desai, Kashyap
2014-01-16 10:25 ` [PATCH 4/6] megaraid_sas: catch errors from megasas_get_map_info() Hannes Reinecke
2014-01-24  8:35   ` Desai, Kashyap
2014-01-16 10:25 ` [PATCH 5/6] megaraid_sas_fusion: Return correct error value in megasas_get_ld_map_info() Hannes Reinecke
2014-01-24  8:45   ` Desai, Kashyap
2014-01-16 10:25 ` [PATCH 6/6] megaraid_sas: check return value for megasas_get_pd_list() Hannes Reinecke
2014-01-24  8:38   ` Desai, Kashyap
