linux-scsi.vger.kernel.org archive mirror
* Re: [Bugme-new] [Bug 11045] New: Bug in MPT Fusion 2.6.26-rc7 unbootable
       [not found] <bug-11045-10286@http.bugzilla.kernel.org/>
@ 2008-07-06 19:34 ` Andrew Morton
       [not found]   ` <C5679C710E19AF4C8D9C02FF5C72E3C133C9C76A@cosmail01.lsi.com>
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2008-07-06 19:34 UTC (permalink / raw)
  To: linux-scsi, linux-acpi; +Cc: bugme-daemon, Moore, Eric Dean, support


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Sun,  6 Jul 2008 11:22:08 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=11045
> 
>            Summary: Bug in MPT Fusion 2.6.26-rc7 unbootable
>            Product: Drivers
>            Version: 2.5
>      KernelVersion: 2.6.26-rc7
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>         AssignedTo: drivers_other@kernel-bugs.osdl.org
>         ReportedBy: kurk@shiftmail.org
> 
> 
> Latest working kernel version: 2.6.25
> Earliest failing kernel version: 2.6.26-rc7
> Distribution: Debian (but vanilla kernel)
> Hardware Environment: IBM xSeries 335
> Software Environment: error and hangup at boot
> Problem Description: MPT Fusion error, unbootable, see below
> Steps to reproduce: see below

We have two bugs here.  One in mpt-fusion and what I suspect is a
post-2.6.25 regression in ACPI.


> Detailed description:
> 
> Hi all,
> I'm no kernel expert; I hope I made no mistakes in this report. It seems to me
> that a bug was introduced into the MPT Fusion driver in 2.6.26 (rc7).
> 
> I compiled 2.6.26-rc7 on a machine with an LSI53C1030 controller and it cannot
> boot. A 2.6.25 kernel built with basically the same config file boots without
> problems.
> 
> I tried to forward-port the Fusion driver from 2.6.25 to 2.6.26-rc7 by simply
> copying the drivers/message/fusion/ directory from 2.6.25 into 2.6.26-rc7,
> but unfortunately that doesn't compile, so I am unable to use 2.6.26 on this
> machine. (I have not tried 2.6.26 versions earlier than rc7; I don't have much
> time right now.)
> 
> I connected a serial cable in order to obtain the boot error message. I
> obtained two of those on different boots. I will paste these at the end of this
> post.
> 
> 
> This is the verbose lspci of the controller (obtained with 2.6.25):
> ----------------------------------------
> 01:01.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
>         Subsystem: IBM Unknown device 026d
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
>         Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 72 (4250ns min, 4500ns max), Cache Line Size: 32 bytes
>         Interrupt: pin A routed to IRQ 22
>         Region 0: I/O ports at 2300 [size=256]
>         Region 1: Memory at fbff0000 (64-bit, non-prefetchable) [size=64K]
>         Region 3: Memory at fbfe0000 (64-bit, non-prefetchable) [size=64K]
>         Capabilities: [50] Power Management version 2
>                 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
>                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
>                 Address: 0000000000000000  Data: 0000
>         Capabilities: [68] PCI-X non-bridge device
>                 Command: DPERE- ERO- RBC=512 OST=1
>                 Status: Dev=01:01.0 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=2048 DMOST=8 DMCRS=16 RSCEM- 266MHz- 533MHz-
>         Kernel driver in use: mptspi
>         Kernel modules: mptspi
> ----------------------------------------
> 
> 
> This is an excerpt of the dmesg on 2.6.25 where the controller WORKS:
> --------------------------------------------------------------------
> Fusion MPT base driver 3.04.06
> Copyright (c) 1999-2007 LSI Corporation
> Fusion MPT SPI Host driver 3.04.06
> ...
> mptbase: ioc0: Initiating bringup
> ...
> ioc0: LSI53C1030 B2: Capabilities={Initiator}
> Probing IDE interface ide1...
> hdc: LG CD-ROM CRN-8245B, ATAPI CD/DVD-ROM drive
> scsi0 : ioc0: LSI53C1030 B2, FwRev=01000e00h, Ports=1, MaxQ=222, IRQ=22
> ...
> scsi0 : ioc0: LSI53C1030 B2, FwRev=01000e00h, Ports=1, MaxQ=222, IRQ=22
> hdc: host max PIO4 wanted PIO255(auto-tune) selected PIO4
> hdc: UDMA/33 mode selected
> ide1 at 0x170-0x177,0x376 on irq 15
> tg3.c:v3.90 (April 12, 2008)
> ACPI: PCI Interrupt 0000:02:01.0[A] -> GSI 24 (level, low) -> IRQ 24
> scsi 0:0:0:0: Direct-Access     IBM-ESXS DTN018C1UCDY10F  S23J PQ: 0 ANSI: 3
>  target0:0:0: Beginning Domain Validation
>  target0:0:0: Ending Domain Validation
>  target0:0:0: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 127)
> scsi 0:0:1:0: Direct-Access     IBM-ESXS DTN018C1UCDY10F  S23J PQ: 0 ANSI: 3
>  target0:0:1: Beginning Domain Validation
> ...
> ACPI: PCI Interrupt 0000:02:02.0[A] -> GSI 25 (level, low) -> IRQ 25
>  target0:0:1: Ending Domain Validation
>  target0:0:1: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 127)
> ...
> hdc: ATAPI 24X CD-ROM drive, 128kB Cache
> Uniform CD-ROM driver Revision: 3.20
> scsi 0:0:8:0: Processor         IBM      25P3495a S320  1 1    PQ: 0 ANSI: 2
>  target0:0:8: Beginning Domain Validation
>  target0:0:8: Ending Domain Validation
>  target0:0:8: asynchronous
> Driver 'sd' needs updating - please use bus_type methods
> sd 0:0:0:0: [sda] 35548320 512-byte hardware sectors (18201 MB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: cb 00 00 08
> sd 0:0:0:0: Attached scsi generic sg0 type 0
> scsi 0:0:1:0: Attached scsi generic sg1 type 0
> scsi 0:0:8:0: Attached scsi generic sg2 type 3
> --------------------------------------------------------------------
> 
> 
> This is a 32-bit x86 PC build. Here is the excerpt of the .config file obtained
> by grepping for FUSION:
> ------------------------------------
> CONFIG_FUSION=y
> CONFIG_FUSION_SPI=m
> CONFIG_FUSION_FC=m
> CONFIG_FUSION_SAS=m
> CONFIG_FUSION_MAX_SGE=40
> CONFIG_FUSION_CTL=m
> CONFIG_FUSION_LAN=m
> # CONFIG_FUSION_LOGGING is not set
> ------------------------------------
> 
> 
> 
> This is the boot error message captured over the serial cable. I left it running
> for 8 minutes; it loops, so the message never ends.
> --------------------------------------------------------------------
> 
> ACPI: Resource is not an IRQ entry
> ACPI: Resource is not an IRQ entry
> ACPI: Resource is not an IRQ entry
> ACPI: Resource is not an IRQ entry
> ACPI: Resource is not an IRQ entry
> ACPI: Resource is not an IRQ entry
> ACPI: Resource is not an IRQ entry
> ACPI: Resource is not an IRQ entry
> ACPI: Resource is not an IRQ entry
> ACPI: Resource is not an IRQ entry
> ACPI: Resource is not an IRQ entry
> ACPI: Resource is not an IRQ entry
> ACPI: Resource is not an IRQ entry
> ACPI: Resource is not an IRQ entry
> ACPI: Resource is not an IRQ entry
> ACPI: Resource is not an IRQ entry
> ACPI: Resource is not an IRQ entry
> ACPI: Resource is not an IRQ entry
> ACPI: Resource is not an IRQ entry
> ACPI: Resource is not an IRQ entry
> ACPI: Resource is not an IRQ entry
> ACPI: Resource is not an IRQ entry
> ACPI: Resource is not an IRQ entry
> ACPI: Resource is not an IRQ entry
> ACPI: Resource is not an IRQ entry
> ACPI: Resource is not an IRQ entry
> ACPI: Resource is not an IRQ entry

That's the ACPI problem.

> mptbase: ioc0: ERROR - Doorbell ACK timeout (count=4999), IntStatus=80000009!
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000034c
> IP: [<f885cc5e>] :mptspi:mptspi_dv_renegotiate_work+0xa/0x9f
> Oops: 0000 [#1] SMP
> Modules linked in: ide_pci_generic(+) floppy mptspi(+) mptscsih ohci_hcd tg3 mptbase scsi_transport_spi usbcore serverworks ide_core ata_generic libata scsi_mod dock thermal processor fan thermal_sys
> 
> Pid: 9, comm: events/0 Not tainted (2.6.26-rc7 #1)
> EIP: 0060:[<f885cc5e>] EFLAGS: 00010282 CPU: 0
> EIP is at mptspi_dv_renegotiate_work+0xa/0x9f [mptspi]
> EAX: f7a447c0 EBX: f7429900 ECX: f7a447c4 EDX: c1908988
> ESI: f7a447c0 EDI: 0000034c EBP: f7429904 ESP: f7477f80
>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process events/0 (pid: 9, ti=f7476000 task=f744d770 task.ti=f7476000)
> Stack: f744d8e0 c190b260 00000000 c1908984 f7429900 f7a447c0 f885cc54 f7429904
>        c012f253 f7429900 c012f934 f742990c 00000000 c012f9e8 00000000 f744d770
>        c0131bdc f7477fc4 f7477fc4 f7429900 c012f934 00000000 c0131b1b c0131ae3
> Call Trace:
>  [<f885cc54>] mptspi_dv_renegotiate_work+0x0/0x9f [mptspi]
>  [<c012f253>] run_workqueue+0x75/0xf6
>  [<c012f934>] worker_thread+0x0/0xbf
>  [<c012f9e8>] worker_thread+0xb4/0xbf
>  [<c0131bdc>] autoremove_wake_function+0x0/0x2b
>  [<c012f934>] worker_thread+0x0/0xbf
>  [<c0131b1b>] kthread+0x38/0x5d
>  [<c0131ae3>] kthread+0x0/0x5d
>  [<c0104573>] kernel_thread_helper+0x7/0x10
>  =======================
> Code: 70 e8 9e f8 ff ff 8b 47 70 e8 44 b7 fe ff 8b 47 70 5a 5b 5e 5f 5d e9 89 f8 ff ff 58 5b 5e 5f 5d c3 55 57 56 53 83 ec 10 8b 78 10 <8b> 2f e8 c7 98 90 c7 66 83 bf 96 02 00 00 00 8b 85 3c 01 00 00
> EIP: [<f885cc5e>] mptspi_dv_renegotiate_work+0xa/0x9f [mptspi] SS:ESP 0068:f7477f80
> ---[ end trace e311270f757682e4 ]---

mpt-fusion shouldn't oops, no matter what ACPI did to it.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Bugme-new] [Bug 11045] New: Bug in MPT Fusion 2.6.26-rc7 unbootable
       [not found]   ` <C5679C710E19AF4C8D9C02FF5C72E3C133C9C76A@cosmail01.lsi.com>
@ 2008-07-08  8:57     ` Andrew Morton
  2008-07-08 14:08       ` James Bottomley
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2008-07-08  8:57 UTC (permalink / raw)
  Cc: linux-scsi, linux-acpi, bugme-daemon, Moore, Eric Dean, support,
	kurk


You removed everyone from cc.  Please don't do that - there's not much
point in asking me to do things - this bug is reported by
kurk@shiftmail.org.

I don't know what "we do not assist with compiling drivers" can possibly
mean.  Eric, can you please help here?


On Mon, 7 Jul 2008 07:28:00 -0600 "Support, Software" <support@lsi.com> wrote:

> Unfortunately, we do not assist with compiling drivers.
> 
> I would recommend updating the firmware and BIOS on the controllers you are using, so that the compiled driver can communicate with the controller better.
> 
> In order to point you to the correct package for the controller that is not taking the compiled driver, I will need you to send me all of the numbers from the front and back of the controller.
> 
> -----Original Message-----
> From: Andrew Morton [mailto:akpm@linux-foundation.org]
> Sent: Sunday, July 06, 2008 3:34 PM
> To: linux-scsi@vger.kernel.org; linux-acpi@vger.kernel.org
> Cc: bugme-daemon@bugzilla.kernel.org; Moore, Eric; Support, Software
> Subject: Re: [Bugme-new] [Bug 11045] New: Bug in MPT Fusion 2.6.26-rc7 unbootable


* Re: [Bugme-new] [Bug 11045] New: Bug in MPT Fusion 2.6.26-rc7 unbootable
  2008-07-08  8:57     ` Andrew Morton
@ 2008-07-08 14:08       ` James Bottomley
  2008-07-08 16:51         ` Bjorn Helgaas
  0 siblings, 1 reply; 16+ messages in thread
From: James Bottomley @ 2008-07-08 14:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Support, Software, linux-scsi, linux-acpi, bugme-daemon,
	Moore, Eric Dean, kurk

On Tue, 2008-07-08 at 01:57 -0700, Andrew Morton wrote:
> You removed everyone from cc.  Please don't do that - there's not much
> point in asking me to do things - this bug is reported by
> kurk@shiftmail.org.
> 
> I don't know what "we do not assist with compiling drivers" can possibly
> mean.  Eric, can you please help here?

heh, well, cc'ing a support line on a technical bug report isn't
necessarily conducive to producing useful results ... what we're
discussing is probably already at level 4 or 5 (the real engineering
problems).  Support calls go in at levels 1-3 (as in consult manual and
spit out canned response before triaging for escalation).

That said, this line:

mptbase: ioc0: ERROR - Doorbell ACK timeout (count=4999), IntStatus=80000009!

is absolutely characteristic of a lost interrupt.

With the current LSI driver, we have two possible causes for this.  One
is the usual ACPI screw up that we never seem to be able to fix.  The
other is that the driver recently enabled MSI (commit
23a274c8a5adafc74a66f16988776fc7dd6f6e51 in v2.6.26-rc1).  For the
former, just follow the usual ACPI screw up recipe.  For the latter, you
should see this message in the boot up:

mptbase: ioc0: PCI-MSI enabled

MSI can be turned off again by using the module parameter
mpt_msi_enable=0.

Unfortunately, the true fix is to find out if the motherboard really has
a global MSI problem (and I know MSI works with the LSI because I have a
1030 in an ia64 system here working just fine) and add it to the PCI
quirks file as unable to use MSI.

James




* Re: [Bugme-new] [Bug 11045] New: Bug in MPT Fusion 2.6.26-rc7 unbootable
  2008-07-08 14:08       ` James Bottomley
@ 2008-07-08 16:51         ` Bjorn Helgaas
  2008-07-08 17:23           ` James Bottomley
  0 siblings, 1 reply; 16+ messages in thread
From: Bjorn Helgaas @ 2008-07-08 16:51 UTC (permalink / raw)
  To: James Bottomley
  Cc: Andrew Morton, Support, Software, linux-scsi, linux-acpi,
	bugme-daemon, Moore, Eric Dean, kurk

On Tuesday 08 July 2008 08:08:46 am James Bottomley wrote:
> That said, this line:
> 
> mptbase: ioc0: ERROR - Doorbell ACK timeout (count=4999), IntStatus=80000009!
> 
> is absolutely characteristic of a lost interrupt.
> 
> With the current LSI driver, we have two possible causes for this.  One
> is the usual ACPI screw up that we never seem to be able to fix.

Which ACPI screw up is that?  And what's the usual recipe?

I know about the ancient "pci=routeirq" recipe, but as far as I know,
there are no current problems that require that.

> The 
> other is that the driver recently enabled MSI (commit
> 23a274c8a5adafc74a66f16988776fc7dd6f6e51 in v2.6.26-rc1).  For the
> former, just follow the usual ACPI screw up recipe.  For the latter, you
> should see this message in the boot up:
> 
> mptbase: ioc0: PCI-MSI enabled
> 
> MSI can be turned off again by using the module parameter
> mpt_msi_enable=0.
> 
> Unfortunately, the true fix is to find out if the motherboard really has
> a global MSI problem (and I know MSI works with the LSI because I have a
> 1030 in an ia64 system here working just fine) and add it to the PCI
> quirks file as unable to use MSI.
> 
> James


* Re: [Bugme-new] [Bug 11045] New: Bug in MPT Fusion 2.6.26-rc7 unbootable
  2008-07-08 16:51         ` Bjorn Helgaas
@ 2008-07-08 17:23           ` James Bottomley
  2008-07-08 20:56             ` Bjorn Helgaas
  0 siblings, 1 reply; 16+ messages in thread
From: James Bottomley @ 2008-07-08 17:23 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Andrew Morton, Support, Software, linux-scsi, linux-acpi,
	bugme-daemon, Moore, Eric Dean, kurk

On Tue, 2008-07-08 at 10:51 -0600, Bjorn Helgaas wrote:
> On Tuesday 08 July 2008 08:08:46 am James Bottomley wrote:
> > That said, this line:
> > 
> > mptbase: ioc0: ERROR - Doorbell ACK timeout (count=4999), IntStatus=80000009!
> > 
> > is absolutely characteristic of a lost interrupt.
> > 
> > With the current LSI driver, we have two possible causes for this.  One
> > is the usual ACPI screw up that we never seem to be able to fix.
> 
> Which ACPI screw up is that?  And what's the usual recipe?

The usual screw up where subtle ACPI breakage from release to release
causes some IRQs to get misrouted.

Usually you start with noacpi and cycle through the pci routing options

> I know about the ancient "pci=routeirq" recipe, but as far as I know,
> there are no current problems that require that.

If you actually read this bug report, you'll see there was a message

ACPI: Resource is not an IRQ entry

Just before the fusion IRQ failed to get delivered, so I think it's a
good indicator that there *are* ACPI problems ...

James




* Re: [Bugme-new] [Bug 11045] New: Bug in MPT Fusion 2.6.26-rc7 unbootable
  2008-07-08 17:23           ` James Bottomley
@ 2008-07-08 20:56             ` Bjorn Helgaas
  2008-07-08 21:47               ` Andrew Morton
  0 siblings, 1 reply; 16+ messages in thread
From: Bjorn Helgaas @ 2008-07-08 20:56 UTC (permalink / raw)
  To: James Bottomley
  Cc: Andrew Morton, Support, Software, linux-scsi, linux-acpi,
	bugme-daemon, Moore, Eric Dean, kurk

On Tuesday 08 July 2008 11:23:33 am James Bottomley wrote:
> On Tue, 2008-07-08 at 10:51 -0600, Bjorn Helgaas wrote:
> > Which ACPI screw up is that?  And what's the usual recipe?
> 
> The usual screw up where subtle ACPI breakage from release to release
> causes some IRQs to get misrouted.
> 
> Usually you start with noacpi and cycle through the pci routing options

Don't worry, I wasn't trying to talk you out of an ACPI bug report;
I just wanted to get enough specifics so I could see whether it was
something I could fix.

> If you actually read this bug report, you'll see there was a message
> 
> ACPI: Resource is not an IRQ entry
> 
> Just before the fusion IRQ failed to get delivered, so I think it's a
> good indicator that there *are* ACPI problems ...

These messages also happen with 2.6.25, where the MPT Fusion driver
worked, so Kurk opened a separate bugzilla,
  http://bugzilla.kernel.org/show_bug.cgi?id=11049
for them.

Yakui Zhao thinks the messages are harmless because they're
related to interrupt link devices that we don't use in IOAPIC mode,
and given that the driver works in 2.6.25, that seems plausible
to me.

Regardless, the messages are alarming and annoying.  I'd like
to understand them better, but I'll pursue that in the 11049
bugzilla.

Bjorn


* Re: [Bugme-new] [Bug 11045] New: Bug in MPT Fusion 2.6.26-rc7 unbootable
  2008-07-08 20:56             ` Bjorn Helgaas
@ 2008-07-08 21:47               ` Andrew Morton
  2008-07-08 21:57                 ` James Bottomley
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2008-07-08 21:47 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: James Bottomley, Support, Software, linux-scsi, linux-acpi,
	bugme-daemon, Moore, Eric Dean, kurk

On Tue, 8 Jul 2008 14:56:53 -0600 Bjorn Helgaas <bjorn.helgaas@hp.com> wrote:

> On Tuesday 08 July 2008 11:23:33 am James Bottomley wrote:
> > On Tue, 2008-07-08 at 10:51 -0600, Bjorn Helgaas wrote:
> > > Which ACPI screw up is that?  And what's the usual recipe?
> > 
> > The usual screw up where subtle ACPI breakage from release to release
> > causes some IRQs to get misrouted.
> > 
> > Usually you start with noacpi and cycle through the pci routing options
> 
> Don't worry, I wasn't trying to talk you out of an ACPI bug report;
> I just wanted to get enough specifics so I could see whether it was
> something I could fix.
> 
> > If you actually read this bug report, you'll see there was a message
> > 
> > ACPI: Resource is not an IRQ entry
> > 
> > Just before the fusion IRQ failed to get delivered, so I think it's a
> > good indicator that there *are* ACPI problems ...
> 
> These messages also happen with 2.6.25, where the MPT Fusion driver
> worked, so Kurk opened a separate bugzilla,
>   http://bugzilla.kernel.org/show_bug.cgi?id=11049
> for them.
> 
> Yakui Zhao thinks the messages are harmless because they're
> related to interrupt link devices that we don't use in IOAPIC mode,
> and given that the driver works in 2.6.25, that seems plausible
> to me.
> 
> Regardless, the messages are alarming and annoying.  I'd like
> to understand them better, but I'll pursue that in the 11049
> bugzilla.
> 

Let us not forget the other part of this report:

BUG: unable to handle kernel NULL pointer dereference at 0000034c
IP: [<f885cc5e>] :mptspi:mptspi_dv_renegotiate_work+0xa/0x9f
Oops: 0000 [#1] SMP



* Re: [Bugme-new] [Bug 11045] New: Bug in MPT Fusion 2.6.26-rc7 unbootable
  2008-07-08 21:47               ` Andrew Morton
@ 2008-07-08 21:57                 ` James Bottomley
  2008-07-09  8:08                   ` Prakash, Sathya
  2008-07-10 14:24                   ` kurk
  0 siblings, 2 replies; 16+ messages in thread
From: James Bottomley @ 2008-07-08 21:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Bjorn Helgaas, Support, Software, linux-scsi, linux-acpi,
	bugme-daemon, Moore, Eric Dean, kurk

On Tue, 2008-07-08 at 14:47 -0700, Andrew Morton wrote:
> On Tue, 8 Jul 2008 14:56:53 -0600 Bjorn Helgaas <bjorn.helgaas@hp.com> wrote:
> 
> > On Tuesday 08 July 2008 11:23:33 am James Bottomley wrote:
> > > On Tue, 2008-07-08 at 10:51 -0600, Bjorn Helgaas wrote:
> > > > Which ACPI screw up is that?  And what's the usual recipe?
> > > 
> > > The usual screw up where subtle ACPI breakage from release to release
> > > causes some IRQs to get misrouted.
> > > 
> > > Usually you start with noacpi and cycle through the pci routing options
> > 
> > Don't worry, I wasn't trying to talk you out of an ACPI bug report;
> > I just wanted to get enough specifics so I could see whether it was
> > something I could fix.
> > 
> > > If you actually read this bug report, you'll see there was a message
> > > 
> > > ACPI: Resource is not an IRQ entry
> > > 
> > > Just before the fusion IRQ failed to get delivered, so I think it's a
> > > good indicator that there *are* ACPI problems ...
> > 
> > These messages also happen with 2.6.25, where the MPT Fusion driver
> > worked, so Kurk opened a separate bugzilla,
> >   http://bugzilla.kernel.org/show_bug.cgi?id=11049
> > for them.
> > 
> > Yakui Zhao thinks the messages are harmless because they're
> > related to interrupt link devices that we don't use in IOAPIC mode,
> > and given that the driver works in 2.6.25, that seems plausible
> > to me.
> > 
> > Regardless, the messages are alarming and annoying.  I'd like
> > to understand them better, but I'll pursue that in the 11049
> > bugzilla.
> > 
> 
> Let us not forget the other part of this report:
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000034c
> IP: [<f885cc5e>] :mptspi:mptspi_dv_renegotiate_work+0xa/0x9f
> Oops: 0000 [#1] SMP

That's fixed in the scsi-rc-fixes tree ... but it's a symptom, not a
cause.  If essential storage is on this adapter, the system will still
be unbootable.

James



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Bugme-new] [Bug 11045] New: Bug in MPT Fusion 2.6.26-rc7 unbootable
  2008-07-08 21:57                 ` James Bottomley
@ 2008-07-09  8:08                   ` Prakash, Sathya
  2008-07-10 14:24                     ` kurk
  2008-07-10 14:52                     ` kurk
  2008-07-10 14:24                   ` kurk
  1 sibling, 2 replies; 16+ messages in thread
From: Prakash, Sathya @ 2008-07-09  8:08 UTC (permalink / raw)
  To: James Bottomley
  Cc: Andrew Morton, Bjorn Helgaas, Support, Software,
	linux-scsi@vger.kernel.org, linux-acpi@vger.kernel.org,
	bugme-daemon@bugzilla.kernel.org, Moore, Eric, kurk@shiftmail.org

This may be a problem caused by enabling MSI for SPI controllers. I have posted another message to the list with the correcting patch, which is already in the scsi-misc tree.
If the problem goes away after setting the module parameter mpt_msi_enable=0, or after applying the patch http://marc.info/?l=linux-scsi&m=121131228827682&w=4, then it may be caused by MSI being enabled.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Bugme-new] [Bug 11045] New: Bug in MPT Fusion 2.6.26-rc7 unbootable
  2008-07-08 21:57                 ` James Bottomley
  2008-07-09  8:08                   ` Prakash, Sathya
@ 2008-07-10 14:24                   ` kurk
  1 sibling, 0 replies; 16+ messages in thread
From: kurk @ 2008-07-10 14:24 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: James Bottomley, Andrew Morton, Support, Software, linux-scsi,
	linux-acpi, bugme-daemon, Moore, Eric Dean, Prakash, Sathya

Bjorn,
you were right: the serial dump was not complete because the
"quiet" option was specified as a kernel parameter.
I just uploaded the full (non-quiet) serial dump as an attachment in the
bugzilla web interface.
Thank you

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Bugme-new] [Bug 11045] New: Bug in MPT Fusion 2.6.26-rc7 unbootable
  2008-07-09  8:08                   ` Prakash, Sathya
@ 2008-07-10 14:24                     ` kurk
  2008-07-10 14:52                     ` kurk
  1 sibling, 0 replies; 16+ messages in thread
From: kurk @ 2008-07-10 14:24 UTC (permalink / raw)
  To: Prakash, Sathya, James Bottomley
  Cc: Andrew Morton, Bjorn Helgaas, Support, Software,
	linux-scsi@vger.kernel.org, linux-acpi@vger.kernel.org,
	bugme-daemon@bugzilla.kernel.org, Moore, Eric

Hi all,
Good news! James and Sathya were correct: the bug is related to MSI.
Specifying mpt_msi_enable=0 as an option for the mptbase module solves the
problem, and the system boots as usual.

Having said this, do you still want me to try a patch, or perform any
additional tests?

Just out of curiosity: do you intend to eventually modify the kernel so as
to "support" and work around buggy hardware like ours (IBM
xSeries 335), so that Linux works out of the box even on this hardware?

Thanks, everybody, for your help

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Bugme-new] [Bug 11045] New: Bug in MPT Fusion 2.6.26-rc7 unbootable
  2008-07-09  8:08                   ` Prakash, Sathya
  2008-07-10 14:24                     ` kurk
@ 2008-07-10 14:52                     ` kurk
  2008-07-10 23:44                       ` Andrew Morton
  1 sibling, 1 reply; 16+ messages in thread
From: kurk @ 2008-07-10 14:52 UTC (permalink / raw)
  To: Prakash, Sathya
  Cc: James Bottomley, Andrew Morton, Bjorn Helgaas, Support, Software,
	linux-scsi@vger.kernel.org, linux-acpi@vger.kernel.org,
	bugme-daemon@bugzilla.kernel.org, Moore, Eric

Prakash, Sathya wrote:
> This may be a problem caused by enabling MSI for SPI controllers. I have posted another message to the list with the correcting patch, which is already in the scsi-misc tree.
> If the problem goes away after setting the module parameter mpt_msi_enable=0, or after applying the patch http://marc.info/?l=linux-scsi&m=121131228827682&w=4, then it may be caused by MSI being enabled.
>   
More good news: I can confirm that the problem is also fixed by the
patch Sathya linked above, in which case there is no need to
specify mpt_msi_enable=0 for mptbase. Either one (the patch
or the option) is enough to fix the problem.
It would be nice to see this patch in the final release of the 2.6.26 kernel.
Thank you

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Bugme-new] [Bug 11045] New: Bug in MPT Fusion 2.6.26-rc7 unbootable
  2008-07-10 14:52                     ` kurk
@ 2008-07-10 23:44                       ` Andrew Morton
  2008-07-11  0:42                         ` James Bottomley
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2008-07-10 23:44 UTC (permalink / raw)
  To: kurk
  Cc: Prakash, Sathya, James Bottomley, Bjorn Helgaas,
	Support, Software, linux-scsi@vger.kernel.org,
	linux-acpi@vger.kernel.org, bugme-daemon@bugzilla.kernel.org,
	Moore, Eric

On Thu, 10 Jul 2008 16:52:01 +0200 kurk <kurk@shiftmail.org> wrote:

> Prakash, Sathya wrote:
> > This may be a problem caused by enabling MSI for SPI controllers. I have posted another message to the list with the correcting patch, which is already in the scsi-misc tree.
> > If the problem goes away after setting the module parameter mpt_msi_enable=0, or after applying the patch http://marc.info/?l=linux-scsi&m=121131228827682&w=4, then it may be caused by MSI being enabled.
> >   
> More good news: I can confirm that the problem is also fixed by the
> patch Sathya linked above, in which case there is no need to
> specify mpt_msi_enable=0 for mptbase. Either one (the patch
> or the option) is enough to fix the problem.
> It would be nice to see this patch in the final release of the 2.6.26 kernel.
> Thank you

James, shouldn't we put that into 2.6.26?

That whole patch series looks pretty desirable actually.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Bugme-new] [Bug 11045] New: Bug in MPT Fusion 2.6.26-rc7 unbootable
  2008-07-10 23:44                       ` Andrew Morton
@ 2008-07-11  0:42                         ` James Bottomley
  2008-07-11  4:33                           ` Prakash, Sathya
  0 siblings, 1 reply; 16+ messages in thread
From: James Bottomley @ 2008-07-11  0:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: kurk, Prakash, Sathya, Bjorn Helgaas, Support, Software,
	linux-scsi@vger.kernel.org, linux-acpi@vger.kernel.org,
	bugme-daemon@bugzilla.kernel.org, Moore, Eric

On Thu, 2008-07-10 at 16:44 -0700, Andrew Morton wrote:
> On Thu, 10 Jul 2008 16:52:01 +0200 kurk <kurk@shiftmail.org> wrote:
> 
> > Prakash, Sathya wrote:
> > > This may be a problem caused by enabling MSI for SPI controllers. I have posted another message to the list with the correcting patch, which is already in the scsi-misc tree.
> > > If the problem goes away after setting the module parameter mpt_msi_enable=0, or after applying the patch http://marc.info/?l=linux-scsi&m=121131228827682&w=4, then it may be caused by MSI being enabled.
> > >   
> > More good news: I can confirm that the problem is also fixed by the
> > patch Sathya linked above, in which case there is no need to
> > specify mpt_msi_enable=0 for mptbase. Either one (the patch
> > or the option) is enough to fix the problem.
> > It would be nice to see this patch in the final release of the 2.6.26 kernel.
> > Thank you
> 
> James, shouldn't we put that into 2.6.26?

I'm still not sure ... if it's a fault on the fusion board with MSI, then yes,
we need it in ... although the form would then be wrong, because we
probably should be identifying the faulty parts and blacklisting them.

If it's actually a fault on the motherboard with MSI, then no, this
isn't the patch series that should go in; we need the motherboard strings
to blacklist it.

Unfortunately, I can't seem to get an answer out of LSI on this
question. It looks like the commit will cherry-pick easily enough ...
although, now that I look at it, the parameter's description is wrong.

> That whole patch series looks pretty desirable actually.

Well, it was billed as a driver update ... and it has a lot more than
just trivial changes, so, given eve-of-release quality concerns, I'd tend to
say that wouldn't be a good idea.

James



^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [Bugme-new] [Bug 11045] New: Bug in MPT Fusion 2.6.26-rc7 unbootable
  2008-07-11  0:42                         ` James Bottomley
@ 2008-07-11  4:33                           ` Prakash, Sathya
  2008-07-11 14:05                             ` James Bottomley
  0 siblings, 1 reply; 16+ messages in thread
From: Prakash, Sathya @ 2008-07-11  4:33 UTC (permalink / raw)
  To: James Bottomley, Andrew Morton
  Cc: kurk, Bjorn Helgaas, Support, Software,
	linux-scsi@vger.kernel.org, linux-acpi@vger.kernel.org,
	bugme-daemon@bugzilla.kernel.org, Moore, Eric

I rechecked this: except for the FC 919X and 929X boards, everything else should work fine with MSI. Hence the SPI boards (1030) should work with MSI, and the problem might be with the motherboard.
But we would like to keep MSI disabled for SPI controllers, since we have not tested them internally with MSI, and MSI is not enabled by default for SPI and FC in our recent drivers.
So I would like to request that the patch disabling MSI for SPI & FC be pulled in.
-Thanks
Sathya
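The controller policy described above can be sketched as a small decision function. This is a minimal userspace sketch, not the mptbase code: the enum `mpt_family`, the variable `msi_param` (a stand-in for the `mpt_msi_enable` module parameter, treated here purely as a global off switch) and the function `mpt_use_msi()` are hypothetical names. The assumption that SAS controllers keep MSI enabled is implied by the request to disable MSI only for SPI and FC, not stated outright.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical controller families, named after the boards in this thread. */
enum mpt_family {
	MPT_SPI,          /* parallel SCSI, e.g. the 1030 */
	MPT_FC_919X_929X, /* FC boards reported as not working with MSI */
	MPT_FC_OTHER,     /* remaining FC boards */
	MPT_SAS,
	MPT_FAMILY_COUNT
};

/* Hypothetical stand-in for the mpt_msi_enable module parameter:
 * 0 forces MSI off for every controller. */
static int msi_param = 1;

/* Returns true if the driver should request MSI for this family. */
static bool mpt_use_msi(enum mpt_family fam)
{
	assert(fam < MPT_FAMILY_COUNT);

	if (!msi_param)
		return false; /* user override, e.g. mpt_msi_enable=0 */

	switch (fam) {
	case MPT_SAS:
		return true;   /* assumed to keep MSI by default */
	case MPT_SPI:          /* should work, but not tested with MSI by LSI */
	case MPT_FC_919X_929X: /* known MSI problems */
	case MPT_FC_OTHER:     /* disabled together with the rest of FC */
	default:
		return false;  /* use legacy pin-based (INTx) interrupts */
	}
}
```

Keeping the parameter as an override means a user on a misbehaving platform can still fall back to legacy interrupts without rebuilding, which is what mpt_msi_enable=0 did for the xSeries 335 in this thread.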


^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [Bugme-new] [Bug 11045] New: Bug in MPT Fusion 2.6.26-rc7 unbootable
  2008-07-11  4:33                           ` Prakash, Sathya
@ 2008-07-11 14:05                             ` James Bottomley
  0 siblings, 0 replies; 16+ messages in thread
From: James Bottomley @ 2008-07-11 14:05 UTC (permalink / raw)
  To: Prakash, Sathya
  Cc: Andrew Morton, kurk, Bjorn Helgaas, Support, Software,
	linux-scsi@vger.kernel.org, linux-acpi@vger.kernel.org,
	bugme-daemon@bugzilla.kernel.org, Moore, Eric

On Fri, 2008-07-11 at 12:33 +0800, Prakash, Sathya wrote:
> I rechecked this: except for the FC 919X and 929X boards, everything
> else should work fine with MSI. Hence the SPI boards (1030) should
> work with MSI, and the problem might be with the motherboard.

Right ... that's why I was asking ... my 1030 works fine with MSI.  If
there's a fault with the FC boards, then certainly they should have MSI
disabled.

The motherboard was my suspicion ... especially as older ones have SPI
and newer ones have SAS (and the older ones are the most likely to have MSI
faults).  However, I think you can see from our point of view that if
the problem is the motherboard, disabling MSI in the fusion driver is the wrong
way to fix it.  If we do it this way, we'll promptly get another slew of
nasty bug reports for the next driver that enables MSI and doesn't work
on this platform.

> But we would like to keep MSI disabled for SPI controllers, since
> we have not tested them internally with MSI, and MSI is not enabled by
> default for SPI and FC in our recent drivers.
> So I would like to request that the patch disabling MSI for SPI & FC be pulled in.

Yes, we'll do that ... I'll also see if the PCI maintainer can determine
the information needed to blacklist the motherboards so that we don't
get this all over again with them and a different driver.

James
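The alternative discussed above, blacklisting the motherboard rather than the adapter, can be sketched as a platform-wide check that every driver consults before applying its own MSI policy. In the kernel, this kind of matching is normally expressed with DMI tables (`struct dmi_system_id` checked by `dmi_check_system()`); the userspace sketch below only simulates the idea. The struct `msi_quirk`, the table `msi_blacklist` and both functions are hypothetical, and the vendor and product strings are placeholders, because the thread never established the real DMI strings for the IBM xSeries 335.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical blacklist entry: a platform identified by its firmware
 * vendor and product strings. */
struct msi_quirk {
	const char *sys_vendor;
	const char *product_name;
};

/* PLACEHOLDER strings only: the real DMI strings for the affected
 * motherboard would have to be read from the machine itself. */
static const struct msi_quirk msi_blacklist[] = {
	{ "EXAMPLE-VENDOR", "EXAMPLE-xSeries-335" },
};

/* Returns true if MSI must be disabled platform-wide on this machine,
 * regardless of which driver asks for it. */
static bool platform_msi_broken(const char *vendor, const char *product)
{
	size_t i;

	if (vendor == NULL || product == NULL)
		return false; /* unidentified platform: assume MSI works */

	for (i = 0; i < sizeof(msi_blacklist) / sizeof(msi_blacklist[0]); i++) {
		/* Table entries must be fully populated. */
		assert(msi_blacklist[i].sys_vendor != NULL);
		assert(msi_blacklist[i].product_name != NULL);
		if (strcmp(vendor, msi_blacklist[i].sys_vendor) == 0 &&
		    strcmp(product, msi_blacklist[i].product_name) == 0)
			return true;
	}
	return false;
}

/* Any driver (fusion or otherwise) combines the platform verdict with
 * its own preference; the platform veto always wins. */
static bool driver_may_use_msi(bool platform_broken, bool driver_wants_msi)
{
	return !platform_broken && driver_wants_msi;
}
```

Because the verdict lives at the platform level, the next driver that enables MSI on the same machine is caught by the same entry, which avoids the repeat bug reports described above.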



^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2008-07-11 14:05 UTC | newest]

Thread overview: 16+ messages
     [not found] <bug-11045-10286@http.bugzilla.kernel.org/>
2008-07-06 19:34 ` [Bugme-new] [Bug 11045] New: Bug in MPT Fusion 2.6.26-rc7 unbootable Andrew Morton
     [not found]   ` <C5679C710E19AF4C8D9C02FF5C72E3C133C9C76A@cosmail01.lsi.com>
2008-07-08  8:57     ` Andrew Morton
2008-07-08 14:08       ` James Bottomley
2008-07-08 16:51         ` Bjorn Helgaas
2008-07-08 17:23           ` James Bottomley
2008-07-08 20:56             ` Bjorn Helgaas
2008-07-08 21:47               ` Andrew Morton
2008-07-08 21:57                 ` James Bottomley
2008-07-09  8:08                   ` Prakash, Sathya
2008-07-10 14:24                     ` kurk
2008-07-10 14:52                     ` kurk
2008-07-10 23:44                       ` Andrew Morton
2008-07-11  0:42                         ` James Bottomley
2008-07-11  4:33                           ` Prakash, Sathya
2008-07-11 14:05                             ` James Bottomley
2008-07-10 14:24                   ` kurk
