From: scameron@beardog.cce.hp.com
To: Tomas Henzl <thenzl@redhat.com>
Cc: james.bottomley@hansenpartnership.com, stephenmcameron@gmail.com,
	mikem@beardog.cce.hp.com, linux-scsi@vger.kernel.org,
	scott.teel@hp.com, scameron@beardog.cce.hp.com
Subject: Re: [PATCH 07/10] hpsa: hide logical drives with format in progress from linux
Date: Fri, 27 Sep 2013 14:11:05 -0500
Message-ID: <20130927191105.GF31476@beardog.cce.hp.com>
In-Reply-To: <5245868B.4080900@redhat.com>

On Fri, Sep 27, 2013 at 03:22:19PM +0200, Tomas Henzl wrote:
> On 09/23/2013 08:34 PM, Stephen M. Cameron wrote:
> > From: Stephen M. Cameron <scameron@beardog.cce.hp.com>
> >
> > SCSI mid layer doesn't seem to handle logical drives undergoing format
> > very well.  scsi_add_device on such devices seems to result in hitting
> > those devices with a TUR at a rate of 3Hz for awhile, transitioning
> > to hitting them with a READ(10) at a much higher rate indefinitely,
> > and at boot time, this prevents the system from coming up.  If we
> > do not expose such devices to the kernel, it isn't bothered by them.
> 
> Is the result of this patch that the drive is no more visible for the user
> and he can't follow the formatting progress? 
> I think a better option is to fix the kernel to handle formatting devices better
> or harden the hpsa so it can cope with TURs or reads (ignore) from a formatting device.

So here is the behavior I see with linux-3.12-rc2 when I create a logical
drive with rapid parity initialization (RPI) enabled and then reboot
before the initialization finishes.  Note that scsi 0:0:0:1 is
the device that's in this state.  Interspersed are some notes from
me, prefixed "smc> ".

Summary: First you see sd (I think) printing dots very slowly.
Then you see udev get angry.  Then come a couple of stack traces, one
from modprobe and one from dmraid, and the system doesn't
boot up.  20-something minutes have elapsed at this point.  It
may eventually boot when the RPI finally finishes, but at this
point, I don't care, because 20 minutes is too long to be holding
things up.


HP HPSA Driver (v 3.4.0-1)                                                      
hpsa 0000:02:00.0: can't disable ASPM; OS doesn't have ASPM control             
hpsa 0000:02:00.0: MSIX                                                         
hpsa 0000:02:00.0: hpsa0: <0x323b> at IRQ 64 using DAC                          
scsi0 : hpsa                                                                    
hpsa 0000:02:00.0: RAID              device c0b3t0l0 added.                     
hpsa 0000:02:00.0: Direct-Access     device c0b0t0l0 added.                     
hpsa 0000:02:00.0: Direct-Access     device c0b0t0l1 added.                     
hpsa 0000:02:00.0: Direct-Access     device c0b0t0l2 added.                     
usb 1-1.3: new low-speed USB device number 3 using ehci-pci                     
scsi 0:3:0:0: RAID              HP       P420i            5.19 PQ: 0 ANSI: 5    
scsi 0:0:0:0: Direct-Access     HP       LOGICAL VOLUME   5.19 PQ: 0 ANSI: 5    
scsi 0:0:0:1: Direct-Access     HP       LOGICAL VOLUME   5.19 PQ: 0 ANSI: 5    
scsi 0:0:0:2: Direct-Access     HP       LOGICAL VOLUME   5.19 PQ: 0 ANSI: 5    
ata_piix 0000:00:1f.2: MAP [                                                    
 P0 P2 P1 P3 ]                                                                  
usb 1-1.3: New USB device found, idVendor=0624, idProduct=0341                  
usb 1-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=0             
usb 1-1.3: Product: HP 336047-B21                                               
usb 1-1.3: Manufacturer: Avocent                                                
input: Avocent HP 336047-B21 as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.31
hid-generic 0003:0624:0341.0001: input,hidraw0: USB HID v1.10 Keyboard [Avocent0
scsi1 : ata_piix                                                                
scsi2 : ata_piix                                                                
ata1: SATA max UDMA/133 cmd 0x4000 ctl 0x4008 bmdma 0x4020 irq 17               
ata2: SATA max UDMA/133 cmd 0x4010 ctl 0x4018 bmdma 0x4028 irq 17               
input: Avocent HP 336047-B21 as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.32
hid-generic 0003:0624:0341.0002: input,hidraw1: USB HID v1.10 Mouse [Avocent HP1
sd 0:0:0:0: [sda] 2344160432 512-byte logical blocks: (1.20 TB/1.09 TiB)        
sd 0:0:0:1: [sdb] Spinning up disk...                                           
usb 2-1.3: new high-speed USB device number 3 using ehci-pci                    
sd 0:0:0:2: [sdc] 390651840 512-byte logical blocks: (200 GB/186 GiB)           
sd 0:0:0:0: [sda] Write Protect is off                                          
sd 0:0:0:2: [sdc] Write Protect is off                                          
sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DA
sd 0:0:0:2: [sdc] Write cache: disabled, read cache: enabled, doesn't support DA
 sdc: unknown partition table                                                   
sd 0:0:0:2: [sdc] Attached SCSI disk                                            
 sda: sda1 sda2 sda3                                                            
sd 0:0:0:0: [sda] Attached SCSI disk                                            
usb 2-1.3: New USB device found, idVendor=0424, idProduct=2660                  
usb 2-1.3: New USB device strings: Mfr=0, Product=0, SerialNumber=0             
hub 2-1.3:1.0: USB hub found                                                    
hub 2-1.3:1.0: 2 ports detected                                                 
Switched to clocksource tsc                                                     
.ata2.01: failed to resume link (SControl 0)                                    
ata2.00: SATA link down (SStatus 0 SControl 300)                                
ata2.01: SATA link down (SStatus 4 SControl 0)                                  
ata1.01: failed to resume link (SControl 0)                                     
ata1.00: SATA link down (SStatus 0 SControl 300)                                
ata1.01: SATA link down (SStatus 4 SControl 0)                                  
................................................................................
sd 0:0:0:1: [sdb] 1757614684 512-byte logical blocks: (899 GB/838 GiB)          
sd 0:0:0:1: [sdb] 4096-byte physical blocks                                     
sd 0:0:0:1: [sdb] Write Protect is off                                          
sd 0:0:0:1: [sdb] Write cache: disabled, read cache: enabled, doesn't support DA
sd 0:0:0:1: [sdb] Spinning up disk...                                           
............................................................................... 


smc> there is a loooooong pause while it prints those dots above.
smc> below, udev starts getting angry...

 
udevadm settle - timeout of 180 seconds reached, the event queue contains:      
  /sys/devices/LNXSYSTM:00/device:00/PNP0A08:00/device:35/PNP0A06:00/PNP0501:00)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:3:0/0:3:0:0 ()
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:3:0/0:3:0:0/s)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:3:0/0:3:0:0/b)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:0 ()
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:0/s)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:0/b)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:1 ()
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:1/s)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:1/b)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:2 ()
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:2/s)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:2/b)
  /sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.3/1-1.3:1.0/input/input1 (2)
  /sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.3/1-1.3:1.0/input/input1/ev)
  /sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.3/1-1.3:1.1/input/input2 (2)
  /sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.3/1-1.3:1.1/input/input2/mo)
  /sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.3/1-1.3:1.1/input/input2/ev)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:0/s)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:1/s)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:2/s)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:2/b)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:0/b)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:0/b)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:0/b)
  /sys/devices/pci0000:00/0000:00:02.2/0000:02:00.0/host0/target0:0:0/0:0:0:0/b)
udevd[130]: worker [175] unexpectedly returned with status 0x0100               
                                                                                
udevd[130]: worker [175] failed while handling '/devices/pci0000:00/0000:00:02.'
                                                                                
udevd[130]: worker [176] unexpectedly returned with status 0x0100               
                                                                                
udevd[130]: worker [176] failed while handling '/devices/pci0000:00/0000:00:02.'
                                                                                
udevd[130]: worker [178] unexpectedly returned with status 0x0100               
                                                                                
udevd[130]: worker [178] failed while handling '/devices/pci0000:00/0000:00:02.'
                                                                                
udevd[130]: worker [179] unexpectedly returned with status 0x0100               
                                                                                
udevd[130]: worker [179] failed while handling '/devices/pci0000:00/0000:00:02.'
                                                                                
EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)         
dracut: Mounted root filesystem /dev/sda2                                       
.SELinux:  Disabled at runtime.                                                 
type=1404 audit(1380289585.871:2): selinux=0 auid=4294967295 ses=4294967295     
dracut:                                                                         
dracut: Switching root                                                          
                Welcome to Red Hatreadahead: starting                           
 Enterprise Linux Server                                                        
.Starting udev: udev: starting version 147                                      
WARNING! power/level is deprecated; use power/control instead                   
.G.pps_core: LinuxPPS API ver. 1 registered                                     
pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@>
PTP clock support registered                                                    
tg3.c:v3.133 (Jul 29, 2013)                                                     
tg3 0000:03:00.0 eth0: Tigon3 [partno(629133-001) rev 5719001] (PCI Express) MA0
tg3 0000:03:00.0 eth0: attached PHY is 5719C (10/100/1000Base-T Ethernet) (Wire)
tg3 0000:03:00.0 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]       
tg3 0000:03:00.0 eth0: dma_rwctrl[00000001] dma_mask[64-bit]                    
tg3 0000:03:00.1 eth1: Tigon3 [partno(629133-001) rev 5719001] (PCI Express) MA1
tg3 0000:03:00.1 eth1: attached PHY is 5719C (10/100/1000Base-T Ethernet) (Wire)
tg3 0000:03:00.1 eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]       
tg3 0000:03:00.1 eth1: dma_rwctrl[00000001] dma_mask[64-bit]                    
tg3 0000:03:00.2 eth2: Tigon3 [partno(629133-001) rev 5719001] (PCI Express) MA2
tg3 0000:03:00.2 eth2: attached PHY is 5719C (10/100/1000Base-T Ethernet) (Wire)
tg3 0000:03:00.2 eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]       
tg3 0000:03:00.2 eth2: dma_rwctrl[00000001] dma_mask[64-bit]                    
tg3 0000:03:00.3 eth3: Tigon3 [partno(629133-001) rev 5719001] (PCI Express) MA3
tg3 0000:03:00.3 eth3: attached PHY is 5719C (10/100/1000Base-T Ethernet) (Wire)
tg3 0000:03:00.3 eth3: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]       
tg3 0000:03:00.3 eth3: dma_rwctrl[00000001] dma_mask[64-bit]                    
dca service started, version 1.12.1                                             
ioatdma: Intel(R) QuickData Technology Driver 4.00                              
ioatdma 0000:00:04.0: can't derive routing for PCI INT A                        
ioatdma 0000:00:04.0: PCI INT A: no GSI - using ISA IRQ 5                       
ioatdma 0000:00:04.1: can't derive routing for PCI INT B                        
ioatdma 0000:00:04.1: PCI INT B: no GSI - using ISA IRQ 7                       
ioatdma 0000:00:04.2: can't derive routing for PCI INT C                        
ioatdma 0000:00:04.2: PCI INT C: no GSI - using ISA IRQ 10                      
ioatdma 0000:00:04.3: can't derive routing for PCI INT D                        
ioatdma 0000:00:04.3: PCI INT D: no GSI - using ISA IRQ 10                      
ioatdma 0000:00:04.4: can't derive routing for PCI INT A                        
ioatdma 0000:00:04.4: PCI INT A: no GSI - using ISA IRQ 5                       
ioatdma 0000:00:04.5: can't derive routing for PCI INT B                        
ioatdma 0000:00:04.5: PCI INT B: no GSI - using ISA IRQ 7                       
ioatdma 0000:00:04.6: can't derive routing for PCI INT C                        
ioatdma 0000:00:04.6: PCI INT C: no GSI - using ISA IRQ 10                      
ioatdma 0000:00:04.7: can't derive routing for PCI INT D                        
ioatdma 0000:00:04.7: PCI INT D: no GSI - using ISA IRQ 10                      
.hpwdt 0000:01:00.0: HP Watchdog Timer Driver: NMI decoding initialized, allow )
hpwdt 0000:01:00.0: HP Watchdog Timer Driver: 1.3.2, timer margin: 30 seconds (.
                                                                                
ACPI Warning: 0x0000000000000928-0x000000000000092f SystemIO conflicts with Reg)
ACPI: If an ACPI driver is available for this device, you should use it insteadr
lpc_ich: Resource conflict(s) found affecting gpio_ich                          
EDAC MC: Ver: 3.0.0                                                             
EDAC sbridge: Seeking for: dev 0e.0 PCI ID 8086:3ca0                            
EDAC sbridge: Seeking for: dev 0e.0 PCI ID 8086:3ca0                            
EDAC sbridge: Seeking for: dev 0f.0 PCI ID 8086:3ca8                            
EDAC sbridge: Seeking for: dev 0f.0 PCI ID 8086:3ca8                            
EDAC sbridge: Seeking for: dev 0f.1 PCI ID 8086:3c71                            
EDAC sbridge: Seeking for: dev 0f.1 PCI ID 8086:3c71                            
EDAC sbridge: Seeking for: dev 0f.2 PCI ID 8086:3caa                            
EDAC sbridge: Seeking for: dev 0f.2 PCI ID 8086:3caa                            
EDAC sbridge: Seeking for: dev 0f.3 PCI ID 8086:3cab                            
EDAC sbridge: Seeking for: dev 0f.3 PCI ID 8086:3cab                            
EDAC sbridge: Seeking for: dev 0f.4 PCI ID 8086:3cac                            
EDAC sbridge: Seeking for: dev 0f.4 PCI ID 8086:3cac                            
EDAC sbridge: Seeking for: dev 0f.5 PCI ID 8086:3cad                            
EDAC sbridge: Seeking for: dev 0f.5 PCI ID 8086:3cad                            
EDAC sbridge: Seeking for: dev 11.0 PCI ID 8086:3cb8                            
EDAC sbridge: Seeking for: dev 11.0 PCI ID 8086:3cb8                            
EDAC sbridge: Seeking for: dev 0c.6 PCI ID 8086:3cf4                            
EDAC sbridge: Seeking for: dev 0c.6 PCI ID 8086:3cf4                            
EDAC sbridge: Seeking for: dev 0c.7 PCI ID 8086:3cf6                            
EDAC sbridge: Seeking for: dev 0c.7 PCI ID 8086:3cf6                            
EDAC sbridge: Seeking for: dev 0d.6 PCI ID 8086:3cf5                            
EDAC sbridge: Seeking for: dev 0d.6 PCI ID 8086:3cf5                            
EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 000
EDAC sbridge: Driver loaded.                                                    
scsi 0:3:0:0: Attached scsi generic sg0 type 12                                 
sd 0:0:0:0: Attached scsi generic sg1 type 0                                    
sd 0:0:0:1: Attached scsi generic sg2 type 0                                    
sd 0:0:0:2: Attached scsi generic sg3 type 0                                    
input: PC Speaker as /devices/platform/pcspkr/input/input3                      
microcode: CPU0 sig=0x206d7, pf=0x1, revision=0x70d                             
microcode: CPU1 sig=0x206d7, pf=0x1, revision=0x70d                             
microcode: CPU2 sig=0x206d7, pf=0x1, revision=0x70d                             
microcode: CPU3 sig=0x206d7, pf=0x1, revision=0x70d                             
microcode: CPU4 sig=0x206d7, pf=0x1, revision=0x70d                             
microcode: CPU5 sig=0x206d7, pf=0x1, revision=0x70d                             
microcode: CPU6 sig=0x206d7, pf=0x1, revision=0x70d                             
microcode: CPU7 sig=0x206d7, pf=0x1, revision=0x70d                             
microcode: CPU8 sig=0x206d7, pf=0x1, revision=0x70d                             
microcode: CPU9 sig=0x206d7, pf=0x1, revision=0x70d                             
microcode: CPU10 sig=0x206d7, pf=0x1, revision=0x70d                            
microcode: CPU11 sig=0x206d7, pf=0x1, revision=0x70d                            
microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter a
ipmi message handler version 39.2                                               
IPMI System Interface driver.                                                   
ipmi_si: probing via ACPI                                                       
ipmi_si 00:02: [io  0x0ca2-0x0ca3] regsize 1 spacing 1 irq 0                    
ipmi_si: Adding ACPI-specified kcs state machine                                
ipmi_si: probing via SMBIOS                                                     
ipmi_si: SMBIOS: io 0xca2 regsize 1 spacing 1 irq 0                             
ipmi_si: Adding SMBIOS-specified kcs state machine duplicate interface          
ipmi_si: probing via SPMI                                                       
ipmi_si: SPMI: io 0xca2 regsize 2 spacing 2 irq 0                               
ipmi_si: Adding SPMI-specified kcs state machine duplicate interface            
ipmi_si: Trying ACPI-specified kcs state machine at i/o address 0xca2, slave ad0
ipmi_si 00:02: Found new BMC (man_id: 0x00000b, prod_id: 0x2000, dev_id: 0x13)  
ipmi_si 00:02: IPMI kcs interface initialized                                   
iTCO_vendor_support: vendor-support=0                                           
iTCO_wdt: Intel TCO WatchDog Timer Driver v1.10                                 
iTCO_wdt: unable to reset NO_REBOOT flag, device disabled by hardware/BIOS      
[  O.K  ]                                                                       
tun: Universal TUN/TAP device driver, 1.6                                       
tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>                          
Setting hostname localhost.localdomain:  [  OK  ]                               
device-mapper: uevent: version 1.0.3                                            
device-mapper: ioctl: 4.26.0-ioctl (2013-08-15) initialised: dm-devel@redhat.com
...............not responding...                                                
INFO: task modprobe:487 blocked for more than 120 seconds.                      
      Not tainted 3.12.0-rc2+ #1                                                
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.       
modprobe        D 0000000000000000     0   487      1 0x00000000                
 ffff880c0bc6bdc8 0000000000000046 ffffffff8107af7d ffff880c0bc6a000            
 ffff880c0bc6bfd8 ffff880c0bc6a000 ffff880c0bc6a010 ffff880c0bc6a000            
 ffff880c0bc6bfd8 ffff880c0bc6a000 ffff880c09a16440 ffff880c0ee6a540            
Call Trace:                                                                     
 [<ffffffff8107af7d>] ? lowest_in_progress+0x4d/0x60                            
 [<ffffffff81592109>] schedule+0x29/0x70                                        
 [<ffffffff8107b005>] async_synchronize_cookie_domain+0x75/0x120                
 [<ffffffff81073c20>] ? wake_up_bit+0x40/0x40                                   
 [<ffffffff8107b0e8>] async_synchronize_full_domain+0x18/0x20                   
 [<ffffffff8107b100>] async_synchronize_full+0x10/0x20                          
 [<ffffffff810c7c65>] do_init_module+0x135/0x1b0                                
 [<ffffffff810c9932>] load_module+0x502/0x620                                   
 [<ffffffff810c7170>] ? __unlink_module+0x30/0x30                               
 [<ffffffff810c6760>] ? module_sect_show+0x30/0x30                              
 [<ffffffff810c9bd6>] SyS_init_module+0x96/0xc0                                 
 [<ffffffff8159d1d2>] system_call_fastpath+0x16/0x1b                            
no locks held by modprobe/487.     
INFO: task dmraid:6718 blocked for more than 120 seconds.                       
      Not tainted 3.12.0-rc2+ #1                                                
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.       
dmraid          D 0000000000000000     0  6718    553 0x00000000                
 ffff8800b9a51ae8 0000000000000046 ffff880c0a42d200 ffff8800b9a50000            
 ffff8800b9a51fd8 ffff8800b9a50000 ffff8800b9a50010 ffff8800b9a50000            
 ffff8800b9a51fd8 ffff8800b9a50000 ffff880c0a42c940 ffffffff81a104c0            
Call Trace:                                                                     
 [<ffffffff81592109>] schedule+0x29/0x70                                        
 [<ffffffff81592467>] schedule_preempt_disabled+0x27/0x40                       
 [<ffffffff8158f84a>] mutex_lock_nested+0x13a/0x340                             
 [<ffffffff811cc21e>] ? __blkdev_get+0x6e/0x490                                 
 [<ffffffff811cc21e>] __blkdev_get+0x6e/0x490                                   
 [<ffffffff811cb6a9>] ? bd_acquire+0x99/0xf0                                    
 [<ffffffff811cc69c>] blkdev_get+0x5c/0x210                                     
 [<ffffffff8159446b>] ? _raw_spin_unlock+0x2b/0x50                              
 [<ffffffff811cc850>] ? blkdev_get+0x210/0x210                                  
 [<ffffffff811cc8b2>] blkdev_open+0x62/0x80                                     
 [<ffffffff8118d46e>] do_dentry_open+0x24e/0x2e0                                
 [<ffffffff8118d615>] finish_open+0x35/0x50                                     
 [<ffffffff811a0ab6>] do_last+0x436/0x7e0                                       
 [<ffffffff811a0f24>] path_openat+0xc4/0x490                                    
 [<ffffffff811a142a>] do_filp_open+0x4a/0xa0                                    
 [<ffffffff811ae2c1>] ? __alloc_fd+0xb1/0x160                                   
 [<ffffffff8115f01f>] ? vm_munmap+0x5f/0x80                                     
 [<ffffffff8118e91a>] do_sys_open+0x11a/0x230                                   
 [<ffffffff81078223>] ? up_write+0x23/0x40                                      
 [<ffffffff81296909>] ? lockdep_sys_exit_thunk+0x35/0x67                        
 [<ffffffff8118ea6e>] SyS_open+0x1e/0x20                                        
 [<ffffffff8159d1d2>] system_call_fastpath+0x16/0x1b                            
1 lock held by dmraid/6718:                                                     
 #0:  (&bdev->bd_mutex){......}, at: [<ffffffff811cc21e>] __blkdev_get+0x6e/0x40

smc> and it's been 20-something minutes at this point, and the system is
smc> still not up, and I still cannot log in.

If anyone wants to try it themselves, make a RAID5 volume on a Smart Array
with rapid parity init enabled and then reboot.

Userland is RHEL6u3, I think (it might be RHEL6u4; I don't think it makes
a difference).

-- steve


> 
> Also maybe a cmd_special_free is missing - see below
> 
> Cheers, Tomas
> Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
> ---
>  drivers/scsi/hpsa.c |   50 ++++++++++++++++++++++++++++++++++++++++++++++++--
>  drivers/scsi/hpsa.h |    1 +
>  2 files changed, 49 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> index b7f405f..38e3af4 100644
> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -1010,6 +1010,20 @@ static void adjust_hpsa_scsi_table(struct ctlr_info *h, int hostno,
>  	for (i = 0; i < nsds; i++) {
>  		if (!sd[i]) /* if already added above. */
>  			continue;
> +
> +		/* Don't add devices which are NOT READY, FORMAT IN PROGRESS
> +		 * as the SCSI mid-layer does not handle such devices well.
> +		 * It relentlessly loops sending TUR at 3Hz, then READ(10)
> +		 * at 160Hz, and prevents the system from coming up.
> +		 */
> +		if (sd[i]->format_in_progress) {
> +			dev_info(&h->pdev->dev,
> +				"Logical drive format in progress, device c%db%dt%dl%d offline.\n",
> +				h->scsi_host->host_no,
> +				sd[i]->bus, sd[i]->target, sd[i]->lun);
> +			continue;
> +		}
> +
>  		device_change = hpsa_scsi_find_entry(sd[i], h->dev,
>  					h->ndevices, &entry);
>  		if (device_change == DEVICE_NOT_FOUND) {
> @@ -1715,6 +1729,34 @@ static inline void hpsa_set_bus_target_lun(struct hpsa_scsi_dev_t *device,
>  	device->lun = lun;
>  }
>  
> +static unsigned char hpsa_format_in_progress(struct ctlr_info *h,
> +		unsigned char scsi3addr[])
> +{
> +	struct CommandList *c;
> +	unsigned char *sense, sense_key, asc, ascq;
> +#define ASC_LUN_NOT_READY 0x04
> +#define ASCQ_LUN_NOT_READY_FORMAT_IN_PROGRESS 0x04
> +
> +
> +	c = cmd_special_alloc(h);
> +	if (!c)
> +		return 0;
> +	fill_cmd(c, TEST_UNIT_READY, h, NULL, 0, 0, scsi3addr, TYPE_CMD);
> +	hpsa_scsi_do_simple_cmd_core(h, c);
> +	sense = c->err_info->SenseInfo;
> +	sense_key = sense[2];
> +	asc = sense[12];
> +	ascq = sense[13];
> +	if (c->err_info->CommandStatus == CMD_TARGET_STATUS &&
> +		c->err_info->ScsiStatus == SAM_STAT_CHECK_CONDITION &&
> +		sense_key == NOT_READY &&
> +		asc == ASC_LUN_NOT_READY &&
> +		ascq == ASCQ_LUN_NOT_READY_FORMAT_IN_PROGRESS)
> +		return 1;
> return^ without cmd_special_free
> 
> +	cmd_special_free(h, c);
> +	return 0;
> +}
> +
>  static int hpsa_update_device_info(struct ctlr_info *h,
>  	unsigned char scsi3addr[], struct hpsa_scsi_dev_t *this_device,
>  	unsigned char *is_OBDR_device)
> @@ -1753,10 +1795,14 @@ static int hpsa_update_device_info(struct ctlr_info *h,
>  		sizeof(this_device->device_id));
>  
>  	if (this_device->devtype == TYPE_DISK &&
> -		is_logical_dev_addr_mode(scsi3addr))
> +		is_logical_dev_addr_mode(scsi3addr)) {
>  		hpsa_get_raid_level(h, scsi3addr, &this_device->raid_level);
> -	else
> +		this_device->format_in_progress =
> +			hpsa_format_in_progress(h, scsi3addr);
> +	} else {
>  		this_device->raid_level = RAID_UNKNOWN;
> +		this_device->format_in_progress = 0;
> +	}
>  
>  	if (is_OBDR_device) {
>  		/* See if this is a One-Button-Disaster-Recovery device
> diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
> index bc85e72..4fd0d45 100644
> --- a/drivers/scsi/hpsa.h
> +++ b/drivers/scsi/hpsa.h
> @@ -46,6 +46,7 @@ struct hpsa_scsi_dev_t {
>  	unsigned char vendor[8];        /* bytes 8-15 of inquiry data */
>  	unsigned char model[16];        /* bytes 16-31 of inquiry data */
>  	unsigned char raid_level;	/* from inquiry page 0xC1 */
> +	unsigned char format_in_progress;
>  };
>  
>  struct reply_pool {
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
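
For reference, the sense-data check in the quoted hpsa_format_in_progress()
and the cmd_special_free point raised in review can be sketched outside the
driver roughly as follows.  This is only an illustration under assumptions,
not the driver's code: struct fake_cmd, fake_cmd_alloc(), fake_cmd_free() and
live_cmds are hypothetical stand-ins for struct CommandList,
cmd_special_alloc() and cmd_special_free(), and the TUR result is passed in
by the caller rather than coming from a controller.  It assumes fixed-format
sense data, and it masks byte 2 to its low four bits (the sense key field),
which the quoted patch does not do.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Fixed-format sense values: NOT READY / LUN NOT READY, FORMAT IN PROGRESS */
#define SENSE_KEY_NOT_READY 0x02
#define ASC_LUN_NOT_READY 0x04
#define ASCQ_LUN_NOT_READY_FORMAT_IN_PROGRESS 0x04

/* Hypothetical stand-in for the driver's command structure. */
struct fake_cmd {
	int check_condition;		/* nonzero: CHECK CONDITION status */
	unsigned char sense[32];
};

int live_cmds;	/* outstanding allocations, so a leak is observable */

struct fake_cmd *fake_cmd_alloc(void)
{
	struct fake_cmd *c = calloc(1, sizeof(*c));

	if (c)
		live_cmds++;
	return c;
}

void fake_cmd_free(struct fake_cmd *c)
{
	free(c);
	live_cmds--;
}

/* Return 1 if fixed-format sense data reports "format in progress". */
int sense_is_format_in_progress(const unsigned char *sense, size_t len)
{
	if (len < 14)	/* need bytes 2, 12 and 13 */
		return 0;
	return (sense[2] & 0x0f) == SENSE_KEY_NOT_READY &&
		sense[12] == ASC_LUN_NOT_READY &&
		sense[13] == ASCQ_LUN_NOT_READY_FORMAT_IN_PROGRESS;
}

/*
 * Same shape as the quoted function, but with a single exit path so the
 * command is freed whether or not a format is in progress.
 */
int format_in_progress(int check_condition, const unsigned char *sense,
		       size_t len)
{
	struct fake_cmd *c;
	int rc = 0;

	c = fake_cmd_alloc();
	if (!c)
		return 0;

	/* Pretend the TEST UNIT READY completed with the given result. */
	c->check_condition = check_condition;
	if (len > sizeof(c->sense))
		len = sizeof(c->sense);
	memcpy(c->sense, sense, len);

	if (c->check_condition &&
	    sense_is_format_in_progress(c->sense, sizeof(c->sense)))
		rc = 1;

	fake_cmd_free(c);
	return rc;
}
```

With a sense buffer whose byte 2 is 0x02 and bytes 12 and 13 are both 0x04,
format_in_progress(1, ...) returns 1, and live_cmds is back to zero after
every call, which is the property the early "return 1" in the quoted patch
does not have.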
