sata_nv and RAID1

linux-raid.vger.kernel.org archive mirror

* sata_nv and RAID1
@ 2005-06-11 16:13 Diego M. Vadell
  2005-06-11 19:26 ` Jeff Garzik
  0 siblings, 1 reply; 37+ messages in thread
From: Diego M. Vadell @ 2005-06-11 16:13 UTC (permalink / raw)
  To: linux-raid

Hi,
   A new computer arrived at work with 4 160GB SATA disks. I made a
couple of RAID 1 (mirror) arrays with two disks each, and then joined
them with LVM. Now I have 320GB in my root volume.

My boss asked me to test it, so we all gathered and unplugged the data
cable of one of the disks. I was hoping to see Linux print warnings for
a few seconds, then give up and run the RAID degraded, but it just
hung, repeating disk errors about the just-removed disk:

Jun  9 20:29:24 localhost kernel:  disk 1, wo:0, o:1, dev:sdd2
Jun  9 20:29:55 localhost kernel: nv_sata: Primary device removed
Jun  9 20:30:25 localhost kernel: ata3: command 0x35 timeout, stat 0xd0
host_stat 0x41
Jun  9 20:30:25 localhost kernel: ata3: status=0xd0 { Busy }
Jun  9 20:30:25 localhost kernel: ata3: called with no error (D0)!
Jun  9 20:30:25 localhost kernel: scsi2: ERROR on channel 0, id 0, lun
0, CDB: Write (10) 00 12 a1 89 e1 00 00 08 00
Jun  9 20:30:25 localhost kernel: Current sdc: sense key Medium Error
Jun  9 20:30:25 localhost kernel: Additional sense: Write error - auto
reallocation failed
Jun  9 20:30:25 localhost kernel: end_request: I/O error, dev sdc,
sector 312576481
Jun  9 20:30:25 localhost kernel: ATA: abnormal status 0xD0 on port 0x9E7
Jun  9 20:30:25 localhost last message repeated 2 times
Jun  9 20:30:55 localhost kernel: ata3: command 0x35 timeout, stat 0xd0
host_stat 0x41
Jun  9 20:30:55 localhost kernel: ata3: status=0xd0 { Busy }
Jun  9 20:40:59 localhost kernel: ata3: called with no error (D0)!
Jun  9 20:40:59 localhost kernel: scsi2: ERROR on channel 0, id 0, lun
0, CDB: Write (10) 00 12 a1 89 e2 00 00 07 00
Jun  9 20:40:59 localhost crond(pam_unix)[5681]: session opened for user
root by (uid=0)
Jun  9 20:40:59 localhost kernel: Current sdc: sense key Medium Error
Jun  9 20:40:59 localhost kernel: Additional sense: Write error - auto
reallocation failed
Jun  9 20:40:59 localhost kernel: end_request: I/O error, dev sdc,
sector 312576482
Jun  9 20:40:59 localhost crond(pam_unix)[5687]: session opened for user
root by (uid=0)
Jun  9 20:40:59 localhost kernel: ATA: abnormal status 0xD0 on port 0x9E7
Jun  9 20:40:59 localhost crond(pam_unix)[5680]: session opened for user
root by (uid=0)
Jun  9 20:40:59 localhost kernel: ATA: abnormal status 0xD0 on port 0x9E7
Jun  9 20:40:59 localhost kernel: ATA: abnormal status 0xD0 on port 0x9E7
Jun  9 20:40:59 localhost kernel: ata3: command 0x35 timeout, stat 0xd0
host_stat 0x41
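
The numbers in the log lines above can be cross-checked by hand. The
following is only a sketch, assuming that the kernel prints the
WRITE(10) CDB without its opcode byte (so the nine bytes shown are a
flags byte, a 4-byte big-endian LBA, a group byte, a 2-byte block count
and a control byte), and that the status value uses the standard ATA
status register bits. The function names are made up for this example.

```python
from typing import List, Tuple

# Standard ATA status register bits, highest bit first.
ATA_STATUS_BITS = {
    0x80: "BSY",
    0x40: "DRDY",
    0x20: "DF",
    0x10: "DSC",
    0x08: "DRQ",
    0x04: "CORR",
    0x02: "IDX",
    0x01: "ERR",
}


def decode_write10(cdb_tail: str) -> Tuple[int, int]:
    """Return (lba, block_count) from the nine bytes logged after 'Write (10)'.

    Assumption: the opcode byte is not part of the logged bytes.
    """
    raw = bytes.fromhex(cdb_tail.replace(" ", ""))
    if len(raw) != 9:
        raise ValueError("expected 9 bytes after the opcode")
    lba = int.from_bytes(raw[1:5], "big")    # bytes 2..5 of the full CDB
    count = int.from_bytes(raw[6:8], "big")  # bytes 7..8 of the full CDB
    return lba, count


def decode_ata_status(status: int) -> List[str]:
    """Return the names of the status bits that are set."""
    return [name for bit, name in ATA_STATUS_BITS.items() if status & bit]


if __name__ == "__main__":
    print(decode_write10("00 12 a1 89 e1 00 00 08 00"))  # (312576481, 8)
    print(decode_ata_status(0xD0))                       # ['BSY', 'DRDY', 'DSC']
```

The decoded LBA equals the sector reported by end_request (312576481),
which supports the layout assumption. Command 0x35 is the ATA WRITE DMA
EXT opcode, so the timeouts are writes that the unplugged disk never
completes.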


It can stay forever giving these errors, and it won't time out and run
in degraded mode. Does anybody know why?

I read somewhere that if the lower layer (sata_nv here) retries forever
when it finds it has no communication with the disk, it will never report
that to the md layer, and that may be what is happening. But I'm just a
newbie and I don't know if that applies here.

Some more configuration follows.

Thanks in advance,
 -- Diego.

-------------------------------------------------------

[root@localhost ~]# cat /etc/redhat-release
CentOS release 4.0 (Final)

-------------------------------------------------------

[root@localhost ~]# uname -a
Linux localhost.localdomain 2.6.9-5.0.3.EL #1 Sat Feb 19 15:25:58 CST 2005 
x86_64 x86_64 x86_64 GNU/Linux

-------------------------------------------------------

[root@localhost ~]# lspci
00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller
(rev a3)
00:01.0 ISA bridge: nVidia Corporation: Unknown device 0050 (rev a3)
00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2)
00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a3)
00:04.0 Multimedia audio controller: nVidia Corporation CK804 AC'97
Audio Controller (rev a2)
00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev a2)
00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller
(rev a3)
00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller
(rev a3)
00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev a2)
00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)
00:0b.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0c.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0d.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
05:06.0 VGA compatible controller: Silicon Integrated Systems [SiS]
86C326 5598/6326 (rev 0b)
05:0b.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A
IEEE-1394a-2000 Controller (PHY/Link)

-------------------------------------------------------
lsmod (edited)

dm_snapshot            17833  0
dm_zero                 2753  0
dm_mirror              26105  2
ext3                  139473  2
jbd                    86897  1 ext3
raid1                  24129  3
dm_mod                 65449  5 dm_snapshot,dm_zero,dm_mirror
sata_nv                10565  8
libata                 49481  1 sata_nv
sd_mod                 19265  12
scsi_mod              150449  2 libata,sd_mod
-------------------------------------------------------

[root@localhost ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb2[1] sda2[0]
      156023168 blocks [2/2] [UU]

md2 : active raid1 sdd2[1] sdc2[0]
      156023168 blocks [2/2] [UU]

md0 : active raid1 sdd1[3] sdc1[2] sdb1[1] sda1[0]
      264960 blocks [4/4] [UUUU]

unused devices: <none>
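
For a test like this it helps to check /proc/mdstat programmatically
after pulling the cable. A minimal sketch, assuming the 0.90-style
layout shown above, where each array header is followed by a line such
as "[2/2] [UU]" and a missing member shows up as "_". The function name
and the degraded sample are hypothetical; the healthy sample is copied
from the output above.

```python
import re
from typing import Dict

HEALTHY = """\
Personalities : [raid1]
md1 : active raid1 sdb2[1] sda2[0]
      156023168 blocks [2/2] [UU]

md2 : active raid1 sdd2[1] sdc2[0]
      156023168 blocks [2/2] [UU]

md0 : active raid1 sdd1[3] sdc1[2] sdb1[1] sda1[0]
      264960 blocks [4/4] [UUUU]

unused devices: <none>
"""

# Hypothetical example of what a degraded md2 might look like.
DEGRADED = HEALTHY.replace(
    "md2 : active raid1 sdd2[1] sdc2[0]\n      156023168 blocks [2/2] [UU]",
    "md2 : active raid1 sdd2[1] sdc2[0](F)\n      156023168 blocks [2/1] [_U]",
)


def degraded_arrays(mdstat_text: str) -> Dict[str, str]:
    """Map array name -> member status string for arrays missing a member."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            if "_" in status.group(3):
                result[current] = status.group(3)
            current = None
    return result


if __name__ == "__main__":
    print(degraded_arrays(HEALTHY))
    print(degraded_arrays(DEGRADED))
```

On a live system the text would come from reading /proc/mdstat instead
of the embedded samples.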
-------------------------------------------------------
[root@localhost ~]# mdadm -D /dev/md[012]
/dev/md0:
        Version : 00.90.01
  Creation Time : Thu Jun  9 17:06:18 2005
     Raid Level : raid1
     Array Size : 264960 (258.75 MiB 271.32 MB)
    Device Size : 264960 (258.75 MiB 271.32 MB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sat Jun 11 15:12:21 2005
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0


    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1
           UUID : 07c4b1ae:ca3db1d6:7833754b:22e5b3f0
         Events : 0.126
/dev/md1:
        Version : 00.90.01
  Creation Time : Thu Jun  9 12:05:46 2005
     Raid Level : raid1
     Array Size : 156023168 (148.80 GiB 159.77 GB)
    Device Size : 156023168 (148.80 GiB 159.77 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Sat Jun 11 15:21:36 2005
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0


    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
           UUID : e20bcfe0:17084c56:11607a12:cacafc30
         Events : 0.17840
/dev/md2:
        Version : 00.90.01
  Creation Time : Thu Jun  9 12:05:46 2005
     Raid Level : raid1
     Array Size : 156023168 (148.80 GiB 159.77 GB)
    Device Size : 156023168 (148.80 GiB 159.77 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Sat Jun 11 15:20:36 2005
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0


    Number   Major   Minor   RaidDevice State
       0       8       34        0      active sync   /dev/sdc2
       1       8       50        1      active sync   /dev/sdd2
           UUID : 668b1447:f95d147b:8c8013e2:c6b1a724
         Events : 0.10631
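
The sizes in parentheses follow from the block counts, which are in
1 KiB units. A small sketch of the arithmetic; the function name and
the 1 GiB threshold for switching units are assumptions made for this
example, not mdadm's own rule.

```python
def human_sizes(kib_blocks):
    """Return (binary, decimal) size strings for a size given in 1 KiB blocks."""
    size_bytes = kib_blocks * 1024
    if size_bytes >= 1024 ** 3:  # threshold chosen for this sketch only
        return (
            "%.2f GiB" % (size_bytes / 1024 ** 3),
            "%.2f GB" % (size_bytes / 1000 ** 3),
        )
    return (
        "%.2f MiB" % (size_bytes / 1024 ** 2),
        "%.2f MB" % (size_bytes / 1000 ** 2),
    )


if __name__ == "__main__":
    print(human_sizes(156023168))  # ('148.80 GiB', '159.77 GB')
    print(human_sizes(264960))     # ('258.75 MiB', '271.32 MB')
```

Two such mirrors joined with LVM give about 2 x 159.77 GB = 319.5 GB,
which is the "320GB" root volume mentioned at the top.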

-------------------------------------------------------
dmesg (edited)
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
NFORCE-CK804: IDE controller at PCI slot 0000:00:06.0
NFORCE-CK804: chipset revision 162
NFORCE-CK804: not 100% native mode: will probe irqs later
NFORCE-CK804: 0000:00:06.0 (rev a2) UDMA133 controller
    ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:DMA
    ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:DMA, hdd:DMA
Probing IDE interface ide0...
hdb: SAMSUNG CD-ROM SC-152G, ATAPI CD/DVD-ROM drive
Using cfq io scheduler
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
Probing IDE interface ide1...
Probing IDE interface ide2...
ide2: Wait for ready failed before probe !
Probing IDE interface ide3...
ide3: Wait for ready failed before probe !
Probing IDE interface ide4...
ide4: Wait for ready failed before probe !
Probing IDE interface ide5...
ide5: Wait for ready failed before probe !
hdb: ATAPI 52X CD-ROM drive, 128kB Cache, DMA
Uniform CD-ROM driver Revision: 3.20
ide-floppy driver 0.99.newide
usbcore: registered new driver hiddev
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.0:USB HID core driver
mice: PS/2 mouse device common for all mice
input: AT Translated Set 2 keyboard on isa0060/serio0
input: ImPS/2 Generic Wheel Mouse on isa0060/serio1
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
SCSI subsystem initialized
libata version 1.02 loaded.
sata_nv version 0.03
ACPI: PCI interrupt 0000:00:07.0[A] -> GSI 23 (level, low) -> IRQ 177
PCI: Setting latency timer of device 0000:00:07.0 to 64
ata1: SATA max UDMA/133 cmd 0x9F0 ctl 0xBF2 bmdma 0xD800 irq 177
ata2: SATA max UDMA/133 cmd 0x970 ctl 0xB72 bmdma 0xD808 irq 177
ata1: dev 0 cfg 49:2f00 82:346b 83:7f01 84:4003 85:3c69 86:3c01 87:4003 
88:40ff
ata1: dev 0 ATA, max UDMA7, 312581808 sectors: lba48
nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
ata1: dev 0 configured for UDMA/133
scsi0 : sata_nv
ata2: dev 0 cfg 49:2f00 82:346b 83:7f01 84:4003 85:3c69 86:3c01 87:4003 
88:40ff
ata2: dev 0 ATA, max UDMA7, 312581808 sectors: lba48
nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
ata2: dev 0 configured for UDMA/133
scsi1 : sata_nv
  Vendor: ATA       Model: SAMSUNG SP1614C   Rev: SW10
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB)
SCSI device sda: drive cache: write back
 sda:<4>nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
 sda1 sda2
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
  Vendor: ATA       Model: SAMSUNG SP1614C   Rev: SW10
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sdb: 312581808 512-byte hdwr sectors (160042 MB)
SCSI device sdb: drive cache: write back
 sdb:<4>nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
 sdb1 sdb2
Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
ACPI: PCI interrupt 0000:00:08.0[A] -> GSI 22 (level, low) -> IRQ 185
PCI: Setting latency timer of device 0000:00:08.0 to 64
ata3: SATA max UDMA/133 cmd 0x9E0 ctl 0xBE2 bmdma 0xC400 irq 185
ata4: SATA max UDMA/133 cmd 0x960 ctl 0xB62 bmdma 0xC408 irq 185
ata3: dev 0 cfg 49:2f00 82:346b 83:7f01 84:4003 85:3c69 86:3c01 87:4003 
88:40ff
ata3: dev 0 ATA, max UDMA7, 312581808 sectors: lba48
nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
ata3: dev 0 configured for UDMA/133
scsi2 : sata_nv
ata4: dev 0 cfg 49:2f00 82:346b 83:7f21 84:4003 85:3469 86:3c01 87:4003 
88:003f
ata4: dev 0 ATA, max UDMA/100, 312581808 sectors: lba48
nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
ata4: dev 0 configured for UDMA/100
scsi3 : sata_nv
  Vendor: ATA       Model: SAMSUNG SP1614C   Rev: SW10
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sdc: 312581808 512-byte hdwr sectors (160042 MB)
SCSI device sdc: drive cache: write back
 sdc:<4>nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
 sdc1 sdc2
Attached scsi disk sdc at scsi2, channel 0, id 0, lun 0
  Vendor: ATA       Model: WDC WD1600JD-00G  Rev: 02.0
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sdd: 312581808 512-byte hdwr sectors (160042 MB)
SCSI device sdd: drive cache: write back
 sdd:<4>nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
nv_sata: Primary device added
nv_sata: Primary device removed
nv_sata: Secondary device added
nv_sata: Secondary device removed
 sdd1 sdd2
Attached scsi disk sdd at scsi3, channel 0, id 0, lun 0
device-mapper: 4.1.0-ioctl (2003-12-10) initialised: dm@uk.sistina.com
md: raid1 personality registered as nr 3
md: Autodetecting RAID arrays.
md: autorun ...
md: considering sdd2 ...
md:  adding sdd2 ...
md: sdd1 has different UUID to sdd2
md:  adding sdc2 ...
md: sdc1 has different UUID to sdd2
md: sdb2 has different UUID to sdd2
md: sdb1 has different UUID to sdd2
md: sda2 has different UUID to sdd2
md: sda1 has different UUID to sdd2
md: created md2
md: bind<sdc2>
md: bind<sdd2>
md: running: <sdd2><sdc2>
raid1: raid set md2 active with 2 out of 2 mirrors
md: considering sdd1 ...
md:  adding sdd1 ...
md:  adding sdc1 ...
md: sdb2 has different UUID to sdd1
md:  adding sdb1 ...
md: sda2 has different UUID to sdd1
md:  adding sda1 ...
md: created md0
md: bind<sda1>
md: bind<sdb1>
md: bind<sdc1>
md: bind<sdd1>
md: running: <sdd1><sdc1><sdb1><sda1>
raid1: raid set md0 active with 4 out of 4 mirrors
md: considering sdb2 ...
md:  adding sdb2 ...
md:  adding sda2 ...
md: created md1
md: bind<sda2>
md: bind<sdb2>
md: running: <sdb2><sda2>
raid1: raid set md1 active with 2 out of 2 mirrors
md: ... autorun DONE.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.

----------------------------------------------------------

More in /var/log/messages
Jun  9 20:29:24 localhost kernel:  disk 1, wo:0, o:1, dev:sdd2
Jun  9 20:29:55 localhost kernel: nv_sata: Primary device removed
Jun  9 20:30:25 localhost kernel: ata3: command 0x35 timeout, stat 0xd0
host_stat 0x41
Jun  9 20:30:25 localhost kernel: ata3: status=0xd0 { Busy }
Jun  9 20:30:25 localhost kernel: ata3: called with no error (D0)!
Jun  9 20:30:25 localhost kernel: scsi2: ERROR on channel 0, id 0, lun
0, CDB: Write (10) 00 12 a1 89 e1 00 00 08 00
Jun  9 20:30:25 localhost kernel: Current sdc: sense key Medium Error
Jun  9 20:30:25 localhost kernel: Additional sense: Write error - auto
reallocation failed
Jun  9 20:30:25 localhost kernel: end_request: I/O error, dev sdc,
sector 312576481
Jun  9 20:30:25 localhost kernel: ATA: abnormal status 0xD0 on port 0x9E7
Jun  9 20:30:25 localhost last message repeated 2 times
Jun  9 20:30:55 localhost kernel: ata3: command 0x35 timeout, stat 0xd0
host_stat 0x41
Jun  9 20:30:55 localhost kernel: ata3: status=0xd0 { Busy }
Jun  9 20:40:59 localhost kernel: ata3: called with no error (D0)!
Jun  9 20:40:59 localhost crond(pam_unix)[5686]: session opened for user
root by (uid=0)
Jun  9 20:40:59 localhost crond(pam_unix)[5685]: session opened for user
root by (uid=0)
Jun  9 20:40:59 localhost kernel: scsi2: ERROR on channel 0, id 0, lun
0, CDB: Write (10) 00 12 a1 89 e2 00 00 07 00
Jun  9 20:40:59 localhost crond(pam_unix)[5681]: session opened for user
root by (uid=0)
Jun  9 20:40:59 localhost kernel: Current sdc: sense key Medium Error
Jun  9 20:40:59 localhost kernel: Additional sense: Write error - auto
reallocation failed
Jun  9 20:40:59 localhost kernel: end_request: I/O error, dev sdc,
sector 312576482
Jun  9 20:40:59 localhost crond(pam_unix)[5687]: session opened for user
root by (uid=0)
Jun  9 20:40:59 localhost kernel: ATA: abnormal status 0xD0 on port 0x9E7
Jun  9 20:40:59 localhost crond(pam_unix)[5680]: session opened for user
root by (uid=0)
Jun  9 20:40:59 localhost kernel: ATA: abnormal status 0xD0 on port 0x9E7
Jun  9 20:40:59 localhost kernel: ATA: abnormal status 0xD0 on port 0x9E7
Jun  9 20:40:59 localhost kernel: ata3: command 0x35 timeout, stat 0xd0
host_stat 0x41
Jun  9 20:40:59 localhost kernel: ata3: status=0xd0 { Busy }
Jun  9 20:40:59 localhost kernel: ata3: called with no error (D0)!
Jun  9 20:40:59 localhost kernel: scsi2: ERROR on channel 0, id 0, lun
0, CDB: Write (10) 00 12 a1 89 e3 00 00 06 00
Jun  9 20:40:59 localhost kernel: Current sdc: sense key Medium Error
Jun  9 20:40:59 localhost kernel: Additional sense: Write error - auto
reallocation failed
Jun  9 20:40:59 localhost kernel: end_request: I/O error, dev sdc,
sector 312576483


* Re: sata_nv and RAID1
  2005-06-11 16:13 sata_nv and RAID1 Diego M. Vadell
@ 2005-06-11 19:26 ` Jeff Garzik
  2005-06-11 20:29   ` Michael Tokarev
  0 siblings, 1 reply; 37+ messages in thread
From: Jeff Garzik @ 2005-06-11 19:26 UTC (permalink / raw)
  To: Diego M. Vadell; +Cc: linux-raid

On Sat, Jun 11, 2005 at 04:13:42PM +0000, Diego M. Vadell wrote:
> Hi,
>    A new computer arrived at work with 4 160GB SATA disks. I made a
> couple of RAID 1 (mirror) with two disks each, and then joined them with
> LVM. Now I have 320GB in my root volume.
> 
> My boss asked me to test it, so we all gathered and unplugged the data
> cable of one of the disks. I was hoping to see linux making warnings for

Hotplug is not supported yet.  Don't do that :)

	Jeff





* Re: sata_nv and RAID1
  2005-06-11 19:26 ` Jeff Garzik
@ 2005-06-11 20:29   ` Michael Tokarev
  2005-06-13  3:15     ` Diego M. Vadell
  2005-06-13  6:45     ` Jeff Garzik
  0 siblings, 2 replies; 37+ messages in thread
From: Michael Tokarev @ 2005-06-11 20:29 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Diego M. Vadell, linux-raid

Jeff Garzik wrote:
> On Sat, Jun 11, 2005 at 04:13:42PM +0000, Diego M. Vadell wrote:
> 
>>Hi,
>>   A new computer arrived at work with 4 160GB SATA disks. I made a
>>couple of RAID 1 (mirror) with two disks each, and then joined them with
>>LVM. Now I have 320GB in my root volume.
>>
>>My boss asked me to test it, so we all gathered and unplugged the data
>>cable of one of the disks. I was hoping to see linux making warnings for
> 
> Hotplug is not supported yet.  Don't do that :)

It isn't hotPLUG -- it's hotUNplug.  Happens when drive is dying for
example, or when the cable is flaky, or due to millions of other
reasons...  And.. I for one expect linux to react to such a situation
*somehow* - after all, raid is used for this very stuff too, to be
able to continue running a system if one of the drives failed...

/mjt


* Re: sata_nv and RAID1
  2005-06-11 20:29   ` Michael Tokarev
@ 2005-06-13  3:15     ` Diego M. Vadell
  2005-06-13  6:45     ` Jeff Garzik
  1 sibling, 0 replies; 37+ messages in thread
From: Diego M. Vadell @ 2005-06-13  3:15 UTC (permalink / raw)
  To: linux-raid

On Saturday 11 June 2005 17:29, Michael Tokarev wrote:
> Jeff Garzik wrote:
> > On Sat, Jun 11, 2005 at 04:13:42PM +0000, Diego M. Vadell wrote:
> >>Hi,
> >>   A new computer arrived at work with 4 160GB SATA disks. I made a
> >>couple of RAID 1 (mirror) with two disks each, and then joined them with
> >>LVM. Now I have 320GB in my root volume.
> >>
> >>My boss asked me to test it, so we all gathered and unplugged the data
> >>cable of one of the disks. I was hoping to see linux making warnings for
> >
> > Hotplug is not supported yet.  Don't do that :)
>
> It isn't hotPLUG -- it's hotUNplug.  Happens when drive is dying for
> example, or when the cable is flaky, or due to millions of other
> reasons...  And.. I for one expect linux to react to such a situation
> *somehow* - after all, raid is used for this very stuff too, to be
> able to continue running a system if one of the drives failed...
>
> /mjt

So I thought... Whose fault is it? Is it something missing from sata_nv or
from md? How hard could it be to implement (I don't know a thing, but maybe
it's just a matter of returning an error somewhere, as I don't want it to
retry or anything, just to inform the md layer that it has to forget about
that disk)? Has anybody started to do this?

Thanks a lot,
 -- Diego.


* Re: sata_nv and RAID1
  2005-06-11 20:29   ` Michael Tokarev
  2005-06-13  3:15     ` Diego M. Vadell
@ 2005-06-13  6:45     ` Jeff Garzik
  2005-06-13 11:57       ` Michael Tokarev
  1 sibling, 1 reply; 37+ messages in thread
From: Jeff Garzik @ 2005-06-13  6:45 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: Diego M. Vadell, linux-raid

Michael Tokarev wrote:
> Jeff Garzik wrote:
> 
>> On Sat, Jun 11, 2005 at 04:13:42PM +0000, Diego M. Vadell wrote:
>>
>>> Hi,
>>>   A new computer arrived at work with 4 160GB SATA disks. I made a
>>> couple of RAID 1 (mirror) with two disks each, and then joined them with
>>> LVM. Now I have 320GB in my root volume.
>>>
>>> My boss asked me to test it, so we all gathered and unplugged the data
>>> cable of one of the disks. I was hoping to see linux making warnings for
>>
>>
>> Hotplug is not supported yet.  Don't do that :)
> 
> 
> It isn't hotPLUG -- it's hotUNplug.  Happens when drive is dying for

Same difference to me:  both require new code.

	Jeff





* Re: sata_nv and RAID1
  2005-06-13  6:45     ` Jeff Garzik
@ 2005-06-13 11:57       ` Michael Tokarev
  2005-06-13 12:27         ` Peter T. Breuer
  2005-06-14 21:11         ` Molle Bestefich
  0 siblings, 2 replies; 37+ messages in thread
From: Michael Tokarev @ 2005-06-13 11:57 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Diego M. Vadell, linux-raid

Jeff Garzik wrote:
[linux I/O layer loops forever on SATA drive unplug;
  SATA hotplug is unsupported yet]

>> It isn't hotPLUG -- it's hotUNplug.  Happens when drive is dying for
> 
> Same difference to me:  both require new code.

Jeff,

I didn't want to blame you or anyone else (just in case
that wasn't clear).  Instead, I just wanted to understand
the current state of the whole thing.  I know SATA
hotplug is unsupported, and some code has to be written for
that to work.  But I don't know whether hotUNplugging and error
handling come together.  That is, is there a difference
between a real drive failure (and oh, there are a lot of
failure scenarios too, from a bad block, through a drive dying
completely during normal operation as if there was no drive at
all, up to unplugging the cable by mistake) and such
hot-UN-plugging?  Will the current code notice and properly propagate
I/O errors on the drive, or the drive dying?  If some errors are
propagated properly now, where's the "boundary" between I/O
errors (implemented) and hotplug (not implemented)?

This is all quite important IMHO.  Without proper error handling
(if I/O errors are "blocked" by that "boundary" too), the Linux SATA
subsystem isn't ready for production, and people should not
rely on it *now*.

/mjt


* Re: sata_nv and RAID1
  2005-06-13 11:57       ` Michael Tokarev
@ 2005-06-13 12:27         ` Peter T. Breuer
  2005-06-13 14:40           ` Diego M. Vadell
  2005-06-14 21:11         ` Molle Bestefich
  1 sibling, 1 reply; 37+ messages in thread
From: Peter T. Breuer @ 2005-06-13 12:27 UTC (permalink / raw)
  To: linux-raid

Michael Tokarev <mjt@tls.msk.ru> wrote:
> Jeff Garzik wrote:
> [linux I/O layer loops forever on SATA drive unplug;
>   SATA hotplug is unsupported yet]

> >> It isn't hotPLUG -- it's hotUNplug.  Happens when drive is dying for
> > 
> > Same difference to me:  both require new code.

> I didn't want to blame you or anyone else (just in case
> if that wasn't clear).  Instead, I just wanted to understand
> what's the current state of the whole thing.  I know SATA
> hotplug is unsupported, and some code has to be written for
> that to work.  But I don't know if hotUNplugging and error
> handling comes together.  That is, is there a difference

I don't know either. For the FR1 code I implemented three new ioctls,
all of them sent out by the FR1 (raid1) driver.

  1) notify component that it is in an array, and which one
  2) notify component that it is no longer in an array, and which one
  3) send component a callback function through which it can
     SET_FAULTY and re-HOTADD itself to the array it knows it is in
     as need be.

Maybe hotplugging has those facilities. I don't know.

Cooperating devices would have to implement the ioctls.
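
The three notifications listed above can be sketched as a small
user-space model. This is not the FR1 code and not a kernel interface:
the class and method names are hypothetical, and plain method calls
stand in for the ioctls. It only shows the direction of each call.

```python
class Component:
    """Stands in for a cooperating device driver (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.array = None      # set by notification (1), cleared by (2)
        self.callback = None   # delivered by notification (3)

    def notify_in_array(self, array_name):      # models ioctl (1)
        self.array = array_name

    def notify_not_in_array(self, array_name):  # models ioctl (2)
        self.array = None
        self.callback = None

    def set_callback(self, callback):           # models ioctl (3)
        self.callback = callback

    def device_lost(self):
        """The driver noticed its device is gone: ask to be set faulty."""
        if self.callback:
            self.callback(self.name, "SET_FAULTY")

    def device_back(self):
        """The driver noticed its device is well again: ask to be re-added."""
        if self.callback:
            self.callback(self.name, "HOTADD")


class Array:
    """Stands in for the raid1 driver that sends the notifications."""

    def __init__(self, name):
        self.name = name
        self.state = {}  # component name -> "active" or "faulty"

    def add(self, component):
        self.state[component.name] = "active"
        component.notify_in_array(self.name)
        component.set_callback(self._on_request)

    def remove(self, component):
        del self.state[component.name]
        component.notify_not_in_array(self.name)

    def _on_request(self, component_name, request):
        if component_name not in self.state:
            return
        if request == "SET_FAULTY":
            self.state[component_name] = "faulty"
        elif request == "HOTADD":
            self.state[component_name] = "active"  # a resync would follow


if __name__ == "__main__":
    md2 = Array("md2")
    sdc2, sdd2 = Component("sdc2"), Component("sdd2")
    md2.add(sdc2)
    md2.add(sdd2)
    sdc2.device_lost()
    print(md2.state)
    sdc2.device_back()
    print(md2.state)
```

The point of the design is that the component, not the array, decides
when it is faulty, because only the device driver can see that the
device has gone away.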

Peter



* Re: sata_nv and RAID1
  2005-06-13 12:27         ` Peter T. Breuer
@ 2005-06-13 14:40           ` Diego M. Vadell
  2005-06-13 16:07             ` Peter T. Breuer
  0 siblings, 1 reply; 37+ messages in thread
From: Diego M. Vadell @ 2005-06-13 14:40 UTC (permalink / raw)
  To: Peter T. Breuer; +Cc: linux-raid

On Monday 13 June 2005 09:27, Peter T. Breuer wrote:

> I don't know either. For the FR1 code I implemented three new ioctls ..
> all of them sent out by the FR1 (raid1) driver.
>
>   1) notify component that it is in an array and which
>   2) notify component that it is no longer in an array and which
>   3) send component a callback function through which it can
>      SET_FAULTY and re-HOTADD itself to the array it knows it is in
>      as need be.
>
> Maybe hotplugging has those facilities. I don't know.
>
> Cooperating devices would have to implement the ioctls.
>
> Peter

Hi Peter,
   If I understand right, even if I used FR1, it won't pass the test
(unplugging the cable). Notice that I'm not interested in hotplugging, but
in hotUNplugging, as Michael Tokarev said: unplug the cable, Linux starts
using a degraded mirror, and later I can shut down the server, plug the
disk back in and let it resync.

Thanks,
 -- Diego.


* Re: sata_nv and RAID1
  2005-06-13 14:40           ` Diego M. Vadell
@ 2005-06-13 16:07             ` Peter T. Breuer
  2005-06-13 16:51               ` Diego M. Vadell
  0 siblings, 1 reply; 37+ messages in thread
From: Peter T. Breuer @ 2005-06-13 16:07 UTC (permalink / raw)
  To: linux-raid

Diego M. Vadell <dvadell@lantech.com.ar> wrote:
> On Monday 13 June 2005 09:27, Peter T. Breuer wrote:

> > I don't know either. For the FR1 code I implemented three new ioctls ..
> > all of them sent out by the FR1 (raid1) driver.
> >
> >   1) notify component that it is in an array and which
> >   2) notify component that it is no longer in an array and which
> >   3) send component a callback function through which it can
> >      SET_FAULTY and re-HOTADD itself to the array it knows it is in
> >      as need be.
> >
> > Maybe hotplugging has those facilities. I don't know.
> >
> > Cooperating devices would have to implement the ioctls.



>    If I understand right, even if I used FR1, it won't pass the test

Yes it will. The device driver will detect something wrong (if the
device driver doesn't know, NOBODY does) and call back to the raid array
driver to say "set me faulty".

That's the whole idea.

When the device driver senses its device is well again, it will call
back and say "hot add me again".

Peter



* Re: sata_nv and RAID1
  2005-06-13 16:07             ` Peter T. Breuer
@ 2005-06-13 16:51               ` Diego M. Vadell
  2005-06-13 17:59                 ` Jeff Garzik
  2005-06-13 19:00                 ` Peter T. Breuer
  0 siblings, 2 replies; 37+ messages in thread
From: Diego M. Vadell @ 2005-06-13 16:51 UTC (permalink / raw)
  To: Peter T. Breuer; +Cc: linux-raid

On Monday 13 June 2005 13:07, Peter T. Breuer wrote:
> > > I don't know either. For the FR1 code I implemented three new ioctls ..
> > > all of them sent out by the FR1 (raid1) driver.
> > >
> > >   1) notify component that it is in an array and which
> > >   2) notify component that it is no longer in an array and which
> > >   3) send component a callback function through which it can
> > >      SET_FAULTY and re-HOTADD itself to the array it knows it is in
> > >      as need be.
> > >
> > > Maybe hotplugging has those facilities. I don't know.
> > >
> > > Cooperating devices would have to implement the ioctls.
> >
> >    If I understand right, even if I used FR1, it won't pass the test
>
> Yes it will. The device driver will detect something wrong (if the
> device driver doesn't know, NOBODY does) and call back to the raid array
> driver to say "set me faulty".
>
> That's the whole idea.
>
> When the device driver senses its device is well again, it will call
> back and say "hot add me again".
>
> Peter

But not as it is today... When you say "Cooperating devices would have to
implement the ioctls", that means I have to touch sata_nv's source code to
implement those ioctls, am I right?
If that is all I have to do, I will give it a try (supposing my boss does
not make me use the RAID on the motherboard).

Thanks a lot,
 -- Diego.


* Re: sata_nv and RAID1
  2005-06-13 16:51               ` Diego M. Vadell
@ 2005-06-13 17:59                 ` Jeff Garzik
  2005-06-13 21:00                   ` Diego M. Vadell
  2005-06-13 19:00                 ` Peter T. Breuer
  1 sibling, 1 reply; 37+ messages in thread
From: Jeff Garzik @ 2005-06-13 17:59 UTC (permalink / raw)
  To: Diego M. Vadell; +Cc: Peter T. Breuer, linux-raid

Diego M. Vadell wrote:
> But not as it is today... when you say "Cooperating devices would have to 
> implement the ioctls." means that I have to touch sata_nv's source code to 
> implement those ioctls, am I right?
> If that is all I have to do, I will give it a try (supposing my boss does not 
> make me use the raid in the motherboard) . 


The task is to update sata_nv to notify libata-core that a device has 
disappeared.  libata-core then notifies the SCSI layer of this.  No new 
ioctls need to be supported.
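
The chain described above (sata_nv tells libata-core, libata-core tells
the SCSI layer) can be modelled in a few lines. This is a user-space
sketch, not kernel code: every class and method name is hypothetical
and none of them is a real libata or SCSI function. The last step,
where a write to the missing device fails at once so that an upper
layer such as md could see the error, is an assumption of this sketch.

```python
class ScsiLayer:
    """Hypothetical stand-in for the SCSI mid-layer."""

    def __init__(self):
        self.offline = set()

    def device_gone(self, dev):
        self.offline.add(dev)

    def submit_write(self, dev, sector):
        # Assumption: once a device is known to be gone, I/O fails at
        # once instead of timing out and being retried.
        if dev in self.offline:
            raise OSError("I/O error, dev %s, sector %d" % (dev, sector))
        return "ok"


class LibataCore:
    """Hypothetical stand-in for libata-core."""

    def __init__(self, scsi):
        self.scsi = scsi
        self.port_to_dev = {}

    def attach(self, port, dev):
        self.port_to_dev[port] = dev

    def port_device_removed(self, port):
        dev = self.port_to_dev.pop(port, None)
        if dev is not None:
            self.scsi.device_gone(dev)


class SataNvDriver:
    """Hypothetical stand-in for the low-level driver."""

    def __init__(self, core):
        self.core = core

    def on_device_removed(self, port):
        # Today the driver only logs "Primary device removed"; the task
        # is to pass the event on to the core as well.
        self.core.port_device_removed(port)


if __name__ == "__main__":
    scsi = ScsiLayer()
    core = LibataCore(scsi)
    driver = SataNvDriver(core)
    core.attach("ata3", "sdc")
    print(scsi.submit_write("sdc", 312576481))
    driver.on_device_removed("ata3")
    try:
        scsi.submit_write("sdc", 312576482)
    except OSError as err:
        print(err)
```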

	Jeff




* Re: sata_nv and RAID1
  2005-06-13 16:51               ` Diego M. Vadell
  2005-06-13 17:59                 ` Jeff Garzik
@ 2005-06-13 19:00                 ` Peter T. Breuer
  2005-06-13 20:41                   ` Raz Ben-Jehuda(caro)
  2005-06-13 21:16                   ` Diego M. Vadell
  1 sibling, 2 replies; 37+ messages in thread
From: Peter T. Breuer @ 2005-06-13 19:00 UTC (permalink / raw)
  To: linux-raid

Diego M. Vadell <dvadell@lantech.com.ar> wrote:
> On Monday 13 June 2005 13:07, Peter T. Breuer wrote:
> > > > I don't know either. For the FR1 code I implemented three new ioctls ..
> > > > all of them sent out by the FR1 (raid1) driver.
> > > >
> > > >   1) notify component that it is in an array and which
> > > >   2) notify component that it is no longer in an array and which
> > > >   3) send component a callback function through which it can
> > > >      SET_FAULTY and re-HOTADD itself to the array it kno it is in
> > > >      as need be.
> > > >
> > > > Maybe hotplugging has those facilities. I don't know.
> > > >
> > > > Cooperating devices would have to implement the ioctls.
> > >
> > >    If I understand right, even if I used FR1, it wont pass the test
> >
> > Yes it will. The device driver will detect something wrong (if the
> > device driver doesn't know, NOBODY does) and call back to the raid array
> > driver to say "set me faulty".
> >
> > That's the whole idea.
> >
> > When the device driver senses its device is well again, it will call
> > back and say "hot add me again".

> But not as it is today...

Yes, "as it is today".

> when you say "Cooperating devices would have to 
> implement the ioctls." means that I have to touch sata_nv's source code to 
> implement those ioctls, am I right?

One has to implement or have implemented those ioctls in the driver of
whichever device you are interested in, in order to cooperate properly
with fr1 (in that respect).  That goes without saying.  I merely provided
the infrastructure in fr1 - indeed, I could not have provided anything
else because I do not control the code for anything else. Fr1 will send
the ioctls I listed above (or sketched, rather) to any component device.
It is up to that component device to make use of them.

There may well be another scheme/architecture already available. I
don't know. In my abject ignorance I simply implemented an adequate
scheme for my purposes, and waited for anyone to tell me of a better
one.  If hotplugging is supported by the target device, it may well be
the case that you could implement at least ioctl (3) via hotplugging.

Ioctls (1) and (2) are merely informative. They courteously let the
target device know that it has been included in (or excluded from) a
particular md array. This allows the target device to make any mode
shifts that may be appropriate to running in a raid array, such as
exposing errors immediately to the array instead of blocking and trying
again internally (or vice versa). It also allows the target device
to decide when, or whether, to use the callback function provided to
it in ioctl (3), with which it can report its current state of health
to the raid arrays it has been told it belongs to.

> If that is all I have to do, I will give it a try (supposing my boss does not 
> make me use the raid in the motherboard) . 

The enbd code (ftp://oboe.it.uc3m.es/pub/Programs/enbd-2.4.32pre.tgz)
implements the ioctls in question.

If you know of another scheme, please feel free to tell me about it.

My idea was that the target device should proactively inform the array
whether it is in good or bad health. This implies that it should know in
which array it is included, in order to know who is interested in
knowing its health. Hence ioctls (1) and (2) which tell it who wishes
to be informed about its health. And ioctl (3) which gives it a
telephone number to use.
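
As a rough illustration of the scheme just described, the following
userspace C sketch models the three notifications as plain function
calls. It is not the FR1 or enbd code: every name in it (struct
component, component_join(), array_health_cb() and so on) is
hypothetical, and real ioctl numbers and locking are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical events a component can report back to its array. */
enum health_event { COMPONENT_FAULTY, COMPONENT_HOTADD };

struct array;
typedef void (*health_cb)(struct array *a, int slot, enum health_event ev);

/* Stand-in for a two-disk mirror; in_sync[i] == 1 means slot i is active. */
struct array {
    int in_sync[2];
};

/* Stand-in for the per-device state kept by a component's driver. */
struct component {
    struct array *owner;   /* set by "ioctl 1", cleared by "ioctl 2" */
    int slot;
    health_cb notify;      /* the "telephone number" from "ioctl 3" */
};

/* Array side of the callback: apply SET_FAULTY or HOTADD to one slot. */
static void array_health_cb(struct array *a, int slot, enum health_event ev)
{
    assert(a != NULL && slot >= 0 && slot < 2);
    a->in_sync[slot] = (ev == COMPONENT_HOTADD);
}

/* "ioctl 1" and "ioctl 3": tell the component its array and how to call back. */
static void component_join(struct component *c, struct array *a,
                           int slot, health_cb cb)
{
    assert(c != NULL && a != NULL && cb != NULL);
    c->owner = a;
    c->slot = slot;
    c->notify = cb;
    a->in_sync[slot] = 1;
}

/* "ioctl 2": the component is no longer part of any array. */
static void component_leave(struct component *c)
{
    assert(c != NULL);
    c->owner = NULL;
    c->notify = NULL;
}

/* Called by the component's driver when its health changes.
 * Returns 1 if an array was notified, 0 if nobody is listening. */
static int component_report(struct component *c, enum health_event ev)
{
    assert(c != NULL);
    if (c->owner == NULL || c->notify == NULL)
        return 0;
    c->notify(c->owner, c->slot, ev);
    return 1;
}
```

A driver that notices its device has gone away would call
component_report(c, COMPONENT_FAULTY), and COMPONENT_HOTADD once the
device is healthy again; a component that was never told about an
array simply has nobody to call.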

Ioctl (3) could be implemented the other way round - that is, it could
be simply the md array which receives an existing SETFAULTY or HOTADD
ioctl. I don't know why I chose to send across a callback function to
the target device instead. Probably because I was aware of locking
difficulties.

Peter

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: sata_nv and RAID1
  2005-06-13 19:00                 ` Peter T. Breuer
@ 2005-06-13 20:41                   ` Raz Ben-Jehuda(caro)
  2005-06-13 21:16                   ` Diego M. Vadell
  1 sibling, 0 replies; 37+ messages in thread
From: Raz Ben-Jehuda(caro) @ 2005-06-13 20:41 UTC (permalink / raw)
  To: Peter T. Breuer; +Cc: linux-raid

Well, I have the same problem: 2 SATA disks and RAID1.
I tried it with two ramdisks and it worked great, but when it comes to SATA
it fails.

On 6/13/05, Peter T. Breuer <ptb@lab.it.uc3m.es> wrote:
> Diego M. Vadell <dvadell@lantech.com.ar> wrote:
> > On Monday 13 June 2005 13:07, Peter T. Breuer wrote:
> > > > > I don't know either. For the FR1 code I implemented three new ioctls ..
> > > > > all of them sent out by the FR1 (raid1) driver.
> > > > >
> > > > >   1) notify component that it is in an array and which
> > > > >   2) notify component that it is no longer in an array and which
> > > > >   3) send component a callback function through which it can
> > > > >      SET_FAULTY and re-HOTADD itself to the array it kno it is in
> > > > >      as need be.
> > > > >
> > > > > Maybe hotplugging has those facilities. I don't know.
> > > > >
> > > > > Cooperating devices would have to implement the ioctls.
> > > >
> > > >    If I understand right, even if I used FR1, it wont pass the test
> > >
> > > Yes it will. The device driver will detect something wrong (if the
> > > device driver doesn't know, NOBODY does) and call back to the raid array
> > > driver to say "set me faulty".
> > >
> > > That's the whole idea.
> > >
> > > When the device driver senses its device is well again, it will call
> > > back and say "hot add me again".
> 
> > But not as it is today...
> 
> 
> Yes, "as it is today".
> 
> > when you say "Cooperating devices would have to
> > implement the ioctls." means that I have to touch sata_nv's source code to
> > implement those ioctls, am I right?
> 
> One has to implement or have implemented those ioctls in the driver of
> whichever device you are interested in, in order to cooperate properly
> with fr1 (in that respect).  That goes without saying.  I merely provided
> the infrastructure in fr1 - indeed, I could not have provided anything
> else because I do not control the code for anything else. Fr1 will send
> the ioctls I listed above (or sketched, rather) to any component device.
> It is up to that component device to make use of them.
> 
> There may well be another scheme/architecture already available. I
> don't know. In my abject ignorance I simply implemented an adequate
> scheme for my purposes, and waited for anyone to tell me of a better
> one.  If hotplugging is supported by the target device, it may well be
> the case that you could implement at least ioctl (3) via hotplugging.
> 
> Ioctls (1) and (2) are merely informative. They courteously let the
> target device know that it has been included in (or excluded from) a
> particular md array. This allows the target device to make any mode
> shifts that may be appropriate to running in a raid array, such as
> exposing errors immediately to the array instead of blocking and trying
> again internally (or vice versa -). It also allows the target device
> to decide when or if to make use of the callback function provided to
> it in ioctl (3), with which it can tell the raid arrays in which it has
> been informed that it has been included of its current state of health.
> 
> 
> > If that is all I have to do, I will give it a try (supposing my boss does not
> > make me use the raid in the motherboard) .
> 
> The enbd code (ftp://oboe.it.uc3m.es/pub/Programs/enbd-2.4.32pre.tgz)
> implements the ioctls in question.
> 
> If you know of another scheme, please feel free to tell me about it.
> 
> My idea was that the target device should preemptively inform the array
> if it is in good or bad health. This implies that it should know in
> which array it is included, in order to know who is interested in
> knowing its health. Hence ioctls (1) and (2) which tell it who wishes
> to be informed about its health. And ioctl (3) which gives it a
> telephone number to use.
> 
> Ioctl (3) could be implemented the other way round - that is, it could
> be simply the md array which receives an existing SETFAULTY or HOTADD
> ioctl. I don't know why I chose to send across a callback function to
> the target device instead. Probably because I was aware of locking
> difficulties.
> 
> 
> Peter
> 


-- 
Raz
Long Live the Penguin

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: sata_nv and RAID1
  2005-06-13 17:59                 ` Jeff Garzik
@ 2005-06-13 21:00                   ` Diego M. Vadell
  2005-06-13 21:20                     ` Jeff Garzik
  0 siblings, 1 reply; 37+ messages in thread
From: Diego M. Vadell @ 2005-06-13 21:00 UTC (permalink / raw)
  To: Jeff Garzik, linux-raid

On Monday 13 June 2005 14:59, you wrote:
>
> The task is to update sata_nv to notify libata-core that a device has
> disappeared.  libata-core then notifies the SCSI layer of this.  No new
> ioctls need to be supported.
>
> 	Jeff

Hi Jeff,
   Thank you for your answers. After reading a little of
http://www.kernel.org/pub/linux/kernel/people/jgarzik/libata/libata.pdf and
drivers/scsi/sata_nv.c, it seems to me that I have to add a call to
ata_port_disable() in sata_nv.c:nv_check_hotplug().
   In sata_nv.c, nv_check_hotplug() is called from nv_interrupt(), which
seems to be the interrupt handler. I added the call to ata_port_disable(ap),
taking ap from the ata_host_set structure, but that structure seems to be
able to hold many ports (ports is an array).
   Question: is it OK to set ap to host_set->ports[0], or do I have to find out
which ata_port is the one that has been unplugged?

The only change so far looks like this. It compiles cleanly, but I will only
have the hardware to test it tomorrow.

static void nv_check_hotplug(struct ata_host_set *host_set)
{
        u8 intr_status;
        struct ata_port *ap;

        // Get the ATA Port to be disabled if hot-removed
        ap = host_set->ports[0];

        intr_status = inb(host_set->ports[0]->ioaddr.scr_addr +
                          NV_INT_STATUS);

        // Clear interrupt status.
        outb(0xff, host_set->ports[0]->ioaddr.scr_addr + NV_INT_STATUS);

        if (intr_status & NV_INT_STATUS_HOTPLUG) {
                if (intr_status & NV_INT_STATUS_PDEV_ADDED)
                        printk(KERN_WARNING "nv_sata: "
                                "Primary device added\n");

                if (intr_status & NV_INT_STATUS_PDEV_REMOVED) {
                        printk(KERN_WARNING "nv_sata: "
                                "Primary device removed\n");
                        ata_port_disable(ap);
                }

                if (intr_status & NV_INT_STATUS_SDEV_ADDED)
                        printk(KERN_WARNING "nv_sata: "
                                "Secondary device added\n");

                if (intr_status & NV_INT_STATUS_SDEV_REMOVED) {
                        printk(KERN_WARNING "nv_sata: "
                                "Secondary device removed\n");
                        ata_port_disable(ap);
                }
        }
}
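
On the question of which port to pick: one possible reading is that the
"primary" status bits refer to ports[0] and the "secondary" bits to
ports[1]. That mapping is an assumption, not something confirmed in this
thread, and the bit values below are placeholders rather than the
driver's real definitions. A minimal userspace sketch of choosing the
port index from the status byte, instead of always using ports[0]:

```c
#include <assert.h>

/* Placeholder bit values for illustration only; the real definitions
 * live in drivers/scsi/sata_nv.c and may differ. */
#define EX_PDEV_REMOVED 0x08
#define EX_SDEV_REMOVED 0x80

/* Fill out[] with the indices of the ports reported as removed and
 * return how many there are. Assumes primary = index 0 and
 * secondary = index 1, which is a guess. */
static int removed_ports(unsigned char intr_status, int out[2])
{
    int n = 0;

    assert(out != 0);
    if (intr_status & EX_PDEV_REMOVED)
        out[n++] = 0;
    if (intr_status & EX_SDEV_REMOVED)
        out[n++] = 1;
    return n;
}
```

In the function above, the equivalent change would be to pass
host_set->ports[1] rather than ap in the "Secondary device removed"
branch, if the assumed mapping turns out to be right.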

Thanks in advance,
 -- Diego.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: sata_nv and RAID1
  2005-06-13 19:00                 ` Peter T. Breuer
  2005-06-13 20:41                   ` Raz Ben-Jehuda(caro)
@ 2005-06-13 21:16                   ` Diego M. Vadell
  1 sibling, 0 replies; 37+ messages in thread
From: Diego M. Vadell @ 2005-06-13 21:16 UTC (permalink / raw)
  To: Peter T. Breuer, linux-raid

On Monday 13 June 2005 16:00, you wrote:
> Diego M. Vadell <dvadell@lantech.com.ar> wrote:
> > On Monday 13 June 2005 13:07, Peter T. Breuer wrote:
> > > > > I don't know either. For the FR1 code I implemented three new
> > > > > ioctls .. all of them sent out by the FR1 (raid1) driver.
> > > > >
> > > > >   1) notify component that it is in an array and which
> > > > >   2) notify component that it is no longer in an array and which
> > > > >   3) send component a callback function through which it can
> > > > >      SET_FAULTY and re-HOTADD itself to the array it kno it is in
> > > > >      as need be.
> > > > >
> > > > > Maybe hotplugging has those facilities. I don't know.
> > > > >
> > > > > Cooperating devices would have to implement the ioctls.
> > > >
> > > >    If I understand right, even if I used FR1, it wont pass the test
> > >
> > > Yes it will. The device driver will detect something wrong (if the
> > > device driver doesn't know, NOBODY does) and call back to the raid
> > > array driver to say "set me faulty".
> > >
> > > That's the whole idea.
> > >
> > > When the device driver senses its device is well again, it will call
> > > back and say "hot add me again".
> >
> > But not as it is today...
>
> Yes, "as it is today".
>
> > when you say "Cooperating devices would have to
> > implement the ioctls." means that I have to touch sata_nv's source code
> > to implement those ioctls, am I right?
>
> One has to implement or have implemented those ioctls in the driver of
> whichever device you are interested in, in order to cooperate properly
> with fr1 (in that respect).  That goes without saying.  I merely provided
> the infrastructure in fr1 - indeed, I could not have provided anything
> else because I do not control the code for anything else. Fr1 will send
> the ioctls I listed above (or sketched, rather) to any component device.
> It is up to that component device to make use of them.
>
> There may well be another scheme/architecture already available. I
> don't know. In my abject ignorance I simply implemented an adequate
> scheme for my purposes, and waited for anyone to tell me of a better
> one.  If hotplugging is supported by the target device, it may well be
> the case that you could implement at least ioctl (3) via hotplugging.
>
> Ioctls (1) and (2) are merely informative. They courteously let the
> target device know that it has been included in (or excluded from) a
> particular md array. This allows the target device to make any mode
> shifts that may be appropriate to running in a raid array, such as
> exposing errors immediately to the array instead of blocking and trying
> again internally (or vice versa -). It also allows the target device
> to decide when or if to make use of the callback function provided to
> it in ioctl (3), with which it can tell the raid arrays in which it has
> been informed that it has been included of its current state of health.
>
> > If that is all I have to do, I will give it a try (supposing my boss does
> > not make me use the raid in the motherboard) .
>
> The enbd code (ftp://oboe.it.uc3m.es/pub/Programs/enbd-2.4.32pre.tgz)
> implements the ioctls in question.
>
> If you know of another scheme, please feel free to tell me about it.
>
> My idea was that the target device should preemptively inform the array
> if it is in good or bad health. This implies that it should know in
> which array it is included, in order to know who is interested in
> knowing its health. Hence ioctls (1) and (2) which tell it who wishes
> to be informed about its health. And ioctl (3) which gives it a
> telephone number to use.
>
> Ioctl (3) could be implemented the other way round - that is, it could
> be simply the md array which receives an existing SETFAULTY or HOTADD
> ioctl. I don't know why I chose to send across a callback function to
> the target device instead. Probably because I was aware of locking
> difficulties.
>
>
> Peter

Peter,
   thank you very much. When I received your mail, I was already trying Jeff
Garzik's solution. Basically, if that doesn't work, I have to put the server
in production with whatever it has, and I will be unable to make more tests.
   But besides my problem, Fast Raid seems a very good idea! Is there any reason
why it isn't in the mainline kernel?

 -- Diego.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: sata_nv and RAID1
  2005-06-13 21:00                   ` Diego M. Vadell
@ 2005-06-13 21:20                     ` Jeff Garzik
  2005-06-13 21:41                       ` Diego M. Vadell
  2005-06-14 21:11                       ` Molle Bestefich
  0 siblings, 2 replies; 37+ messages in thread
From: Jeff Garzik @ 2005-06-13 21:20 UTC (permalink / raw)
  To: Diego M. Vadell; +Cc: linux-raid

Diego M. Vadell wrote:
> On Monday 13 June 2005 14:59, you wrote:
> 
>>The task is to update sata_nv to notify libata-core that a device has
>>disappeared.  libata-core then notifies the SCSI layer of this.  No new
>>ioctls need to be supported.
>>
>>	Jeff
> 
> 
> Hi Jeff,
>    Thank you for your answers. Reading a little of 
> http://www.kernel.org/pub/linux/kernel/people/jgarzik/libata/libata.pdf and 
> drivers/scsi/sata_nv.c , it seems to me that I have to add a call to 
> ata_port_disable() into sata_nv.c:nv_check_hotplug(). 
>    In sata_nv.c , nv_check_hotplug() is called from nv_interrupt() , which 
> seems to be the interrupt handler. I add the call to ata_port_disable(ap) , 
> taking ap from the ata_host_set structure, but that structure seems to be 
> able to have many ap ports (its an array). 
>    Question: is it ok to set ap as host_set->ports[0] or should I have to see 
> what ata_port is the one that has been unplugged?

You need to go into the SCSI layer and figure out how to do it.

Calling scsi_remove_device() in the appropriate place is a good start. 
Add a debounce timer.  Make sure all commands are completed with an 
error (avoids memory leaks and lock-ups).  Other details.

Hot unplug is not just a simple function call...
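
Two of the points above can be sketched in plain userspace C. This is
only an illustration, not libata or SCSI-layer code, and all names
(struct debounce, debounce_sample(), fail_pending()) are made up for
the example. The debounce part assumes the usual motivation: a
connector being pulled or seated can generate several presence changes
in quick succession, so a change is only acted on once it has
persisted for a hold time. The second part marks every still-pending
command as failed so that nothing is left waiting on a device that is
gone.

```c
#include <assert.h>

struct debounce {
    int stable_state;        /* state last reported upward: 1 = present */
    int candidate;           /* most recent raw sample */
    unsigned long since_ms;  /* when the candidate value was first seen */
    unsigned long hold_ms;   /* how long it must persist to be believed */
};

static void debounce_init(struct debounce *d, int present,
                          unsigned long hold_ms)
{
    assert(d != 0);
    d->stable_state = present;
    d->candidate = present;
    d->since_ms = 0;
    d->hold_ms = hold_ms;
}

/* Feed one raw presence sample. Returns 1 when the stable state
 * changes, 0 otherwise. */
static int debounce_sample(struct debounce *d, int present,
                           unsigned long now_ms)
{
    assert(d != 0);
    if (present != d->candidate) {
        /* Raw value flipped: restart the hold timer. */
        d->candidate = present;
        d->since_ms = now_ms;
        return 0;
    }
    if (d->candidate != d->stable_state &&
        now_ms - d->since_ms >= d->hold_ms) {
        d->stable_state = d->candidate;
        return 1;
    }
    return 0;
}

enum cmd_state { CMD_FREE, CMD_PENDING, CMD_DONE_OK, CMD_DONE_ERR };

/* Complete every still-pending command with an error.
 * Returns the number of commands that were failed. */
static int fail_pending(enum cmd_state cmds[], int n)
{
    int i, failed = 0;

    assert(cmds != 0 && n >= 0);
    for (i = 0; i < n; i++) {
        if (cmds[i] == CMD_PENDING) {
            cmds[i] = CMD_DONE_ERR;
            failed++;
        }
    }
    return failed;
}
```

With a 100 ms hold time, a brief glitch to "absent" that returns to
"present" never changes the stable state, while a removal that lasts
the full hold time does; that would be the point at which something
like fail_pending() and the device removal would run.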

	Jeff




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: sata_nv and RAID1
  2005-06-13 21:20                     ` Jeff Garzik
@ 2005-06-13 21:41                       ` Diego M. Vadell
       [not found]                         ` <1118818568.3089.5.camel@raz-laptop>
  2005-06-14 21:11                       ` Molle Bestefich
  1 sibling, 1 reply; 37+ messages in thread
From: Diego M. Vadell @ 2005-06-13 21:41 UTC (permalink / raw)
  To: Jeff Garzik, linux-raid

On Monday 13 June 2005 18:20, you wrote:
> You need to go into the SCSI layer and figure out how to do it.
>
> Calling scsi_remove_device() in the appropriate place is a good start.
> Add a debounce timer.  Make sure all commands are completed with an
> error (avoids memory leaks and lock-ups).  Other details.
>
> Hot unplug is not just a simple function call...
>
> 	Jeff
Thank you, Jeff! I don't think I will be able to do it, though. Thank you for
the fast answers.

 -- Diego.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: sata_nv and RAID1
  2005-06-13 11:57       ` Michael Tokarev
  2005-06-13 12:27         ` Peter T. Breuer
@ 2005-06-14 21:11         ` Molle Bestefich
  2005-06-14 21:37           ` Michael Tokarev
  2005-06-14 21:53           ` David Greaves
  1 sibling, 2 replies; 37+ messages in thread
From: Molle Bestefich @ 2005-06-14 21:11 UTC (permalink / raw)
  To: linux-raid

Michael Tokarev wrote:
> Without proper error handling [...]
> linux SATA subsystem isn't ready for production,
> and people should not rely on it *now*.

As long as disks aren't removed, it's OK for production..

But I'll agree with anyone who says that IDE under Linux just sucks
donkey ass compared to IDE under Windows.  With Windows 2K and later,
unplugging both SATA and PATA devices Just Works (tm).  I'm having a
hard time figuring out how this works: there are enough people supporting
Linux to build a complete web-enabled SCM system (git) in a couple of
weeks, but no one has bothered to fix this glaring flaw for the past 5
years?  What gives?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: sata_nv and RAID1
  2005-06-13 21:20                     ` Jeff Garzik
  2005-06-13 21:41                       ` Diego M. Vadell
@ 2005-06-14 21:11                       ` Molle Bestefich
  1 sibling, 0 replies; 37+ messages in thread
From: Molle Bestefich @ 2005-06-14 21:11 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-raid

Jeff Garzik wrote:
> The task is to update sata_nv to notify libata-core that a device
> has disappeared.  libata-core then notifies the SCSI layer of this.
> No new ioctls need to be supported.

> Calling scsi_remove_device() in the appropriate place is a good start.
> Add a debounce timer.  Make sure all commands are completed with an
> error (avoids memory leaks and lock-ups).  Other details.

Could you explain why a debounce timer is needed and what the other
details that need to be taken care of are?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: sata_nv and RAID1
  2005-06-14 21:11         ` Molle Bestefich
@ 2005-06-14 21:37           ` Michael Tokarev
  2005-06-14 22:10             ` Diego M. Vadell
  2005-06-14 22:26             ` Molle Bestefich
  2005-06-14 21:53           ` David Greaves
  1 sibling, 2 replies; 37+ messages in thread
From: Michael Tokarev @ 2005-06-14 21:37 UTC (permalink / raw)
  To: Molle Bestefich; +Cc: linux-raid

Molle Bestefich wrote:
> Michael Tokarev wrote:
> 
>>Without proper error handling [...]
>>linux SATA subsystem isn't ready for production,
>>and people should not rely on it *now*.

Note there was an "if" in that [...] -- "IF I/O errors are
*really* not handled properly, linux SATA subsystem isn't
ready etc"

> As long as disks aren't removed, it's OK for production..
> 
> But I'll agree with anyone who says that IDE under Linux just sucks
> donkey ass compared to IDE under Windows.  With Windows 2K and later,
> unplugging both SATA and PATA devices Just Works (tm).  I'm having a
> hard time figuring how this works: there's enough people supporting
> Linux to build a complete web-enabled SCM system (git) in a couple of
> weeks, but noone has bothered to fix this glaring flaw for the past 5
> years?  What gives?

Well.. I don't want to start a flamewar, but I tend to disagree.
I for one don't care (for now) about SATA and hot[un]plug for
harddrives.  But ol'good IDE works under linux just fine, together
with proper (fsvo "proper", which still isn't proper for a lot of
IDE drives, but that's hardware and linux can't do anything there)
error handling and stuff.  And with the out-of-the-box toolset
(smartmontools, hdparm), the support is better than win* --
I have more options to monitor my drives, to reallocate bad
blocks, to watch for drives dying, to control several h/w
aspects of devices than on win*.  Well, ok, (and oh, it's
another hot flamewar topic), I'm trying to avoid usage of
IDE drives due to various limitations and deficiencies, and
tend to use SCSI devices if at all possible...  And ok, ok,
IDE != SATA... ;)

/mjt

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: sata_nv and RAID1
  2005-06-14 21:11         ` Molle Bestefich
  2005-06-14 21:37           ` Michael Tokarev
@ 2005-06-14 21:53           ` David Greaves
  2005-06-14 22:30             ` Molle Bestefich
  1 sibling, 1 reply; 37+ messages in thread
From: David Greaves @ 2005-06-14 21:53 UTC (permalink / raw)
  To: Molle Bestefich; +Cc: linux-raid

Molle Bestefich wrote:
> Michael Tokarev wrote:
>> Without proper error handling [...]
>> linux SATA subsystem isn't ready for production,
>> and people should not rely on it *now*.
>
> As long as disks aren't removed, it's OK for production..
>
> But I'll agree with anyone who says that IDE under Linux just sucks
> donkey ass compared to IDE under Windows.  With Windows 2K and later,
> unplugging both SATA and PATA devices Just Works (tm).  I'm having a
> hard time figuring how this works: there's enough people supporting
> Linux to build a complete web-enabled SCM system (git) in a couple of
> weeks,

SW engineering

> but noone has bothered to fix this glaring flaw for the past 5
> years?

HW engineering

> What gives?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: sata_nv and RAID1
  2005-06-14 21:37           ` Michael Tokarev
@ 2005-06-14 22:10             ` Diego M. Vadell
  2005-06-14 22:17               ` Michael Tokarev
  2005-06-15  0:08               ` Jeff Garzik
  2005-06-14 22:26             ` Molle Bestefich
  1 sibling, 2 replies; 37+ messages in thread
From: Diego M. Vadell @ 2005-06-14 22:10 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: linux-raid

Michael Tokarev wrote:

>
> I for one don't care (for now) about SATA and hot[un]plug for
> harddrives.  

Hi Michael,
   Why don't you care? Is it so unlikely for a hard disk to fail in a way
that looks the same as if it were unplugged? I mean, am I worrying too
much?

Thanks,
 -- Diego.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: sata_nv and RAID1
  2005-06-14 22:10             ` Diego M. Vadell
@ 2005-06-14 22:17               ` Michael Tokarev
  2005-06-15  0:08               ` Jeff Garzik
  1 sibling, 0 replies; 37+ messages in thread
From: Michael Tokarev @ 2005-06-14 22:17 UTC (permalink / raw)
  To: Diego M. Vadell; +Cc: linux-raid

Diego M. Vadell wrote:
> Michael Tokarev wrote:
> 
>>
>> I for one don't care (for now) about SATA and hot[un]plug for
>> harddrives.  
> 
> Hi Michael,
>   Why don't you care? Is it so unlikely to get a hard disk as broken as 
> if it looked the same as if it were unplugged? I mean, am I worrying too 
> much.

Heh.  Just Because (tm?) I don't have any SATA drives (for now).
And it's unlikely for me to see them in the near future, for
various reasons, one of which is that I tend to use SCSI
if at all possible...  ;)

SATA is *still* quite new, and there isn't much *good* SATA
hardware out there (think NCQ for example, which just does
not exist on a lot of controllers and harddrives; and an average
5-year-old SCSI disk performs a lot better on random I/O
than a modern *good* EIDE (no SATA here, remember?)
disk, even when the latter is much faster on linear I/O,
with larger cache etc...).  And then there are all those "PATA to
SATA converters" still built into some drives, which are really
PATA internally but made to work on a SATA bus with a converter..
and other scary things... ;)

/mjt

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: sata_nv and RAID1
  2005-06-14 21:37           ` Michael Tokarev
  2005-06-14 22:10             ` Diego M. Vadell
@ 2005-06-14 22:26             ` Molle Bestefich
  2005-06-14 23:07               ` Bill Davidsen
  2005-06-15  0:11               ` Jeff Garzik
  1 sibling, 2 replies; 37+ messages in thread
From: Molle Bestefich @ 2005-06-14 22:26 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: linux-raid

Michael Tokarev wrote:
> Molle Bestefich wrote:
> > Michael Tokarev wrote:
> >
> > > Without proper error handling [...]
> > > linux SATA subsystem isn't ready for production,
> > > and people should not rely on it *now*.
> 
> Note there was an "if" in that [...] -- "IF I/O errors are
> *really* not handled properly, linux SATA subsystem isn't
> ready etc"

Oh, ok.  Didn't seem relevant.  Sorry.


> > As long as disks aren't removed, it's OK for production..
> >
> > But I'll agree with anyone who says that IDE under Linux just sucks
> > donkey ass compared to IDE under Windows.  With Windows 2K and later,
> > unplugging both SATA and PATA devices Just Works (tm).  I'm having a
> > hard time figuring how this works: there's enough people supporting
> > Linux to build a complete web-enabled SCM system (git) in a couple of
> > weeks, but noone has bothered to fix this glaring flaw for the past 5
> > years?  What gives?
> 
> Well.. I don't want to start a flamewar, but I tend to disagree.
> I for one don't care (for now) about SATA and hot[un]plug for
> harddrives.

So you don't care that your Linux system will deadlock if a PATA disk
is removed, for instance?  Just an example, one that I've seen happen
a couple of times.  Never seen it with Windows, it just tells you that
the disk is gone.

>  But ol'good IDE works under linux just fine, together
> with proper

I disagree..

> (fsvo "proper", which isn't still proper for alot of
> IDE drives, but that's hardware and linux can't do anything there)

I'll agree to that one.

> error handling and stuff.  And with out-of-the-box toolset
> (smartmontools, hdparm), the support is better than win* --
> I have more options to monitor my drives, to reallocate bad
> blocks, to watch for drives dying, to control several h/w
> aspects of devices than on win*.

I'll have to disagree again.
For instance, SMART monitoring is not available for SATA devices under
Linux, while it is ready, available, working etc. under Windows.  I'm
unsure where those "more options" you're talking about are hiding.

> Well, ok, (and oh, it's
> another hot flamewar topic), I'm trying to avoid usage of
> IDE drives due to various limitations and defeciencies, and
> tend to use SCSI devices if at all possible...  And ok, ok,
> IDE != SATA... ;)

Let's not get into that, then ;-).

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: sata_nv and RAID1
  2005-06-14 21:53           ` David Greaves
@ 2005-06-14 22:30             ` Molle Bestefich
  2005-06-15 19:17               ` Mark Hahn
  0 siblings, 1 reply; 37+ messages in thread
From: Molle Bestefich @ 2005-06-14 22:30 UTC (permalink / raw)
  To: David Greaves; +Cc: linux-raid

David Greaves wrote:
> Molle Bestefich wrote:
> > But I'll agree with anyone who says that IDE under Linux just sucks
> > donkey ass compared to IDE under Windows.  With Windows 2K and later,
> > unplugging both SATA and PATA devices Just Works (tm).  I'm having a
> > hard time figuring how this works: there's enough people supporting
> > Linux to build a complete web-enabled SCM system (git) in a couple of
> > weeks,
> >
> SW engineering
> 
> > but noone has bothered to fix this glaring flaw for the past 5
> > years?
> >
> HW engineering

Aha.  And how would you explain that this currently works under Windows?

The hardware magically changes properties once I boot XP?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: sata_nv and RAID1
  2005-06-14 22:26             ` Molle Bestefich
@ 2005-06-14 23:07               ` Bill Davidsen
  2005-06-14 23:18                 ` Molle Bestefich
  2005-06-14 23:46                 ` Mike Hardy
  2005-06-15  0:11               ` Jeff Garzik
  1 sibling, 2 replies; 37+ messages in thread
From: Bill Davidsen @ 2005-06-14 23:07 UTC (permalink / raw)
  To: Molle Bestefich; +Cc: Michael Tokarev, linux-raid

Molle Bestefich wrote:
> Michael Tokarev wrote:
>> Molle Bestefich wrote:
>>> Michael Tokarev wrote:
>>>> Without proper error handling [...]
>>>> linux SATA subsystem isn't ready for production,
>>>> and people should not rely on it *now*.
>>
>> Note there was an "if" in that [...] -- "IF I/O errors are
>> *really* not handled properly, linux SATA subsystem isn't
>> ready etc"
>
> Oh, ok.  Didn't seem relevant.  Sorry.
>
>>> As long as disks aren't removed, it's OK for production..
>>>
>>> But I'll agree with anyone who says that IDE under Linux just sucks
>>> donkey ass compared to IDE under Windows.  With Windows 2K and later,
>>> unplugging both SATA and PATA devices Just Works (tm).  I'm having a
>>> hard time figuring how this works: there's enough people supporting
>>> Linux to build a complete web-enabled SCM system (git) in a couple of
>>> weeks, but noone has bothered to fix this glaring flaw for the past 5
>>> years?  What gives?
>>
>> Well.. I don't want to start a flamewar, but I tend to disagree.
>> I for one don't care (for now) about SATA and hot[un]plug for
>> harddrives.
>
> So you don't care that your Linux system will deadlock if a PATA disk
> is removed, for instance?  Just an example, one that I've seen happen
> a couple of times.  Never seen it with Windows, it just tells you that
> the disk is gone.

Actually I suspect most Linux users are smart enough not to pull a PATA
drive out while in use, and it works just fine if you offline the drive
before removal. I may be wrong about recent versions: it seems that the
ability to disable the interface was taken out of either hdparm or a
recent kernel. I read about it and exchanged comments with Alan Cox, but
I don't recall the details.

There may have been a step backward, hopefully just a brain fart and not 
a deliberate decision to disable hot swap.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979



* Re: sata_nv and RAID1
  2005-06-14 23:07               ` Bill Davidsen
@ 2005-06-14 23:18                 ` Molle Bestefich
  2005-06-15  0:12                   ` Jeff Garzik
  2005-06-14 23:46                 ` Mike Hardy
  1 sibling, 1 reply; 37+ messages in thread
From: Molle Bestefich @ 2005-06-14 23:18 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: linux-raid

Bill Davidsen wrote:
> Molle Bestefich wrote:
> > So you don't care that your Linux system will deadlock if a PATA disk
> > is removed, for instance?  Just an example, one that I've seen happen
> > a couple of times.  Never seen it with Windows, it just tells you that
> > the disk is gone.
> 
> Actually I suspect most Linux users are smart enough not to pull a PATA
> drive out while in use, and it works just fine if you offline the drive
> before removal.

That's not a good argument, other things can happen to throw a disk offline.
Flaky IDE or power cables and disk failure come to mind.


* Re: sata_nv and RAID1
  2005-06-14 23:07               ` Bill Davidsen
  2005-06-14 23:18                 ` Molle Bestefich
@ 2005-06-14 23:46                 ` Mike Hardy
  1 sibling, 0 replies; 37+ messages in thread
From: Mike Hardy @ 2005-06-14 23:46 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Molle Bestefich, Michael Tokarev, linux-raid

Bill Davidsen wrote:

> Actually I suspect most Linux users are smart enough not to pull a PATA
> drive out while in use, and it works just fine if you offline the drive
> before removal. I may be wrong, though: it seems that the ability to
> disable the interface was taken out of either hdparm or a recent kernel.
> I read about it and exchanged comments with Alan Cox, but I don't recall
> the details.
> 
> There may have been a step backward, hopefully just a brain fart and not
> a deliberate decision to disable hot swap.

This is barely pertinent to the original discussion, but I still use
hdparm -[U|R] to enable/disable an interface and attach/detach an
optical drive on my laptop (Dell D800 - not that new or fancy anymore).

It works fine, as far as I can tell, once you get udev to create the
devices in /dev/ for you.

Partial uname:
2.6.11-1.27_FC3 #1 Tue May 17 20:27:37 EDT 2005 i686

Hopefully they haven't disabled it since then; I'd be disappointed.

I do agree in general that SMART support for SATA, at least, should be in
kernels by this point, and correct error propagation as well. Perhaps
the kernel devs either don't get the new hardware or are all on SCSI.
Either way, I'm still building with PATA until SATA has all of that.

-Mike


* Re: sata_nv and RAID1
  2005-06-14 22:10             ` Diego M. Vadell
  2005-06-14 22:17               ` Michael Tokarev
@ 2005-06-15  0:08               ` Jeff Garzik
  1 sibling, 0 replies; 37+ messages in thread
From: Jeff Garzik @ 2005-06-15  0:08 UTC (permalink / raw)
  To: Diego M. Vadell; +Cc: Michael Tokarev, linux-raid

Diego M. Vadell wrote:
> Michael Tokarev wrote:
> 
>>
>> I for one don't care (for now) about SATA and hot[un]plug for
>> harddrives.  
> 
> 
> Hi Michael,
>   Why don't you care? Is it so unlikely for a hard disk to fail in a
> way that looks the same as being unplugged? I mean, am I worrying too
> much?

The ATA drivers handle errors.  Just not hotplug and hotunplug, which 
are very different beasts.

	Jeff




* Re: sata_nv and RAID1
  2005-06-14 22:26             ` Molle Bestefich
  2005-06-14 23:07               ` Bill Davidsen
@ 2005-06-15  0:11               ` Jeff Garzik
  2005-06-15  0:34                 ` Guy
  1 sibling, 1 reply; 37+ messages in thread
From: Jeff Garzik @ 2005-06-15  0:11 UTC (permalink / raw)
  To: Molle Bestefich; +Cc: Michael Tokarev, linux-raid

Molle Bestefich wrote:
> So you don't care that your Linux system will deadlock if a PATA disk
> is removed, for instance?  Just an example, one that I've seen happen
> a couple of times.  Never seen it with Windows, it just tells you that
> the disk is gone.

Your computer will deadlock if you yank a PCI card out, too.  That 
doesn't mean all PCI drivers are broken.

Very very (did I say "very"?) few people yank hard drives which are not 
built specifically for hotplug, in a hotplug enclosure.

	Jeff




* Re: sata_nv and RAID1
  2005-06-14 23:18                 ` Molle Bestefich
@ 2005-06-15  0:12                   ` Jeff Garzik
  2005-06-15  0:19                     ` Molle Bestefich
  0 siblings, 1 reply; 37+ messages in thread
From: Jeff Garzik @ 2005-06-15  0:12 UTC (permalink / raw)
  To: Molle Bestefich; +Cc: Bill Davidsen, linux-raid

Molle Bestefich wrote:
> Bill Davidsen wrote:
> 
>>Molle Bestefich wrote:
>>
>>>So you don't care that your Linux system will deadlock if a PATA disk
>>>is removed, for instance?  Just an example, one that I've seen happen
>>>a couple of times.  Never seen it with Windows, it just tells you that
>>>the disk is gone.
>>
>>Actually I suspect most Linux users are smart enough not to pull a PATA
>>drive out while in use, and it works just fine if you offline the drive
>>before removal.
> 
> 
> That's not a good argument, other things can happen to throw a disk offline.
> Flaky IDE or power cables and disk failure come to mind.

Linux handles these just fine.

	Jeff





* Re: sata_nv and RAID1
  2005-06-15  0:12                   ` Jeff Garzik
@ 2005-06-15  0:19                     ` Molle Bestefich
  0 siblings, 0 replies; 37+ messages in thread
From: Molle Bestefich @ 2005-06-15  0:19 UTC (permalink / raw)
  To: linux-raid

Jeff Garzik wrote:
> > That's not a good argument, other things can happen to throw a disk offline.
> > Flaky IDE or power cables and disk failure come to mind.
> 
> Linux handles these just fine.

Ok.  Since a flaky power cable is in essence the same as a "hot drive
unplug", you're basically saying that Linux handles that just fine
too, right?


* RE: sata_nv and RAID1
  2005-06-15  0:11               ` Jeff Garzik
@ 2005-06-15  0:34                 ` Guy
  0 siblings, 0 replies; 37+ messages in thread
From: Guy @ 2005-06-15  0:34 UTC (permalink / raw)
  To: 'Jeff Garzik', 'Molle Bestefich'
  Cc: 'Michael Tokarev', linux-raid



> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> owner@vger.kernel.org] On Behalf Of Jeff Garzik
> Sent: Tuesday, June 14, 2005 8:12 PM
> To: Molle Bestefich
> Cc: Michael Tokarev; linux-raid@vger.kernel.org
> Subject: Re: sata_nv and RAID1
> 
> Molle Bestefich wrote:
> > So you don't care that your Linux system will deadlock if a PATA disk
> > is removed, for instance?  Just an example, one that I've seen happen
> > a couple of times.  Never seen it with Windows, it just tells you that
> > the disk is gone.
> 
> Your computer will deadlock if you yank a PCI card out, too.  That
> doesn't mean all PCI drivers are broken.
> 
> Very very (did I say "very"?) few people yank hard drives which are not
> built specifically for hotplug, in a hotplug enclosure.

But...
If a hard disk fails in such a way that the interface from the disk goes
dead, that is not a hot un-plug!  It is a drive failure.  Maybe a hot
un-plug should be considered a failure until support is added.

I have seen SCSI disks go away when they fail, no one pulled any plugs!

Also, it is common in the SCSI world to have 2 disk trays in a RAID1 config,
each with its own power supply.  If a power supply were to fail, half the
disks would be gone.  Would SATA/Linux consider half the disks hot
un-plugged?

Guy

> 
> 	Jeff
> 
> 



* Re: sata_nv and RAID1
  2005-06-14 22:30             ` Molle Bestefich
@ 2005-06-15 19:17               ` Mark Hahn
  2005-06-15 19:32                 ` Molle Bestefich
  0 siblings, 1 reply; 37+ messages in thread
From: Mark Hahn @ 2005-06-15 19:17 UTC (permalink / raw)
  To: Molle Bestefich; +Cc: linux-raid

> > > unplugging both SATA and PATA devices Just Works (tm).  I'm having a
> > > hard time figuring how this works: there's enough people supporting
> > > Linux to build a complete web-enabled SCM system (git) in a couple of
> > > weeks,
> > 
> > SW engineering

and a real and urgent need.  and also the fact that it's easier.

> > > but no one has bothered to fix this glaring flaw for the past 5
> > > years?
> > >
> > HW engineering
> 
> Aha.  And how would you explain that this currently works under Windows?

someone paid to get it to work under windows.  also, the quality requirement
for windows is basically non-existent (redmond will not tell you you have 
crap code in your driver).

I think the main issue is that there's not that much demand for ata-hotplug
under linux.  it is, after all, an unusual thing to do, mainly involving
crashes.  perhaps linux experiences fewer crashes.

but if you're so upset about this, why not either go back to windows,
or do something constructive (program or pay someone to program for you)?


* Re: sata_nv and RAID1
  2005-06-15 19:17               ` Mark Hahn
@ 2005-06-15 19:32                 ` Molle Bestefich
  2005-06-15 19:34                   ` Molle Bestefich
  0 siblings, 1 reply; 37+ messages in thread
From: Molle Bestefich @ 2005-06-15 19:32 UTC (permalink / raw)
  To: Mark Hahn; +Cc: linux-raid

Mark Hahn wrote:
> > > > unplugging both SATA and PATA devices Just Works (tm).  I'm having a
> > > > hard time figuring how this works: there's enough people supporting
> > > > Linux to build a complete web-enabled SCM system (git) in a couple of
> > > > weeks,
> > >
> > > SW engineering
> 
> and a real and urgent need.  and also the fact that it's easier.

Easier?  Why?
People need the operating system to cope with unplugging and/or
hotplugging devices under Windows but not Linux?  How do you figure?

> > > > but no one has bothered to fix this glaring flaw for the past 5
> > > > years?
> > > >
> > > HW engineering
> >
> > Aha.  And how would you explain that this currently works under Windows?
> 
> someone paid to get it to work under windows.  also, the quality requirement
> for windows is basically non-existent (redmond will not tell you you have
> crap code in your driver).

I've never had a crash in the IDE subsystem in Windows; I've had a ton in Linux.
Guess our experiences differ.

> I think the main issue is that there's not that much demand for ata-hotplug
> under linux.  it is, after all, an unusual thing to do, mainly involving
> crashes.  perhaps linux experiences fewer crashes.

"Perhaps it did 10 years ago, perhaps that's hardly a valid statement anymore."

> but if you're so upset about this, why not either go back to windows,

You don't have anything constructive to say and you can't provide any
answers, but you'd like to tell me to shut up...  Ok..

> or do something constructive (program or pay someone to program for you)?

For starters, I've asked Jeff Garzik (earlier in this thread) which
details need to be sorted out in order to make this work.  I'm still
eagerly awaiting his response.

And this thread itself could hopefully serve to make someone come
forward with some REAL arguments, or tell about current efforts, or
such.

I consider it constructive.  Perhaps it's just you that have an
attitude problem.


* Re: sata_nv and RAID1
  2005-06-15 19:32                 ` Molle Bestefich
@ 2005-06-15 19:34                   ` Molle Bestefich
  0 siblings, 0 replies; 37+ messages in thread
From: Molle Bestefich @ 2005-06-15 19:34 UTC (permalink / raw)
  To: Mark Hahn; +Cc: linux-raid

I wrote:
> Perhaps it's just you that have an attitude problem.

What am I saying?  Sorry.
Cancel that remark, I'm not even going to get into that discussion with you.


* Re: sata_nv and RAID1
       [not found]                           ` <200506151427.09114.dvadell@lantech.com.ar>
@ 2005-06-16  6:43                             ` raz ben jehuda
  0 siblings, 0 replies; 37+ messages in thread
From: raz ben jehuda @ 2005-06-16  6:43 UTC (permalink / raw)
  To: Diego M. Vadell; +Cc: linux-raid

"didn't fail" means that the raid entered into degraded mode. in
/proc/mdstat you would see that the disk is marked faulty.
I have a root file system on raid1 and could not afford to let
the operating system hang, so I added that call.
This is not the best solution, and the maintainers of the code would
disapprove of it since it is not complete, but for me it suffices.
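
As an illustration of the /proc/mdstat check mentioned above, here is a
minimal Python sketch that reports which md arrays are degraded and which
members are flagged (F). It is not part of the original mail: the function
name parse_mdstat and the sample text are hypothetical, and the parsing
assumes the usual /proc/mdstat layout (an "mdN : ..." header line followed
by a "[total/working] [U_]" status line).

```python
import re

# Hypothetical sample in the usual /proc/mdstat layout (not taken from the thread).
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sdd2[1] sdc2[2](F)
      156183808 blocks [2/1] [_U]

unused devices: <none>
"""

# A member token such as "sdc2[2](F)": device name, slot number, optional flags.
MEMBER_RE = re.compile(r"^(\S+?)\[\d+\]((?:\(\w\))*)$")
# The status part of the second line, e.g. "[2/1] [_U]".
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def parse_mdstat(text):
    """Return {array: {"members": [...], "faulty": [...], "degraded": bool}}."""
    arrays = {}
    current = None
    for line in text.splitlines():
        head = re.match(r"^(md\w+)\s*:\s*(.*)$", line)
        if head:
            name, rest = head.groups()
            members, faulty = [], []
            for token in rest.split():
                m = MEMBER_RE.match(token)
                if m:  # skips words like "active" and "raid1"
                    members.append(m.group(1))
                    if "(F)" in m.group(2):
                        faulty.append(m.group(1))
            current = {"members": members, "faulty": faulty, "degraded": False}
            arrays[name] = current
            continue
        status = STATUS_RE.search(line)
        if status and current is not None:
            total, working, bitmap = status.groups()
            # Degraded if any slot shows "_" or fewer devices work than expected.
            current["degraded"] = "_" in bitmap or int(working) < int(total)
    return arrays


if __name__ == "__main__":
    for name, info in parse_mdstat(SAMPLE).items():
        print(name, "degraded" if info["degraded"] else "ok", info["faulty"])
```

On a real system the same function could be fed the contents of
/proc/mdstat instead of SAMPLE.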

On Wed, 2005-06-15 at 17:27, Diego M. Vadell wrote:
> Hi Raz,
>    My English is not good enough. When you say "And the raid1 didn't fail",
> did you mean it worked as expected, entering degraded mode, or that it failed
> to enter degraded mode?
>    Also, is there any reason why you didn't send this to the mailing
> list as well? Maybe it's useful to somebody else.
> 
>   Thank you very much for your mail. I don't know if I will be able to test it,
> as the server is on its way to production, but if I can, I will give it a
> try.
> 
>  -- Diego.
> 
> On Wednesday 15 June 2005 06:56, you wrote:
> > I had the same problem. I added the following call at the end of
> > ata_scsi_error:
> > ata_port_disable(ap);
> > And the raid1 didn't fail. I didn't get hotswap working, so rebooting is
> > necessary, and then you raidhotadd the new disk.
> > Also, before the hotadd, type:
> > echo 1000 > /proc/sys/dev/raid/speed_limit_max
> > so your system won't be too busy with I/O.
> >
> > On Tue, 2005-06-14 at 00:41, Diego M. Vadell wrote:
> > > On Monday 13 June 2005 18:20, you wrote:
> > > > You need to go into the SCSI layer and figure out how to do it.
> > > >
> > > > Calling scsi_remove_device() in the appropriate place is a good start.
> > > > Add a debounce timer.  Make sure all commands are completed with an
> > > > error (avoids memory leaks and lock-ups).  Other details.
> > > >
> > > > Hot unplug is not just a simple function call...
> > > >
> > > >  Jeff
> > >
> > > > Thank you, Jeff! I don't think I will be able to do it, though. Thank you
> > > > for the fast answers.
> > >
> > >  -- Diego.
> > > -
> > > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Raz
Long Live The Penguin
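
The steps Jeff lists in the quoted text above (a debounce timer, and
completing every outstanding command with an error) can be modelled in user
space. The following Python sketch is only a toy model under stated
assumptions, not libata or SCSI-layer code: the PortMonitor class, its
method names, and the 0.5 second debounce interval are all hypothetical.

```python
import time


class PortMonitor:
    """Toy user-space model of debounced unplug handling (not kernel code)."""

    def __init__(self, debounce_s=0.5, clock=time.monotonic):
        self.debounce_s = debounce_s  # assumed interval, chosen for illustration
        self.clock = clock
        self.pending = {}             # command id -> completion callback
        self.absent_since = None      # time the link was first seen absent
        self.removed = False

    def submit(self, cmd_id, on_done):
        """Queue a command, or fail it at once if the device is already gone."""
        if self.removed:
            on_done(cmd_id, "error")
            return
        self.pending[cmd_id] = on_done

    def link_state(self, present):
        """Record one presence sample from a (hypothetical) link status source."""
        if present:
            self.absent_since = None  # glitch ended before the debounce expired
        elif self.absent_since is None:
            self.absent_since = self.clock()

    def poll(self):
        """Declare removal once the link has stayed absent for debounce_s."""
        if self.removed or self.absent_since is None:
            return
        if self.clock() - self.absent_since >= self.debounce_s:
            self.removed = True
            # Complete every outstanding command with an error so that no
            # caller is left waiting forever on a device that is gone.
            pending, self.pending = self.pending, {}
            for cmd_id, on_done in pending.items():
                on_done(cmd_id, "error")


if __name__ == "__main__":
    now = [0.0]                       # fake clock so the demo runs instantly
    mon = PortMonitor(clock=lambda: now[0])
    mon.submit(1, lambda cmd, status: print("command", cmd, status))
    mon.link_state(False)
    now[0] = 1.0
    mon.poll()
```

The point of the debounce is that a brief glitch (a flaky cable) does not
count as a removal, while a lasting absence fails all pending I/O so that an
upper layer such as md could mark the member faulty instead of hanging.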



Thread overview: 37+ messages
2005-06-11 16:13 sata_nv and RAID1 Diego M. Vadell
2005-06-11 19:26 ` Jeff Garzik
2005-06-11 20:29   ` Michael Tokarev
2005-06-13  3:15     ` Diego M. Vadell
2005-06-13  6:45     ` Jeff Garzik
2005-06-13 11:57       ` Michael Tokarev
2005-06-13 12:27         ` Peter T. Breuer
2005-06-13 14:40           ` Diego M. Vadell
2005-06-13 16:07             ` Peter T. Breuer
2005-06-13 16:51               ` Diego M. Vadell
2005-06-13 17:59                 ` Jeff Garzik
2005-06-13 21:00                   ` Diego M. Vadell
2005-06-13 21:20                     ` Jeff Garzik
2005-06-13 21:41                       ` Diego M. Vadell
     [not found]                         ` <1118818568.3089.5.camel@raz-laptop>
     [not found]                           ` <200506151427.09114.dvadell@lantech.com.ar>
2005-06-16  6:43                             ` raz ben jehuda
2005-06-14 21:11                       ` Molle Bestefich
2005-06-13 19:00                 ` Peter T. Breuer
2005-06-13 20:41                   ` Raz Ben-Jehuda(caro)
2005-06-13 21:16                   ` Diego M. Vadell
2005-06-14 21:11         ` Molle Bestefich
2005-06-14 21:37           ` Michael Tokarev
2005-06-14 22:10             ` Diego M. Vadell
2005-06-14 22:17               ` Michael Tokarev
2005-06-15  0:08               ` Jeff Garzik
2005-06-14 22:26             ` Molle Bestefich
2005-06-14 23:07               ` Bill Davidsen
2005-06-14 23:18                 ` Molle Bestefich
2005-06-15  0:12                   ` Jeff Garzik
2005-06-15  0:19                     ` Molle Bestefich
2005-06-14 23:46                 ` Mike Hardy
2005-06-15  0:11               ` Jeff Garzik
2005-06-15  0:34                 ` Guy
2005-06-14 21:53           ` David Greaves
2005-06-14 22:30             ` Molle Bestefich
2005-06-15 19:17               ` Mark Hahn
2005-06-15 19:32                 ` Molle Bestefich
2005-06-15 19:34                   ` Molle Bestefich
