fc transport creates second set of targets for devices in an "md"

* fc transport creates second set of targets for devices in an "md"
@ 2006-06-08 19:13 Michael Reed
  2006-06-08 19:57 ` James Smart
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Michael Reed @ 2006-06-08 19:13 UTC (permalink / raw)
  To: James.Smart; +Cc: Jeremy Higdon, Gary Hagensen, Michael Reed, linux-scsi

I created an md device on two fibre channel disks, sde and sdf.
I then disabled the switch port to which the hba is connected.
After the remote port time out messages, I re-enabled the switch
port.  Three things happen that are weird.  First, two unexpected
responses while scanning.  Second, the creation of sdm and
sdn.  Third, the md device remains inaccessible.

I don't think this is working the way it's intended to.  I
suspect it will cause big problems for multi-path volume managers
in a fail back situation.

Mike





duck /root# ls /sys/block
hda    loop1  loop4  loop7  ram10  ram13  ram2  ram5  ram8  sdb  sde  sdh  sdk
hdc    loop2  loop5  ram0   ram11  ram14  ram3  ram6  ram9  sdc  sdf  sdi  sdl
loop0  loop3  loop6  ram1   ram12  ram15  ram4  ram7  sda   sdd  sdg  sdj

duck /root# mdadm --create /dev/md0 --level stripe -n 2 /dev/sde /dev/sdf
md: bind<sde>
md: bind<sdf>
md0: setting max_sectors to 128, segment boundary to 32767
raid0: looking at sdf
raid0:   comparing sdf(35843584) with sdf(35843584)
raid0:   END
raid0:   ==> UNIQUE
raid0: 1 zones
raid0: looking at sde
raid0:   comparing sde(35843584) with sdf(35843584)
raid0:   EQUAL
raid0: FINAL 1 zones
raid0: done.
raid0 : md_size is 71687168 blocks.
raid0 : conf->hash_spacing is 71687168 blocks.
raid0 : nb_zone is 1.
raid0 : Allocating 8 bytes for hash.
mdadm: array /dev/md0 started.

duck /root# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [raid4] [multipath]
md0 : active raid0 sdf[1] sde[0]
      71687168 blocks 64k chunks
unused devices: <none>


duck /root# dd if=/dev/md0 bs=256k count=1 of=/dev/null
1+0 records in
1+0 records out
262144 bytes (262 kB) copied, 0.0133771 seconds, 19.6 MB/s


************************* portdisable 11 on fibre channel switch *******************

duck /root#  rport-5:0-0: blocked FC remote port time out: removing target and saving binding
 rport-5:0-1: blocked FC remote port time out: removing target and saving binding
 rport-5:0-2: blocked FC remote port time out: removing target and saving binding
 rport-5:0-3: blocked FC remote port time out: removing target and saving binding
 rport-5:0-4: blocked FC remote port time out: removing target and saving binding
 rport-5:0-5: blocked FC remote port time out: removing target and saving binding
 rport-5:0-6: blocked FC remote port time out: removing target and saving binding
 rport-5:0-7: blocked FC remote port time out: removing target and saving binding
 rport-5:0-8: blocked FC remote port time out: removing target and saving binding
 rport-5:0-9: blocked FC remote port time out: removing target and saving binding
 rport-5:0-10: blocked FC remote port time out: removing target and saving binding
 rport-5:0-11: blocked FC remote port time out: removing target and saving binding
 rport-5:0-12: blocked FC remote port time out: removing target and saving binding
 rport-5:0-13: blocked FC remote port time out: removing target and saving binding
 rport-5:0-14: blocked FC remote port time out: removing target and saving binding
 rport-5:0-15: blocked FC remote port time out: removing target and saving binding


duck /root# ls /sys/block
hda    loop1  loop4  loop7  ram1   ram12  ram15  ram4  ram7  sda  sdd
hdc    loop2  loop5  md0    ram10  ram13  ram2   ram5  ram8  sdb
loop0  loop3  loop6  ram0   ram11  ram14  ram3   ram6  ram9  sdc


duck /root# !dd
dd if=/dev/md0 bs=256k count=1 of=/dev/null
 5:0:8:0: rejecting I/O to dead device
Buffer I/O error on device md0, logical block 0
Buffer I/O error on device md0, logical block 1
Buffer I/O error on device md0, logical block 2
Buffer I/O error on device md0, logical block 3
Buffer I/O error on device md0, logical block 8
Buffer I/O error on device md0, logical block 9
Buffer I/O error on device md0, logical block 10
Buffer I/O error on device md0, logical block 11
Buffer I/O error on device md0, logical block 16
Buffer I/O error on device md0, logical block 17
 5:0:9:0: rejecting I/O to dead device
 5:0:8:0: rejecting I/O to dead device
dd: reading `/dev/md0': Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.0655691 seconds, 0.0 kB/s

************************* portenable 11 on fibre channel switch *******************



  Vendor: SGI       Model: Universal Xport   Rev: 0615
  Type:   Direct-Access                      ANSI SCSI revision: 03
 5:0:0:31: Attached scsi generic sg4 type 0
  Vendor: SGI       Model: Universal Xport   Rev: 0615
  Type:   Direct-Access                      ANSI SCSI revision: 03
 5:0:1:31: Attached scsi generic sg5 type 0
  Vendor: SGI       Model: Universal Xport   Rev: 0615
  Type:   Direct-Access                      ANSI SCSI revision: 03
 5:0:2:31: Attached scsi generic sg6 type 0
  Vendor: SGI       Model: Universal Xport   Rev: 0615
  Type:   Direct-Access                      ANSI SCSI revision: 03
 5:0:3:31: Attached scsi generic sg7 type 0
  Vendor: SGI       Model: Universal Xport   Rev: 0615
  Type:   Direct-Access                      ANSI SCSI revision: 03
 5:0:4:31: Attached scsi generic sg8 type 0
  Vendor: SGI       Model: Universal Xport   Rev: 0615
  Type:   Direct-Access                      ANSI SCSI revision: 03
 5:0:5:31: Attached scsi generic sg9 type 0
  Vendor: SGI       Model: Universal Xport   Rev: 0615
  Type:   Direct-Access                      ANSI SCSI revision: 03
 5:0:6:31: Attached scsi generic sg10 type 0
  Vendor: SGI       Model: Universal Xport   Rev: 0615
  Type:   Direct-Access                      ANSI SCSI revision: 03
 5:0:7:31: Attached scsi generic sg11 type 0
  Vendor: SGI       Model: ST336753FC        Rev: 2701
  Type:   Direct-Access                      ANSI SCSI revision: 03
SCSI device sdg: 71687372 512-byte hdwr sectors (36704 MB)
sdg: Write Protect is off
sdg: Mode Sense: ab 00 10 08
SCSI device sdg: drive cache: write back w/ FUA
SCSI device sdg: 71687372 512-byte hdwr sectors (36704 MB)
sdg: Write Protect is off
sdg: Mode Sense: ab 00 10 08
SCSI device sdg: drive cache: write back w/ FUA
 sdg: sdg1 sdg9 sdg11
sd 5:0:8:0: Attached scsi disk sdg
sd 5:0:8:0: Attached scsi generic sg12 type 0
  Vendor: SGI       Model: ST336753FC        Rev: 2701
  Type:   Direct-Access                      ANSI SCSI revision: 03
error 1
 5:0:8:0: Unexpected response from lun 0 while scanning, scan aborted   <<<<<<<<<<<<<<<<<<<<<<<
  Vendor: SGI       Model: ST336753FC        Rev: 2701
  Type:   Direct-Access                      ANSI SCSI revision: 03
SCSI device sdh: 71687372 512-byte hdwr sectors (36704 MB)
sdh: Write Protect is off
sdh: Mode Sense: ab 00 10 08
SCSI device sdh: drive cache: write back w/ FUA
SCSI device sdh: 71687372 512-byte hdwr sectors (36704 MB)
sdh: Write Protect is off
sdh: Mode Sense: ab 00 10 08
SCSI device sdh: drive cache: write back w/ FUA
 sdh: sdh1 sdh9 sdh11
sd 5:0:9:0: Attached scsi disk sdh
sd 5:0:9:0: Attached scsi generic sg13 type 0
  Vendor: SGI       Model: ST336753FC        Rev: 2701
  Type:   Direct-Access                      ANSI SCSI revision: 03
error 1
 5:0:9:0: Unexpected response from lun 0 while scanning, scan aborted   <<<<<<<<<<<<<<<<<<<<<<<<<<
  Vendor: SGI       Model: ST336753FC        Rev: 2701
  Type:   Direct-Access                      ANSI SCSI revision: 03
SCSI device sdi: 71687372 512-byte hdwr sectors (36704 MB)
sdi: Write Protect is off
sdi: Mode Sense: ab 00 10 08
SCSI device sdi: drive cache: write back w/ FUA
SCSI device sdi: 71687372 512-byte hdwr sectors (36704 MB)
sdi: Write Protect is off
sdi: Mode Sense: ab 00 10 08
SCSI device sdi: drive cache: write back w/ FUA
 sdi: sdi1 sdi9 sdi11
sd 5:0:10:0: Attached scsi disk sdi
sd 5:0:10:0: Attached scsi generic sg14 type 0
  Vendor: SGI       Model: ST336753FC        Rev: 2701
  Type:   Direct-Access                      ANSI SCSI revision: 03
SCSI device sdj: 71687372 512-byte hdwr sectors (36704 MB)
sdj: Write Protect is off
sdj: Mode Sense: ab 00 10 08
SCSI device sdj: drive cache: write back w/ FUA
SCSI device sdj: 71687372 512-byte hdwr sectors (36704 MB)
sdj: Write Protect is off
sdj: Mode Sense: ab 00 10 08
SCSI device sdj: drive cache: write back w/ FUA
 sdj: sdj1 sdj9 sdj11
sd 5:0:11:0: Attached scsi disk sdj
sd 5:0:11:0: Attached scsi generic sg15 type 0
  Vendor: SGI       Model: ST336753FC        Rev: 2701
  Type:   Direct-Access                      ANSI SCSI revision: 03
SCSI device sdk: 71687372 512-byte hdwr sectors (36704 MB)
sdk: Write Protect is off
sdk: Mode Sense: ab 00 10 08
SCSI device sdk: drive cache: write back w/ FUA
SCSI device sdk: 71687372 512-byte hdwr sectors (36704 MB)
sdk: Write Protect is off
sdk: Mode Sense: ab 00 10 08
SCSI device sdk: drive cache: write back w/ FUA
 sdk: sdk1 sdk9 sdk11
sd 5:0:12:0: Attached scsi disk sdk
sd 5:0:12:0: Attached scsi generic sg16 type 0
  Vendor: SGI       Model: ST336753FC        Rev: 2701
  Type:   Direct-Access                      ANSI SCSI revision: 03
SCSI device sdl: 71687372 512-byte hdwr sectors (36704 MB)
sdl: Write Protect is off
sdl: Mode Sense: ab 00 10 08
SCSI device sdl: drive cache: write back w/ FUA
SCSI device sdl: 71687372 512-byte hdwr sectors (36704 MB)
sdl: Write Protect is off
sdl: Mode Sense: ab 00 10 08
SCSI device sdl: drive cache: write back w/ FUA
 sdl: sdl1 sdl9 sdl11
sd 5:0:13:0: Attached scsi disk sdl
sd 5:0:13:0: Attached scsi generic sg17 type 0
  Vendor: SGI       Model: ST336753FC        Rev: 2701
  Type:   Direct-Access                      ANSI SCSI revision: 03
SCSI device sdm: 71687372 512-byte hdwr sectors (36704 MB)
sdm: Write Protect is off
sdm: Mode Sense: ab 00 10 08
SCSI device sdm: drive cache: write back w/ FUA
SCSI device sdm: 71687372 512-byte hdwr sectors (36704 MB)
sdm: Write Protect is off
sdm: Mode Sense: ab 00 10 08
SCSI device sdm: drive cache: write back w/ FUA
 sdm: sdm1 sdm9 sdm11
sd 5:0:14:0: Attached scsi disk sdm
sd 5:0:14:0: Attached scsi generic sg18 type 0
  Vendor: SGI       Model: ST336753FC        Rev: 2701
  Type:   Direct-Access                      ANSI SCSI revision: 03
SCSI device sdn: 71687372 512-byte hdwr sectors (36704 MB)
sdn: Write Protect is off
sdn: Mode Sense: ab 00 10 08
SCSI device sdn: drive cache: write back w/ FUA
SCSI device sdn: 71687372 512-byte hdwr sectors (36704 MB)
sdn: Write Protect is off
sdn: Mode Sense: ab 00 10 08
SCSI device sdn: drive cache: write back w/ FUA
 sdn: sdn1 sdn9 sdn11
sd 5:0:15:0: Attached scsi disk sdn
sd 5:0:15:0: Attached scsi generic sg19 type 0



duck /root# ls /sys/block
hda    loop1  loop4  loop7  ram1   ram12  ram15  ram4  ram7  sda  sdd  sdi  sdl
hdc    loop2  loop5  md0    ram10  ram13  ram2   ram5  ram8  sdb  sdg  sdj  sdm
loop0  loop3  loop6  ram0   ram11  ram14  ram3   ram6  ram9  sdc  sdh  sdk  sdn


duck /root# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [raid4] [multipath]
md0 : active raid0 sdf[1] sde[0]
      71687168 blocks 64k chunks
unused devices: <none>



duck /root# !dd
dd if=/dev/md0 bs=256k count=1 of=/dev/null
 5:0:8:0: rejecting I/O to dead device
printk: 23 messages suppressed.
Buffer I/O error on device md0, logical block 0
Buffer I/O error on device md0, logical block 1
Buffer I/O error on device md0, logical block 2
Buffer I/O error on device md0, logical block 3
Buffer I/O error on device md0, logical block 8
Buffer I/O error on device md0, logical block 9
Buffer I/O error on device md0, logical block 10
Buffer I/O error on device md0, logical block 11
Buffer I/O error on device md0, logical block 16
Buffer I/O error on device md0, logical block 17
 5:0:9:0: rejecting I/O to dead device
 5:0:8:0: rejecting I/O to dead device
dd: reading `/dev/md0': Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.154544 seconds, 0.0 kB/s




* Re: fc transport creates second set of targets for devices in an "md"
  2006-06-08 19:13 fc transport creates second set of targets for devices in an "md" Michael Reed
@ 2006-06-08 19:57 ` James Smart
  2006-06-09 16:26   ` Michael Reed
  2006-06-08 20:19 ` Mike Christie
  2006-06-09  0:43 ` James Bottomley
  2 siblings, 1 reply; 12+ messages in thread
From: James Smart @ 2006-06-08 19:57 UTC (permalink / raw)
  To: Michael Reed; +Cc: Jeremy Higdon, Gary Hagensen, linux-scsi

The most useful data would be a recursive listing of the devices in the
/sys/devices tree, which shows the host, rports, targets, and sdevs.
Capture it three times: before the disconnect, while the devices are
missing, and after they are reconnected.  Also capture the contents of
/sys/class/fc_remote_ports.
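
A minimal sketch of how such listings could be captured and compared,
assuming Python is available on the test machine.  The function names
snapshot_tree and diff_snapshots are illustrative only and are not part of
any kernel or distribution tooling:

```python
import os


def snapshot_tree(root):
    """Return the sorted relative paths of every entry below root."""
    entries = []
    # followlinks=False: sysfs is full of symlinks, list them but do not descend
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            entries.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(entries)


def diff_snapshots(before, after):
    """Return (disappeared, appeared) paths between two snapshots."""
    before_set, after_set = set(before), set(after)
    return sorted(before_set - after_set), sorted(after_set - before_set)


if __name__ == "__main__":
    # The two locations named in the message above.
    for root in ("/sys/devices", "/sys/class/fc_remote_ports"):
        if os.path.isdir(root):
            for path in snapshot_tree(root):
                print(os.path.join(root, path))
```

Taking one snapshot before the port is disabled, one while the devices are
missing, and one after re-enabling, then passing pairs of them to
diff_snapshots, would show which rports, targets, and sdevs went away and
which new ones appeared.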

Also - it's unlikely that FC is to blame here. The above data would
show whether we had the same WWNs, whether target ids were reused, and
which h/c/t/l values the midlayer reassigned. It would show whether the FC
transport is in error or not.

Relative to volume managers - yes, they have some difficulty. However,
the tack they plan on taking is to bind the device based on the udev
name that gets built for it. Which means md would have issues, but DM
is planning for it.

I do have a request to make an option for the transport to not remove
the devices upon disconnect.

-- james

Michael Reed wrote:
> I created an md device on two fibre channel disks, sde and sdf.
> I then disabled the switch port to which the hba is connected.
> After the remote port time out messages, I re-enabled the switch
> port.  Three things happen that are weird.  First, two unexpected
> responses while scanning.  Second, the creation of sdm and
> sdn.  Third, the md device remains inaccessible.
> 
> I don't think this is working the way it's intended to.  I
> suspect it will cause big problems for multi-path volume managers
> in a fail back situation.
> 
> Mike
> 


* Re: fc transport creates second set of targets for devices in an "md"
  2006-06-08 19:13 fc transport creates second set of targets for devices in an "md" Michael Reed
  2006-06-08 19:57 ` James Smart
@ 2006-06-08 20:19 ` Mike Christie
  2006-06-09 16:35   ` Michael Reed
  2006-06-09  0:43 ` James Bottomley
  2 siblings, 1 reply; 12+ messages in thread
From: Mike Christie @ 2006-06-08 20:19 UTC (permalink / raw)
  To: Michael Reed; +Cc: James.Smart, Jeremy Higdon, Gary Hagensen, linux-scsi

Michael Reed wrote:
> I created an md device on two fibre channel disks, sde and sdf.
> I then disabled the switch port to which the hba is connected.
> After the remote port time out messages, I re-enabled the switch
> port.  Three things happen that are weird.  First, two unexpected
> responses while scanning.  Second, the creation of sdm and
> sdn.  Third, the md device remains inaccessible.
> 
> I don't think this is working the way it's intended to.  I
> suspect it will cause big problems for multi-path volume managers
> in a fail back situation.
> 

Even if the rport is removed and the devices under it are removed, md
can still hold a reference to the device, so the device's memory is not
freed (basically, MD still thinks the device is there but SCSI says it is
gone). Because of this, when you plug in the cable again and a
new rport is created, sd.c can end up allocating another sdX value.
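
The effect can be illustrated with a toy model.  This is a sketch only, not
the kernel's sd.c code: it assumes a new disk always receives the lowest
index that is not still in use, and that an index becomes reusable only once
the device has been removed and no opener holds a reference.  The class
DiskNameAllocator and its methods are hypothetical names:

```python
class DiskNameAllocator:
    """Toy model of sdX naming; not the kernel's sd.c implementation."""

    def __init__(self):
        self.holders = {}     # index -> number of openers holding a reference
        self.present = set()  # indexes whose underlying device is still there

    @staticmethod
    def name(index):
        return "sd" + chr(ord("a") + index)  # covers sda..sdz only

    @staticmethod
    def index(name):
        return ord(name[2]) - ord("a")

    def attach(self):
        """Assumption: a new disk gets the lowest index not still in use."""
        index = 0
        while index in self.holders:
            index += 1
        self.holders[index] = 0
        self.present.add(index)
        return self.name(index)

    def hold(self, name):
        self.holders[self.index(name)] += 1

    def release(self, name):
        index = self.index(name)
        self.holders[index] -= 1
        self._maybe_free(index)

    def remove(self, name):
        index = self.index(name)
        self.present.discard(index)
        self._maybe_free(index)

    def _maybe_free(self, index):
        # Reusable only once the device is gone AND nobody references it.
        if index not in self.present and self.holders[index] == 0:
            del self.holders[index]


alloc = DiskNameAllocator()
disks = [alloc.attach() for _ in range(12)]     # sda .. sdl
for member in ("sde", "sdf"):                   # md0 opens its two members
    alloc.hold(member)
for gone in disks[4:]:                          # rports time out: sde .. sdl
    alloc.remove(gone)
rescanned = [alloc.attach() for _ in range(8)]  # switch port re-enabled
print(rescanned)
```

Under these assumptions the eight rescanned disks come back as sdg through
sdn, because sde and sdf are still pinned by md.  That matches the
/sys/block listing in the original report.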


* Re: fc transport creates second set of targets for devices in an "md"
  2006-06-08 19:13 fc transport creates second set of targets for devices in an "md" Michael Reed
  2006-06-08 19:57 ` James Smart
  2006-06-08 20:19 ` Mike Christie
@ 2006-06-09  0:43 ` James Bottomley
  2006-06-09 16:35   ` Michael Reed
  2 siblings, 1 reply; 12+ messages in thread
From: James Bottomley @ 2006-06-09  0:43 UTC (permalink / raw)
  To: Michael Reed; +Cc: James.Smart, Jeremy Higdon, Gary Hagensen, linux-scsi

On Thu, 2006-06-08 at 14:13 -0500, Michael Reed wrote:
> I created an md device on two fibre channel disks, sde and sdf.
> I then disabled the switch port to which the hba is connected.
> After the remote port time out messages, I re-enabled the switch
> port.  Three things happen that are weird.  First, two unexpected
> responses while scanning.  Second, the creation of sdm and
> sdn.  Third, the md device remains inaccessible.

This is sort of as expected.  What you did was wait out the reconnection
timer, so the mid layer failed and offlined the devices.  Thus, when
they come back, they get new instances.  If you'd done a remove-device
after they went offline, they'd have come back to the same location (as
long as nothing had them open).  But this is user level stuff.
Basically, when a path goes dead it's up to the multi-path user level to
remove it and wait for udev to inform it that another SCSI node has
appeared and has the correct signature to be another path to the device.
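
A rough sketch of the matching step described above: a user-level tool
tracks each member by a stable signature rather than by its sdX name, and
looks the signature up again once new nodes appear.  The function name
remap_members and the "disk-A" style identifiers are made up for
illustration; a real tool would use whatever unique identifier udev reports
for the device:

```python
def remap_members(members, ids_before, ids_after):
    """Map each old device name to the new name carrying the same signature.

    ids_before / ids_after: dicts of device name -> stable identifier.
    Returns None for a member whose signature has not reappeared.
    """
    by_id = {ident: name for name, ident in ids_after.items()}
    return {member: by_id.get(ids_before[member]) for member in members}


# Hypothetical identifiers; only the sdX names come from the report.
ids_before = {"sde": "disk-A", "sdf": "disk-B"}
ids_after = {"sdg": "disk-A", "sdh": "disk-B", "sdi": "disk-C"}
print(remap_members(["sde", "sdf"], ids_before, ids_after))
```

With these sample inputs the old members sde and sdf map to sdg and sdh,
which is the information a user-level tool would need to re-add the paths.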

> I don't think this is working the way it's intended to.  I
> suspect it will cause big problems for multi-path volume managers
> in a fail back situation.

James


* Re: fc transport creates second set of targets for devices in an "md"
  2006-06-08 19:57 ` James Smart
@ 2006-06-09 16:26   ` Michael Reed
  2006-06-09 18:10     ` Michael Reed
  0 siblings, 1 reply; 12+ messages in thread
From: Michael Reed @ 2006-06-09 16:26 UTC (permalink / raw)
  To: James.Smart
  Cc: Jeremy Higdon, Gary Hagensen, linux-scsi, Jim Nead, Michael Reed



James Smart wrote:
> The data that would be most interesting is a recursive listing of the
> device via the /sys/devices tree.  This would show the host, rports,
> targets, and sdevs.  Getting a copy of this both before, when missing,
> and after they are reconnected. The additional contents to look at
> is : contents of /sys/class/fc_remote_ports.

I'll capture data and post later today.

> 
> Also - it's unlikely that FC is to blame here. The above data would
> show whether we had the same WWNs, whether target ids were reused, and
> which h/c/t/l values the midlayer reassigned. It would show whether the FC
> transport is in error or not.

Agree.  Sometimes the title is used to draw attention to a problem.  ;)
It worked.

> 
> Relative to volume managers - yes, they have some difficulty. However,
> the tack they plan on taking is to bind the device based on the udev
> name that gets built for it. Which means md would have issues, but DM
> is planning for it.

So, do any volume managers work correctly now?  By "correct" I mean,
when the device returns, will they be able to access it?

> 
> I do have a request to make an option for the transport to not remove
> the devices upon disconnect.

I understand why.  If this does what I suspect it will, please add SGI
to the list.  We should probably discuss requirements.

Mike

> 
> -- james
> 
> Michael Reed wrote:
>> I created an md device on two fibre channel disks, sde and sdf.
>> I then disabled the switch port to which the hba is connected.
>> After the remote port time out messages, I re-enabled the switch
>> port.  Three things happen that are weird.  First, two unexpected
>> responses while scanning.  Second, the creation of sdm and
>> sdn.  Third, the md device remains inaccessible.
>>
>> I don't think this is working the way it's intended to.  I
>> suspect it will cause big problems for multi-path volume managers
>> in a fail back situation.
>>
>> Mike
>>
> 


* Re: fc transport creates second set of targets for devices in an "md"
  2006-06-08 20:19 ` Mike Christie
@ 2006-06-09 16:35   ` Michael Reed
  2006-06-09 19:34     ` Mike Christie
  2006-06-09 19:52     ` James Smart
  0 siblings, 2 replies; 12+ messages in thread
From: Michael Reed @ 2006-06-09 16:35 UTC (permalink / raw)
  To: Mike Christie
  Cc: James.Smart, Jeremy Higdon, Gary Hagensen, linux-scsi, Jim Nead,
	James Bottomley



Mike Christie wrote:
> Michael Reed wrote:
>> I created an md device on two fibre channel disks, sde and sdf.
>> I then disabled the switch port to which the hba is connected.
>> After the remote port time out messages, I re-enabled the switch
>> port.  Three things happen that are weird.  First, two unexpected
>> responses while scanning.  Second, the creation of sdm and
>> sdn.  Third, the md device remains inaccessible.
>>
>> I don't think this is working the way it's intended to.  I
>> suspect it will cause big problems for multi-path volume managers
>> in a fail back situation.
>>
> 
> Even if the rport is removed and the devices under it are removed, md
> can still have a reference to the device so the memory does not
> disappear on it (MD still thinks the device is there but scsi says it is
> gone basically). Because of this, when you plug in the cable again and a
> new rport is created sd.c can end up allocating another sdX value.
> 

So, how about a callback to the driver, md, with the reference so that it
can release said reference?

Mike


* Re: fc transport creates second set of targets for devices in an "md"
  2006-06-09  0:43 ` James Bottomley
@ 2006-06-09 16:35   ` Michael Reed
  2006-06-09 19:23     ` James Bottomley
  0 siblings, 1 reply; 12+ messages in thread
From: Michael Reed @ 2006-06-09 16:35 UTC (permalink / raw)
  To: James Bottomley
  Cc: James.Smart, Jeremy Higdon, Gary Hagensen, linux-scsi, Jim Nead



James Bottomley wrote:
> On Thu, 2006-06-08 at 14:13 -0500, Michael Reed wrote:
>> I created an md device on two fibre channel disks, sde and sdf.
>> I then disabled the switch port to which the hba is connected.
>> After the remote port time out messages, I re-enabled the switch
>> port.  Three things happen that are weird.  First, two unexpected
>> responses while scanning.  Second, the creation of sdm and
>> sdn.  Third, the md device remains inaccessible.
> 
> This is sort of as expected.  What you did was wait out the reconnection
> timer, so the mid layer failed and offlined the devices.  Thus, when
> they come back, they get new instances.  If you'd done a remove-device
> after they went offline, they'd have come back to the same location (as
> long as nothing had them open).  But this is user level stuff.

In this instance, there was no i/o in progress so the mid-layer didn't
take the device off line.  It was simply removed by the transport.

> Basically, when a path goes dead it's up to the multi-path user level to
> remove it and wait for udev to inform it that another SCSI node has
> appeared and has the correct signature to be another path to the device.

Are there notification mechanisms in place such that a driver which has
the device opened (claimed?) will be notified upon its removal?  Should
there be?  Could it happen via a driver callback?  Udev?  Both?
I like the idea of a callback so that the removal has a chance of being
complete.

Is it possible to do a better job of reconnecting a removed target
to an open sd?

Thanks,
 Mike

> 
>> I don't think this is working the way it's intended to.  I
>> suspect it will cause big problems for multi-path volume managers
>> in a fail back situation.
> 
> James
> 
> 


* Re: fc transport creates second set of targets for devices in an "md"
  2006-06-09 16:26   ` Michael Reed
@ 2006-06-09 18:10     ` Michael Reed
  2006-06-09 18:22       ` Michael Reed
  0 siblings, 1 reply; 12+ messages in thread
From: Michael Reed @ 2006-06-09 18:10 UTC (permalink / raw)
  To: James.Smart
  Cc: Michael Reed, Jeremy Higdon, Gary Hagensen, linux-scsi, Jim Nead

[-- Attachment #1: Type: text/plain, Size: 601 bytes --]



Michael Reed wrote:
> 
> James Smart wrote:
>> The data that would be most interesting is a recursive listing of the
>> device via the /sys/devices tree.  This would show the host, rports,
>> targets, and sdevs.  Getting a copy of this both before, when missing,
>> and after they are reconnected. The additional contents to look at
>> is : contents of /sys/class/fc_remote_ports.
> 
> I'll capture data and post later today.
> 

Attached.  ls_no_md.tar.bz2 and ls_with_md.tar.bz2

I noticed that more than just the devices associated with the
md were reassigned in the ls_with_md.tar output.

Mike

[-- Attachment #2: ls_no_md.tar.bz2 --]
[-- Type: application/x-bzip2, Size: 5092 bytes --]

[-- Attachment #3: ls_with_md.tar.bz2 --]
[-- Type: application/x-bzip2, Size: 5115 bytes --]


* Re: fc transport creates second set of targets for devices in an "md"
  2006-06-09 18:10     ` Michael Reed
@ 2006-06-09 18:22       ` Michael Reed
  0 siblings, 0 replies; 12+ messages in thread
From: Michael Reed @ 2006-06-09 18:22 UTC (permalink / raw)
  To: James.Smart
  Cc: Michael Reed, Jeremy Higdon, Gary Hagensen, linux-scsi, Jim Nead

I tried using a mounted file system on a single device.  Same test,
disable / wait / enable.

There was no activity on the file system following the mount until
after the enable.

duck /root# ls /sdj1
 5:0:13:0: rejecting I/O to dead device
Buffer I/O error on device sdj1, logical block 516
Buffer I/O error on device sdj1, logical block 517
Buffer I/O error on device sdj1, logical block 518
Buffer I/O error on device sdj1, logical block 519
EXT3-fs error (device sdj1): ext3_readdir: directory #2 contains a hole at offset 0
Aborting journal on device sdj1.
 5:0:13:0: rejecting I/O to dead device
Buffer I/O error on device sdj1, logical block 0
lost page write due to I/O error on sdj1
ext3_abort called.
EXT3-fs error (device sdj1): ext3_journal_start_sb: Detected aborted journal
Remounting file system read-only

It would be really nice if the file system wouldn't be rendered
dead and require manual intervention following the scenario
of kicking a cable and plugging it back in after the dev_loss_tmo
expires.
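
For reference, a small sketch of how the timeout in question could be
inspected per remote port.  It assumes the FC transport exposes a
dev_loss_tmo attribute file under each entry of /sys/class/fc_remote_ports;
the helper names read_rport_attr and rport_timeouts are illustrative:

```python
import os


def read_rport_attr(rport_dir, attr):
    """Return the stripped contents of one sysfs attribute, or None."""
    try:
        with open(os.path.join(rport_dir, attr)) as f:
            return f.read().strip()
    except OSError:
        return None


def rport_timeouts(class_dir="/sys/class/fc_remote_ports"):
    """Map each rport name to its dev_loss_tmo value (as a string)."""
    result = {}
    if not os.path.isdir(class_dir):
        return result  # no FC transport loaded, or not a Linux host
    for name in sorted(os.listdir(class_dir)):
        result[name] = read_rport_attr(os.path.join(class_dir, name),
                                       "dev_loss_tmo")
    return result


if __name__ == "__main__":
    for rport, tmo in rport_timeouts().items():
        print(rport, tmo)
```

Knowing the value makes it easier to tell whether a given outage was shorter
or longer than the window during which the transport keeps the targets.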

I think I'm leaning toward wishing that the kernel could do a
better job of reconnecting devices following these little "accidents".

Mike

Michael Reed wrote:
> 
> Michael Reed wrote:
>> James Smart wrote:
>>> The data that would be most interesting is a recursive listing of the
>>> device via the /sys/devices tree.  This would show the host, rports,
>>> targets, and sdevs.  Getting a copy of this both before, when missing,
>>> and after they are reconnected. The additional contents to look at
>>> is : contents of /sys/class/fc_remote_ports.
>> I'll capture data and post later today.
>>
> 
> Attached.  ls_no_md.tar.bz2 and ls_with_md.tar.bz2
> 
> I noticed that more than just the devices associated with the
> md were reassigned in the ls_with_md.tar output.
> 
> Mike


* Re: fc transport creates second set of targets for devices in an "md"
  2006-06-09 16:35   ` Michael Reed
@ 2006-06-09 19:23     ` James Bottomley
  0 siblings, 0 replies; 12+ messages in thread
From: James Bottomley @ 2006-06-09 19:23 UTC (permalink / raw)
  To: Michael Reed
  Cc: James.Smart, Jeremy Higdon, Gary Hagensen, linux-scsi, Jim Nead

On Fri, 2006-06-09 at 11:35 -0500, Michael Reed wrote:
> In this instance, there was no i/o in progress so the mid-layer didn't
> take the device off line.  It was simply removed by the transport.

Oh, OK, it's as Mike said then ... we end up with a transport disconnect
and the openers of the device are left with a reference that simply
fails all I/O until they close it.

> > Basically, when a path goes dead it's up to the multi-path user level to
> > remove it and wait for udev to inform it that another SCSI node has
> > appeared and has the correct signature to be another path to the device.
> 
> Are there notification mechanisms in place such that a driver which has
> the device opened (claimed?) will be notified upon its removal?  Should
> there be?  Could it happen via a driver callback?  Udev?  Both?
> I like the idea of a callback so that the removal has a chance of being
> complete.

Well, yes, there was an event sent, at least a kobject remove event for
the device which could have been picked up by udev or anything else
listening.
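
A sketch of what "anything else listening" could look like from user space,
assuming a Linux host and the usual kernel uevent layout (an
"action@devpath" header followed by NUL-separated KEY=VALUE fields).  The
function names are illustrative, and the sample device path in the usage
note is made up:

```python
import socket


def parse_uevent(datagram):
    """Split one kernel uevent datagram into a dict of its fields."""
    fields = [f for f in datagram.split(b"\0") if f]
    event = {"HEADER": fields[0].decode(errors="replace")}
    for field in fields[1:]:
        key, sep, value = field.decode(errors="replace").partition("=")
        if sep:
            event[key] = value
    return event


def listen(handler):
    """Call handler(event) for each kernel uevent (Linux only, blocks)."""
    NETLINK_KOBJECT_UEVENT = 15  # protocol number from <linux/netlink.h>
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_DGRAM,
                         NETLINK_KOBJECT_UEVENT)
    sock.bind((0, 1))  # pid 0: kernel picks; group 1: kernel-originated events
    while True:
        handler(parse_uevent(sock.recv(8192)))
```

For example, a hypothetical datagram such as
b"remove@/devices/example/sde\0ACTION=remove\0SUBSYSTEM=block\0" parses to a
dict whose ACTION is "remove", which a listener could use to drop its
reference to the departed device.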

> Is it possible to do a better job of reconnecting a removed target
> to an open sd?

Not really, and that's by choice.  What constitutes removal, and what
action should be taken on it (and even how you recognise the same device
returning) are pretty much policy issues to be sorted out at user level.
We do allow the transports to set a local remove policy (as FC does with
the devloss timer) but since you explicitly defeated that, you get into
the user arena.

James


* Re: fc transport creates second set of targets for devices in an "md"
  2006-06-09 16:35   ` Michael Reed
@ 2006-06-09 19:34     ` Mike Christie
  2006-06-09 19:52     ` James Smart
  1 sibling, 0 replies; 12+ messages in thread
From: Mike Christie @ 2006-06-09 19:34 UTC (permalink / raw)
  To: Michael Reed
  Cc: James.Smart, Jeremy Higdon, Gary Hagensen, linux-scsi, Jim Nead,
	James Bottomley

Michael Reed wrote:
> 
> Mike Christie wrote:
>> Michael Reed wrote:
>>> I created an md device on two fibre channel disks, sde and sdf.
>>> I then disabled the switch port to which the hba is connected.
>>> After the remote port time out messages, I re-enabled the switch
>>> port.  Three things happen that are weird.  First, two unexpected
>>> responses while scanning.  Second, the creation of sdm and
>>> sdn.  Third, the md device remains inaccessible.
>>>
>>> I don't think this is working the way it's intended to.  I
>>> suspect it will cause big problems for multi-path volume managers
>>> in a fail back situation.
>>>
>> Even if the rport is removed and the devices under it are removed, md
>> can still have a reference to the device so the memory does not
>> disappear on it (MD still thinks the device is there but scsi says it is
>> gone basically). Because of this, when you plug in the cable again and a
>> new rport is created sd.c can end up allocating another sdX value.
>>
> 
> So, how about a callback to the driver, md, with the reference so that it
> can release said reference?
> 

Look at how DM's userspace multipath tools handle this. Last I looked, they
could handle the hotplug events that are fired when a SCSI device is
removed or added, so as long as you set something like queue_if_no_path,
or set no_path_retry high enough, it could work.
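
Why "high enough" matters can be shown with a deliberately simplified model.
It assumes queued I/O survives as long as the outage lasts no more than
no_path_retry checker intervals, that "queue" means queue indefinitely, and
that the path does eventually return.  The function queued_io_outcome is a
hypothetical illustration, not part of the multipath tools:

```python
def queued_io_outcome(outage_intervals, no_path_retry):
    """Toy model: does I/O issued during a path outage eventually complete?

    outage_intervals: how many checker intervals the path stayed away.
    no_path_retry:    an integer retry count, "queue" (queue forever),
                      or "fail" (fail immediately when no path is left).
    """
    if no_path_retry == "queue":
        return "completed"  # assumes the path does come back eventually
    limit = 0 if no_path_retry == "fail" else int(no_path_retry)
    return "completed" if outage_intervals <= limit else "failed"


print(queued_io_outcome(5, 12))   # outage shorter than the retry budget
print(queued_io_outcome(20, 12))  # outage outlasts the retry budget
```

In this model the first call reports the I/O as completed and the second as
failed, which is the difference between riding out a pulled cable and
returning errors to the file system.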


* Re: fc transport creates second set of targets for devices in an "md"
  2006-06-09 16:35   ` Michael Reed
  2006-06-09 19:34     ` Mike Christie
@ 2006-06-09 19:52     ` James Smart
  1 sibling, 0 replies; 12+ messages in thread
From: James Smart @ 2006-06-09 19:52 UTC (permalink / raw)
  To: Michael Reed
  Cc: Mike Christie, Jeremy Higdon, Gary Hagensen, linux-scsi, Jim Nead,
	James Bottomley

Michael Reed wrote:
>> Even if the rport is removed and the devices under it are removed, md
>> can still have a reference to the device so the memory does not
>> disappear on it (MD still thinks the device is there but scsi says it is
>> gone basically). Because of this, when you plug in the cable again and a
>> new rport is created sd.c can end up allocating another sdX value.
>>
> 
> So, how about a callback to the driver, md, with the reference so that it
> can release said reference?

Mike C summarized the issue well. Callback - ugh. The real wish-list fix is to
make the midlayer reuse the old structures if the device comes back. Makes my
eyes bug out, though, to figure out how this could be done. This would also solve
the race we saw in sysfs, where the node was recreated while the old one was still
outstanding due to a reference (name collision).

-- james

