* multipath-tools causes path to come back as different block device
From: Brian De Wolf @ 2007-07-19 22:54 UTC
  To: device-mapper development

Hello again,

I've been testing multipath-tools' rdac support with a qla2xxx HBA and an IBM
DS4800 some more, and I've hit another stumbling block.  When I unplug one of
the HBA ports and plug it back in while multipath is running, things go wrong.
Here is what the syslog looks like (note: sdb is one path, sdd is initially
unused, and sde is the second path):

Jul 19 14:30:35 jimbo kernel: qla2xxx 0000:02:01.1: LOOP DOWN detected (2).
Jul 19 14:30:41 jimbo kernel: rport-4:0-0: blocked FC remote port time out: removing target and saving binding
Jul 19 14:30:41 jimbo kernel: sd 4:0:0:0: [sde] Synchronizing SCSI cache
Jul 19 14:30:41 jimbo kernel: sd 4:0:0:0: [sde] Result: hostbyte=0x01 driverbyte=0x00
Jul 19 14:30:48 jimbo multipathd: sde: rdac checker reports path is down
Jul 19 14:30:48 jimbo multipathd: checker failed path 8:64 in map test
Jul 19 14:30:48 jimbo kernel: scsi 4:0:0:0: rejecting I/O to dead device
Jul 19 14:30:48 jimbo kernel: device-mapper: multipath: Failing path 8:64.
Jul 19 14:30:48 jimbo multipathd: test: remaining active paths: 1
Jul 19 14:30:48 jimbo multipathd: test: switch to path group #2
Jul 19 14:30:52 jimbo kernel: qla2xxx 0000:02:01.1: LIP reset occured (f700).
Jul 19 14:30:52 jimbo kernel: qla2xxx 0000:02:01.1: LIP occured (f700).
Jul 19 14:30:52 jimbo kernel: qla2xxx 0000:02:01.1: LIP reset occured (f7f7).
Jul 19 14:30:53 jimbo kernel: scsi 4:0:0:0: rejecting I/O to dead device
Jul 19 14:30:53 jimbo multipathd: sde: rdac checker reports path is down
Jul 19 14:30:53 jimbo kernel: qla2xxx 0000:02:01.1: LOOP UP detected (4 Gbps).
Jul 19 14:30:53 jimbo kernel: scsi 4:0:0:0: Direct-Access     IBM      1815 FAStT  0914 PQ: 0 ANSI: 3
Jul 19 14:30:53 jimbo kernel: sd 4:0:0:0: [sdd] 6291456 512-byte hardware sectors (3221 MB)
Jul 19 14:30:53 jimbo kernel: sd 4:0:0:0: [sdd] Write Protect is off
Jul 19 14:30:53 jimbo kernel: sd 4:0:0:0: [sdd] Mode Sense: 77 00 10 08
Jul 19 14:30:53 jimbo kernel: sd 4:0:0:0: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
Jul 19 14:30:53 jimbo kernel: sd 4:0:0:0: [sdd] 6291456 512-byte hardware sectors (3221 MB)
Jul 19 14:30:53 jimbo kernel: sd 4:0:0:0: [sdd] Write Protect is off
Jul 19 14:30:53 jimbo kernel: sd 4:0:0:0: [sdd] Mode Sense: 77 00 10 08
Jul 19 14:30:53 jimbo kernel: sd 4:0:0:0: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
Jul 19 14:30:53 jimbo kernel: sdd: sdd1
Jul 19 14:30:53 jimbo kernel: sd 4:0:0:0: [sdd] Attached SCSI disk
Jul 19 14:30:53 jimbo kernel: scsi 4:0:0:0: Direct-Access     IBM      1815 FAStT  0914 PQ: 0 ANSI: 3
Jul 19 14:30:53 jimbo kernel: kobject_add failed for 4:0:0:0 with -EEXIST, don't try to register things with the same name in the same directory.
Jul 19 14:30:53 jimbo kernel:
Jul 19 14:30:53 jimbo kernel: Call Trace:
Jul 19 14:30:53 jimbo kernel: [<ffffffff802e1d9b>] kobject_shadow_add+0x187/0x191
Jul 19 14:30:53 jimbo kernel: [<ffffffff8033a495>] device_add+0xa1/0x59d
Jul 19 14:30:53 jimbo kernel: [<ffffffff803638e8>] scsi_sysfs_add_sdev+0x2e/0x24a
Jul 19 14:30:53 jimbo kernel: [<ffffffff80361f18>] scsi_probe_and_add_lun+0x6ff/0x80f
Jul 19 14:30:53 jimbo kernel: [<ffffffff803612c8>] scsi_alloc_sdev+0x195/0x1ea
Jul 19 14:30:53 jimbo kernel: [<ffffffff80362580>] __scsi_scan_target+0x3e9/0x549
Jul 19 14:30:53 jimbo kernel: [<ffffffff80416d83>] thread_return+0x0/0xe2
Jul 19 14:30:53 jimbo kernel: [<ffffffff80362777>] scsi_scan_target+0x97/0xbc
Jul 19 14:30:53 jimbo kernel: [<ffffffff88003668>] :scsi_transport_fc:fc_scsi_scan_rport+0x59/0x79
Jul 19 14:30:53 jimbo kernel: [<ffffffff8800360f>] :scsi_transport_fc:fc_scsi_scan_rport+0x0/0x79
Jul 19 14:30:53 jimbo kernel: [<ffffffff802379c4>] run_workqueue+0x84/0x105
Jul 19 14:30:53 jimbo kernel: [<ffffffff80237a45>] worker_thread+0x0/0xf4
Jul 19 14:30:53 jimbo kernel: [<ffffffff80237b2f>] worker_thread+0xea/0xf4
Jul 19 14:30:53 jimbo kernel: [<ffffffff8023addd>] autoremove_wake_function+0x0/0x2e
Jul 19 14:30:53 jimbo kernel: [<ffffffff8023addd>] autoremove_wake_function+0x0/0x2e
Jul 19 14:30:53 jimbo kernel: [<ffffffff8023a888>] kthread+0x3d/0x63
Jul 19 14:30:53 jimbo kernel: [<ffffffff8020a338>] child_rip+0xa/0x12
Jul 19 14:30:53 jimbo kernel: [<ffffffff8023a84b>] kthread+0x0/0x63
Jul 19 14:30:53 jimbo kernel: [<ffffffff8020a32e>] child_rip+0x0/0x12
Jul 19 14:30:53 jimbo kernel:
Jul 19 14:30:53 jimbo kernel: error 1
Jul 19 14:30:53 jimbo kernel: scsi 4:0:0:0: Unexpected response from lun 0 while scanning, scan aborted
Jul 19 14:30:53 jimbo scsi.agent[8613]: disk at /devices/pci0000:00/0000:00:02.0/0000:02:01.1/host4/rport-4:0-0/target4:0:0/4:0:0:0
Jul 19 14:30:53 jimbo multipathd: sdd: add path (uevent)
Jul 19 14:30:53 jimbo kernel: scsi 4:0:0:0: rejecting I/O to dead device
Jul 19 14:30:53 jimbo multipathd: sde: checker msg is "rdac checker reports path is down"
Jul 19 14:30:53 jimbo kernel: device-mapper: multipath rdac: using RDAC command with timeout 15000
Jul 19 14:30:53 jimbo kernel: device-mapper: table: 254:6: multipath: error getting device
Jul 19 14:30:53 jimbo kernel: device-mapper: ioctl: error adding target to table
Jul 19 14:30:53 jimbo multipathd: test: failed in domap for addition of new path sdd
Jul 19 14:30:53 jimbo multipathd: test: uev_add_path sleep
...

From here, the last 5 lines repeat until I 'kill -9' the multipathd
process.  I'm not very familiar with kernel internals (though playing with
multipathing is bringing me up to speed pretty quickly), but I wonder whether
multipathd causes the call trace by not letting /dev/sde go away, so that the
HBA's SCSI device cannot take that name again.  I noticed this via lsof:
multipath 8390     root    5r      BLK               8,64              22254 /dev/sde (deleted)
multipath 8390     root    6r      BLK               8,16               1100 /dev/sdb
multipath 8390     root   10r      BLK               8,48              23647 /dev/sdd

When multipathd is running, unplugging and replugging one of the ports makes
the path come back under the next free sd* name.  Each time I repeat this, the
number of deleted block devices multipathd holds on to grows, along with the
number of unhappy rdac checkers.  As I said before, it takes a 'kill -9' to stop
multipathd, and after that, replugging picks sd* names that had previously been
held open as (deleted) by multipathd.

However, this does not happen when multipathd is not running.  When the port
is unplugged, the /dev/sd* device disappears, and when it is plugged back in,
it cleanly takes the same name it had before, with no call traces or anything
(I assume it just takes the lowest free name, and its old name has been freed).

Any ideas on how to correct this behavior?

Thanks!
Brian De Wolf

* Re: multipath-tools causes path to come back as different block device
From: Hannes Reinecke @ 2007-07-20  7:24 UTC
  To: device-mapper development

Brian De Wolf wrote:
> [...]
>
> Any ideas on how to correct this behavior?
> 
Hmm. multipathd really should react to the 'remove' events for sdX.
Checking ...

Looks as if it does. And it is even supposed to stop the path checker.

Care to run multipathd with full debugging (i.e. -v 4) and post the output?
My guess is that somehow the path checker is not stopped and the fd is kept
open, so that the device is not released properly.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

* Re: multipath-tools causes path to come back as different block device
From: Guido Guenther @ 2007-07-20 10:10 UTC
  To: device-mapper development

Hi Brian,
On Thu, Jul 19, 2007 at 03:54:51PM -0700, Brian De Wolf wrote:
> I've been testing multipath-tool's rdac capability with a qla2xxx HBA and an IBM
> DS4800 some more and I've hit another stumbling block.  When I test unplugging
> one of the HBA ports and plugging it back in with multipath running, it seems to
> cause bad things to happen.  Here is what the syslog looks like (note:  sdb is a
> path, sdd is initially unused, and sde is the second path):
I'm seeing the same and sent a report to Christophe about this just
yesterday. It doesn't seem to be a driver issue, since iSCSI is affected
as well.
Cheers,
 -- Guido

* Re: multipath-tools causes path to come back as different block device
From: Guido Guenther @ 2007-07-20 11:53 UTC
  To: device-mapper development

Hi Hannes,
On Fri, Jul 20, 2007 at 09:24:15AM +0200, Hannes Reinecke wrote:
> Hmm. multipathd really should react to the 'remove' events for sdX.
> Checking ...
> 
> Looks as if it does. And it even is supposed to stop the path checker.
> 
> Care to run multipathd with full debugging (ie -v 4) and post the output?
> My guess is that somehow the path checker is not stopped and the fd is kept
> open, so that the device is not released properly.
This happens even with no multipathd running, on a Clariion with qla2xxx.
So multipathd can't be at fault here, can it? Also, with multipathd
running, the device nodes in /dev/ and the sysfs nodes in /sys/block get
removed properly (at least that's how it looks to me). Any hints on where
else to start digging?
Cheers,
 -- Guido

* Re: multipath-tools causes path to come back as different block device
From: Hannes Reinecke @ 2007-07-20 14:40 UTC
  To: device-mapper development

Guido Guenther wrote:
> Hi Hannes,
> On Fri, Jul 20, 2007 at 09:24:15AM +0200, Hannes Reinecke wrote:
>> Hmm. multipathd really should react to the 'remove' events for sdX.
>> Checking ...
>>
>> Looks as if it does. And it even is supposed to stop the path checker.
>>
>> Care to run multipathd with full debugging (ie -v 4) and post the output?
>> My guess is that somehow the path checker is not stopped and the fd is kept
>> open, so that the device is not released properly.
> This even happens with no multipathd running on a Clariion with qla2xxx.
> So multipathd can't be at fault here, can it? Also with multipathd
> running the device nodes in /dev/ and the sysfs nodes in /sys/block get
> removed properly (at least it looks like to me). Any hints where else to
> start digging?
Removing from /dev and /sys is not sufficient, sadly.
The memory is only released after the last reference has gone.
So all fds have to be closed (i.e. multipathd and the path checkers have
to have this path disabled) and no device-mapper table may reference it.

So the easiest way is in fact to run multipathd with -v 4, as this will
tell you exactly whether multipathd has stopped the path checker.
The device-mapper tables can be checked manually with dmsetup.

Cheers,

Hannes

* Re: multipath-tools causes path to come back as different block device
From: Brian De Wolf @ 2007-07-20 17:05 UTC
  To: device-mapper development

Hannes Reinecke wrote:
> Hmm. multipathd really should react to the 'remove' events for sdX.
> Checking ...
> 
> Looks as if it does. And it even is supposed to stop the path checker.
> 
> Care to run multipathd with full debugging (ie -v 4) and post the output?
> My guess is that somehow the path checker is not stopped and the fd is kept
> open, so that the device is not released properly.
> 
> Cheers,
> 
> Hannes

Here's the log, as requested:

Jul 20 09:12:59 jimbo kernel: qla2xxx 0000:02:01.1: LOOP DOWN detected (2).
Jul 20 09:12:59 jimbo multipathd: tick
Jul 20 09:13:00 jimbo multipathd: tick
Jul 20 09:13:01 jimbo multipathd: tick
Jul 20 09:13:05 jimbo kernel: rport-4:0-0: blocked FC remote port time out: removing target and saving binding
Jul 20 09:13:05 jimbo multipathd: sdd: rdac checker reports path is down
Jul 20 09:13:05 jimbo multipathd: checker failed path 8:48 in map test
Jul 20 09:13:05 jimbo multipathd: test: remaining active paths: 1
Jul 20 09:13:05 jimbo kernel: device-mapper: multipath: Failing path 8:48.
Jul 20 09:13:05 jimbo kernel: sd 4:0:0:0: [sdd] Synchronizing SCSI cache
Jul 20 09:13:05 jimbo kernel: sd 4:0:0:0: [sdd] Result: hostbyte=0x01 driverbyte=0x00
Jul 20 09:13:05 jimbo multipathd: test: devmap event #5
Jul 20 09:13:05 jimbo multipathd: 8:16: delay next check 20s
Jul 20 09:13:05 jimbo multipathd: path prio refresh
Jul 20 09:13:05 jimbo multipathd: sdb: mask = 0x8
Jul 20 09:13:05 jimbo multipathd: sdb: prio = 1
Jul 20 09:13:05 jimbo multipathd: test: switch to path group #2
Jul 20 09:13:05 jimbo multipathd: test: discover
Jul 20 09:13:05 jimbo multipathd: *word = 0, len = 1
Jul 20 09:13:05 jimbo multipathd: *word = 1, len = 1
Jul 20 09:13:05 jimbo multipathd: *word = rdac, len = 4
Jul 20 09:13:05 jimbo multipathd: *word = 2, len = 1
Jul 20 09:13:05 jimbo multipathd: *word = 2, len = 1
Jul 20 09:13:05 jimbo multipathd: *word = 1, len = 1
Jul 20 09:13:05 jimbo multipathd: *word = 1, len = 1
Jul 20 09:13:05 jimbo multipathd: *word = 8:48, len = 4
Jul 20 09:13:05 jimbo multipathd: *word = 1000, len = 4
Jul 20 09:13:05 jimbo multipathd: *word = 1, len = 1
Jul 20 09:13:05 jimbo multipathd: *word = 1, len = 1
Jul 20 09:13:05 jimbo multipathd: *word = 8:16, len = 4
Jul 20 09:13:05 jimbo multipathd: *word = 1000, len = 4
Jul 20 09:13:05 jimbo multipathd: *word = 1, len = 1
Jul 20 09:13:05 jimbo multipathd: *word = 0, len = 1
Jul 20 09:13:05 jimbo multipathd: *word = 2, len = 1
Jul 20 09:13:05 jimbo multipathd: *word = E, len = 1
Jul 20 09:13:05 jimbo multipathd: *word = 1, len = 1
Jul 20 09:13:05 jimbo multipathd: *word = 0, len = 1
Jul 20 09:13:05 jimbo multipathd: *word = F, len = 1
Jul 20 09:13:05 jimbo multipathd: *word = 1, len = 1
Jul 20 09:13:05 jimbo multipathd: *word = E, len = 1
Jul 20 09:13:05 jimbo multipathd: *word = 1, len = 1
Jul 20 09:13:05 jimbo multipathd: *word = 0, len = 1
Jul 20 09:13:05 jimbo multipathd: *word = A, len = 1
Jul 20 09:13:05 jimbo multipathd: *word = 0, len = 1
Jul 20 09:13:05 jimbo multipathd: test: rr_weight = 1 (internal default)
Jul 20 09:13:05 jimbo multipathd: test: pgfailback = -2 (LUN setting)
Jul 20 09:13:05 jimbo multipathd: test: no_path_retry = NONE (internal default)
Jul 20 09:13:05 jimbo multipathd: pg_timeout = NONE (internal default)
Jul 20 09:13:05 jimbo multipathd: uevent 'remove' from '/class/scsi_device/4:0:0:0'
Jul 20 09:13:05 jimbo multipathd: UDEV_LOG=3
Jul 20 09:13:05 jimbo multipathd: ACTION=remove
Jul 20 09:13:05 jimbo multipathd: DEVPATH=/class/scsi_device/4:0:0:0
Jul 20 09:13:05 jimbo multipathd: SUBSYSTEM=scsi_device
Jul 20 09:13:05 jimbo multipathd: SEQNUM=944
Jul 20 09:13:05 jimbo multipathd: PHYSDEVPATH=/devices/pci0000:00/0000:00:02.0/0000:02:01.1/host4/rport-4:0-0/target4:0:0/4:0:0:0
Jul 20 09:13:05 jimbo multipathd: PHYSDEVBUS=scsi
Jul 20 09:13:05 jimbo multipathd: PHYSDEVDRIVER=sd
Jul 20 09:13:05 jimbo multipathd: UDEVD_EVENT=1
Jul 20 09:13:05 jimbo multipathd: IN_HOTPLUG=1
Jul 20 09:13:05 jimbo multipathd: discard event on /class/scsi_device/4:0:0:0
Jul 20 09:13:05 jimbo multipathd: uevent 'remove' from '/class/scsi_disk/4:0:0:0'
Jul 20 09:13:05 jimbo multipathd: UDEV_LOG=3
Jul 20 09:13:05 jimbo multipathd: ACTION=remove
Jul 20 09:13:05 jimbo multipathd: DEVPATH=/class/scsi_disk/4:0:0:0
Jul 20 09:13:05 jimbo multipathd: SUBSYSTEM=scsi_disk
Jul 20 09:13:05 jimbo multipathd: SEQNUM=945
Jul 20 09:13:05 jimbo multipathd: PHYSDEVPATH=/devices/pci0000:00/0000:00:02.0/0000:02:01.1/host4/rport-4:0-0/target4:0:0/4:0:0:0
Jul 20 09:13:05 jimbo multipathd: PHYSDEVBUS=scsi
Jul 20 09:13:05 jimbo multipathd: PHYSDEVDRIVER=sd
Jul 20 09:13:05 jimbo multipathd: UDEVD_EVENT=1
Jul 20 09:13:05 jimbo multipathd: IN_HOTPLUG=1
Jul 20 09:13:05 jimbo multipathd: discard event on /class/scsi_disk/4:0:0:0
Jul 20 09:13:05 jimbo multipathd: uevent 'remove' from '/block/sdd/sdd1'
Jul 20 09:13:05 jimbo multipathd: UDEV_LOG=3
Jul 20 09:13:05 jimbo multipathd: ACTION=remove
Jul 20 09:13:05 jimbo multipathd: DEVPATH=/block/sdd/sdd1
Jul 20 09:13:05 jimbo multipathd: SUBSYSTEM=block
Jul 20 09:13:05 jimbo multipathd: SEQNUM=946
Jul 20 09:13:05 jimbo multipathd: MINOR=49
Jul 20 09:13:05 jimbo multipathd: MAJOR=8
Jul 20 09:13:05 jimbo multipathd: PHYSDEVPATH=/devices/pci0000:00/0000:00:02.0/0000:02:01.1/host4/rport-4:0-0/target4:0:0/4:0:0:0
Jul 20 09:13:05 jimbo multipathd: PHYSDEVBUS=scsi
Jul 20 09:13:05 jimbo multipathd: PHYSDEVDRIVER=sd
Jul 20 09:13:05 jimbo multipathd: UDEVD_EVENT=1
Jul 20 09:13:05 jimbo multipathd: IN_HOTPLUG=1
Jul 20 09:13:05 jimbo multipathd: ID_VENDOR=IBM
Jul 20 09:13:05 jimbo multipathd: ID_MODEL=1815_FAStT
Jul 20 09:13:05 jimbo multipathd: ID_REVISION=0914
Jul 20 09:13:05 jimbo multipathd: ID_SERIAL=3600a0b80001199100000a624468e6438
Jul 20 09:13:05 jimbo multipathd: ID_SERIAL_SHORT=600a0b80001199100000a624468e6438
Jul 20 09:13:05 jimbo multipathd: ID_TYPE=disk
Jul 20 09:13:05 jimbo multipathd: ID_BUS=scsi
Jul 20 09:13:05 jimbo multipathd: ID_PATH=pci-0000:02:01.1-fc-0x202700a0b8119910:0x0000000000000000
Jul 20 09:13:05 jimbo multipathd: ID_FS_USAGE=filesystem
Jul 20 09:13:05 jimbo multipathd: ID_FS_TYPE=ext3
Jul 20 09:13:05 jimbo multipathd: ID_FS_VERSION=1.0
Jul 20 09:13:05 jimbo multipathd: ID_FS_UUID=8fe0f813-4d2b-4ed1-9b67-ba8805a37561
Jul 20 09:13:05 jimbo multipathd: ID_FS_LABEL=
Jul 20 09:13:05 jimbo multipathd: ID_FS_LABEL_SAFE=
Jul 20 09:13:05 jimbo multipathd: DEVLINKS=/dev/disk/by-id/scsi-3600a0b80001199100000a624468e6438-part1 /dev/disk/by-path/pci-0000:02:01.1-fc-0x202700a0b8119910:
Jul 20 09:13:05 jimbo multipathd: DEVNAME=/dev/sdd1
Jul 20 09:13:05 jimbo multipathd: discard event on /block/sdd/sdd1
Jul 20 09:13:05 jimbo multipathd: uevent 'remove' from '/devices/pci0000:00/0000:00:02.0/0000:02:01.1/host4/rport-4:0-0/target4:0:0/4:0:0:0'
Jul 20 09:13:05 jimbo multipathd: UDEV_LOG=3
Jul 20 09:13:05 jimbo multipathd: ACTION=remove
Jul 20 09:13:05 jimbo multipathd: DEVPATH=/devices/pci0000:00/0000:00:02.0/0000:02:01.1/host4/rport-4:0-0/target4:0:0/4:0:0:0
Jul 20 09:13:05 jimbo multipathd: SUBSYSTEM=scsi
Jul 20 09:13:05 jimbo multipathd: SEQNUM=948
Jul 20 09:13:05 jimbo multipathd: PHYSDEVBUS=scsi
Jul 20 09:13:05 jimbo multipathd: MODALIAS=scsi:t-0x00
Jul 20 09:13:05 jimbo multipathd: UDEVD_EVENT=1
Jul 20 09:13:05 jimbo multipathd: IN_HOTPLUG=1
Jul 20 09:13:05 jimbo multipathd: discard event on /devices/pci0000:00/0000:00:02.0/0000:02:01.1/host4/rport-4:0-0/target4:0:0/4:0:0:0
Jul 20 09:13:05 jimbo multipathd: uevent 'remove' from '/block/sdd'
Jul 20 09:13:05 jimbo multipathd: UDEV_LOG=3
Jul 20 09:13:05 jimbo multipathd: ACTION=remove
Jul 20 09:13:05 jimbo multipathd: DEVPATH=/block/sdd
Jul 20 09:13:05 jimbo multipathd: SUBSYSTEM=block
Jul 20 09:13:05 jimbo multipathd: SEQNUM=947
Jul 20 09:13:05 jimbo multipathd: MINOR=48
Jul 20 09:13:05 jimbo multipathd: MAJOR=8
Jul 20 09:13:05 jimbo multipathd: PHYSDEVPATH=/devices/pci0000:00/0000:00:02.0/0000:02:01.1/host4/rport-4:0-0/target4:0:0/4:0:0:0
Jul 20 09:13:05 jimbo multipathd: PHYSDEVBUS=scsi
Jul 20 09:13:05 jimbo multipathd: PHYSDEVDRIVER=sd
Jul 20 09:13:05 jimbo multipathd: UDEVD_EVENT=1
Jul 20 09:13:05 jimbo multipathd: IN_HOTPLUG=1
Jul 20 09:13:05 jimbo multipathd: ID_VENDOR=IBM
Jul 20 09:13:05 jimbo multipathd: ID_MODEL=1815_FAStT
Jul 20 09:13:05 jimbo multipathd: ID_REVISION=0914
Jul 20 09:13:05 jimbo multipathd: ID_SERIAL=3600a0b80001199100000a624468e6438
Jul 20 09:13:05 jimbo multipathd: ID_SERIAL_SHORT=600a0b80001199100000a624468e6438
Jul 20 09:13:05 jimbo multipathd: ID_TYPE=disk
Jul 20 09:13:05 jimbo multipathd: ID_BUS=scsi
Jul 20 09:13:05 jimbo multipathd: ID_PATH=pci-0000:02:01.1-fc-0x202700a0b8119910:0x0000000000000000
Jul 20 09:13:05 jimbo multipathd: DEVLINKS=/dev/disk/by-id/scsi-3600a0b80001199100000a624468e6438 /dev/disk/by-path/pci-0000:02:01.1-fc-0x202700a0b8119910:0x0000
Jul 20 09:13:05 jimbo multipathd: DEVNAME=/dev/sdd
Jul 20 09:13:06 jimbo multipathd: tick
Jul 20 09:13:06 jimbo multipathd: test: devmap event #6
Jul 20 09:13:06 jimbo multipathd: test: discover
Jul 20 09:13:06 jimbo multipathd: *word = 0, len = 1
Jul 20 09:13:06 jimbo multipathd: *word = 1, len = 1
Jul 20 09:13:06 jimbo multipathd: *word = rdac, len = 4
Jul 20 09:13:06 jimbo multipathd: *word = 2, len = 1
Jul 20 09:13:06 jimbo multipathd: *word = 2, len = 1
Jul 20 09:13:06 jimbo multipathd: *word = 1, len = 1
Jul 20 09:13:06 jimbo multipathd: *word = 1, len = 1
Jul 20 09:13:06 jimbo multipathd: *word = 8:48, len = 4
Jul 20 09:13:06 jimbo multipathd: *word = 1000, len = 4
Jul 20 09:13:06 jimbo multipathd: *word = 1, len = 1
Jul 20 09:13:06 jimbo multipathd: *word = 1, len = 1
Jul 20 09:13:06 jimbo multipathd: *word = 8:16, len = 4
Jul 20 09:13:06 jimbo multipathd: *word = 1000, len = 4
Jul 20 09:13:06 jimbo multipathd: *word = 1, len = 1
Jul 20 09:13:06 jimbo multipathd: *word = 0, len = 1
Jul 20 09:13:06 jimbo multipathd: *word = 2, len = 1
Jul 20 09:13:06 jimbo multipathd: *word = E, len = 1
Jul 20 09:13:06 jimbo multipathd: *word = 1, len = 1
Jul 20 09:13:06 jimbo multipathd: *word = 0, len = 1
Jul 20 09:13:06 jimbo multipathd: *word = F, len = 1
Jul 20 09:13:06 jimbo multipathd: *word = 1, len = 1
Jul 20 09:13:06 jimbo multipathd: *word = E, len = 1
Jul 20 09:13:06 jimbo multipathd: *word = 1, len = 1
Jul 20 09:13:06 jimbo multipathd: *word = 0, len = 1
Jul 20 09:13:06 jimbo multipathd: *word = A, len = 1
Jul 20 09:13:06 jimbo multipathd: *word = 0, len = 1
Jul 20 09:13:06 jimbo multipathd: test: rr_weight = 1 (internal default)
Jul 20 09:13:06 jimbo multipathd: test: pgfailback = -2 (LUN setting)
Jul 20 09:13:06 jimbo multipathd: test: no_path_retry = NONE (internal default)
Jul 20 09:13:06 jimbo multipathd: pg_timeout = NONE (internal default)
Jul 20 09:13:07 jimbo multipathd: tick
Jul 20 09:13:07 jimbo multipathd: map garbage collection
Jul 20 09:13:08 jimbo multipathd: tick
Jul 20 09:13:09 jimbo multipathd: tick
Jul 20 09:13:10 jimbo multipathd: tick
Jul 20 09:13:10 jimbo kernel: scsi 4:0:0:0: rejecting I/O to dead device
Jul 20 09:13:10 jimbo multipathd: sdd: rdac checker reports path is down
Jul 20 09:13:10 jimbo multipathd: path prio refresh
Jul 20 09:13:10 jimbo multipathd: sdd: mask = 0x8
Jul 20 09:13:11 jimbo multipathd: tick
...

After that, the last five lines keep repeating until it's killed.

* Re: multipath-tools causes path to come back as different block device
From: Guido Guenther @ 2007-07-20 17:29 UTC
  To: device-mapper development

Hi Hannes,
On Fri, Jul 20, 2007 at 04:40:11PM +0200, Hannes Reinecke wrote:
> Removing from /dev and /sys is not sufficient, sadly.
> The memory is only released after the last reference has gone.
> So all fd's have to be closed (ie multipathd and the path checkers have
> to have this path disabled) and no device-mapper table must reference it.
Knowing this, the problem is much easier to spot...

> So the easiest way is in fact to have multipathd run with -v 4, as this will
> tell you exactly if multipathd has stopped the path checker.
> The device-mapper tables can be checked manually with dmsetup.
...it's in multipathd/main.c. The sysfs device is already gone when we
get the remove event. This patch makes it obvious:

diff --git a/multipathd/main.c b/multipathd/main.c
index 5f98c33..00890d7 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -635,8 +635,10 @@ uev_trigger (struct uevent * uev, void * trigger_data)
 		return 0;
 
 	sysdev = sysfs_device_get(uev->devpath);
-	if(!sysdev)
-		return 0;	
+	if (!sysdev) {
+		condlog(4, "Devpath '%s' of uevent '%s' vanished", uev->devpath, uev->action);
+		return 0;
+	}
 
 	lock(vecs->lock);

Signed-Off-By: Guido Günther <agx@sigxcpu.org>

The sysfs device is already gone, so sysfs_device_get returns 0 and we
never call uev_remove_path further down below. The check of the return
value was added by f1b1fca2ccbfd7d58350eb136105fdaf8aa4f59ca, which is
imho correct. Can we expect the sysfs node to stay around long enough? I
don't think so, so should we cache the sysfs_device structures somewhere
to use them on path removal? I could cook up a patch.
Cheers,
 -- Guido


* [multipath 1/1] cache sysfs_devices
  2007-07-20 17:29       ` Guido Guenther
@ 2007-07-20 20:52         ` -s
       [not found]         ` <11849647793576-git-send-email--s>
  1 sibling, 0 replies; 12+ messages in thread
From: -s @ 2007-07-20 20:52 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: dm-devel

From: Guido Guenther <agx@sigxcpu.org>

This also plugs a memory leak where we'd malloc space for the same sysfs device
over and over again for every processed uevent.
---
 libmultipath/sysfs.c |   47 +++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/libmultipath/sysfs.c b/libmultipath/sysfs.c
index 1fb5436..d8c65b2 100644
--- a/libmultipath/sysfs.c
+++ b/libmultipath/sysfs.c
@@ -45,6 +45,13 @@ struct sysfs_attr {
 	char value_local[NAME_SIZE];
 };
 
+/* list of sysfs devices */
+static LIST_HEAD(sysfs_dev_list);
+struct sysfs_dev {
+	struct list_head node;
+	struct sysfs_device dev;
+};
+
 int sysfs_init(char *path, size_t len)
 {
 	if (path) {
@@ -55,6 +62,7 @@ int sysfs_init(char *path, size_t len)
 	dbg("sysfs_path='%s'", sysfs_path);
 
 	INIT_LIST_HEAD(&attr_list);
+	INIT_LIST_HEAD(&sysfs_dev_list);
 	return 0;
 }
 
@@ -63,11 +71,17 @@ void sysfs_cleanup(void)
 	struct sysfs_attr *attr_loop;
 	struct sysfs_attr *attr_temp;
 
+	struct sysfs_dev *sysdev_loop;
+	struct sysfs_dev *sysdev_temp;
+
 	list_for_each_entry_safe(attr_loop, attr_temp, &attr_list, node) {
 		list_del(&attr_loop->node);
 		free(attr_loop);
 	}
 
+	list_for_each_entry_safe(sysdev_loop, sysdev_temp, &sysfs_dev_list, node) {
+		free(sysdev_loop);
+	}
 }
 
 void sysfs_device_set_values(struct sysfs_device *dev, const char *devpath,
@@ -140,7 +154,8 @@ struct sysfs_device *sysfs_device_get(const char *devpath)
 {
 	char path[PATH_SIZE];
 	char devpath_real[PATH_SIZE];
-	struct sysfs_device *dev;
+	struct sysfs_device *dev = NULL;
+	struct sysfs_dev *sysdev_loop, *sysdev;
 	struct stat statbuf;
 	char link_path[PATH_SIZE];
 	char link_target[PATH_SIZE];
@@ -155,7 +170,14 @@ struct sysfs_device *sysfs_device_get(const char *devpath)
 	strlcpy(path, sysfs_path, sizeof(path));
 	strlcat(path, devpath_real, sizeof(path));
 	if (lstat(path, &statbuf) != 0) {
+		/* if stat fails look in the cache */
 		dbg("stat '%s' failed: %s", path, strerror(errno));
+		list_for_each_entry(sysdev_loop, &sysfs_dev_list, node) {
+			if (strcmp(sysdev_loop->dev.devpath, devpath_real) == 0) {
+				dbg("found vanished dev in cache '%s'", sysdev_loop->dev.devpath);
+				return &sysdev_loop->dev;
+			}
+		}
 		return NULL;
 	}
 	if (S_ISLNK(statbuf.st_mode)) {
@@ -164,12 +186,22 @@ struct sysfs_device *sysfs_device_get(const char *devpath)
 
 	}
 
-	/* it is a new device */
-	dbg("new device '%s'", devpath_real);
-	dev = malloc(sizeof(struct sysfs_device));
-	if (dev == NULL)
-		return NULL;
-	memset(dev, 0x00, sizeof(struct sysfs_device));
+	list_for_each_entry(sysdev_loop, &sysfs_dev_list, node) {
+		if (strcmp(sysdev_loop->dev.devpath, devpath_real) == 0) {
+			dbg("found dev in cache '%s'", sysdev_loop->dev.devpath);
+				dev = &sysdev_loop->dev;
+		}
+	}
+	if(!dev) {
+		/* it is a new device */
+		dbg("new device '%s'", devpath_real);
+		sysdev = malloc(sizeof(struct sysfs_dev));
+		if (sysdev == NULL)
+			return NULL;
+		memset(sysdev, 0x00, sizeof(struct sysfs_dev));
+		list_add(&sysdev->node, &sysfs_dev_list);
+		dev = &sysdev->dev;
+	}
 
 	sysfs_device_set_values(dev, devpath_real, NULL, NULL);
 
@@ -226,7 +258,6 @@ struct sysfs_device *sysfs_device_get(const char *devpath)
 		if (pos != NULL)
 			strlcpy(dev->driver, &pos[1], sizeof(dev->driver));
 	}
-
 	return dev;
 }
 
-- 
1.5.2.4


* Re: [multipath 1/1] cache sysfs_devices
       [not found]         ` <11849647793576-git-send-email--s>
@ 2007-07-20 21:03           ` Guido Guenther
  2007-07-20 22:18             ` Christophe Varoqui
  2007-07-24  6:35             ` Hannes Reinecke
  0 siblings, 2 replies; 12+ messages in thread
From: Guido Guenther @ 2007-07-20 21:03 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: dm-devel

I give up on git-send-email - it always fools me. Nevertheless this one
fixes the described bug of paths coming back as different block
devices:

Keep a list of sysfs devices for sysfs_device_get() so uev_trigger() can look
up the necessary information for proper path removal in case of a 'remove'
uevent - the sysfs files in the filesystem might be long gone at this point.

This also plugs a memory leak where we'd malloc space for the same sysfs device
over and over again for every processed uevent.

Signed-off-by: Guido Guenther <agx@sigxcpu.org>
---
 libmultipath/sysfs.c |   47 +++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/libmultipath/sysfs.c b/libmultipath/sysfs.c
index 1fb5436..d8c65b2 100644
--- a/libmultipath/sysfs.c
+++ b/libmultipath/sysfs.c
@@ -45,6 +45,13 @@ struct sysfs_attr {
 	char value_local[NAME_SIZE];
 };
 
+/* list of sysfs devices */
+static LIST_HEAD(sysfs_dev_list);
+struct sysfs_dev {
+	struct list_head node;
+	struct sysfs_device dev;
+};
+
 int sysfs_init(char *path, size_t len)
 {
 	if (path) {
@@ -55,6 +62,7 @@ int sysfs_init(char *path, size_t len)
 	dbg("sysfs_path='%s'", sysfs_path);
 
 	INIT_LIST_HEAD(&attr_list);
+	INIT_LIST_HEAD(&sysfs_dev_list);
 	return 0;
 }
 
@@ -63,11 +71,17 @@ void sysfs_cleanup(void)
 	struct sysfs_attr *attr_loop;
 	struct sysfs_attr *attr_temp;
 
+	struct sysfs_dev *sysdev_loop;
+	struct sysfs_dev *sysdev_temp;
+
 	list_for_each_entry_safe(attr_loop, attr_temp, &attr_list, node) {
 		list_del(&attr_loop->node);
 		free(attr_loop);
 	}
 
+	list_for_each_entry_safe(sysdev_loop, sysdev_temp, &sysfs_dev_list, node) {
+		free(sysdev_loop);
+	}
 }
 
 void sysfs_device_set_values(struct sysfs_device *dev, const char *devpath,
@@ -140,7 +154,8 @@ struct sysfs_device *sysfs_device_get(const char *devpath)
 {
 	char path[PATH_SIZE];
 	char devpath_real[PATH_SIZE];
-	struct sysfs_device *dev;
+	struct sysfs_device *dev = NULL;
+	struct sysfs_dev *sysdev_loop, *sysdev;
 	struct stat statbuf;
 	char link_path[PATH_SIZE];
 	char link_target[PATH_SIZE];
@@ -155,7 +170,14 @@ struct sysfs_device *sysfs_device_get(const char *devpath)
 	strlcpy(path, sysfs_path, sizeof(path));
 	strlcat(path, devpath_real, sizeof(path));
 	if (lstat(path, &statbuf) != 0) {
+		/* if stat fails look in the cache */
 		dbg("stat '%s' failed: %s", path, strerror(errno));
+		list_for_each_entry(sysdev_loop, &sysfs_dev_list, node) {
+			if (strcmp(sysdev_loop->dev.devpath, devpath_real) == 0) {
+				dbg("found vanished dev in cache '%s'", sysdev_loop->dev.devpath);
+				return &sysdev_loop->dev;
+			}
+		}
 		return NULL;
 	}
 	if (S_ISLNK(statbuf.st_mode)) {
@@ -164,12 +186,22 @@ struct sysfs_device *sysfs_device_get(const char *devpath)
 
 	}
 
-	/* it is a new device */
-	dbg("new device '%s'", devpath_real);
-	dev = malloc(sizeof(struct sysfs_device));
-	if (dev == NULL)
-		return NULL;
-	memset(dev, 0x00, sizeof(struct sysfs_device));
+	list_for_each_entry(sysdev_loop, &sysfs_dev_list, node) {
+		if (strcmp(sysdev_loop->dev.devpath, devpath_real) == 0) {
+			dbg("found dev in cache '%s'", sysdev_loop->dev.devpath);
+				dev = &sysdev_loop->dev;
+		}
+	}
+	if(!dev) {
+		/* it is a new device */
+		dbg("new device '%s'", devpath_real);
+		sysdev = malloc(sizeof(struct sysfs_dev));
+		if (sysdev == NULL)
+			return NULL;
+		memset(sysdev, 0x00, sizeof(struct sysfs_dev));
+		list_add(&sysdev->node, &sysfs_dev_list);
+		dev = &sysdev->dev;
+	}
 
 	sysfs_device_set_values(dev, devpath_real, NULL, NULL);
 
@@ -226,7 +258,6 @@ struct sysfs_device *sysfs_device_get(const char *devpath)
 		if (pos != NULL)
 			strlcpy(dev->driver, &pos[1], sizeof(dev->driver));
 	}
-
 	return dev;
 }
 
-- 
1.5.2.4


* Re: [multipath 1/1] cache sysfs_devices
  2007-07-20 21:03           ` Guido Guenther
@ 2007-07-20 22:18             ` Christophe Varoqui
  2007-07-24  6:35             ` Hannes Reinecke
  1 sibling, 0 replies; 12+ messages in thread
From: Christophe Varoqui @ 2007-07-20 22:18 UTC (permalink / raw)
  To: Guido Guenther; +Cc: dm-devel

Le vendredi 20 juillet 2007 à 23:03 +0200, Guido Guenther a écrit :
> I give up on git-send-email - it always fools me. Nevertheless this one
> fixes the described bug of paths coming back as different block
> devices:
> 
Merged, thanks.

Would "git-show --pretty=email <sha>" be what you need?


* Re: Re: [multipath 1/1] cache sysfs_devices
  2007-07-20 21:03           ` Guido Guenther
  2007-07-20 22:18             ` Christophe Varoqui
@ 2007-07-24  6:35             ` Hannes Reinecke
  2007-07-25 16:37               ` Guido Guenther
  1 sibling, 1 reply; 12+ messages in thread
From: Hannes Reinecke @ 2007-07-24  6:35 UTC (permalink / raw)
  To: device-mapper development; +Cc: Christophe Varoqui

Guido Guenther wrote:
> I give up on git-send-email - it always fools me. Nevertheless this one
> fixes the described bug of paths coming back as different block
> devices:
> 
> Keep a list of sysfs devices for sysfs_device_get() so uev_trigger() can look
> up the necessary information for proper path removal in case of a 'remove'
> uevent - the sysfs files in the filesystem might be long gone at this point.
> 
> This also plugs a memory leak where we'd malloc space for the same sysfs device
> over and over again for every processed uevent.
> 
Hmm. But now we're running into the opposite trap: memory for a block device that
once existed is never freed. So over time we're likely to become a memory hog.

It would be better to remove the sysfs device from the cache once we're done
with it.

I'll cook up a patch.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)


* Re: Re: [multipath 1/1] cache sysfs_devices
  2007-07-24  6:35             ` Hannes Reinecke
@ 2007-07-25 16:37               ` Guido Guenther
  0 siblings, 0 replies; 12+ messages in thread
From: Guido Guenther @ 2007-07-25 16:37 UTC (permalink / raw)
  To: device-mapper development; +Cc: Christophe Varoqui

On Tue, Jul 24, 2007 at 08:35:16AM +0200, Hannes Reinecke wrote:
> Guido Guenther wrote:
> > I give up on git-send-email - it always fools me. Nevertheless this one
> > fixes the described bug of paths coming back as different block
> > devices:
> > 
> > Keep a list of sysfs devices for sysfs_device_get() so uev_trigger() can look
> > up the necessary information for proper path removal in case of a 'remove'
> > uevent - the sysfs files in the filesystem might be long gone at this point.
> > 
> > This also plugs a memory leak where we'd malloc space for the same sysfs device
> > over and over again for every processed uevent.
> > 
> Hmm. But now we're running into the opposite trap: memory for a block device that
> once existed is never freed. So over time we're likely to become a memory hog.
Until now we did not free any memory _and_ we allocated fresh memory over and
over again every time a path came back up.

> It would be better to remove the sysfs device from the cache once we're done
> with it.
Fair enough, we won't need the information then anyway.
Thanks,
 -- Guido



Thread overview: 12+ messages
2007-07-19 22:54 multipath-tools causes path to come back as different block device Brian De Wolf
2007-07-20  7:24 ` Hannes Reinecke
2007-07-20 11:53   ` Guido Guenther
2007-07-20 14:40     ` Hannes Reinecke
2007-07-20 17:29       ` Guido Guenther
2007-07-20 20:52         ` [multipath 1/1] cache sysfs_devices -s
     [not found]         ` <11849647793576-git-send-email--s>
2007-07-20 21:03           ` Guido Guenther
2007-07-20 22:18             ` Christophe Varoqui
2007-07-24  6:35             ` Hannes Reinecke
2007-07-25 16:37               ` Guido Guenther
2007-07-20 17:05   ` multipath-tools causes path to come back as different block device Brian De Wolf
2007-07-20 10:10 ` Guido Guenther
