* multipathd segfault and SCSI errors
@ 2008-10-24  6:11 Prakash Rudraraju
  2008-10-24  6:37 ` Prakash Rudraraju
  2008-10-24 13:30 ` Konrad Rzeszutek
  0 siblings, 2 replies; 4+ messages in thread
From: Prakash Rudraraju @ 2008-10-24  6:11 UTC
  To: dm-devel@redhat.com

Hi,

We have set up a Compellent SAN with two HBAs attached to dual fabrics. Under load, when we import a 60GB database, paths fail very often. The following syslog excerpt shows the failed-path behavior:

Oct 23 02:01:15 db03 kernel: sd 2:0:1:1: SCSI error: return code = 0x08000002
Oct 23 02:01:15 db03 kernel: sde: Current: sense key: Aborted Command
Oct 23 02:01:15 db03 kernel:     Add. Sense: Internal target failure
Oct 23 02:01:15 db03 kernel:
Oct 23 02:01:15 db03 kernel: end_request: I/O error, dev sde, sector 911585239
Oct 23 02:01:15 db03 kernel: device-mapper: multipath: Failing path 8:64.
Oct 23 02:01:15 db03 multipathd: 8:64: mark as failed
Oct 23 02:01:15 db03 multipathd: mpath1: remaining active paths: 1
Oct 23 02:01:15 db03 kernel: sd 1:0:3:1: SCSI error: return code = 0x08000002
Oct 23 02:01:15 db03 kernel: sdc: Current: sense key: Aborted Command
Oct 23 02:01:15 db03 kernel:     Add. Sense: Internal target failure
Oct 23 02:01:15 db03 kernel:
Oct 23 02:01:15 db03 kernel: end_request: I/O error, dev sdc, sector 911585239
Oct 23 02:01:15 db03 kernel: device-mapper: multipath: Failing path 8:32.
Oct 23 02:01:16 db03 multipathd: 8:32: mark as failed
Oct 23 02:01:16 db03 multipathd: mpath1: remaining active paths: 0
Oct 23 02:01:19 db03 multipathd: sde: tur checker reports path is up
Oct 23 02:01:19 db03 multipathd: 8:64: reinstated
Oct 23 02:01:19 db03 multipathd: mpath1: remaining active paths: 1
Oct 23 02:01:20 db03 multipathd: sdc: tur checker reports path is up
Oct 23 02:01:20 db03 multipathd: 8:32: reinstated
Oct 23 02:01:20 db03 multipathd: mpath1: remaining active paths: 2
Oct 23 02:01:21 db03 kernel: sd 2:0:1:1: SCSI error: return code = 0x08000002
Oct 23 02:01:21 db03 kernel: sde: Current: sense key: Aborted Command
Oct 23 02:01:21 db03 kernel:     Add. Sense: Internal target failure
Oct 23 02:01:21 db03 kernel:
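
For readers decoding the `return code = 0x08000002` lines: the Linux SCSI midlayer packs four bytes into that result word. The bash sketch below splits it, assuming the usual layout of driver byte, host byte, message byte and SCSI status byte from most to least significant. `decode_scsi_result` is a hypothetical helper written for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a SCSI midlayer result word, as printed in
# "SCSI error: return code = 0x08000002", into its four bytes.
# Assumed layout (most to least significant):
#   driver byte | host byte | message byte | SCSI status byte
decode_scsi_result() {
    # Bash arithmetic accepts the 0x prefix directly.
    local code=$(( $1 ))
    printf 'driver=0x%02x host=0x%02x msg=0x%02x status=0x%02x\n' \
        $(( (code >> 24) & 0xff )) \
        $(( (code >> 16) & 0xff )) \
        $(( (code >> 8) & 0xff )) \
        $(( code & 0xff ))
}
```

`decode_scsi_result 0x08000002` prints `driver=0x08 host=0x00 msg=0x00 status=0x02`: driver byte 0x08 is DRIVER_SENSE (sense data was returned), host byte 0x00 is DID_OK, and status 0x02 is CHECK CONDITION. In other words, the HBA delivered the command and the target answered with sense data, which is consistent with the "Aborted Command / Internal target failure" lines that follow each error.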


multipathd segfaults during boot. The following is from the dmesg output:

multipathd[7165]: segfault at 000000000000000a rip 00002aaaaaf51a3d rsp 00007fff03b50090 error 4
sd 2:0:1:1: SCSI error: return code = 0x08000002
sde: Current: sense key: Aborted Command
    Add. Sense: Internal target failure

end_request: I/O error, dev sde, sector 912637903
device-mapper: multipath: Failing path 8:64.
sd 1:0:3:1: SCSI error: return code = 0x08000002
sdc: Current: sense key: Aborted Command
    Add. Sense: Internal target failure

end_request: I/O error, dev sdc, sector 915472343
device-mapper: multipath: Failing path 8:32.
sd 2:0:1:1: SCSI error: return code = 0x08000002
sde: Current: sense key: Aborted Command
    Add. Sense: Internal target failure

end_request: I/O error, dev sde, sector 915472343
device-mapper: multipath: Failing path 8:64.
sd 2:0:1:1: SCSI error: return code = 0x08000002
sde: Current: sense key: Aborted Command
    Add. Sense: Internal target failure

end_request: I/O error, dev sde, sector 919728103
device-mapper: multipath: Failing path 8:64.
sd 1:0:3:1: SCSI error: return code = 0x08000002
sdc: Current: sense key: Aborted Command
    Add. Sense: Internal target failure

We have experienced the same failures on both RHEL 5.1 and CentOS. The following is /etc/multipath.conf:

defaults {
        user_friendly_names yes
        path_grouping_policy multibus
}

devices {
        device {
                vendor "COMPELNT"
                product "Compellent Vol"
                path_checker tur
                polling_interval 10
                no_path_retry queue
        }
}

blacklist {
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
        devnode "^(hd|xvd)[a-z]*"
        wwid "*"
}

# Make sure our multipath devices are enabled.

blacklist_exceptions {
        wwid "36000d310000e63000000000000000007"
        wwid "36000d310000e6300000000000000000c"
}
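
How the `blacklist` and `blacklist_exceptions` sections above combine can be sketched as follows. This is a simplified model, not multipath-tools code: it assumes an exception match always wins over a blacklist match, compares WWIDs literally rather than as regular expressions, and treats the catch-all `wwid "*"` entry as matching every WWID. `classify_wwid` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Simplified, hypothetical model of the blacklist / blacklist_exceptions
# pair above. Assumptions: exceptions override the blacklist, WWIDs are
# compared literally (multipath itself uses regular expressions), and the
# catch-all wwid "*" entry is modelled as matching every WWID.
exceptions=(
    36000d310000e63000000000000000007
    36000d310000e6300000000000000000c
)

classify_wwid() {
    local wwid=$1 e
    for e in "${exceptions[@]}"; do
        if [[ $wwid == "$e" ]]; then
            echo "exception-listed"
            return 0
        fi
    done
    # Not excepted, so the catch-all wwid "*" entry blacklists it.
    echo "blacklisted"
}
```

With this model, the two Compellent WWIDs come out as `exception-listed` and the local PERC disk's WWID (36001ec90e369340010243ce50a87a236) as `blacklisted`, matching what `multipath -v3` reports in the follow-up message.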


# multipath -ll
mpath1 (36000d310000e6300000000000000000c) dm-5 COMPELNT,Compellent Vol
[size=500G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=2][active]
 \_ 1:0:3:1 sdc 8:32  [active][ready]
 \_ 2:0:1:1 sde 8:64  [active][ready]
mpath0 (36000d310000e63000000000000000007) dm-0 COMPELNT,Compellent Vol
[size=50G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=2][active]
 \_ 1:0:3:0 sdb 8:16  [active][ready]
 \_ 2:0:1:0 sdd 8:48  [active][ready]


Please let me know if you need more information. This is my first experience with SAN configuration, and I suspect I have missed something very obvious, because searching for these errors did not turn up meaningful results.

Thanks,
Prakash.


* RE: multipathd segfault and SCSI errors
  2008-10-24  6:11 multipathd segfault and SCSI errors Prakash Rudraraju
@ 2008-10-24  6:37 ` Prakash Rudraraju
  2008-10-24 13:30 ` Konrad Rzeszutek
  1 sibling, 0 replies; 4+ messages in thread
From: Prakash Rudraraju @ 2008-10-24  6:37 UTC
  To: device-mapper development

More information about the setup, from the multipath verbose output:

# multipath -v3
dm-0: blacklisted
dm-1: blacklisted
dm-2: blacklisted
dm-3: blacklisted
dm-4: blacklisted
dm-5: blacklisted
dm-6: blacklisted
md0: blacklisted
ram0: blacklisted
ram10: blacklisted
ram11: blacklisted
ram12: blacklisted
ram13: blacklisted
ram14: blacklisted
ram15: blacklisted
ram1: blacklisted
ram2: blacklisted
ram3: blacklisted
ram4: blacklisted
ram5: blacklisted
ram6: blacklisted
ram7: blacklisted
ram8: blacklisted
ram9: blacklisted
sda: not found in pathvec
sda: mask = 0x1f
sda: bus = 1
sda: dev_t = 8:0
sda: size = 142082048
sda: vendor = DELL
sda: product = PERC 6/i Adapter
sda: rev = 1.11
sda: h:b:t:l = 0:2:0:0
sda: serial = 0036a2870ae53c2410003469e390ec01
sda: path checker = readsector0 (config file default)
sda: state = 2
sda: getprio = NULL (internal default)
sda: prio = 1
sda: getuid = /sbin/scsi_id -g -u -s /block/%n (config file default)
sda: uid = 36001ec90e369340010243ce50a87a236 (callout)
sdb: not found in pathvec
sdb: mask = 0x1f
sdb: bus = 1
sdb: dev_t = 8:16
sdb: size = 104857600
sdb: vendor = COMPELNT
sdb: product = Compellent Vol
sdb: rev = 0401
sdb: h:b:t:l = 1:0:3:0
sdb: tgt_node_name = 0x5000d310000e6302
sdb: serial = 00000e63-00000007
sdb: path checker = tur (controller setting)
sdb: state = 2
sdb: getprio = NULL (internal default)
sdb: prio = 1
sdb: getuid = /sbin/scsi_id -g -u -s /block/%n (config file default)
sdb: uid = 36000d310000e63000000000000000007 (callout)
sdc: not found in pathvec
sdc: mask = 0x1f
sdc: bus = 1
sdc: dev_t = 8:32
sdc: size = 1048576000
sdc: vendor = COMPELNT
sdc: product = Compellent Vol
sdc: rev = 0401
sdc: h:b:t:l = 1:0:3:1
sdc: tgt_node_name = 0x5000d310000e6302
sdc: serial = 00000e63-0000000c
sdc: path checker = tur (controller setting)
sdc: state = 2
sdc: getprio = NULL (internal default)
sdc: prio = 1
sdc: getuid = /sbin/scsi_id -g -u -s /block/%n (config file default)
sdc: uid = 36000d310000e6300000000000000000c (callout)
sdd: not found in pathvec
sdd: mask = 0x1f
sdd: bus = 1
sdd: dev_t = 8:48
sdd: size = 104857600
sdd: vendor = COMPELNT
sdd: product = Compellent Vol
sdd: rev = 0401
sdd: h:b:t:l = 2:0:1:0
sdd: tgt_node_name = 0x5000d310000e6302
sdd: serial = 00000e63-00000007
sdd: path checker = tur (controller setting)
sdd: state = 2
sdd: getprio = NULL (internal default)
sdd: prio = 1
sdd: getuid = /sbin/scsi_id -g -u -s /block/%n (config file default)
sdd: uid = 36000d310000e63000000000000000007 (callout)
sde: not found in pathvec
sde: mask = 0x1f
sde: bus = 1
sde: dev_t = 8:64
sde: size = 1048576000
sde: vendor = COMPELNT
sde: product = Compellent Vol
sde: rev = 0401
sde: h:b:t:l = 2:0:1:1
sde: tgt_node_name = 0x5000d310000e6302
sde: serial = 00000e63-0000000c
sde: path checker = tur (controller setting)
sde: state = 2
sde: getprio = NULL (internal default)
sde: prio = 1
sde: getuid = /sbin/scsi_id -g -u -s /block/%n (config file default)
sde: uid = 36000d310000e6300000000000000000c (callout)
===== paths list =====
uuid                              hcil    dev dev_t pri dm_st  chk_st  vend/pr
36001ec90e369340010243ce50a87a236 0:2:0:0 sda 8:0   1   [undef][ready] DELL,PE
36000d310000e63000000000000000007 1:0:3:0 sdb 8:16  1   [undef][ready] COMPELN
36000d310000e6300000000000000000c 1:0:3:1 sdc 8:32  1   [undef][ready] COMPELN
36000d310000e63000000000000000007 2:0:1:0 sdd 8:48  1   [undef][ready] COMPELN
36000d310000e6300000000000000000c 2:0:1:1 sde 8:64  1   [undef][ready] COMPELN
params = 1 queue_if_no_path 0 1 1 round-robin 0 2 1 8:32 1000 8:64 1000
status = 1 0 0 1 1 A 0 2 0 8:32 A 47 8:64 A 49
params = 1 queue_if_no_path 0 1 1 round-robin 0 2 1 8:16 1000 8:48 1000
status = 1 0 0 1 1 A 0 2 0 8:16 A 0 8:48 A 0
36001ec90e369340010243ce50a87a236: blacklisted
36000d310000e63000000000000000007: exception-listed
Found matching wwid [36000d310000e63000000000000000007] in bindings file.
Setting alias to mpath0
sdb: ownership set to mpath0
sdb: not found in pathvec
sdb: mask = 0xc
sdb: state = 2
sdb: prio = 1
sdd: ownership set to mpath0
sdd: not found in pathvec
sdd: mask = 0xc
sdd: state = 2
sdd: prio = 1
mpath0: pgfailover = -1 (internal default)
mpath0: pgpolicy = multibus (config file default)
mpath0: selector = round-robin 0 (internal default)
mpath0: features = 0 (internal default)
mpath0: hwhandler = 0 (internal default)
mpath0: rr_weight = 1 (internal default)
mpath0: minio = 1000 (config file default)
mpath0: no_path_retry = -2 (controller setting)
pg_timeout = NONE (internal default)
mpath0: set ACT_NOTHING (map unchanged)
36000d310000e6300000000000000000c: exception-listed
Found matching wwid [36000d310000e6300000000000000000c] in bindings file.
Setting alias to mpath1
sdc: ownership set to mpath1
sdc: not found in pathvec
sdc: mask = 0xc
sdc: state = 2
sdc: prio = 1
sde: ownership set to mpath1
sde: not found in pathvec
sde: mask = 0xc
sde: state = 2
sde: prio = 1
mpath1: pgfailover = -1 (internal default)
mpath1: pgpolicy = multibus (config file default)
mpath1: selector = round-robin 0 (internal default)
mpath1: features = 0 (internal default)
mpath1: hwhandler = 0 (internal default)
mpath1: rr_weight = 1 (internal default)
mpath1: minio = 1000 (config file default)
mpath1: no_path_retry = -2 (controller setting)
pg_timeout = NONE (internal default)
mpath1: set ACT_NOTHING (map unchanged)
36000d310000e63000000000000000007: exception-listed
36000d310000e6300000000000000000c: exception-listed
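
The `params =` and `status =` lines above are raw device-mapper multipath table and status strings. The bash sketch below walks them, assuming the standard dm-multipath layouts summarized in the comments (with the round-robin selector, which takes one argument per path and reports no per-path info). `describe_mpath_params` and `mpath_fail_counts` are hypothetical helpers, not part of multipath-tools.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the "params =" and "status =" lines printed by
# "multipath -v3". Assumed dm-multipath layouts:
#   table:  <#feat> <feat...> <#hwh> <hwh...> <#groups> <initial group>
#           per group: <selector> <#sel args> <sel args...>
#                      <#paths> <#path args> { <dev> <path args...> }
#   status: <#feat> <feat...> <#hwh> <hwh...> <#groups> <next group>
#           per group: <A|E|D> <#sel status> <sel status...>
#                      <#paths> <#path info> { <dev> <A|F> <fail count> <info...> }

describe_mpath_params() {
    local -a t
    read -r -a t <<< "$1"
    local i=0 n npg pg npaths nargs p
    n=${t[i]}
    echo "features: ${t[*]:i+1:n}"
    i=$(( i + 1 + n ))
    echo "hardware handler args: ${t[i]}"
    i=$(( i + 1 + t[i] ))
    npg=${t[i]}
    echo "path groups: $npg, initial: ${t[i+1]}"
    i=$(( i + 2 ))
    for (( pg = 1; pg <= npg; pg++ )); do
        echo "group $pg selector: ${t[i]}"
        i=$(( i + 2 + t[i+1] ))        # skip selector name, arg count, args
        npaths=${t[i]}
        nargs=${t[i+1]}
        i=$(( i + 2 ))
        for (( p = 0; p < npaths; p++ )); do
            echo "  path ${t[i]} args: ${t[*]:i+1:nargs}"
            i=$(( i + 1 + nargs ))
        done
    done
}

mpath_fail_counts() {
    local -a t
    read -r -a t <<< "$1"
    local i npg pg npaths ninfo p
    i=$(( 1 + t[0] ))                  # skip feature status
    i=$(( i + 1 + t[i] ))              # skip hardware handler status
    npg=${t[i]}
    i=$(( i + 2 ))                     # skip <#groups> <next group>
    for (( pg = 1; pg <= npg; pg++ )); do
        i=$(( i + 2 + t[i+1] ))        # skip group state and selector status
        npaths=${t[i]}
        ninfo=${t[i+1]}
        i=$(( i + 2 ))
        for (( p = 0; p < npaths; p++ )); do
            echo "${t[i]} state=${t[i+1]} fail_count=${t[i+2]}"
            i=$(( i + 3 + ninfo ))
        done
    done
}
```

On the `mpath1` lines above, the first helper shows `queue_if_no_path` as the only feature and a trailing `1000` on each path, the round-robin repeat count that corresponds to `minio = 1000`. The second reports fail counts of 47 (8:32) and 49 (8:64), while `mpath0` shows 0 for both of its paths.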

Thanks,
Prakash.


* Re: multipathd segfault and SCSI errors
  2008-10-24  6:11 multipathd segfault and SCSI errors Prakash Rudraraju
  2008-10-24  6:37 ` Prakash Rudraraju
@ 2008-10-24 13:30 ` Konrad Rzeszutek
  2008-10-24 15:39   ` Prakash Rudraraju
  1 sibling, 1 reply; 4+ messages in thread
From: Konrad Rzeszutek @ 2008-10-24 13:30 UTC
  To: device-mapper development

> multipathd[7165]: segfault at 000000000000000a rip 00002aaaaaf51a3d rsp 00007fff03b50090 error 4

Can you run multipathd like this:

multipathd -v9 -d

And provide the last 200 lines of output leading up to the segfault? You might need to
edit /etc/init.d/multipathd to make this work, and pipe the output to a log file or similar.
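
A minimal way to collect that trace by hand, sketched under assumptions: it is run as root on the RHEL 5 / CentOS host, the daemon is controlled with the `service` wrapper, and the packaged daemon must be stopped first because a second multipathd refuses to start (as the later "process is already running" reply shows). `capture_multipathd_debug` and its default log path are hypothetical. Catching the boot-time crash itself would still need the init-script change mentioned above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of multipath-tools): run multipathd in the
# foreground at the requested verbosity and show the tail of its output
# once it exits or crashes. Must be run as root on the affected host.
capture_multipathd_debug() {
    local log=${1:-/tmp/multipathd-debug.log}
    # A second instance will not start while one is running,
    # so stop the service-managed daemon first.
    service multipathd stop
    # -d keeps multipathd in the foreground; capture stdout and stderr.
    multipathd -v9 -d > "$log" 2>&1
    # Last 200 lines leading up to the exit or segfault.
    tail -n 200 "$log"
}
```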

* RE: multipathd segfault and SCSI errors
  2008-10-24 13:30 ` Konrad Rzeszutek
@ 2008-10-24 15:39   ` Prakash Rudraraju
  0 siblings, 0 replies; 4+ messages in thread
From: Prakash Rudraraju @ 2008-10-24 15:39 UTC
  To: device-mapper development

Excluding the local disk /dev/sda by adding it to the blacklist solved the segmentation fault during boot.

blacklist {
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
        devnode "^(hd|xvd)[a-z]*"
        devnode "sda"
        wwid "*"
}
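
A caveat on the new `devnode "sda"` entry: if multipath applies `devnode` entries as unanchored POSIX extended regular expressions (an assumption here, though the existing `^(hd|xvd)[a-z]*` entries are written with explicit anchors, which suggests it), the pattern `sda` also matches names such as `sdaa`. The bash sketch below checks this with bash's `=~` operator, which performs the same kind of unanchored ERE match. `devnode_matches` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: does a multipath.conf devnode pattern match a
# device name? ASSUMPTION: devnode entries behave as unanchored POSIX
# extended regexes (substring search), like bash's =~ with an unquoted
# right-hand side.
devnode_matches() {
    local pattern=$1 devname=$2
    [[ $devname =~ $pattern ]]
}
```

`devnode_matches 'sda' sdaa` succeeds while `devnode_matches '^sda$' sdaa` fails, so under that assumption an anchored `devnode "^sda$"` would exclude only the local disk.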

I am currently importing the database to check for path failures under load. It takes about 12 hours to complete, but we have had no path failures so far.

I will collect the "multipathd -v9 -d" output after the import is complete.

# multipathd -v9 -d
Oct 24 08:38:39 | --------start up--------
Oct 24 08:38:39 | read /etc/multipath.conf
Oct 24 08:38:39 | process is already running

Thanks,
Prakash.
