* Failing on NIC removal
From: Scott Moseman @ 2007-11-19 17:16 UTC
  To: device-mapper development

So I finally got my multipath running through both the NIC and HBA
interfaces, but I'm not having any luck going through testing to
verify it's actually failing over between the connections.

# multipath -l
mpath0 (30690a018f015191a6472441d1500f057)
[size=4 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 3:0:0:0 sdc 8:32 [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:1:0 sdb 8:16 [active][ready]

I can unplug the HBA (see below) and the connection to the SAN remains.

# multipath -l
mpath0 (30690a018f015191a6472441d1500f057)
[size=4 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 3:0:0:0 sdc 8:32 [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:1:0 sdb 8:16 [failed][faulty]

But when I unplug the NIC connection, the multipath command hangs,
trying to list files on the SAN partition hangs, and I'm getting these
messages:

Nov 19 17:15:13 ems1 kernel: iscsi-sfnet:host3: Connect failed with rc
-113: No route to host
Nov 19 17:15:13 ems1 kernel: iscsi-sfnet:host3: establish_session
failed. Could not connect to target
Nov 19 17:15:13 ems1 kernel: iscsi-sfnet:host3: Waiting 10 seconds
before next login attempt

How should I troubleshoot this situation?

Thanks,
Scott


* Re: Failing on NIC removal
From: Mike Anderson @ 2007-11-19 21:46 UTC
  To: device-mapper development; +Cc: open-iscsi

cc'ing open-iscsi

Scott Moseman <scmoseman@gmail.com> wrote:
> So I finally got my multipath running through both the NIC and HBA
> interfaces, but I'm not having any luck going through testing to
> verify it's actually failing over between the connections.
> 
> # multipath -l
> mpath0 (30690a018f015191a6472441d1500f057)
> [size=4 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [active]
>  \_ 3:0:0:0 sdc 8:32 [active][ready]
> \_ round-robin 0 [enabled]
>  \_ 1:0:1:0 sdb 8:16 [active][ready]
> 
> I can unplug the HBA (see below) and the connection to the SAN remains.
> 
> # multipath -l
> mpath0 (30690a018f015191a6472441d1500f057)
> [size=4 GB][features="0"][hwhandler="0"]
> \_ round-robin 0 [active]
>  \_ 3:0:0:0 sdc 8:32 [active][ready]
> \_ round-robin 0 [enabled]
>  \_ 1:0:1:0 sdb 8:16 [failed][faulty]
> 
> But when I unplug the NIC connection, the multipath command hangs,
> trying to list files on the SAN partition hangs, and I'm getting these
> messages:
> 
> Nov 19 17:15:13 ems1 kernel: iscsi-sfnet:host3: Connect failed with rc
> -113: No route to host
> Nov 19 17:15:13 ems1 kernel: iscsi-sfnet:host3: establish_session
> failed. Could not connect to target
> Nov 19 17:15:13 ems1 kernel: iscsi-sfnet:host3: Waiting 10 seconds
> before next login attempt
> 
> How to troubleshoot this situation?

The IO is hanging while it waits for the connection to be reestablished.

You may need to set ConnFailTimeout to a non-zero value as indicated in
http://people.redhat.com/mchristi/iscsi/RHEL4/doc/readme

Someone on the open-iscsi list may have a better suggestion.

-andmike
--
Michael Anderson
andmike@linux.vnet.ibm.com


* Re: Failing on NIC removal
From: Mike Christie @ 2007-11-19 21:58 UTC
  To: device-mapper development; +Cc: open-iscsi

Mike Anderson wrote:
> cc'ing open-iscsi
> 
> Scott Moseman <scmoseman@gmail.com> wrote:
>> So I finally got my multipath running through both the NIC and HBA
>> interfaces, but I'm not having any luck going through testing to
>> verify it's actually failing over between the connections.
>>
>> # multipath -l
>> mpath0 (30690a018f015191a6472441d1500f057)
>> [size=4 GB][features="0"][hwhandler="0"]
>> \_ round-robin 0 [active]
>>  \_ 3:0:0:0 sdc 8:32 [active][ready]
>> \_ round-robin 0 [enabled]
>>  \_ 1:0:1:0 sdb 8:16 [active][ready]
>>
>> I can unplug the HBA (see below) and the connection to the SAN remains.
>>
>> # multipath -l
>> mpath0 (30690a018f015191a6472441d1500f057)
>> [size=4 GB][features="0"][hwhandler="0"]
>> \_ round-robin 0 [active]
>>  \_ 3:0:0:0 sdc 8:32 [active][ready]
>> \_ round-robin 0 [enabled]
>>  \_ 1:0:1:0 sdb 8:16 [failed][faulty]
>>
>> But when I unplug the NIC connection, the multipath command hangs,
>> trying to list files on the SAN partition hangs, and I'm getting these
>> messages:
>>
>> Nov 19 17:15:13 ems1 kernel: iscsi-sfnet:host3: Connect failed with rc
>> -113: No route to host
>> Nov 19 17:15:13 ems1 kernel: iscsi-sfnet:host3: establish_session
>> failed. Could not connect to target
>> Nov 19 17:15:13 ems1 kernel: iscsi-sfnet:host3: Waiting 10 seconds
>> before next login attempt
>>
>> How to troubleshoot this situation?
> 
> The IO is hanging waiting for the connection to be reestablished. 
> 
> You may need to set ConnFailTimeout to a non-zero value as indicated in
> http://people.redhat.com/mchristi/iscsi/RHEL4/doc/readme
> 

Mike Anderson is right. If you are using multipath, you should set
ConnFailTimeout to a low value like 3 or 5 seconds, because we want to
fail commands quickly to the multipath layer. For dm-multipath, you then
want to set no_path_retry either to queue IO until the paths come back
(forever, if they never do) or to a limited number of retries, after
which queued IO is failed.
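
As a rough sketch of the two settings above: this assumes the linux-iscsi
(iscsi-sfnet) initiator reads /etc/iscsi.conf with key=value entries and
that multipath-tools reads /etc/multipath.conf. The file paths, the
iscsi.conf syntax, and the value 5 are assumptions here, so check them
against the readme linked above and your distribution's documentation.

```
# /etc/iscsi.conf (assumed location and key=value syntax; see the readme)
# Fail outstanding commands 5 seconds after the connection drops,
# instead of holding them until the session is reestablished.
ConnFailTimeout=5

# /etc/multipath.conf (assumed location)
defaults {
        # "queue" keeps queueing IO until a path returns; a number
        # limits the retries before queued IO is failed instead.
        no_path_retry queue
}
```

With a numeric no_path_retry instead of queue, applications see IO errors
once the retries run out, which may be preferable to hanging indefinitely
when every path is down.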


* Re: Failing on NIC removal
From: Scott Moseman @ 2007-11-20 14:16 UTC
  To: device-mapper development

On Nov 19, 2007 3:46 PM, Mike Anderson <andmike@linux.vnet.ibm.com> wrote:
>
> You may need to set ConnFailTimeout to a non-zero value as indicated in
> http://people.redhat.com/mchristi/iscsi/RHEL4/doc/readme
>

Hey Mike,

Sweet, my failover is working perfectly now!

Thanks for the help!
Scott

