* Multipath / iSCSI issues
@ 2013-02-24 19:15 Devin
  2013-02-24 23:50 ` Mike Christie
  0 siblings, 1 reply; 4+ messages in thread
From: Devin @ 2013-02-24 19:15 UTC
  To: dm-devel


I am running Oracle Enterprise Linux 5.8 (which is really just Red Hat). I
am using multipath, with LUNs presented to me via iSCSI from a Hitachi
SAN. The NICs are bonded with the Linux bonding driver in Active-Backup
mode. I notice that when I lose a switch, or the connection to one of the
switches, multipath freezes for at least 60 seconds before it starts to
respond again. The IO being generated also appears to freeze until
multipath responds again. This pause of up to 60 seconds is causing my
Oracle instances to crash.

I have not been able to easily find which settings I could change to make
it fail over to a new path faster. It almost seems like multipath is taking
a while to fail all IO over to a working path.

Is there any information I can check on either the multipath side or the
iSCSI side to see what is causing the issue?

* Re: Multipath / iSCSI issues
  2013-02-24 19:15 Multipath / iSCSI issues Devin
@ 2013-02-24 23:50 ` Mike Christie
  2013-02-25  2:07   ` Devin
  0 siblings, 1 reply; 4+ messages in thread
From: Mike Christie @ 2013-02-24 23:50 UTC
  To: device-mapper development; +Cc: Devin

On 02/24/2013 01:15 PM, Devin wrote:
> 
> I am running Oracle Enterprise Linux 5.8 (which is really just Red Hat).
> I am using multipath, with LUNs presented to me via iSCSI from a
> Hitachi SAN. The NICs are bonded with the Linux bonding driver in
> Active-Backup mode. I notice that when I lose a switch, or the
> connection to one of the switches, multipath freezes for at least 60
> seconds before it starts to respond again. The IO being generated also
> appears to freeze until multipath responds again. This pause of up to
> 60 seconds is causing my Oracle instances to crash.
> 
> I have not been able to easily find which settings I could change to
> make it fail over to a new path faster. It almost seems like multipath
> is taking a while to fail all IO over to a working path.
> 
> Is there any information I can check on either the multipath side or
> the iSCSI side to see what is causing the issue?
> 

What iSCSI driver are you using? If you are using the software iSCSI that
comes with OEL 5.8, what are your node.session.timeo.replacement_timeout,
node.conn[0].timeo.noop_out_timeout and node.conn[0].timeo.noop_out_interval
values? And what is your SCSI command timeout? You can see that by doing:

cat /sys/block/sdX/device/timeout
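
To check every disk at once rather than one sdX at a time, a loop over
sysfs can help. This is only a minimal sketch: the function name
show_scsi_timeouts and its optional root argument are made up for
illustration, and it assumes the same /sys/block/sdX/device/timeout layout
as the command above.

```shell
#!/bin/sh
# Print the SCSI command timeout (in seconds) for every sd* disk.
# show_scsi_timeouts is a hypothetical helper name; the optional first
# argument (default /sys/block) exists only so the loop can be pointed
# at a scratch directory instead of real sysfs.
show_scsi_timeouts() {
    root="${1:-/sys/block}"
    for f in "$root"/sd*/device/timeout; do
        # Skip the unexpanded glob when no sd* devices exist.
        [ -r "$f" ] || continue
        dev="${f#"$root"/}"      # strip the leading root directory
        dev="${dev%%/*}"         # keep only the device name, e.g. sdw
        printf '%s %s\n' "$dev" "$(cat "$f")"
    done
}
```

Calling show_scsi_timeouts with no argument prints one "device seconds"
line per disk, which makes it easy to spot paths that still carry a
different timeout from the rest.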

* Re: Multipath / iSCSI issues
  2013-02-24 23:50 ` Mike Christie
@ 2013-02-25  2:07   ` Devin
  2013-02-25 16:42     ` Mike Christie
  0 siblings, 1 reply; 4+ messages in thread
From: Devin @ 2013-02-25  2:07 UTC
  To: Mike Christie; +Cc: device-mapper development


I am using the iSCSI tools included with Oracle
(iscsi-initiator-utils-6.2.0.872-13.0.1.el5).

The values I have in my iscsid.conf are:
node.session.timeo.replacement_timeout = 15
node.conn[0].timeo.noop_out_timeout = 1
node.conn[0].timeo.noop_out_interval = 1

(I previously changed these settings based on some feedback I got from a
tech guy, but that didn't seem to make much difference.)

As for the SCSI command timeout, it appears to be set to 60.

# cat /sys/block/sdw/device/timeout
60

So is my thinking correct that I want the SCSI devices to time out more
quickly, say after 1 second instead of 60? If so, where would I make this
change for the disks?

Thanks much.

Devin Acosta


* Re: Multipath / iSCSI issues
  2013-02-25  2:07   ` Devin
@ 2013-02-25 16:42     ` Mike Christie
  0 siblings, 0 replies; 4+ messages in thread
From: Mike Christie @ 2013-02-25 16:42 UTC
  To: Devin; +Cc: device-mapper development

On 02/24/2013 08:07 PM, Devin wrote:
> 
> I am using the iSCSI tools included with Oracle
> (iscsi-initiator-utils-6.2.0.872-13.0.1.el5).
> 
> The values I have in my iscsid.conf are:
> node.session.timeo.replacement_timeout = 15
> node.conn[0].timeo.noop_out_timeout = 1
> node.conn[0].timeo.noop_out_interval = 1

Those noop-related values are too low. You will get really fast
failovers, but they will also end up causing failures/failovers when
IO is just executing a little slowly.
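
For a rough sense of scale, assuming the usual open-iscsi behavior: on an
otherwise idle connection a noop-out ping is sent every noop_out_interval
seconds, and the connection is failed if no reply arrives within
noop_out_timeout seconds, so a dead path is noticed after about the sum of
the two. The sketch below only does that arithmetic; noop_detect_window is
a hypothetical helper name, and nothing here was measured on this setup.

```shell
#!/bin/sh
# Rough upper bound, in seconds, on how long an idle connection can be
# dead before the noop-out ping notices. noop_detect_window is a
# hypothetical name used only for this illustration.
noop_detect_window() {
    interval="$1"   # node.conn[0].timeo.noop_out_interval
    timeout="$2"    # node.conn[0].timeo.noop_out_timeout
    echo $(( interval + timeout ))
}
```

With the values quoted above, noop_detect_window 1 1 prints 2. The flip
side is that any ping reply delayed by more than a second is treated as a
failed connection, which is how a briefly slow target turns into a
failover.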

> 
> (I previously changed these settings based on some feedback I got from
> a tech guy, but that didn't seem to make much difference.)
> 
> As for the SCSI command timeout, it appears to be set to 60.
> 
> # cat /sys/block/sdw/device/timeout
> 60
> 
> So is my thinking correct that I want the SCSI devices to time out more
> quickly, say after 1 second instead of 60? If so, where would I make
> this change for the disks?
> 

I would need /var/log/messages with iSCSI logging turned on to know for
sure where the errors are being fired, but it sounds like you want to
set the device timeout lower. You do not want it set to only 1 second,
because that is going to be too low. How fast does Oracle need
something to fail over?

You can set the device timeout manually through that sysfs file, or
through a udev rule; there should be one in OEL 5.
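
Setting it by hand through sysfs could look like the sketch below. The
helper name set_scsi_timeout and the value 30 in the usage note are
placeholders for illustration, not a recommendation; the right number
depends on how fast Oracle needs failover. A value written this way does
not survive a reboot, which is why the udev rule is the persistent route.
On a real system this needs root.

```shell
#!/bin/sh
# Write a new SCSI command timeout (seconds) for one disk.
# set_scsi_timeout is a hypothetical helper; the third argument
# (default /sys/block) is only there so it can be tried on a scratch
# directory instead of real sysfs.
set_scsi_timeout() {
    dev="$1"
    secs="$2"
    root="${3:-/sys/block}"
    file="$root/$dev/device/timeout"
    # Refuse to continue if the device or its timeout file is missing.
    if [ ! -w "$file" ]; then
        echo "cannot write $file" >&2
        return 1
    fi
    echo "$secs" > "$file"
}
```

For example, set_scsi_timeout sdw 30 followed by
cat /sys/block/sdw/device/timeout should show the new value. With
multipath, each path is its own sdX device, so this would be run once per
path.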

