multipath: Path checks on open-iscsi software initiators

From: Daniel Stodden @ 2010-02-09  1:18 UTC
  To: device-mapper development

Hi.

I've recently been spending some time tracing path checks on iSCSI
targets.

Samples described here were taken with the directio checker on a NetApp
LUN, but I believe the target kind doesn't matter here, since most of
what I find is driven by the initiator side.

So what I see is:

1. The directio checker issues its aio read on sector 0.

2. The request will block until iscsi gives up on it.
  This typically does not happen before the target pings (noop-out ops)
  issued internally by the initiator time out. This looks like:

  iscsid: Nop-out timedout after 15 seconds on connection 1:0 
  state (3). Dropping session.

  (period and timeouts depend on the configuration at hand).

3. Session failure still won't unblock the read. This is because the
  iscsi session will enter recovery mode, to avoid failing the
  data path right away. The device will enter blocked state during 
  that period. 

  Since I'm provoking a complete failure, this will time out as well, 
  but only later:

  iscsi: session recovery timed out after 15 secs

  (again, timeouts are iscsid.conf-dependent)

4. This will finally unblock the directio check with EIO, 
   triggering the path failure.
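
The delays in steps 2 to 4 add up. As a rough sketch of the arithmetic,
assuming the failure happens right after a successful ping, the checker
sees EIO only after the ping interval, the ping timeout and the recovery
timeout have all elapsed. The struct and function names below are
hypothetical; the field names are modelled on the open-iscsi timeout
settings, and the interval value used in the example is an assumption,
since only the two 15 second timeouts appear in the logs above.

```c
#include <stdio.h>

/* Hypothetical container for the relevant iscsid.conf timeouts, in seconds. */
struct iscsi_timeouts {
	unsigned int noop_out_interval;   /* time between two noop-out pings */
	unsigned int noop_out_timeout;    /* how long to wait for a ping reply */
	unsigned int replacement_timeout; /* session recovery window */
};

/*
 * Worst case for a data-path checker: the link dies right after a ping
 * was answered, so the next ping is only sent one full interval later,
 * then has to time out, and then session recovery has to time out too.
 */
static unsigned int
worst_case_detection_secs(const struct iscsi_timeouts *t)
{
	return t->noop_out_interval +
	       t->noop_out_timeout +
	       t->replacement_timeout;
}
```

With an assumed 5 second ping interval and the two 15 second timeouts
from the logs, this returns 35.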

My main issue is that a device sitting on a software iscsi initiator

 a) performs its own path failure detection and
 b) defers data path operations to mask failures, 
    which obviously counteracts a checker based on
    data path operations.

Kernels somewhere during the 2.6.2x series apparently started to move
part of the session checks into the kernel (apparently including the
noop-out itself, but I don't know for sure). One side effect of that is
that session state can be queried via sysfs.

So right now I'm mainly wondering whether a multipath failure check driven
by polling session state, rather than by a data read, wouldn't be more
effective.

I've only been browsing part of the iscsi code so far, but I don't see
how data path failures wouldn't relate to session state.
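
A minimal sketch of the decision such a check would make, assuming the
sysfs state attribute holds a single word followed by a newline and that
only "LOGGED_IN" means the path is usable. The enum and function names
are hypothetical and not part of multipath-tools.

```c
#include <string.h>

/* Hypothetical stand-ins for the checker's path states. */
enum sketch_path_state {
	SKETCH_PATH_DOWN = 0,
	SKETCH_PATH_UP   = 1,
};

/*
 * Map the raw contents of the session state attribute to a path state.
 * A trailing newline, if present, is ignored. Anything other than
 * "LOGGED_IN" (including an empty string) counts as down.
 */
static enum sketch_path_state
session_state_to_path(const char *raw)
{
	static const char up[] = "LOGGED_IN";
	size_t n = strcspn(raw, "\n");

	if (n == strlen(up) && strncmp(raw, up, n) == 0)
		return SKETCH_PATH_UP;

	return SKETCH_PATH_DOWN;
}
```

Unlike a read on the data path, this never blocks on the device, so the
result is available as soon as the initiator has noticed the failure.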

There's some code attached below to demonstrate that. It presently jumps
through some extra hoops to reverse-map the fd back to the block device
node, but the basic thing was relatively straightforward to implement.

Thanks in advance for any input on that matter.

Cheers,
Daniel 

[-- Attachment #2: open-iscsi.c --]
[-- Type: text/x-csrc, Size: 4369 bytes --]

/*
 * Copyright (c) 2010, Citrix Systems, Inc.
 * All rights reserved.
 *
 * Author: Daniel Stodden <daniel.stodden@citrix.com>
 *
 * This  library is  free  software; you  can  redistribute it  and/or
 * modify it under the terms  of the GNU Lesser General Public License
 * as published by  the Free Software Foundation; either  version 2 of
 * the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful, but
 * WITHOUT  ANY  WARRANTY;  without   even  the  implied  warranty  of
 * MERCHANTABILITY or  FITNESS FOR A PARTICULAR PURPOSE.   See the GNU
 * Lesser General Public License for more details.
 *
 * You should  have received a copy  of the GNU  Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307
 * USA
 */

#define _BSD_SOURCE

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/sysmacros.h>	/* major(), minor() on current glibc */
#include <linux/major.h>

#include "checkers.h"
#include "../libmultipath/debug.h"

/* A string literal, so it can be concatenated with the format strings below. */
#define MODULE "open_iscsi"

#define __MSG(_c, _fmt, _args ...) \
	snprintf((_c)->message, CHECKER_MSG_LEN, _fmt, ##_args)

struct open_iscsi_ctx {
	char state_path[128];
	char state_last[32];
};

static int
_major_idx(dev_t rdev)
{
	int _major = major(rdev);

	switch (_major) {
	case SCSI_DISK0_MAJOR:
		return 0;
	case SCSI_DISK1_MAJOR ... SCSI_DISK7_MAJOR:
		return _major - SCSI_DISK1_MAJOR + 1;
	case SCSI_DISK8_MAJOR ... SCSI_DISK15_MAJOR:
		return _major - SCSI_DISK8_MAJOR + 8;
	}

	return -EINVAL;
}

static int
scsi_disk_index(dev_t rdev)
{
	int index;
	int _minor;

	/* Catch an unsupported major before shifting; index must be signed. */
	index = _major_idx(rdev);
	if (index < 0)
		return index;
	index <<= 4;

	_minor = minor(rdev);
	index |= ((_minor >> 4) & 0xf) | (_minor & 0xfff00);

	return index;
}

int
scsi_block_name(int fd, char *buf, size_t len)
{
	struct stat st;
	int index, n, err;	/* index is signed so the error check works */

	err = fstat(fd, &st);
	if (err)
		return -errno;

	index = scsi_disk_index(st.st_rdev);
	if (index < 0)
		return index;

	switch (index) {
	case 0 ... 25:			/* sda .. sdz */
		n = snprintf(buf, len, "sd%c",
			     'a' + index % 26);
		break;
	case 26 ... (26 + 1) * 26 - 1:	/* sdaa .. sdzz */
		n = snprintf(buf, len, "sd%c%c",
			     'a' + index / 26 - 1,
			     'a' + index % 26);
		break;
	default:
		n = snprintf(buf, len, "sd%c%c%c",
			     'a' + (index / 26 - 1) / 26 - 1,
			     'a' + (index / 26 - 1) % 26,
			     'a' + index % 26);
	}

	if (n >= len)
		return -EFAULT;

	return 0;
}

static int
open_iscsi_session_path(int fd, char *path, size_t len)
{
	char name[8];
	char link[64];
	ssize_t n;
	int host, session, err;

	err = scsi_block_name(fd, name, sizeof(name));
	if (err)
		return err;

	n = snprintf(link, sizeof(link),
		     "/sys/block/%s/device", name);
	if (n >= sizeof(link))
		return -EFAULT;

	/* readlink() does not NUL-terminate, so leave room and do it here. */
	n = readlink(link, path, len - 1);
	if (n < 0)
		return -errno;
	path[n] = '\0';

	n = sscanf(path,
		   "../../devices/platform/host%d/session%d/target",
		   &host, &session);
	if (n != 2)
		return -EBADE;

	n = snprintf(path, len,
		     "/sys/class/iscsi_session/session%d/state",
		     session);
	if (n >= len)
		return -EFAULT;

	return 0;
}

int
open_iscsi_check(struct checker *c)
{
	struct open_iscsi_ctx *s = c->context;
	int fd, state, err;
	ssize_t n;

	fd = open(s->state_path, O_RDONLY);
	if (fd < 0) {
		err = -errno;
		goto fail;
	}

	n = read(fd, s->state_last, sizeof(s->state_last));
	if (n < 0) {
		err = -errno;
		goto fail;
	}
	if (!n || n >= sizeof(s->state_last)) {
		err = -EFAULT;
		goto fail;
	}

	/* Strip the trailing newline before comparing. */
	s->state_last[n - 1] = 0;

	if (!strcmp(s->state_last, "LOGGED_IN"))
		state = PATH_UP;
	else
		state = PATH_DOWN;

out:
	if (fd >= 0)
		close(fd);

	return state;

fail:
	__MSG(c, MODULE ": path check failed: %s", strerror(-err));
	state = PATH_UP;
	goto out;
}

int
open_iscsi_init(struct checker * c)
{
	struct open_iscsi_ctx *s;
	int err;

	s = malloc(sizeof(struct open_iscsi_ctx));
	if (!s) {
		err = -errno;
		goto fail;
	}

	err = open_iscsi_session_path(c->fd,
				      s->state_path,
				      sizeof(s->state_path));
	if (err)
		goto fail;

	c->context = s;

	return 0;

fail:
	if (s)
		free(s);
	condlog(1, MODULE ": failed to initialize: %s", strerror(-err));
	return 1;
}

void
open_iscsi_free(struct checker * c)
{
	free(c->context);
	c->context = NULL;
}
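
The three-way switch in scsi_block_name() can also be written as a single
loop. This is a sketch of an alternative, not the code from the
attachment: it treats the disk index as a bijective base-26 number, which
yields the same sda, sdz, sdaa, sdzz, sdaaa sequence without hard-coded
ranges. The function name sd_name_from_index is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Format the sd disk name for a zero-based disk index.
 * Returns 0 on success, -1 if the index is negative or the name
 * does not fit into the suffix buffer or into buf.
 */
static int
sd_name_from_index(int index, char *buf, size_t len)
{
	char suffix[8];
	char *p = suffix + sizeof(suffix) - 1;
	int n;

	if (index < 0)
		return -1;

	/* Build the letters right to left, least significant first. */
	*p = '\0';
	do {
		if (p == suffix)
			return -1;
		*--p = 'a' + index % 26;
		/* Bijective numbering: there is no zero digit, hence the -1. */
		index = index / 26 - 1;
	} while (index >= 0);

	n = snprintf(buf, len, "sd%s", p);
	if (n < 0 || (size_t)n >= len)
		return -1;

	return 0;
}
```

Index 25 gives "sdz", index 26 gives "sdaa" and index 702 gives "sdaaa".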

From: Mike Snitzer @ 2010-02-09  4:45 UTC
  To: device-mapper development

You might look at the multipath-tools patch included in a fairly
recent dm-devel mail titled "[PATCH] Update path_offline() to return
device status"

The committed patch is available here:
http://git.kernel.org/gitweb.cgi?p=linux/storage/multipath-tools/.git;a=commit;h=88c75172cf56e

From: Daniel Stodden @ 2010-02-09  5:16 UTC
  To: device-mapper development

On Mon, 2010-02-08 at 23:45 -0500, Mike Snitzer wrote:
> You might look at the multipath-tools patch included in a fairly
> recent dm-devel mail titled "[PATCH] Update path_offline() to return
> device status"
> 
> The committed patch is available here:
> http://git.kernel.org/gitweb.cgi?p=linux/storage/multipath-tools/.git;a=commit;h=88c75172cf56e

Hi Mike.

Thanks very much for the link.

I think this stuff is going in the right direction, but judging from
the present implementation of path_offline(),

http://git.kernel.org/gitweb.cgi?p=linux/storage/multipath-tools/.git;a=blob;f=libmultipath/discovery.c;h=6b99d07452ed6a0e9bc4aaa91f74fda5445ed1cc;hb=HEAD#l581

this behavior still matches item 3 described above, or am I mistaken?

The scsi device will be blocked after the iscsi session already failed.

My understanding is that this is perfectly intentional -- the initiator
will block the device while trying to recover the session.

Which, as even described in the patch, makes the check transition to
pending in the meantime. The path is, however, already broken.

So to summarize: What I'm asking is whether path checks based on
datapath ops aren't rather ineffective if the underlying transport tries
to mask datapath failures.

Daniel

From: Hannes Reinecke @ 2010-02-19 12:23 UTC
  To: device-mapper development

Daniel Stodden wrote:
> On Mon, 2010-02-08 at 23:45 -0500, Mike Snitzer wrote:
>> You might look at the multipath-tools patch included in a fairly
>> recent dm-devel mail titled "[PATCH] Update path_offline() to return
>> device status"
>>
>> The committed patch is available here:
>> http://git.kernel.org/gitweb.cgi?p=linux/storage/multipath-tools/.git;a=commit;h=88c75172cf56e
> 
> Hi Mike.
> 
> Thanks very much for the link.
> 
> I think this stuff is going into the right direction, but judging from 
> the present implementation of path_offline(), 
> 
> http://git.kernel.org/gitweb.cgi?p=linux/storage/multipath-tools/.git;a=blob;f=libmultipath/discovery.c;h=6b99d07452ed6a0e9bc4aaa91f74fda5445ed1cc;hb=HEAD#l581
> 
> this behavior still matches item 3 described above, or am I mistaken?
> 
> The scsi device will be blocked after the iscsi session already failed.
> 
> My understanding is that this is perfectly intentional -- the initiator
> will block the device while trying to recover the session.
> 
> Which, as even described in the patch, makes the check transition to
> pending in the meantime. The path is, however, already broken.
> 
> So to summarize: What I'm asking about is if path checks based on
> datapath ops aren't rather ineffective if the underlying transport tries
> to mask datapath failures.
> 
Not inefficient as such (provided there is a timeout attached to the checks);
only that these tests wouldn't be able to give you any meaningful
information if the timeout occurs.

So the best you can say in these cases is "don't know, try later", for which
the 'pending' state is used in multipath.
And then you'd need another timeout in multipathing after which the 'pending'
state is interpreted as a failure, as the 'pending' state doesn't carry any
information about the expected duration. I.e., the 'pending' state might indeed
be a permanent state.

So you have three timeouts to deal with:
- path checker issue timeout:
  how long should I wait for the path checker call to return; currently
  hardcoded to 5 nsecs.
- path checker duration timeout:
  how long should I wait for the path checker to complete; currently
  hardcoded to ASYNC_TIMEOUT_SEC.
- pending state duration timeout:
  how long a path may remain in 'pending' before it is considered
  an error.

If all these timeouts are used and set correctly, multipath is able to run
transport-agnostic, i.e. even a masking of the underlying datapath failures
will be handled properly.
Currently only the 'directio' checker is capable of distinguishing between
the first two timeouts, so that would be the checker of choice here.
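
The third timeout can be sketched as a small piece of bookkeeping around
the checker result. This is only an illustration of the idea, not
multipath-tools code: the enum, the struct and the function name are
hypothetical, and the time spent in 'pending' is assumed to be accumulated
by the caller passing in the seconds elapsed since the previous check.

```c
/* Hypothetical checker results; 'pending' means "don't know, try later". */
enum check_result {
	CHECK_UP,
	CHECK_DOWN,
	CHECK_PENDING,
};

/* Per-path bookkeeping for the pending state duration timeout. */
struct pending_tracker {
	unsigned int pending_secs;     /* time spent in 'pending' so far */
	unsigned int max_pending_secs; /* after this, 'pending' is a failure */
};

/*
 * Turn a raw checker result into the state multipath should act on.
 * A definite result resets the bookkeeping. A 'pending' result is
 * passed through until it has lasted max_pending_secs, after which
 * it is reported as a failure.
 */
static enum check_result
resolve_pending(struct pending_tracker *t, enum check_result raw,
		unsigned int elapsed_secs)
{
	if (raw != CHECK_PENDING) {
		t->pending_secs = 0;
		return raw;
	}

	t->pending_secs += elapsed_secs;
	if (t->pending_secs >= t->max_pending_secs)
		return CHECK_DOWN;

	return CHECK_PENDING;
}
```

With max_pending_secs set to 10 and a check every 5 seconds, the first
'pending' result is passed through and the second one is reported as a
failure.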


I'll have some patches to modify the 'tur' checker to run asynchronously as
well, but I'm not sure if that's the correct way here.
I'd rather have the 'sg' interface be capable of using async_io
here. I'll have to poke Doug Gilbert about it.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
