* [PATCH 1/1] dm-mpath: Extend path selector interface for supporting Dell EqualLogic path selector
@ 2010-06-28 13:53 Narendran Ganapathy
2010-06-29 8:32 ` Hannes Reinecke
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Narendran Ganapathy @ 2010-06-28 13:53 UTC (permalink / raw)
To: dm-devel; +Cc: Parind Shah, Jason Shamberger, agk, Narendran Ganapathy
This patch extends the dm-path-selector interface to allow path selectors to use extra information from the IO request when selecting a path.
Dell EqualLogic and other iSCSI storage arrays use a distributed frameless architecture. In this architecture, the storage group consists of a number of distinct storage arrays ("members"), each having independent controllers, disk storage and network adapters. When a LUN is created it is spread across multiple members. The details of the distribution are hidden from initiators connected to this storage system. The storage group exposes a single target discovery portal, no matter how many members are being used. When iSCSI sessions are created, each session is connected to an eth port on a single member. Data to a LUN can be sent on any iSCSI session, and if the blocks being accessed are stored on another member the IO will be forwarded as required. This forwarding is invisible to the initiator. The storage layout is also dynamic, and the blocks stored on disk may be moved from member to member as needed to balance the load.
This architecture simplifies the management and configuration of both the storage group and initiators. In a multipathing configuration, it is possible to set up multiple iSCSI sessions that use multiple network interfaces on both the host and the target to take advantage of the increased network bandwidth. An initiator can use a simple round-robin algorithm to send IO on all paths and let the storage array members forward it as necessary. However, there is a performance advantage to sending data directly to the correct member. At the same time, the existing technique of building a separate priority group for the paths to each controller does not fit this model, because block ranges may be moved from member to member at any time, and because it is acceptable to send IO to any member in the group when no direct path exists or there is a path error.
We propose to develop a new path selector to perform this location-based routing. The basic idea is to use knowledge about the current location of data to prefer paths that lead directly to the owning member, but to fall back to any available path when no direct path is available. This submission includes the necessary changes to the dm-path-selector interface. In the current interface, the only information passed to the select_path routine is the path_selector struct, the repeat count and the number of bytes in the request. To do location-based routing, we need the address information of the request.
We propose to extend the path selector interface to pass the request pointer to the 'select_path' / 'start_io' / 'end_io' functions so that the path selector can use any information therein to route the I/O.
We also propose extending the dm_mpath_io structure, which holds information about each I/O, with extra fields in which the path selector can store I/O-specific flags and a timestamp, so that the path selector can determine the latency of I/Os on different paths. That information is passed to the 'select_path' / 'start_io' / 'end_io' functions for the path selector's use. These additions to dm_mpath_io leave room for future algorithms that route IO based on other information from the request.
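As a rough illustration of the timestamp idea (a user-space sketch, not part of this patch), a selector could stamp the per-I/O context in start_io and charge the elapsed time to the path in end_io. The struct dm_ps_io_ctx layout is copied from the patch; struct sketch_path_stats and the sketch_* functions are hypothetical names, and "now" stands in for whatever clock a real selector would read.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as the per-I/O context added to dm-mpath.h by this patch. */
struct dm_ps_io_ctx {
	uint32_t flags;
	unsigned long iotime;	/* timestamp taken when the I/O was started */
};

/* Hypothetical per-path bookkeeping; not part of the patch. */
struct sketch_path_stats {
	unsigned long total_latency;	/* sum of completed I/O latencies */
	unsigned long completed;	/* number of completed I/Os */
};

/* start_io side: record when the I/O was handed to the path. */
void sketch_start_io(struct dm_ps_io_ctx *ctx, unsigned long now)
{
	assert(ctx);
	ctx->iotime = now;
}

/* end_io side: charge the elapsed time to the path that carried the I/O. */
void sketch_end_io(struct sketch_path_stats *st,
		   const struct dm_ps_io_ctx *ctx, unsigned long now)
{
	assert(st && ctx);
	st->total_latency += now - ctx->iotime;	/* unsigned subtraction */
	st->completed++;
}

/* Average latency a selector could compare across paths; 0 if no data. */
unsigned long sketch_avg_latency(const struct sketch_path_stats *st)
{
	return st->completed ? st->total_latency / st->completed : 0;
}
```

Note that in the patch ps_ctx shares a union with nr_bytes, so a selector would use one view or the other for a given I/O.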
Signed-off-by: Narendran Ganapathy <Narendran_Ganapathy@dell.com>
Signed-off-by: Jason Shamberger <Jason_Shamberger@dell.com>
---
drivers/md/dm-mpath.c | 37 +++++++++++++++++--------------------
drivers/md/dm-mpath.h | 20 ++++++++++++++++++++
drivers/md/dm-path-selector.h | 7 ++++---
drivers/md/dm-queue-length.c | 8 +++++---
drivers/md/dm-round-robin.c | 4 +++-
drivers/md/dm-service-time.c | 11 ++++++++---
6 files changed, 57 insertions(+), 30 deletions(-)
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 826bce7..cc45a99 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -98,14 +98,6 @@ struct multipath {
struct mutex work_mutex;
};
-/*
- * Context information attached to each bio we process.
- */
-struct dm_mpath_io {
- struct pgpath *pgpath;
- size_t nr_bytes;
-};
-
typedef int (*action_fn) (struct pgpath *pgpath);
#define MIN_IOS 256 /* Mempool size */
@@ -267,11 +259,13 @@ static void __switch_pg(struct multipath *m, struct pgpath *pgpath)
}
static int __choose_path_in_pg(struct multipath *m, struct priority_group *pg,
- size_t nr_bytes)
+ union dm_mpath_ps_io *psio,
+ struct request *clone)
{
struct dm_path *path;
- path = pg->ps.type->select_path(&pg->ps, &m->repeat_count, nr_bytes);
+ path = pg->ps.type->select_path(&pg->ps, &m->repeat_count, psio,
+ clone);
if (!path)
return -ENXIO;
@@ -283,7 +277,8 @@ static int __choose_path_in_pg(struct multipath *m, struct priority_group *pg,
return 0;
}
-static void __choose_pgpath(struct multipath *m, size_t nr_bytes)
+static void __choose_pgpath(struct multipath *m, union dm_mpath_ps_io *psio,
+ struct request *clone)
{
struct priority_group *pg;
unsigned bypassed = 1;
@@ -295,12 +290,13 @@ static void __choose_pgpath(struct multipath *m, size_t nr_bytes)
if (m->next_pg) {
pg = m->next_pg;
m->next_pg = NULL;
- if (!__choose_path_in_pg(m, pg, nr_bytes))
+ if (!__choose_path_in_pg(m, pg, psio, clone))
return;
}
/* Don't change PG until it has no remaining paths */
- if (m->current_pg && !__choose_path_in_pg(m, m->current_pg, nr_bytes))
+ if (m->current_pg &&
+ !__choose_path_in_pg(m, m->current_pg, psio, clone))
return;
/*
@@ -312,7 +308,7 @@ static void __choose_pgpath(struct multipath *m, size_t nr_bytes)
list_for_each_entry(pg, &m->priority_groups, list) {
if (pg->bypassed == bypassed)
continue;
- if (!__choose_path_in_pg(m, pg, nr_bytes))
+ if (!__choose_path_in_pg(m, pg, psio, clone))
return;
}
} while (bypassed--);
@@ -350,10 +346,12 @@ static int map_io(struct multipath *m, struct request *clone,
spin_lock_irqsave(&m->lock, flags);
+ mpio->u.nr_bytes = nr_bytes;
+
/* Do we need to select a new pgpath? */
if (!m->current_pgpath ||
(!m->queue_io && (m->repeat_count && --m->repeat_count == 0)))
- __choose_pgpath(m, nr_bytes);
+ __choose_pgpath(m, &mpio->u, clone);
pgpath = m->current_pgpath;
@@ -380,11 +378,10 @@ static int map_io(struct multipath *m, struct request *clone,
r = -EIO; /* Failed */
mpio->pgpath = pgpath;
- mpio->nr_bytes = nr_bytes;
if (r == DM_MAPIO_REMAPPED && pgpath->pg->ps.type->start_io)
pgpath->pg->ps.type->start_io(&pgpath->pg->ps, &pgpath->path,
- nr_bytes);
+ &mpio->u, clone);
spin_unlock_irqrestore(&m->lock, flags);
@@ -464,7 +461,7 @@ static void process_queued_ios(struct work_struct *work)
goto out;
if (!m->current_pgpath)
- __choose_pgpath(m, 0);
+ __choose_pgpath(m, NULL, NULL);
pgpath = m->current_pgpath;
@@ -1295,7 +1292,7 @@ static int multipath_end_io(struct dm_target *ti, struct request *clone,
if (pgpath) {
ps = &pgpath->pg->ps;
if (ps->type->end_io)
- ps->type->end_io(ps, &pgpath->path, mpio->nr_bytes);
+ ps->type->end_io(ps, &pgpath->path, &mpio->u, clone);
}
mempool_free(mpio, m->mpio_pool);
@@ -1533,7 +1530,7 @@ static int multipath_ioctl(struct dm_target *ti, unsigned int cmd,
spin_lock_irqsave(&m->lock, flags);
if (!m->current_pgpath)
- __choose_pgpath(m, 0);
+ __choose_pgpath(m, NULL, NULL);
if (m->current_pgpath) {
bdev = m->current_pgpath->path.dev->bdev;
diff --git a/drivers/md/dm-mpath.h b/drivers/md/dm-mpath.h
index e230f71..45e9c58 100644
--- a/drivers/md/dm-mpath.h
+++ b/drivers/md/dm-mpath.h
@@ -16,6 +16,26 @@ struct dm_path {
void *pscontext; /* For path-selector use */
};
+
+/*
+ * Context information attached to each bio we process.
+ */
+struct dm_ps_io_ctx {
+ uint32_t flags;
+ unsigned long iotime;
+};
+
+union dm_mpath_ps_io {
+ size_t nr_bytes;
+ struct dm_ps_io_ctx ps_ctx;
+};
+
+struct dm_mpath_io {
+ struct pgpath *pgpath;
+ union dm_mpath_ps_io u;
+};
+
+
/* Callback for hwh_pg_init_fn to use when complete */
void dm_pg_init_complete(struct dm_path *path, unsigned err_flags);
diff --git a/drivers/md/dm-path-selector.h b/drivers/md/dm-path-selector.h
index e7d1fa8..cff8ca5 100644
--- a/drivers/md/dm-path-selector.h
+++ b/drivers/md/dm-path-selector.h
@@ -57,7 +57,8 @@ struct path_selector_type {
*/
struct dm_path *(*select_path) (struct path_selector *ps,
unsigned *repeat_count,
- size_t nr_bytes);
+ union dm_mpath_ps_io *psio,
+ struct request *clone);
/*
* Notify the selector that a path has failed.
@@ -77,9 +78,9 @@ struct path_selector_type {
status_type_t type, char *result, unsigned int maxlen);
int (*start_io) (struct path_selector *ps, struct dm_path *path,
- size_t nr_bytes);
+ union dm_mpath_ps_io *psio, struct request *clone);
int (*end_io) (struct path_selector *ps, struct dm_path *path,
- size_t nr_bytes);
+ union dm_mpath_ps_io *psio, struct request *clone);
};
/* Register a path selector */
diff --git a/drivers/md/dm-queue-length.c b/drivers/md/dm-queue-length.c
index f92b6ce..f4d9b47 100644
--- a/drivers/md/dm-queue-length.c
+++ b/drivers/md/dm-queue-length.c
@@ -168,7 +168,9 @@ static int ql_reinstate_path(struct path_selector *ps, struct dm_path *path)
* Select a path having the minimum number of in-flight I/Os
*/
static struct dm_path *ql_select_path(struct path_selector *ps,
- unsigned *repeat_count, size_t nr_bytes)
+ unsigned *repeat_count,
+ union dm_mpath_ps_io *psio,
+ struct request *clone)
{
struct selector *s = ps->context;
struct path_info *pi = NULL, *best = NULL;
@@ -197,7 +199,7 @@ static struct dm_path *ql_select_path(struct path_selector *ps,
}
static int ql_start_io(struct path_selector *ps, struct dm_path *path,
- size_t nr_bytes)
+ union dm_mpath_ps_io *psio, struct request *clone)
{
struct path_info *pi = path->pscontext;
@@ -207,7 +209,7 @@ static int ql_start_io(struct path_selector *ps, struct dm_path *path,
}
static int ql_end_io(struct path_selector *ps, struct dm_path *path,
- size_t nr_bytes)
+ union dm_mpath_ps_io *psio, struct request *clone)
{
struct path_info *pi = path->pscontext;
diff --git a/drivers/md/dm-round-robin.c b/drivers/md/dm-round-robin.c
index 24752f4..ecbde5f 100644
--- a/drivers/md/dm-round-robin.c
+++ b/drivers/md/dm-round-robin.c
@@ -161,7 +161,9 @@ static int rr_reinstate_path(struct path_selector *ps, struct dm_path *p)
}
static struct dm_path *rr_select_path(struct path_selector *ps,
- unsigned *repeat_count, size_t nr_bytes)
+ unsigned *repeat_count,
+ union dm_mpath_ps_io *psio,
+ struct request *clone)
{
struct selector *s = (struct selector *) ps->context;
struct path_info *pi = NULL;
diff --git a/drivers/md/dm-service-time.c b/drivers/md/dm-service-time.c
index 9c6c2e4..83f7534 100644
--- a/drivers/md/dm-service-time.c
+++ b/drivers/md/dm-service-time.c
@@ -254,10 +254,13 @@ static int st_compare_load(struct path_info *pi1, struct path_info *pi2,
}
static struct dm_path *st_select_path(struct path_selector *ps,
- unsigned *repeat_count, size_t nr_bytes)
+ unsigned *repeat_count,
+ union dm_mpath_ps_io *psio,
+ struct request *clone)
{
struct selector *s = ps->context;
struct path_info *pi = NULL, *best = NULL;
+ size_t nr_bytes = psio ? psio->nr_bytes : 0;
if (list_empty(&s->valid_paths))
return NULL;
@@ -278,9 +281,10 @@ static struct dm_path *st_select_path(struct path_selector *ps,
}
static int st_start_io(struct path_selector *ps, struct dm_path *path,
- size_t nr_bytes)
+ union dm_mpath_ps_io *psio, struct request *clone)
{
struct path_info *pi = path->pscontext;
+ size_t nr_bytes = psio ? psio->nr_bytes : 0;
atomic_add(nr_bytes, &pi->in_flight_size);
@@ -288,9 +292,10 @@ static int st_start_io(struct path_selector *ps, struct dm_path *path,
}
static int st_end_io(struct path_selector *ps, struct dm_path *path,
- size_t nr_bytes)
+ union dm_mpath_ps_io *psio, struct request *clone)
{
struct path_info *pi = path->pscontext;
+ size_t nr_bytes = psio ? psio->nr_bytes : 0;
atomic_sub(nr_bytes, &pi->in_flight_size);
--
1.6.5.2
* Re: [PATCH 1/1] dm-mpath: Extend path selector interface for supporting Dell EqualLogic path selector
2010-06-28 13:53 [PATCH 1/1] dm-mpath: Extend path selector interface for supporting Dell EqualLogic path selector Narendran Ganapathy
@ 2010-06-29 8:32 ` Hannes Reinecke
2010-06-30 13:01 ` Jason Shamberger
[not found] ` <86A3089001A44141B76B882059E7E53B05968CCF@M31.equallogic.com>
2010-06-29 11:10 ` Kiyoshi Ueda
` (2 subsequent siblings)
3 siblings, 2 replies; 9+ messages in thread
From: Hannes Reinecke @ 2010-06-29 8:32 UTC (permalink / raw)
To: device-mapper development
Cc: christophe varoqui, Parind Shah, Jason Shamberger, agk,
Narendran Ganapathy
Narendran Ganapathy wrote:
> This patch extends the dm-path-selector interface to allow path
> selectors to use extra information from the IO request when selecting a
> path.
>
> Dell EqualLogic and other iSCSI storage arrays use a distributed
> frameless architecture. In this architecture, the storage group
> consists of a number of distinct storage arrays ("members"), each having
> independent controllers, disk storage and network adapters. When a LUN
> is created it is spread across multiple members. The details of the
> distribution are hidden from initiators connected to this storage
> system. The storage group exposes a single target discovery portal, no
> matter how many members are being used. When iSCSI sessions are
> created, each session is connected to an eth port on a single member.
> Data to a LUN can be sent on any iSCSI session, and if the blocks being
> accessed are stored on another member the IO will be forwarded as
> required. This forwarding is invisible to the initiator. The storage
> layout is also dynamic, and the blocks stored on disk may be moved from
> member to member as needed to balance the load.
>
This sounds surprisingly similar to the 'Logical block dependent'
ALUA state of the upcoming T10 SPC-4.
Which raises two questions:
a) is it implemented as such
and
b) if not, why not?
And if it _is_ the logical block dependent ALUA state then yes, we
definitely need to update the multipath infrastructure for this.
However, given the good match between the 'REPORT REFERRALS' command
and the device-mapper table definitions I would rather propose to
use the output of 'REPORT REFERRALS' to create / modify the existing
multipath tables.
Currently we have a strict path-only mapping:
3600508b40008ddd70000600000620000 dm-0 HP,HSV300
[size=16G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=50][active]
\_ 6:0:6:1 sde 8:64 [active][ready]
\_ round-robin 0 [prio=10][enabled]
\_ 6:0:5:1 sdd 8:48 [active][ready]
or, in device-mapper output:
0 33554432 multipath 1 queue_if_no_path 0 2 1 round-robin 0 1 1 8:64 100 round-robin 0 1 1 8:48 100
With referrals we just need to adjust the 'start' and 'length' parameters to create _several_
device-mapper table lines, e.g.
0 16777216 multipath 1 queue_if_no_path 0 2 1 round-robin 0 1 1 8:64 100 round-robin 0 1 1 8:48 100
16777216 16777216 multipath 1 queue_if_no_path 0 2 1 round-robin 0 1 1 8:48 100 round-robin 0 1 1 8:64 100
which would route the first half of the resulting multipath device to path '8:64' and the second half
to path '8:48'.
Which I think is far more logical and a better match for the current device-mapper architecture.
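To make the start/length arithmetic concrete: each device-mapper table line begins with the start sector and the length in sectors, so consecutive segments have to tile the device without gaps or overlap. The following is only a sketch; sketch_segment() and sketch_table_line() are made-up helper names, the equal-sized split is an assumption, and the path order per segment is supplied by the caller (in practice it would come from the referral data).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Compute start and length (in sectors) of segment 'idx' when a device of
 * 'total' sectors is split into 'nsegs' contiguous segments. The last
 * segment absorbs any remainder so that the segments cover the device.
 */
void sketch_segment(uint64_t total, unsigned nsegs, unsigned idx,
		    uint64_t *start, uint64_t *len)
{
	uint64_t base;

	assert(nsegs > 0 && idx < nsegs);
	base = total / nsegs;
	*start = base * idx;
	*len = (idx == nsegs - 1) ? total - *start : base;
}

/*
 * Format one multipath table line with two single-path priority groups,
 * 'first' being the preferred path (major:minor) for this segment.
 * Returns the snprintf() result.
 */
int sketch_table_line(char *buf, size_t buflen, uint64_t start, uint64_t len,
		      const char *first, const char *second)
{
	return snprintf(buf, buflen,
			"%llu %llu multipath 1 queue_if_no_path 0 2 1 "
			"round-robin 0 1 1 %s 100 round-robin 0 1 1 %s 100",
			(unsigned long long)start, (unsigned long long)len,
			first, second);
}
```

For a 33554432-sector device split in two, sketch_segment() yields segments starting at 0 and 16777216, each 16777216 sectors long.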
But no, I don't have a patch for this. I've yet to see an array supporting referrals.
I might be tempted to do something here if I had one ...
And if your firmware does _not_ support referrals, I would strongly advise reimplementing your
mechanism in the context of referrals.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
* Re: [PATCH 1/1] dm-mpath: Extend path selector interface for supporting Dell EqualLogic path selector
2010-06-28 13:53 [PATCH 1/1] dm-mpath: Extend path selector interface for supporting Dell EqualLogic path selector Narendran Ganapathy
2010-06-29 8:32 ` Hannes Reinecke
@ 2010-06-29 11:10 ` Kiyoshi Ueda
2010-06-29 21:30 ` Narendran Ganapathy
2010-07-09 16:10 ` Mike Snitzer
2010-07-12 22:34 ` Alasdair G Kergon
3 siblings, 1 reply; 9+ messages in thread
From: Kiyoshi Ueda @ 2010-06-29 11:10 UTC (permalink / raw)
To: Narendran Ganapathy
Cc: device-mapper development, Parind Shah, Jason Shamberger, agk
Hi,
On 06/28/2010 10:53 PM +0900, Narendran Ganapathy wrote:
> This patch extends the dm-path-selector interface to allow path
> selectors to use extra information from the IO request when selecting
> a path.
>
> Dell EqualLogic and other iSCSI storage arrays use a distributed
> frameless architecture. In this architecture, the storage group
> consists of a number of distinct storage arrays ("members"),
> each having independent controllers, disk storage and network adapters.
> When a LUN is created it is spread across multiple members.
> The details of the distribution are hidden from initiators connected to
> this storage system. The storage group exposes a single target
> discovery portal, no matter how many members are being used.
> When iSCSI sessions are created, each session is connected to an eth
> port on a single member. Data to a LUN can be sent on any iSCSI
> session, and if the blocks being accessed are stored on another
> member the IO will be forwarded as required. This forwarding is
> invisible to the initiator. The storage layout is also dynamic,
> and the blocks stored on disk may be moved from member to member
> as needed to balance the load.
>
> This architecture simplifies the management and configuration of both
> the storage group and initiators. In a multipathing configuration,
> it is possible to set up multiple iSCSI sessions to use multiple
> network interfaces on both the host and target to take advantage of
> the increased network bandwidth. An initiator can use a simple round
> robin algorithm to send IO on all paths and let the storage array
> members forward it as necessary. However, there is a performance
> advantage to sending data directly to the correct member.
> At the same time, the existing techniques of building a separate
> priority group for paths to each controller does not fit this model,
> because the block ranges may be moved at any time from member to member,
> and it is also acceptable to send IO to any member in the group when
> no direct path exists or there is a path error.
>
> We propose to develop a new path selector to perform this location-based
> routing. The basic idea is to use knowledge about the current location
> of data to prefer paths directly to the owning member, but fall back to
> use any available path when no direct path is available.
> This submission includes the necessary changes to the dm-path-selector
> interface. In the current interface, the only information passed
> to the select_path routine is the path_selector struct and the number
> of bytes in the request. To do location based routing, we need
> the address information of the request.
Interesting.
How can the path-selector know which path owns which location?
All I can see from your explanation above is that "it's invisible to the
initiator", so I don't understand how the path-selector can choose the best
path, i.e. the one directly connected to the member owning the location.
Can it only guess, using the new path-selector parameters below?
> We propose to extend the path selector interface to pass the entire
> request pointer to the 'select_path' / 'start_io' / 'end_io' functions
> so that the path selector can use any information therein to route
> the I/O.
>
> We also propose extending the dm_mpath_io structure used to hold
> information about each I/O to include extra fields for the path
> selector to store I/O specific flags and a timestamp, so the path
> selector can determine the latency of I/Os on different paths and
> that information is passed to the 'select_path' / 'start_io' / 'end_io'
> functions for path selector usage. These additions to the dm_mpath_io
> allow future flexibility in developing algorithms that route IO based
> on other information from the request in the future.
I disagree with passing 'struct request' directly to path-selector.
As I said at http://marc.info/?l=dm-devel&m=123330286327402&w=2,
I think we should keep the path-selector as independent of the type of
I/O structure as possible, since bio-based targets may want
to use path selectors in the future (e.g. read balancing of dm-mirror).
So I would prefer the following approach:
o define what parameters are needed (now and in the future)
o pass them to path-selector
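A minimal user-space sketch of that approach, under stated assumptions: struct sketch_ps_io_args and every sketch_* name are hypothetical, the field set (start sector, byte count, direction) is just one guess at "what parameters are needed", and the two stand-in structs only mimic a request-based and a bio-based caller. The selection policy is deliberately trivial.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical argument block that does not depend on the I/O structure. */
struct sketch_ps_io_args {
	uint64_t start_sector;	/* where the I/O begins */
	size_t nr_bytes;	/* how much data it covers */
	int is_write;		/* direction of the I/O */
};

/* Stand-ins for the two kinds of caller; not the kernel structures. */
struct sketch_request { uint64_t pos_sector; unsigned int data_len; int write; };
struct sketch_bio { uint64_t sector; unsigned int size; int write; };

/* Request-based caller fills in the common argument block. */
void sketch_args_from_request(struct sketch_ps_io_args *args,
			      const struct sketch_request *rq)
{
	args->start_sector = rq->pos_sector;
	args->nr_bytes = rq->data_len;
	args->is_write = rq->write;
}

/* Bio-based caller fills in the same block from its own structure. */
void sketch_args_from_bio(struct sketch_ps_io_args *args,
			  const struct sketch_bio *bio)
{
	args->start_sector = bio->sector;
	args->nr_bytes = bio->size;
	args->is_write = bio->write;
}

/*
 * A selector that only sees the argument block works for both callers.
 * Illustrative policy: alternate paths every 2048 sectors.
 */
unsigned sketch_select_path(const struct sketch_ps_io_args *args,
			    unsigned nr_paths)
{
	assert(args && nr_paths > 0);
	return (unsigned)((args->start_sector / 2048) % nr_paths);
}
```

With this shape, adding a parameter later means extending the argument block rather than changing every selector's prototype.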
Also, please post your path-selector patch if you already have one,
so that people can understand your needs more accurately.
Thanks,
Kiyoshi Ueda
* Re: [PATCH 1/1] dm-mpath: Extend path selector interface for supporting Dell EqualLogic path selector
2010-06-29 11:10 ` Kiyoshi Ueda
@ 2010-06-29 21:30 ` Narendran Ganapathy
0 siblings, 0 replies; 9+ messages in thread
From: Narendran Ganapathy @ 2010-06-29 21:30 UTC (permalink / raw)
To: Kiyoshi Ueda
Cc: device-mapper development, Parind Shah, Jason Shamberger, agk
Hi,
Thanks for the feedback.
> Interesting.
> How can the path-selector know which path owns which location?
The path selector obtains the "path-location mappings" through a userland
helper application that queries the storage group. To explain further,
path-location mappings are simply relationships between the SCSI
minor and LBA ranges.
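A user-space sketch of what such a mapping and lookup might look like, including the fall-back to any usable path when no direct path exists. This is not the actual selector: struct sketch_lba_range, the sketch_* functions and the linear search are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One hypothetical path-location mapping entry. */
struct sketch_lba_range {
	uint64_t start;		/* first LBA of the range */
	uint64_t len;		/* number of blocks in the range */
	unsigned minor;		/* SCSI minor of the path to the owning member */
};

#define SKETCH_NO_PATH (-1)

/* Return the minor of the direct path for 'lba', or SKETCH_NO_PATH. */
int sketch_lookup(const struct sketch_lba_range *map, size_t n, uint64_t lba)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (lba >= map[i].start && lba - map[i].start < map[i].len)
			return (int)map[i].minor;
	return SKETCH_NO_PATH;
}

/*
 * Prefer the direct path when it is currently usable; otherwise fall back
 * to the first usable path, since any member can forward the I/O.
 */
int sketch_choose(const struct sketch_lba_range *map, size_t n, uint64_t lba,
		  const unsigned *usable, size_t nusable)
{
	int direct = sketch_lookup(map, n, lba);
	size_t i;

	if (direct != SKETCH_NO_PATH)
		for (i = 0; i < nusable; i++)
			if (usable[i] == (unsigned)direct)
				return direct;
	return nusable ? (int)usable[0] : SKETCH_NO_PATH;
}
```

Because the layout can change at any time, a real selector would also need a way for the helper to replace the map while I/O is in flight; that part is left out here.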
> I can just see "it's invisible from initiator" from your explanation
> above, so I can't understand why the path-selector can choose the best
> path which is directly connected to the member owning the location.
> It can just guess using the new path-selector parameters below?
By using the address parameter, the path selector can choose the optimal
path to the member where the requested data is located, which improves
performance because the I/O no longer has to be forwarded from one member
to another.
> I disagree with passing 'struct request' directly to path-selector.
> As I said at http://marc.info/?l=dm-devel&m=123330286327402&w=2,
> I think we should keep independency of path-selector from the type of
> I/O structure as much as possible, since bio-based targets may want
> to use them in the future (e.g. read balancing of dm-mirror).
> So I would prefer the way below:
> o define what parameters are needed (now and in the future)
> o pass them to path-selector
OK. We thought passing the request pointer would allow extra flexibility
for future path selectors that could choose a path based on
different criteria (such as the direction of the request: reads go to one
path, writes go to another, etc.). For our path selector,
we need the starting address of the request.
> Also, please post your path-selector patch if you already have,
> so that people can understand your needs much correctly.
We are working on it and will post the path-selector patch soon.
Thanks,
Naren.
* Re: [PATCH 1/1] dm-mpath: Extend path selector interface for supporting Dell EqualLogic path selector
2010-06-29 8:32 ` Hannes Reinecke
@ 2010-06-30 13:01 ` Jason Shamberger
[not found] ` <86A3089001A44141B76B882059E7E53B05968CCF@M31.equallogic.com>
1 sibling, 0 replies; 9+ messages in thread
From: Jason Shamberger @ 2010-06-30 13:01 UTC (permalink / raw)
To: Hannes Reinecke, device-mapper development
Cc: christophe varoqui, Parind Shah, Narendran Ganapathy, agk
Hannes,
Thanks for the feedback. I looked through the spec you mentioned, and the logical block dependent ALUA functionality is similar to what we are trying to accomplish; however, it is not a good fit with our storage array architecture. Our storage array exposes all LUNs through a single Target Port Group (TPG), whereas multiple TPGs are needed for the ALUA functionality. The design choice of a single TPG was made to simplify the logic on the initiator side. The client does not need to decide which TPG(s) to connect to; it always connects to the single Target Port Group, and the storage array decides where to place the connections through the use of iSCSI redirection.
Our goal with this project is to keep the simplicity of our storage array architecture, but add a path selector on the Linux initiator to choose the optimal path for each IO. In this framework, we need additional information to be passed to the path selector to help it choose the optimal path.
Jason
-----Original Message-----
From: Hannes Reinecke [mailto:hare@suse.de]
Sent: Tuesday, June 29, 2010 4:32 AM
To: device-mapper development
Cc: Parind Shah; Jason Shamberger; agk@redhat.com; Narendran Ganapathy; christophe varoqui
Subject: Re: [dm-devel] [PATCH 1/1] dm-mpath: Extend path selector interface for supporting Dell EqualLogic path selector
Narendran Ganapathy wrote:
> This patch extends the dm-path-selector interface to allow path
> selectors to use extra information from the IO request when selecting a
> path.
>
> Dell EqualLogic and other iSCSI storage arrays use a distributed
> frameless architecture. In this architecture, the storage group
> consists of a number of distinct storage arrays ("members"), each having
> independent controllers, disk storage and network adapters. When a LUN
> is created it is spread across multiple members. The details of the
> distribution are hidden from initiators connected to this storage
> system. The storage group exposes a single target discovery portal, no
> matter how many members are being used. When iSCSI sessions are
> created, each session is connected to an eth port on a single member.
> Data to a LUN can be sent on any iSCSI session, and if the blocks being
> accessed are stored on another member the IO will be forwarded as
> required. This forwarding is invisible to the initiator. The storage
> layout is also dynamic, and the blocks stored on disk may be moved from
> member to member as needed to balance the load.
>
This sounds surprisingly similar to the upcoming 'Logical block dependent'
ALUA state of the upcoming T10 SPC-4.
Which begs the question if
a) is it implemented as such
and
b) if not why not?
And if it _is_ the logical block dependent ALUA state then yes, we
definitely need to update the multipath infrastructure for this.
However, given the good match between the 'REPORT REFERRALS' command
and the device-mapper table definitions I would rather propose to
use the output of 'REPORT REFERRALS' to create / modify the existing
multipath tables.
Currently we have a strict path-only mapping:
3600508b40008ddd70000600000620000 dm-0 HP,HSV300
[size=16G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=50][active]
\_ 6:0:6:1 sde 8:64 [active][ready]
\_ round-robin 0 [prio=10][enabled]
\_ 6:0:5:1 sdd 8:48 [active][ready]
or, in device-mapper output:
0 33554432 multipath 1 queue_if_no_path 0 2 1 round-robin 0 1 1 8:64 100 round-robin 0 1 1 8:48 100
With referrals we just need to adjust the 'start' and 'length' parameters to create _several_
device-mapper table lines, e.g.
0 16777216 multipath 1 queue_if_no_path 0 2 1 round-robin 0 1 1 8:64 100 round-robin 0 1 1 8:48 100
16777216 16777216 multipath 1 queue_if_no_path 0 2 1 round-robin 0 1 1 8:48 100 round-robin 0 1 1 8:64 100
which would route the first half of the resulting multipath device to path '8:64' and the second half
to path '8:48'.
Which I think is far more logical and in keeping with the current device-mapper architecture.
But no, I don't have a patch for this. I've yet to see an array supporting referrals.
I might be tempted to do something here if I had one ...
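A minimal sketch of how userspace could emit one table line per referral
range, assuming the usual 'start length target args' layout of a
device-mapper table, so that the second half of a 33554432-sector device
starts at sector 16777216 and is 16777216 sectors long. The helper name
format_mpath_segment and its parameters are hypothetical; nothing like it
is claimed to exist in dmsetup or multipath-tools.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format one multipath table line covering the
 * sectors [start, start + length). 'preferred' goes into the first
 * priority group, 'fallback' into the second. Both are "major:minor"
 * strings such as "8:64". Returns what snprintf() returns.
 */
int format_mpath_segment(char *buf, size_t len,
                         unsigned long long start,
                         unsigned long long length,
                         const char *preferred,
                         const char *fallback)
{
    assert(buf != NULL && preferred != NULL && fallback != NULL);

    return snprintf(buf, len,
                    "%llu %llu multipath 1 queue_if_no_path 0 2 1 "
                    "round-robin 0 1 1 %s 100 "
                    "round-robin 0 1 1 %s 100",
                    start, length, preferred, fallback);
}
```

Calling it once with (0, 16777216, "8:64", "8:48") and once with
(16777216, 16777216, "8:48", "8:64") would give the two lines of the
split table; each further referral range would add one more line.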
And if your firmware does _not_ support referrals I would strongly advise reimplementing your
mechanism in the context of referrals.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
* Re: [PATCH 1/1] dm-mpath: Extend path selector interface for supporting Dell EqualLogic path selector
[not found] ` <86A3089001A44141B76B882059E7E53B05968CCF@M31.equallogic.com>
@ 2010-07-01 7:01 ` Hannes Reinecke
0 siblings, 0 replies; 9+ messages in thread
From: Hannes Reinecke @ 2010-07-01 7:01 UTC (permalink / raw)
To: Jason Shamberger
Cc: device-mapper development, Parind Shah, Narendran Ganapathy, agk,
christophe varoqui
Jason Shamberger wrote:
> Hannes,
>
> Thanks for the feedback. I looked through the spec you mentioned, and the logical block dependent ALUA
> functionality is similar to what we are trying to accomplish, however it is not a good fit with our storage
> array architecture. Our storage array exposes all LUNs through a single Target Port Group, whereas multiple
> TPGs are needed for the ALUA functionality. The design choice of a single TPG was made to simplify the
> logic on the initiator side. The client does not need to decide which TPG(s) to connect to, it always
> connects to the single Target Port Group and the storage array decides where to place the connections
> through use of iSCSI redirection.
>
I still think it should be possible to use the ALUA infrastructure here.
As you are using iSCSI redirection, _all_ nodes in your storage cluster will already have to have
an iSCSI interface. To have your proposed functionality mapped on LB-ALUA you only have to
expose each of these interfaces, and have the mapping shared between all of these
nodes (which you have to do anyway).
Assume two nodes A and B with iSCSI interfaces A1 and B1 both serving LUN1.
One part LUN1a residing on node A and the other part LUN1b residing on node B.
Each node (and every interface on that node) would reside in its own SCSI Target Port Group,
exposing _the entire_ LUN as LB-ALUA capable. Node A would mark LUN1a as 'active/optimized' and
LUN1b as 'active/non-optimized', node B would mark LUN1b as 'active/optimized' and LUN1a as
'active/non-optimized'. Any accesses to the 'non-optimized' sections would be forwarded
to the other nodes via iSCSI redirection.
In effect access would work even when connecting to a single node, and a good speedup would
be observed when using an LB-ALUA capable initiator.
Yes, you would have to expose several TPGs (basically one group per node), but each LUN would
be present in each TPG. So it wouldn't matter much to the functionality here.
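The two-node example above can be sketched as a small lookup, assuming
each part of the LUN is a contiguous block range owned by exactly one
node. All names here (lun_part, node_state_for_lba, the enum values) are
hypothetical and only illustrate the idea; this is not the SPC-4 wire
format. Treating blocks outside every listed part as 'non-optimized' is
also an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

enum alua_state {
    ACTIVE_OPTIMIZED,
    ACTIVE_NON_OPTIMIZED
};

/* One contiguous part of the LUN and the node that stores it. */
struct lun_part {
    unsigned long long first_lba;
    unsigned long long nr_blocks;
    int owner_node;
};

/*
 * State that 'node' would report for 'lba': optimized if the node
 * stores that block itself, non-optimized if the access would have to
 * be forwarded to another node.
 */
enum alua_state node_state_for_lba(const struct lun_part *parts, size_t n,
                                   int node, unsigned long long lba)
{
    assert(parts != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (lba >= parts[i].first_lba &&
            lba - parts[i].first_lba < parts[i].nr_blocks)
            return parts[i].owner_node == node ? ACTIVE_OPTIMIZED
                                               : ACTIVE_NON_OPTIMIZED;
    }
    /* Assumption: blocks not listed anywhere are served via forwarding. */
    return ACTIVE_NON_OPTIMIZED;
}
```

With LUN1a owned by node A and LUN1b owned by node B, an initiator that
knows this mapping would send each I/O to the node reporting
'active/optimized' for the target block, while an initiator that does
not would still work through forwarding.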
> Our goal with this project is to keep the simplicity aspects of our storage array architecture,
> but add a path selector on the linux initiator to choose the optimal path for each IO.
> In this framework, we need additional information to be passed to the path selector to
> aid it in choosing the optimal path.
>
But you have to fight for it, as you're the only one using this infrastructure.
If you would support LB-ALUA you wouldn't be alone in this, as we (as the linux
community) would have to support it eventually.
Plus it's simply more fun to code against an official standard; proprietary
interfaces are a pain to support properly.
And I'd be glad to help in converting multipathing to make it LB-ALUA capable,
but my interest in proprietary interfaces is close to zero :-)
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
* Re: [PATCH 1/1] dm-mpath: Extend path selector interface for supporting Dell EqualLogic path selector
2010-06-28 13:53 [PATCH 1/1] dm-mpath: Extend path selector interface for supporting Dell EqualLogic path selector Narendran Ganapathy
2010-06-29 8:32 ` Hannes Reinecke
2010-06-29 11:10 ` Kiyoshi Ueda
@ 2010-07-09 16:10 ` Mike Snitzer
2010-07-12 22:34 ` Alasdair G Kergon
3 siblings, 0 replies; 9+ messages in thread
From: Mike Snitzer @ 2010-07-09 16:10 UTC (permalink / raw)
To: device-mapper development
Cc: Parind Shah, Jason Shamberger, agk, Narendran Ganapathy
On Mon, Jun 28 2010 at 9:53am -0400,
Narendran Ganapathy <Narendran_Ganapathy@dell.com> wrote:
> We propose to extend the path selector interface to pass the entire
> request pointer to the 'select_path' / 'start_io' / 'end_io'
> functions so that the path selector can use any information therein to
> route the I/O.
So passing the entire request aside (Kiyoshi already shared that this is
not desirable)...
> We also propose extending the dm_mpath_io structure used to hold
> information about each I/O to include extra fields for the path
> selector to store I/O specific flags and a timestamp, so the path
> selector can determine the latency of I/Os on different paths and that
> information is passed to the 'select_path' / 'start_io' / 'end_io'
> functions for path selector usage. These additions to the dm_mpath_io
> allow future flexibility in developing algorithms that route IO based
> on other information from the request in the future.
How can you reasonably pass "extra fields for the path selector to store
I/O specific flags and a timestamp"?
'nr_bytes' and 'struct dm_ps_io_ctx' are mutually exclusive
(side-effect of using a 'union dm_mpath_ps_io').
Yet I'm not seeing where the existing 'select_path' / 'start_io' /
'end_io' interfaces are weaned off of their potential dependency on
'nr_bytes'.
dm-service-time path selector uses nr_bytes (and obviously doesn't use
dm_ps_io_ctx). But what if a path selector had a need for both
'nr_bytes' and 'dm_ps_io_ctx'? Presumably the future might require us
to add additional members to dm_ps_io_ctx or dm_mpath_ps_io too.
Long story short, I do not think you should be using a union for
dm_mpath_ps_io (it should just be a struct).
But I'd imagine you've likely already switched away from a union in
order to store the relevant members of the 'struct request' (rather than
passing the request to the path selectors) in dm_mpath_ps_io.
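The union-versus-struct point can be illustrated in userspace, as a
sketch only. The member layout of dm_ps_io_ctx follows the quoted patch;
the _union and _struct suffixes and the helper functions are
hypothetical names added for the comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct dm_ps_io_ctx {
    uint32_t flags;
    unsigned long iotime;
};

/* As in the posted patch: nr_bytes and ps_ctx share the same storage. */
union dm_mpath_ps_io_union {
    size_t nr_bytes;
    struct dm_ps_io_ctx ps_ctx;
};

/* Suggested alternative: every member has storage of its own. */
struct dm_mpath_ps_io_struct {
    size_t nr_bytes;
    struct dm_ps_io_ctx ps_ctx;
};

/* Returns 1 if nr_bytes and ps_ctx start at the same offset. */
int union_members_overlap(void)
{
    return offsetof(union dm_mpath_ps_io_union, nr_bytes) ==
           offsetof(union dm_mpath_ps_io_union, ps_ctx);
}

int struct_members_overlap(void)
{
    return offsetof(struct dm_mpath_ps_io_struct, nr_bytes) ==
           offsetof(struct dm_mpath_ps_io_struct, ps_ctx);
}

/* With the struct, setting the context leaves nr_bytes intact. */
size_t fill_both(struct dm_mpath_ps_io_struct *io, size_t nr_bytes,
                 uint32_t flags)
{
    assert(io != NULL);
    io->nr_bytes = nr_bytes;
    io->ps_ctx.flags = flags;
    io->ps_ctx.iotime = 0;
    return io->nr_bytes;
}
```

In the union, storing flags or iotime overwrites the bytes that hold
nr_bytes, so a selector that needs the request size and the context at
once cannot get both; the struct costs a few more bytes per I/O and
avoids that.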
Mike
> diff --git a/drivers/md/dm-mpath.h b/drivers/md/dm-mpath.h
> index e230f71..45e9c58 100644
> --- a/drivers/md/dm-mpath.h
> +++ b/drivers/md/dm-mpath.h
> @@ -16,6 +16,26 @@ struct dm_path {
> void *pscontext; /* For path-selector use */
> };
>
> +
> +/*
> + * Context information attached to each bio we process.
> + */
> +struct dm_ps_io_ctx {
> + uint32_t flags;
> + unsigned long iotime;
> +};
> +
> +union dm_mpath_ps_io {
> + size_t nr_bytes;
> + struct dm_ps_io_ctx ps_ctx;
> +};
> +
> +struct dm_mpath_io {
> + struct pgpath *pgpath;
> + union dm_mpath_ps_io u;
> +};
> +
> +
> /* Callback for hwh_pg_init_fn to use when complete */
> void dm_pg_init_complete(struct dm_path *path, unsigned err_flags);
>
> diff --git a/drivers/md/dm-path-selector.h b/drivers/md/dm-path-selector.h
> index e7d1fa8..cff8ca5 100644
> --- a/drivers/md/dm-path-selector.h
> +++ b/drivers/md/dm-path-selector.h
> @@ -57,7 +57,8 @@ struct path_selector_type {
> */
> struct dm_path *(*select_path) (struct path_selector *ps,
> unsigned *repeat_count,
> - size_t nr_bytes);
> + union dm_mpath_ps_io *psio,
> + struct request *clone);
>
> /*
> * Notify the selector that a path has failed.
> @@ -77,9 +78,9 @@ struct path_selector_type {
> status_type_t type, char *result, unsigned int maxlen);
>
> int (*start_io) (struct path_selector *ps, struct dm_path *path,
> - size_t nr_bytes);
> + union dm_mpath_ps_io *psio, struct request *clone);
> int (*end_io) (struct path_selector *ps, struct dm_path *path,
> - size_t nr_bytes);
> + union dm_mpath_ps_io *psio, struct request *clone);
> };
>
> /* Register a path selector */
* Re: [PATCH 1/1] dm-mpath: Extend path selector interface for supporting Dell EqualLogic path selector
2010-06-28 13:53 [PATCH 1/1] dm-mpath: Extend path selector interface for supporting Dell EqualLogic path selector Narendran Ganapathy
` (2 preceding siblings ...)
2010-07-09 16:10 ` Mike Snitzer
@ 2010-07-12 22:34 ` Alasdair G Kergon
2010-07-14 21:34 ` [PATCH 1/1] dm-mpath: Extend path selectorinterface " Jason Shamberger
3 siblings, 1 reply; 9+ messages in thread
From: Alasdair G Kergon @ 2010-07-12 22:34 UTC (permalink / raw)
To: device-mapper development
Cc: Parind Shah, Jason Shamberger, Narendran Ganapathy
On Mon, Jun 28, 2010 at 09:53:20AM -0400, Narendran Ganapathy wrote:
> This patch extends the dm-path-selector interface to allow path selectors to
> use extra information from the IO request when selecting a path.
> To do location based routing, we need the address information of the request.
What other inputs - in addition to offset - will the path selector need to take
into account to make its decision and how will it get those inputs?
Presumably you envisage some sort of semi-static or cached information,
and not asking the hardware before every piece of I/O?
How many ranges are there likely to be in this offset-based routing table?
How frequently is the offset-based routing table likely to change?
As Hannes points out, the dm table layer is already designed to handle
offset-based routing, so I'll need some convincing there's a need to duplicate
part of this inside path selectors.
If this information is rapidly changing - many reconfigurations per minute,
then we may need to consider some in-kernel solution. Otherwise I'll be
seeking solutions performing the reconfiguration from userspace first.
> We also propose extending the dm_mpath_io structure used to hold information
I'll consider proposals for any new fields alongside patches that make good use
of them, but I won't add fields in advance.
Alasdair
* Re: [PATCH 1/1] dm-mpath: Extend path selector interface for supporting Dell EqualLogic path selector
2010-07-12 22:34 ` Alasdair G Kergon
@ 2010-07-14 21:34 ` Jason Shamberger
0 siblings, 0 replies; 9+ messages in thread
From: Jason Shamberger @ 2010-07-14 21:34 UTC (permalink / raw)
To: device-mapper development
Cc: Parind Shah, Narendran Ganapathy, Alasdair G Kergon
Thanks, all, for the feedback.
We will update our proposed patch to pass individual parameters to the
path selector instead of the entire struct request *, so it is compatible
with either request-based or bio-based usage. I do feel it makes sense to
define a struct that will be passed through, now that we have more than
just nr_bytes. Mike, you have a good point about not using a union; this
has been removed.
> What other inputs - in addition to offset - will the path selector need to take
> into account to make its decision and how will it get those inputs?
> Presumably you envisage some sort of semi-static or cached information,
> and not asking the hardware before every piece of I/O?
The additional parameters we would like to pass to the path selectors are:
- Start address
- Whether the IO is a read or write. This will be stored in a flags field.
- Timestamp of when the IO started, so the path selector can calculate
  the latency in end_io()
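A minimal userspace sketch of a per-I/O struct carrying the three
parameters listed above. The names ps_io_info, PS_IO_WRITE, ps_start_io
and ps_end_io are hypothetical and are not taken from the posted patch.
The clock value is passed in by the caller so the sketch stays
self-contained; which kernel time source the real selector would use is
not stated in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PS_IO_WRITE 0x1u        /* hypothetical flag: set for writes */

/* Hypothetical per-I/O information handed to the path selector. */
struct ps_io_info {
    uint64_t start_sector;      /* start address of the I/O */
    size_t   nr_bytes;          /* size, as passed to selectors today */
    uint32_t flags;             /* PS_IO_WRITE or 0 for a read */
    uint64_t start_time;        /* recorded by ps_start_io() */
};

int ps_io_is_write(const struct ps_io_info *io)
{
    assert(io != NULL);
    return (io->flags & PS_IO_WRITE) != 0;
}

/* Record when the I/O was issued. 'now' is in caller-defined ticks. */
void ps_start_io(struct ps_io_info *io, uint64_t now)
{
    assert(io != NULL);
    io->start_time = now;
}

/* Latency of the I/O in the same ticks as 'now'. */
uint64_t ps_end_io(const struct ps_io_info *io, uint64_t now)
{
    assert(io != NULL);
    return now - io->start_time;
}
```

A selector could keep a running average of the value returned by
ps_end_io() per path and prefer the path with the lowest recent latency;
that policy is outside this sketch.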
We understand the full path selector code is necessary before this patch
can be accepted, and will post it in the near future.
> How many ranges are there likely to be in this offset-based routing table?
> How frequently is the offset-based routing table likely to change?
> As Hannes points out, the dm table layer is already designed to handle
> offset-based routing, so I'll need some convincing there's a need to duplicate
> part of this inside path selectors.
> If this information is rapidly changing - many reconfigurations per minute,
> then we may need to consider some in-kernel solution. Otherwise I'll be
> seeking solutions performing the reconfiguration from userspace first.
We agree with doing much of the work in userspace, such as reading the
information from the storage array. The routing table can change at any
time, but is not likely to be changing frequently, so a userspace
solution is adequate. This information will be cached on the host so only
a quick lookup is necessary in the IO path.
There are many ranges in our routing table - when we spread data across
the arrays in our group, each range is usually only tens of MB, so there
are typically thousands of these ranges per LUN.
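The "quick lookup" over thousands of cached ranges could be a binary
search, sketched here under the assumption that the cached table is
sorted by start sector and its ranges do not overlap. The names
route_range and route_lookup and the integer path_id are hypothetical;
the thread does not describe the actual data structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One cached routing entry: sectors [start, start + len) go to path_id. */
struct route_range {
    uint64_t start;
    uint64_t len;
    int path_id;
};

/*
 * Binary search over a table sorted by 'start' with non-overlapping
 * ranges. Returns the path_id for 'sector', or -1 if no range covers
 * it, in which case the caller could fall back to any usable path.
 */
int route_lookup(const struct route_range *tbl, size_t n, uint64_t sector)
{
    size_t lo = 0, hi = n;

    assert(tbl != NULL || n == 0);

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (sector < tbl[mid].start)
            hi = mid;               /* target lies before this range */
        else if (sector - tbl[mid].start >= tbl[mid].len)
            lo = mid + 1;           /* target lies after this range */
        else
            return tbl[mid].path_id;
    }
    return -1;
}
```

With a few thousand entries this takes on the order of a dozen
comparisons per I/O, and the table only has to be rebuilt when userspace
learns that the layout on the array has changed.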
Our initial design was to use a single entry in the dm table and a path
selector that routed amongst paths based on the additional info proposed
above.
Based on Alasdair's and Hannes' feedback we have done some experimentation
with the dm table layer offset based routing and it seems a viable
alternative, but we hit a couple of problems. In the current kernel the
DM is rejecting our table - the multipath target is request based, and it
seems the request based dm only supports tables that have a single target.
We have also tested on an older kernel that uses a bio-based dm-multipath,
and are able to load a table with dmsetup. However, we seem to be bumping
into a scalability limit when the table exceeds a few thousand lines.
Thanks,
Jason
Thread overview: 9+ messages
2010-06-28 13:53 [PATCH 1/1] dm-mpath: Extend path selector interface for supporting Dell EqualLogic path selector Narendran Ganapathy
2010-06-29 8:32 ` Hannes Reinecke
2010-06-30 13:01 ` Jason Shamberger
[not found] ` <86A3089001A44141B76B882059E7E53B05968CCF@M31.equallogic.com>
2010-07-01 7:01 ` Hannes Reinecke
2010-06-29 11:10 ` Kiyoshi Ueda
2010-06-29 21:30 ` Narendran Ganapathy
2010-07-09 16:10 ` Mike Snitzer
2010-07-12 22:34 ` Alasdair G Kergon
2010-07-14 21:34 ` [PATCH 1/1] dm-mpath: Extend path selectorinterface " Jason Shamberger