From: Mike Anderson <andmike@us.ibm.com>
To: James Bottomley <James.Bottomley@SteelEye.com>
Cc: Patrick Mansfield <patmans@us.ibm.com>,
	Lars Marowsky-Bree <lmb@suse.de>,
	linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org
Subject: Re: [RFC] Multi-path IO in 2.5/2.6 ?
Date: Mon, 9 Sep 2002 11:40:26 -0700	[thread overview]
Message-ID: <20020909184026.GD1334@beaverton.ibm.com> (raw)
In-Reply-To: <200209091734.g89HY5p11796@localhost.localdomain>

James Bottomley [James.Bottomley@SteelEye.com] wrote:
> patmans@us.ibm.com said:
> > Using md or volume manager is wrong for non-failover usage, and
> > somewhat bad for failover models; generic block layer is OK but it is
> > wasted code for any lower layers that do not or cannot have multi-path
> > IO (such as IDE). 
> 
> What about block devices that could usefully use multi-path to achieve network 
> redundancy, like nbd? If it's in the block layer or above, they can be made to 
> work with minimal effort.

When you get into networking, I believe we may run into path failover
capability that is already implemented by the network stack, so the
paths may not be visible to the block layer.

> 
> My basic point is that the utility of the feature transcends SCSI, so SCSI is 
> too low a layer for it.
> 
> I wouldn't be too sure even of the IDE case: IDE has a habit of copying SCSI 
> features when they become more main-stream (and thus cheaper).  It wouldn't 
> surprise me to see multi-path as an adjunct to the IDE serial stuff.
> 

The utility does transcend SCSI, but transport- and device-specific
characteristics may make "true" generic implementations difficult.

To add multi-path functionality beyond failover, you will need to get
into transport- and device-specific data gathering.

> > A major problem with multi-path in md or other volume manager is that
> > we use multiple (block layer) queues for a single device, when we
> > should be using a single queue. If we want to use all paths to a
> > device (i.e. round robin across paths or such, not a failover model)
> > this means the elevator code becomes inefficient, maybe even
> > counterproductive. For disk arrays, this might not be bad, but for
> > actual drives or even plugging single ported drives into a switch or
> > bus with multiple initiators, this could lead to slower disk
> > performance. 
> 
> That's true today, but may not be true in 2.6.  Suparna's bio splitting code 
> is aimed precisely at this and other software RAID cases.

I have not looked at Suparna's patch, but it would seem that device
knowledge would be helpful for knowing when to split.

> > In the current code, each path is allocated a Scsi_Device, including a
> > request_queue_t, and a set of Scsi_Cmnd structures. Not only do we end
> > up with a Scsi_Device for each path, we also have an upper level (sd,
> > sg, st, or sr) driver attached to each Scsi_Device. 
> 
> You can't really get away from this.  Transfer parameters are negotiated at 
> the Scsi_Device level (i.e. per device path from HBA to controller), and LLDs 
> accept I/O's for Scsi_Devices.  Whatever you do, you still need an entity that 
> performs most of the same functions as the Scsi_Device, so you might as well 
> keep Scsi_Device itself, since it works.

James, have you looked at the documentation / patch previously pointed
to by Patrick? There is still a Scsi_Device.

> 
> > For sd, this means if you have n paths to each SCSI device, you are
> > limited to whatever limit sd has divided by n, right now 128 / n.
> > Having four paths to a device is very reasonable, limiting us to 32
> > devices, but with the overhead of 128 devices. 
> 
> I really don't expect this to be true in 2.6.
> 

While the device space may be increased in 2.6, you are still consuming
extra resources, but we do this in other places also.

> > We could implement multi-path IO in the block layer, but if the only
> > user is SCSI, this gains nothing compared to putting multi-path in the
> > scsi layers. Creating block level interfaces that will work for future
> > devices and/or future code is hard without already having the devices
> > or code in place. Any block level interface still requires support in
> > the underlying layers.
> 
> > I'm not against a block level interface, but I don't have ideas or
> > code for such an implementation.
> 
> SCSI got into a lot of trouble by going down the "kernel doesn't have X 
> feature I need, so I'll just code it into the SCSI mid-layer instead" road; 
> I'm loth to accept something into SCSI that I don't think belongs there in 
> the long term.
> 
> Answer me this question:
> 
> - In the foreseeable future does multi-path have uses other than SCSI?
> 

See top comment.

> The "scsi is everything" approach got its wings shot off at the kernel summit, 
> and subsequently confirmed its death in a protracted wrangle on lkml (I can't 
> remember the reference off the top of my head, but I'm sure others can).

Could you point this out so I can understand the context?

-Mike
-- 
Michael Anderson
andmike@us.ibm.com


