* Is there a grand plan for FC failover? @ 2004-01-26 14:18 Simon Kelley 2004-01-26 15:37 ` James Bottomley 0 siblings, 1 reply; 19+ messages in thread From: Simon Kelley @ 2004-01-26 14:18 UTC (permalink / raw) To: linux-scsi I see that 2.6.x kernels now have the qla2xxx driver in the mainline, but without the failover code. What is the reason for that? Is there a plan to provide failover facilities at a higher level which will be usable with all suitable low-level drivers and hardware? I'm very much in favour of using drivers which are developed in the kernel mainline but I have an application which needs failover so I might be forced back to the qlogic-distributed code. Cheers, Simon. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Is there a grand plan for FC failover? 2004-01-26 14:18 Is there a grand plan for FC failover? Simon Kelley @ 2004-01-26 15:37 ` James Bottomley 2004-01-28 15:02 ` Philip R. Auld 0 siblings, 1 reply; 19+ messages in thread From: James Bottomley @ 2004-01-26 15:37 UTC (permalink / raw) To: Simon Kelley; +Cc: SCSI Mailing List On Mon, 2004-01-26 at 08:18, Simon Kelley wrote: > I see that 2.6.x kernels now have the qla2xxx driver in the mainline, > but without the failover code. > > What is the reason for that? Is there a plan provide failover facilities > at a higher level which will be usable with all suitable low-level > drivers and hardware? > > I'm very much in favour of using drivers which are developed in the > kernel mainline but I have an application which needs failover so I > might be forced back to the qlogic-distributed code. Yes, the direction coming out of KS/OLS last year was to use the dm or md multi-path code to sit the failover driver on top of sd (or any other block driver). The idea being that the Volume Manager layer is the most stack generic place to do this type of thing. The thread on this is here: http://marc.theaimsgroup.com/?t=106005575400003 James ^ permalink raw reply [flat|nested] 19+ messages in thread
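James's suggestion — failover handled by a generic layer stacked above sd, so the Volume Manager level presents one logical device over several paths — can be illustrated with a toy model. This is a hand-written Python sketch of the concept, not the dm/md code; the path names and failure counts are invented:

```python
class Path:
    """One route to the same physical storage; fails after `fail_after` I/Os if set."""
    def __init__(self, name, fail_after=None):
        self.name = name
        self.fail_after = fail_after
        self.ios = 0

    def submit(self, block):
        self.ios += 1
        if self.fail_after is not None and self.ios > self.fail_after:
            raise IOError(f"path {self.name} down")
        return f"data@{block} via {self.name}"


class FailoverDevice:
    """Present one logical device; on error, mark the path failed and retry the next."""
    def __init__(self, paths):
        self.paths = list(paths)

    def read(self, block):
        while self.paths:
            try:
                return self.paths[0].submit(block)
            except IOError:
                self.paths.pop(0)   # fail over: drop the dead path, retry on the next
        raise IOError("all paths failed")


dev = FailoverDevice([Path("sda", fail_after=2), Path("sdb")])
results = [dev.read(b) for b in range(4)]
print(results[-1])   # later I/O is transparently served by the surviving path
```

The key property is the one James describes: the consumer above never sees the path failure, only the logical device, and no userspace intervention is needed for the retry itself.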
* Re: Is there a grand plan for FC failover? 2004-01-26 15:37 ` James Bottomley @ 2004-01-28 15:02 ` Philip R. Auld 2004-01-28 16:57 ` James Bottomley 0 siblings, 1 reply; 19+ messages in thread From: Philip R. Auld @ 2004-01-28 15:02 UTC (permalink / raw) To: James Bottomley; +Cc: Simon Kelley, SCSI Mailing List Hi, Rumor has it that on Mon, Jan 26, 2004 at 09:37:25AM -0600 James Bottomley said: > On Mon, 2004-01-26 at 08:18, Simon Kelley wrote: > > > I'm very much in favour of using drivers which are developed in the > > kernel mainline but I have an application which needs failover so I > > might be forced back to the qlogic-distributed code. > > Yes, the direction coming out of KS/OLS last year was to use the dm or > md multi-path code to sit the failover driver on top of sd (or any other > block driver). > > The idea being that the Volume Manager layer is the most stack generic > place to do this type of thing. The thread on this is here: > > http://marc.theaimsgroup.com/?t=106005575400003 > There are some issues left unresolved with this approach. The ones that come to mind are: 1) load balancing when possible: it's not enough to be just a failover mechanism. 2) requiring a userspace program to execute for failover is problematic when it could be the root disk that needs failing over. 3) Handling partitions is a problem. I see multipath and md/RAID as two different animals. Multipathing is multiple ways to reach the same physical block. That is, it's under the logical layer, while RAID is multiple ways to reach the same logical block. It's basically many-to-one vs one-to-many. Having multiple physical paths to a logical partition is a little counter-intuitive. That said, I'm all for getting Linux to have a decent multipath implementation, at whatever layer is agreed upon. I'd love to be able to use the native Linux multipathing and stop supporting the ones I've written. 
FWIW, Phil > James > > > - > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Philip R. Auld, Ph.D. Egenera, Inc. Principal Software Engineer 165 Forest St. (508) 858-2628 Marlboro, MA 01752 ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Is there a grand plan for FC failover? 2004-01-28 15:02 ` Philip R. Auld @ 2004-01-28 16:57 ` James Bottomley 2004-01-28 18:00 ` Philip R. Auld 0 siblings, 1 reply; 19+ messages in thread From: James Bottomley @ 2004-01-28 16:57 UTC (permalink / raw) To: Philip R. Auld; +Cc: Simon Kelley, SCSI Mailing List On Wed, 2004-01-28 at 09:02, Philip R. Auld wrote: > 1) load balancing when possible: it's not enough to be just a > failover mechanism. For first out, a failover target supplies most of the needs. Nothing prevents an aggregation target being added later, but failover is essential. > 2) requiring a userspace program to execute for failover is > problematic when it could be the root disk that needs > failing over. Userspace is required for *configuration*, not failover. The path targets failover automatically as they see I/O down particular paths timing out or failing. That being said, nothing prevents async notifications via hotplug also triggering a failover (rather than having to wait for timeout etc) but it's not required. > 3) Handling partitions is a problem. It is? How? The block approach can sit either above or below partitions (assuming a slightly more flexible handling of partitions than dm provides). > I see multipath and md/RAID as two different animals. Multipathing is > multiple ways to reach the same physical block. That is it's under > the logical layer. While RAID is multiple ways to reach the same > logical block. It's basically many-to-one vs one-to-many. Having > multiple physical paths to a logical partition is a little > counter-intuitive. Nothing in the discussion assumed them to be similar. The notes were that md already had a multi-path target, and that translated error indications would be useful to software raid. Thus designing the fastfail to cater to both looks like a good idea, but that's just good design. Robust multi-pathing and raid will be built on top of scsi/block fastfail. 
James ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Is there a grand plan for FC failover? 2004-01-28 16:57 ` James Bottomley @ 2004-01-28 18:00 ` Philip R. Auld 2004-01-28 20:47 ` Patrick Mansfield 2004-01-28 22:37 ` Mike Christie 0 siblings, 2 replies; 19+ messages in thread From: Philip R. Auld @ 2004-01-28 18:00 UTC (permalink / raw) To: James Bottomley; +Cc: Simon Kelley, SCSI Mailing List Hi James, Thanks for the reply. Rumor has it that on Wed, Jan 28, 2004 at 10:57:29AM -0600 James Bottomley said: > On Wed, 2004-01-28 at 09:02, Philip R. Auld wrote: > > 1) load balancing when possible: it's not enough to be just a > > failover mechanism. > > For first out, a failover target supplies most of the needs. Nothing > prevents an aggregation target being added later, but failover is > essential. Yes, failover is necessary, but not sufficient to make a decent multipath driver. I'm a little concerned by the "added later" part. Shouldn't it be designed in? > > > 2) requiring a userspace program to execute for failover is > > problematic when it could be the root disk that needs > > failing over. > > Userspace is required for *configuration* not failover. The path > targets failover automatically as they see I/O down particular paths > timing out or failing. That being said, nothing prevents async > notifications via hotplug also triggering a failover (rather than having > to wait for timeout etc) but it's not required. > There was some discussion about vendor plugins to do proprietary failover when needed. Say to make a passive path active. My concern is that this be memory resident and that we not have to go to the disk for such things. If there is no need for that or it's done in such a way that it doesn't need the disk this won't be a problem then. > > 3) Handling partitions is a problem. > > It is? How? The block approach can sit either above or below partitions > (assuming a slightly more flexible handling of partitions that dm > provides). > Great, that works for me. 
> > I see multipath and md/RAID as two different animals. Multipathing is > > multiple ways to reach the same physical block. That is it's under > > the logical layer. While RAID is multiple ways to reach the same > > logical block. It's basically many-to-one vs one-to-many. Having > > multiple physical paths to a logical partition is a little > > counter-intuitive. > > Nothing in the discussion assumed them to be similar. The notes were > that md already had a multi-path target, and that translated error > indications would be useful to software raid. Thus designing the > fastfail to cater to both looks like a good idea, but that's just good > design. Robust multi-pathing and raid will be built on top of > scsi/block fastfail. Aside from the implicit one made by having a multi-path target in the md driver, I guess that's true. As I said, I think this all sounds great. I just hadn't seen these (non?)issues addressed on the list after they were raised and want to make sure there was some thought given to them. Cheers, Phil > > James > -- Philip R. Auld, Ph.D. Egenera, Inc. Principal Software Engineer 165 Forest St. (508) 858-2628 Marlboro, MA 01752 ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Is there a grand plan for FC failover? 2004-01-28 18:00 ` Philip R. Auld @ 2004-01-28 20:47 ` Patrick Mansfield 2004-01-28 22:14 ` James Bottomley 2004-01-28 22:37 ` Mike Christie 1 sibling, 1 reply; 19+ messages in thread From: Patrick Mansfield @ 2004-01-28 20:47 UTC (permalink / raw) To: Philip R. Auld; +Cc: James Bottomley, Simon Kelley, SCSI Mailing List, dm-devel [cc-ing dm-devel] My two main issues with dm multipath versus scsi core multipath are: 1) It does not handle character devices. 2) It does not have the information available about the state of the scsi_device or scsi_host (for path selection), or about the elevator. If we end up passing all the scsi information up to dm, and it does the same things that we already do in scsi (or in block), what is the point of putting the code into a separate layer? More scsi fastfail-like code is still needed - probably for all the cases where scsi_dev_queue_ready and scsi_host_queue_ready return 0 - and more. For example, should we somehow make sdev->queue_depth available to dm? There are still issues with a per-path elevator (i.e. we have an elevator for each path rather than the entire device) that probably won't be fixed cleanly in 2.6. AFAIUI this requires moving dm from a bio based approach to a request based one. -- Patrick Mansfield ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Is there a grand plan for FC failover? 2004-01-28 20:47 ` Patrick Mansfield @ 2004-01-28 22:14 ` James Bottomley 2004-01-29 0:55 ` Patrick Mansfield 0 siblings, 1 reply; 19+ messages in thread From: James Bottomley @ 2004-01-28 22:14 UTC (permalink / raw) To: Patrick Mansfield Cc: Philip R. Auld, Simon Kelley, SCSI Mailing List, dm-devel On Wed, 2004-01-28 at 15:47, Patrick Mansfield wrote: > [cc-ing dm-devel] > > My two main issues with dm multipath versus scsi core multipath are: > > 1) It does not handle character devices. Multi-path character devices are pretty much corner cases. It's not clear to me that you need to handle them in kernel at all. Things like multi-path tape often come with an application that's perfectly happy to take the presentation of two or more tape devices. Do we have a character device example we need to support as a single device? > 2) It does not have the information available about the state of the > scsi_device or scsi_host (for path selection), or about the elevator. Well, this is one of those abstraction case things. Can we make the information generic enough that the pathing layer makes the right decisions without worrying about what the underlying internals are? That's where enhancements to the fastfail layer come in. I believe we can get the fastfail information to the point where we can use it to make good decisions regardless of underlying transport (or even subsystem). > If we end up passing all the scsi information up to dm, and it does the > same things that we already do in scsi (or in block), what is the point of > putting the code into a separate layer? It's for interpretation by those modular add-ons that are allowed to cater to specific devices. > More scsi fastfail like code is still needed - probably for all the cases > where scsi_dev_queue_ready and scsi_host_queue_ready return 0 - and more. > For example, should we somehow make sdev->queue_depth available to dm? I agree. We only have the basics at the moment. 
Expanding the error indications is a necessary next step. > There are still issues with with a per path elevator (i.e. we have an > elevator for each path rather than the entire device) that probably won't > be fixed cleanly in 2.6. AFAIUI this requires moving dm from a bio based > approach to a request based one. We had the "where does the elevator go" discussion at the OLS bof. I think I heard agreement that the current situation of the elevator between dm and block is suboptimal and that we'd like a true coalescing elevator above dm with a vestigial one for the mid-layer to use for queueing below. I think this is a requirement for dm multipath to work well, but it's not a requirement for it actually to work. James ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Is there a grand plan for FC failover? 2004-01-28 22:14 ` James Bottomley @ 2004-01-29 0:55 ` Patrick Mansfield 2004-01-30 19:48 ` [dm-devel] " Joe Thornber 0 siblings, 1 reply; 19+ messages in thread From: Patrick Mansfield @ 2004-01-29 0:55 UTC (permalink / raw) To: James Bottomley; +Cc: Philip R. Auld, Simon Kelley, SCSI Mailing List, dm-devel On Wed, Jan 28, 2004 at 05:14:26PM -0500, James Bottomley wrote: > On Wed, 2004-01-28 at 15:47, Patrick Mansfield wrote: > > [cc-ing dm-devel] > > > > My two main issues with dm multipath versus scsi core multipath are: > > > > 1) It does not handle character devices. > > Multi-path character devices are pretty much corner cases. It's not > clear to me that you need to handle them in kernel at all. Things like > multi-path tape often come with an application that's perfectly happy to > take the presentation of two or more tape devices. I have not seen such applications. Standard applications like tar and cpio are not going to work well. If you plug a single ported tape drive or other scsi device into a fibre channel SAN, it will show up multiple times, the hardware itself need not be multiported. > Do we have a > character device example we need to support as a single device? Not that I know of, but I have not worked in this area recently. I assume there are also fibre attached media changers. BTW, we need some sort of udev rules so we can have a multi-path device (sd part, not dm part) actually show up multiple times. > > 2) It does not have the information available about the state of the > > scsi_device or scsi_host (for path selection), or about the elevator. > > Well, this is one of those abstraction case things. Can we make the > information generic enough that the pathing layer makes the right > decisions without worrying about what the underlying internals are? 
I don't think current interfaces and passing up error codes will be enough, for example: a queue full on a given path (aka scsi_device) when there is no other IO on that device could lead to starvation, similar to one node in a cluster starving other nodes out. Limiting IO via some sort of queue_depth in dm would help solve this particular problem, but there is nothing in place today for dm to have its own request queue or be request based; limiting the number of bio's to an arbitrary value would suck, and the sdev->queue_depth is not visible to dm today. > That's where enhancements to the fastfail layer come in. I believe we > can get the fastfail information to the point where we can use it to > make good decisions regardless of underlying transport (or even > subsystem). > > If we end up passing all the scsi information up to dm, and it does the > > same things that we already do in scsi (or in block), what is the point of > > putting the code into a separate layer? > > It's for interpretation by those modular add-ons that are allowed to > cater to specific devices. I'm not sure what you mean - adding code or data that is only ever used by dm is wasted if you're not using dm. > > More scsi fastfail like code is still needed - probably for all the cases > > where scsi_dev_queue_ready and scsi_host_queue_ready return 0 - and more. > > For example, should we somehow make sdev->queue_depth available to dm? > > I agree. We only have the basics at the moment. Expanding the error > indications is a necessary next step. Yes, I was looking into this for use with changes Mike C is working on - pass up an error via end_that_request_first or such. > We had the "where does the elevator go" discussion at the OLS bof. I > think I heard agreement that the current situation of between dm and > block is suboptimal and that we'd like a true coalescing elevator above > dm with a vestigial one for the mid-layer to use for queueing below. 
I > think this is a requirement for dm multipath to work well, but it's not > a requirement for it actually to work. If the performance is bad enough, it doesn't matter if it works. -- Patrick Mansfield ^ permalink raw reply [flat|nested] 19+ messages in thread
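Patrick's queue-full starvation worry can be made concrete with a toy dispatcher: if the multipath layer tracks a per-path depth limit (an analogue of the sdev->queue_depth he notes dm cannot see today), it can hold I/O rather than keep piling it onto a throttled path. This is a hypothetical Python model of that idea, not dm code; the path names and depth values are invented:

```python
from collections import deque

class ThrottledPath:
    def __init__(self, name, queue_depth):
        self.name = name
        self.queue_depth = queue_depth   # device-imposed limit (sdev->queue_depth analogue)
        self.inflight = 0

    def can_queue(self):
        return self.inflight < self.queue_depth

def dispatch(paths, ios):
    """Send each I/O to the least-loaded path that still has queue slots."""
    sent, requeued = [], deque()
    for io in ios:
        ready = [p for p in paths if p.can_queue()]
        if not ready:
            requeued.append(io)   # all paths full: hold the I/O instead of forcing a queue-full
            continue
        best = min(ready, key=lambda p: p.inflight)
        best.inflight += 1
        sent.append((io, best.name))
    return sent, requeued

a, b = ThrottledPath("sda", 2), ThrottledPath("sdb", 1)
sent, held = dispatch([a, b], range(5))
```

Without visibility into the depth limits, the dispatcher would have sent all five I/Os down and discovered the overload only from queue-full responses — which is exactly the information gap being discussed.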
* Re: [dm-devel] Re: Is there a grand plan for FC failover? 2004-01-29 0:55 ` Patrick Mansfield @ 2004-01-30 19:48 ` Joe Thornber 2004-01-31 9:30 ` Jens Axboe 0 siblings, 1 reply; 19+ messages in thread From: Joe Thornber @ 2004-01-30 19:48 UTC (permalink / raw) To: dm-devel; +Cc: James Bottomley, Philip R. Auld, Simon Kelley, SCSI Mailing List On Wed, Jan 28, 2004 at 04:55:34PM -0800, Patrick Mansfield wrote: > > We had the "where does the elevator go" discussion at the OLS bof. I > > think I heard agreement that the current situation of between dm and > > block is suboptimal and that we'd like a true coalescing elevator above > > dm with a vestigial one for the mid-layer to use for queueing below. I > > think this is a requirement for dm multipath to work well, but it's not > > a requirement for it actually to work. > > If the performance is bad enough, it doesn't matter if it works. It would be great to get some benchmarks to back up these arguments. e.g., performance of dm mpath with a simple round robin selector, compared to a scsi layer implementation. Lifting the elevator (or lowering dm) is a big piece of work that I won't even consider unless there is very good reason; the reason probably needs to be broader than just multipath too. Even if we did decide to do this, it won't happen in 2.6. - Joe ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [dm-devel] Re: Is there a grand plan for FC failover? 2004-01-30 19:48 ` [dm-devel] " Joe Thornber @ 2004-01-31 9:30 ` Jens Axboe 2004-01-31 16:59 ` Philip R. Auld 0 siblings, 1 reply; 19+ messages in thread From: Jens Axboe @ 2004-01-31 9:30 UTC (permalink / raw) To: Joe Thornber Cc: dm-devel, James Bottomley, Philip R. Auld, Simon Kelley, SCSI Mailing List On Fri, Jan 30 2004, Joe Thornber wrote: > On Wed, Jan 28, 2004 at 04:55:34PM -0800, Patrick Mansfield wrote: > > > We had the "where does the elevator go" discussion at the OLS bof. I > > > think I heard agreement that the current situation of between dm and > > > block is suboptimal and that we'd like a true coalescing elevator above > > > dm with a vestigial one for the mid-layer to use for queueing below. I > > > think this is a requirement for dm multipath to work well, but it's not > > > a requirement for it actually to work. > > > > If the performance is bad enough, it doesn't matter if it works. > > It would be great to get some benchmarks to back up these arguments. > eg, performance of dm mpath with a simple round robin selector, > compared to a scsi layer implementation. Lifting the elevator (or > lowering dm) is a big piece of work that I wont even consider unless > there is very good reason; the reason probably needs to be broader > than just multipath too. Even if we did decide to do this, it won't > happen in 2.6. I suspect the problem really isn't that huge in 2.6, since most performance file systems are using mpage or building their own big bio's. So in a sense, some of the merging already does happen above dm (and the io scheduler). -- Jens Axboe ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [dm-devel] Re: Is there a grand plan for FC failover? 2004-01-31 9:30 ` Jens Axboe @ 2004-01-31 16:59 ` Philip R. Auld 2004-01-31 17:42 ` Jens Axboe 0 siblings, 1 reply; 19+ messages in thread From: Philip R. Auld @ 2004-01-31 16:59 UTC (permalink / raw) To: Jens Axboe Cc: Joe Thornber, dm-devel, James Bottomley, Simon Kelley, SCSI Mailing List Rumor has it that on Sat, Jan 31, 2004 at 10:30:37AM +0100 Jens Axboe said: > On Fri, Jan 30 2004, Joe Thornber wrote: > > It would be great to get some benchmarks to back up these arguments. > > eg, performance of dm mpath with a simple round robin selector, > > compared to a scsi layer implementation. Lifting the elevator (or > > lowering dm) is a big piece of work that I wont even consider unless > > there is very good reason; the reason probably needs to be broader > > than just multipath too. Even if we did decide to do this, it won't > > happen in 2.6. > > I suspect the problem really isn't that huge in 2.6, since most > performance file systems are using mpage or building their own big > bio's. So in a sense, some of the merging already does happen above dm > (and the io scheduler). Out of curiosity, where does raw io fit into that in 2.6? Thanks, Phil -- Philip R. Auld, Ph.D. Egenera, Inc. Principal Software Engineer 165 Forest St. (508) 858-2628 Marlboro, MA 01752 ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [dm-devel] Re: Is there a grand plan for FC failover? 2004-01-31 16:59 ` Philip R. Auld @ 2004-01-31 17:42 ` Jens Axboe 2004-02-12 15:17 ` Philip R. Auld 0 siblings, 1 reply; 19+ messages in thread From: Jens Axboe @ 2004-01-31 17:42 UTC (permalink / raw) To: Philip R. Auld Cc: Joe Thornber, dm-devel, James Bottomley, Simon Kelley, SCSI Mailing List On Sat, Jan 31 2004, Philip R. Auld wrote: > Rumor has it that on Sat, Jan 31, 2004 at 10:30:37AM +0100 Jens Axboe said: > > On Fri, Jan 30 2004, Joe Thornber wrote: > > > It would be great to get some benchmarks to back up these arguments. > > > eg, performance of dm mpath with a simple round robin selector, > > > compared to a scsi layer implementation. Lifting the elevator (or > > > lowering dm) is a big piece of work that I wont even consider unless > > > there is very good reason; the reason probably needs to be broader > > > than just multipath too. Even if we did decide to do this, it won't > > > happen in 2.6. > > > > I suspect the problem really isn't that huge in 2.6, since most > > performance file systems are using mpage or building their own big > > bio's. So in a sense, some of the merging already does happen above dm > > (and the io scheduler). > > Out of curiosity, where does raw io fit into that in 2.6? raw io (or O_DIRECT io, same path) should work even better, always send out bio's as big as the underlying device can support. -- Jens Axboe ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [dm-devel] Re: Is there a grand plan for FC failover? 2004-01-31 17:42 ` Jens Axboe @ 2004-02-12 15:17 ` Philip R. Auld 2004-02-12 15:28 ` Arjan van de Ven 0 siblings, 1 reply; 19+ messages in thread From: Philip R. Auld @ 2004-02-12 15:17 UTC (permalink / raw) To: Jens Axboe Cc: Joe Thornber, dm-devel, James Bottomley, Simon Kelley, SCSI Mailing List Rumor has it that on Sat, Jan 31, 2004 at 06:42:01PM +0100 Jens Axboe said: > On Sat, Jan 31 2004, Philip R. Auld wrote: > > Rumor has it that on Sat, Jan 31, 2004 at 10:30:37AM +0100 Jens Axboe said: > > > On Fri, Jan 30 2004, Joe Thornber wrote: > > > > It would be great to get some benchmarks to back up these arguments. > > > > eg, performance of dm mpath with a simple round robin selector, > > > > compared to a scsi layer implementation. Lifting the elevator (or > > > > lowering dm) is a big piece of work that I wont even consider unless > > > > there is very good reason; the reason probably needs to be broader > > > > than just multipath too. Even if we did decide to do this, it won't > > > > happen in 2.6. > > > > > > I suspect the problem really isn't that huge in 2.6, since most > > > performance file systems are using mpage or building their own big > > > bio's. So in a sense, some of the merging already does happen above dm > > > (and the io scheduler). > > > > Out of curiosity, where does raw io fit into that in 2.6? > > raw io (or O_DIRECT io, same path) should work even better, always send > out bio's as big as the underlying device can support. > That size is based on the blocksize? Is there a way to set the block size higher than 512 w/o mounting it? I've gotten really bad rawio performance on 2.4, since I have a limit of 32 sg entries. When rawio uses a 512-byte blocksize, IOs are limited to 16K. Will this still be a problem in 2.6? 
Thanks again, Phil > -- > Jens Axboe > > - > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Philip R. Auld, Ph.D. Egenera, Inc. Principal Software Engineer 165 Forest St. (508) 858-2628 Marlboro, MA 01752 ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [dm-devel] Re: Is there a grand plan for FC failover? 2004-02-12 15:17 ` Philip R. Auld @ 2004-02-12 15:28 ` Arjan van de Ven 2004-02-12 16:03 ` Philip R. Auld 0 siblings, 1 reply; 19+ messages in thread From: Arjan van de Ven @ 2004-02-12 15:28 UTC (permalink / raw) To: Philip R. Auld; +Cc: SCSI Mailing List > That size is based on the blocksize? Is there a way to set the block size higher than > 512 w/o mounting it? I've gotten really bad rawio performance on 2.4. since I have a > limit of 32 sg entries. When Rawio uses a 512 byte blocksize IOs are limited to 16K. > > Will this still be a problem in 2.6? If you code the driver right, it's not even a problem in 2.4 if you use an Enterprise Linux distro kernel..... ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [dm-devel] Re: Is there a grand plan for FC failover? 2004-02-12 15:28 ` Arjan van de Ven @ 2004-02-12 16:03 ` Philip R. Auld 0 siblings, 0 replies; 19+ messages in thread From: Philip R. Auld @ 2004-02-12 16:03 UTC (permalink / raw) To: Arjan van de Ven; +Cc: SCSI Mailing List Rumor has it that on Thu, Feb 12, 2004 at 04:28:41PM +0100 Arjan van de Ven said: > > > That size is based on the blocksize? Is there a way to set the block size higher than > > 512 w/o mounting it? I've gotten really bad rawio performance on 2.4. since I have a > > limit of 32 sg entries. When Rawio uses a 512 byte blocksize IOs are limited to 16K. > > > > Will this still be a problem in 2.6? > > If you code the driver right, it's not even a problem in 2.4 if you use > a Enterprise Linux distro kernel..... > Hi Arjan, I think you know that I do use your Enterprise kernel :) Can you please explain what "right" means in this case? I've a hardware limit of 32 sg entries. When the blocksize is the default hardware sector size of 512, I get commands of 32 separate 512-byte scatter-gather entries. This is determined at a higher level than my LLDD. What is there that I can do about it in the driver? Cheers, Phil -- Philip R. Auld, Ph.D. Egenera, Inc. Principal Software Engineer 165 Forest St. (508) 858-2628 Marlboro, MA 01752 ^ permalink raw reply [flat|nested] 19+ messages in thread
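The 16K ceiling Phil describes follows from simple arithmetic: with a hard limit of 32 scatter-gather entries and one 512-byte buffer per entry, no single command can exceed 32 × 512 bytes. Raising the per-segment size is what lifts the limit. A quick illustration (the function is just for exposition):

```python
def max_io_bytes(sg_entries, segment_size):
    """Largest single command when each sg entry maps one buffer of segment_size bytes."""
    return sg_entries * segment_size

print(max_io_bytes(32, 512))    # 16384 — the 16K limit with 512-byte buffers
print(max_io_bytes(32, 4096))   # 131072 — 128K once each entry maps a full 4K page
```

This is why the blocksize question matters: larger (or physically contiguous) segments multiply the per-command I/O size without changing the hardware's sg-entry limit.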
* Re: Is there a grand plan for FC failover? 2004-01-28 18:00 ` Philip R. Auld 2004-01-28 20:47 ` Patrick Mansfield @ 2004-01-28 22:37 ` Mike Christie 2004-01-29 15:24 ` Philip R. Auld 1 sibling, 1 reply; 19+ messages in thread From: Mike Christie @ 2004-01-28 22:37 UTC (permalink / raw) To: Philip R. Auld; +Cc: James Bottomley, Simon Kelley, SCSI Mailing List Philip R. Auld wrote: > Hi James, > Thanks for the reply. > > Rumor has it that on Wed, Jan 28, 2004 at 10:57:29AM -0600 James Bottomley said: > >>On Wed, 2004-01-28 at 09:02, Philip R. Auld wrote: >> >>> 1) load balancing when possible: it's not enough to be just a >>> failover mechanism. >> >>For first out, a failover target supplies most of the needs. Nothing >>prevents an aggregation target being added later, but failover is >>essential. > > > Yes, failover is necessary, but not sufficient to make a decent multipath driver. > I'm a little concerned by the "added later" part. Shouldn't it be designed in? The DM multipath can do both. Unfortunately, some work still needs to be done. For example, when doing load balancing how much IO to send down each path is an issue that the DM maintainer had asked for feedback on. Suggestions? > >>> 2) requiring a userspace program to execute for failover is >>> problematic when it could be the root disk that needs >>> failing over. >> >>Userspace is required for *configuration* not failover. The path >>targets failover automatically as they see I/O down particular paths >>timing out or failing. That being said, nothing prevents async >>notifications via hotplug also triggering a failover (rather than having >>to wait for timeout etc) but it's not required. >> > > > There was some discussion about vendor plugins to do proprietary failover > when needed. Say to make a passive path active. My concern is that this be > memory resident and that we not have to go to the disk for such things. 
> > If there is no need for that or it's done in such a way that it doesn't need > the disk this won't be a problem then. > This is something both MD and DM multipath are not yet fully prepared for. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Is there a grand plan for FC failover? 2004-01-28 22:37 ` Mike Christie @ 2004-01-29 15:24 ` Philip R. Auld 2004-01-29 16:00 ` James Bottomley 0 siblings, 1 reply; 19+ messages in thread From: Philip R. Auld @ 2004-01-29 15:24 UTC (permalink / raw) To: Mike Christie; +Cc: James Bottomley, Simon Kelley, SCSI Mailing List Rumor has it that on Wed, Jan 28, 2004 at 02:37:28PM -0800 Mike Christie said: > Philip R. Auld wrote: > > Hi James, > > Thanks for the reply. > > > > Rumor has it that on Wed, Jan 28, 2004 at 10:57:29AM -0600 James Bottomley said: > > > >>On Wed, 2004-01-28 at 09:02, Philip R. Auld wrote: > >> > >>> 1) load balancing when possible: it's not enough to be just a > >>> failover mechanism. > >> > >>For first out, a failover target supplies most of the needs. Nothing > >>prevents an aggregation target being added later, but failover is > >>essential. > > > > > > Yes, failover is necessary, but not sufficient to make a decent multipath driver. > > I'm a little concerned by the "added later" part. Shouldn't it be designed in? > > The DM multipath can do both. Unfortunately, some work still needs to be > done. For example, when doing load balancing how much IO to send down > each path is an issue that the DM maintainer had asked for feedback on. > Suggestions? > That leads back to where to put the request merging and elevator code... The way I currently do load-balancing is on a scsi_cmnd basis. At that point the IO is coalesced already. A shortest queue_depth falling back to round-robin algorithm balances really well on individual commands. From a logical block level, I'm not sure how. I'm a little unfamiliar (as my posts have shown) with how/where the DM code fits in. (I've also not gotten a good look into 2.6 BIO... busy on 2.4 still.) I think the place to do load balancing would be below the block queue so that the IOs are coalesced. 
IMO, until you've merged the individual block requests you can't make a good decision about how to balance the load. -- Philip R. Auld, Ph.D. Egenera, Inc. Principal Software Engineer 165 Forest St. (508) 858-2628 Marlboro, MA 01752 ^ permalink raw reply [flat|nested] 19+ messages in thread
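[Editor's note: the shortest-queue-depth selection with round-robin fallback that Philip describes above can be sketched roughly as follows. Every name and structure here is a hypothetical stand-in for illustration, not taken from any real driver or kernel API.]

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical path descriptor: the driver would track how many
 * commands are currently outstanding on each path. */
struct path {
    int queue_depth;   /* commands outstanding on this path */
    int usable;        /* nonzero if the path has not failed */
};

/* Pick the usable path with the shortest queue depth; when several
 * paths tie, rotate among them round-robin.  Returns the chosen
 * path index, or -1 if no path is usable.  *rr_cursor carries the
 * round-robin position across calls. */
int select_path(struct path *paths, int npaths, int *rr_cursor)
{
    int best = -1, i;

    /* Find the minimum queue depth among usable paths. */
    for (i = 0; i < npaths; i++) {
        if (!paths[i].usable)
            continue;
        if (best < 0 || paths[i].queue_depth < paths[best].queue_depth)
            best = i;
    }
    if (best < 0)
        return -1;

    /* Round-robin among paths that tie at the minimum depth:
     * scan forward from the last choice and take the first tie. */
    for (i = 1; i <= npaths; i++) {
        int cand = (*rr_cursor + i) % npaths;
        if (paths[cand].usable &&
            paths[cand].queue_depth == paths[best].queue_depth) {
            best = cand;
            break;
        }
    }
    *rr_cursor = best;
    return best;
}
```

When all paths are idle the selector degenerates to plain round-robin, and under uneven load it steers commands to the least-busy path, which matches the behavior Philip reports balancing "really well on individual commands."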
* Re: Is there a grand plan for FC failover? 2004-01-29 15:24 ` Philip R. Auld @ 2004-01-29 16:00 ` James Bottomley 2004-01-29 23:25 ` Mike Christie 0 siblings, 1 reply; 19+ messages in thread From: James Bottomley @ 2004-01-29 16:00 UTC (permalink / raw) To: Philip R. Auld; +Cc: Mike Christie, Simon Kelley, SCSI Mailing List On Thu, 2004-01-29 at 10:24, Philip R. Auld wrote: > I think the place to do load balancing would be below the block queue so that > the IOs are coalesced. IMO, until you've merged the individual block requests > you can't make a good decision about how to balance the load. Yes, that's what I think too, so the elevator should go above dm, with a vestigial elevator between dm and sd simply for SCSI to use in queuing. Unfortunately, the way it works today is that the elevator is between dm and sd. I know people are thinking about how to change this, but I'm not sure how far anyone's got. James ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Is there a grand plan for FC failover? 2004-01-29 16:00 ` James Bottomley @ 2004-01-29 23:25 ` Mike Christie 0 siblings, 0 replies; 19+ messages in thread From: Mike Christie @ 2004-01-29 23:25 UTC (permalink / raw) To: James Bottomley Cc: Philip R. Auld, Simon Kelley, SCSI Mailing List, Jens Axboe James Bottomley wrote: > On Thu, 2004-01-29 at 10:24, Philip R. Auld wrote: > >>I think the place to do load balancing would be below the block queue so that >>the IOs are coalesced. IMO, until you've merged the individual block requests >>you can't make a good decision about how to balance the load. > > > Yes, that's what I think too, so the elevator should go above dm, with a > vestigial elevator between dm and sd simply for SCSI to use in queuing. I do not think there is anyone that will disagree that there should be one elevator that does the sorting, merging, aging etc. and that it should be at a different layer than it is today for DM multipath. The modifications to DM to allow a multipath device to have a queue and to use it are minimal. The problem is its request_fn_proc. In the DM request_fn_proc we could just dequeue a request from the DM device's queue, clone it so the IO sched state does not get overwritten, map it to the correct lower level device queue, then insert it in the lower level device queue. Bad design, right? We could do something similar where the request_fn_proc receives a request instead of a queue like Jens is planning for 2.7 (http://www.kernel.org/pub/linux/kernel/people/axboe/experimental/), and then build a framework similar to bio remapping where we just clone and remap requests to the correct driver. But still a bad design, right? It doesn't seem like there are a lot of options for 2.6 with the current framework. We can add hints so we can try to make a decent decision, but it isn't going to be perfect. But as James said the elevator problem is not a requirement for dm to actually work. 
I guess people will continue rolling their own solution if they feel the performance is that bad. I cc'd Jens as he is best qualified to describe why there are problems with queueing requests between different queues. I apologize for bugging him about this again. Mike ^ permalink raw reply [flat|nested] 19+ messages in thread
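[Editor's note: the dequeue/clone/remap/reinsert flow Mike describes can be modeled in miniature as below. Every type and function here is a toy stand-in invented for illustration; real struct request handling in the kernel is far more involved, and the real scheme would complete the original request when its clone finishes rather than discarding it.]

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy request and queue, standing in for struct request and
 * request_queue.  LIFO insertion keeps the model minimal. */
struct toy_request {
    long sector;               /* where the I/O is aimed */
    int  target;               /* index of lower device after mapping */
    struct toy_request *next;
};

struct toy_queue {
    struct toy_request *head;
};

static void toy_insert(struct toy_queue *q, struct toy_request *rq)
{
    rq->next = q->head;
    q->head = rq;
}

static struct toy_request *toy_dequeue(struct toy_queue *q)
{
    struct toy_request *rq = q->head;
    if (rq)
        q->head = rq->next;
    return rq;
}

/* The scheme Mike calls "bad design": the DM request_fn_proc drains
 * the DM device's queue, clones each request so the I/O scheduler
 * state on the original is not overwritten, maps it to a lower-level
 * device, and inserts the clone on that device's queue. */
void dm_request_fn(struct toy_queue *dm_q, struct toy_queue *lower_qs,
                   int (*map)(long sector))
{
    struct toy_request *rq, *clone;

    while ((rq = toy_dequeue(dm_q)) != NULL) {
        clone = malloc(sizeof(*clone));
        memcpy(clone, rq, sizeof(*clone));   /* preserve scheduler state */
        clone->target = map(clone->sector);  /* pick the lower device */
        toy_insert(&lower_qs[clone->target], clone);
        free(rq);   /* toy only: real code completes the original later */
    }
}
```

Even in miniature the structural objection is visible: the request passes through two queues, so any sorting or merging the elevator did on the DM queue is done before the path decision, while a second (vestigial) elevator still sits on each lower queue.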
end of thread, other threads:[~2004-02-12 16:05 UTC | newest] Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2004-01-26 14:18 Is there a grand plan for FC failover? Simon Kelley 2004-01-26 15:37 ` James Bottomley 2004-01-28 15:02 ` Philip R. Auld 2004-01-28 16:57 ` James Bottomley 2004-01-28 18:00 ` Philip R. Auld 2004-01-28 20:47 ` Patrick Mansfield 2004-01-28 22:14 ` James Bottomley 2004-01-29 0:55 ` Patrick Mansfield 2004-01-30 19:48 ` [dm-devel] " Joe Thornber 2004-01-31 9:30 ` Jens Axboe 2004-01-31 16:59 ` Philip R. Auld 2004-01-31 17:42 ` Jens Axboe 2004-02-12 15:17 ` Philip R. Auld 2004-02-12 15:28 ` Arjan van de Ven 2004-02-12 16:03 ` Philip R. Auld 2004-01-28 22:37 ` Mike Christie 2004-01-29 15:24 ` Philip R. Auld 2004-01-29 16:00 ` James Bottomley 2004-01-29 23:25 ` Mike Christie