From: Hannes Reinecke
Subject: Re: Re: [k-ueda@ct.jp.nec.com: Re: request-based dm-multipath]
Date: Thu, 16 Apr 2009 09:29:22 +0200
Message-ID: <49E6DE52.9090007@suse.de>
References: <20090410151914.GA3800@redhat.com> <20090410153102.GC3800@redhat.com> <20090415201829.GB9064@redhat.com>
Reply-To: device-mapper development
To: device-mapper development

Mikulas Patocka wrote:
> On Wed, 15 Apr 2009, Mike Snitzer wrote:
>
>> On Wed, Apr 15 2009 at 3:09pm -0400,
>> Mikulas Patocka wrote:
>>
>>> On Fri, 10 Apr 2009, Mike Snitzer wrote:
>>>
>>>> Hi Mikulas,
>>>>
>>>> Figured I'd give you this heads up on the request-based multipath
>>>> patches too, considering your recent "bottom-layer barrier support"
>>>> patchset (where you said multipath support is coming later).
>>>>
>>>> We likely want to coordinate with the NEC guys so as to make sure things
>>>> are in order for the request-based patches to get merged along with your
>>>> remaining barrier work for 2.6.31.
>>>>
>>>> Mike
>>>>
>>>> p.s. below you can see I mistakenly said to Kiyoshi that the recent
>>>> barrier patches that got merged upstream were "the last of the DM
>>>> barrier support"...
>>> Hi
>>>
>>> I would say one thing about the request-based patches --- don't do this.
>>>
>>> Your patch adds an alternate I/O path to request processing in device
>>> mapper.
>>>
>>> So, with your patch, there will be two I/O request paths. It means that
>>> any work on generic device-mapper code that will have to be done in the
>>> future (such as, for example, the barriers that I did) will be twice as
>>> hard.
>>> It will take twice the time to understand request processing, twice the
>>> brain capacity to remember it, twice the time for coding, twice the time
>>> for code review, twice the time for testing.
>>>
>>> If the patch goes in, it will make a lot of things twice as hard. And once
>>> the patch is in production kernels, there'd be very little possibility to
>>> pull it out.
>>>
>>> What is the exact reason for your patch? I suppose that it's some
>>> performance degradation caused by the fact that dm-multipath doesn't
>>> distribute requests optimally across both paths. dm-multipath has
>>> pluggable path selectors, so you could improve dm-round-robin.c (or write
>>> an alternate path selector module) and you don't have to touch generic dm
>>> code to solve this problem.
>>>
>>> The point is that improving the dm-multipath target with a better path
>>> selector is much less intrusive than patching the device mapper core. If
>>> you improve the dm-multipath target, only people hacking on dm-multipath
>>> will have to learn about your code. If you modify the generic dm.c file,
>>> anyone doing anything on device mapper must learn about your code --- so
>>> much more human time is consumed in this case.
>>>
>>> So, try the alternate solution (write a new path selector for
>>> dm-multipath), then compare the two and see the result --- and then it
>>> can be considered whether the high human time consumed by patching dm.c
>>> is worth the performance improvement.
>> Mikulas,
>>
>> Section 3.1 of the following 2007 Linux Symposium paper answers the
>> "why?" on request-based dm-multipath:
>> http://ols.108.redhat.com/2007/Reprints/ueda-Reprint.pdf
>>
>> In summary:
>> With request-based multipath, performance and path error handling are
>> improved.
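Mikulas's point that dm-multipath path selection is pluggable (improve dm-round-robin.c or write another selector module), together with his later suggestion to count the number and size of outstanding I/Os per path, can be sketched in userspace C. This is a minimal illustration under stated assumptions, not kernel code: `struct path`, `struct selector_ops`, `rr_select`, `ll_select`, `dispatch` and `complete_io` are hypothetical names, only loosely inspired by dm-multipath's selector modules, and are not the kernel's path selector API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-path accounting; not a kernel structure. */
struct path {
    const char *name;
    unsigned inflight_ios;         /* I/Os dispatched but not completed */
    unsigned long inflight_bytes;  /* bytes dispatched but not completed */
};

/* Hypothetical "ops" table: the dispatch code only calls select(), so
 * the policy can be swapped without touching the dispatch path. */
struct selector_ops {
    const char *name;
    struct path *(*select)(void *ctx, struct path *paths, size_t n);
};

/* Round-robin that stays on one path for repeat_count I/Os before
 * moving on, in the spirit of the rr_min_io setting. */
struct rr_ctx {
    size_t current;
    unsigned repeat_count;
    unsigned used;
};

static struct path *rr_select(void *ctx, struct path *paths, size_t n)
{
    struct rr_ctx *rr = ctx;

    assert(n > 0 && rr->repeat_count > 0);
    if (rr->used == rr->repeat_count) {
        rr->current = (rr->current + 1) % n;
        rr->used = 0;
    }
    rr->used++;
    return &paths[rr->current];
}

/* Least-loaded: fewest outstanding bytes, ties broken on I/O count. */
static struct path *ll_select(void *ctx, struct path *paths, size_t n)
{
    struct path *best = &paths[0];

    (void)ctx;  /* this policy keeps no private state */
    assert(n > 0);
    for (size_t i = 1; i < n; i++) {
        if (paths[i].inflight_bytes < best->inflight_bytes ||
            (paths[i].inflight_bytes == best->inflight_bytes &&
             paths[i].inflight_ios < best->inflight_ios))
            best = &paths[i];
    }
    return best;
}

static const struct selector_ops round_robin = { "round-robin", rr_select };
static const struct selector_ops least_loaded = { "least-loaded", ll_select };

/* Send one I/O of `bytes` down the path chosen by the plugged-in policy. */
static struct path *dispatch(const struct selector_ops *ops, void *ctx,
                             struct path *paths, size_t n, unsigned long bytes)
{
    struct path *p = ops->select(ctx, paths, n);

    p->inflight_ios++;
    p->inflight_bytes += bytes;
    return p;
}

/* Completion hook: without it the "outstanding" counters never drop. */
static void complete_io(struct path *p, unsigned long bytes)
{
    p->inflight_ios--;
    p->inflight_bytes -= bytes;
}
```

With two paths and `repeat_count` set to 2, four consecutive `dispatch()` calls through `round_robin` go to the first path twice and then to the second path twice; `least_loaded` instead picks whichever path has the fewest bytes in flight. Both policies see individual I/Os at dispatch time only; whether such a selector can account for merging done later by the elevator is exactly what the rest of this thread argues about.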
>>
>> Performance:
>> The I/O scheduler is leveraged to merge bios into requests, and these
>> requests can then be balanced more evenly across the available paths
>> (no need to starve other paths, as bio-based multipath is prone to do).
>
> So you can improve the bio-based selector. You can count the number and
> size of outstanding requests on each path and select the less loaded path.
>
Which is exactly what you _cannot_ do. You have _no_ idea at all how the
bios are merged into requests. And as you make scheduling decisions
based on the _bios_, you will interfere with the elevator.
Hence you always have to select large scheduling intervals so as
to keep the disturbance of the elevator as small as possible.
Just decrease the 'rr_min_io' setting and watch the performance drop.

> You can remember several end positions of the last requests, and when a
> new request matches one of them, send it to the appropriate path, assuming
> that the lower device's scheduler will merge it. Or --- another solution
> is to access the queues of the underlying devices and ask them if there's
> anything to merge --- and then send the request down the path that has
> some adjacent request.
>
But this would duplicate the elevator merging logic, wouldn't it?
You would have to out-guess the elevator as to which requests it
would merge next ...

> I know that the round-robin selector is silly, but you just haven't even
> tried to improve it.
>
> If there is a non-intrusive solution (improve the path selector), it
> should be tried first, before resorting to an intrusive solution (an
> alternate request path in dm core).
>
>> Error handling:
>> Finer-grained error statistics are available when interfacing more
>> directly with the hardware, as request-based multipath does.
>
> You can signal it via flags in bios. No need to rewrite dm core.
>
But this makes failover costly, as you have to fail over each
individual bio.
Hence you cannot (by design) have real load balancing, as the cost of
failing over a single path is prohibitively high.
With request-based multipathing, OTOH, the cost of failover becomes
really small and you can do real load balancing, e.g. set rr_min_io
to '1' without suffering any performance drawback.

>> NEC may already have comparative performance data that will help
>> illustrate the improvement associated with request-based multipath?
>> They apparently have dynamic load balancing patches that they developed
>> for use with the current bio-based multipath.
>
> So where is it better and why? Does it save CPU time or disk throughput?
> How? On which workload?
>
> Did they really try to implement some smart path balancing that takes
> merging into account?
>
No, they didn't, because of the above-mentioned points.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                   zSeries & Storage
hare@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
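To make the rr_min_io argument in this thread concrete, here is a toy model in C (not kernel code). It assumes a strictly sequential stream of equally sized bios, two paths, an "elevator" that does nothing but back-merge contiguous I/O up to a hypothetical cap of `max_bios` bios per request, and no timing or plugging effects. `queue_bio`, `bio_based` and `request_based` are illustrative names, not dm or block-layer functions. The model counts how many requests reach the devices when the path is chosen before merging (bio-based) versus after it (request-based).

```c
#include <assert.h>

#define NPATHS 2  /* toy setup: two paths to the same LUN */

/* Toy per-queue "elevator" state: only back-merging is modelled. */
struct toy_queue {
    unsigned long last_end;  /* sector just past the request being built */
    unsigned bios_in_req;    /* bios merged into that request so far */
    int has_req;
};

/* Back-merge bio [start, start + len) into the open request if it is
 * contiguous and not full; otherwise open a new request.
 * Returns 1 when a new request was created. */
static int queue_bio(struct toy_queue *q, unsigned long start,
                     unsigned long len, unsigned max_bios)
{
    int merged = q->has_req && q->last_end == start &&
                 q->bios_in_req < max_bios;

    if (merged) {
        q->bios_in_req++;
    } else {
        q->has_req = 1;
        q->bios_in_req = 1;
    }
    q->last_end = start + len;
    return !merged;
}

/* Bio-based: each bio picks a path first (round-robin, switching every
 * rr_min_io bios), then each path's queue merges whatever it happens to get. */
static unsigned bio_based(unsigned nbios, unsigned long bio_len,
                          unsigned rr_min_io, unsigned max_bios)
{
    struct toy_queue q[NPATHS] = { { 0, 0, 0 } };
    unsigned requests = 0;

    assert(rr_min_io > 0);
    for (unsigned i = 0; i < nbios; i++) {
        unsigned path = (i / rr_min_io) % NPATHS;
        requests += queue_bio(&q[path], i * bio_len, bio_len, max_bios);
    }
    return requests;
}

/* Request-based: one queue above the paths merges first; the finished
 * requests are then handed to the path selector, whose choice (whatever
 * rr_min_io is) cannot undo merges that were already made. */
static unsigned request_based(unsigned nbios, unsigned long bio_len,
                              unsigned max_bios)
{
    struct toy_queue q = { 0, 0, 0 };
    unsigned requests = 0;

    for (unsigned i = 0; i < nbios; i++)
        requests += queue_bio(&q, i * bio_len, bio_len, max_bios);
    return requests;
}
```

For 64 sequential bios of 8 sectors with `max_bios` = 8, `bio_based()` yields 64 requests with rr_min_io = 1, 16 with rr_min_io = 4 and 8 with rr_min_io = 64, while `request_based()` yields 8 no matter how the finished requests are then spread across the paths. Request count is only a proxy, so the model says nothing about actual throughput on real hardware.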