From mboxrd@z Thu Jan  1 00:00:00 1970
From: Avi Kivity
Subject: Re: raid0 vs. mkfs
Date: Wed, 30 Nov 2016 00:45:10 +0200
Message-ID: <33bb250a-4dfd-0acc-9958-30fdac10918c@scylladb.com>
References: <56c83c4e-d451-07e5-88e2-40b085d8681c@scylladb.com>
 <87oa108a1x.fsf@notabene.neil.brown.name>
 <286a5fc1-eda3-0421-a88e-b03c09403259@scylladb.com>
 <87inr880au.fsf@notabene.neil.brown.name>
 <87d1he7zv9.fsf@notabene.neil.brown.name>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <87d1he7zv9.fsf@notabene.neil.brown.name>
Sender: linux-raid-owner@vger.kernel.org
To: NeilBrown , linux-raid@vger.kernel.org
List-Id: linux-raid.ids

On 11/29/2016 11:14 PM, NeilBrown wrote:
> On Mon, Nov 28 2016, Avi Kivity wrote:
>>> If it is easy for the upper layer to break a very large request into a
>>> few very large requests, then I wouldn't necessarily object.
>> I can't see why it would be hard. It's simple arithmetic.
> That is easy to say before writing the code :-)
> It probably is easy for RAID0. Less so for RAID10. Even less for
> RAID6.

Pick the largest subrange within the input range whose bounds are 0 (mod
stripe-size); TRIM it (for all members); apply the regular algorithm to
the head and tail subranges. Works for all RAID types.

If the array is undergoing reshape, exclude the range being reshaped and
treat the two halves separately.

>>> But unless it is very hard for the lower layer to merge requests, it
>>> should be doing that too.
>> Merging has tradeoffs. When you merge requests R1, R2, ... Rn you make
>> the latency of request R1 the sum of the latencies of R1..Rn. You may
>> gain some efficiency in the process, but that's not going to make up
>> for a factor of n. The queuing layer has no way to tell whether the
>> caller is interested in the latency of individual requests.
>> By sending large requests, the caller indicates it's not interested in
>> the latency of individual subranges. The queuing layer is still free
>> to internally split the request into smaller ranges, to satisfy
>> hardware constraints, or to reduce worst-case latencies for competing
>> request streams.
> I would have thought that using plug/unplug to group requests is a
> fairly strong statement that they can be handled as a unit if that is
> convenient.

It is not. As an example, consider a read and a few associated
read-ahead requests submitted in a batch. The last thing you want is for
them to be treated as a unit.

Plug/unplug means: I have a bunch of requests here. Whether they should
be merged or reordered is orthogonal to whether they are members of a
batch or not.

>> So I disagree that all the work should be pushed to the merging layer.
>> It has less information to work with, so the fewer decisions it has to
>> make, the better.
> I think that the merging layer should be as efficient as it reasonably
> can be, and particularly should take into account plugging. This
> benefits all callers.

Yes, but plugging does not mean "please merge anything you can until the
unplug".

> If it can be demonstrated that changes to some of the upper layers bring
> further improvements with acceptable costs, then certainly it is good to
> have those too.

Generating millions of requests only to merge them again is inefficient.
It happens in an edge case (a TRIM of the entirety of a very large
RAID), but it already caused one user to believe the system had failed.
I think the system should be more robust than that.

> NeilBrown
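P.S. For concreteness, here is a rough sketch in C of the head/middle/tail
split described above. All names are made up for illustration; this is not
the md implementation, just the arithmetic: round the start up and the end
down to a stripe boundary, TRIM the aligned middle in one pass, and hand
the unaligned head and tail to the regular per-chunk path.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical types/names, not from the kernel. */
struct range { uint64_t start, len; };

/* Split [start, start+len) around stripe boundaries.
 * mid is the largest interior subrange whose bounds are 0 (mod stripe);
 * head and tail are the unaligned leftovers (len == 0 if empty). */
static void split_discard(uint64_t start, uint64_t len, uint64_t stripe,
                          struct range *head, struct range *mid,
                          struct range *tail)
{
    uint64_t end = start + len;
    uint64_t mid_start = (start + stripe - 1) / stripe * stripe; /* round up */
    uint64_t mid_end = end / stripe * stripe;                    /* round down */

    if (mid_start >= mid_end) {
        /* Range spans no full stripe: everything goes the regular way. */
        *head = (struct range){ start, len };
        *mid  = (struct range){ 0, 0 };
        *tail = (struct range){ 0, 0 };
        return;
    }
    *head = (struct range){ start, mid_start - start };
    *mid  = (struct range){ mid_start, mid_end - mid_start };
    *tail = (struct range){ mid_end, end - mid_end };
}
```

The aligned middle can then be TRIMmed on every member with one large
request each, which is what avoids generating millions of per-chunk
requests for a whole-device discard.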