public inbox for linux-kernel@vger.kernel.org
* Benefits from computing physical IDE disk geometry?
@ 2003-04-12 22:46 Timothy Miller
  2003-04-12 23:10 ` AW: " Oliver S.
                   ` (4 more replies)
  0 siblings, 5 replies; 34+ messages in thread
From: Timothy Miller @ 2003-04-12 22:46 UTC (permalink / raw)
  To: linux-kernel

I'm excited about the new I/O scheduler (proposed?) in the 2.5.x kernel, but
I have to admit to a considerable amount of ignorance of its actual
behavior.  Thus, if it already does what I'm talking about, please feel free
to ignore this post.  :)


Any good SCSI drive knows the physical geometry of the disk and can
therefore optimally schedule reads and writes.  Although necessary features
like read queueing are also available in the current SATA spec, I'm not
sure most drives will implement them, at least not very well.

So, what if one were to write a program which would perform a bunch of
seek-time tests to estimate an IDE disk's physical geometry?  It could then
make that information available to the kernel so it can reorder accesses
more effectively.  Additionally, discrepancies from expected seek times
could be logged in the kernel and used to further improve efficiency over time.
If it were good enough, many of the advantages of using SCSI disks would
become less significant.

Ideas?




^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: Benefits from computing physical IDE disk geometry?
@ 2003-04-13 18:03 Chuck Ebbert
  2003-04-13 18:24 ` Dr. David Alan Gilbert
  2003-04-13 22:15 ` Alan Cox
  0 siblings, 2 replies; 34+ messages in thread
From: Chuck Ebbert @ 2003-04-13 18:03 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel

Nick Piggin wrote:


>>
>> Any good SCSI drive knows the physical geometry of the disk and can
>> therefore optimally schedule reads and writes.  Although necessary features,
>> like read queueing, are also available in the current SATA spec, I'm not
>> sure most drives will implement it, at least not very well.
>>
> The "continuous" nature of drive addressing means that the kernel
> can do a fine job seek-wise. Due to write caches and read track
> buffers, rotational scheduling (which could be done if we knew
> geometry) would provide too little gain for the complexity. I would
> say that for most workloads you wouldn't see any difference. (IMO)


  OTOH you can come up with scenarios like, say, a DBMS doing 16K page
aligned IO to raw devices where you might see big gains from making sure
those 16K chunks didn't cross a physical cylinder boundary.


--
 Chuck

* Re: Benefits from computing physical IDE disk geometry?
@ 2003-04-13 22:13 Chuck Ebbert
  2003-04-13 23:38 ` Andreas Dilger
  0 siblings, 1 reply; 34+ messages in thread
From: Chuck Ebbert @ 2003-04-13 22:13 UTC (permalink / raw)
  To: Lars Marowsky-Bree; +Cc: linux-kernel

Lars Marowsky-Bree wrote:


> Object Based Storage (see Lustre).


  Thanks, I was trying to remember where I'd seen that.

  Is anyone actually making such things for sale?

--
 Chuck

* Re: Benefits from computing physical IDE disk geometry?
@ 2003-04-14  2:29 Chuck Ebbert
  0 siblings, 0 replies; 34+ messages in thread
From: Chuck Ebbert @ 2003-04-14  2:29 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel


> You couldn't even tell where such boundaries exist, or what the real
> block size of the underlying media is. Cylinders are all different sizes.


 Not even if you do timing tests?  I know people have done tests that
pinpoint where the xfer rate changes, for example.  I'm sure it
wouldn't be easy, but I bet you could get some useful information.
And at the very least, remapped sectors should be easy to spot...


--

* Re: Benefits from computing physical IDE disk geometry?
@ 2003-04-14  3:44 Chuck Ebbert
  0 siblings, 0 replies; 34+ messages in thread
From: Chuck Ebbert @ 2003-04-14  3:44 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

Alan Cox wrote:


> You couldn't even tell where such boundaries exist, or what the real
> block size of the underlying media is. Cylinders are all different sizes.


  Found this by accident while skimming a new book; haven't visited yet
but it may be of interest:

    A technical report describing Skippy and two other disk drive
microbenchmarks (run in seconds or minutes rather than hours or days)
is at

      http://sunsite.berkeley.edu/Dienst/UI/2.0/Describe/ncstrl.ucb/CSD-99-1063

(Hennessy & Patterson 3rd ed., Ch.7, exercise 5.)


--
 "Let's fight till six, and then have dinner," said Tweedledum.
  --Lewis Carroll, _Through the Looking Glass_

* Re: Benefits from computing physical IDE disk geometry?
@ 2003-04-14 21:27 Chuck Ebbert
  2003-04-15  0:03 ` Nick Piggin
  0 siblings, 1 reply; 34+ messages in thread
From: Chuck Ebbert @ 2003-04-14 21:27 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel

Nick Piggin wrote:


>>
>>but seek time is some combination of headswitch time, rotational 
>>latency, and actual head motion.  the first two are basically
>>not accessible in any practical way.  the latter is easy, and the 
>>
> Yep which is one reason why I don't think being fancy will pay
> off (the other reason being that even if you did know the parameters
> I don't think they would help a lot).


 The graphs in the "skippy" disk benchmark paper are
interesting -- some disks show bizarre peaks and dips in their
times across increasing sector distances. (The lesson here might
be to test your disks and avoid the bad actors rather than try
to compensate for their behavior, though.)


> There is (in AS) no cost function further than comparing distance
> from the head. Closest forward seek wins.


 The RAID1 code has its own scheduler that does similar things.  Why
aren't they being integrated? (See raid1.c:read_balance())

--
 Chuck

* Re: Benefits from computing physical IDE disk geometry?
@ 2003-04-15  1:19 Chuck Ebbert
  2003-04-15  8:28 ` Nick Piggin
  0 siblings, 1 reply; 34+ messages in thread
From: Chuck Ebbert @ 2003-04-15  1:19 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel


> If RAID1 can use the generic elevator then it should. I
> guess it can't though.


  No, but it is feeding IO requests into the elevators of the 
block devices below it.  For a given read, all it wants to do
is pick one device to handle the work.  If it could look into
the queues maybe it could make better decisions.

--
 Chuck

* Re: Benefits from computing physical IDE disk geometry?
@ 2003-04-15 18:33 Chuck Ebbert
  2003-04-16  1:16 ` Nick Piggin
  0 siblings, 1 reply; 34+ messages in thread
From: Chuck Ebbert @ 2003-04-15 18:33 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel, linux-raid

Nick Piggin wrote:


> OK right. As far as I can see, the algorithm in the RAID1 code
> is used to select the best drive to read from? If that is the
> case then I don't think it could make better decisions given
> more knowledge.


  How about if it just asks the elevator whether or not a given read
is a good fit with its current workload?  I saw that in 2.5 the balance
code looks at the number of pending requests, and if it's zero it
sends the read to that device.  Somehow I think something better than
that could be done, anyway.


> It seems to me that a better way to layer it would be to have
> the complex (ie deadline/AS/CFQ/etc) scheduler handling all
> requests into the raid block device, then having a raid
> scheduler distributing to the disks, and having the disks
> run no scheduler (fifo).


 That only works if RAID1 is working at the physical disk level (which
it should be, AFAIC, but people want the flexibility to mirror partitions).


> In practice the current scheme probably works OK, though I
> wouldn't know due to lack of resources here :P


 I've been playing with the 2.4 read balance code and have some
improvements, but real gains need a new approach.

(cc'd to linux-raid)

--
 Chuck

* Re: Benefits from computing physical IDE disk geometry?
@ 2003-04-16 13:28 Chuck Ebbert
  2003-04-16 23:06 ` Nick Piggin
  0 siblings, 1 reply; 34+ messages in thread
From: Chuck Ebbert @ 2003-04-16 13:28 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel, linux-raid


> The way you would do a good "goodness" function, I guess,
> would be to search through all requests on the device, and return
> the minimum distance from the request you are running the query
> on. Do this for both queues, and insert the request into the
> queue with the smallest delta. I don't see much else doing any
> good.


  That would be perfect.  And like you say in a later message, they're
in a tree so it might actually work.  Then the read balance code
wouldn't need to do that calculation at all.

  How hard would this be to add?



> On the other hand, if you simply have a fifo after the RAID
> scheduler, the RAID scheduler itself knows where each disk's
> head will end up simply by tracking the value of the last
> sector it has submitted to the device. It also has the advantage
> that it doesn't have "high level" scheduling stuff below it
> ie. request deadline handling, elevator scheme, etc.
> 
> This gives the RAID scheduler more information, without
> taking any away from the high level scheduler AFAIKS.


 But then wouldn't you have to put all that into the RAID
scheduler?

--
 Chuck


end of thread, other threads:[~2003-04-18 13:10 UTC | newest]

Thread overview: 34+ messages
-- links below jump to the message on this page --
2003-04-12 22:46 Benefits from computing physical IDE disk geometry? Timothy Miller
2003-04-12 23:10 ` AW: " Oliver S.
2003-04-13  9:51 ` John Bradford
2003-04-13 11:50 ` Nick Piggin
2003-04-13 15:25   ` Timothy Miller
2003-04-14  3:52     ` Nick Piggin
2003-04-14  6:44       ` Mark Hahn
2003-04-14 13:28         ` Nick Piggin
2003-04-13 14:29 ` Alan Cox
2003-04-13 16:15   ` John Bradford
2003-04-18 13:01     ` Helge Hafting
2003-04-18 13:25       ` John Bradford
2003-04-14 18:27 ` Wes Felter
  -- strict thread matches above, loose matches on Subject: below --
2003-04-13 18:03 Chuck Ebbert
2003-04-13 18:24 ` Dr. David Alan Gilbert
2003-04-13 18:32   ` Lars Marowsky-Bree
2003-04-13 18:51     ` Dr. David Alan Gilbert
2003-04-13 22:14   ` Alan Cox
2003-04-14  0:17     ` Andreas Dilger
2003-04-13 22:15 ` Alan Cox
2003-04-14  3:58   ` Nick Piggin
2003-04-13 22:13 Chuck Ebbert
2003-04-13 23:38 ` Andreas Dilger
2003-04-14  2:29 Chuck Ebbert
2003-04-14  3:44 Chuck Ebbert
2003-04-14 21:27 Chuck Ebbert
2003-04-15  0:03 ` Nick Piggin
2003-04-15  1:19 Chuck Ebbert
2003-04-15  8:28 ` Nick Piggin
2003-04-15 18:33 Chuck Ebbert
2003-04-16  1:16 ` Nick Piggin
2003-04-16  1:59   ` Nick Piggin
2003-04-16 13:28 Chuck Ebbert
2003-04-16 23:06 ` Nick Piggin
