linux-raid.vger.kernel.org archive mirror
From: Stan Hoeppner <stan@hardwarefreak.com>
To: Mark Knecht <markknecht@gmail.com>
Cc: Roy Sigurd Karlsbakk <roy@karlsbakk.net>,
	Jeff Johnson <jeff.johnson@aeoncomputing.com>,
	Linux-RAID <linux-raid@vger.kernel.org>
Subject: Re: Possible to change chunk size on RAID-1 without re-init or destructive result?
Date: Sun, 31 Mar 2013 12:41:28 -0500
Message-ID: <51587548.3060306@hardwarefreak.com>
In-Reply-To: <CAK2H+efkcjobknaWBupLQpxPkM+m_9JMDGwY1sH9UWw4tc=Czw@mail.gmail.com>

On 3/31/2013 12:15 PM, Mark Knecht wrote:
> On Sun, Mar 31, 2013 at 8:56 AM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
>> On 3/27/2013 5:18 PM, Mark Knecht wrote:
> <SNIP>
>>> Is there a way for me to measure, say over a whole day or some fixed
>>> time, what the workload really looks like?
>>
>> That's not the way to go about this.
>>
> OK
> 
>>> The machine is a basic Gentoo desktop machine running KDE. The only
>>> workload where I really care about performance is that I run a bunch
>>> of Virtualbox Win 7 & Win XP VMs where I need the performance to be
>>> as good as I can reasonably get. The problem I have is these VMs are
>>> either 1 huge file (40-50GB in a single file) or many 2GB files. I
>>> haven't a clue how Windows & Virtualbox are accessing what they see as a
>>> virtual drive and then underlying that how the vbox drivers are using
>>> the system to get to the RAID.
>>
>> So you have a bunch of Windows VM guests that write to large sparse
>> files residing on what, EXT4?  NTFS block size is 4KB so that's your
>> smallest IO.
>>
> 
> Currently EXT3 based on my starting point 2 years ago and never having
> changed. I'm open to EXT4 if this discussion shows me it warrants the
> work. Would rather not deal with anything more exotic right now.

Doesn't make a difference here.

>>> It would be interesting to set some program running, probably on a
>>> weekend or sometime when performance isn't so critical, and see what
>>> sort of data gets collected, assuming there's a program that does that
>>> sort of thing.
>>
>> Again, that's not the way to approach this.  What would be informative
>> to know is what applications you're running in these Windows VMs.  The
>> application dictates the write pattern.  You don't need a "collector" to
>> tell you that.  You just need to know the application(s).  If you're
>> just running productivity apps (web/mail/pdf/etc) inside these VMs then
>> there's nothing to optimize WRT RAID stripe parameters as you have no
>> sustained write IO.  So what are the Windows apps?
> 
> Currently 3 VMs, but only 2 matter for performance. The one that
> doesn't matter is a VMWare Player VM used for things like watching
> Netflix & Hulu. Nothing much more than that. 1 CPU core dedicated. CPU
> usage is generally low. I haven't paid much attention to disk usage
> for this VM but will check it out.
> 
> Performance VMs:
> 
> 1) This first VM primarily runs TradeStation, a rules-based trading
> platform for trading stocks & futures. I generally run it with 2-4 CPU
> cores and it almost never uses much computational power. The big deal in
> this VM is stock data caching with years or even decades of data for
> each stock or futures contract. Currently this cache appears to be
> sitting in a single file which is about 3GB in size. This data streams
> into the VM over the net when the markets are open (pretty much 24/7)
> and the cache grows. Depending on the type of market and chart the
> data might be as fine grained as each individual trade taking place
> that day, or it might only be updated once every bar. (1 minute bar, 5
> minute bar, daily bar, etc.) TradeStation reads the cache as it needs
> data. I have no idea what the access looks like in real time but
> generally I expect that it's accessing the data in date order. Whether
> the data is sorted or not in this cache file I have no idea.
> 
> 2) This second VM is more computational in nature. It primarily runs
> two apps for long periods of time, although I don't think either app
> is all that disk intensive. Both apps read market data once from disk,
> cache it in memory and then compute for hours to days depending on
> what I'm asking them to do. I will say I don't see a lot of disk
> activity lights when either of these programs is running.
> 
> - Adaptrade Builder - a genetic optimization program that attempts to
> generate TradeStation EasyLanguage trading strategies. I believe that
> once it has the market data in memory it's using memory and disk to
> store interesting strategies for me to look at later. The output of
> the program is generally a single file ranging in size from 1MB to
> maybe 50MB.
> 
> - TradingSolutions - a neural network program that attempts to
> generate neural network models for trading markets. Each instance of
> this program (I typically run 2-3 instances) generally has access to
> one file sized 25MB-200MB plus a lot (50-100) of small files under 20K in
> size. I have no idea how often any of these files are read or
> written. The program runs for hours doing its work.
> 
> I suppose there are other things that happen in the VMs. I run Excel a
> lot, but it's not a lot of data.
> 
> Hopefully that gives you enough info to suggest a direction.

These applications append small data slowly over a long period of time,
which usually means fragmentation.  Thus there's not much to optimize at
the chunk/stripe level, other than keeping chunk size small to spread
random reads over all platters.  You currently have a 16KB chunk, IIRC,
which is about as good as you'll get for this workload.  Given your
applications' low write throughput, chunk/stripe size really doesn't matter.
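
As a rough illustration of why a small chunk spreads clustered small reads
over more disks, here is a minimal Python sketch.  It assumes plain
round-robin striping (RAID-0 style, ignoring parity rotation and any md
layout details), a hypothetical 3 data disks, and a hypothetical 512KB
chunk for comparison.  Only the 16KB chunk and the 4KB NTFS block size
come from this thread; the function name disks_hit is made up for the
example.

```python
KB = 1024

def disks_hit(offsets, chunk_bytes, n_disks):
    """Return the set of member disks touched by reads at the given byte
    offsets, assuming simple round-robin striping (an assumption, not the
    actual md layout)."""
    return {(off // chunk_bytes) % n_disks for off in offsets}

# 4KB reads (NTFS block size) clustered inside one 256KB region of the array.
offsets = range(0, 256 * KB, 4 * KB)

N_DISKS = 3  # hypothetical number of data disks

small_chunk = disks_hit(offsets, 16 * KB, N_DISKS)   # 16KB chunk, as in the thread
large_chunk = disks_hit(offsets, 512 * KB, N_DISKS)  # hypothetical larger chunk

print("16KB chunk  ->", sorted(small_chunk))
print("512KB chunk ->", sorted(large_chunk))
```

Under those assumptions, the clustered 4KB reads land on all three disks
with a 16KB chunk but on a single disk with a 512KB chunk.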

-- 
Stan


