From: Eric Barton <eeb@sun.com>
To: lustre-devel@lists.lustre.org
Subject: [Lustre-devel] Oleg/Mike Work on Apps Metrics - FW: Mike Booth week ending 2009.03.15
Date: Mon, 16 Mar 2009 12:56:45 +0000
Message-ID: <023201c9a636$a987e4b0$fc97ae10$@com>
In-Reply-To: <48A6CF99-F630-4581-9FFC-05E3AFF66FA0@Sun.COM>

Mike,

Yes, it would be fun to discuss - but I'm probably not going to be
available for a discussion like that for a week or 2.

BTW, I'm cc-ing lustre-devel since this is of general interest.

I _do_ agree that for some apps, having sufficient memory on the app
node to buffer the local component of a checkpoint and let it
"dribble" out to disk would achieve better utilization of the compute
resource.  However, parallel apps can be very sensitive to "noise" on
the network they're using for inter-process communication, i.e. the
checkpoint data has either to be written all the way to disk, or at
least buffered somewhere so that moving it to disk will not interfere
with the app's own communications.

This latter idea is the basis for the "flash cache" concept.
Actually, I think it's worth exploring the economics of it in more
detail.

The variables are aggregate network bandwidth into the distributed
checkpoint cache, which determines the checkpoint time, and aggregate
path-minimum bandwidth (i.e. lesser of network and disk bandwidth)
from the cache to disk, which determines how soon the cache can be
ready for the next checkpoint.  The cache could be dedicated nodes and
storage (e.g. flash) or additional storage on the OSSes, or any
combination of the two.  And the interesting relationship is how
compute cluster utilisation varies with the cost of the server and
cache subsystems.
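
As a rough sketch of the two variables above (not taken from any Lustre
code), the following Python models one compute/checkpoint cycle.  The
function name, the parameter names, the units (GB and GB/s) and the
assumption that draining only starts once the whole checkpoint is in
the cache are all illustrative assumptions.

```python
def checkpoint_cycle(checkpoint_gb, cache_in_gbps, net_out_gbps,
                     disk_gbps, compute_s):
    """Model one compute + checkpoint cycle (hypothetical helper).

    Returns (write_s, drain_s, utilisation).
    """
    # Time the app is stalled while the checkpoint enters the cache.
    write_s = checkpoint_gb / cache_in_gbps
    # Path-minimum bandwidth: the lesser of network and disk bandwidth.
    drain_bw = min(net_out_gbps, disk_gbps)
    # Time until the cache is empty and ready for the next checkpoint.
    drain_s = checkpoint_gb / drain_bw
    # Assumption: if draining outlasts the compute phase, the app waits.
    stall_s = max(0.0, drain_s - compute_s)
    utilisation = compute_s / (compute_s + write_s + stall_s)
    return write_s, drain_s, utilisation


# Example using the ratios quoted below: a 30 GB/s burst lasting 1/10 of
# the compute phase, drained at 3 GB/s.  The 3000 GB checkpoint size and
# the 10 GB/s cache-to-server network figure are made up for illustration.
write_s, drain_s, util = checkpoint_cycle(3000, 30, 10, 3, 1000)
print(write_s, drain_s, round(util, 3))  # 100.0 1000.0 0.909
```

With these numbers the cache drains in exactly the length of the compute
phase, so the app never waits on the cache; a slower drain path would
show up as stall time and lower utilisation.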

-- 

        Cheers,
                   Eric

> -----Original Message-----
> From: Michael.Booth at Sun.COM [mailto:Michael.Booth at Sun.COM]
> Sent: 16 March 2009 3:06 AM
> To: Eric Barton
> Subject: Re: Oleg/Mike Work on Apps Metrics - FW: Mike Booth week ending 2009.03.15
> 
> Eric,
> 
> This is too bad.  I should run the test on my laptop and see if I get
> the same behavior.
> 
> The huge bandwidth requirements (30+ gbytes/sec) that I see for
> checkpoint-style I/O are driven in bursts that last about 1/10 of the
> time of the following computation.  There is not a desire to ensure
> that everything is on disk before resuming computations.  If, while the
> computations proceeded, the system cleared out the cache, the next
> write would go to cache at memory speed, provided the previous clean
> pages could be reused for the next write.  The bandwidth requirement to
> achieve what appears to be memory speed I/O could be achieved in this
> case with 3 gbytes/sec.
> 
>   There are middleware schemes being developed to do asynchronous I/O
> on "other" nodes to transfer the checkpoint data out to the other
> nodes so they write it all out.  To me this is the middleware working
> at odds with what the system software should naturally do for the
> application.
> 
> I think it is safe to say it is a minority of scientific applications
> that write data out and quickly read it back like a typical Linux
> application, such as a web browser.  This type of I/O is usually limited
> to codes that are larger than the sum of the nodes' memory, which is
> rarer and rarer these days.
> 
> I believe that making this work for these codes is a win in three ways:
> 
>    One: reduces the need for high burst rate I/O to disk for many
> programs while giving the perception of much faster I/O to the
> application.
> 
>    Two: helps to reduce the impact of filesystem performance
> variability.
> 
>    Three: overall in the system, not having the system being hit with
> huge bursts of I/O by tens of thousands of cores at seemingly random
> times could reduce the variability of the complete file system.
> 
> Should we discuss on the phone, with Oleg?
> 
> Thanks, this is fun,
> 
> Mike
> 
> 
> Michael Booth
> michael.booth at sun.com
> mobile  512-289-3805
> 

