Re: Queuing of disk writes - Charles Samuels

From: Charles Samuels <charles@cariden.com>
To: "Ted Ts'o" <tytso@mit.edu>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: Queuing of disk writes
Date: Mon, 4 Apr 2011 10:50:12 -0700
Message-ID: <201104041050.12731.charles@cariden.com>
In-Reply-To: <20110404020235.GA4706@thunk.org>

Hi,

Thanks for the reply.

On Sunday, April 03, 2011 7:02:35 pm Ted Ts'o wrote:
> On Fri, Apr 01, 2011 at 12:59:53PM -0700, Charles Samuels wrote:
> > I have an application that is writing large amounts of very
> > fragmented data to harddrives. That is, I could write megabytes of
> > data in blocks of a few bytes scattered around a multi-gigabyte
> > file.
> 
> Doctor, doctor, it hurts when I do this....  any way you can avoid
> doing this?  What is your application doing at the high level?
Not really. I need the on-disk data organized in this pattern so that the 
reads are fast. It's a database application.

> 
> > Obviously, doing this causes the harddrive to seek a lot and takes a
> > while.  From what I understand, if I allow linux to cache the
> > writes, it will fill up the kernel's write cache, and then
> > consequently the disk drive's DMA queue. As a result of that, the
> > harddrive can pick the correct order to do these writes,
> > significantly reducing seek times.
> 
> This is one way to avoid some of the seeks, yes.

What's another way? Other than not doing it :)

> Who or what is calling fsync()?  Is it being called by your
> application because you want to initiate writeout?  Or is it being
> called by some completely unrelated process?

It's being called by my own process. When fsync finishes, I update another file 
with some offset counters, fsync that, and with some luck, my writes are 
transactional.
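
A minimal sketch of that commit sequence, under stated assumptions: the
names pwrite_all() and commit_offset() are hypothetical, and the "offset
counters" file is reduced to a single 64-bit counter stored at byte 0 in host
byte order. It only illustrates the ordering (data fsync first, counter
write and fsync second), not the real application's file layout.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* open() flags for callers that create the files */
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all of buf at offset off, retrying short
 * writes and EINTR. Returns 0 on success, -1 on error. */
static int pwrite_all(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;

    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted, try again */
        if (n <= 0)
            return -1;              /* real error, or no progress */
        p += n;
        len -= (size_t)n;
        off += n;
    }
    return 0;
}

/* Hypothetical commit step. The scattered data writes have already been
 * issued on data_fd; the counter in meta_fd is only advanced after those
 * writes have been flushed. Returns 0 on success, -1 on error. */
static int commit_offset(int data_fd, int meta_fd, uint64_t committed)
{
    /* 1. Flush the data file before the counter refers to it. */
    if (fsync(data_fd) != 0)
        return -1;

    /* 2. Record the new counter (assumed layout: 8 bytes at offset 0). */
    if (pwrite_all(meta_fd, &committed, sizeof committed, 0) != 0)
        return -1;

    /* 3. Flush the counter file so the commit itself is on disk. */
    return fsync(meta_fd);
}
```

A reader recovering after a crash would trust only the data covered by the
counter value it finds in the second file.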

> If it is being called by the application, one thing you can do is to
> use the Linux-specific system call sync_file_range().  You can use
> this to do asynchronous data flushes of the file, and control which
> range of bytes are written out, which can also help avoid flooding the
> disk with too many write requests.

What would be a good use of sync_file_range()? It looks pretty useful, but I 
don't know how to make good use of it.

For example, with SYNC_FILE_RANGE_WRITE, wouldn't Linux start the writeout 
pretty much immediately? And wouldn't I rather not give it a suggestion about 
what order to do the writes in?

Would calling sync_file_range with a flag that allows blocking have a 
performance benefit compared to fsync? Specifically, can I expect Linux to not 
totally block all reads and writes to other files?
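
For reference, a rough sketch of the two usage styles being asked about,
using the real Linux sync_file_range(2) call and its documented flags. The
wrapper names (start_writeout, wait_writeout, make_durable) are hypothetical,
and the fallback to fdatasync() on ENOSYS is an assumption added for
portability. Per the man page, sync_file_range() does not write out file
metadata and gives no durability guarantee on its own, so the sketch still
ends a commit with fdatasync().

```c
#define _GNU_SOURCE             /* needed for sync_file_range() in glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: ask the kernel to start writeback of the dirty
 * pages in [off, off + len) without waiting for completion. The man page
 * notes this can still block if the request queue is full. */
static int start_writeout(int fd, off_t off, off_t len)
{
    assert(off >= 0 && len >= 0);
    return sync_file_range(fd, off, len, SYNC_FILE_RANGE_WRITE);
}

/* Hypothetical wrapper: start writeback for the range and block until
 * the pages in it have been written. */
static int wait_writeout(int fd, off_t off, off_t len)
{
    int r;

    assert(off >= 0 && len >= 0);
    r = sync_file_range(fd, off, len,
                        SYNC_FILE_RANGE_WAIT_BEFORE |
                        SYNC_FILE_RANGE_WRITE |
                        SYNC_FILE_RANGE_WAIT_AFTER);
    if (r != 0 && errno == ENOSYS)
        r = fdatasync(fd);      /* assumed fallback where the call is missing */
    return r;
}

/* sync_file_range() alone is not a commit point, so a transactional
 * writer would still finish with fdatasync() (or fsync()). */
static int make_durable(int fd)
{
    return fdatasync(fd);
}
```

One possible pattern is to call start_writeout() on each region as soon as it
has been written, so the final make_durable() has less left to flush. Whether
that beats a single fsync() for a given workload is something to measure, not
assume.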

Charles
