From: Charles Samuels <charles@cariden.com>
To: "Ted Ts'o" <tytso@mit.edu>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: Queuing of disk writes
Date: Mon, 4 Apr 2011 10:50:12 -0700
Message-ID: <201104041050.12731.charles@cariden.com>
In-Reply-To: <20110404020235.GA4706@thunk.org>

Hi,

Thanks for the reply.

On Sunday, April 03, 2011 7:02:35 pm Ted Ts'o wrote:
> On Fri, Apr 01, 2011 at 12:59:53PM -0700, Charles Samuels wrote:
> > I have an application that is writing large amounts of very
> > fragmented data to harddrives. That is, I could write megabytes of
> > data in blocks of a few bytes scattered around a multi-gigabyte
> > file.
> 
> Doctor, doctor, it hurts when I do this....  any way you can avoid
> doing this?  What is your application doing at the high level.
Not really. I need the on-disk data organized in this pattern so that the
reads are well optimized. It's a database application.

> 
> > Obviously, doing this causes the harddrive to seek a lot and takes a
> > while.  From what I understand, if I allow linux to cache the
> > writes, it will fill up the kernel's write cache, and then
> > consequently the disk drive's DMA queue. As a result of that, the
> > harddrive can pick the correct order to do these writes,
> > significantly reducing seek times.
> 
> This is one way to avoid some of the seeks, yes.

What's another way? Other than not doing it :)

> Who or what is calling fsync()?  Is it being called by your
> application because you want to initiate writeout?  Or is it being
> called by some completely unrelated process?

It's being called by my own process. When fsync finishes, I update another file 
with some offset counters, fsync that, and with some luck, my writes are 
transactional.
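
In sketch form, the commit sequence looks roughly like this. The helper
names (write_all_at, commit) and the single 64-bit counter are made up for
this mail and are not my real code. It also assumes the small counter write
is never torn by a crash, which a real design would have to guard against
(a checksum, say).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: write the whole buffer at `off`, retrying on
 * short writes and EINTR. Returns 0 on success, -1 on error. */
static int write_all_at(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;

    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
        off += n;
    }
    return 0;
}

/* Hypothetical commit step: make the scattered data writes durable
 * first, and only then publish the new offset counter in the second
 * file. Returns 0 on success, -1 on error. */
static int commit(int data_fd, int meta_fd, uint64_t committed_offset)
{
    if (fsync(data_fd) != 0)          /* data must reach disk first */
        return -1;
    if (write_all_at(meta_fd, &committed_offset,
                     sizeof committed_offset, 0) != 0)
        return -1;
    return fsync(meta_fd);            /* then the counter */
}
```

The ordering is the whole point: if the machine dies before the second
fsync, the old counter still describes a consistent state.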

> If it is being called by the application, one thing you can do is to
> use the Linux-specific system call sync_file_range().  You can use
> this to do asynchronous data flushes of the file, and control which
> range of bytes are written out, which can also help avoid flooding the
> disk with too many write requests.

What would be a good use of sync_file_range? It looks pretty useful, but I
don't know how to make good use of it.

Take SYNC_FILE_RANGE_WRITE, for example: wouldn't Linux start the writeout
pretty much immediately? And wouldn't I rather avoid telling it what order
to do the writes in?

Would calling sync_file_range with a flag that allows blocking have a 
performance benefit compared to fsync? Specifically, can I expect Linux to not 
totally block all reads and writes to other files?
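
To make the question concrete, this is roughly the usage I have in mind,
going only by the sync_file_range(2) man page. The two wrapper names are
made up for illustration. As far as I can tell from the man page, neither
call flushes file metadata or the drive's write cache, so I assume I would
still need fsync or fdatasync at commit time.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical wrapper: ask the kernel to start write-out of the dirty
 * pages in [off, off + len) without waiting for completion. A len of 0
 * means "through to the end of the file". */
static int start_writeout(int fd, off_t off, off_t len)
{
    assert(off >= 0 && len >= 0);
    return sync_file_range(fd, off, len, SYNC_FILE_RANGE_WRITE);
}

/* Hypothetical wrapper: write out the range and block until those
 * pages have been written. This is a data-only flush of one file
 * range, not a durability guarantee like fsync. */
static int flush_range_and_wait(int fd, off_t off, off_t len)
{
    assert(off >= 0 && len >= 0);
    return sync_file_range(fd, off, len,
                           SYNC_FILE_RANGE_WAIT_BEFORE |
                           SYNC_FILE_RANGE_WRITE |
                           SYNC_FILE_RANGE_WAIT_AFTER);
}
```

The idea would be to call start_writeout() on each region after a batch of
scattered writes, and only use the blocking variant (followed by fsync)
just before updating the offset counters.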

Charles

