public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@suse.de>
To: Andrew Morton <akpm@osdl.org>
Cc: neilb@suse.de, nathans@sgi.com, linux-kernel@vger.kernel.org,
	drepper@redhat.com, mtk-manpages@gmx.net,
	nickpiggin@yahoo.com.au
Subject: Re: [patch 1/1] sys_sync_file_range()
Date: Mon, 3 Apr 2006 16:31:00 +0200	[thread overview]
Message-ID: <20060403143100.GH4647@suse.de> (raw)
In-Reply-To: <20060403012753.218db397.akpm@osdl.org>

On Mon, Apr 03 2006, Andrew Morton wrote:
> Jens Axboe <axboe@suse.de> wrote:
> >
> > On Mon, Apr 03 2006, Neil Brown wrote:
> > > On Friday March 31, nathans@sgi.com wrote:
> > > > On Thu, Mar 30, 2006 at 06:58:46PM +1100, Neil Brown wrote:
> > > > > On Wednesday March 29, akpm@osdl.org wrote:
> > > > > > Remove the recently-added LINUX_FADV_ASYNC_WRITE and LINUX_FADV_WRITE_WAIT
> > > > > > fadvise() additions, do it in a new sys_sync_file_range() syscall
> > > > > > instead. 
> > > > > 
> > > > > Hmmm... any chance this could be split into a sys_sync_file_range and
> > > > > a vfs_sync_file_range which takes a 'struct file*' and does less (or
> > > > > no) sanity checking, so I can call it from nfsd?
> > > > > 
> > > > > Currently I implement COMMIT (which has a range) with a by messing
> > > > > around with filemap_fdatawrite and filemap_fdatawait (ignoring the
> > > > > range) and I'd rather than a vfs helper.
> > > > 
> > > > I'm not 100% sure, but it looks like the PF_SYNCWRITE process flag
> > > > should be set on the nfsd's while they're doing that, which doesn't
> > > > seem to be happening atm.  Looks like a couple of the IO schedulers
> > > > will make use of that knowledge now.  All the more reason for a VFS
> > > > helper here I guess. ;)
> > > 
> > > PF_SYNCWRITE? What's that???
> > > 
> > > (find | xargs grep ...)
> > > Oh.  The block device schedulers like to know if a request is sync or
> > > async (and all reads are assumed to be sync) - which is reasonable -
> > > and so have a per-task flag to tell them - which isn't (IMO).
> > > 
> > > md/raid (particularly raid5) often does the write from a different
> > > process than generated the original request, so that will break
> > > completely. 
> > 
> > I don't think any disagrees with you, the sync-write process flag is
> > indeed an atrocious beast...
> 
> Yeah.  PF_SYNCWRITE was a performance tweak for the anticipatory scheduler.
> As cfq is using it as well now (hopefully to good effect) I guess it could
> be formalised more.

Yup, both 'as' and 'cfq' would prefer to just look at a SYNC bio flag
instead. But the logic itself is definitely needed.

> > > What is wrong with a bio flag I wonder....
> > 
> > Nothing, in fact I would love for it to be changed. I'm sure such a
> > patch would be accepted with open arms! :-)
> 
> I think once someone starts coding it, they'll become a big fan of
> PF_SYNCWRITE...

They might not become a big fan, but they'll surely appreciate the
simplicity of it :-)

> For the page writeback functions it's probably possible to use
> writeback_control.sync_mode=WB_SYNC_ALL as a trigger, propagate that into
> the IO layer.  Maybe that'll always be sufficient - it's hard to tell.  The
> writeback paths are twisty and deep...
> 
> Then again, using WB_SYNC_ALL as a hint that this process will be waiting
> for this writeout to complete is a bit hacky too - it doesn't _really_ mean
> that - it just means that I/O should be _started_ against all relevant
> dirty data.
> 
> Good luck ;)

It's not a hard problem, but it will definitely cost a little sweat to
go through. I'm sure Neil could pull it off, the question is more if he
wants to :-)

-- 
Jens Axboe


  reply	other threads:[~2006-04-03 14:30 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-03-30  7:41 [patch 1/1] sys_sync_file_range() akpm
2006-03-30  7:58 ` Neil Brown
2006-03-30  8:11   ` Andrew Morton
2006-03-30  8:32   ` Andrew Morton
2006-03-30  8:55     ` Neil Brown
2006-03-30 15:31     ` OGAWA Hirofumi
2006-03-30 20:19       ` Andrew Morton
2006-03-30 21:17   ` Nathan Scott
2006-04-03  1:24     ` Neil Brown
2006-04-03  7:50       ` Jens Axboe
2006-04-03  8:27         ` Andrew Morton
2006-04-03 14:31           ` Jens Axboe [this message]
2006-04-03 21:21             ` Andrew Morton
2006-04-04 20:50 ` Luck, Tony
2006-04-04 21:00   ` Andrew Morton
2006-04-04 21:01   ` Randy.Dunlap
2006-04-04 21:03   ` Ulrich Drepper

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060403143100.GH4647@suse.de \
    --to=axboe@suse.de \
    --cc=akpm@osdl.org \
    --cc=drepper@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mtk-manpages@gmx.net \
    --cc=nathans@sgi.com \
    --cc=neilb@suse.de \
    --cc=nickpiggin@yahoo.com.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox