From: Greg Stark <gsstark@mit.edu>
To: Ingo Oeser <ioe-lkml@rameria.de>
Cc: Greg Stark <gsstark@mit.edu>,
Helge Hafting <helgehaf@aitel.hist.no>,
Joel Becker <Joel.Becker@oracle.com>,
Jamie Lokier <jamie@shareable.org>,
Trond Myklebust <trond.myklebust@fys.uio.no>,
Ulrich Drepper <drepper@redhat.com>,
Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: statfs() / statvfs() syscall ballsup...
Date: 16 Oct 2003 10:02:27 -0400
Message-ID: <87smlt9t70.fsf@stark.dyndns.tv>
In-Reply-To: <200310161229.44861.ioe-lkml@rameria.de>
Ingo Oeser <ioe-lkml@rameria.de> writes:
> Hi there,
>
> first: I think the problem is solvable with mixing blocking and
> non-blocking IO or simply AIO, which will be supported nicely by 2.6.0,
> is a POSIX standard and is meant for doing your own IO scheduling.
I think aio could be very useful for databases, but not in this area. I think
it's useful as a more fine-grained tool than sync/fsync. Currently the
database has to fsync a file to commit a transaction, which means flushing
_all_ writes to the file, even ones from other transactions. If aio inserted
write barriers to the disk controller then it would provide a way to ensure
the current transaction is synced without having to flush all other
transactions' writes at the same time.
But I don't see how it's useful for the problem I'm describing.
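To make the fsync granularity point concrete, here is a minimal sketch of a
commit path. It is only an illustration, not how any particular database does
it: the names commit() and demo() and the record contents are made up, and the
real calls are just os.write() and os.fsync().

```python
import os
import tempfile

def commit(fd, record):
    """Append one transaction's record, then force the file to stable storage."""
    written = os.write(fd, record)
    # fsync() works per file, not per write: it also flushes any dirty
    # data that other transactions have written to this same file.
    os.fsync(fd)
    return written

def demo():
    """Two transactions share one file; committing A also flushes B's write."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"txn-B: not committed yet\n")  # another transaction's pending write
        commit(fd, b"txn-A: commit\n")               # flushes txn-B's data as well
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.close(fd)
        os.unlink(path)
```

There is no way in this interface to say "sync only the bytes txn-A wrote",
which is the gap a barrier-style primitive could fill.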
> On Wednesday 15 October 2003 17:03, Greg Stark wrote:
> > Ingo Oeser <ioe-lkml@rameria.de> writes:
> > > On Monday 13 October 2003 10:45, Helge Hafting wrote:
> > > > This is easier than trying to tell the kernel that the job is
> > > > less important, that goes wrong whether the job runs too much
> > > > or too little. Let that job sleep a little when its services
> > > > aren't needed, or when you need the disk bandwidth elsewhere.
> >
> > Actually I think that's exactly backwards. The problem is that if the
> > user-space tries to throttle the process it doesn't know how much or when.
> > The kernel knows exactly when there are other higher priority writes, it
> > can schedule just enough writes from vacuum to not interfere.
>
> On dedicated servers this might be true. But on these you could also
> solve it in user space by measuring disk bandwidth and issuing just
> enough IO to keep up roughly with it.
Indeed we're discussing methods for doing that now. But this seems like an
awkward way to accomplish what the kernel could do very precisely. I don't see
why non-dedicated servers would make priorities any less useful; in fact I
think that's exactly where they would shine.
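A rough sketch of what the user-space approach amounts to, assuming a fixed
target bandwidth. Everything here is hypothetical (pause_needed(),
throttled_write(), and the target rate itself); it is not code from any
existing vacuum implementation.

```python
import time

def pause_needed(bytes_done, elapsed_s, target_bytes_per_s):
    """Seconds to sleep so average throughput stays at or below the target."""
    # Earliest time at which this much I/O is allowed at the target rate.
    earliest = bytes_done / target_bytes_per_s
    return max(0.0, earliest - elapsed_s)

def throttled_write(write, chunks, target_bytes_per_s,
                    clock=time.monotonic, sleep=time.sleep):
    """Write chunks via write(chunk) -> bytes written, pacing to the target rate."""
    start = clock()
    done = 0
    for chunk in chunks:
        done += write(chunk)
        delay = pause_needed(done, clock() - start, target_bytes_per_s)
        if delay > 0:
            sleep(delay)
    return done
```

The weak spot is target_bytes_per_s: user space has to guess it, and the
guess is wrong whenever the disk is idle or some other process is loading it.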
> > So if vacuum slept a bit, say every 64k of data vacuumed. It could end up
> > sleeping when the disks are actually idle. Or it could be not sleeping
> > enough and still be interfering with transactions.
>
> The vacuum io is submitted (via AIO or simulation of it) normally in a
> unit U and waiting ALWAYS for U to complete, before submitting a new one.
> Between submitting units, vacuum checks for outstanding transactions
> and stops when we have one.
>
> Now a transaction is submitted and the submitting from vacuum is stopped
> by it existing. The transaction waits for completion (e.g. aio_suspend())
> and signals vacuum to continue.
User-space has no idea if disk i/o is occurring. The data the transaction
needs could be cached, or it could be on a different disk.
Besides, I think this is far more coarse-grained than what's needed.
Transactions sometimes run for seconds, minutes, or hours; some of that time
is spent doing disk i/o and some of it doing cpu calculations. It can't stop
and signal another process every time it finishes reading a block and needs to
do a bit of calculation. Then context switch again a millisecond later so it
can read the next block...
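To show the amount of signalling that implies, here is a small sketch of the
stop/continue handshake, using threads in one process as a stand-in for
separate processes. The IoGate class and its method names are invented for
illustration only.

```python
import threading

class IoGate:
    """Transactions mark I/O in flight; vacuum proceeds only when none is."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self._idle = threading.Event()
        self._idle.set()          # no transaction I/O outstanding at start
        self.signals = 0          # counts every stop/continue notification

    def begin_io(self):
        with self._lock:
            self._active += 1
            self._idle.clear()    # tell vacuum to stop submitting
            self.signals += 1

    def end_io(self):
        with self._lock:
            self._active -= 1
            self.signals += 1
            if self._active == 0:
                self._idle.set()  # tell vacuum it may continue

    def wait_until_idle(self, timeout=None):
        """Called by vacuum between units; returns True if it may proceed."""
        return self._idle.wait(timeout)
```

A transaction that reads N blocks one at a time ends up calling begin_io()
and end_io() N times each, so 2N notifications, each of which may wake the
other side only to put it back to sleep a moment later.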
And besides, this would only be useful on dedicated servers.
--
greg