* cfq misbehaving on 2.6.11-1.14_FC3
@ 2005-06-10 22:54 spaminos-ker
  2005-06-11  9:29 ` Andrew Morton
  0 siblings, 1 reply; 16+ messages in thread

From: spaminos-ker @ 2005-06-10 22:54 UTC (permalink / raw)
To: linux-kernel

Hello, I am running into a very bad problem on one of my production servers.

* the config
Linux Fedora Core 3, latest everything, kernel 2.6.11-1.14_FC3
AMD Opteron 2 GHz, 1 GB RAM, 80 GB hard drive (IDE, Western Digital)

I have a log processor running in the background; it's using sqlite for storing
the information it finds in the logs. It takes a few hours to complete a run.
It's clearly I/O bound (SleepAVG = 98%, according to /proc/pid/status).
I have to use the cfq scheduler because it's the only scheduler that is fair
between processes (or should be, keep reading).

* the problem
Now, after an hour or so of processing, the machine becomes very unresponsive
when trying to do new disk operations. I say new because existing processes
that stream data to disk don't seem to suffer so much.

On the other hand, opening a blank new file in vi and saving it takes about 5
minutes or so. Logging in with ssh just times out (so I have to keep a
connection open to avoid being locked out). << that's where it's a really bad
problem for me :)

Now, if I switch the disk to anticipatory or deadline, by setting
/sys/block/hda/queue/scheduler, things go back to regular times very quickly.
Saving a file in vi takes about 12 seconds (slow, but not unbearable,
considering the machine is doing a lot of things). Logging in takes less than
a second.

I did an strace on the process that is causing havoc, and the pattern of
usage is:
* open files
* about 5000 combinations of llseek+read and llseek+write, in 1000-byte requests
* close files

The process is also niced to 8, but it doesn't seem to make any difference.
I found references to an "ionice" or "iorenice" syscall, but that doesn't seem
to exist anymore.
I thought that the I/O scheduler was taking the priority into account? Is this
a known problem? I also thought that timed cfq was supposed to take care of
such workloads?

Any idea on how I could improve the situation?

Thanks

Nicolas

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: cfq misbehaving on 2.6.11-1.14_FC3
  2005-06-10 22:54 cfq misbehaving on 2.6.11-1.14_FC3 spaminos-ker
@ 2005-06-11  9:29 ` Andrew Morton
  2005-06-14  2:19   ` spaminos-ker
  0 siblings, 1 reply; 16+ messages in thread

From: Andrew Morton @ 2005-06-11 9:29 UTC (permalink / raw)
To: spaminos-ker; +Cc: linux-kernel

<spaminos-ker@yahoo.com> wrote:
>
> Hello, I am running into a very bad problem on one of my production servers.
>
> * the config
> Linux Fedora Core 3, latest everything, kernel 2.6.11-1.14_FC3
> AMD Opteron 2 GHz, 1 GB RAM, 80 GB hard drive (IDE, Western Digital)
>
> I have a log processor running in the background; it's using sqlite for storing
> the information it finds in the logs. It takes a few hours to complete a run.
> It's clearly I/O bound (SleepAVG = 98%, according to /proc/pid/status).
> I have to use the cfq scheduler because it's the only scheduler that is fair
> between processes (or should be, keep reading).
>
> * the problem
> Now, after an hour or so of processing, the machine becomes very unresponsive
> when trying to do new disk operations. I say new because existing processes
> that stream data to disk don't seem to suffer so much.

It might be useful to test 2.6.12-rc6-mm1 - it has a substantially rewritten
CFQ implementation.

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: cfq misbehaving on 2.6.11-1.14_FC3
  2005-06-11  9:29 ` Andrew Morton
@ 2005-06-14  2:19   ` spaminos-ker
  2005-06-14  7:03     ` Andrew Morton
  0 siblings, 1 reply; 16+ messages in thread

From: spaminos-ker @ 2005-06-14 2:19 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel

--- Andrew Morton <akpm@osdl.org> wrote:
> It might be useful to test 2.6.12-rc6-mm1 - it has a substantially
> rewritten CFQ implementation.

Just did, and while things seem to be a little better, cfq still gets
performance even worse than noop. For this type of load, I think that cfq
should get latencies much lower than noop.

I ran an automated vi "write to file", to get a more persistent test, on the
different I/O schedulers:

while true ; do time vi -c '%s/a/aa/g' -c '%s/aa/a/g' -c 'x' /root/somefile > /dev/null ; sleep 1m ; done

For some reason, doing a "cp" or appending to files is very fast. I suspect
that vi's mmap calls are the reason for the latency problem.

The times I got (to save a 200-byte file on ext3), in seconds:

cfq          13,19,23,19,23,15,14,16,14   = 17.3 avg
deadline     7,12,11,15,15,8,17,14,16,11  = 12.6 avg
noop         23,12,14,12,12,13,14,14,14   = 14.2 avg
anticipatory 9,13,13,15,19,15,23,15,12    = 14.8 avg

Here is the memory status:

top - 17:07:44 up 1:42, 1 user, load average: 3.74, 3.62, 3.29
Tasks: 55 total, 2 running, 53 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0% us, 0.0% sy, 0.0% ni, 0.0% id, 99.0% wa, 1.0% hi, 0.0% si
Mem: 1035156k total, 1019344k used, 15812k free, 30092k buffers
Swap: 4192956k total, 0k used, 4192956k free, 671724k cached

and the disk activity (as you can see, mostly writes at this point, as I
think most of the data is cached in memory).
# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 2  1      0  20368  30320 670780    0    0    45  1189  498   201 26  2 20 52
 0  3      0  19376  30320 671916    0    0   128  1052  512   211 77  5  0 18
 0  3      0  19376  30320 671960    0    0     0  1220  543   231  3  0  0 97
 0  3      0  19128  30320 672136    0    0     0  2284  658   250 13  1  0 86
 0  3      0  19128  30320 672220    0    0     0  1160  535   222  7  0  0 93
 1  2      0  18880  30320 672376    0    0     0  1040  509   204 13  0  0 87
 0  3      0  18756  30320 672496    0    0     0  1076  514   210 11  1  0 88
 0  3      0  18260  30320 672680    0    0     0  1052  559   356 18  3  0 79
 1  1      0  19376  30328 671692    0    0     0   876  529   187 64  3  0 33
 1  3      0  18384  30340 672620    0    0   128  2856  515   197 64  5  0 31
 0  4      0  18136  30340 672856    0    0     0  1204  546   234 21  0  0 79
 0  4      0  18136  30340 672916    0    0     0  1124  530   231  5  2  0 93
 0  4      0  18136  30340 672976    0    0     0  2212  627   255  7  1  0 92
 0  4      0  18012  30340 673064    0    0     0  1092  523   235  7  1  0 92
 0  4      0  17888  30340 673228    0    0     0  1188  545   239 12  0  0 88
 1  3      0  17640  30340 673500    0    0     0  1092  515   229 26  0  0 74
 0  4      0  17392  30340 673684    0    0     0  1032  515   236 15  1  0 84
 1  1      0  17888  30348 672480    0    0     0  1560  568   249 41  4  0 55
 1  3      0  16896  30360 673524    0    0   128  1976  586   223 74  3  0 23
 0  4      0  16524  30360 673800    0    0     0  1112  522   233 25  1  0 74
 0  4      0  16524  30360 673844    0    0     0  1600  588   257  4  1  0 95

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: cfq misbehaving on 2.6.11-1.14_FC3
  2005-06-14  2:19 ` spaminos-ker
@ 2005-06-14  7:03   ` Andrew Morton
  2005-06-14 23:21     ` spaminos-ker
  0 siblings, 1 reply; 16+ messages in thread

From: Andrew Morton @ 2005-06-14 7:03 UTC (permalink / raw)
To: spaminos-ker; +Cc: linux-kernel

<spaminos-ker@yahoo.com> wrote:
>
> --- Andrew Morton <akpm@osdl.org> wrote:
> > It might be useful to test 2.6.12-rc6-mm1 - it has a substantially
> > rewritten CFQ implementation.
>
> Just did, and while things seem to be a little better, cfq still gets
> performance even worse than noop.
>
> For this type of load, I think that cfq should get latencies much lower than
> noop.
>
> I ran an automated vi "write to file", to get a more persistent test, on the
> different I/O schedulers.
>
> while true ; do time vi -c '%s/a/aa/g' -c '%s/aa/a/g' -c 'x' /root/somefile > /dev/null ; sleep 1m ; done

Bear in mind that after one minute, all of vi's text may have been reclaimed
from pagecache, so the above would have to do a lot of randomish reads to
reload vi into memory. Try reducing the sleep interval a lot.

> For some reason, doing a "cp" or appending to files is very fast. I suspect
> that vi's mmap calls are the reason for the latency problem.

Don't know. Try to work out (from vmstat or diskstats) how much reading is
going on.

Try stracing the check, see if your version of vi is doing a sync() or
something odd like that.

> the times I got (to save a 200-byte file on ext3) in seconds:
>
> cfq          13,19,23,19,23,15,14,16,14   = 17.3 avg
> deadline     7,12,11,15,15,8,17,14,16,11  = 12.6 avg
> noop         23,12,14,12,12,13,14,14,14   = 14.2 avg
> anticipatory 9,13,13,15,19,15,23,15,12    = 14.8 avg

OK, well if the latency is mainly due to reads then one would hope that the
anticipatory scheduler would do better than that.

But what happened to this, from your first report?

> On the other hand, opening a blank new file in vi and saving it takes about 5
> minutes or so.
Are you able to reproduce that 5-minute stall in the more recent testing? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: cfq misbehaving on 2.6.11-1.14_FC3
  2005-06-14  7:03 ` Andrew Morton
@ 2005-06-14 23:21   ` spaminos-ker
  2005-06-17 14:10     ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread

From: spaminos-ker @ 2005-06-14 23:21 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel

--- Andrew Morton <akpm@osdl.org> wrote:
> > For some reason, doing a "cp" or appending to files is very fast. I suspect
> > that vi's mmap calls are the reason for the latency problem.
>
> Don't know. Try to work out (from vmstat or diskstats) how much reading is
> going on.
>
> Try stracing the check, see if your version of vi is doing a sync() or
> something odd like that.

The read/write patterns of the background process is about 35% reads.

vi is indeed doing a sync on the open file, and that's where the time was
spent. So I just changed my test to simply opening a file, writing some data
in it, and calling fsync on the fd. I also reduced the sleep to 1s instead of
1m, and here are the results:

cfq:      20,20,21,21,20,22,20,20,18,21 - avg 20.3
noop:     12,12,12,13,5,10,10,12,12,13  - avg 11.1
deadline: 16,9,16,14,10,6,8,8,15,9      - avg 11.1
as:       6,11,14,11,9,15,16,9,8,9      - avg 10.8

As you can see, cfq stands out (and it should stand out the other way).

> OK, well if the latency is mainly due to reads then one would hope that the
> anticipatory scheduler would do better than that.

I suspect the latency is due to writes: it seems (and correct me if I am
wrong) that write requests are enqueued in one giant queue, thus the cfq
algorithm cannot be applied to the requests. Either that, or there is a
different queue that cancels out the benefits of cfq when writing (because
even though the writes are done the right way, this other queue to the device
keeps way too much data). But then, why would other i/o schedulers perform
better in that case?

> But what happened to this, from your first report?
>
> > On the other hand, opening a blank new file in vi and saving it takes about 5
> > minutes or so.
> Are you able to reproduce that 5-minute stall in the more recent testing?

The most I got with this kernel is a 1-minute stall, so there is improvement
there. Yet, a single process should not be able to cause this kind of stall
with cfq.

Nicolas

------------------------------------------------------------
video meliora proboque deteriora sequor
------------------------------------------------------------

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: cfq misbehaving on 2.6.11-1.14_FC3
  2005-06-14 23:21 ` spaminos-ker
@ 2005-06-17 14:10   ` Jens Axboe
  2005-06-17 15:51     ` Andrea Arcangeli
  2005-06-17 23:01     ` spaminos-ker
  0 siblings, 2 replies; 16+ messages in thread

From: Jens Axboe @ 2005-06-17 14:10 UTC (permalink / raw)
To: spaminos-ker; +Cc: Andrew Morton, linux-kernel

On Tue, Jun 14 2005, spaminos-ker@yahoo.com wrote:
> --- Andrew Morton <akpm@osdl.org> wrote:
> > > For some reason, doing a "cp" or appending to files is very fast. I suspect
> > > that vi's mmap calls are the reason for the latency problem.
> >
> > Don't know. Try to work out (from vmstat or diskstats) how much reading is
> > going on.
> >
> > Try stracing the check, see if your version of vi is doing a sync() or
> > something odd like that.
>
> The read/write patterns of the background process is about 35% reads.
>
> vi is indeed doing a sync on the open file, and that's where the time
> was spent. So I just changed my test to simply opening a file,
> writing some data in it and calling fsync on the fd.
>
> I also reduced the sleep to 1s instead of 1m, and here are the
> results:
>
> cfq:      20,20,21,21,20,22,20,20,18,21 - avg 20.3
> noop:     12,12,12,13,5,10,10,12,12,13  - avg 11.1
> deadline: 16,9,16,14,10,6,8,8,15,9      - avg 11.1
> as:       6,11,14,11,9,15,16,9,8,9      - avg 10.8
>
> As you can see, cfq stands out (and it should stand out the other
> way).

This doesn't look good (or expected) at all. In the initial posting you
mention this being an ide driver - I want to make sure whether it's hda or
sata driven (eg sda or similar)?

> > OK, well if the latency is mainly due to reads then one would hope that the
> > anticipatory scheduler would do better than that.
>
> I suspect the latency is due to writes: it seems (and correct me if I
> am wrong) that write requests are enqueued in one giant queue, thus
> the cfq algorithm can not be applied to the requests.

That is correct.
Each process has a sync queue associated with it; async requests like writes
go to a per-device async queue. The cost of tracking who dirtied a given page
was too large and not worth it. Perhaps rmap could be used to look up who has
a specific page mapped...

> But then, why would other i/o schedulers perform better in that case?

Yeah, the global write queue doesn't explain anything, the other schedulers
either share the read/write queue or have a separate single write queue as
well.

I'll try and reproduce (and fix) your problem.

--
Jens Axboe

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: cfq misbehaving on 2.6.11-1.14_FC3
  2005-06-17 14:10 ` Jens Axboe
@ 2005-06-17 15:51   ` Andrea Arcangeli
  2005-06-17 18:16     ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread

From: Andrea Arcangeli @ 2005-06-17 15:51 UTC (permalink / raw)
To: Jens Axboe; +Cc: spaminos-ker, Andrew Morton, linux-kernel

On Fri, Jun 17, 2005 at 04:10:40PM +0200, Jens Axboe wrote:
> Perhaps rmap could be used to lookup who has a specific page mapped...

I doubt it; the computing and locking cost for every single page write would
probably be too high. Doing it during swapping isn't a big deal since the cpu
is mostly idle during swapouts, but doing it all the time sounds a bit
overkill.

A mechanism to pass down a pid would be much better. However, I'm unsure
where you could put the info while dirtying the page. If it were a uid it
might be reasonable to have it in the address_space, but if you want a pid as
index, then it'd need to go in the page_t, which would waste tons of space.
Having a pid in the address space may not work well with a database or some
other app with multiple processes.

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: cfq misbehaving on 2.6.11-1.14_FC3
  2005-06-17 15:51 ` Andrea Arcangeli
@ 2005-06-17 18:16   ` Jens Axboe
  0 siblings, 0 replies; 16+ messages in thread

From: Jens Axboe @ 2005-06-17 18:16 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: spaminos-ker, Andrew Morton, linux-kernel

On Fri, Jun 17 2005, Andrea Arcangeli wrote:
> On Fri, Jun 17, 2005 at 04:10:40PM +0200, Jens Axboe wrote:
> > Perhaps rmap could be used to lookup who has a specific page mapped...
>
> I doubt, the computing and locking cost for every single page write
> would be probably too high. Doing it during swapping isn't a big deal
> since cpu is mostly idle during swapouts, but doing it all the time
> sounds a bit overkill.

We could cut the lookup down to per-request; it's not very likely that
separate threads would be competing for the exact same disk location. But
it's still not too nice...

> A mechanism to pass down a pid would be much better. However I'm unsure
> where you could put the info while dirtying the page. If it was an uid
> it might be reasonable to have it in the address_space, but if you want
> a pid as index, then it'd need to go in the page_t, which would waste
> tons of space. Having a pid in the address space, may not work well with
> a database or some other app with multiple processes.

The previous patch just added a pid_t to struct page, but I knew all along
that this was just for testing; I never intended to merge that part.

--
Jens Axboe

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: cfq misbehaving on 2.6.11-1.14_FC3
  2005-06-17 14:10 ` Jens Axboe
  2005-06-17 15:51 ` Andrea Arcangeli
@ 2005-06-17 23:01 ` spaminos-ker
  2005-06-22  9:24   ` Jens Axboe
  1 sibling, 1 reply; 16+ messages in thread

From: spaminos-ker @ 2005-06-17 23:01 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, linux-kernel

--- Jens Axboe <axboe@suse.de> wrote:
> This doesn't look good (or expected) at all. In the initial posting you
> mention this being an ide driver - I want to make sure if it's hda or
> sata driven (eg sda or similar)?

This is a regular IDE drive (a WDC WD800JB), no SATA, using hda. I didn't
mention it before, but this is on an AMD8111 board.

> I'll try and reproduce (and fix) your problem.

I don't know how all this works, but would there be a way to slow down the
offending writer by not allowing too many pending write requests per process?
Is there a tunable for the size of the write queue for a given device?
Reducing it will reduce the throughput, but the latency as well. Of course,
there has to be a way to get this to work right.

To go back to high latencies, maybe a different problem (but at least closely
related):

If I start in the background the command

dd if=/dev/zero of=/tmp/somefile2 bs=1024

and then run my test program in a loop, with

while true ; do time ./io 1; sleep 1s ; done

I get:

cfq:      47,33,27,48,32,29,26,49,25,47 -> 36.3 avg
deadline: 32,28,52,33,35,29,49,39,40,33 -> 37 avg
noop:     62,47,57,39,59,44,56,49,57,47 -> 51.7 avg

Now, cfq doesn't behave worse than the others, as expected (now, why it
behaved worse with the real daemons, I don't know). Still, > 30 seconds has
to be improved for cfq.
the test program being:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	int fd, bytes;

	fd = open("/tmp/somefile", O_WRONLY | O_CREAT, S_IRWXU);
	if (fd < 0) {
		perror("Could not open file");
		return 1;
	}
	bytes = write(fd, &fd, sizeof(fd));
	if (bytes < (int) sizeof(fd)) {
		perror("Could not write");
		return 2;
	}
	if (argc != 1) {
		fsync(fd);
	}
	close(fd);
	return 0;
}

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: cfq misbehaving on 2.6.11-1.14_FC3
  2005-06-17 23:01 ` spaminos-ker
@ 2005-06-22  9:24   ` Jens Axboe
  2005-06-22 17:54     ` spaminos-ker
  0 siblings, 1 reply; 16+ messages in thread

From: Jens Axboe @ 2005-06-22 9:24 UTC (permalink / raw)
To: spaminos-ker; +Cc: Andrew Morton, linux-kernel

On Fri, 2005-06-17 at 16:01 -0700, spaminos-ker@yahoo.com wrote:
> I don't know how all this works, but would there be a way to slow down the
> offending writer by not allowing too many pending write requests per process?
> Is there a tunable for the size of the write queue for a given device?
> Reducing it will reduce the throughput, but the latency as well.

The 2.4 SUSE kernel actually has something in place to limit in-flight write
requests against a single device. cfq will already limit the number of write
requests you can have in-flight against a single queue, but it's request
based and not size based.

> Of course, there has to be a way to get this to work right.
>
> To go back to high latencies, maybe a different problem (but at least closely
> related):
>
> If I start in the background the command
> dd if=/dev/zero of=/tmp/somefile2 bs=1024
>
> and then run my test program in a loop, with
> while true ; do time ./io 1; sleep 1s ; done
>
> I get:
>
> cfq:      47,33,27,48,32,29,26,49,25,47 -> 36.3 avg
> deadline: 32,28,52,33,35,29,49,39,40,33 -> 37 avg
> noop:     62,47,57,39,59,44,56,49,57,47 -> 51.7 avg
>
> Now, cfq doesn't behave worse than the others, as expected (now, why it
> behaved worse with the real daemons, I don't know).
> Still > 30 seconds has to be improved for cfq.

The problem here is that cfq (and the other io schedulers) still consider the
io async even if fsync() ends up waiting for it to complete. So there's no
real QOS being applied to these pending writes, and I don't immediately see
how we can improve that situation right now.

What file system are you using? I ran your test on ext2, and it didn't give
me more than ~2 seconds latency for the fsync.
Tried reiserfs now, and it's in the 23-24 range. -- Jens Axboe <axboe@suse.de> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: cfq misbehaving on 2.6.11-1.14_FC3
  2005-06-22  9:24 ` Jens Axboe
@ 2005-06-22 17:54   ` spaminos-ker
  2005-06-22 20:43     ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread

From: spaminos-ker @ 2005-06-22 17:54 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, linux-kernel

--- Jens Axboe <axboe@suse.de> wrote:
> The problem here is that cfq (and the other io schedulers) still
> consider the io async even if fsync() ends up waiting for it to
> complete. So there's no real QOS being applied to these pending writes,
> and I don't immediately see how we can improve that situation right now.

<I might sound stupid>
I still don't understand why async requests are in a different queue than the
sync ones? Wouldn't it be simpler to consider all the IO the same, and like
you pointed out, consider synced IO to be equivalent to async + some sync (as
in wait for completion) call (fsync goes a little too far).
</I might sound stupid>

> What file system are you using? I ran your test on ext2, and it didn't
> give me more than ~2 seconds latency for the fsync. Tried reiserfs now,
> and it's in the 23-24 range.

I am using ext3 on Fedora Core 3.

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: cfq misbehaving on 2.6.11-1.14_FC3
  2005-06-22 17:54 ` spaminos-ker
@ 2005-06-22 20:43   ` Jens Axboe
  2005-06-23 18:30     ` spaminos-ker
  0 siblings, 1 reply; 16+ messages in thread

From: Jens Axboe @ 2005-06-22 20:43 UTC (permalink / raw)
To: spaminos-ker; +Cc: Andrew Morton, linux-kernel

On Wed, Jun 22 2005, spaminos-ker@yahoo.com wrote:
> --- Jens Axboe <axboe@suse.de> wrote:
> > The problem here is that cfq (and the other io schedulers) still
> > consider the io async even if fsync() ends up waiting for it to
> > complete. So there's no real QOS being applied to these pending writes,
> > and I don't immediately see how we can improve that situation right now.
>
> <I might sound stupid>
> I still don't understand why async requests are in a different queue than the
> sync ones?
> Wouldn't it be simpler to consider all the IO the same, and like you pointed
> out, consider synced IO to be equivalent to async + some sync (as in wait for
> completion) call (fsync goes a little too far).
> </I might sound stupid>

First, let's cover a little terminology. All io is really async in Linux; the
block io model is inherently async in nature. So sync io is really just async
io that is being waited on immediately.

When I talk about sync and async io in the context of the io scheduler, the
sync io refers to io that is wanted right away. That would be reads or direct
writes. The async io is something that we can complete at will, where latency
typically doesn't matter. That would be normal dirtying of data that needs to
be flushed to disk.

Another property of sync io in the io scheduler is that it usually implies
that another sync io request will follow immediately (well, almost) after one
has completed. So there's a dependency relation between sync requests that
async requests don't share.

So there are different requirements for sync and async io.
The io scheduler tries to minimize latencies for async requests somewhat,
mainly just by making sure that it isn't starved for too long. However, when
you do an fsync, you want to complete lots of writes, but the io scheduler
doesn't get this info passed down. If you keep flooding the queue with new
writes, this could take quite a while to finish.

We could improve this situation by only flushing out the needed data, or just
a simple hack to only flush out already queued io (provided the fsync()
already made sure that the correct data is already queued). I will try and
play a little with this; it's definitely something that would be interesting
and worthwhile to improve.

> > What file system are you using? I ran your test on ext2, and it didn't
> > give me more than ~2 seconds latency for the fsync. Tried reiserfs now,
> > and it's in the 23-24 range.
>
> I am using ext3 on Fedora Core 3.

Journalled file systems will behave worse for this, because it has to tend to
the journal as well. Can you try mounting that partition as ext2 and see what
numbers that gives you?

--
Jens Axboe

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: cfq misbehaving on 2.6.11-1.14_FC3
  2005-06-22 20:43 ` Jens Axboe
@ 2005-06-23 18:30   ` spaminos-ker
  2005-06-23 23:33     ` Con Kolivas
  0 siblings, 1 reply; 16+ messages in thread

From: spaminos-ker @ 2005-06-23 18:30 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, linux-kernel

--- Jens Axboe <axboe@suse.de> wrote:
> Journalled file systems will behave worse for this, because it has to
> tend to the journal as well. Can you try mounting that partition as ext2
> and see what numbers that gives you?

I did the tests again on a partition that I could mkfs/mount at will.

On ext3, I get about 33 seconds average latency. And on ext2, as predicted,
I have latencies in average of about 0.4 seconds. I also tried reiserfs, and
it gets about 22 seconds latency.

As you pointed out, it seems that there is a flaw in the way IO queues and
journals (which are in some ways queues as well) interact in the presence of
flushes.

Nicolas

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: cfq misbehaving on 2.6.11-1.14_FC3
  2005-06-23 18:30 ` spaminos-ker
@ 2005-06-23 23:33   ` Con Kolivas
  2005-06-24  2:33     ` spaminos-ker
  0 siblings, 1 reply; 16+ messages in thread

From: Con Kolivas @ 2005-06-23 23:33 UTC (permalink / raw)
To: linux-kernel, spaminos-ker; +Cc: Jens Axboe, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 913 bytes --]

On Fri, 24 Jun 2005 04:30, spaminos-ker@yahoo.com wrote:
> --- Jens Axboe <axboe@suse.de> wrote:
> > Journalled file systems will behave worse for this, because it has to
> > tend to the journal as well. Can you try mounting that partition as ext2
> > and see what numbers that gives you?
>
> I did the tests again on a partition that I could mkfs/mount at will.
>
> On ext3, I get about 33 seconds average latency.
>
> And on ext2, as predicted, I have latencies in average of about 0.4
> seconds.
>
> I also tried reiserfs, and it gets about 22 seconds latency.
>
> As you pointed out, it seems that there is a flaw in the way IO queues and
> journals (that are in some ways queues as well), interact in the presence
> of flushes.

I found the same, and the effect was blunted by noatime and
journal_data_writeback (on ext3). Try them one at a time and see what you
get.

Cheers,
Con

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: cfq misbehaving on 2.6.11-1.14_FC3
  2005-06-23 23:33 ` Con Kolivas
@ 2005-06-24  2:33   ` spaminos-ker
  2005-06-24  3:27     ` Con Kolivas
  0 siblings, 1 reply; 16+ messages in thread

From: spaminos-ker @ 2005-06-24 2:33 UTC (permalink / raw)
To: Con Kolivas, linux-kernel; +Cc: Jens Axboe, Andrew Morton

--- Con Kolivas <kernel@kolivas.org> wrote:
> I found the same, and the effect was blunted by noatime and
> journal_data_writeback (on ext3). Try them one at a time and see what you
> get.

I had to move to a different box, but get the same kind of results (for ext3
default mount options).

Here are the latencies (all cfq) I get with different values for the mount
parameters:

ext2 default:                     0.1s
ext3 default:                     52.6s avg
reiser defaults:                  29s avg for 5 minutes, then 12.9s avg
ext3 rw,noatime,data=writeback:   0.1s avg
reiser rw,noatime,data=writeback: 4s avg for 20 seconds, then 0.1s avg

So, indeed, adding noatime,data=writeback to the mount options improves
things a lot. I also tried without the noatime, and that doesn't make much
difference to me.

That looks like a good workaround; I'll now try with the actual server and
see how things go.

Nicolas

^ permalink raw reply	[flat|nested] 16+ messages in thread
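The workaround above amounts to a couple of mount and sysfs commands. This is a hedged sketch only: the device, partition, and mount point are assumptions, and ext3's data= journalling mode generally cannot be changed by a simple remount, so the partition is unmounted first (or the options go in /etc/fstab).

```shell
# Sketch of the workaround discussed above. /dev/hda3 and /mnt/data
# are placeholders for the actual partition and mount point.
umount /mnt/data
mount -t ext3 -o noatime,data=writeback /dev/hda3 /mnt/data

# keep cfq as the active io scheduler for the disk
echo cfq > /sys/block/hda/queue/scheduler
cat /sys/block/hda/queue/scheduler
```

The equivalent /etc/fstab line would carry `noatime,data=writeback` in the options field so the setting survives a reboot.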
* Re: cfq misbehaving on 2.6.11-1.14_FC3
  2005-06-24  2:33 ` spaminos-ker
@ 2005-06-24  3:27   ` Con Kolivas
  0 siblings, 0 replies; 16+ messages in thread

From: Con Kolivas @ 2005-06-24 3:27 UTC (permalink / raw)
To: spaminos-ker; +Cc: linux-kernel, Jens Axboe, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 1434 bytes --]

On Fri, 24 Jun 2005 12:33, spaminos-ker@yahoo.com wrote:
> --- Con Kolivas <kernel@kolivas.org> wrote:
> > I found the same, and the effect was blunted by noatime and
> > journal_data_writeback (on ext3). Try them one at a time and see what you
> > get.
>
> I had to move to a different box, but get the same kind of results (for
> ext3 default mount options).
>
> Here are the latencies (all cfq) I get with different values for the mount
> parameters
>
> ext2 default:                     0.1s
> ext3 default:                     52.6s avg
> reiser defaults:                  29s avg for 5 minutes, then 12.9s avg
> ext3 rw,noatime,data=writeback:   0.1s avg
> reiser rw,noatime,data=writeback: 4s avg for 20 seconds, then 0.1s avg
>
> So, indeed adding noatime,data=writeback to the mount options improves
> things a lot. I also tried without the noatime, and that doesn't make much
> difference to me.
>
> That looks like a good workaround, I'll now try with the actual server and
> see how things go.

That's more or less what I found, although noatime also helped my test cases,
though less than the journal option. Coincidentally, I only discovered this
recently and hadn't gotten around to telling anyone how dramatic this was,
and this seemed as good a time as any. I am suspicious that it wasn't this
bad in past kernels but haven't been able to instrument earlier kernels to
check.

Cheers,
Con

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread
end of thread, other threads:[~2005-06-24  3:31 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-06-10 22:54 cfq misbehaving on 2.6.11-1.14_FC3 spaminos-ker
2005-06-11  9:29 ` Andrew Morton
2005-06-14  2:19   ` spaminos-ker
2005-06-14  7:03     ` Andrew Morton
2005-06-14 23:21       ` spaminos-ker
2005-06-17 14:10         ` Jens Axboe
2005-06-17 15:51           ` Andrea Arcangeli
2005-06-17 18:16             ` Jens Axboe
2005-06-17 23:01           ` spaminos-ker
2005-06-22  9:24             ` Jens Axboe
2005-06-22 17:54               ` spaminos-ker
2005-06-22 20:43                 ` Jens Axboe
2005-06-23 18:30                   ` spaminos-ker
2005-06-23 23:33                     ` Con Kolivas
2005-06-24  2:33                       ` spaminos-ker
2005-06-24  3:27                         ` Con Kolivas