public inbox for linux-kernel@vger.kernel.org
* Errors in the VM - detailed
@ 2002-01-31 15:05 Roy Sigurd Karlsbakk
  2002-01-31 15:44 ` David Mansfield
                   ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-01-31 15:05 UTC (permalink / raw)
  To: linux-kernel

hi all

The last month or so, I've been trying to make a particular configuration work
with Linux-2.4.17 and other 2.4.x kernels. Two major bugs have been blocking
my way into the light. Below follows a detailed description of both bugs. One
of them seems to be solved in the latest -rmap patches. The other is still
unsolved.

CONFIGURATION INTRO

The test has been performed on two identically configured computers, giving
the same results, which suggests the chance of hardware failure is rather
small.

Config:

1xAthlon 1133
2x512MB (1GB) SDRAM
Asus A7S-VM motherboard with
	Realtek 10/100Mbps card
	ATA100
	Sound+VGA+USB+other crap
1xPromise ATA133 controller
2xWDC 120GB drives (with ATA100 cabling, connected to the Promise controller)
1xWDC 20GB drive (connected to motherboard - configured as the boot device)
1xIntel desktop gigE card (e1000 driver - modular)

The server is configured with its console on the serial port
Highmem is disabled
The two 120GB drives are configured in RAID-0 with chunk size [256|512|1024]
I have tried several different file systems - same error

Versions tested:

Linux-2.4.1([3-7]|8-pre.) tested. All buggy. Bug #1 was fixed in -rmap11c

TEST SETUP

Reading 100 500MB files with dd, tux, apache, cat, and the like, redirecting
the output to /dev/null. With tux/apache, I used another computer running wget
to retrieve the same amount of data.

The test scripts look something like this

#!/bin/bash
dd if=file0000 of=/dev/null &
dd if=file0001 of=/dev/null &
dd if=file0002 of=/dev/null &
dd if=file0003 of=/dev/null &
...
dd if=file0099 of=/dev/null &

or similar - just with wget -O /dev/null ... &

BUGS

Bug #1:

When (RAMx2) bytes have been read from disk, I/O as reported by vmstat drops
to a mere 1MB/s.

When reading starts, the speed is initially high. Then, slowly, the speed
decreases until it goes to something close to a complete halt (see output from
vmstat below).

# vmstat 2
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy id
 0 200  1   1676   3200   3012 786004   0 292 42034   298  791   745   4  29 67
 0 200  1   1676   3308   3136 785760   0   0 44304     0  748   758   3  15 82
 0 200  1   1676   3296   3232 785676   0   0 44236     0  756   710   2  23 75
 0 200  1   1676   3304   3356 785548   0   0 38662    70  778   791   3  19 78
 0 200  1   1676   3200   3456 785552   0   0 33536     0  693   594   3  13 84
 1 200  0   1676   3224   3528 785192   0   0 35330    24  794   712   3  16 81
 0 200  0   1676   3304   3736 784324   0   0 30524    74  725   793  12  14 74
 0 200  0   1676   3256   3796 783664   0   0 29984     0  718   826   4  10 86
 0 200  0   1676   3288   3868 783592   0   0 25540   152  763   812   3  17 80
 0 200  0   1676   3276   3908 783472   0   0 22820     0  693   731   0   7 92
 0 200  0   1676   3200   3964 783540   0   0 23312     6  759   827   4  11 85
 0 200  0   1676   3308   3984 783452   0   0 17506     0  687   697   0  11 89
 0 200  0   1676   3388   4012 783888   0   0 14512     0  671   638   1   5 93
 0 200  0   2188   3208   4048 784156   0 512 16104   548  707   833   2  10 88
 0 200  0   3468   3204   4048 784788   0  66  8220    66  628   662   0   3 96
 0 200  0   3468   3296   4060 784680   0   0  1036     6  687   714   1   6 93
 0 200  0   3468   3316   4060 784668   0   0  1018     0  613   631   1   2 97
 0 200  0   3468   3292   4060 784688   0   0  1034     0  617   638   0   3 97
 0 200  0   3468   3200   4068 784772   0   0  1066     6  694   727   2   4 94

Bug #2:

Doing the same test on Rik's -rmap(.*) kernels somehow fixes Bug #1, and makes
room for another bug to surface.

With -rmap, I can get some 33-35MB/s sustained from /dev/md0 to memory. This
is all good, but when doing this test, only 40 of the original processes ever
finish. The same error occurs both locally (dd) and remotely (tux). If new
I/O requests are issued to the same device, they don't hang. If tux is
restarted, it works fine afterwards.

Please - anyone - help me with this. I've been trying to set up this system
for almost two months now, fighting various bugs.

Best regards

roy

--
Roy Sigurd Karlsbakk, MCSE, MCNE, CLS, LCA

Computers are like air conditioners.
They stop working when you open Windows.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Errors in the VM - detailed
  2002-01-31 15:05 Errors in the VM - detailed Roy Sigurd Karlsbakk
@ 2002-01-31 15:44 ` David Mansfield
  2002-01-31 20:21 ` Roger Larsson
  2002-02-01 17:39 ` Denis Vlasenko
  2 siblings, 0 replies; 27+ messages in thread
From: David Mansfield @ 2002-01-31 15:44 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk; +Cc: linux-kernel

On Thu, 31 Jan 2002, Roy Sigurd Karlsbakk wrote:

> hi all
> 
> The last month or so, I've been trying to make a particular configuration work
> with Linux-2.4.17 and other 2.4.x kernels. Two major bugs have been blocking
> my way into the light. Below follows a detailed description on both bugs. One
> of them seems to be solved in the latests -rmap patches. The other is still
> unsolved.
> 
> CONFIGURATION INTRO
> 

My config:

Athlon 1400MHz, 512MB RAM, single Seagate ST360020A 60GB ATA100 drive.  I am
running the 2.4.17rc2aa2 kernel, which many on the list (myself included)
consider an excellent kernel.  I noticed you haven't tried the -aa kernels.
You should.

I'm *not* running software RAID, however, and this may be the significant
factor.  Have you tested your drives singly (without the RAID)?


I created 100 100MB files (I don't have enough free space for anything
bigger) using dd if=/dev/zero of=file???.  I did this sequentially.  Then I
wrote a second script to run dd if=file??? of=/dev/null & and started 100
readers in parallel.  There were no stalls in the read from beginning to
end; my system maintained about 6-8MB/s transfer rate throughout the test.
That's about what I would expect for 100 simultaneous readers.


In your tests, are you sure that you are synchronising the start of
your reader processes?  Maybe the first readers get going first (with
fewer seeks ruining your I/O bandwidth), and then, as the rest start up,
the additional seeks ruin everything.  I honestly think this is unlikely,
since your I/O level drops to such a disgustingly low level.

Hope this helps.


David


-- 
/==============================\
| David Mansfield              |
| david@cobite.com             |
\==============================/



* Re: Errors in the VM - detailed
  2002-01-31 15:05 Errors in the VM - detailed Roy Sigurd Karlsbakk
  2002-01-31 15:44 ` David Mansfield
@ 2002-01-31 20:21 ` Roger Larsson
  2002-01-31 20:29   ` Jens Axboe
  2002-02-01 17:39 ` Denis Vlasenko
  2 siblings, 1 reply; 27+ messages in thread
From: Roger Larsson @ 2002-01-31 20:21 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk, linux-kernel

On Thursday 31 January 2002 16:05, Roy Sigurd Karlsbakk wrote:
> hi all
> - - -
> Versions tested:
>
> Linux-2.4.1([3-7]|8-pre.) tested. All buggy. Bug #1 was fixed in -rmap11c
>
> TEST SETUP
>
> Reading 100 500MB files with dd, tux, apache, cat, something, and
> redirecting the output to /dev/null. With tux/apache, I used another
> computer using wget to retrieve the same amount of data.
>
> The test scripts look something like this
>
> #!/bin/bash
> dd if=file0000 of=/dev/null &
> dd if=file0001 of=/dev/null &
> dd if=file0002 of=/dev/null &
> dd if=file0003 of=/dev/null &
> ...
> dd if=file0099 of=/dev/null &
>
> or similar - just with wget -O /dev/null ... &
>
> BUGS
>
> Bug #1:
>
> When (RAMx2) bytes has been read from disk, I/O as reported from vmstat
> drops to a mere 1MB/s
>
> When reading starts, the speed is initially high. Then, slowly, the speed
> decreases until it goes to something close to a complete halt (see output
> from vmstat below).
>

Wait a minute - it might be readahead that gets killed.
If I remember correctly, READA requests are dropped when failing to allocate
space for them - and yes, I remembered correctly...

/usr/src/develop/linux/drivers/block/ll_rw_block.c:746 (earlier kernel)

	/*
	 * Grab a free request from the freelist - if that is empty, check
	 * if we are doing read ahead and abort instead of blocking for
	 * a free slot.
	 */
get_rq:
	if (freereq) {
		req = freereq;
		freereq = NULL;
	} else if ((req = get_request(q, rw)) == NULL) {
		spin_unlock_irq(&io_request_lock);
		if (rw_ahead)
			goto end_io;

		freereq = __get_request_wait(q, rw);
		goto again;
	}

Suppose get_request fails and the request is a rw_ahead:
it quits... => no read ahead.

Try adding a printk there...
		if (rw_ahead) {
			printk("Skipping readahead...\n");
			goto end_io;
		}

Can it be the problem???

/RogerL

-- 
Roger Larsson
Skellefteå
Sweden



* Re: Errors in the VM - detailed
  2002-01-31 20:21 ` Roger Larsson
@ 2002-01-31 20:29   ` Jens Axboe
  2002-01-31 20:43     ` Andrew Morton
  0 siblings, 1 reply; 27+ messages in thread
From: Jens Axboe @ 2002-01-31 20:29 UTC (permalink / raw)
  To: Roger Larsson; +Cc: Roy Sigurd Karlsbakk, linux-kernel

On Thu, Jan 31 2002, Roger Larsson wrote:
> Wait a minute - it might be readahead that gets killed.
> If I remember correctly READA requests are dropped when failing to allocate 
> space for it - yes I did...

s/allocate/retrieve

No allocation takes place.

> /usr/src/develop/linux/drivers/block/ll_rw_block.c:746 (earlier kernel)
> 
> 	/*
> 	 * Grab a free request from the freelist - if that is empty, check
> 	 * if we are doing read ahead and abort instead of blocking for
> 	 * a free slot.
> 	 */
> get_rq:
> 	if (freereq) {
> 		req = freereq;
> 		freereq = NULL;
> 	} else if ((req = get_request(q, rw)) == NULL) {
> 		spin_unlock_irq(&io_request_lock);
> 		if (rw_ahead)
> 			goto end_io;
> 
> 		freereq = __get_request_wait(q, rw);
> 		goto again;
> 	}
> 
> Suppose we fail with get_request, the request is a rw_ahead,
> it quits... => no read ahead.
> 
> Try to add a prink there...
> 		if (rw_ahead) {
> 			printk("Skipping readahead...\n");
> 			goto end_io;
> 		}

That will trigger _all the time_ even on a moderately busy machine.
Checking if tossing away read-ahead is the issue is probably better
tested with just increasing the request slots. Roy, please try and change
the queue_nr_requests assignment in ll_rw_blk:blk_dev_init() to
something like 2048.

-- 
Jens Axboe



* Re: Errors in the VM - detailed
  2002-01-31 20:29   ` Jens Axboe
@ 2002-01-31 20:43     ` Andrew Morton
  2002-01-31 21:37       ` Jens Axboe
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Morton @ 2002-01-31 20:43 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Roger Larsson, Roy Sigurd Karlsbakk, linux-kernel

Jens Axboe wrote:
> 
> That will trigger _all the time_ even on a moderately busy machine.
> Checking if tossing away read-ahead is the issue is probably better
> tested with just increasing the request slots. Roy, please try and change
> the queue_nr_requests assignment in ll_rw_blk:blk_dev_init() to
> something like 2048.
> 

heh.  Yep, Roger finally nailed it, I think.

Roy says the bug was fixed in rmap11c.  Changelog says:


rmap 11c:
  ...
  - elevator improvement                                  (Andrew Morton)

Which includes:

-       queue_nr_requests = 64;
-       if (total_ram > MB(32))
-               queue_nr_requests = 128;
+       queue_nr_requests = (total_ram >> 9) & ~15;     /* One per half-megabyte */
+       if (queue_nr_requests < 32)
+               queue_nr_requests = 32;
+       if (queue_nr_requests > 1024)
+               queue_nr_requests = 1024;


So Roy is running with 1024 requests.

The question is (sorry, Roy): does this need fixing?

The only thing which can trigger it is when we have
zillions of threads doing reads (or zillions of outstanding
aio read requests) or when there are a large number of
unmerged write requests in the elevator.  It's a rare
case.

If we _do_ need a fix, then perhaps we should just stop
using READA in the readahead code?  readahead is absolutely
vital to throughput, and best-effort request allocation
just isn't good enough.

Thoughts?

-


* Re: Errors in the VM - detailed
  2002-01-31 20:43     ` Andrew Morton
@ 2002-01-31 21:37       ` Jens Axboe
  2002-02-01 16:05         ` Roy Sigurd Karlsbakk
  2002-02-01 16:11         ` Roy Sigurd Karlsbakk
  0 siblings, 2 replies; 27+ messages in thread
From: Jens Axboe @ 2002-01-31 21:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Roger Larsson, Roy Sigurd Karlsbakk, linux-kernel

On Thu, Jan 31 2002, Andrew Morton wrote:
> rmap 11c:
>   ...
>   - elevator improvement                                  (Andrew Morton)
> 
> Which includes:
> 
> -       queue_nr_requests = 64;
> -       if (total_ram > MB(32))
> -               queue_nr_requests = 128;
> +       queue_nr_requests = (total_ram >> 9) & ~15;     /* One per half-megabyte */
> +       if (queue_nr_requests < 32)
> +               queue_nr_requests = 32;
> +       if (queue_nr_requests > 1024)
> +               queue_nr_requests = 1024;
> 
> 
> So Roy is running with 1024 requests.

Ah yes, of course.

> The question is (sorry, Roy): does this need fixing?
> 
> The only thing which can trigger it is when we have
> zillions of threads doing reads (or zillions of outstanding
> aio read requests) or when there are a large number of
> unmerged write requests in the elevator.  It's a rare
> case.

Indeed.

> If we _do_ need a fix, then perhaps we should just stop
> using READA in the readhead code?  readahead is absolutely
> vital to throughput, and best-effort request allocation
> just isn't good enough.

Hmm well. Maybe just a small pool of requests set aside for READA would
be a better idea. That way "normal" reads are not able to starve READA
completely.

Something ala this, completely untested. Will try and boot it now :-)
Roy, could you please test? It's against 2.4.18-pre7, I'll boot it now
as well...

--- /opt/kernel/linux-2.4.18-pre7/include/linux/blkdev.h	Mon Nov 26 14:29:17 2001
+++ linux/include/linux/blkdev.h	Thu Jan 31 22:29:01 2002
@@ -74,9 +74,9 @@
 struct request_queue
 {
 	/*
-	 * the queue request freelist, one for reads and one for writes
+	 * the queue request freelist, one for READ, WRITE, and READA
 	 */
-	struct request_list	rq[2];
+	struct request_list	rq[3];
 
 	/*
 	 * Together with queue_head for cacheline sharing
--- /opt/kernel/linux-2.4.18-pre7/drivers/block/ll_rw_blk.c	Sun Jan 27 16:06:31 2002
+++ linux/drivers/block/ll_rw_blk.c	Thu Jan 31 22:36:24 2002
@@ -333,8 +333,10 @@
 
 	INIT_LIST_HEAD(&q->rq[READ].free);
 	INIT_LIST_HEAD(&q->rq[WRITE].free);
+	INIT_LIST_HEAD(&q->rq[READA].free);
 	q->rq[READ].count = 0;
 	q->rq[WRITE].count = 0;
+	q->rq[READA].count = 0;
 
 	/*
 	 * Divide requests in half between read and write
@@ -352,6 +354,20 @@
 		q->rq[i&1].count++;
 	}
 
+	for (i = 0; i < queue_nr_requests / 4; i++) {
+		rq = kmem_cache_alloc(request_cachep, SLAB_KERNEL);
+		/*
+		 * hey well, this needs better checking (as well as the above)
+		 */
+		if (!rq)
+			break;
+
+		memset(rq, 0, sizeof(struct request));
+		rq->rq_status = RQ_INACTIVE;
+		list_add(&rq->queue, &q->rq[READA].free);
+		q->rq[READA].count++;
+	}
+
 	init_waitqueue_head(&q->wait_for_request);
 	spin_lock_init(&q->queue_lock);
 }
@@ -752,12 +768,18 @@
 		req = freereq;
 		freereq = NULL;
 	} else if ((req = get_request(q, rw)) == NULL) {
-		spin_unlock_irq(&io_request_lock);
+
 		if (rw_ahead)
-			goto end_io;
+			req = get_request(q, READA);
 
-		freereq = __get_request_wait(q, rw);
-		goto again;
+		spin_unlock_irq(&io_request_lock);
+
+		if (!req && rw_ahead)
+			goto end_io;
+		else if (!req) {
+			freereq = __get_request_wait(q, rw);
+			goto again;
+		}
 	}
 
 /* fill up the request-info, and add it to the queue */
@@ -1119,7 +1141,7 @@
 	 */
 	queue_nr_requests = 64;
 	if (total_ram > MB(32))
-		queue_nr_requests = 128;
+		queue_nr_requests = 256;
 
 	/*
 	 * Batch frees according to queue length

-- 
Jens Axboe



* Re: Errors in the VM - detailed
       [not found] <OF675D993F.933C6CB9-ON88256B52.00595CCC@boulder.ibm.com>
@ 2002-02-01 11:52 ` Roy Sigurd Karlsbakk
  0 siblings, 0 replies; 27+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-02-01 11:52 UTC (permalink / raw)
  To: James Washer; +Cc: linux-kernel

> I'm looking into this a bit.. One question, you seem to have 200 processes
> waiting on IO, That's interesting since it is exactly double your 100
> readers.. Any idea what those other 100 are? btw, would you mind repeating
> this with a tiny little c program that just reads the data, and doesn't
> write the data out anywhere??

sorry... I've been testing with both 200 and 100 processes. The numbers in
the vmstat output were from another run.

>
> let me know
>
>  - jim
>
> Roy Sigurd Karlsbakk <roy@karlsbakk.net>@vger.kernel.org on 01/31/2002
> 07:05:12 AM
>
> Sent by:    linux-kernel-owner@vger.kernel.org
>
>
> To:    <linux-kernel@vger.kernel.org>
> cc:
> Subject:    Errors in the VM - detailed
>
> hi all
>
> [...]
>

--
Roy Sigurd Karlsbakk, MCSE, MCNE, CLS, LCA

Computers are like air conditioners.
They stop working when you open Windows.



* Re: Errors in the VM - detailed
  2002-01-31 21:37       ` Jens Axboe
@ 2002-02-01 16:05         ` Roy Sigurd Karlsbakk
  2002-02-01 16:11         ` Roy Sigurd Karlsbakk
  1 sibling, 0 replies; 27+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-02-01 16:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, Roger Larsson, linux-kernel

> Something ala this, completely untested. Will try and boot it now :-)
> Roy, could you please test? It's against 2.4.18-pre7, I'll boot it now
> as well...

Still problems after installing the patch. No change at all. The patch was
applied against 2.4.17-rmap12a+ide+tux.

Testing with Apache2 now (apache2 uses mmap instead of sendfile(), as far
as I can see)... these tests take some time.

--
Roy Sigurd Karlsbakk, MCSE, MCNE, CLS, LCA

Computers are like air conditioners.
They stop working when you open Windows.






* Re: Errors in the VM - detailed
  2002-01-31 21:37       ` Jens Axboe
  2002-02-01 16:05         ` Roy Sigurd Karlsbakk
@ 2002-02-01 16:11         ` Roy Sigurd Karlsbakk
  2002-02-01 18:44           ` Roger Larsson
  1 sibling, 1 reply; 27+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-02-01 16:11 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, Roger Larsson, linux-kernel

It does not seem to be possible to reproduce the error with apache2. But
this may be because Apache2's I/O handling doesn't impress much. With Tux,
I keep getting up to 40 megs per second, but with Apache the average is
~15MB/s.

Btw... it looks like your patch (against rmap12a) gave me an extra
performance kick. 12c gave me a max of ~32MB/s, whereas your patch
raised this to ~41MB/s.

thanks

roy

On Thu, 31 Jan 2002, Jens Axboe wrote:

> On Thu, Jan 31 2002, Andrew Morton wrote:
> > So Roy is running with 1024 requests.
>
> Ah yes, of course.
>
> Hmm well. Maybe just a small pool of requests set aside for READA would
> be a better idea. That way "normal" reads are not able to starve READA
> completely.
>
> Something ala this, completely untested. Will try and boot it now :-)
> Roy, could you please test? It's against 2.4.18-pre7, I'll boot it now
> as well...
>
> [patch snipped]

--
Roy Sigurd Karlsbakk, MCSE, MCNE, CLS, LCA

Computers are like air conditioners.
They stop working when you open Windows.



* Re: Errors in the VM - detailed
  2002-01-31 15:05 Errors in the VM - detailed Roy Sigurd Karlsbakk
  2002-01-31 15:44 ` David Mansfield
  2002-01-31 20:21 ` Roger Larsson
@ 2002-02-01 17:39 ` Denis Vlasenko
  2 siblings, 0 replies; 27+ messages in thread
From: Denis Vlasenko @ 2002-02-01 17:39 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk; +Cc: linux-kernel

On 31 January 2002 13:05, Roy Sigurd Karlsbakk wrote:
> The last month or so, I've been trying to make a particular configuration
> work with Linux-2.4.17 and other 2.4.x kernels. Two major bugs have been
> blocking my way into the light. Below follows a detailed description on
> both bugs. One of them seems to be solved in the latests -rmap patches. The
> other is still unsolved.

I've seen your posts. Can't help you directly, but:

> The two 120GB drives is configured in RAID-0 with chunk size [256|512|1024]

Do bugs bite you with plain partitions (no RAID)? Maybe it's a RAID bug?

> When (RAMx2) bytes has been read from disk, I/O as reported from vmstat
> drops to a mere 1MB/s
> When reading starts, the speed is initially high. Then, slowly, the speed
> decreases until it goes to something close to a complete halt (see output
> from vmstat below).

Can you run oom_trigger at this point and watch what happens?
It will force most (if not all) of the page cache to be flushed; speed might
increase. This is not a solution, just a way to get additional info on the
bug's behavior. I've got a little patch which improves (read: fixes) cache
flush behavior, attached below. BTW, did you try the -aa kernels?

> Bug #2:
>
> Doing the same test on Rik's -rmap(.*) somehow fixes Bug #1, and makes room
> for another bug to come out.
>
> Doing the same test, I can, with -rmap, get some 33-35MB/s sustained from
> /dev/md0 to memory. This is all good, but when doing this test, only 40 of
> the original processes ever finish. The same error occurs both locally (dd)
> and remotely (tux). If new i/o requests is issued to the same device, they
> don't hang. If tux is restarted, it works fine afterwards.

After they hang, take an Alt-SysRq-T trace, run it through ksymoops and send
it to Rik and LKML. CC'ing Andrea won't hurt, I think.
--
vda

oom_trigger.c
=============
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
    void *p;
    unsigned size = 1 << 20;
    unsigned long total = 0;
    while (size) {
	p = malloc(size);
	if (!p) size >>= 1;
	else {
	    memset(p, 0x77, size);
	    total += size;
	    printf("Allocated %9u bytes, %12lu total\n", size, total);
	}
    }
    return 0;
}


vmscan.patch.2.4.17.d (author: "M.H.VanLeeuwen" <vanl@megsinet.net>)
====================================================================
--- linux.virgin/mm/vmscan.c	Mon Dec 31 12:46:25 2001
+++ linux/mm/vmscan.c	Fri Jan 11 18:03:05 2002
@@ -394,9 +394,9 @@
 		if (PageDirty(page) && is_page_cache_freeable(page) && page->mapping) {
 			/*
 			 * It is not critical here to write it only if
-			 * the page is unmapped beause any direct writer
+			 * the page is unmapped because any direct writer
 			 * like O_DIRECT would set the PG_dirty bitflag
-			 * on the phisical page after having successfully
+			 * on the physical page after having successfully
 			 * pinned it and after the I/O to the page is finished,
 			 * so the direct writes to the page cannot get lost.
 			 */
@@ -480,11 +480,14 @@
 
 			/*
 			 * Alert! We've found too many mapped pages on the
-			 * inactive list, so we start swapping out now!
+			 * inactive list.
+			 * Move referenced pages to the active list.
 			 */
-			spin_unlock(&pagemap_lru_lock);
-			swap_out(priority, gfp_mask, classzone);
-			return nr_pages;
+			if (PageReferenced(page) && !PageLocked(page)) {
+				del_page_from_inactive_list(page);
+				add_page_to_active_list(page);
+			}
+			continue;
 		}
 
 		/*
@@ -521,6 +524,9 @@
 	}
 	spin_unlock(&pagemap_lru_lock);
 
+	if (max_mapped <= 0 && (nr_pages > 0 || priority < DEF_PRIORITY))
+		swap_out(priority, gfp_mask, classzone);
+
 	return nr_pages;
 }
 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Errors in the VM - detailed
  2002-02-01 16:11         ` Roy Sigurd Karlsbakk
@ 2002-02-01 18:44           ` Roger Larsson
  2002-02-01 18:52             ` Roger Larsson
                               ` (3 more replies)
  0 siblings, 4 replies; 27+ messages in thread
From: Roger Larsson @ 2002-02-01 18:44 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk, Jens Axboe; +Cc: Andrew Morton, linux-kernel

On Friday, 1 February 2002, 17:11, Roy Sigurd Karlsbakk wrote:
> It does not seem to be possible to reproduce the error with apache2. But
> this may be because Apache2's i/o handling doesn't impress much. With Tux,
> I keep getting up to 40 megs per sec, but with Apache the average is
> ~15MB/s.
>
> Btw ... It looks like your patch (against rmap12a) gave me an extra
> performance kick. 12c gave me a max of ~32MB/s, whereas your patch
> raised this to ~41.
>

Hmm... suppose this is the problem anyway and that Jens' patch was not enough.
How does the disk drive sound during the test?

Does it start to sound more when performance goes down?

About Jens' patch:

My feeling is that there should be (a lot) more READA than READ,
since a sequential READ really only NEEDS one at a time.

The number of READ requests limits the number of concurrent streams,
and the number of READA requests limits the maximum total read-ahead.

Jens said earlier "Roy, please try and change
the queue_nr_requests assignment in ll_rw_blk:blk_dev_init() to
something like 2048." - Roy, have you tested this too?

/RogerL

-- 
Roger Larsson
Skellefteå
Sweden

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Errors in the VM - detailed
  2002-02-01 18:44           ` Roger Larsson
@ 2002-02-01 18:52             ` Roger Larsson
  2002-02-01 18:57             ` Jens Axboe
                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 27+ messages in thread
From: Roger Larsson @ 2002-02-01 18:52 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk, Jens Axboe; +Cc: Andrew Morton, linux-kernel

One more thing that I think is important.

On Friday, 1 February 2002, 19:44, Roger Larsson wrote:
> On Friday, 1 February 2002, 17:11, Roy Sigurd Karlsbakk wrote:
> - - -
> About Jens patch:
>
> My feeling is that there should be (a lot) more READA than READ,
> since a sequential READ really only NEEDS one at a time.
>
> The number of READ requests limits the number of concurrent streams,
> and the number of READA requests limits the maximum total read-ahead.

With RAID, as Roy uses, this gets even worse!
The number of READs has to be > concurrent streams * RAID disks [IMHO],
since each stream is split across all the disks...

/RogerL

-- 
Roger Larsson
Skellefteå
Sweden

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Errors in the VM - detailed
  2002-02-01 18:44           ` Roger Larsson
  2002-02-01 18:52             ` Roger Larsson
@ 2002-02-01 18:57             ` Jens Axboe
  2002-02-02 14:52               ` Roy Sigurd Karlsbakk
  2002-02-02 14:43             ` Roy Sigurd Karlsbakk
  2002-02-02 14:43             ` Roy Sigurd Karlsbakk
  3 siblings, 1 reply; 27+ messages in thread
From: Jens Axboe @ 2002-02-01 18:57 UTC (permalink / raw)
  To: Roger Larsson; +Cc: Roy Sigurd Karlsbakk, Andrew Morton, linux-kernel

On Fri, Feb 01 2002, Roger Larsson wrote:
> On Friday, 1 February 2002, 17:11, Roy Sigurd Karlsbakk wrote:
> > It does not seem to be possible to reproduce the error with apache2. But
> > this may be because Apache2's i/o handling doesn't impress much. With Tux,
> > I keep getting up to 40 megs per sec, but with Apache the average is
> > ~15MB/s.
> >
> > Btw ... It looks like your patch (against rmap12a) gave me an extra
> > performance kick. 12c gave me a max of ~32MB/s, whereas your patch
> > raised this to ~41.
> >
> 
> Hmm... suppose this is the problem anyway and that Jens' patch was not enough.
> How does the disk drive sound during the test?
> 
> Does it start to sound more when performance goes down?

Yes, it would be interesting to know whether the disk becomes seek-bound.

> About Jens' patch:
> 
> My feeling is that there should be (a lot) more READA than READ,
> since a sequential READ really only NEEDS one at a time.

Probably, my patch was really just a quick try to see if it changed
anything.

> The number of READ requests limits the number of concurrent streams,
> and the number of READA requests limits the maximum total read-ahead.

Correct. Roy, you could try changing the READA balance by allocating
lots more READA requests. Simply play around with the
queue_nr_requests / 4 setting. Try something "absurd" like
queue_nr_requests << 2 or even bigger.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Errors in the VM - detailed
  2002-02-01 18:44           ` Roger Larsson
  2002-02-01 18:52             ` Roger Larsson
  2002-02-01 18:57             ` Jens Axboe
@ 2002-02-02 14:43             ` Roy Sigurd Karlsbakk
  2002-02-02 14:43             ` Roy Sigurd Karlsbakk
  3 siblings, 0 replies; 27+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-02-02 14:43 UTC (permalink / raw)
  To: Roger Larsson; +Cc: Jens Axboe, Andrew Morton, linux-kernel

> Hmm... suppose this is the problem anyway and that Jens' patch was not enough.
> How does the disk drive sound during the test?

The disk is SILENT! I can hardly hear anything.

> Does it start to sound more when performance goes down?

I don't believe it's a seek problem, as the read-ahead (RAID chunk size) is
1MB.

--
Roy Sigurd Karlsbakk, MCSE, MCNE, CLS, LCA

Computers are like air conditioners.
They stop working when you open Windows.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Errors in the VM - detailed
  2002-02-01 18:44           ` Roger Larsson
                               ` (2 preceding siblings ...)
  2002-02-02 14:43             ` Roy Sigurd Karlsbakk
@ 2002-02-02 14:43             ` Roy Sigurd Karlsbakk
  2002-02-02 14:44               ` Jens Axboe
  3 siblings, 1 reply; 27+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-02-02 14:43 UTC (permalink / raw)
  To: Roger Larsson; +Cc: Jens Axboe, Andrew Morton, linux-kernel

> Jens said earlier "Roy, please try and change
> the queue_nr_requests assignment in ll_rw_blk:blk_dev_init() to
> something like 2048." - Roy have you tested this too?

No ... Where do I change it?

--
Roy Sigurd Karlsbakk, MCSE, MCNE, CLS, LCA

Computers are like air conditioners.
They stop working when you open Windows.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Errors in the VM - detailed
  2002-02-02 14:43             ` Roy Sigurd Karlsbakk
@ 2002-02-02 14:44               ` Jens Axboe
  2002-02-02 15:03                 ` Roy Sigurd Karlsbakk
  0 siblings, 1 reply; 27+ messages in thread
From: Jens Axboe @ 2002-02-02 14:44 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk; +Cc: Roger Larsson, Andrew Morton, linux-kernel

On Sat, Feb 02 2002, Roy Sigurd Karlsbakk wrote:
> > Jens said earlier "Roy, please try and change
> > the queue_nr_requests assignment in ll_rw_blk:blk_dev_init() to
> > something like 2048." - Roy have you tested this too?
> 
> No ... Where do I change it?

drivers/block/ll_rw_blk.c:blk_dev_init()
{
	queue_nr_requests = 64;
	if (total_ram > MB(32))
		queue_nr_requests = 256;

Change the 256 to 2048.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Errors in the VM - detailed
  2002-02-01 18:57             ` Jens Axboe
@ 2002-02-02 14:52               ` Roy Sigurd Karlsbakk
  0 siblings, 0 replies; 27+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-02-02 14:52 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Roger Larsson, Andrew Morton, linux-kernel

> > Hmm.. suppose this is the problem anyway and that Jens patch was not enough.
> > How do the disk drive sound during the test?
> >
> > Does it start to sound more when performance goes down?
>
> Yes that would be interesting to know, if the disk becomes seek bound.

The performance never goes down. It's stable at ~40-43MB/s. It DID go
down, but that was before -rmap11c. Then the problem was in the VM.

> Probably, my patch was really just a quick try to see if it changed
> anything.
>
> > The number of READ requests limits the number of concurrent streams,
> > and the number of READA requests limits the maximum total read-ahead.
>
> Correct, Roy you could try and change the READA balance by allocating
> lots more READA requests. Simply play around with the
> queue_nr_requests / 4 setting. Try something "absurd" like
> queue_nr_requests << 2 or even bigger.

sure.

where do I change this???

--
Roy Sigurd Karlsbakk, MCSE, MCNE, CLS, LCA

Computers are like air conditioners.
They stop working when you open Windows.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Errors in the VM - detailed
  2002-02-02 14:44               ` Jens Axboe
@ 2002-02-02 15:03                 ` Roy Sigurd Karlsbakk
  2002-02-02 15:06                   ` Jens Axboe
  0 siblings, 1 reply; 27+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-02-02 15:03 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Roger Larsson, Andrew Morton, linux-kernel

On Sat, 2 Feb 2002, Jens Axboe wrote:

> On Sat, Feb 02 2002, Roy Sigurd Karlsbakk wrote:
> > > Jens said earlier "Roy, please try and change
> > > the queue_nr_requests assignment in ll_rw_blk:blk_dev_init() to
> > > something like 2048." - Roy have you tested this too?
> >
> > No ... Where do I change it?
>
> drivers/block/ll_rw_blk.c:blk_dev_init()
> {
> 	queue_nr_requests = 64;
> 	if (total_ram > MB(32))
> 		queue_nr_requests = 256;
>
> Change the 256 to 2048.

Is this an attempt to fix problem #2 (as described in the initial
email), or to further improve throughput?

Problem #2 is _the_ worst problem, as it makes the server more-or-less
unusable.

--
Roy Sigurd Karlsbakk, MCSE, MCNE, CLS, LCA

Computers are like air conditioners.
They stop working when you open Windows.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Errors in the VM - detailed
  2002-02-02 15:03                 ` Roy Sigurd Karlsbakk
@ 2002-02-02 15:06                   ` Jens Axboe
  2002-02-02 15:22                     ` Errors in the VM - detailed (or is it Tux?) Roy Sigurd Karlsbakk
  0 siblings, 1 reply; 27+ messages in thread
From: Jens Axboe @ 2002-02-02 15:06 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk; +Cc: Roger Larsson, Andrew Morton, linux-kernel

On Sat, Feb 02 2002, Roy Sigurd Karlsbakk wrote:
> On Sat, 2 Feb 2002, Jens Axboe wrote:
> 
> > On Sat, Feb 02 2002, Roy Sigurd Karlsbakk wrote:
> > > > Jens said earlier "Roy, please try and change
> > > > the queue_nr_requests assignment in ll_rw_blk:blk_dev_init() to
> > > > something like 2048." - Roy have you tested this too?
> > >
> > > No ... Where do I change it?
> >
> > drivers/block/ll_rw_blk.c:blk_dev_init()
> > {
> > 	queue_nr_requests = 64;
> > 	if (total_ram > MB(32))
> > 		queue_nr_requests = 256;
> >
> > Change the 256 to 2048.
> 
> Is this an attempt to fix problem #2 (as described in the initial
> email), or to further improve throughput?

Further "improvement"; the question is whether it will make a difference.
Bumping the READA count would be interesting too, as outlined.

> Problem #2 is _the_ worst problem, as it makes the server more-or-less
> unusable

Please send sysrq-t traces for such stuck processes. It's _impossible_
to guess what's going on from here; the crystal ball just isn't good
enough :-)

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Errors in the VM - detailed (or is it Tux?)
  2002-02-02 15:06                   ` Jens Axboe
@ 2002-02-02 15:22                     ` Roy Sigurd Karlsbakk
  2002-02-02 15:31                       ` Errors in the VM - detailed (or is it Tux? or rmap? or those together...) Roger Larsson
  0 siblings, 1 reply; 27+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-02-02 15:22 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Roger Larsson, Andrew Morton, linux-kernel, Tux mailing list

[-- Attachment #1: Type: TEXT/PLAIN, Size: 676 bytes --]

> > Problem #2 is _the_ worst problem, as it makes the server more-or-less
> > unusable
>
> Please send sysrq-t traces for such stuck processes. It's _impossible_
> to guess whats going on from here, the crystal ball just isn't good
> enough :-)

Decoded sysrq+t is attached.

I've found that only the first 60 wget processes started from the remote
machine are being serviced. After they are done, Tux hangs, using 100%
system time, still open on port ## (80), but doesn't do anything.

I don't understand anything...

Thanks, guys. You're of great help!

--
Roy Sigurd Karlsbakk, MCSE, MCNE, CLS, LCA

Computers are like air conditioners.
They stop working when you open Windows.

[-- Attachment #2: Type: TEXT/PLAIN, Size: 12318 bytes --]

ksymoops 2.4.1 on i686 2.4.17.  Options used
     -v ../linux/vmlinux (specified)
     -k ksyms (specified)
     -l /proc/modules (default)
     -o /lib/modules/2.4.17/ (default)
     -m ../linux/System.map (specified)

Warning (expand_objects): object /lib/modules/2.4.17pronto4/kernel/drivers/net/e1000.o for module e1000 has changed since load
Warning (compare_ksyms_lsmod): module fat is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module loop is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module nls_iso8859-15 is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module vfat is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module ymfpci is in lsmod but not in ksyms, probably no symbols exported
Error (compare_ksyms_lsmod): module e1000 is in ksyms but not in lsmod
Warning (compare_maps): mismatch on symbol partition_name  , ksyms_base says c01c9780, vmlinux says c014c540.  Ignoring ksyms_base entry
Call Trace: [<c011017a>] [<c01100b0>] [<c013d361>] [<c013d710>] [<c0106e8b>] 
Call Trace: [<c011e33b>] [<c0105708>] 
Call Trace: [<c011017a>] [<c01100b0>] [<c010e9b6>] [<c010f335>] [<c01056ff>] 
   [<c0105708>] 
Call Trace: [<c0116e9e>] [<c0105708>] 
Call Trace: [<c011017a>] [<c01100b0>] [<c0110802>] [<c0129eb9>] [<c0129d03>] 
   [<c0105708>] 
Call Trace: [<c01107ad>] [<c0134215>] [<c0105708>] 
Call Trace: [<c011017a>] [<c01100b0>] [<c01342a2>] [<c0105708>] 
Call Trace: [<c01cd846>] [<c0105708>] 
Call Trace: [<c011017a>] [<c01100b0>] [<c0110802>] [<c01b559e>] [<c0105708>] 
Call Trace: [<c01107ad>] [<c015ba69>] [<c015b920>] [<c0105708>] 
Call Trace: [<c0110117>] [<c013d361>] [<c013d710>] [<c0106e8b>] 
Call Trace: [<c01128f1>] [<c014a8c1>] [<c0130656>] [<c0106e8b>] 
Call Trace: [<c0110117>] [<c013d967>] [<c013d99c>] [<c013db9d>] [<c01d07ec>] 
   [<c0106e8b>] 
Call Trace: [<c0110117>] [<c013d967>] [<c013d99c>] [<c013db9d>] [<c01301fc>] 
   [<c0106e8b>] 
Call Trace: [<c0110117>] [<c0241d2f>] [<c0175fb4>] [<c0105708>] 
Call Trace: [<c0110117>] [<c0241d2f>] [<c0184115>] [<c0105708>] 
Call Trace: [<c023ef14>] [<c0105708>] 
Call Trace: [<c0110117>] [<c01301fc>] [<c0241d2f>] [<c0175fb4>] [<c0105708>] 
Call Trace: [<c0110117>] [<c01301fc>] [<c0241d2f>] [<c0175fb4>] [<c0105708>] 
Call Trace: [<c0110117>] [<c01301fc>] [<c0241d2f>] [<c0175fb4>] [<c0105708>] 
Call Trace: [<c0110117>] [<c01301fc>] [<c0241d2f>] [<c0175fb4>] [<c0105708>] 
Call Trace: [<c0110117>] [<c01301fc>] [<c0241d2f>] [<c0175fb4>] [<c0105708>] 
Call Trace: [<c0110117>] [<c01301fc>] [<c0241d2f>] [<c0175fb4>] [<c0105708>] 
Call Trace: [<c0110117>] [<c01301fc>] [<c0241d2f>] [<c0175fb4>] [<c0105708>] 
Call Trace: [<c010b772>] [<c0106e8b>] 
Call Trace: [<c011017a>] [<c01100b0>] [<c022804a>] [<c0105708>] 
Call Trace: [<c0115bee>] [<c0106e8b>] 
Call Trace: [<c02308f0>] [<c0232452>] [<c0106e8b>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c022630f>] [<c0105708>] 
Call Trace: [<c0110117>] [<c0194f2d>] [<c0191008>] [<c0130656>] [<c0106e8b>] 
Call Trace: [<c0115bee>] [<c0106e8b>] 
Call Trace: [<c0110117>] [<c0194f2d>] [<c0191008>] [<c0130656>] [<c0106e8b>] 
Call Trace: [<c011017a>] [<c01100b0>] [<c011a4c6>] [<c0106e8b>] 
Warning (Oops_read): Code line not seen, dumping what data is available

Trace; c011017a <schedule_timeout+7a/a0>
Trace; c01100b0 <process_timeout+0/50>
Trace; c013d361 <do_select+1a1/1e0>
Trace; c013d710 <sys_select+340/480>
Trace; c0106e8b <system_call+33/38>
Trace; c011e33b <context_thread+fb/1a0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c011017a <schedule_timeout+7a/a0>
Trace; c01100b0 <process_timeout+0/50>
Trace; c010e9b6 <apm_mainloop+96/c0>
Trace; c010f335 <apm+2b5/2d0>
Trace; c01056ff <kernel_thread+1f/40>
Trace; c0105708 <kernel_thread+28/40>
Trace; c0116e9e <ksoftirqd+6e/b0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c011017a <schedule_timeout+7a/a0>
Trace; c01100b0 <process_timeout+0/50>
Trace; c0110802 <interruptible_sleep_on_timeout+42/60>
Trace; c0129eb9 <wakeup_memwaiters+79/100>
Trace; c0129d03 <kswapd+2c3/2d0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c01107ad <interruptible_sleep_on+3d/50>
Trace; c0134215 <bdflush+95/a0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c011017a <schedule_timeout+7a/a0>
Trace; c01100b0 <process_timeout+0/50>
Trace; c01342a2 <kupdate+82/110>
Trace; c0105708 <kernel_thread+28/40>
Trace; c01cd846 <md_thread+d6/150>
Trace; c0105708 <kernel_thread+28/40>
Trace; c011017a <schedule_timeout+7a/a0>
Trace; c01100b0 <process_timeout+0/50>
Trace; c0110802 <interruptible_sleep_on_timeout+42/60>
Trace; c01b559e <rtl8139_thread+9e/1b0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c01107ad <interruptible_sleep_on+3d/50>
Trace; c015ba69 <kjournald+129/1a0>
Trace; c015b920 <commit_timeout+0/10>
Trace; c0105708 <kernel_thread+28/40>
Trace; c0110117 <schedule_timeout+17/a0>
Trace; c013d361 <do_select+1a1/1e0>
Trace; c013d710 <sys_select+340/480>
Trace; c0106e8b <system_call+33/38>
Trace; c01128f1 <do_syslog+c1/2e0>
Trace; c014a8c1 <kmsg_read+11/20>
Trace; c0130656 <sys_read+96/d0>
Trace; c0106e8b <system_call+33/38>
Trace; c0110117 <schedule_timeout+17/a0>
Trace; c013d967 <do_poll+87/e0>
Trace; c013d99c <do_poll+bc/e0>
Trace; c013db9d <sys_poll+1dd/2f0>
Trace; c01d07ec <sys_socketcall+1dc/200>
Trace; c0106e8b <system_call+33/38>
Trace; c0110117 <schedule_timeout+17/a0>
Trace; c013d967 <do_poll+87/e0>
Trace; c013d99c <do_poll+bc/e0>
Trace; c013db9d <sys_poll+1dd/2f0>
Trace; c01301fc <filp_close+5c/70>
Trace; c0106e8b <system_call+33/38>
Trace; c0110117 <schedule_timeout+17/a0>
Trace; c0241d2f <svc_recv+1af/3e0>
Trace; c0175fb4 <nfsd+d4/2f0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c0110117 <schedule_timeout+17/a0>
Trace; c0241d2f <svc_recv+1af/3e0>
Trace; c0184115 <lockd+125/250>
Trace; c0105708 <kernel_thread+28/40>
Trace; c023ef14 <rpciod+164/210>
Trace; c0105708 <kernel_thread+28/40>
Trace; c0110117 <schedule_timeout+17/a0>
Trace; c01301fc <filp_close+5c/70>
Trace; c0241d2f <svc_recv+1af/3e0>
Trace; c0175fb4 <nfsd+d4/2f0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c0110117 <schedule_timeout+17/a0>
Trace; c01301fc <filp_close+5c/70>
Trace; c0241d2f <svc_recv+1af/3e0>
Trace; c0175fb4 <nfsd+d4/2f0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c0110117 <schedule_timeout+17/a0>
Trace; c01301fc <filp_close+5c/70>
Trace; c0241d2f <svc_recv+1af/3e0>
Trace; c0175fb4 <nfsd+d4/2f0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c0110117 <schedule_timeout+17/a0>
Trace; c01301fc <filp_close+5c/70>
Trace; c0241d2f <svc_recv+1af/3e0>
Trace; c0175fb4 <nfsd+d4/2f0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c0110117 <schedule_timeout+17/a0>
Trace; c01301fc <filp_close+5c/70>
Trace; c0241d2f <svc_recv+1af/3e0>
Trace; c0175fb4 <nfsd+d4/2f0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c0110117 <schedule_timeout+17/a0>
Trace; c01301fc <filp_close+5c/70>
Trace; c0241d2f <svc_recv+1af/3e0>
Trace; c0175fb4 <nfsd+d4/2f0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c0110117 <schedule_timeout+17/a0>
Trace; c01301fc <filp_close+5c/70>
Trace; c0241d2f <svc_recv+1af/3e0>
Trace; c0175fb4 <nfsd+d4/2f0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c010b772 <sys_pause+12/20>
Trace; c0106e8b <system_call+33/38>
Trace; c011017a <schedule_timeout+7a/a0>
Trace; c01100b0 <process_timeout+0/50>
Trace; c022804a <logger_thread+13a/1a0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c0115bee <sys_wait4+35e/390>
Trace; c0106e8b <system_call+33/38>
Trace; c02308f0 <event_loop+50/190>
Trace; c0232452 <__sys_tux+372/a00>
Trace; c0106e8b <system_call+33/38>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c022630f <cachemiss_thread+16f/1c0>
Trace; c0105708 <kernel_thread+28/40>
Trace; c0110117 <schedule_timeout+17/a0>
Trace; c0194f2d <read_chan+3ad/700>
Trace; c0191008 <tty_read+b8/e0>
Trace; c0130656 <sys_read+96/d0>
Trace; c0106e8b <system_call+33/38>
Trace; c0115bee <sys_wait4+35e/390>
Trace; c0106e8b <system_call+33/38>
Trace; c0110117 <schedule_timeout+17/a0>
Trace; c0194f2d <read_chan+3ad/700>
Trace; c0191008 <tty_read+b8/e0>
Trace; c0130656 <sys_read+96/d0>
Trace; c0106e8b <system_call+33/38>
Trace; c011017a <schedule_timeout+7a/a0>
Trace; c01100b0 <process_timeout+0/50>
Trace; c011a4c6 <sys_nanosleep+116/1f0>
Trace; c0106e8b <system_call+33/38>


8 warnings and 1 error issued.  Results may not be reliable.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Errors in the VM - detailed (or is it Tux? or rmap? or those together...)
  2002-02-02 15:22                     ` Errors in the VM - detailed (or is it Tux?) Roy Sigurd Karlsbakk
@ 2002-02-02 15:31                       ` Roger Larsson
  2002-02-02 15:38                         ` Roy Sigurd Karlsbakk
  0 siblings, 1 reply; 27+ messages in thread
From: Roger Larsson @ 2002-02-02 15:31 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk, Jens Axboe
  Cc: Andrew Morton, linux-kernel, Tux mailing list

On Saturday, 2 February 2002, 16:22, Roy Sigurd Karlsbakk wrote:
> > > Problem #2 is _the_ worst problem, as it makes the server more-or-less
> > > unusable
> >
> > Please send sysrq-t traces for such stuck processes. It's _impossible_
> > to guess whats going on from here, the crystal ball just isn't good
> > enough :-)
>
> Decoded sysrq+t is attached.
>
> I've found that only the first 60 wget processes started from the remote
> machine are being serviced. After they are done, Tux hangs, using 100%
> system time, still open on port ## (80), but doesn't do anything.
>
> I don't understand anything...

I have reread the first mail in this series - I would say that Bug#2 is much
worse than Bug#1, since Bug#1 is "only" a performance problem,
but Bug#2 is about correctness...

Are you 100% sure that tux works with rmap?

I would suggest testing the simplest possible case.
* Standard kernel
* concurrent dd's

What can your problem be?
* something to do with the VM - but then the problem is in several different VMs...
* something to do with read-ahead? You got some patch suggestions -
  please use them on a standard kernel, not rmap (for now...)

/RogerL

-- 
Roger Larsson
Skellefteå
Sweden

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Errors in the VM - detailed (or is it Tux? or rmap? or those together...)
  2002-02-02 15:31                       ` Errors in the VM - detailed (or is it Tux? or rmap? or those together...) Roger Larsson
@ 2002-02-02 15:38                         ` Roy Sigurd Karlsbakk
  2002-02-02 16:24                           ` Roger Larsson
  0 siblings, 1 reply; 27+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-02-02 15:38 UTC (permalink / raw)
  To: Roger Larsson; +Cc: Jens Axboe, Andrew Morton, linux-kernel, Tux mailing list

> I have reread the first mail in this series - I would say that Bug#2 is much
> worse than Bug#1, since Bug#1 is "only" a performance problem,
> but Bug#2 is about correctness...
>
> Are you 100% sure that tux works with rmap?

Of course not. How can I be sure???

> I would suggest testing the simplest possible case.
> * Standard kernel
> * concurrent dd's

Won't work. Then all I get is (ref prob #1) good throughput until RAMx2
bytes are read from disk. Then it all drops to ~1MB/s. See
http://karlsbakk.net/dev/kernel/vm-fsckup.txt for more details.

> What can your problem be:
> * something to do with the VM - but the problem is in several different VMs...
> * something to do with read ahead? you got some patch suggestions -
>   please use them on a standard kernel, not rmap (for now...)

Then fix the problem rmap11c fixed. I first need that fixed before being
able to do any further testing!

roy

--
Roy Sigurd Karlsbakk, MCSE, MCNE, CLS, LCA

Computers are like air conditioners.
They stop working when you open Windows.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Errors in the VM - detailed (or is it Tux? or rmap? or those together...)
  2002-02-02 15:38                         ` Roy Sigurd Karlsbakk
@ 2002-02-02 16:24                           ` Roger Larsson
  2002-02-02 16:39                             ` Roy Sigurd Karlsbakk
  0 siblings, 1 reply; 27+ messages in thread
From: Roger Larsson @ 2002-02-02 16:24 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk; +Cc: Jens Axboe, Andrew Morton, linux-kernel

On Saturday, 2 February 2002, 16:38, Roy Sigurd Karlsbakk wrote:
> > I have reread the first mail in this series - I would say that Bug#2 is
> > much worse than Bug#1. This since Bug#1 is "only" a performance problem,
> > but Bug#2 is about correctness...
> >
> > Are you 100% sure that tux works with rmap?
>
> Of course not. How can I be sure???
>
> > I would suggest testing the simplest possible case.
> > * Standard kernel
> > * concurrent dd:s
>
> Won't work. Then all I get is (ref prob #1) good throughput until RAMx2
> bytes is read from disk. Then it all falls down to ~1MB/s. See
> http://karlsbakk.net/dev/kernel/vm-fsckup.txt for more details.

How do you know that it gets into this at RAMx2? Have you added 'bi' from
vmstat?

One interesting thing to notice from vmstat is...

r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy id
When performing nicely:
0 200  1   1676   3200   3012 786004   0 292 42034   298  791   745   4  29 67
0 200  1   1676   3308   3136 785760   0   0 44304     0  748   758   3  15 82
0 200  1   1676   3296   3232 785676   0   0 44236     0  756   710   2  23 75
Later when being slow:
0 200  0   3468   3316   4060 784668   0   0  1018     0  613   631   1   2 97
0 200  0   3468   3292   4060 784688   0   0  1034     0  617   638   0   3 97
0 200  0   3468   3200   4068 784772   0   0  1066     6  694   727   2   4 94

No swap activity (si + so == 0), mostly idle (id > 90).
So it is waiting - on what??? timer? disk?

>
> > What can your problem be:
> > * something to do with the VM - but the problem is in several different
> >   VMs...
> > * something to do with read ahead? you got some patch suggestions -
> >   please use them on a standard kernel, not rmap (for now...)
>
> Then fix the problem rmap11c fixed. I first need that fixed before being
> able to do any further testing!

Roy, did you notice the mail from Andrew Morton:
> heh.  Yep, Roger finally nailed it, I think.
> 
> Roy says the bug was fixed in rmap11c.  Changelog says:
> 
> 
> rmap 11c:
>   ...
>   - elevator improvement                                  (Andrew Morton)
> 
> Which includes:
> 
> -       queue_nr_requests = 64;
> -       if (total_ram > MB(32))
> -               queue_nr_requests = 128;
> +       queue_nr_requests = (total_ram >> 9) & ~15;     /* One per half-megabyte */
> +       if (queue_nr_requests < 32)
> +               queue_nr_requests = 32;
> +       if (queue_nr_requests > 1024)
> +               queue_nr_requests = 1024;

rmap11c changed queue_nr_requests, and that problem went away.
But another one showed its ugly head...

Could you please try this part of rmap11c only? Or the very simple one,
setting queue_nr_requests = 2048, for a test drive...

/RogerL

-- 
Roger Larsson
Skellefteå
Sweden

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Errors in the VM - detailed (or is it Tux? or rmap? or those together...)
  2002-02-02 16:24                           ` Roger Larsson
@ 2002-02-02 16:39                             ` Roy Sigurd Karlsbakk
  2002-02-02 16:52                               ` Roy Sigurd Karlsbakk
  0 siblings, 1 reply; 27+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-02-02 16:39 UTC (permalink / raw)
  To: Roger Larsson; +Cc: Jens Axboe, Andrew Morton, linux-kernel

> How do you know that it gets into this at RAMx2? Have you added 'bi' from
> vmstat?

yes

> One interesting thing to notice from vmstat is...
>
> r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy id
> When performing nicely:
> 0 200  1   1676   3200   3012 786004   0 292 42034   298  791   745   4  29 67
> 0 200  1   1676   3308   3136 785760   0   0 44304     0  748   758   3  15 82
> 0 200  1   1676   3296   3232 785676   0   0 44236     0  756   710   2  23 75
> Later when being slow:
> 0 200  0  3468   3316  4060 784668  0   0  1018    0  613   631   1   2 97
> 0 200  0  3468   3292  4060 784688  0   0  1034    0  617   638   0   3 97
> 0 200  0  3468   3200  4068 784772  0   0  1066    6  694   727   2   4 94
>
> No swap activity (si + so == 0), mostly idle (id > 90).
> So it is waiting - on what??? timer? disk?

I don't know. All I know is that with rmap-11c, it works

> Roy, did you notice the mail from Andrew Morton:
> > heh.  Yep, Roger finally nailed it, I think.
> >
> > Roy says the bug was fixed in rmap11c.  Changelog says:
> >
> >
> > rmap 11c:
> >   ...
> >   - elevator improvement                                  (Andrew Morton)
> >
> > Which includes:
> >
> > -       queue_nr_requests = 64;
> > -       if (total_ram > MB(32))
> > -               queue_nr_requests = 128;
> > +       queue_nr_requests = (total_ram >> 9) & ~15;     /* One per half-megabyte */
> > +       if (queue_nr_requests < 32)
> > +               queue_nr_requests = 32;
> > +       if (queue_nr_requests > 1024)
> > +               queue_nr_requests = 1024;
>
> rmap11c changed the queue_nr_requests, that problem went away.
> But another one showed its ugly head...
>
> Could you please try this part of rmap11c only? Or the very simple one
> setting queue_nr_request to = 2048 for a test drive...

You mean - on a 2.4.1[18](-pre.)? kernel?

I'll try

--
Roy Sigurd Karlsbakk, MCSE, MCNE, CLS, LCA

Computers are like air conditioners.
They stop working when you open Windows.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Errors in the VM - detailed (or is it Tux? or rmap? or those together...)
  2002-02-02 16:39                             ` Roy Sigurd Karlsbakk
@ 2002-02-02 16:52                               ` Roy Sigurd Karlsbakk
  2002-02-02 17:29                                 ` Roger Larsson
  0 siblings, 1 reply; 27+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-02-02 16:52 UTC (permalink / raw)
  To: Roger Larsson; +Cc: Jens Axboe, Andrew Morton, linux-kernel

> > Roy, did you notice the mail from Andrew Morton:
> > > heh.  Yep, Roger finally nailed it, I think.
> > >
> > > Roy says the bug was fixed in rmap11c.  Changelog says:
> > >
> > >
> > > rmap 11c:
> > >   ...
> > >   - elevator improvement                                  (Andrew Morton)
> > >
> > > Which includes:
> > >
> > > -       queue_nr_requests = 64;
> > > -       if (total_ram > MB(32))
> > > -               queue_nr_requests = 128;
> > > +       queue_nr_requests = (total_ram >> 9) & ~15;     /* One per half-megabyte */
> > > +       if (queue_nr_requests < 32)
> > > +               queue_nr_requests = 32;
> > > +       if (queue_nr_requests > 1024)
> > > +               queue_nr_requests = 1024;
> >
> > rmap11c changed the queue_nr_requests, that problem went away.
> > But another one showed its ugly head...
> >
> > Could you please try this part of rmap11c only? Or the very simple one
> > setting queue_nr_request to = 2048 for a test drive...
>
> You mean - on a 2.4.1[18](-pre.)? kernel?
>
> I'll try

er..

# grep queue_nr_requests /usr/src/packed/k/2.4.17-rmap-11c
#


---
Roy Sigurd Karlsbakk, MCSE, MCNE, CLS, LCA

Computers are like air conditioners.
They stop working when you open Windows.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Errors in the VM - detailed (or is it Tux? or rmap? or those together...)
  2002-02-02 16:52                               ` Roy Sigurd Karlsbakk
@ 2002-02-02 17:29                                 ` Roger Larsson
  2002-02-02 17:45                                   ` Errors in the VM - detailed Roy Sigurd Karlsbakk
  0 siblings, 1 reply; 27+ messages in thread
From: Roger Larsson @ 2002-02-02 17:29 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk; +Cc: Jens Axboe, Andrew Morton, linux-kernel

Hi again Roy,

> er..
> 
> # grep queue_nr_requests /usr/src/packed/k/2.4.17-rmap-11c
> #
Andrew did supply a patch to Riel, but he did not accept all of it?

Let's see again. Do I understand you correctly:
rmap 11c fixes problem #1 but 11b does not? Are all later
rmaps good?

rmap 11c:
  - oom_kill race locking fix                             (Andres Salomon)
  - elevator improvement                                  (Andrew Morton)
  - dirty buffer writeout speedup (hopefully ;))          (me)
  - small documentation updates                           (me)
  - page_launder() never does synchronous IO, kswapd
    and the processes calling it sleep on higher level    (me)
  - deadlock fix in touch_page()                          (me)
rmap 11b:

Let's see: not an OOM condition, no dirty buffers (this is a read-only load),
not documentation, not page_launder() (again, nothing dirty), not the deadlock.
What remains is the elevator... and that really could be it!
(it is read-ahead related too...)

and 2.4.18-pre2 (or later) does not fix it?

2.4.18-pre2:
- ...
- Fix elevator insertion point on failed
  request merge					(Jens Axboe)
- ...
pre1:

-- 
Roger Larsson
Skellefteå
Sweden

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Errors in the VM - detailed
  2002-02-02 17:29                                 ` Roger Larsson
@ 2002-02-02 17:45                                   ` Roy Sigurd Karlsbakk
  0 siblings, 0 replies; 27+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-02-02 17:45 UTC (permalink / raw)
  To: Roger Larsson; +Cc: Jens Axboe, Andrew Morton, linux-kernel

> Andrew did supply a patch for Riel but he did not accept all of it?
>
> Let's see again. Do I understand you correctly:
> rmap 11c fixes problem #1 but 11b does not? Are all later
> rmaps good?

I've just tried 11c and 12a. Both are good. The change was made between
11b and 11c.

>
> rmap 11c:
>   - oom_kill race locking fix                             (Andres Salomon)
>   - elevator improvement                                  (Andrew Morton)
>   - dirty buffer writeout speedup (hopefully ;))          (me)
>   - small documentation updates                           (me)
>   - page_launder() never does synchronous IO, kswapd
>     and the processes calling it sleep on higher level    (me)
>   - deadlock fix in touch_page()                          (me)
> rmap 11b:
>
> Let's see: not an OOM condition, no dirty buffers (this is a read-only load),
> not documentation, not page_launder() (again, nothing dirty), not the deadlock.
> What remains is the elevator... and that really could be it!
> (it is read-ahead related too...)
>
> and 2.4.18-pre2 (or later) does not fix it?

I'll try.

>
> 2.4.18-pre2:
> - ...
> - Fix elevator insertion point on failed
>   request merge					(Jens Axboe)
> - ...
> pre1:

btw... I believe error #2 is Tux-specific. I'm debugging it now. Sorry
about that

roy

--
Roy Sigurd Karlsbakk, MCSE, MCNE, CLS, LCA

Computers are like air conditioners.
They stop working when you open Windows.


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2002-02-02 17:46 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-01-31 15:05 Errors in the VM - detailed Roy Sigurd Karlsbakk
2002-01-31 15:44 ` David Mansfield
2002-01-31 20:21 ` Roger Larsson
2002-01-31 20:29   ` Jens Axboe
2002-01-31 20:43     ` Andrew Morton
2002-01-31 21:37       ` Jens Axboe
2002-02-01 16:05         ` Roy Sigurd Karlsbakk
2002-02-01 16:11         ` Roy Sigurd Karlsbakk
2002-02-01 18:44           ` Roger Larsson
2002-02-01 18:52             ` Roger Larsson
2002-02-01 18:57             ` Jens Axboe
2002-02-02 14:52               ` Roy Sigurd Karlsbakk
2002-02-02 14:43             ` Roy Sigurd Karlsbakk
2002-02-02 14:43             ` Roy Sigurd Karlsbakk
2002-02-02 14:44               ` Jens Axboe
2002-02-02 15:03                 ` Roy Sigurd Karlsbakk
2002-02-02 15:06                   ` Jens Axboe
2002-02-02 15:22                     ` Errors in the VM - detailed (or is it Tux?) Roy Sigurd Karlsbakk
2002-02-02 15:31                       ` Errors in the VM - detailed (or is it Tux? or rmap? or those together...) Roger Larsson
2002-02-02 15:38                         ` Roy Sigurd Karlsbakk
2002-02-02 16:24                           ` Roger Larsson
2002-02-02 16:39                             ` Roy Sigurd Karlsbakk
2002-02-02 16:52                               ` Roy Sigurd Karlsbakk
2002-02-02 17:29                                 ` Roger Larsson
2002-02-02 17:45                                   ` Errors in the VM - detailed Roy Sigurd Karlsbakk
2002-02-01 17:39 ` Denis Vlasenko
     [not found] <OF675D993F.933C6CB9-ON88256B52.00595CCC@boulder.ibm.com>
2002-02-01 11:52 ` Roy Sigurd Karlsbakk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox