public inbox for linux-kernel@vger.kernel.org
* 4G SGI quad Xeon - memory-related slowdowns
@ 2001-01-15 20:59 Paul Hubbard
  2001-01-15 21:32 ` Linus Torvalds
  2001-01-16 20:14 ` Mark Hemment
  0 siblings, 2 replies; 4+ messages in thread
From: Paul Hubbard @ 2001-01-15 20:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: Richard E. Jetton


We're having some problems with the 2.4.0 kernel on our SGI 1450 and
are hoping for some help.
The box is a quad Xeon 700/2MB with 4GB of memory, a ServerSet III HE
chipset, and a RH6.1 distribution (slightly modified for local
configuration).

a) If we compile the kernel with no high memory support, /proc/meminfo
shows 1G of memory and everything works fine.

b) If we compile for 4G of memory, /proc/meminfo shows about 3G, and
overriding the amount at the lilo prompt causes kernel panics at bootup.
However, other than missing a quarter of the memory, it works just fine.

c) If we compile the kernel for 64G high memory (PAE mode), we see all
of the memory but have other problems:
  i) mke2fs -m0 on a 72GB Seagate SCSI disk runs very slowly (about
5MB/sec instead of 22-25MB/sec), and the machine hangs after the
format completes. To be exact, the command prompt returns, but ls or
any other command never returns, and you have to reset the box. This
is a showstopper for us!

  ii) If I override the amount of memory via lilo, we still get the
hang, but performance actually improves! At 1G, it's slow for a few
seconds and then runs fine. At 2G, it's slow, and when I tried to boot
with 3G I got an odd startup crash that I've not had time to
replicate.

Other notes: 

1) SCSI is onboard Adaptec 39160 (aic7xxx driver, dual-channel) and
we've tried different drives, cables, terminators, etc. 

2) Other block I/O writes (e.g. dd if=/dev/zero of=/dev/sdi bs=4M)
also run very slowly.
3) We are using vmstat 1 to monitor data rates.
4) I tried the format with a 2.4 prerelease kernel; the mkfs was very
slow, and I got a SCSI reset at the end of the format. Perhaps this is
related?
5) If necessary, we can easily load a different distribution on the
machine if that might be part of the problem.

If necessary, we can set up a login on the machine or run whatever
test code is needed. Other than this, it's a pretty nice box to work
on.

Please reply to rjetton and phubbard at fnal.gov, thanks.

-Paul

-- 
Paul Hubbard  phubbard@fnal.gov
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/


* Re: 4G SGI quad Xeon - memory-related slowdowns
  2001-01-15 20:59 4G SGI quad Xeon - memory-related slowdowns Paul Hubbard
@ 2001-01-15 21:32 ` Linus Torvalds
  2001-01-15 22:33   ` Ingo Molnar
  2001-01-16 20:14 ` Mark Hemment
  1 sibling, 1 reply; 4+ messages in thread
From: Linus Torvalds @ 2001-01-15 21:32 UTC (permalink / raw)
  To: linux-kernel

In article <3A6364AF.AC4D4081@fnal.gov>,
Paul Hubbard  <phubbard@fnal.gov> wrote:
>
>We're having some problems with the 2.4.0 kernel on our SGI 1450, and
>were hoping for some help.
> The box is a quad Xeon 700/2MB, with 4GB of memory, ServerSet III HE
>chipset, RH6.1 (slightly modified for local configuration) distribution.
>
>a) If we compile the kernel with no high memory support, /proc/meminfo
>shows 1G of memory and everything works fine.

Good.

>b) If we compile for 4G of memory, /proc/meminfo shows about 3G, and
>overriding the amount at the lilo prompt causes kernel panics at bootup.
>However, other than missing a quarter of the memory, it works just
>fine.

3GB is right - your last 1GB is above the 4GB mark, and it's mapped
there explicitly so that you'll have space in the low 32 bits to map PCI
devices etc (and things like the APIC, you get the idea). 

If you try to override it, you will very obviously crash, because if you
tell Linux that you have 4GB of memory, Linux will think that you have
4GB of _contiguous_ memory, which is not true.  The only way to use that
last gigabyte is to enable support for memory > 4GB, and get the proper
memory map _without_ any overrides that shows the proper holes for PCI
space. 

Check your "dmesg" output under a working kernel for details - you'll
see how the memory is laid out and reported by the e820 call.
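
This can be checked mechanically. A small Python sketch (using a
hypothetical e820 map for a 4GB box; the "<size> @ <base>" line format
follows what 2.4 prints at boot) totals the usable ranges and shows
how much RAM ends up above the 4GB mark:

```python
import re

# Each 2.4-era e820 boot line reads "BIOS-e820: <size> @ <base> (<type>)".
E820_LINE = re.compile(r"BIOS-e820: ([0-9a-f]+) @ ([0-9a-f]+) \((.+?)\)")

def usable_ranges(dmesg_text):
    """Yield (base, size) for every range the BIOS marked usable."""
    for m in E820_LINE.finditer(dmesg_text):
        if m.group(3) == "usable":
            yield int(m.group(2), 16), int(m.group(1), 16)

# Hypothetical map: a PCI hole below 4GB pushes the last gigabyte
# of RAM above the 4GB mark, as Linus describes.
sample = """\
 BIOS-e820: 00000000000a0000 @ 0000000000000000 (usable)
 BIOS-e820: 00000000bff00000 @ 0000000000100000 (usable)
 BIOS-e820: 0000000040000000 @ 0000000100000000 (usable)
"""

FOUR_GB = 1 << 32
total = sum(size for _base, size in usable_ranges(sample))
above = sum(size for base, size in usable_ranges(sample) if base >= FOUR_GB)
print("total usable: %d MB" % (total >> 20))  # 4095 MB
print("above 4GB:    %d MB" % (above >> 20))  # 1024 MB
```

With the map above, a kernel built without 64GB/PAE support would see
only the first ~3GB, which matches the /proc/meminfo symptom.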

>c) If we compile the kernel for 64G high memory (PAE mode), we see all
>of the memory but have other problems:
>  i) mke2fs -m0 on a 72GB Seagate SCSI disk runs very slowly (about
>5MB/sec instead of 22-25) and the machine hangs after the format
>completes. To be exact, the command prompt returns, but
>     ls or any other command will never return, and you have to reset
>the box. This is a 
>     showstopper for us!

Sounds like a true-to-God bug. Possibly in the form of incorrect MTRR
settings. Make sure you enable MTRR support.

I do need more information on what seems to hang, and how it hangs.
One of the 2.4 pre-release kernels will give you a nice stack
backtrace for each process if you press Control-ScrollLock, and that
might be useful.

>  ii) If I override the amount of memory via lilo, we still get the
>       hang, but performance actually improves!

The performance problem is _probably_ due to the kernel having to
double-buffer the IO requests, coupled with bad MTRR settings (ie memory
above the 4GB range is probably marked as non-cacheable or something,
which means that you'll get really bad performance). 

Not using the high memory will avoid the double-buffering, and will also
avoid using memory that isn't cached. If I'm right.
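
The double-buffering cost can be pictured with a toy model
(illustrative only, not kernel code): any page the device cannot reach
by DMA is first copied into a low "bounce" page, so those requests
move their data twice.

```python
PAGE = 4096

def traffic(page_addrs, dma_limit):
    """Bytes of memory traffic to write the given pages to the device."""
    total = 0
    for addr in page_addrs:
        total += PAGE              # the DMA transfer itself
        if addr >= dma_limit:
            total += PAGE          # extra copy into a low bounce page
    return total

LIMIT = 1 << 32                    # hypothetical DMA reach of the device
low_only = traffic([0x1000, 0x2000], LIMIT)
mixed = traffic([0x1000, LIMIT + 0x1000], LIMIT)
print(low_only, mixed)             # 8192 12288
```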

The hang still indicates that something is wrong in PAE-land, though.

		Linus


* Re: 4G SGI quad Xeon - memory-related slowdowns
  2001-01-15 21:32 ` Linus Torvalds
@ 2001-01-15 22:33   ` Ingo Molnar
  0 siblings, 0 replies; 4+ messages in thread
From: Ingo Molnar @ 2001-01-15 22:33 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel List, Paul Hubbard


On 15 Jan 2001, Linus Torvalds wrote:

> The performance problem is _probably_ due to the kernel having to
> double-buffer the IO requests, coupled with bad MTRR settings (ie
> memory above the 4GB range is probably marked as non-cacheable or
> something, which means that you'll get really bad performance).

the highmem-related double-buffering overhead alone, on this category
of system, is minuscule compared to the other costs of IO, considering
the expected bandwidth (20-30 MB/sec).

the MTRR part could be a problem.

> Not using the high memory will avoid the double-buffering, and will
> also avoid using memory that isn't cached. If I'm right.

> The hang still indicates that something is wrong in PAE-land, though.

it's working just fine on all the 4GB+ systems tested (including 32GB
systems): Intel, Dell and Compaq boxes. So if it's a unique PAE bug,
it must be some boundary condition.

Paul, here is the memory map of my 8GB system:

BIOS-provided physical RAM map:
 BIOS-e820: 000000000009d400 @ 0000000000000000 (usable)
 BIOS-e820: 0000000000002c00 @ 000000000009d400 (reserved)
 BIOS-e820: 0000000000020000 @ 00000000000e0000 (reserved)
 BIOS-e820: 0000000003ef8000 @ 0000000000100000 (usable)
 BIOS-e820: 0000000000007c00 @ 0000000003ff8000 (ACPI data)
 BIOS-e820: 0000000000000400 @ 0000000003fffc00 (ACPI NVS)
 BIOS-e820: 00000000ec000000 @ 0000000004000000 (usable)
 BIOS-e820: 0000000001400000 @ 00000000fec00000 (reserved)
 BIOS-e820: 00000000f0000000 @ 0000000100000000 (usable)

and here are the MTRR settings:

[root@m mingo]# cat /proc/mtrr
reg00: base=0xf0000000 (3840MB), size= 256MB: uncachable, count=1
reg01: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
reg02: base=0x100000000 (4096MB), size=2048MB: write-back, count=1
reg03: base=0x180000000 (6144MB), size=1024MB: write-back, count=1
reg04: base=0x1c0000000 (7168MB), size= 512MB: write-back, count=1
reg05: base=0x1e0000000 (7680MB), size= 256MB: write-back, count=1

i'd suggest using the mem=exactmap feature to force different types of
memory map. E.g. i'm using the following append= line to force an
800 MB setup:

        append="mem=exactmap mem=0x0009d800@0x00000000 mem=0x03ef8000@0x00100000 mem=0x2bffe000@0x04000000"

such mem=exactmap lines can be constructed based on the BIOS output.
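
As a sketch of that construction (Python; illustrative, mirroring the
mem=<size>@<base> syntax shown above), an append= line can be built
from a list of usable (base, size) ranges:

```python
def exactmap(ranges):
    """Build a LILO append= line from (base, size) usable ranges."""
    parts = ["mem=exactmap"]
    for base, size in ranges:
        parts.append("mem=0x%08x@0x%08x" % (size, base))   # mem=<size>@<base>
    return 'append="%s"' % " ".join(parts)

# First two usable ranges from the 800 MB example above.
line = exactmap([(0x00000000, 0x0009d800),
                 (0x00100000, 0x03ef8000)])
print(line)
```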

	Ingo



* Re: 4G SGI quad Xeon - memory-related slowdowns
  2001-01-15 20:59 4G SGI quad Xeon - memory-related slowdowns Paul Hubbard
  2001-01-15 21:32 ` Linus Torvalds
@ 2001-01-16 20:14 ` Mark Hemment
  1 sibling, 0 replies; 4+ messages in thread
From: Mark Hemment @ 2001-01-16 20:14 UTC (permalink / raw)
  To: Paul Hubbard; +Cc: linux-kernel, Richard E. Jetton

[-- Attachment #1: Type: TEXT/PLAIN, Size: 813 bytes --]

Hi Paul,
 
> 2) Other block I/O output (eg dd if=/dev/zero of=/dev/sdi bs=4M) also
> run very slowly

What do you notice when running "top" while doing the above?
Does the "buff" value grow large (700MB+), with high CPU usage?

  If so, I think this might be down to nr_free_buffer_pages().

  This function includes the pages of all zones (including the HIGHMEM
zone) in its calculation, while only DMA and NORMAL zone pages are
used for buffers.  This upsets the result of balance_dirty_state()
(fs/buffer.c), so the required flushing of buffers only happens once
the DMA and NORMAL zones are running very low on pages.
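
The skew is easy to see with toy numbers (illustrative only, shaped
like a 4GB box): balance_dirty_state() sizes its flush thresholds
against nr_free_buffer_pages(), so counting HIGHMEM makes the limit
several times too generous.

```python
# Free pages per zone; buffers can live only in DMA + NORMAL.
zones = {"DMA": 3000, "NORMAL": 200000, "HIGHMEM": 786000}

def nr_free_buffer_pages(include_highmem):
    counted = ["DMA", "NORMAL"] + (["HIGHMEM"] if include_highmem else [])
    return sum(zones[z] for z in counted)

buggy = nr_free_buffer_pages(True)    # what 2.4.0 computes
fixed = nr_free_buffer_pages(False)   # what buffers can actually use
print(buggy, fixed)                   # 989000 203000
```

Here the buggy count is nearly five times the memory actually
available for buffers, so flushing starts far too late.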

  I've attached a "quick hack" I did against 2.4.0.  It doesn't
completely solve the problem, but it moves things in the right
direction.

  Please let me know if this helps.

Mark

[-- Attachment #2: buffer.patch --]
[-- Type: TEXT/PLAIN, Size: 1612 bytes --]

diff -urN -X dontdiff linux-2.4.0/mm/page_alloc.c markhe-2.4.0/mm/page_alloc.c
--- linux-2.4.0/mm/page_alloc.c	Wed Jan  3 17:59:06 2001
+++ markhe-2.4.0/mm/page_alloc.c	Mon Jan 15 15:35:14 2001
@@ -583,6 +583,27 @@
 }
 
 /*
+ *	Free pages in zone "type", and the zones below it.
+ */
+unsigned int nr_free_pages_zone (int type)
+{
+	unsigned int sum;
+	zone_t *zone;
+	pg_data_t *pgdat = pgdat_list;
+
+	if (type >= MAX_NR_ZONES)
+		BUG();
+
+	sum = 0;
+	while (pgdat) {
+		for (zone = pgdat->node_zones; zone <= pgdat->node_zones + type; zone++)
+			sum += zone->free_pages;
+		pgdat = pgdat->node_next;
+	}
+	return sum;
+}
+
+/*
  * Total amount of inactive_clean (allocatable) RAM:
  */
 unsigned int nr_inactive_clean_pages (void)
@@ -600,6 +621,25 @@
 	return sum;
 }
 
+unsigned int nr_inactive_clean_pages_zone(int type)
+{
+	unsigned int sum;
+	zone_t *zone;
+	pg_data_t *pgdat = pgdat_list;
+
+	if (type >= MAX_NR_ZONES)
+		BUG();
+	type++;
+
+	sum = 0;
+	while (pgdat) {
+		for (zone = pgdat->node_zones; zone < pgdat->node_zones + type; zone++)
+			sum += zone->inactive_clean_pages;
+		pgdat = pgdat->node_next;
+	}
+	return sum;
+}
+
 /*
  * Amount of free RAM allocatable as buffer memory:
  */
@@ -607,9 +647,9 @@
 {
 	unsigned int sum;
 
-	sum = nr_free_pages();
-	sum += nr_inactive_clean_pages();
-	sum += nr_inactive_dirty_pages;
+	sum = nr_free_pages_zone(ZONE_NORMAL);
+	sum += nr_inactive_clean_pages_zone(ZONE_NORMAL);
+	sum += nr_inactive_dirty_pages;	/* XXX */
 
 	/*
 	 * Keep our write behind queue filled, even if

