public inbox for linux-kernel@vger.kernel.org
* 2.4 very slow memory access on abit kd7raid (kt400); ten times slower than on kg7raid
From: KORN Andras @ 2002-10-27  3:28 UTC
  To: linux-kernel

Hi,

I recently upgraded the motherboard in a small server. Everything else
stayed the same (discussed below).

Everything slowed down. The easiest way to demonstrate this is by looking at
these figures:

 raid5: measuring checksumming speed
-   8regs     :  2343.600 MB/sec
-   32regs    :  1944.000 MB/sec
-   pIII_sse  :  4163.600 MB/sec
-   pII_mmx   :  3584.400 MB/sec
-   p5_mmx    :  4600.800 MB/sec
-raid5: using function: pIII_sse (4163.600 MB/sec)
+   8regs     :   228.400 MB/sec
+   32regs    :   199.200 MB/sec
+   pIII_sse  :   352.000 MB/sec
+   pII_mmx   :   316.800 MB/sec
+   p5_mmx    :   432.800 MB/sec
+raid5: using function: pIII_sse (352.000 MB/sec)

Old motherboard above, new below. (Why it chose pIII_sse even when p5_mmx
was faster is also an interesting question... :)
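
The size of the slowdown can be read off by dividing each old figure by the corresponding new one. The sketch below does this with the numbers from the log above; the dictionary and function names are illustrative only. If every routine slows down by roughly the same factor, that points at something common to all memory access rather than at a single checksum routine.

```python
# Checksumming speeds in MB/sec, copied from the boot log above.
old = {"8regs": 2343.6, "32regs": 1944.0, "pIII_sse": 4163.6,
       "pII_mmx": 3584.4, "p5_mmx": 4600.8}
new = {"8regs": 228.4, "32regs": 199.2, "pIII_sse": 352.0,
       "pII_mmx": 316.8, "p5_mmx": 432.8}


def slowdown(old_speeds, new_speeds):
    """Return the old/new speed ratio for each checksum routine."""
    return {name: old_speeds[name] / new_speeds[name] for name in old_speeds}


ratios = slowdown(old, new)
for name, ratio in ratios.items():
    print(f"{name:10s} {ratio:5.1f}x slower")
```

All five routines come out between roughly 9.7x and 11.9x slower, which matches the "ten times slower" in the subject line.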

What could be causing this? I believe it is a kernel issue because memtest86
reports realistic memory bandwidths (about 590MB/s).

I'm now seeing obscene load averages (in excess of 50 is not uncommon), and
'system' usage according to 'top' is almost always over 50 percent (it used
to be around 15-20 with the old motherboard).

Switching back to the old motherboard solves the problem. Load drops back to
the usual 0.2-2.

I use a 2.4.20-pre10-ac2 kernel now; I can't easily try a stock kernel
because they tend to fail during boot in spectacular ways (perhaps the MB is
too new?).

The box has four NICs, moderate network traffic (less than 10Mb/s total, on
average; I'm routing and firewalling between the subnets). CPU is an AMD
AthlonXP1800+. I have 1GB of DDR266 RAM.

I use LVM1 (so I guess I can't easily try a 2.5 kernel until the devmapper
stuff gets sorted out); I have two ATA133 disks sitting on the HPT372 that's
integrated on the MB (one disk on each channel). A single EIDE CDROM drive
is hooked up to the VIA IDE controller.

Three of the NICs are tulip cards from KTI (Intel 21143PD); the fourth is
the VIA Rhine integrated on the MB.

I compiled the kernel optimizing for K7; highmem is enabled; I tried both
with APIC enabled and disabled (the MB has one). USB support is compiled as
a module; it doesn't matter whether it's loaded or not. I don't use a
graphical framebuffer (card is an ATI Rage 128). MTRRs are enabled;
/proc/mtrr contains this:

reg00: base=0x00000000 (   0MB), size=1024MB: write-back, count=1
reg01: base=0xc0000000 (3072MB), size= 256MB: write-combining, count=1
reg05: base=0xc0000000 (3072MB), size= 256MB: write-combining, count=1
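
A listing like this can be checked mechanically for two things: registers that repeat each other, and whether all of RAM falls inside a write-back range. The following is a minimal sketch that assumes the exact line format shown above (sizes given in MB); the function names are hypothetical, and the top-of-RAM address 0x3fff0000 is taken from the e820 map further down in this message. It only inspects what the kernel reports, not what the hardware is actually doing.

```python
import re

# Matches lines in the format shown above; sizes are assumed to be in MB.
MTRR_LINE = re.compile(
    r"reg(?P<reg>\d+): base=0x(?P<base>[0-9a-fA-F]+) \(\s*\d+MB\), "
    r"size=\s*(?P<size>\d+)MB: (?P<type>[\w-]+), count=\d+"
)

SAMPLE = """\
reg00: base=0x00000000 (   0MB), size=1024MB: write-back, count=1
reg01: base=0xc0000000 (3072MB), size= 256MB: write-combining, count=1
reg05: base=0xc0000000 (3072MB), size= 256MB: write-combining, count=1
"""


def parse_mtrr(text):
    """Return a list of (register, start, end, type) tuples; end is exclusive."""
    entries = []
    for line in text.splitlines():
        m = MTRR_LINE.match(line.strip())
        if m:
            base = int(m.group("base"), 16)
            size = int(m.group("size")) << 20  # MB to bytes
            entries.append((int(m.group("reg")), base, base + size, m.group("type")))
    return entries


def duplicates(entries):
    """Return (first_reg, repeated_reg) pairs describing identical ranges."""
    seen = {}
    dups = []
    for reg, start, end, mtype in entries:
        key = (start, end, mtype)
        if key in seen:
            dups.append((seen[key], reg))
        else:
            seen[key] = reg
    return dups


def types_covering(entries, start, end):
    """Memory types of all registers that fully contain [start, end)."""
    return [t for _, s, e, t in entries if s <= start and end <= e]


entries = parse_mtrr(SAMPLE)
print("duplicate registers:", duplicates(entries))
print("types covering RAM:", types_covering(entries, 0, 0x3FFF0000))
```

On this listing the sketch flags reg01 and reg05 as identical and reports a single write-back range covering all of RAM, so on paper the configuration looks sane.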

I'm running Debian unstable, a Domino server (yuck), tomcat 3 and apache
1.3.

A google search for 'kt400 linux slow' and similar phrases turned up nothing
that looked useful.

A diff between the bootup messages of the old and the new mb looks like this
(I omit stuff that doesn't seem relevant):

- BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
+ BIOS-e820: 0000000000000000 - 000000000009f000 (usable)
+ BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
  BIOS-e820: 0000000000100000 - 000000003fff0000 (usable)
  BIOS-e820: 000000003fff0000 - 000000003fff3000 (ACPI NVS)
  BIOS-e820: 000000003fff3000 - 0000000040000000 (ACPI data)
+ BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
+ BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
  BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
...
 CPU: L2 Cache: 256K (64 bytes/line)
 Intel machine check architecture supported.
 Intel machine check reporting enabled on CPU#0.
-CPU:     After generic, caps: 0383f9ff c1cbf9ff 00000000 00000000
-CPU:             Common caps: 0383f9ff c1cbf9ff 00000000 00000000
+CPU:     After generic, caps: 0383fbff c1cbfbff 00000000 00000000
+CPU:             Common caps: 0383fbff c1cbfbff 00000000 00000000
 CPU: AMD Athlon(tm) XP 1800+ stepping 02
 Enabling fast FPU save and restore... done.
 Enabling unmasked SIMD FPU exception support... done.
...
 PCI: Using configuration type 1
...
 Bus scan for 01 returning with max=01
 Scanning behind PCI bridge 00:01.0, config 010100, pass 1
 Bus scan for 00 returning with max=01
-PCI: Bus 01 already known
-PCI: Using IRQ router VIA [1106/0686] at 00:07.0
+PCI: Using IRQ router VIA [1106/3177] at 00:11.0
...
 ACPI: Core Subsystem version [20011018]
 ACPI: Subsystem enabled
 ACPI: System firmware supports S0 S1 S4 S5
-Processor[0]: C0 C1, 2 throttling states
+Processor[0]: C0 C1 C2, 2 throttling states
 ACPI: Power Button (FF) found
 ACPI: Multiple power buttons detected, ignoring fixed-feature
 ACPI: Power Button (CM) found
 ACPI: Sleep Button (CM) found
+ACPI: Thermal Zone found
...
-VP_IDE: IDE controller at PCI slot 00:07.1
+VP_IDE: IDE controller at PCI slot 00:11.1
+PCI: No IRQ known for interrupt pin A of device 00:11.1. Please try using pci=biosirq.
(NB: I did that and it didn't help.)
 VP_IDE: chipset revision 6
 VP_IDE: not 100% native mode: will probe irqs later
 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
-VP_IDE: VIA vt82c686b (rev 40) IDE UDMA100 controller on pci00:07.1
-    ide0: BM-DMA at 0xb400-0xb407, BIOS settings: hda:DMA, hdb:pio
-    ide1: BM-DMA at 0xb408-0xb40f, BIOS settings: hdc:pio, hdd:pio
-HPT370A: IDE controller at PCI slot 00:13.0
-PCI: Found IRQ 11 for device 00:13.0
-HPT370A: chipset revision 4
-HPT370A: not 100% native mode: will probe irqs later
+VP_IDE: VIA vt8235 (rev 00) IDE UDMA133 controller on pci00:11.1
+    ide0: BM-DMA at 0xc800-0xc807, BIOS settings: hda:DMA, hdb:pio
+    ide1: BM-DMA at 0xc808-0xc80f, BIOS settings: hdc:pio, hdd:pio
+HPT372: IDE controller at PCI slot 00:13.0
+PCI: Found IRQ 10 for device 00:13.0
+PCI: Sharing IRQ 10 with 00:10.2
+HPT372: chipset revision 5
+HPT372: not 100% native mode: will probe irqs later
 HPT37X: using 33MHz PCI clock
...

What can I do to fix or narrow down the problem with the new motherboard?

Is there any patch that might help?

I tried playing with the BIOS settings for memory access, but that didn't
buy me anything either.

Thanks in advance for any suggestions.

Andrew

Ps. I'm not subscribed; I'll check the archives, but I'd be obliged if you
could Cc: me with replies anyway.

-- 
          Andrew Korn (Korn Andras) <korn at chardonnay.math.bme.hu>
           Finger korn at chardonnay.math.bme.hu for pgp key. QOTD:
               "Make love not war." - "I'm married and do both."


* Re: 2.4 very slow memory access on abit kd7raid (kt400); ten times slower than on kg7raid
From: freaky @ 2002-10-27  7:39 UTC
  To: korn-linuxkernel; +Cc: linux-kernel

> Everything slowed down. The easiest way to demonstrate this is by looking at
> these figures:
>
>  raid5: measuring checksumming speed
> -   8regs     :  2343.600 MB/sec
> -   32regs    :  1944.000 MB/sec
> -   pIII_sse  :  4163.600 MB/sec
> -   pII_mmx   :  3584.400 MB/sec
> -   p5_mmx    :  4600.800 MB/sec
> -raid5: using function: pIII_sse (4163.600 MB/sec)
> +   8regs     :   228.400 MB/sec
> +   32regs    :   199.200 MB/sec
> +   pIII_sse  :   352.000 MB/sec
> +   pII_mmx   :   316.800 MB/sec
> +   p5_mmx    :   432.800 MB/sec
> +raid5: using function: pIII_sse (352.000 MB/sec)
>
> Old motherboard above, new below. (Why it chose pIII_sse even when p5_mmx
> was faster is also an interesting question... :)

I have seen the same with a precompiled Slackware 8.1 raid.s kernel that I
tried for my Promise controller. It's an AMD AthlonXP 2000+. Like you, I found
that pIII_sse was slower than p5_mmx and still got selected. I got higher
numbers than you, though: around your old motherboard's speeds (KT333
chipset, MSI KT3 Ultra2-R). The specs are in the "KT333, IO-APIC, Promise
Fasttak, Initrd" topic.


The RAM disk image spanning 5 disks doesn't even load properly for me; maybe
that is caused by memory problems as well? Though I thought the northbridge
handles memory access, whereas I only get a message that my southbridge
isn't recognized...



* Re: 2.4 very slow memory access on abit kd7raid (kt400); ten times slower than on kg7raid
From: Manfred Spraul @ 2002-10-27  9:16 UTC
  To: KORN Andras, linux-kernel

It could be a bug in the memory detection. I had a similar problem with 
one PC-chips board.

Could you check if
- an explicit "mem=63m" line helps?
- disabling all power management in the BIOS helps?

--
    Manfred



* Re: 2.4 very slow memory access on abit kd7raid (kt400); ten times slower than on kg7raid
From: KORN Andras @ 2002-10-27 13:33 UTC
  To: linux-kernel

On Sun, Oct 27, 2002 at 01:44:26AM -0400, Mark Hahn wrote:

> >  raid5: measuring checksumming speed
> > -   8regs     :  2343.600 MB/sec
...
> > -raid5: using function: pIII_sse (4163.600 MB/sec)
> > +   8regs     :   228.400 MB/sec
...
> > +raid5: using function: pIII_sse (352.000 MB/sec)
> caching is disabled.

That was what it looked like to me, but I read in the FAQ I shouldn't jump
to conclusions. :)

> > What could be causing this? I believe it is a kernel issue because
> > memtest86 reports realistic memory bandwidths (about 590MB/s).
> 590 MB/s is quite low.  but I believe memtest86 also explicitly manages
> cache and mtrr's.

It does. By 'realistic' I meant it's of the same order of magnitude as
with the other MB.

> > reg00: base=0x00000000 (   0MB), size=1024MB: write-back, count=1
> I wonder if it's lying.

How can I find out? (Well, it sure looks like it's lying, so there's little
point in going to great lengths to confirm it; but why does it lie?)

> > +ACPI: Thermal Zone found
> any idea whether the CPU is hot?  (ie, there's usually a temp monitoring
> screen in the bios.)

It's not. Never seen it go above 40 degrees Celsius (about 104 Fahrenheit).

On Sun, Oct 27, 2002 at 08:53:46AM +0100, Willy Tarreau wrote:
> > What could be causing this? I believe it is a kernel issue because
> > memtest86 reports realistic memory bandwidths (about 590MB/s).
> does memtest86 report high speeds for the L2 cache ?

Over 3000MB/s for L2 and over 9000MB/s for L1. (I can't check exactly right
now.)

> I don't know if a buggy bios can slow it down that much, but that could
> explain your problem.

To me, everything looks right in memtest86. The values are slightly higher
than with the old MB.

> you can also take a look at /proc/interrupts to see if one source (NMI,
> machine check...) is bombing (ie more than tens of thousands/sec), thus
> letting no more time for other operations.

Sorry, I meant to include that in my original post. Here goes (without APIC):

           CPU0
  0:    3657439          XT-PIC  timer
  1:          2          XT-PIC  keyboard
  2:          0          XT-PIC  cascade
  5:    1187394          XT-PIC  eth1, eth2
  8:          3          XT-PIC  rtc
  9:          0          XT-PIC  acpi
 10:     245852          XT-PIC  ide2, ide3
 11:     461019          XT-PIC  eth3
 12:      86306          XT-PIC  eth0
 14:          3          XT-PIC  ide0
NMI:          0
ERR:          0
 
This is after 10 hours of uptime (with load continually in excess of 20). It
doesn't look suspicious to me. I could probably shuffle eth1 and eth2 around
so they don't share IRQs, but that wouldn't make much of a difference, I
think.
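
To put numbers on that, the counts can be turned into per-second rates and compared with the "tens of thousands/sec" threshold mentioned above. This is a rough sketch: the uptime is taken as exactly 10 hours, which the post only states approximately, and the names are illustrative.

```python
# Interrupt counts copied from the /proc/interrupts listing above.
counts = {
    "timer": 3657439,
    "eth1, eth2": 1187394,
    "ide2, ide3": 245852,
    "eth3": 461019,
    "eth0": 86306,
}

UPTIME_SECONDS = 10 * 3600  # "10 hours of uptime"; an approximation


def per_second(interrupt_counts, seconds):
    """Return the average interrupt rate per source over the given uptime."""
    return {source: n / seconds for source, n in interrupt_counts.items()}


rates = per_second(counts, UPTIME_SECONDS)
for source, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{source:12s} {rate:7.1f} interrupts/sec")
```

The busiest source is the timer at about 100 interrupts per second, which is what a 2.4 kernel on i386 normally generates, and the shared eth1/eth2 line averages only a few dozen per second. None of the sources comes anywhere near a rate that would starve the CPU.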

Andrew

Ps. Please keep Cc:ing me with replies, if it's not too much trouble.

-- 
          Andrew Korn (Korn Andras) <korn at chardonnay.math.bme.hu>
           Finger korn at chardonnay.math.bme.hu for pgp key. QOTD:
                     Why did Kamikaze pilots wear helmets?


* Re: [solved] 2.4 very slow memory access on abit kd7raid (kt400); ten times slower than on kg7raid
From: KORN Andras @ 2002-10-27 18:19 UTC
  To: linux-kernel

On Sun, Oct 27, 2002 at 12:46:05PM -0500, John Clemens wrote:

Hi,

> Try booting with "acpi=off"

OK, this worked. The system is running at normal speed now.

What was the problem? What did this have to do with acpi?
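
For anyone wanting to compare a boot with and without "acpi=off" from userspace, a crude copy benchmark is usually enough to show a tenfold difference. The following is only a sketch under stated assumptions: it uses a modern Python interpreter, the function name and buffer sizes are made up for illustration, and it measures bytes copied per second rather than raw bus bandwidth, so its figures are not comparable with the raid5 or memtest86 numbers above.

```python
import time


def copy_bandwidth_mb_s(size_mb=64, repeats=5):
    """Best-of-N throughput of copying a size_mb buffer, in MB/sec."""
    # Fill the source with non-zero bytes so its pages are really allocated.
    src = bytearray(b"\xa5" * (size_mb << 20))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)  # one full read of src plus one full write of dst
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    assert len(dst) == len(src)
    return size_mb / best


if __name__ == "__main__":
    print(f"copy throughput: {copy_bandwidth_mb_s():.0f} MB/sec")
```

Running it once under each boot configuration and comparing the two figures gives a quick sanity check; the buffer is deliberately much larger than the 256K L2 cache so that the copy has to go to main memory.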

Andrew

-- 
          Andrew Korn (Korn Andras) <korn at chardonnay.math.bme.hu>
           Finger korn at chardonnay.math.bme.hu for pgp key. QOTD:
            Never let any mechanical device know you're in a hurry.


* Re: [solved] 2.4 very slow memory access on abit kd7raid (kt400); ten times slower than on kg7raid
From: John Clemens @ 2002-10-28  1:42 UTC
  To: KORN Andras; +Cc: linux-kernel


I actually don't know, but I ran into a very similar problem on a KT133-based
Athlon ages ago (in computer years, anyway; around December of last year ;).
I remember it was an ACPI problem that just "went away" with a later version
of ACPI. That was an MSI (K7 Master, maybe?) motherboard.

Unfortunately, that's all I know. If you find the answer, please let me
know as well.

john.c

On Sun, 27 Oct 2002, KORN Andras wrote:

> On Sun, Oct 27, 2002 at 12:46:05PM -0500, John Clemens wrote:
>
> Hi,
>
> > Try booting with "acpi=off"
>
> OK, this worked. The system is running at normal speed now.
>
> What was the problem? What did this have to do with acpi?
>
> Andrew
>
>

-- 
John Clemens          http://www.deater.net/john
john@deater.net     ICQ: 7175925, IM: PianoManO8
      "I Hate Quotes" -- Samuel L. Clemens


