* I/O throughput problem in newer kernels
From: adobra @ 2009-04-02 15:06 UTC
To: linux-kernel
While putting together a not so average machine for database research, I
bumped into the following performance problem with the newer kernels (I
tested 2.6.27.11, 2.6.29): the aggregate throughput drops drastically when
more than 20 hard drives are involved in the operation. This problem is
not happening on 2.6.22.9 or 2.6.20 (did not test other kernels).
Since I am not subscribed to the mailing list, I would appreciate you
cc-ing me on any reply or discussion.
1. Description of the machine
-----------------------------------------------
8 Quad-Core AMD Opteron(tm) Processor 8346 HE
Each processor has independent memory banks (16GB in each bank for 128GB
total)
Two PCI busses (connected in different places in the NUMA architecture)
8 hard drives installed into the base system on SATA interfaces
First hard drive dedicated to the OS
7 Western Digital hard drives (90 MB/s max throughput)
Nvidia SATA chipset
4 Adaptec 5805 RAID cards installed in PCI-E 16X slots (all running at 8X
speed)
The 4 cards live on two separate PCI busses
6 IBM EXP3000 disk enclosures
2 cards connect to 2 enclosures each, the other 2 to 1 enclosure
8 Western Digital Velociraptor HD in each enclosure
Max measured throughput 110-120 MB/s
Total number of hard drives used in the tests: 7+47=54 or subsets
The Adaptec cards are configured to expose each disk individually to the
OS. Any RAID configuration seems to limit the throughput to 300-350 MB/s,
which is too low for the purpose of this system.
2. Throughput tests
--------------------------------
I did two types of tests: using dd (spawning parallel dd jobs that lasted
at least 10s) and using a multi-threaded program that simulates the
intended usage of the system. Results from both are consistent, so I will
only report the results from the custom program. Both the dd test and the
custom one do reads in large chunks (at least 256K per request). All requests
in the custom program are made with the "read" system call into page-aligned
memory (allocated with mmap to make sure). The kernel must be doing a
zero-copy to user space; otherwise the observed speeds would not be possible.
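For illustration, here is a minimal sketch of what one such per-disk reader
looks like (this is not the actual test program; the device path and the
exact constants are placeholders, and O_DIRECT is shown because all tests
were run with it):

#define _GNU_SOURCE            /* for O_DIRECT and MAP_ANONYMOUS */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/time.h>

#define CHUNK   (256 * 1024)   /* bytes per read() request */
#define SECONDS 10             /* run each measurement for at least 10s */

int main(int argc, char **argv)
{
    const char *dev = argc > 1 ? argv[1] : "/dev/sdb";    /* placeholder */
    int fd = open(dev, O_RDONLY | O_DIRECT);   /* bypass the page cache */
    if (fd < 0) { perror("open"); return 1; }

    /* mmap guarantees a page-aligned buffer, as O_DIRECT requires */
    char *buf = mmap(NULL, CHUNK, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    struct timeval start, now;
    gettimeofday(&start, NULL);
    now = start;
    long long bytes = 0;

    /* sequential reads in large chunks, timed over ~10 seconds */
    do {
        ssize_t n = read(fd, buf, CHUNK);
        if (n <= 0)
            break;
        bytes += n;
        gettimeofday(&now, NULL);
    } while (now.tv_sec - start.tv_sec < SECONDS);

    double secs = (now.tv_sec - start.tv_sec) +
                  (now.tv_usec - start.tv_usec) / 1e6;
    if (secs > 0)
        printf("%s: %.1f MB/s\n", dev, bytes / secs / 1e6);
    close(fd);
    return 0;
}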
Here is what I observed in terms of throughput:
a. Speed/WD disk: 90 MB/s
b. Speed/Velociraptor disk: 110 MB/s
c. Speed of all WD disks in base system: 700 MB/s
d. Speed of disks in one enclosure: 750 MB/s
e. Speed of disks connected to one Adaptec card: 1000 MB/s
f. Speed of disks connected on a single PCI bus: 2000 MB/s
The above numbers look good and are consistent on all kernels that I tried.
THE PROBLEM: when the number of disks exceeds 20, the throughput plummets
on newer kernels.
g. SPEED OF ALL DISKS: 600 MB/s on newer kernels, 2700 MB/s on older kernels
The throughput drops drastically the moment 20-25 hard drives are involved.
3. Tests I performed to ensure the number of hard drives is the culprit
----------------------------------------------------------------------------------------------------------------
a. Took 1, 2, 3 and 4 disks from each enclosure to ensure uniform load on
buses
Performance goes up as expected until 20 drives are reached, then drops.
b. Involved combinations of the regular WD drives and the Velociraptors.
Had no major influence on the observation
c. Involved combinations of enclosures
No influence
d. Used the hard drives in decreasing order of measured speed (as reported
by hdparm)
Only a minor influence; the drastic drop at 20 drives remains.
e. Changed the I/O scheduler used for the hard drives
No influence
4. Things that I do not think are wrong
--------------------------------------------------------------
a. aacraid or scsi_nv drivers
The problem depends only on the number of hard drives, not on the
particular combination of drives themselves.
b. Limitations on the buses
The measured speeds of the subsystems indicate that no bottleneck on
individual buses is reached. Even if that were the case, the throughput
should level off, not drop dramatically.
c. Failures in the system
No errors reported in /var/log/messages or other logs related to I/O
Of course, this raises the question: WHAT IS WRONG?
I would be more than happy to run any tests you suggest on my system to
find the problem.
Alin
--
Alin Dobra
Assistant Professor
Computer Information Science & Engineering Department
University of Florida
* Re: I/O throughput problem in newer kernels
From: Andrew Morton @ 2009-04-07 7:35 UTC
To: adobra; +Cc: linux-kernel
On Thu, 2 Apr 2009 11:06:08 -0400 (EDT) adobra@cise.ufl.edu wrote:
> While putting together a not so average machine for database research, I
> bumped into the following performance problem with the newer kernels (I
> tested 2.6.27.11, 2.6.29): the aggregate throughput drops drastically when
> more than 20 hard drives are involved in the operation. This problem is
> not happening on 2.6.22.9 or 2.6.20 (did not test other kernels).
Well that's bad. You'd at least expect the throughput to level out.
> Since I am not subscribed to the mailing list, I would appreciate you
> cc-ing me on any reply or discussion.
>
> 1. Description of the machine
> -----------------------------------------------
> 8 Quad-Core AMD Opteron(tm) Processor 8346 HE
> Each processor has independent memory banks (16GB in each bank for 128GB
> total)
> Two PCI busses (connected in different places in the NUMA architecture)
> 8 hard drives installed into the base system on SATA interfaces
> First hard drive dedicated to the OS
> 7 Western Digital hard drives (90 MB/s max throughput)
> Nvidia SATA chipset
> 4 Adaptec 5805 RAID cards installed in PCI-E 16X slots (all running at 8X
> speed)
> The 4 cards live on two separate PCI busses
> 6 IBM EXP3000 disk enclosures
> 2 cards connect to 2 enclosures each, the other 2 to 1 enclosure
> 8 Western Digital Velociraptor HD in each enclosure
> Max measured throughput 110-120 MB/s
>
> Total number of hard drives used in the tests: 7+47=54 or subsets
> The Adaptec cards are configured to expose each disk individually to the
> OS. Any RAID configuration seems to limit the throughput to 300-350 MB/s,
> which is too low for the purpose of this system.
>
> 2. Throughput tests
> --------------------------------
> I did two types of tests: using dd (spawning parallel dd jobs that lasted
> at least 10s) and using a multi-threaded program that simulates the
> intended usage of the system. Results from both are consistent, so I will
> only report the results from the custom program. Both the dd test and the
> custom one do reads in large chunks (at least 256K per request). All requests
> in the custom program are made with the "read" system call into page-aligned
> memory (allocated with mmap to make sure). The kernel must be doing a
> zero-copy to user space; otherwise the observed speeds would not be possible.
>
> Here is what I observed in terms of throughput:
> a. Speed/WD disk: 90 MB/s
> b. Speed/Velociraptor disk: 110 MB/s
> c. Speed of all WD disks in base system: 700 MB/s
> d. Speed of disks in one enclosure: 750 MB/s
> e. Speed of disks connected to one Adaptec card: 1000 MB/s
> f. Speed of disks connected on a single PCI bus: 2000 MB/s
>
> The above numbers look good and are consistent on all kernels that I tried.
>
> THE PROBLEM: when the number of disks exceeds 20, the throughput plummets
> on newer kernels.
>
> g. SPEED OF ALL DISKS: 600 MB/s on newer kernels, 2700 MB/s on older kernels
> The throughput drops drastically the moment 20-25 hard drives are involved.
>
> 3. Tests I performed to ensure the number of hard drives is the culprit
> ----------------------------------------------------------------------------------------------------------------
> a. Took 1, 2, 3 and 4 disks from each enclosure to ensure uniform load on
> buses
> Performance goes up as expected until 20 drives are reached, then drops.
>
> b. Involved combinations of the regular WD drives and the Velociraptors.
> Had no major influence on the observation
>
> c. Involved combinations of enclosures
> No influence
>
> d. Used the hard drives in decreasing order of measured speed (as reported
> by hdparm)
> Only a minor influence; the drastic drop at 20 drives remains.
>
> e. Changed the I/O scheduler used for the hard drives
> No influence
>
> 4. Things that I do not think are wrong
> --------------------------------------------------------------
> a. aacraid or scsi_nv drivers
> The problem depends only on the number of hard drives, not on the
> particular combination of drives themselves.
>
> b. Limitations on the buses
> The measured speeds of the subsystems indicate that no bottleneck on
> individual buses is reached. Even if that were the case, the throughput
> should level off, not drop dramatically.
>
> c. Failures in the system
> No errors reported in /var/log/messages or other logs related to I/O
>
> Of course, this raises the question: WHAT IS WRONG?
>
> I would be more than happy to run any tests you suggest on my system to
> find the problem.
>
Did you monitor the CPU utilisation?
It would be interesting to test with O_DIRECT (dd iflag=direct) to
remove the page allocator and page reclaim from the picture.
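Something like this, one instance per disk (the device name and count
are placeholders):

  dd if=/dev/sdX of=/dev/null bs=256k iflag=direct count=4096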
* Re: I/O throughput problem in newer kernels
From: Alin Dobra @ 2009-04-07 12:58 UTC
To: Andrew Morton; +Cc: linux-kernel
Andrew (and other people interested),
I forgot to mention in the post that I used O_DIRECT for all tests
(including the dd tests). The CPU utilization is negligible; keep in mind
that the machine has 32 cores, so CPU should not be a problem, nor memory
bandwidth for that matter. The thing that puzzles me the most is the
fact that the throughput does not level off; it goes down dramatically
(a factor of 3 for 54 hard drives versus 20 hard drives).
Alin
--
Alin Dobra
Assistant Professor
Computer Information Science & Engineering Department
University of Florida