From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id ; Mon, 24 Jun 2002 13:24:39 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id ; Mon, 24 Jun 2002 13:24:38 -0400 Received: from ebiederm.dsl.xmission.com ([166.70.28.69]:41076 "EHLO frodo.biederman.org") by vger.kernel.org with ESMTP id ; Mon, 24 Jun 2002 13:24:38 -0400 To: "Gross, Mark" Cc: "'Christopher E. Brown'" , "Griffiths, Richard A" , "'Andrew Morton'" , mgross@unix-os.sc.intel.com, "'Jens Axboe'" , Linux Kernel Mailing List , lse-tech@lists.sourceforge.net Subject: Re: [Lse-tech] RE: ext3 performance bottleneck as the number of s pindles gets large References: <59885C5E3098D511AD690002A5072D3C057B49C9@orsmsx111.jf.intel.com> From: ebiederm@xmission.com (Eric W. Biederman) Date: 24 Jun 2002 11:14:08 -0600 In-Reply-To: <59885C5E3098D511AD690002A5072D3C057B49C9@orsmsx111.jf.intel.com> Message-ID: User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.1 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org "Gross, Mark" writes: > We are running the tests with the following mother board. > http://www.intel.com/design/servers/scb2/index.htm?iid=ipp_browse+motherbd_s > cb2& > > Its a very nice box with 2 independent 64/66 PCI buses. > Capable of 2x503MB/sec, using your logic ;) > > Regardless, the 640MB/s number was computed without considering the PCI bus > limitations, or the dual port nature of the base 160MB/sec nature of the > Adabptec SCSI-39160. > http://www.adaptec.com/worldwide/product/proddetail.html?sess=no&prodkey=ASC > -39160&cat=Products > > Realistically, we are looking for a max throughput of about 320MB/sec with 4 > adapters with enough drives attached. Careful even with at 320MB/sec this requires 50% of your systems theoretical memory bandwidth in DMA transfers. Application level benchmarks like streams can only achieve memory copy numbers on a PIII platform of about 320MB/sec. A highly tuned mmx, or sse memory copy can do better but it is a challenge. That close to the hardware limits finding the actual bottleneck can get very tricky. At the very least I would run the system with just one processor, and attempt to get the numbers that way. I can trivially see spinlock hold times staying high simply because the memory is busy with a DMA transfer, and so cannot be used to transfer the new contents of the lock. Eric