Re: ext3 performance bottleneck as the number of spindles gets large

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Andrew Morton <akpm@zip.com.au>
To: mgross@unix-os.sc.intel.com
Cc: "Griffiths, Richard A" <richard.a.griffiths@intel.com>,
	"'Jens Axboe'" <axboe@suse.de>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	lse-tech@lists.sourceforge.net
Subject: Re: ext3 performance bottleneck as the number of spindles gets large
Date: Thu, 20 Jun 2002 14:25:53 -0700	[thread overview]
Message-ID: <3D124861.B22D03AB@zip.com.au> (raw)
In-Reply-To: 200206202101.g5KL1BP10776@unix-os.sc.intel.com

mgross wrote:
> 
> On Thursday 20 June 2002 04:18 pm, Andrew Morton wrote:
> > Yup.  I take it back - high ext3 lock contention happens on 2.5
> > as well, which has block-highmem.  With heavy write traffic onto
> > six disks, two controllers, six filesystems, four CPUs the machine
> > spends about 40% of the time spinning on locks in fs/ext3/inode.c
> > You're un dual CPU, so the contention is less.
> >
> > Not very nice.  But given that the longest spin time was some
> > tens of milliseconds, with the average much lower, it shouldn't
> > affect overall I/O throughput.
> 
> How could losing 40% of your CPU's to spin locks NOT spank your throughtput?

The limiting factor is usually disk bandwidth, seek latency, rotational
latency.  That's why I want to know your bandwidth.

> Can you copy your lockmeter data from its kernel_flag section?  Id like to
> see it.

I don't find lockmeter very useful.  Here's oprofile output for 2.5.23:

c013ec08 873      1.07487     rmqueue                 
c018a8e4 950      1.16968     do_get_write_access     
c013b00c 969      1.19307     kmem_cache_alloc_batch  
c018165c 1120     1.37899     ext3_writepage          
c0193120 1457     1.79392     journal_add_journal_head 
c0180e30 1458     1.79515     ext3_prepare_write      
c0136948 6546     8.05969     generic_file_write      
c01838ac 42608    52.4606     .text.lock.inode      

So I lost two CPUs on the BKL in fs/ext3/inode.c.  The remaining
two should be enough to saturate all but the most heroic disk
subsystems.

A couple of possibilities come to mind:

1: Processes which should be submitting I/O against disk "A" are
   instead spending tons of time asleep in the page allocator waiting
   for I/O to complete against disk "B".

2: ext3 is just too slow for the rate of data which you're trying to
   push at it.   This exhibits as lock contention, but the root cause
   is the cost of things like ext3_mark_inode_dirty().  And *that*
   is something we can fix - can shave 75% off the cost of that.

Need more data...

> >
> > Possibly something else is happening.  Have you tested ext2?
> 
> No.  We're attempting to see if we can scale to large numbers of spindles
> with EXT3 at the moment.  Perhaps we can effect positive changes to ext3
> before giving up on it and moving to another Journaled FS.

Have you tried *any* other fs?

-

next prev parent reply	other threads:[~2002-06-20 21:27 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-06-20 15:26 ext3 performance bottleneck as the number of spindles gets la rge Griffiths, Richard A
2002-06-20 20:18 ` ext3 performance bottleneck as the number of spindles gets large Andrew Morton
2002-06-20 18:08   ` mgross
2002-06-20 21:25     ` Andrew Morton [this message]
  -- strict thread matches above, loose matches on Subject: below --
2002-06-20 21:50 ext3 performance bottleneck as the number of spindles gets la rge Griffiths, Richard A
2002-06-21  7:58 ` ext3 performance bottleneck as the number of spindles gets large Andrew Morton
2002-06-21 18:46   ` mgross
2002-06-21 19:26     ` Chris Mason
2002-06-21 19:56     ` Andrew Morton
2002-06-23  4:02 ` Christopher E. Brown
2002-06-23  4:33   ` Andreas Dilger
2002-06-23  6:00     ` Christopher E. Brown
2002-06-19 21:29 mgross
2002-06-20  0:54 ` Andrew Morton
2002-06-20  9:54   ` Stephen C. Tweedie
2002-06-20  1:55 ` Andrew Morton
2002-06-20  6:05   ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3D124861.B22D03AB@zip.com.au \
    --to=akpm@zip.com.au \
    --cc=axboe@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lse-tech@lists.sourceforge.net \
    --cc=mgross@unix-os.sc.intel.com \
    --cc=richard.a.griffiths@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox