From: Sunil Mushran <sunil.mushran@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] Some fsck perf numbers
Date: Fri, 16 Sep 2011 14:25:39 -0700
Message-ID: <4E73BED3.2090106@oracle.com>

I have been looking at fsck.ocfs2 performance and have some
interesting numbers to share.

This volume is 2TB in size with 1.5 million files: many exploded
kernel trees plus some large files. The particulars are listed below.

I did 3 runs.

The first set of numbers is from vanilla fsck.

In the second set, I added a prefill before each allocator chain
scan. It fills up the cache before verify_chain() is called. The
logic is simple: after the bitmap inode is read, it issues aios for
all first-level groups (243 of them). It then reads the next_group
of each and again issues aios, and so on.

Vanilla fsck has another piece of code with a similar idea, called
precache. During the suballocator scans, it force-reads the entire
block group to warm the cache for Pass 1. The problem, as we know, is
that precache only works when the cache is large enough, and in this
run it is not. The second set disables precache.

So set 2 enables prefill and disables precache.

In the third set, I also increased the size of the buffer in
open_inode_scan(). It was reading 32KB to 1MB at a time; I raised it
to one suballocator block group, so 4MB max.

================================================================
   Number of blocks:   536870202
   Block size:         4096
   Number of clusters: 536870202
   Cluster size:       4096
   Number of slots:    1

   # of inodes with depth 0/1/2/3/4/5: 844325/16/0/0/0/0
   # of orphaned inodes found/deleted: 0/0

      1556247 regular files (712550 inlines, 0 reflinks)
        96706 directories (96056 inlines)
            0 character device files
            0 block device files
            0 fifos
            0 links
           50 symbolic links (50 fast symbolic links)
            0 sockets

Inline rule!
================================================================

Set 1 (vanilla):
   Cache size: 1017MB
   I/O read disk/cache: 15519MB / 511MB, write: 0MB, rate: 17.48MB/s
   Times real: 917.039s, user: 59.392s, sys: 10.997s

Set 2 (prefill, no precache):
   Cache size: 1016MB
   I/O read disk/cache: 6956MB / 582MB, write: 0MB, rate: 11.93MB/s
   Times real: 631.968s, user: 48.739s, sys: 7.591s

Set 3 (prefill, no precache, larger open_inode_scan() buffer):
   Cache size: 1019MB
   I/O read disk/cache: 6956MB / 582MB, write: 0MB, rate: 17.79MB/s
   Times real: 423.701s, user: 47.015s, sys: 4.621s

These are global numbers; I calculate the numbers per pass and add
them up. Notice how the first set reads more than double the amount
from disk (15519MB vs 6956MB). That is because the inode allocator is
6GB while the box has only 1GB of cache, so the inodes precached
during the allocator scan are evicted before Pass 1 reaches them and
are read again. Pre-reading the inodes hurts us. The third set reads
the same amount as the second but has better throughput, because
open_inode_scan() now reads an entire block group at a time.

This means we don't need precache. Instead, we could increase the
buffer size in open_scan().

Now numbers per pass.

================================================================
Pass 0a: Checking cluster allocation chains
   I/O read disk/cache: 66MB / 1MB, write: 0MB, rate: 0.68MB/s
   Times real: 97.072s, user: 0.423s, sys: 0.280s

   I/O read disk/cache: 66MB / 66MB, write: 0MB, rate: 10.27MB/s
   Times real: 12.756s, user: 0.343s, sys: 0.156s

   I/O read disk/cache: 66MB / 66MB, write: 0MB, rate: 10.53MB/s
   Times real: 12.443s, user: 0.398s, sys: 0.178s

In sets 2 and 3, the cluster groups are read using aio, and it helps:
real time drops from about 97s to under 13s.
================================================================

Pass 0b: Checking inode allocation chains
   I/O read disk/cache: 6471MB / 14MB, write: 0MB, rate: 42.93MB/s
   Times real: 151.066s, user: 8.222s, sys: 2.512s

   I/O read disk/cache: 7MB / 20MB, write: 0MB, rate: 26.85MB/s
   Times real: 0.968s, user: 0.186s, sys: 0.025s

   I/O read disk/cache: 7MB / 20MB, write: 0MB, rate: 14.93MB/s
   Times real: 1.741s, user: 0.234s, sys: 0.034s

Disabling precache in sets 2 and 3 helps tremendously: disk reads drop
from 6471MB to 7MB, and real time from about 151s to under 2s.
================================================================

Pass 0c: Checking extent block allocation chains
   I/O read disk/cache: 2101MB / 3MB, write: 0MB, rate: 42.70MB/s
   Times real: 49.249s, user: 2.628s, sys: 0.804s

   I/O read disk/cache: 3MB / 3MB, write: 0MB, rate: 19.68MB/s
   Times real: 0.254s, user: 0.053s, sys: 0.007s

   I/O read disk/cache: 3MB / 3MB, write: 0MB, rate: 19.97MB/s
   Times real: 0.250s, user: 0.056s, sys: 0.006s

Disabling precache in sets 2 and 3 helps here too. The caveat is that
this volume has mostly files with depth 0.
================================================================

Pass 1: Checking inodes and blocks
   I/O read disk/cache: 6532MB / 67MB, write: 0MB, rate: 13.64MB/s
   Times real: 483.811s, user: 31.493s, sys: 5.995s

   I/O read disk/cache: 6531MB / 68MB, write: 0MB, rate: 13.70MB/s
   Times real: 481.581s, user: 31.039s, sys: 5.958s

   I/O read disk/cache: 6531MB / 68MB, write: 0MB, rate: 24.34MB/s
   Times real: 271.107s, user: 29.263s, sys: 2.982s

Set 3 is best, at 271s versus over 480s for the other two, because of
the larger buffer in open_scan().
================================================================

The rest of the passes are unchanged. I will look at those next.

Comments welcome.

Sunil
