All of lore.kernel.org
 help / color / mirror / Atom feed
From: Rogier Wolff <R.E.Wolff@BitWizard.nl>
To: linux-ext4@vger.kernel.org
Subject: fsck performance.
Date: Sun, 20 Feb 2011 10:06:56 +0100	[thread overview]
Message-ID: <20110220090656.GA11402@bitwizard.nl> (raw)


Hi,

I was running debian-stable, on my backup-server (the server that
does backups, not the "just-in-case" server). 

Debian apparently recently pointed that to the new release squeeze, so
I got upgraded. I went from kernel 2.6.26 to 2.6.32. After about a day
my system rebooted without my consent. So now it's running 2.6.32.

Since then I'm getting kernel-oops-lookalikes that start with:
[71664.306573] swapper: page allocation failure. order:5, mode:0x4020

Lots of them actually. 

(on the other hand, none of these happened before my filesystem got
thrashed...)



Anyway, upon boot into the new kernel ext3 printed abunch of these: 
[    5.212119] ext3_orphan_cleanup: deleting unreferenced inode 1335743

A few hours later, my storage partition was marked read-only and the
backups started failing.

kern.log.1.gz:Feb 18 05:39:53 driepoot kernel: [10328.424778] 
  EXT3-fs error (device md3): ext3_lookup: deleted inode referenced: 277447722

So to correct the situation I started an fsck. 

After about 24 hours, I decided that the fsck was taking too long and
decided to upgrade e2fsck. It has now been running for an hour and a
half. Now I don't mind fsck taking an hour or two. But I expect fsck
to be disk bound.

However iostat shows me it's doing next to nothing for seconds
at a time: 

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
md3               0.00         0.00         0.00          0          0
md3               0.00         0.00         0.00          0          0
md3               0.00         0.00         0.00          0          0
md3             733.33      2933.33         0.00       2904          0
md3               0.00         0.00         0.00          0          0
md3              63.37       253.47         0.00        256          0
md3               0.00         0.00         0.00          0          0
md3               0.00         0.00         0.00          0          0
md3               5.88        23.53         0.00         24          0

and it turns out that fsck is completely CPU bound: 


top - 09:26:29 up 2 days,  6:38, 10 users,  load average: 1.06, 1.07, 1.27
Tasks: 136 total,   2 running, 134 sleeping,   0 stopped,   0 zombie
Cpu(s): 79.1%us,  4.9%sy,  0.0%ni,  0.0%id,  0.4%wa,  1.5%hi, 14.1%si,  0.0%st
Mem:    969400k total,   956624k used,    12776k free,   226828k buffers
Swap:  1975976k total,   252220k used,  1723756k free,    67768k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
10274 root      20   0  839m 631m  52m R 97.7 66.7  50:07.09 e2fsck 

and when I trace fsck I get: 


fcntl64(6, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=164, len=1}) = 0
fcntl64(6, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=164, len=1}) = 0
fcntl64(6, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=264, len=1}) = 0
fcntl64(5, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=572, len=1}) = 0
fcntl64(5, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=164, len=1}) = 0
fcntl64(5, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=164, len=1}) = 0
fcntl64(5, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=572, len=1}) = 0
fcntl64(6, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=572, len=1}) = 0
fcntl64(6, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=164, len=1}) = 0

So, my question is: Are these fcntl calls neccesary? 
As far as I know locking is neccesary if another process might be 
handling the same data. Here is is doing this with the cache 
files: 

lrwx------ 1 root root 64 Feb 20 09:28 5 -> 
       /var/cache/e2fsck/123a1cfe-2455-4646-aa32-87492ed1ac97-icount-ayxVou
lrwx------ 1 root root 64 Feb 20 09:28 6 -> 
       /var/cache/e2fsck/123a1cfe-2455-4646-aa32-87492ed1ac97-dirinfo-rBBTtb

were, using these swap files makes sense as some machines don't have
the memory and/or addressingspace to handle a big fsck, but in my
case I have 1G RAM, and these two files are 56M total: 
-rw------- 1 root root 21M Feb 20 09:30 ...97-dirinfo-rBBTtb
-rw------- 1 root root 35M Feb 20 09:30 ...97-icount-ayxVou

# strace -p 10274 | & head -100000 | sort | uniq -c | sort -n

shows me that out of 100k system calls 

  10876 fcntl64(6, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=164, len=1}) = 0
  10877 fcntl64(6, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=164, len=1}) = 0
  13339 fcntl64(5, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=164, len=1}) = 0
  13339 fcntl64(5, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=164, len=1}) = 0

and 60-139 locks for different locations.

Oh... and fsck is now at the stage: 
 Pass 1: Checking inodes, blocks, and sizes

The filesystem is 3T: 
md3 : active raid5 sda3[0] sdd3[3] sdc3[2] sdb3[1]
      2868686592 blocks level 5, 256k chunk, algorithm 2 [4/4] [UUUU]

I'm studying e2fsck source code abit, but I don't yet see where the
fcntl calls are coming from.

	Roger.

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
**    Delftechpark 26 2628 XH  Delft, The Netherlands. KVK: 27239233    **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement. 
Does it sit on the couch all day? Is it unemployed? Please be specific! 
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ

             reply	other threads:[~2011-02-20  9:13 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-02-20  9:06 Rogier Wolff [this message]
2011-02-20 17:09 ` fsck performance Ted Ts'o
2011-02-20 19:34   ` Ted Ts'o
2011-02-20 21:55     ` Rogier Wolff
2011-02-20 22:20       ` Ted Ts'o
2011-02-20 23:15         ` Rogier Wolff
2011-02-20 23:41           ` Ted Ts'o
2011-02-21 10:31             ` Amir Goldstein
2011-02-21 16:04               ` Paweł Brodacki
2011-02-21 18:00                 ` Andreas Dilger
2011-02-22 10:20                   ` Rogier Wolff
2011-02-22 13:36                     ` Rogier Wolff
2011-02-22 13:54                       ` Rogier Wolff
2011-02-22 16:32                         ` Andreas Dilger
2011-02-22 22:13                           ` Ted Ts'o
2011-02-23  4:44                             ` Rogier Wolff
2011-02-23 11:32                               ` Theodore Tso
2011-02-23 20:53                                 ` Rogier Wolff
2011-02-23 22:24                                   ` Andreas Dilger
2011-02-23 23:17                                     ` Ted Ts'o
2011-02-24  0:41                                       ` Andreas Dilger
2011-02-24  8:59                                         ` Rogier Wolff
2011-02-24  7:29                                     ` Rogier Wolff
2011-02-24  8:59                                       ` Amir Goldstein
2011-02-24  9:02                                         ` Rogier Wolff
2011-02-24  9:33                                           ` Amir Goldstein
2011-02-24 23:53                                         ` Rogier Wolff
2011-02-25  0:26                                       ` Daniel Taylor
2011-02-23  2:54                           ` Rogier Wolff

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110220090656.GA11402@bitwizard.nl \
    --to=r.e.wolff@bitwizard.nl \
    --cc=linux-ext4@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.