linux-ext4.vger.kernel.org archive mirror
* fsck performance.
@ 2011-02-20  9:06 Rogier Wolff
From: Rogier Wolff @ 2011-02-20  9:06 UTC
  To: linux-ext4


Hi,

I was running debian-stable, on my backup-server (the server that
does backups, not the "just-in-case" server). 

Debian apparently recently pointed that to the new release squeeze, so
I got upgraded. I went from kernel 2.6.26 to 2.6.32. After about a day
my system rebooted without my consent. So now it's running 2.6.32.

Since then I'm getting kernel-oops-lookalikes that start with:
[71664.306573] swapper: page allocation failure. order:5, mode:0x4020

Lots of them actually. 

(on the other hand, none of these happened before my filesystem got
thrashed...)



Anyway, upon boot into the new kernel ext3 printed a bunch of these: 
[    5.212119] ext3_orphan_cleanup: deleting unreferenced inode 1335743

A few hours later, my storage partition was marked read-only and the
backups started failing.

kern.log.1.gz:Feb 18 05:39:53 driepoot kernel: [10328.424778] 
  EXT3-fs error (device md3): ext3_lookup: deleted inode referenced: 277447722

So to correct the situation I started an fsck. 

After about 24 hours, I decided that the fsck was taking too long and
upgraded e2fsck. It has now been running for an hour and a half. I
don't mind fsck taking an hour or two, but I expect fsck to be disk
bound.

However iostat shows me it's doing next to nothing for seconds
at a time: 

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
md3               0.00         0.00         0.00          0          0
md3               0.00         0.00         0.00          0          0
md3               0.00         0.00         0.00          0          0
md3             733.33      2933.33         0.00       2904          0
md3               0.00         0.00         0.00          0          0
md3              63.37       253.47         0.00        256          0
md3               0.00         0.00         0.00          0          0
md3               0.00         0.00         0.00          0          0
md3               5.88        23.53         0.00         24          0

and it turns out that fsck is completely CPU bound: 


top - 09:26:29 up 2 days,  6:38, 10 users,  load average: 1.06, 1.07, 1.27
Tasks: 136 total,   2 running, 134 sleeping,   0 stopped,   0 zombie
Cpu(s): 79.1%us,  4.9%sy,  0.0%ni,  0.0%id,  0.4%wa,  1.5%hi, 14.1%si,  0.0%st
Mem:    969400k total,   956624k used,    12776k free,   226828k buffers
Swap:  1975976k total,   252220k used,  1723756k free,    67768k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
10274 root      20   0  839m 631m  52m R 97.7 66.7  50:07.09 e2fsck 

and when I trace fsck I get: 


fcntl64(6, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=164, len=1}) = 0
fcntl64(6, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=164, len=1}) = 0
fcntl64(6, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=264, len=1}) = 0
fcntl64(5, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=572, len=1}) = 0
fcntl64(5, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=164, len=1}) = 0
fcntl64(5, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=164, len=1}) = 0
fcntl64(5, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=572, len=1}) = 0
fcntl64(6, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=572, len=1}) = 0
fcntl64(6, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=164, len=1}) = 0
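
For illustration, the kind of call seen in the trace (a blocking one-byte
write lock at a fixed offset, released again right away) can be reproduced
with a few lines of Python. This is only a minimal sketch, not e2fsck's code:
the names touch_record and run and the temporary file are assumptions made
for the example, the offset 164 is copied from the trace, and the nesting of
locks at two offsets that the trace shows is left out.

```python
import fcntl
import os
import tempfile


def touch_record(fd, offset):
    """Take and release a blocking one-byte write lock at `offset`."""
    # Without LOCK_NB, lockf() waits for the lock, i.e. F_SETLKW with F_WRLCK.
    fcntl.lockf(fd, fcntl.LOCK_EX, 1, offset, os.SEEK_SET)
    try:
        pass  # a real caller would read or update the record here
    finally:
        # Release the same one-byte range (F_SETLKW with F_UNLCK).
        fcntl.lockf(fd, fcntl.LOCK_UN, 1, offset, os.SEEK_SET)


def run(n, offset=164):
    """Lock and unlock the same byte n times; return the number of lock calls."""
    # The temporary file is a hypothetical stand-in for one of the cache files.
    with tempfile.TemporaryFile() as scratch:
        scratch.truncate(1024)
        for _ in range(n):
            touch_record(scratch.fileno(), offset)
    return 2 * n


if __name__ == "__main__":
    print(run(10000), "lock/unlock calls issued")
```

Running this under strace should show the same alternating F_WRLCK/F_UNLCK
pairs, with no second process involved.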

So, my question is: Are these fcntl calls necessary? 
As far as I know, locking is necessary if another process might be 
handling the same data. Here it is doing this with the cache 
files: 

lrwx------ 1 root root 64 Feb 20 09:28 5 -> 
       /var/cache/e2fsck/123a1cfe-2455-4646-aa32-87492ed1ac97-icount-ayxVou
lrwx------ 1 root root 64 Feb 20 09:28 6 -> 
       /var/cache/e2fsck/123a1cfe-2455-4646-aa32-87492ed1ac97-dirinfo-rBBTtb

Now, using these swap files makes sense, as some machines don't have
the memory and/or address space to handle a big fsck, but in my
case I have 1G RAM, and these two files are 56M total: 
-rw------- 1 root root 21M Feb 20 09:30 ...97-dirinfo-rBBTtb
-rw------- 1 root root 35M Feb 20 09:30 ...97-icount-ayxVou

# strace -p 10274 |& head -100000 | sort | uniq -c | sort -n

shows me that out of 100k system calls 

  10876 fcntl64(6, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=164, len=1}) = 0
  10877 fcntl64(6, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=164, len=1}) = 0
  13339 fcntl64(5, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=164, len=1}) = 0
  13339 fcntl64(5, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=164, len=1}) = 0

and 60-139 locks for different locations.
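
To put a number on that: the four counts above add up to 48431, so just
under half of the 100k sampled calls are lock/unlock pairs on offset 164 of
the two cache files. A small sketch of the same tally in Python, in case the
breakdown per file descriptor and offset is wanted; the names tally and
hot_share are made up for this example, and the regular expression assumes
the strace line format shown above.

```python
import re
from collections import Counter

# Matches lines such as:
# fcntl64(6, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=164, len=1}) = 0
LOCK_RE = re.compile(r"fcntl64\((\d+), F_SETLKW, \{type=(F_\w+), .*start=(\d+),")


def tally(lines):
    """Count lock calls per (fd, lock type, offset), like sort | uniq -c."""
    counts = Counter()
    for line in lines:
        m = LOCK_RE.search(line)
        if m:
            counts[(int(m.group(1)), m.group(2), int(m.group(3)))] += 1
    return counts


def hot_share(total_calls, hot_counts):
    """Fraction of all sampled calls taken up by the listed counts."""
    return sum(hot_counts) / total_calls


if __name__ == "__main__":
    # Counts as reported by the uniq -c output above.
    share = hot_share(100000, [10876, 10877, 13339, 13339])
    print(f"{share:.1%} of sampled calls lock or unlock offset 164")
```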

Oh... and fsck is now at the stage: 
 Pass 1: Checking inodes, blocks, and sizes

The filesystem is 3T: 
md3 : active raid5 sda3[0] sdd3[3] sdc3[2] sdb3[1]
      2868686592 blocks level 5, 256k chunk, algorithm 2 [4/4] [UUUU]
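
For reference, the "3T" follows from the mdstat line, assuming (as far as I
know) that /proc/mdstat counts 1 KiB blocks and that a 4-disk RAID5 spends
one member's worth of space on parity. A quick sketch of the arithmetic; the
function name md_size is made up for the example.

```python
KIB = 1024


def md_size(blocks_1k, raid_devices):
    """Convert an mdstat block count (assumed 1 KiB blocks) to sizes."""
    total_bytes = blocks_1k * KIB
    return {
        "bytes": total_bytes,
        "TB": total_bytes / 10**12,   # decimal terabytes
        "TiB": total_bytes / 2**40,   # binary tebibytes
        # RAID5: usable space is (n - 1) members, so each member holds:
        "per_member_blocks": blocks_1k // (raid_devices - 1),
    }


if __name__ == "__main__":
    info = md_size(2868686592, 4)
    print(f"{info['TB']:.2f} TB, {info['TiB']:.2f} TiB, "
          f"{info['per_member_blocks']} blocks per member")
```

So the array is about 2.94 TB (2.67 TiB) usable, with 956228864 blocks on
each of the four members.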

I'm studying the e2fsck source code a bit, but I don't yet see where the
fcntl calls are coming from.

	Roger.

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
**    Delftechpark 26 2628 XH  Delft, The Netherlands. KVK: 27239233    **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement. 
Does it sit on the couch all day? Is it unemployed? Please be specific! 
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ
