All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joe Landman <joe.landman@gmail.com>
To: xfs@oss.sgi.com, linux-raid <linux-raid@vger.kernel.org>
Subject: very slow file deletion on an SSD
Date: Fri, 25 May 2012 06:37:05 -0400	[thread overview]
Message-ID: <4FBF60D1.80104@gmail.com> (raw)

Hi folks:

   Just ran into this (see posted output at bottom).  3.2.14 kernel, MD 
RAID 5, xfs file system.  Not sure (precisely) where the problem is, 
hence posting to both lists.

  [root@siFlash ~]# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md22 : active raid5 sdl[0] sds[7] sdx[6] sdu[5] sdk[4] sdz[3] sdw[2] sdr[1]
       1641009216 blocks super 1.2 level 5, 32k chunk, algorithm 2 [8/8] 
[UUUUUUUU]

md20 : active raid5 sdh[0] sdf[7] sdm[6] sdd[5] sdc[4] sde[3] sdi[2] sdg[1]
       1641009216 blocks super 1.2 level 5, 32k chunk, algorithm 2 [8/8] 
[UUUUUUUU]

md21 : active raid5 sdy[0] sdq[7] sdp[6] sdo[5] sdn[4] sdj[3] sdv[2] sdt[1]
       1641009216 blocks super 1.2 level 5, 32k chunk, algorithm 2 [8/8] 
[UUUUUUUU]

md0 : active raid1 sdb1[1] sda1[0]
       93775800 blocks super 1.0 [2/2] [UU]
       bitmap: 1/1 pages [4KB], 65536KB chunk


md2* are SSD RAID5 arrays we are experimenting with.  Xfs file systems 
atop them:

[root@siFlash ~]# mount | grep md2
/dev/md20 on /data/1 type xfs (rw)
/dev/md21 on /data/2 type xfs (rw)
/dev/md22 on /data/3 type xfs (rw)

vanilla mount options (following Dave Chinner's long standing advice)

meta-data=/dev/md20              isize=2048   agcount=32, 
agsize=12820392 blks
          =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=410252304, imaxpct=5
          =                       sunit=8      swidth=56 blks
naming   =version 2              bsize=65536  ascii-ci=0
log      =internal               bsize=4096   blocks=30720, version=2
          =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

[root@siFlash ~]# mdadm --detail /dev/md20
/dev/md20:
         Version : 1.2
   Creation Time : Sun Apr  1 19:36:39 2012
      Raid Level : raid5
      Array Size : 1641009216 (1564.99 GiB 1680.39 GB)
   Used Dev Size : 234429888 (223.57 GiB 240.06 GB)
    Raid Devices : 8
   Total Devices : 8
     Persistence : Superblock is persistent

     Update Time : Fri May 25 06:26:23 2012
           State : clean
  Active Devices : 8
Working Devices : 8
  Failed Devices : 0
   Spare Devices : 0

          Layout : left-symmetric
      Chunk Size : 32K

            Name : siFlash.sicluster:20
            UUID : 2f023323:6ec29eb9:a943de06:f6e0c25d
          Events : 296

     Number   Major   Minor   RaidDevice State
        0       8      112        0      active sync   /dev/sdh
        1       8       96        1      active sync   /dev/sdg
        2       8      128        2      active sync   /dev/sdi
        3       8       64        3      active sync   /dev/sde
        4       8       32        4      active sync   /dev/sdc
        5       8       48        5      active sync   /dev/sdd
        6       8      192        6      active sync   /dev/sdm
        7       8       80        7      active sync   /dev/sdf

All the SSDs are on deadline scheduler

[root@siFlash ~]# cat /sys/block/sd*/queue/scheduler | uniq
noop [deadline] cfq


All this said, deletes from this unit are taking 1-2 seconds per file ...

[root@siFlash ~]# strace -ttt -T rm -f /data/2/test/*
1337941514.040788 execve("/bin/rm", ["rm", "-f", 
"/data/2/test/2.8t-r.97.0", "/data/2/test/2.8t-r.98.0", 
"/data/2/test/2.8t-r.99.0", "/data/2/test/2.9.0"], [/* 40 vars */]) = 0 
<0.000552>
1337941514.041713 brk(0)                = 0x60d000 <0.000031>
1337941514.041927 mmap(NULL, 4096, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7bc2779000 <0.000032>
1337941514.042113 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No 
such file or directory) <0.000109>
1337941514.042395 open("/etc/ld.so.cache", O_RDONLY) = 3 <0.000050>
1337941514.042614 fstat(3, {st_mode=S_IFREG|0644, st_size=81118, ...}) = 
0 <0.000102>
1337941514.042928 mmap(NULL, 81118, PROT_READ, MAP_PRIVATE, 3, 0) = 
0x7f7bc2765000 <0.000042>
1337941514.043078 close(3)              = 0 <0.000019>
1337941514.043235 open("/lib64/libc.so.6", O_RDONLY) = 3 <0.000115>
1337941514.043477 read(3, 
"\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360\355\301W4\0\0\0"..., 
832) = 832 <0.000039>
1337941514.043647 fstat(3, {st_mode=S_IFREG|0755, st_size=1908792, ...}) 
= 0 <0.000020>
1337941514.043860 mmap(0x3457c00000, 3733672, PROT_READ|PROT_EXEC, 
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3457c00000 <0.000085>
1337941514.044065 mprotect(0x3457d86000, 2097152, PROT_NONE) = 0 <0.000034>
1337941514.044191 mmap(0x3457f86000, 20480, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x186000) = 0x3457f86000 <0.000034>
1337941514.044388 mmap(0x3457f8b000, 18600, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3457f8b000 <0.000085>
1337941514.044592 close(3)              = 0 <0.000058>
1337941514.044763 mmap(NULL, 4096, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7bc2764000 <0.000039>
1337941514.044893 mmap(NULL, 4096, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7bc2763000 <0.000020>
1337941514.044981 mmap(NULL, 4096, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7bc2762000 <0.000018>
1337941514.045076 arch_prctl(ARCH_SET_FS, 0x7f7bc2763700) = 0 <0.000018>
1337941514.045183 mprotect(0x3457f86000, 16384, PROT_READ) = 0 <0.000023>
1337941514.045270 mprotect(0x345761f000, 4096, PROT_READ) = 0 <0.000019>
1337941514.045350 munmap(0x7f7bc2765000, 81118) = 0 <0.000028>
1337941514.045619 brk(0)                = 0x60d000 <0.000017>
1337941514.045698 brk(0x62e000)         = 0x62e000 <0.000018>
1337941514.045803 open("/usr/lib/locale/locale-archive", O_RDONLY) = 3 
<0.000028>
1337941514.045904 fstat(3, {st_mode=S_IFREG|0644, st_size=99158704, 
...}) = 0 <0.000017>
1337941514.046012 mmap(NULL, 99158704, PROT_READ, MAP_PRIVATE, 3, 0) = 
0x7f7bbc8d1000 <0.000020>
1337941514.046099 close(3)              = 0 <0.000017>
1337941514.046235 ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost 
isig icanon echo ...}) = 0 <0.000020>
1337941514.046373 newfstatat(AT_FDCWD, "/data/2/test/2.8t-r.97.0", 
{st_mode=S_IFREG|0600, st_size=1073741824, ...}, AT_SYMLINK_NOFOLLOW) = 
0 <0.000024>
1337941514.046504 unlinkat(AT_FDCWD, "/data/2/test/2.8t-r.97.0", 0) = 0 
<1.357571>
1337941515.404257 newfstatat(AT_FDCWD, "/data/2/test/2.8t-r.98.0", 
{st_mode=S_IFREG|0600, st_size=1073741824, ...}, AT_SYMLINK_NOFOLLOW) = 
0 <0.000072>
1337941515.404485 unlinkat(AT_FDCWD, "/data/2/test/2.8t-r.98.0", 0) = 0 
<1.608016>
1337941517.012706 newfstatat(AT_FDCWD, "/data/2/test/2.8t-r.99.0", 
{st_mode=S_IFREG|0600, st_size=1073741824, ...}, AT_SYMLINK_NOFOLLOW) = 
0 <0.000082>
1337941517.012957 unlinkat(AT_FDCWD, "/data/2/test/2.8t-r.99.0", 0) = 0 
<1.133890>
1337941518.146983 newfstatat(AT_FDCWD, "/data/2/test/2.9.0", 
{st_mode=S_IFREG|0600, st_size=8589934592, ...}, AT_SYMLINK_NOFOLLOW) = 
0 <0.000023>
1337941518.147145 unlinkat(AT_FDCWD, "/data/2/test/2.9.0", 0) = 0 <0.938754>
1337941519.086125 close(0)              = 0 <0.000102>
1337941519.086357 close(1)              = 0 <0.000061>
1337941519.086540 close(2)              = 0 <0.000021>
1337941519.086694 exit_group(0)         = ?

Anything obvious that we are doing wrong?


Machine may be occupied for a bit.  Might be a few days before we can 
get results back.


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615


WARNING: multiple messages have this Message-ID (diff)
From: Joe Landman <joe.landman@gmail.com>
To: xfs@oss.sgi.com, linux-raid <linux-raid@vger.kernel.org>
Subject: very slow file deletion on an SSD
Date: Fri, 25 May 2012 06:37:05 -0400	[thread overview]
Message-ID: <4FBF60D1.80104@gmail.com> (raw)

Hi folks:

   Just ran into this (see posted output at bottom).  3.2.14 kernel, MD 
RAID 5, xfs file system.  Not sure (precisely) where the problem is, 
hence posting to both lists.

  [root@siFlash ~]# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md22 : active raid5 sdl[0] sds[7] sdx[6] sdu[5] sdk[4] sdz[3] sdw[2] sdr[1]
       1641009216 blocks super 1.2 level 5, 32k chunk, algorithm 2 [8/8] 
[UUUUUUUU]

md20 : active raid5 sdh[0] sdf[7] sdm[6] sdd[5] sdc[4] sde[3] sdi[2] sdg[1]
       1641009216 blocks super 1.2 level 5, 32k chunk, algorithm 2 [8/8] 
[UUUUUUUU]

md21 : active raid5 sdy[0] sdq[7] sdp[6] sdo[5] sdn[4] sdj[3] sdv[2] sdt[1]
       1641009216 blocks super 1.2 level 5, 32k chunk, algorithm 2 [8/8] 
[UUUUUUUU]

md0 : active raid1 sdb1[1] sda1[0]
       93775800 blocks super 1.0 [2/2] [UU]
       bitmap: 1/1 pages [4KB], 65536KB chunk


md2* are SSD RAID5 arrays we are experimenting with.  Xfs file systems 
atop them:

[root@siFlash ~]# mount | grep md2
/dev/md20 on /data/1 type xfs (rw)
/dev/md21 on /data/2 type xfs (rw)
/dev/md22 on /data/3 type xfs (rw)

vanilla mount options (following Dave Chinner's long standing advice)

meta-data=/dev/md20              isize=2048   agcount=32, 
agsize=12820392 blks
          =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=410252304, imaxpct=5
          =                       sunit=8      swidth=56 blks
naming   =version 2              bsize=65536  ascii-ci=0
log      =internal               bsize=4096   blocks=30720, version=2
          =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

[root@siFlash ~]# mdadm --detail /dev/md20
/dev/md20:
         Version : 1.2
   Creation Time : Sun Apr  1 19:36:39 2012
      Raid Level : raid5
      Array Size : 1641009216 (1564.99 GiB 1680.39 GB)
   Used Dev Size : 234429888 (223.57 GiB 240.06 GB)
    Raid Devices : 8
   Total Devices : 8
     Persistence : Superblock is persistent

     Update Time : Fri May 25 06:26:23 2012
           State : clean
  Active Devices : 8
Working Devices : 8
  Failed Devices : 0
   Spare Devices : 0

          Layout : left-symmetric
      Chunk Size : 32K

            Name : siFlash.sicluster:20
            UUID : 2f023323:6ec29eb9:a943de06:f6e0c25d
          Events : 296

     Number   Major   Minor   RaidDevice State
        0       8      112        0      active sync   /dev/sdh
        1       8       96        1      active sync   /dev/sdg
        2       8      128        2      active sync   /dev/sdi
        3       8       64        3      active sync   /dev/sde
        4       8       32        4      active sync   /dev/sdc
        5       8       48        5      active sync   /dev/sdd
        6       8      192        6      active sync   /dev/sdm
        7       8       80        7      active sync   /dev/sdf

All the SSDs are on deadline scheduler

[root@siFlash ~]# cat /sys/block/sd*/queue/scheduler | uniq
noop [deadline] cfq


All this said, deletes from this unit are taking 1-2 seconds per file ...

[root@siFlash ~]# strace -ttt -T rm -f /data/2/test/*
1337941514.040788 execve("/bin/rm", ["rm", "-f", 
"/data/2/test/2.8t-r.97.0", "/data/2/test/2.8t-r.98.0", 
"/data/2/test/2.8t-r.99.0", "/data/2/test/2.9.0"], [/* 40 vars */]) = 0 
<0.000552>
1337941514.041713 brk(0)                = 0x60d000 <0.000031>
1337941514.041927 mmap(NULL, 4096, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7bc2779000 <0.000032>
1337941514.042113 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No 
such file or directory) <0.000109>
1337941514.042395 open("/etc/ld.so.cache", O_RDONLY) = 3 <0.000050>
1337941514.042614 fstat(3, {st_mode=S_IFREG|0644, st_size=81118, ...}) = 
0 <0.000102>
1337941514.042928 mmap(NULL, 81118, PROT_READ, MAP_PRIVATE, 3, 0) = 
0x7f7bc2765000 <0.000042>
1337941514.043078 close(3)              = 0 <0.000019>
1337941514.043235 open("/lib64/libc.so.6", O_RDONLY) = 3 <0.000115>
1337941514.043477 read(3, 
"\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360\355\301W4\0\0\0"..., 
832) = 832 <0.000039>
1337941514.043647 fstat(3, {st_mode=S_IFREG|0755, st_size=1908792, ...}) 
= 0 <0.000020>
1337941514.043860 mmap(0x3457c00000, 3733672, PROT_READ|PROT_EXEC, 
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3457c00000 <0.000085>
1337941514.044065 mprotect(0x3457d86000, 2097152, PROT_NONE) = 0 <0.000034>
1337941514.044191 mmap(0x3457f86000, 20480, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x186000) = 0x3457f86000 <0.000034>
1337941514.044388 mmap(0x3457f8b000, 18600, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3457f8b000 <0.000085>
1337941514.044592 close(3)              = 0 <0.000058>
1337941514.044763 mmap(NULL, 4096, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7bc2764000 <0.000039>
1337941514.044893 mmap(NULL, 4096, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7bc2763000 <0.000020>
1337941514.044981 mmap(NULL, 4096, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7bc2762000 <0.000018>
1337941514.045076 arch_prctl(ARCH_SET_FS, 0x7f7bc2763700) = 0 <0.000018>
1337941514.045183 mprotect(0x3457f86000, 16384, PROT_READ) = 0 <0.000023>
1337941514.045270 mprotect(0x345761f000, 4096, PROT_READ) = 0 <0.000019>
1337941514.045350 munmap(0x7f7bc2765000, 81118) = 0 <0.000028>
1337941514.045619 brk(0)                = 0x60d000 <0.000017>
1337941514.045698 brk(0x62e000)         = 0x62e000 <0.000018>
1337941514.045803 open("/usr/lib/locale/locale-archive", O_RDONLY) = 3 
<0.000028>
1337941514.045904 fstat(3, {st_mode=S_IFREG|0644, st_size=99158704, 
...}) = 0 <0.000017>
1337941514.046012 mmap(NULL, 99158704, PROT_READ, MAP_PRIVATE, 3, 0) = 
0x7f7bbc8d1000 <0.000020>
1337941514.046099 close(3)              = 0 <0.000017>
1337941514.046235 ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost 
isig icanon echo ...}) = 0 <0.000020>
1337941514.046373 newfstatat(AT_FDCWD, "/data/2/test/2.8t-r.97.0", 
{st_mode=S_IFREG|0600, st_size=1073741824, ...}, AT_SYMLINK_NOFOLLOW) = 
0 <0.000024>
1337941514.046504 unlinkat(AT_FDCWD, "/data/2/test/2.8t-r.97.0", 0) = 0 
<1.357571>
1337941515.404257 newfstatat(AT_FDCWD, "/data/2/test/2.8t-r.98.0", 
{st_mode=S_IFREG|0600, st_size=1073741824, ...}, AT_SYMLINK_NOFOLLOW) = 
0 <0.000072>
1337941515.404485 unlinkat(AT_FDCWD, "/data/2/test/2.8t-r.98.0", 0) = 0 
<1.608016>
1337941517.012706 newfstatat(AT_FDCWD, "/data/2/test/2.8t-r.99.0", 
{st_mode=S_IFREG|0600, st_size=1073741824, ...}, AT_SYMLINK_NOFOLLOW) = 
0 <0.000082>
1337941517.012957 unlinkat(AT_FDCWD, "/data/2/test/2.8t-r.99.0", 0) = 0 
<1.133890>
1337941518.146983 newfstatat(AT_FDCWD, "/data/2/test/2.9.0", 
{st_mode=S_IFREG|0600, st_size=8589934592, ...}, AT_SYMLINK_NOFOLLOW) = 
0 <0.000023>
1337941518.147145 unlinkat(AT_FDCWD, "/data/2/test/2.9.0", 0) = 0 <0.938754>
1337941519.086125 close(0)              = 0 <0.000102>
1337941519.086357 close(1)              = 0 <0.000061>
1337941519.086540 close(2)              = 0 <0.000021>
1337941519.086694 exit_group(0)         = ?

Anything obvious that we are doing wrong?


Machine may be occupied for a bit.  Might be a few days before we can 
get results back.


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

             reply	other threads:[~2012-05-25 10:37 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-25 10:37 Joe Landman [this message]
2012-05-25 10:37 ` very slow file deletion on an SSD Joe Landman
2012-05-25 10:45 ` Bernd Schubert
2012-05-25 10:45   ` Bernd Schubert
2012-05-25 10:49   ` Joe Landman
2012-05-25 10:49     ` Joe Landman
2012-05-25 14:48     ` Roberto Spadim
2012-05-25 14:48       ` Roberto Spadim
2012-05-25 16:57 ` Ben Myers
2012-05-25 16:57   ` Ben Myers
2012-05-25 16:54   ` Joe Landman
2012-05-25 16:54     ` Joe Landman
2012-05-25 16:59   ` Christoph Hellwig
2012-05-25 16:59     ` Christoph Hellwig
2012-05-26 16:00     ` David Brown
2012-05-26 16:00       ` David Brown
2012-05-26 19:56 ` Stan Hoeppner
2012-05-26 19:56   ` Stan Hoeppner
2012-05-26 23:18 ` Dave Chinner
2012-05-26 23:25   ` Joe Landman
2012-05-26 23:25     ` Joe Landman
2012-05-27  0:07     ` Dave Chinner
2012-05-27  0:07       ` Dave Chinner
2012-05-27  0:10       ` joe.landman
2012-05-27  0:10         ` joe.landman
2012-05-27  1:49       ` Joe Landman
2012-05-27  1:49         ` Joe Landman
2012-05-27  2:40         ` Eric Sandeen
2012-05-27  2:43           ` Eric Sandeen
2012-05-27  7:34           ` Stefan Ring
2012-05-27 13:15             ` Krzysztof Adamski
2012-05-27 13:15               ` Krzysztof Adamski
2012-05-27 14:59               ` joe.landman
2012-05-27 14:59                 ` joe.landman
2012-05-27 16:07                 ` Eric Sandeen
2012-05-27 16:07                   ` Eric Sandeen
2012-05-27 17:14                   ` Joe Landman
2012-05-27 17:14                     ` Joe Landman
2012-05-27 19:24                     ` Peter Grandi
2012-05-27 17:17                   ` Joe Landman
2012-05-27 17:17                     ` Joe Landman
2012-05-26 23:55   ` Joe Landman
2012-05-26 23:55     ` Joe Landman
2012-05-27  0:07     ` Jon Nelson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FBF60D1.80104@gmail.com \
    --to=joe.landman@gmail.com \
    --cc=linux-raid@vger.kernel.org \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.