Poor performance unlinking hard-linked files

linux-btrfs.vger.kernel.org archive mirror

* Poor performance unlinking hard-linked files
@ 2010-11-13  3:25 Bron Gondwana
  2010-11-16 12:54 ` Poor performance unlinking hard-linked files (repost) Bron Gondwana
  0 siblings, 1 reply; 12+ messages in thread
From: Bron Gondwana @ 2010-11-13  3:25 UTC
  To: linux-btrfs

I had a spare piece of hardware sitting around, so I thought I'd test btrfs performance with the Cyrus IMAPd server by setting up an extra replica target on the spare machine.

Some background on Cyrus replication: when copying a folder the replication system first "reserves" all messages it's going to need.  It tries to maintain "single instance store" as it's called in Cyrus terminology - hard links between identical messages on disk.

This is done in the latest version of Cyrus by storing the sha1 of each file in an index, and scanning the currently active mailboxes on the replica to see if they already have a copy of the file.  If so, a hard link is made in the data/sync./$pid/ directory back to the original file in the mailbox directory.

Cyrus stores one file per email, which pushes filesystems pretty hard.  We used reiser3 until recently, and are part way through converting to ext4.

If the file is not already available on the replica, a new copy is uploaded directly into the sync./$pid directory.

Either way, when the mailbox is then created or updated, the files get hardlinked from the sync./$pid directory to their final location.

They get kept around for a little while, until the sync_server decides it's time for a reset because it's using too much memory keeping all the tracking data.  Then it unlinks all the files in sync./$pid and starts searching for necessary files again.

Most of the time, this means single instance store works - the source and destination mailboxes always get heated up by adding both of them to the sync log, so the duplication will be found.
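The staging flow described above can be modelled roughly as follows. This is a hypothetical sketch, not Cyrus source. The class `SyncStaging`, its method names and the in-memory `index` dict (standing in for Cyrus's sha1 index) are illustrative assumptions. It shows why most files removed by the reset step still have another name elsewhere on the filesystem when they are unlinked.

```python
import hashlib
import os


class SyncStaging:
    """Hypothetical model of the sync./$pid staging flow (not Cyrus code)."""

    def __init__(self, root, pid):
        # Mirrors the data/sync./$pid/ layout described above.
        self.staging = os.path.join(root, "data", "sync.", str(pid))
        os.makedirs(self.staging, exist_ok=True)
        self.index = {}  # sha1 hex digest -> path of an existing copy

    def _staged(self, sha1):
        return os.path.join(self.staging, sha1)

    def record(self, path):
        """Index an existing mailbox file by the sha1 of its contents."""
        with open(path, "rb") as f:
            sha1 = hashlib.sha1(f.read()).hexdigest()
        self.index[sha1] = path
        return sha1

    def reserve(self, sha1):
        """Hard-link a known copy into staging; False means it must be uploaded."""
        src = self.index.get(sha1)
        if src is None:
            return False
        if not os.path.exists(self._staged(sha1)):
            os.link(src, self._staged(sha1))
        return True

    def upload(self, data):
        """Write a new copy directly into staging and fsync it."""
        sha1 = hashlib.sha1(data).hexdigest()
        with open(self._staged(sha1), "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        return sha1

    def commit(self, sha1, dest):
        """Hard-link the staged file to its final mailbox location."""
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        os.link(self._staged(sha1), dest)
        self.index[sha1] = dest

    def reset(self):
        """Unlink everything in staging: the step reported as slow."""
        names = os.listdir(self.staging)
        for name in names:
            os.unlink(os.path.join(self.staging, name))
        return len(names)
```

After `reset()`, each reserved file drops back to its original link count, and each uploaded file survives only under its committed mailbox name.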

-----------------

Anyway, that's the background - a daemon that creates a pile of files in one directory, hard links them out all over the filesystem, then unlinks all the original files later.

We're finding that as the filesystem grows (currently about 30% full on a 300Gb filesystem) the unlink performance becomes horrible.  Watching iostat, there's a lot of reading going on as well.  It really looks like the unlinks are performing pretty badly in this one case.

Ideally there would be a nice filesystem API Cyrus could call that said "delete all the files in this directory"!  Failing that, is there anything we can do to improve this use case?  Real-time production use isn't QUITE so bad as an initial sync, but lmtp delivery uses the same method - spool to staging file, parse it there, then hard link to all the delivery targets before unlinking the original.
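Without a single "delete this directory's files" call, the closest userspace equivalent is still one unlink(2) per entry. The sketch below is a hypothetical helper, `unlink_all`. It removes entries in inode-number order, which is a heuristic often used on ext3/ext4 to cut seeking. Whether that ordering helps on btrfs is an assumption this thread does not establish.

```python
import os


def unlink_all(directory):
    """Unlink every regular file directly inside `directory`.

    Files are removed in inode-number order (a seek-reduction heuristic
    borrowed from ext3/ext4 practice; its effect on btrfs is untested here).
    Subdirectories and symlinks are left alone.
    """
    with os.scandir(directory) as it:
        # DirEntry.inode() comes from readdir, so no extra stat per file.
        entries = [e for e in it if e.is_file(follow_symlinks=False)]
    entries.sort(key=lambda e: e.inode())
    for entry in entries:
        os.unlink(entry.path)
    return len(entries)
```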

Thanks,

Bron.
-- 
  Bron Gondwana
  brong@fastmail.fm


* Poor performance unlinking hard-linked files (repost)
  2010-11-13  3:25 Poor performance unlinking hard-linked files Bron Gondwana
@ 2010-11-16 12:54 ` Bron Gondwana
  2010-11-16 13:38   ` Chris Mason
  0 siblings, 1 reply; 12+ messages in thread
From: Bron Gondwana @ 2010-11-16 12:54 UTC
  To: Bron Gondwana; +Cc: linux-btrfs

Just posting this again more neatly formatted and just the
'meat':

a) program creates piles of small temporary files, hard
   links them out to different directories, unlinks the
   originals.

b) filesystem size: ~ 300Gb (backed by hardware RAID5)

c) as the filesystem grows (currently about 30% full) 
   the unlink performance becomes horrible.  Watching
   iostat, there's a lot of reading going on as well.

Is this expected?  Is there anything we can do about it?
(short of rewriting Cyrus replication)
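A minimal reproducer for points (a) to (c) might look like the sketch below. The function name `reproduce`, the directory layout and the default counts and sizes are illustrative assumptions, not Cyrus figures. It creates and fsyncs each file, hard-links it into one of several "mailbox" directories, then times only the unlinks of the originals.

```python
import os
import time


def reproduce(root, n_files=1000, n_dirs=50, size=4096):
    """Mimic (a): create files, hard-link them out, then unlink the originals.

    Counts and sizes are illustrative defaults, not figures from Cyrus.
    Returns timing for the unlink phase only.
    """
    staging = os.path.join(root, "staging")
    os.makedirs(staging, exist_ok=True)
    targets = [os.path.join(root, f"mbox{i:03d}") for i in range(n_dirs)]
    for d in targets:
        os.makedirs(d, exist_ok=True)

    payload = os.urandom(size)
    staged = []
    for i in range(n_files):
        path = os.path.join(staging, f"{i:08d}")
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # each message file is fsynced, as in Cyrus
        # Second name for the same inode, in a "mailbox" directory.
        os.link(path, os.path.join(targets[i % n_dirs], f"{i}."))
        staged.append(path)

    latencies = []
    for path in staged:
        start = time.perf_counter()
        os.unlink(path)  # inode survives: its link count drops from 2 to 1
        latencies.append(time.perf_counter() - start)

    latencies.sort()
    p99 = latencies[int(0.99 * (len(latencies) - 1))] if latencies else 0.0
    return {"unlinked": len(latencies), "total_s": sum(latencies), "p99_s": p99}
```

Pointing `root` at a directory on the btrfs filesystem under test and rerunning as the filesystem fills would show whether `p99_s` grows with fill level, as described in (c).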

Thanks,

Bron.


* Re: Poor performance unlinking hard-linked files (repost)
  2010-11-16 12:54 ` Poor performance unlinking hard-linked files (repost) Bron Gondwana
@ 2010-11-16 13:38   ` Chris Mason
  2010-11-17  4:11     ` Bron Gondwana
  0 siblings, 1 reply; 12+ messages in thread
From: Chris Mason @ 2010-11-16 13:38 UTC
  To: Bron Gondwana; +Cc: linux-btrfs

Excerpts from Bron Gondwana's message of 2010-11-16 07:54:45 -0500:
> Just posting this again more neatly formatted and just the
> 'meat':
> 
> a) program creates piles of small temporary files, hard
>    links them out to different directories, unlinks the
>    originals.
> 
> b) filesystem size: ~ 300Gb (backed by hardware RAID5)
> 
> c) as the filesystem grows (currently about 30% full) 
>    the unlink performance becomes horrible.  Watching
>    iostat, there's a lot of reading going on as well.
> 
> Is this expected?  Is there anything we can do about it?
> (short of rewriting Cyrus replication)

Hi,

It sounds like the unlink speed is limited by the reading, and the reads
are coming from one of two places.  We're either reading to cache cold
block groups or we're reading to find the directory entries.

Could you sysrq-w while the performance is bad?  That would narrow it
down.

Josef has the reads for caching block groups fixed, but we'll have to
look hard at the reads for the rest of unlink.

-chris


* Re: Poor performance unlinking hard-linked files (repost)
  2010-11-16 13:38   ` Chris Mason
@ 2010-11-17  4:11     ` Bron Gondwana
  2010-11-17  9:56       ` Bron Gondwana
  2010-11-18 15:30       ` Chris Mason
  0 siblings, 2 replies; 12+ messages in thread
From: Bron Gondwana @ 2010-11-17  4:11 UTC
  To: Chris Mason; +Cc: Bron Gondwana, linux-btrfs

On Tue, Nov 16, 2010 at 08:38:13AM -0500, Chris Mason wrote:
> Excerpts from Bron Gondwana's message of 2010-11-16 07:54:45 -0500:
> > Just posting this again more neatly formatted and just the
> > 'meat':
> > 
> > a) program creates piles of small temporary files, hard
> >    links them out to different directories, unlinks the
> >    originals.
> > 
> > b) filesystem size: ~ 300Gb (backed by hardware RAID5)
> > 
> > c) as the filesystem grows (currently about 30% full) 
> >    the unlink performance becomes horrible.  Watching
> >    iostat, there's a lot of reading going on as well.
> > 
> > Is this expected?  Is there anything we can do about it?
> > (short of rewriting Cyrus replication)
> 
> Hi,
> 
> It sounds like the unlink speed is limited by the reading, and the reads
> are coming from one of two places.  We're either reading to cache cold
> block groups or we're reading to find the directory entries.

All the unlinks for a single process will be happening in the same
directory (though the hard linked copies will be all over)

> Could you sysrq-w while the performance is bad?  That would narrow it
> down.

Here's one:

http://pastebin.com/Tg7agv42
 
> Josef has the reads for caching block groups fixed, but we'll have to
> look hard at the reads for the rest of unlink.

I suspect you may want a couple more before you have enough data.  I
could set up a job to run one every 10 minutes for a couple of hours
or something.  There will be at least two, possibly three threads of
"sync_server" running on this particular server instance.  It has two
btrfs partitions - a 15Gb partition on RAID1 and a 300Gb partition
on RAID5.  All the unlinks will be happening to the RAID5 one.
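The "one every 10 minutes for a couple of hours" job could be sketched like this. Writing `w` to `/proc/sysrq-trigger` (as root) asks the kernel to log all tasks in uninterruptible (D) sleep, which is what produced the dumps in this thread. The function names and the injectable `trigger_path` and `sleep` parameters are hypothetical, added so the loop can be exercised without root.

```python
import time

SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def trigger_blocked_dump(trigger_path=SYSRQ_TRIGGER):
    """Ask the kernel to log all blocked (D state) tasks; needs root."""
    with open(trigger_path, "w") as f:
        f.write("w")


def sample_blocked_tasks(interval_s=600.0, count=12,
                         trigger_path=SYSRQ_TRIGGER, sleep=time.sleep):
    """Fire sysrq-w `count` times, `interval_s` apart.

    12 samples, 10 minutes apart, cover roughly two hours. The dumps land
    in the kernel log (dmesg), not in this process's output.
    """
    for i in range(count):
        trigger_blocked_dump(trigger_path)
        if i + 1 < count:
            sleep(interval_s)
    return count
```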

Bron ( our usual fully loaded server might have up to 40 of these pairs
       over 12 separate RAID sets, so anything that doesn't scale out
       to lots of filesystems would make us pretty sad too )


* Re: Poor performance unlinking hard-linked files (repost)
  2010-11-17  4:11     ` Bron Gondwana
@ 2010-11-17  9:56       ` Bron Gondwana
  2010-11-18 15:30       ` Chris Mason
  1 sibling, 0 replies; 12+ messages in thread
From: Bron Gondwana @ 2010-11-17  9:56 UTC
  To: Bron Gondwana; +Cc: Chris Mason, linux-btrfs

On Wed, Nov 17, 2010 at 03:11:48PM +1100, Bron Gondwana wrote:
> > Could you sysrq-w while the performance is bad?  That would narrow it
> > down.
> 
> Here's one:
> 
> http://pastebin.com/Tg7agv42

And here's another one, inline this time.  The iostat for 10 seconds
just before said: (iostat -x 10 10)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          32.43    0.00   31.63   21.84    0.00   14.09

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     0.70    1.30    0.20    25.60     7.20    21.87     0.15  348.27  33.07   4.96
sda1              0.00     0.70    1.30    0.20    25.60     7.20    21.87     0.15  348.27  33.07   4.96
sda2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda6              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda7              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda8              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     6.00    0.10   75.20     0.80   860.80    11.44     0.01    0.15   0.06   0.48
sdb1              0.00     5.20    0.10    3.80     0.80    72.00    18.67     0.00    0.41   0.31   0.12
sdb2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb6              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb7              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb8              0.00     0.80    0.00   71.40     0.00   788.80    11.05     0.01    0.13   0.05   0.36
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdd               0.00     2.40  121.80  252.40 43012.00 10223.20   142.26     2.61    6.76   1.24  46.56
sdd1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdd2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdd3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdd4              0.00     2.40  121.80  252.40 43012.00 10223.20   142.26     2.61    6.76   1.24  46.56

(sdb8 and sdd4 are the meta and data partitions respectively - sdd4 is where
all the interesting stuff is happening)
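To put a number on "a lot of reading", the sdd4 row can be converted to throughput. iostat's `rsec/s` and `wsec/s` columns count 512-byte sectors. The helper names below are hypothetical. As a sanity check on the parse, `avgrq-sz` is recomputed from the other columns.

```python
SECTOR_BYTES = 512  # iostat's rsec/s and wsec/s are in 512-byte sectors


def parse_row(header, row):
    """Zip an `iostat -x` device row with the column names from its header."""
    names = header.split()[1:]  # drop the leading "Device:" column
    fields = row.split()
    return {"device": fields[0], **dict(zip(names, map(float, fields[1:])))}


def throughput_kib(stats):
    """Return (read KiB/s, write KiB/s) from the sector-rate columns."""
    per_sector = SECTOR_BYTES / 1024
    return stats["rsec/s"] * per_sector, stats["wsec/s"] * per_sector


def avg_request_sectors(stats):
    """Recompute avgrq-sz: sectors moved per request, reads and writes combined."""
    requests = stats["r/s"] + stats["w/s"]
    return (stats["rsec/s"] + stats["wsec/s"]) / requests if requests else 0.0
```

For the sdd4 row this gives about 21,506 KiB/s read against 5,112 KiB/s written, roughly four times as much reading as writing on the data partition. The recomputed request size matches the table's 142.26 sectors.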

By the way - we're running with the deadline scheduler, I'm pretty sure.
Let me know if that's silly...
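One way to confirm the elevator in use is to read `/sys/block/<disk>/queue/scheduler`. That file lists the available schedulers with the active one in brackets, for example `noop [deadline] cfq`. Partitions such as sdd4 have no queue of their own, so the whole disk (sdd) is the one to ask about. The helper names below are hypothetical.

```python
def active_scheduler(text):
    """Return the bracketed entry of a queue/scheduler line, e.g. 'noop [deadline] cfq'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active scheduler marked in {text!r}")


def scheduler_for(disk):
    """Read the active elevator for a whole disk such as 'sdd' (not 'sdd4')."""
    with open(f"/sys/block/{disk}/queue/scheduler") as f:
        return active_scheduler(f.read())
```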

[533206.344314] SysRq : Show Blocked State
[533206.344376]   task                        PC stack   pid father
[533206.344500] sync_server   D 0000000107f0e028     0 17027  10416 0x00020000
[533206.344564]  ffff88016c6898a8 0000000000200046 ffff88016c688010 ffff88022a153d00
[533206.344671]  ffff88016c689fd8 ffff88016c689fd8 0000000000013300 0000000000013300
[533206.344779]  0000000000013300 0000000000013300 0000000000013300 0000000000013300
[533206.344886] Call Trace:
[533206.344948]  [<ffffffff817c17dd>] io_schedule+0x4d/0x70
[533206.345005]  [<ffffffff81093e4d>] sync_page+0x3d/0x70
[533206.345061]  [<ffffffff817c1cfa>] __wait_on_bit+0x5a/0x90
[533206.345116]  [<ffffffff81093e10>] ? sync_page+0x0/0x70
[533206.345170]  [<ffffffff810940af>] wait_on_page_bit+0x6f/0x80
[533206.345227]  [<ffffffff8105dbd0>] ? wake_bit_function+0x0/0x40
[533206.345287]  [<ffffffff81278878>] ? submit_one_bio+0x88/0xa0
[533206.345341]  [<ffffffff8127cd2d>] read_extent_buffer_pages+0x4ed/0x530
[533206.345401]  [<ffffffff81254a30>] ? btree_get_extent+0x0/0x1a0
[533206.345456]  [<ffffffff8125490e>] btree_read_extent_buffer_pages+0x5e/0xc0
[533206.345512]  [<ffffffff81255406>] read_tree_block+0x56/0x80
[533206.345569]  [<ffffffff8123a235>] read_block_for_search+0x105/0x3d0
[533206.345626]  [<ffffffff81289869>] ? btrfs_tree_unlock+0x59/0x60
[533206.345680]  [<ffffffff81239ec5>] ? unlock_up+0x145/0x160
[533206.345735]  [<ffffffff81242602>] btrfs_search_slot+0x412/0x880
[533206.345792]  [<ffffffff8124351a>] btrfs_insert_empty_items+0x6a/0xd0
[533206.345850]  [<ffffffff810cc462>] ? kmem_cache_alloc+0x92/0xf0
[533206.345905]  [<ffffffff81254039>] btrfs_insert_inode_ref+0x79/0x190
[533206.345962]  [<ffffffff812627e1>] btrfs_add_link+0x121/0x1a0
[533206.346017]  [<ffffffff817c1f39>] ? mutex_unlock+0x9/0x10
[533206.346071]  [<ffffffff8126289e>] btrfs_add_nondir+0x3e/0x70
[533206.346126]  [<ffffffff81262fe2>] btrfs_link+0xe2/0x180
[533206.346182]  [<ffffffff810dead1>] vfs_link+0x101/0x160
[533206.346237]  [<ffffffff810e1f51>] sys_linkat+0x131/0x150
[533206.346293]  [<ffffffff810e1f89>] sys_link+0x19/0x20
[533206.346349]  [<ffffffff8102cc83>] ia32_sysret+0x0/0x5
[533206.346408] sync_server   D 0000000107f0e03c     0  5431  10416 0x00020000
[533206.346470]  ffff8800ca13bc58 0000000000200046 ffff8800ca13a010 ffff8801ea888000
[533206.346577]  ffff8800ca13bfd8 ffff8800ca13bfd8 0000000000013300 0000000000013300
[533206.347724]  0000000000013300 0000000000013300 0000000000013300 0000000000013300
[533206.347830] Call Trace:
[533206.347883]  [<ffffffff817c17dd>] io_schedule+0x4d/0x70
[533206.347937]  [<ffffffff810fdbb5>] sync_buffer+0x45/0x50
[533206.347992]  [<ffffffff817c1cfa>] __wait_on_bit+0x5a/0x90
[533206.348004]  [<ffffffff810fdb70>] ? sync_buffer+0x0/0x50
[533206.348004]  [<ffffffff810fdb70>] ? sync_buffer+0x0/0x50
[533206.348004]  [<ffffffff817c1da4>] out_of_line_wait_on_bit+0x74/0x90
[533206.348004]  [<ffffffff8105dbd0>] ? wake_bit_function+0x0/0x40
[533206.348004]  [<ffffffff810fdae6>] __wait_on_buffer+0x26/0x30
[533206.348004]  [<ffffffff81256e78>] write_dev_supers+0x238/0x310
[533206.348004]  [<ffffffff81257152>] write_all_supers+0x202/0x280
[533206.348004]  [<ffffffff812571de>] write_ctree_super+0xe/0x10
[533206.348004]  [<ffffffff8128f687>] btrfs_sync_log+0x3a7/0x5c0
[533206.348004]  [<ffffffff81267e27>] btrfs_sync_file+0x187/0x1b0
[533206.348004]  [<ffffffff810fa6e1>] vfs_fsync_range+0x81/0xa0
[533206.348004]  [<ffffffff810fa767>] vfs_fsync+0x17/0x20
[533206.348004]  [<ffffffff810fa7a5>] do_fsync+0x35/0x60
[533206.348004]  [<ffffffff810fa7fb>] sys_fsync+0xb/0x10
[533206.348004]  [<ffffffff8102cc83>] ia32_sysret+0x0/0x5
[533206.348004] Sched Debug Version: v0.09, 2.6.36-dev64 #1
[533206.348004] now at 533206348.885250 msecs
[533206.348004]   .jiffies                                 : 4428193883
[533206.348004]   .sysctl_sched_latency                    : 12.000000
[533206.348004]   .sysctl_sched_min_granularity            : 1.500000
[533206.348004]   .sysctl_sched_wakeup_granularity         : 2.000000
[533206.348004]   .sysctl_sched_child_runs_first           : 0
[533206.348004]   .sysctl_sched_features                   : 15471
[533206.348004]   .sysctl_sched_tunable_scaling            : 1 (logaritmic)
[533206.348004] 
[533206.348004] cpu#0, 3000.402 MHz
[533206.348004]   .nr_running                    : 0
[533206.348004]   .load                          : 0
[533206.348004]   .nr_switches                   : 22546403
[533206.348004]   .nr_load_updates               : 133301585
[533206.348004]   .nr_uninterruptible            : 2
[533206.348004]   .next_balance                  : 4428.193884
[533206.348004]   .curr->pid                     : 0
[533206.348004]   .clock                         : 533206348.006654
[533206.348004]   .cpu_load[0]                   : 0
[533206.348004]   .cpu_load[1]                   : 0
[533206.348004]   .cpu_load[2]                   : 32
[533206.348004]   .cpu_load[3]                   : 147
[533206.348004]   .cpu_load[4]                   : 225
[533206.348004]   .yld_count                     : 0
[533206.348004]   .sched_switch                  : 0
[533206.348004]   .sched_count                   : 25635109
[533206.348004]   .sched_goidle                  : 8442206
[533206.348004]   .avg_idle                      : 891600
[533206.348004]   .ttwu_count                    : 11929488
[533206.348004]   .ttwu_local                    : 7108567
[533206.348004]   .bkl_count                     : 2862
[533206.348004] 
[533206.348004] cfs_rq[0]:
[533206.348004]   .exec_clock                    : 4785380.650156
[533206.348004]   .MIN_vruntime                  : 0.000001
[533206.348004]   .min_vruntime                  : 4266012.055723
[533206.348004]   .max_vruntime                  : 0.000001
[533206.348004]   .spread                        : 0.000000
[533206.348004]   .spread0                       : 0.000000
[533206.348004]   .nr_running                    : 0
[533206.348004]   .load                          : 0
[533206.348004]   .nr_spread_over                : 525
[533206.348004] 
[533206.348004] rt_rq[0]:
[533206.348004]   .rt_nr_running                 : 0
[533206.348004]   .rt_throttled                  : 0
[533206.348004]   .rt_time                       : 0.000000
[533206.348004]   .rt_runtime                    : 950.000000
[533206.348004] 
[533206.348004] runnable tasks:
[533206.348004]             task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
[533206.348004] ----------------------------------------------------------------------------------------------------------
[533206.348004] 
[533206.348004] cpu#1, 3000.402 MHz
[533206.348004]   .nr_running                    : 1
[533206.348004]   .load                          : 1024
[533206.348004]   .nr_switches                   : 20052917
[533206.348004]   .nr_load_updates               : 133301525
[533206.348004]   .nr_uninterruptible            : 0
[533206.348004]   .next_balance                  : 4428.193883
[533206.348004]   .curr->pid                     : 6175
[533206.348004]   .clock                         : 533206344.023423
[533206.348004]   .cpu_load[0]                   : 0
[533206.348004]   .cpu_load[1]                   : 0
[533206.348004]   .cpu_load[2]                   : 15
[533206.348004]   .cpu_load[3]                   : 133
[533206.348004]   .cpu_load[4]                   : 330
[533206.348004]   .yld_count                     : 0
[533206.348004]   .sched_switch                  : 0
[533206.348004]   .sched_count                   : 24068035
[533206.348004]   .sched_goidle                  : 6629197
[533206.348004]   .avg_idle                      : 881626
[533206.348004]   .ttwu_count                    : 10794852
[533206.348004]   .ttwu_local                    : 8391194
[533206.348004]   .bkl_count                     : 2823
[533206.348004] 
[533206.348004] cfs_rq[1]:
[533206.348004]   .exec_clock                    : 4041404.226026
[533206.348004]   .MIN_vruntime                  : 0.000001
[533206.348004]   .min_vruntime                  : 4070860.187615
[533206.348004]   .max_vruntime                  : 0.000001
[533206.348004]   .spread                        : 0.000000
[533206.348004]   .spread0                       : -195151.868108
[533206.348004]   .nr_running                    : 1
[533206.348004]   .load                          : 1024
[533206.348004]   .nr_spread_over                : 615
[533206.348004] 
[533206.348004] rt_rq[1]:
[533206.348004]   .rt_nr_running                 : 0
[533206.348004]   .rt_throttled                  : 0
[533206.348004]   .rt_time                       : 0.000000
[533206.348004]   .rt_runtime                    : 950.000000
[533206.348004] 
[533206.348004] runnable tasks:
[533206.348004]             task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
[533206.348004] ----------------------------------------------------------------------------------------------------------
[533206.348004] R           bash  6175   4070854.187615        73   120   4070854.187615        32.744445     49348.100455
[533206.348004] 
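Reading the two Call Traces above, the first blocked sync_server is inside link(2). Its chain runs sys_link, btrfs_link, btrfs_insert_inode_ref, btrfs_search_slot and read_tree_block, so it is waiting on a btree metadata read. The second is inside fsync(2), where btrfs_sync_log leads to write_all_supers. Frames marked `?` are addresses found on the stack that the unwinder could not confirm, and they are usually skipped when reading a trace. A small, hypothetical filter that keeps only the confirmed frames:

```python
import re

# "[<ffffffff817c17dd>] io_schedule+0x4d/0x70", optionally with "? " before
# the symbol when the unwinder could not confirm the frame.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def reliable_frames(trace_lines):
    """Function names from a Call Trace, innermost first, skipping '?' frames."""
    names = []
    for line in trace_lines:
        m = FRAME_RE.search(line)
        if m and m.group(1) is None:
            names.append(m.group(2))
    return names
```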



* Re: Poor performance unlinking hard-linked files (repost)
  2010-11-17  4:11     ` Bron Gondwana
  2010-11-17  9:56       ` Bron Gondwana
@ 2010-11-18 15:30       ` Chris Mason
  2010-11-18 21:46         ` Bron Gondwana
  1 sibling, 1 reply; 12+ messages in thread
From: Chris Mason @ 2010-11-18 15:30 UTC
  To: Bron Gondwana; +Cc: linux-btrfs

Excerpts from Bron Gondwana's message of 2010-11-16 23:11:48 -0500:
> On Tue, Nov 16, 2010 at 08:38:13AM -0500, Chris Mason wrote:
> > Excerpts from Bron Gondwana's message of 2010-11-16 07:54:45 -0500:
> > > Just posting this again more neatly formatted and just the
> > > 'meat':
> > > 
> > > a) program creates piles of small temporary files, hard
> > >    links them out to different directories, unlinks the
> > >    originals.
> > > 
> > > b) filesystem size: ~ 300Gb (backed by hardware RAID5)
> > > 
> > > c) as the filesystem grows (currently about 30% full) 
> > >    the unlink performance becomes horrible.  Watching
> > >    iostat, there's a lot of reading going on as well.
> > > 
> > > Is this expected?  Is there anything we can do about it?
> > > (short of rewriting Cyrus replication)
> > 
> > Hi,
> > 
> > It sounds like the unlink speed is limited by the reading, and the reads
> > are coming from one of two places.  We're either reading to cache cold
> > block groups or we're reading to find the directory entries.
> 
> All the unlinks for a single process will be happening in the same
> directory (though the hard linked copies will be all over)
> 
> > Could you sysrq-w while the performance is bad?  That would narrow it
> > down.
> 
> Here's one:
> 
> http://pastebin.com/Tg7agv42

Ok, we're mixing unlinks and fsyncs.  Is it fsyncing directories too?

-chris


* Re: Poor performance unlinking hard-linked files (repost)
  2010-11-18 15:30       ` Chris Mason
@ 2010-11-18 21:46         ` Bron Gondwana
  2010-11-19 14:10           ` Chris Mason
  0 siblings, 1 reply; 12+ messages in thread
From: Bron Gondwana @ 2010-11-18 21:46 UTC
  To: Chris Mason; +Cc: Bron Gondwana, linux-btrfs

On Thu, Nov 18, 2010 at 10:30:47AM -0500, Chris Mason wrote:
> Excerpts from Bron Gondwana's message of 2010-11-16 23:11:48 -0500:
> > > > a) program creates piles of small temporary files, hard
> > > >    links them out to different directories, unlinks the
> > > >    originals.
> > > > 
> > > > b) filesystem size: ~ 300Gb (backed by hardware RAID5)
> > > > 
> > > > c) as the filesystem grows (currently about 30% full) 
> > > >    the unlink performance becomes horrible.  Watching
> > > >    iostat, there's a lot of reading going on as well.
> > > 
> > > It sounds like the unlink speed is limited by the reading, and the reads
> > > are coming from one of two places.  We're either reading to cache cold
> > > block groups or we're reading to find the directory entries.
> > 
> > All the unlinks for a single process will be happening in the same
> > directory (though the hard linked copies will be all over)
> > 
> > > Could you sysrq-w while the performance is bad?  That would narrow it
> > > down.
> > 
> > Here's one:
> > 
> > http://pastebin.com/Tg7agv42
> 
> Ok, we're mixing unlinks and fsyncs.  Is it fsyncing directories too?

Nup.  I'm pretty sure it doesn't, just files.  Yes - there will certainly
be fsyncs going on as well - Cyrus is very careful to fsync everything it
cares about at the file level, but all it does with directories is mkdir
them if they don't exist.

This is just a single "sync_server" process on an experimental server.  A
real server under full load is going to have multiple processes doing
fsyncs and unlinks.

A significant portion of unlinks are of files that have another link on
the filesystem.  Every mailbox "move" is implemented as a copy (hardlink)
plus expunge (delayed unlink).  The "delay" works by marking the message
to be deleted in the cyrus.index metadata file, and then deleting later
(tunable: 7 to 14 days in our case depending on when the next weekend is)
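The move-as-link-plus-delayed-unlink pattern can be modelled roughly as below. This is a hypothetical sketch. The `DelayedExpunge` class and its `pending` dict stand in for the expunge flag Cyrus records in cyrus.index. A fixed delay replaces the weekend-dependent 7 to 14 day window, which is not modelled. When `purge()` finally runs, every unlink it issues is of a file that still has another link.

```python
import os


class DelayedExpunge:
    """Hypothetical model of mailbox move = hard link + delayed unlink."""

    def __init__(self, delay_s):
        self.delay_s = delay_s
        self.pending = {}  # path -> time it was marked expunged

    def move(self, src, dst, now):
        os.link(src, dst)        # "copy": a second name for the same inode
        self.pending[src] = now  # "expunge": only marked, not unlinked yet

    def purge(self, now):
        """Unlink every marked path whose delay has elapsed."""
        due = [p for p, t in self.pending.items() if now - t >= self.delay_s]
        for path in due:
            os.unlink(path)      # the inode survives under its new name
            del self.pending[path]
        return due
```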

Bron.


* Re: Poor performance unlinking hard-linked files (repost)
  2010-11-18 21:46         ` Bron Gondwana
@ 2010-11-19 14:10           ` Chris Mason
  2010-11-19 21:58             ` Bron Gondwana
  0 siblings, 1 reply; 12+ messages in thread
From: Chris Mason @ 2010-11-19 14:10 UTC
  To: Bron Gondwana; +Cc: linux-btrfs

Excerpts from Bron Gondwana's message of 2010-11-18 16:46:31 -0500:
> On Thu, Nov 18, 2010 at 10:30:47AM -0500, Chris Mason wrote:
> > Excerpts from Bron Gondwana's message of 2010-11-16 23:11:48 -0500:
> > > > > a) program creates piles of small temporary files, hard
> > > > >    links them out to different directories, unlinks the
> > > > >    originals.
> > > > > 
> > > > > b) filesystem size: ~ 300Gb (backed by hardware RAID5)
> > > > > 
> > > > > c) as the filesystem grows (currently about 30% full) 
> > > > >    the unlink performance becomes horrible.  Watching
> > > > >    iostat, there's a lot of reading going on as well.
> > > > 
> > > > It sounds like the unlink speed is limited by the reading, and the reads
> > > > are coming from one of two places.  We're either reading to cache cold
> > > > block groups or we're reading to find the directory entries.
> > > 
> > > All the unlinks for a single process will be happening in the same
> > > directory (though the hard linked copies will be all over)
> > > 
> > > > Could you sysrq-w while the performance is bad?  That would narrow it
> > > > down.
> > > 
> > > Here's one:
> > > 
> > > http://pastebin.com/Tg7agv42
> > 
> > Ok, we're mixing unlinks and fsyncs.  Is it fsyncing directories too?
> 
> Nup.  I'm pretty sure it doesn't, just files.  Yes - there will certainly
> be fsyncs going on as well - Cyrus is very careful to fsync everything it
> cares about at the file level, but all it does with directories is mkdir
> them if they don't exist.

Could you double check this one please?  fsyncing the directory is a ton
more expensive, I just want to make sure it isn't part of the workload.
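One way to double check is to trace sync_server, for example with `strace -p <pid> -o trace.txt`, and map every fsync back to the path its descriptor was opened with. The sketch below is hypothetical and assumes single-process strace output without `[pid N]` prefixes. It flags a target as a directory only when the open used `O_DIRECTORY`. A directory opened plainly with `O_RDONLY` would not be flagged, so running stat on the reported paths would settle any doubt.

```python
import re

# Patterns for single-process `strace` output (no "[pid N]" prefixes).
OPEN_RE = re.compile(r'^open(?:at)?\((?:[A-Z_]+, )?"([^"]*)", ([A-Z_|]+)[^)]*\)\s*=\s*(\d+)')
FSYNC_RE = re.compile(r"^f(?:data)?sync\((\d+)\)")
CLOSE_RE = re.compile(r"^close\((\d+)\)")


def fsync_targets(lines):
    """List (path, opened_with_O_DIRECTORY) for every fsync/fdatasync call."""
    open_fds = {}  # fd -> (path, flags) for descriptors currently open
    results = []
    for line in lines:
        line = line.strip()
        m = OPEN_RE.match(line)
        if m:
            path, flags, fd = m.groups()
            open_fds[int(fd)] = (path, flags)
            continue
        m = FSYNC_RE.match(line)
        if m:
            fd = int(m.group(1))
            path, flags = open_fds.get(fd, (f"<fd {fd} opened before trace>", ""))
            results.append((path, "O_DIRECTORY" in flags.split("|")))
            continue
        m = CLOSE_RE.match(line)
        if m:
            open_fds.pop(int(m.group(1)), None)  # fd numbers get reused
    return results
```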

Otherwise it looks like we're seeking to read in the inode and unlink
it.  One possibility is that we're not giving the elevator enough clues
about the IO being synchronous.

Are you using cfq or deadline?  I bet we can improve the latencies using
READ_SYNC.

-chris


> 
> This is just a single "sync_server" process on an experimental server.  A
> real server under full load is going to have multiple processes doing
> fsyncs and unlinks.
> 
> A significant portion of unlinks are of files that have another link on
> the filesystem.  Every mailbox "move" is implemented as a copy (hardlink)
> plus expunge (delayed unlink).  The "delay" works by marking the message
> to be deleted in the cyrus.index metadata file, and then deleting later
> (tunable: 7 to 14 days in our case depending on when the next weekend is)
> 
> Bron.


* Re: Poor performance unlinking hard-linked files (repost)
  2010-11-19 14:10           ` Chris Mason
@ 2010-11-19 21:58             ` Bron Gondwana
  2010-11-30  9:35               ` Bron Gondwana
  0 siblings, 1 reply; 12+ messages in thread
From: Bron Gondwana @ 2010-11-19 21:58 UTC
  To: Chris Mason; +Cc: Bron Gondwana, linux-btrfs

On Fri, Nov 19, 2010 at 09:10:08AM -0500, Chris Mason wrote:
> Excerpts from Bron Gondwana's message of 2010-11-18 16:46:31 -0500:
> > On Thu, Nov 18, 2010 at 10:30:47AM -0500, Chris Mason wrote:
> > > > http://pastebin.com/Tg7agv42
> > > 
> > > Ok, we're mixing unlinks and fsyncs.  Is it fsyncing directories too?
> > 
> > Nup.  I'm pretty sure it doesn't, just files.  Yes - there will certainly
> > be fsyncs going on as well - Cyrus is very careful to fsync everything it
> > cares about at the file level, but all it does with directories is mkdir
> > them if they don't exist.
> 
> Could you double check this one please?  fsyncing the directory is a ton
> more expensive, I just want to make sure it isn't part of the workload.
> 
> Otherwise it looks like we're seeking to read in the inode and unlink
> it.  One possibility is that we're not giving the elevator enough clues
> about the IO being synchronous.
> 
> Are you using cfq or deadline?  I bet we can improve the latencies using
> READ_SYNC.

I'm using deadline.

Here's a redacted strace of a single message upload:  (those gettimeofday
calls are actually caused by "trickle" being used to bandwidth limit these
things from nuking our internal network if they all go crazy at once)

All I'm seeing is the fsyncs on the files.  And some unnecessary mkdir
calls that I can probably remove, and an unnecessary truncate on the
quota file.

Bron.

gettimeofday({1290202884, 848919}, NULL) = 0
gettimeofday({1290202884, 849006}, NULL) = 0
mkdir("/mnt", 0755)                     = -1 EEXIST (File exists)
mkdir("/mnt/sata96b1d4", 0755)          = -1 EEXIST (File exists)
mkdir("/mnt/sata96b1d4/slots96b1p4", 0755) = -1 EEXIST (File exists)
mkdir("/mnt/sata96b1d4/slots96b1p4/store23", 0755) = -1 EEXIST (File exists)
mkdir("/mnt/sata96b1d4/slots96b1p4/store23/data", 0755) = -1 EEXIST (File exists)
mkdir("/mnt/sata96b1d4/slots96b1p4/store23/data/sync.", 0755) = -1 EEXIST (File exists)
mkdir("/mnt/sata96b1d4/slots96b1p4/store23/data/sync./11284", 0755) = -1 EEXIST (File exists)
open("/mnt/sata96b1d4/slots96b1p4/store23/data/sync./11284/9be294a24866fc162e5a2d48925d57642ff20a71", O_RDWR|O_CREAT|O_TRUNC, 0666) = 11
fstat64(11, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf4cb2000
time(NULL)                              = 1290202884
read(0, "<CENSORED>"..., 4096) = 4096
fcntl64(0, F_GETFL)                     = 0x2 (flags O_RDWR)
gettimeofday({1290202884, 851014}, NULL) = 0
gettimeofday({1290202884, 851101}, NULL) = 0
write(11, "MIME-Version: 1.0\r\nContent-Transf"..., 4096) = 4096
time(NULL)                              = 1290202884
read(0, "<CENSORED>"..., 4096) = 4096
fcntl64(0, F_GETFL)                     = 0x2 (flags O_RDWR)
gettimeofday({1290202884, 851952}, NULL) = 0
gettimeofday({1290202884, 852038}, NULL) = 0
write(11, "<CENSORED>"..., 4096) = 4096
time(NULL)                              = 1290202884
read(0, "<CENSORED>"..., 4096) = 4096
fcntl64(0, F_GETFL)                     = 0x2 (flags O_RDWR)
gettimeofday({1290202884, 852644}, NULL) = 0
gettimeofday({1290202884, 852729}, NULL) = 0
write(11, "family: Arial; font-size: medium="..., 4096) = 4096
time(NULL)                              = 1290202884
read(0, "<CENSORED>"..., 4096) = 4096
fcntl64(0, F_GETFL)                     = 0x2 (flags O_RDWR)
gettimeofday({1290202884, 853303}, NULL) = 0
gettimeofday({1290202884, 853389}, NULL) = 0
write(11, "<CENSORED>"..., 4096) = 4096
time(NULL)                              = 1290202884
read(0, "<CENSORED>"..., 4096) = 4096
fcntl64(0, F_GETFL)                     = 0x2 (flags O_RDWR)
gettimeofday({1290202884, 853960}, NULL) = 0
gettimeofday({1290202884, 854045}, NULL) = 0
write(11, "<CENSORED>"..., 4096) = 4096
time(NULL)                              = 1290202884
read(0, "<CENSORED>"..., 4096) = 4096
fcntl64(0, F_GETFL)                     = 0x2 (flags O_RDWR)
gettimeofday({1290202884, 854617}, NULL) = 0
gettimeofday({1290202884, 854703}, NULL) = 0
write(11, "<CENSORED>"..., 4096) = 4096
time(NULL)                              = 1290202884
read(0, "<CENSORED>"..., 4096) = 910
fcntl64(0, F_GETFL)                     = 0x2 (flags O_RDWR)
gettimeofday({1290202884, 855431}, NULL) = 0
gettimeofday({1290202884, 855552}, NULL) = 0
write(11, "<CENSORED>"..., 4096) = 4096
write(11, "<CENSORED>"..., 668) = 668
fsync(11)                               = 0
close(11)                               = 0
munmap(0xf4cb2000, 4096)                = 0
write(1, "<CENSORED>"..., 32) = 32
time(NULL)                              = 1290202884
read(0, "<CENSORED>"..., 4096) = 731
fcntl64(0, F_GETFL)                     = 0x2 (flags O_RDWR)
gettimeofday({1290202884, 858721}, NULL) = 0
gettimeofday({1290202884, 858809}, NULL) = 0
open("/mnt/sata96b1m4/slots96b1p4/store23/conf/lock/domain/a/airpost.net/p/user/<CENSORED>/Drafts.lock", O_RDWR|O_CREAT|O_TRUNC, 0666) = 11
fcntl64(11, F_SETLKW, {type=F_RDLCK, whence=SEEK_SET, start=0, len=0}) = 0
fcntl64(6, F_SETLKW, {type=F_RDLCK, whence=SEEK_SET, start=0, len=0}) = 0
fstat64(6, {st_mode=S_IFREG|0600, st_size=809668, ...}) = 0
stat64("/mnt/sata96b1m4/slots96b1p4/store23/conf/mailboxes.db", {st_mode=S_IFREG|0600, st_size=809668, ...}) = 0
fcntl64(6, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0
open("/mnt/sata96b1m4/slots96b1p4/store23/meta/domain/a/airpost.net/p/user/<CENSORED>/Drafts/cyrus.index", O_RDWR) = 13
fstat64(13, {st_mode=S_IFREG|0600, st_size=9536, ...}) = 0
mmap2(NULL, 24576, PROT_READ, MAP_SHARED, 13, 0) = 0xf4cad000
fcntl64(13, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
stat64("/mnt/sata96b1m4/slots96b1p4/store23/meta/domain/a/airpost.net/p/user/<CENSORED>/Drafts/cyrus.header", {st_mode=S_IFREG|0600, st_size=241, ...}) = 0
open("/mnt/sata96b1m4/slots96b1p4/store23/meta/domain/a/airpost.net/p/user/<CENSORED>/Drafts/cyrus.header", O_RDONLY) = 14
fstat64(14, {st_mode=S_IFREG|0600, st_size=241, ...}) = 0
mmap2(NULL, 241, PROT_READ, MAP_SHARED, 14, 0) = 0xf4cac000
munmap(0xf4cac000, 241)                 = 0
lseek(13, 9440, SEEK_SET)               = 9440
write(13, "<REWRITE INDEX RECORD (unrelated)>"..., 96) = 96
time(NULL)                              = 1290202884
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
send(5, "<181>Nov 19 16:41:24 slots96b1p4/"..., 238, MSG_NOSIGNAL) = 238
mkdir("/mnt", 0755)                     = -1 EEXIST (File exists)
mkdir("/mnt/sata96b1d4", 0755)          = -1 EEXIST (File exists)
mkdir("/mnt/sata96b1d4/slots96b1p4", 0755) = -1 EEXIST (File exists)
mkdir("/mnt/sata96b1d4/slots96b1p4/store23", 0755) = -1 EEXIST (File exists)
mkdir("/mnt/sata96b1d4/slots96b1p4/store23/data", 0755) = -1 EEXIST (File exists)
mkdir("/mnt/sata96b1d4/slots96b1p4/store23/data/sync.", 0755) = -1 EEXIST (File exists)
mkdir("/mnt/sata96b1d4/slots96b1p4/store23/data/sync./11284", 0755) = -1 EEXIST (File exists)
open("/mnt/sata96b1d4/slots96b1p4/store23/data/sync./11284/9be294a24866fc162e5a2d48925d57642ff20a71", O_RDONLY) = 15
fstat64(15, {st_mode=S_IFREG|0600, st_size=29340, ...}) = 0
mmap2(NULL, 29340, PROT_READ, MAP_SHARED, 15, 0) = 0xf4ca5000
munmap(0xf4ca5000, 29340)               = 0
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
close(15)                               = 0
mkdir("/mnt", 0755)                     = -1 EEXIST (File exists)
mkdir("/mnt/sata96b1d4", 0755)          = -1 EEXIST (File exists)
mkdir("/mnt/sata96b1d4/slots96b1p4", 0755) = -1 EEXIST (File exists)
mkdir("/mnt/sata96b1d4/slots96b1p4/store23", 0755) = -1 EEXIST (File exists)
mkdir("/mnt/sata96b1d4/slots96b1p4/store23/data", 0755) = -1 EEXIST (File exists)
mkdir("/mnt/sata96b1d4/slots96b1p4/store23/data/domain", 0755) = -1 EEXIST (File exists)
mkdir("/mnt/sata96b1d4/slots96b1p4/store23/data/domain/a", 0755) = -1 EEXIST (File exists)
mkdir("/mnt/sata96b1d4/slots96b1p4/store23/data/domain/a/airpost.net", 0755) = -1 EEXIST (File exists)
mkdir("/mnt/sata96b1d4/slots96b1p4/store23/data/domain/a/airpost.net/p", 0755) = -1 EEXIST (File exists)
mkdir("/mnt/sata96b1d4/slots96b1p4/store23/data/domain/a/airpost.net/p/user", 0755) = -1 EEXIST (File exists)
mkdir("/mnt/sata96b1d4/slots96b1p4/store23/data/domain/a/airpost.net/p/user/<CENSORED>", 0755) = -1 EEXIST (File exists)
mkdir("/mnt/sata96b1d4/slots96b1p4/store23/data/domain/a/airpost.net/p/user/<CENSORED>/Drafts", 0755) = -1 EEXIST (File exists)
link("/mnt/sata96b1d4/slots96b1p4/store23/data/sync./11284/9be294a24866fc162e5a2d48925d57642ff20a71", "/mnt/sata96b1d4/slots96b1p4/store23/data/domain/a/airpost.net/p/user/<CENSORED>/Drafts/4907.") = 0
utime("/mnt/sata96b1d4/slots96b1p4/store23/data/domain/a/airpost.net/p/user/<CENSORED>/Drafts/4907.", [2010/11/19-16:41:24, 2010/11/19-16:41:24]) = 0
open("/mnt/sata96b1m4/slots96b1p4/store23/meta/domain/a/airpost.net/p/user/<CENSORED>/Drafts/cyrus.cache", O_RDWR) = 15
fstat64(15, {st_mode=S_IFREG|0600, st_size=105488, ...}) = 0
mmap2(NULL, 114688, PROT_READ, MAP_SHARED, 15, 0) = 0xf4c91000
lseek(15, 0, SEEK_END)                  = 105488
write(15, "<CACHE ENTRY>"..., 1200) = 1200
lseek(13, 9536, SEEK_SET)               = 9536
write(13, "<INDEX RECORD FOR THIS UPLOAD>"..., 96) = 96
time(NULL)                              = 1290202884
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
send(5, "<181>Nov 19 16:41:24 slots96b1p4/"..., 237, MSG_NOSIGNAL) = 237
time(NULL)                              = 1290202884
fsync(15)                               = 0
open("/mnt/sata96b1m4/slots96b1p4/store23/conf/domain/a/airpost.net/quota/p/user.<CENSORED>", O_RDWR) = 16
fcntl64(16, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
fstat64(16, {st_mode=S_IFREG|0600, st_size=18, ...}) = 0
stat64("/mnt/sata96b1m4/slots96b1p4/store23/conf/domain/a/airpost.net/quota/p/user.<CENSORED>", {st_mode=S_IFREG|0600, st_size=18, ...}) = 0
fstat64(16, {st_mode=S_IFREG|0600, st_size=18, ...}) = 0
mmap2(NULL, 18, PROT_READ, MAP_SHARED, 16, 0) = 0xf4c90000
munmap(0xf4c90000, 18)                  = 0
unlink("/mnt/sata96b1m4/slots96b1p4/store23/conf/domain/a/airpost.net/quota/p/user.<CENSORED>.NEW") = -1 ENOENT (No such file or directory)
open("/mnt/sata96b1m4/slots96b1p4/store23/conf/domain/a/airpost.net/quota/p/user.<CENSORED>.NEW", O_RDWR|O_CREAT|O_TRUNC, 0666) = 17
fcntl64(17, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
lseek(17, 0, SEEK_SET)                  = 0
write(17, "<CENSORED>"..., 18) = 18
ftruncate(17, 18)                       = 0
fsync(17)                               = 0
fstat64(17, {st_mode=S_IFREG|0600, st_size=18, ...}) = 0
rename("/mnt/sata96b1m4/slots96b1p4/store23/conf/domain/a/airpost.net/quota/p/user.<CENSORED>.NEW", "/mnt/sata96b1m4/slots96b1p4/store23/conf/domain/a/airpost.net/quota/p/user.<CENSORED>") = 0
fcntl64(17, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0
close(17)                               = 0
fcntl64(16, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0
close(16)                               = 0
lseek(13, 0, SEEK_SET)                  = 0
write(13, "<UPDATED INDEX HEADER>"..., 128) = 128
fsync(13)                               = 0
fcntl64(8, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
fstat64(8, {st_mode=S_IFREG|0600, st_size=144, ...}) = 0
stat64("/mnt/sata96b1m4/slots96b1p4/store23/conf/statuscache.db", {st_mode=S_IFREG|0600, st_size=144, ...}) = 0
fcntl64(8, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0
fcntl64(13, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0
munmap(0xf4cad000, 24576)               = 0
munmap(0xf4c91000, 114688)              = 0
close(13)                               = 0
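The quota update near the end of the trace is the usual write-to-temp-then-rename pattern for replacing a file atomically: unlink of the `.NEW` name, open with `O_CREAT|O_TRUNC`, write, `ftruncate`, `fsync`, `rename`. The C sketch below reproduces that sequence under simplifying assumptions. `replace_file_atomically` is a hypothetical helper, not Cyrus code, and the `fcntl` locking seen in the trace is left out. Because the file is opened with `O_TRUNC` and written from offset 0, an `ftruncate` to the length just written changes nothing, so the sketch omits it. That call appears to be the unnecessary truncate on the quota file mentioned later in the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: atomically replace `path` with `len` bytes of `data`
 * using the same syscall order as the quota update in the trace.
 * Returns 0 on success, -1 on error. Locking is intentionally omitted. */
int replace_file_atomically(const char *path, const void *data, size_t len)
{
    char tmp[4096];
    int r = snprintf(tmp, sizeof tmp, "%s.NEW", path);
    if (r < 0 || (size_t)r >= sizeof tmp)
        return -1;                      /* path too long */

    /* Remove a stale temp file; ENOENT (as in the trace) is expected. */
    if (unlink(tmp) < 0 && errno != ENOENT)
        return -1;

    int fd = open(tmp, O_RDWR | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
        return -1;

    /* O_TRUNC already emptied the file, so no ftruncate() is needed. */
    const char *p = data;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            unlink(tmp);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* Make the new contents durable before they become visible. */
    if (fsync(fd) < 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) < 0) {
        unlink(tmp);
        return -1;
    }

    /* rename() atomically swaps the new file into place. Like the trace,
     * this does not fsync the parent directory afterwards. */
    if (rename(tmp, path) < 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

Because the parent directory is never fsynced, the rename itself is not guaranteed to survive a crash. Adding a directory fsync would close that gap, but it adds the extra cost raised in the thread.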





* Re: Poor performance unlinking hard-linked files (repost)
  2010-11-19 21:58             ` Bron Gondwana
@ 2010-11-30  9:35               ` Bron Gondwana
  2010-11-30 12:49                 ` Chris Mason
  0 siblings, 1 reply; 12+ messages in thread
From: Bron Gondwana @ 2010-11-30  9:35 UTC
  To: Bron Gondwana; +Cc: Chris Mason, linux-btrfs

On Sat, Nov 20, 2010 at 08:58:10AM +1100, Bron Gondwana wrote:
> On Fri, Nov 19, 2010 at 09:10:08AM -0500, Chris Mason wrote:
> > Excerpts from Bron Gondwana's message of 2010-11-18 16:46:31 -0500:
> > > On Thu, Nov 18, 2010 at 10:30:47AM -0500, Chris Mason wrote:
> > > > Ok, we're mixing unlinks and fsyncs.  Is it fsyncing directories too?
> > > 
> > > Nup.  I'm pretty sure it doesn't, just files.  Yes - there will certainly
> > > be fsyncs going on as well - Cyrus is very careful to fsync everything it
> > > cares about at the file level, but all it does with directories is mkdir
> > > them if they don't exist.
> > 
> > Could you double check this one please?  fsyncing the directory is a ton
> > more expensive, I just want to make sure it isn't part of the workload.
> > 
> > Otherwise it looks like we're seeking to read in the inode and unlink
> > it.  One possibility is that we're not giving the elevator enough clues
> > about the IO being synchronous.
> > 
> > Are you using cfq or deadline?  I bet we can improve the latencies using
> > READ_SYNC.
> 
> I'm using deadline.
> 
> All I'm seeing is the fsyncs on the files.  And some unnecessary mkdir
> calls that I can probably remove, and an unnecessary truncate on the
> quota file.

Do you have any suggestions for what I could try?  You mentioned READ_SYNC
above.  We now have one working partition on this machine, but it took longer
to set up than most, and I'm not sure how it will cope with 7 more of them 
(which is my next project - compare to the historical performance of this
box first with reiserfs and then with ext4!)

Bron.
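Chris's question is whether any fsync in the workload targets a directory. In the trace above, every `fsync` (on fds 11, 15, 17 and 13) uses a descriptor returned by `open()` on a regular file. A directory fsync looks different: the directory itself is opened and `fsync` is called on that descriptor. The sketch below shows that pattern. `fsync_directory` and `fd_is_directory` are hypothetical helpers for illustration. Newer strace versions also accept `-y`, which prints the path behind each fd argument and makes directory fsyncs easy to spot.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if fd refers to a directory, 0 if not, -1 on error.
 * Handy for classifying the fds passed to fsync() in a workload. */
int fd_is_directory(int fd)
{
    struct stat st;
    if (fstat(fd, &st) < 0)
        return -1;
    return S_ISDIR(st.st_mode) ? 1 : 0;
}

/* A directory fsync: open the directory itself and fsync that fd.
 * In strace this shows up as an open() of the directory path
 * (here with O_DIRECTORY) followed by fsync() on the returned fd. */
int fsync_directory(const char *dirpath)
{
    int fd = open(dirpath, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;
    int rc = fsync(fd);
    close(fd);
    return rc;
}
```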


* Re: Poor performance unlinking hard-linked files (repost)
  2010-11-30  9:35               ` Bron Gondwana
@ 2010-11-30 12:49                 ` Chris Mason
  2010-11-30 23:24                   ` Bron Gondwana
  0 siblings, 1 reply; 12+ messages in thread
From: Chris Mason @ 2010-11-30 12:49 UTC
  To: Bron Gondwana; +Cc: linux-btrfs

Excerpts from Bron Gondwana's message of 2010-11-30 04:35:10 -0500:
> On Sat, Nov 20, 2010 at 08:58:10AM +1100, Bron Gondwana wrote:
> > On Fri, Nov 19, 2010 at 09:10:08AM -0500, Chris Mason wrote:
> > > Excerpts from Bron Gondwana's message of 2010-11-18 16:46:31 -0500:
> > > > On Thu, Nov 18, 2010 at 10:30:47AM -0500, Chris Mason wrote:
> > > > > Ok, we're mixing unlinks and fsyncs.  Is it fsyncing directories too?
> > > > 
> > > > Nup.  I'm pretty sure it doesn't, just files.  Yes - there will certainly
> > > > be fsyncs going on as well - Cyrus is very careful to fsync everything it
> > > > cares about at the file level, but all it does with directories is mkdir
> > > > them if they don't exist.
> > > 
> > > Could you double check this one please?  fsyncing the directory is a ton
> > > more expensive, I just want to make sure it isn't part of the workload.
> > > 
> > > Otherwise it looks like we're seeking to read in the inode and unlink
> > > it.  One possibility is that we're not giving the elevator enough clues
> > > about the IO being synchronous.
> > > 
> > > Are you using cfq or deadline?  I bet we can improve the latencies using
> > > READ_SYNC.
> > 
> > I'm using deadline.
> > 
> > All I'm seeing is the fsyncs on the files.  And some unnecessary mkdir
> > calls that I can probably remove, and an unnecessary truncate on the
> > quota file.
> 
> Do you have any suggestions for what I could try?  You mentioned READ_SYNC
> above.  We now have one working partition on this machine, but it took longer
> to set up than most, and I'm not sure how it will cope with 7 more of them 
> (which is my next project - compare to the historical performance of this
> box first with reiserfs and then with ext4!)

Let me work up a patch that does READ_SYNC calls for the metadata reads,
and I'll try to model this here a little.  We should be able to improve
things.

-chris
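The cfq versus deadline question refers to the block-layer I/O scheduler, which is selected per device. The kernel lists the available schedulers in `/sys/block/<dev>/queue/scheduler` and marks the active one in brackets (for example `noop deadline [cfq]`). The sketch below reads and parses that file. `active_scheduler` and `device_scheduler` are hypothetical helper names. READ_SYNC is a flag on I/O that btrfs submits inside the kernel, so the proposed patch is not something that can be toggled from user space.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract the active scheduler from the contents of
 * /sys/block/<dev>/queue/scheduler, where it appears in brackets.
 * Copies the name into out; returns 0 on success, -1 if no non-empty
 * bracketed entry is found or it does not fit in outlen bytes. */
int active_scheduler(const char *contents, char *out, size_t outlen)
{
    const char *lb = strchr(contents, '[');
    if (!lb)
        return -1;
    const char *rb = strchr(lb + 1, ']');
    if (!rb)
        return -1;
    size_t n = (size_t)(rb - lb - 1);
    if (n == 0 || n >= outlen)
        return -1;
    memcpy(out, lb + 1, n);
    out[n] = '\0';
    return 0;
}

/* Read the sysfs file for a device name such as "sda". */
int device_scheduler(const char *dev, char *out, size_t outlen)
{
    char path[256], buf[256];
    int r = snprintf(path, sizeof path, "/sys/block/%s/queue/scheduler", dev);
    if (r < 0 || (size_t)r >= sizeof path)
        return -1;
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return active_scheduler(buf, out, outlen);
}
```

For example, `device_scheduler("sda", name, sizeof name)` fills `name` with the scheduler currently in use for that disk. Here `sda` is a placeholder device name.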


* Re: Poor performance unlinking hard-linked files (repost)
  2010-11-30 12:49                 ` Chris Mason
@ 2010-11-30 23:24                   ` Bron Gondwana
  0 siblings, 0 replies; 12+ messages in thread
From: Bron Gondwana @ 2010-11-30 23:24 UTC
  To: Chris Mason; +Cc: Bron Gondwana, linux-btrfs

On Tue, Nov 30, 2010 at 07:49:26AM -0500, Chris Mason wrote:
> Excerpts from Bron Gondwana's message of 2010-11-30 04:35:10 -0500:
> > Do you have any suggestions for what I could try?  You mentioned READ_SYNC
> > above.  We now have one working partition on this machine, but it took longer
> > to set up than most, and I'm not sure how it will cope with 7 more of them 
> > (which is my next project - compare to the historical performance of this
> > box first with reiserfs and then with ext4!)
> 
> Let me work up a patch that does READ_SYNC calls for the metadata reads,
> and I'll try to model this here a little.  We should be able to improve
> things.

Is there any reason why the read is going back down to the disk?  The
machine has 8GB of RAM, and should easily have been able to cache all
the metadata under the workload it had.

I look forward to trying the patch :)

Thanks,

Bron.
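One way to check whether the unlinks stall on metadata reads that miss the cache is to reproduce the sync_server reset in isolation. The reproduction stages files, hard-links them into a second directory, and then times unlinking the staged names. The sketch below splits this into a setup step and a timed step. `stage_and_link`, `time_unlink` and the directory names are hypothetical. Unlike Cyrus, it does no fsyncs and writes only one byte per file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Setup: create `count` one-byte files named 0..count-1 in `stage` and
 * hard-link each into `mbox` as "<n>." (loosely mirroring the
 * sync./<pid> -> mailbox links in the trace). Returns 0 or -1. */
int stage_and_link(const char *stage, const char *mbox, int count)
{
    char src[4096], dst[4096];
    if ((mkdir(stage, 0700) < 0 && errno != EEXIST) ||
        (mkdir(mbox, 0700) < 0 && errno != EEXIST))
        return -1;
    for (int i = 0; i < count; i++) {
        snprintf(src, sizeof src, "%s/%d", stage, i);
        snprintf(dst, sizeof dst, "%s/%d.", mbox, i);
        int fd = open(src, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0)
            return -1;
        ssize_t n = write(fd, "x", 1);
        close(fd);
        if (n != 1 || link(src, dst) < 0)
            return -1;
    }
    return 0;
}

/* Timed step: unlink the staged names. The mailbox links keep each inode
 * alive, so each unlink only drops a link count instead of freeing the
 * file. Returns elapsed seconds, or -1.0 on error. */
double time_unlink(const char *stage, int count)
{
    char src[4096];
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < count; i++) {
        snprintf(src, sizeof src, "%s/%d", stage, i);
        if (unlink(src) < 0)
            return -1.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Run `stage_and_link`, then `sync` and `echo 3 > /proc/sys/vm/drop_caches` as root, then `time_unlink`. That gives a cold-cache figure to compare against a run without the cache drop, which should indicate how much of the unlink time is spent reading metadata back from disk.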


end of thread

Thread overview: 12+ messages
2010-11-13  3:25 Poor performance unlinking hard-linked files Bron Gondwana
2010-11-16 12:54 ` Poor performance unlinking hard-linked files (repost) Bron Gondwana
2010-11-16 13:38   ` Chris Mason
2010-11-17  4:11     ` Bron Gondwana
2010-11-17  9:56       ` Bron Gondwana
2010-11-18 15:30       ` Chris Mason
2010-11-18 21:46         ` Bron Gondwana
2010-11-19 14:10           ` Chris Mason
2010-11-19 21:58             ` Bron Gondwana
2010-11-30  9:35               ` Bron Gondwana
2010-11-30 12:49                 ` Chris Mason
2010-11-30 23:24                   ` Bron Gondwana
