linux-btrfs.vger.kernel.org archive mirror
* Poor performance unlinking hard-linked files
@ 2010-11-13  3:25 Bron Gondwana
  2010-11-16 12:54 ` Poor performance unlinking hard-linked files (repost) Bron Gondwana
  0 siblings, 1 reply; 12+ messages in thread
From: Bron Gondwana @ 2010-11-13  3:25 UTC
  To: linux-btrfs

I had a spare piece of hardware sitting around, so I thought I'd test btrfs performance with the Cyrus IMAPd server by setting up an extra replica target on the spare machine.

Some background on Cyrus replication: when copying a folder, the replication system first "reserves" all the messages it's going to need.  It tries to maintain what Cyrus calls "single instance store" - hard links between identical messages on disk.

This is done in the latest version of Cyrus by storing the sha1 of each file in an index, and scanning the currently active mailboxes on the replica to see if they already have a copy of the file.  If so, a hard link is made in the data/sync./$pid/ directory back to the original file in the mailbox directory.

Cyrus stores one file per email, which pushes filesystems pretty hard.  We used reiser3 until recently, and are part way through converting to ext4.

If the file is not already available on the replica, a new copy is uploaded directly into the sync./$pid directory.
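The reserve step described above can be sketched roughly as follows. This is an illustration only, not Cyrus code: the `reserve` and `sha1_of` functions, the dict-based `index`, and the `upload` callback are all hypothetical stand-ins for the real index and replication protocol.

```python
import hashlib
import os


def sha1_of(path):
    """Return the hex SHA-1 digest of a file's contents."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def reserve(sha1, index, staging_dir, name, upload):
    """Place one message in the staging directory.

    index:  dict mapping sha1 -> path of a copy already on the replica
            (hypothetical; stands in for scanning the active mailboxes).
    upload: callable returning the message bytes, used only on a miss.
    Returns (path, linked); linked is True when a hard link was made.
    """
    dest = os.path.join(staging_dir, name)
    existing = index.get(sha1)
    if existing is not None and os.path.exists(existing):
        # Hit: add a second name for the same inode, no data is copied.
        os.link(existing, dest)
        return dest, True
    # Miss: write a fresh copy straight into the staging directory.
    with open(dest, "wb") as f:
        f.write(upload())
    return dest, False
```

On a hit the staged name and the mailbox name share one inode, which is the property single instance store relies on.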

Either way, when the mailbox is then created or updated, the files get hardlinked from the sync./$pid directory to their final location.

They get kept around for a little while, until the sync_server decides it's time for a reset because it's using too much memory keeping all the tracking data.  Then it unlinks all the files in sync./$pid and starts searching for necessary files again.
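The link-then-reset lifecycle described above might look like this in outline. Again a sketch under assumptions: `commit_to_mailbox` and `reset_staging` are hypothetical names, and the real sync_server tracks far more state than this.

```python
import os


def commit_to_mailbox(staging_dir, name, final_path):
    """Hard-link a staged file to its final mailbox location."""
    os.link(os.path.join(staging_dir, name), final_path)


def reset_staging(staging_dir):
    """Unlink every regular file in the staging directory.

    Files that were already linked into a mailbox survive, because
    unlink only removes one of their names. Returns the number of
    names removed.
    """
    with os.scandir(staging_dir) as entries:
        paths = [e.path for e in entries if e.is_file(follow_symlinks=False)]
    for path in paths:
        os.unlink(path)
    return len(paths)
```

The reset is the step that issues one unlink per staged file, so its cost scales with however many files were reserved since the last reset.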

Most of the time, this means single instance store works - the source and destination mailboxes always get heated up by adding both of them to the sync log, so the duplication will be found.

-----------------

Anyway, that's the background - a daemon that creates a pile of files in one directory, hard-links them out all over the file system, then unlinks all the original files later.

We're finding that as the filesystem grows (currently about 30% full on a 300GB filesystem) the unlink performance becomes horrible.  Watching iostat, there's a lot of reading going on as well.  It really looks like the unlinks are performing pretty badly in this one case.
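For anyone wanting to try this pattern on a test filesystem, a minimal timing sketch follows. It is not the Cyrus workload itself: the `time_unlinks` function, the directory names, and the file counts and sizes are all made up for illustration, and a small run on a nearly empty filesystem is unlikely to show the slowdown described above.

```python
import os
import time


def time_unlinks(root, count=1000, links_per_file=2, size=1024):
    """Create files in a staging dir, hard-link each one into several
    other directories, then time unlinking the staging names.

    Returns the elapsed seconds for the unlink phase only.
    """
    staging = os.path.join(root, "staging")
    os.mkdir(staging)
    targets = []
    for i in range(links_per_file):
        d = os.path.join(root, "mailbox%d" % i)
        os.mkdir(d)
        targets.append(d)

    for n in range(count):
        src = os.path.join(staging, "%d." % n)
        with open(src, "wb") as f:
            f.write(b"x" * size)
        for d in targets:
            os.link(src, os.path.join(d, "%d." % n))

    if hasattr(os, "sync"):
        os.sync()  # flush dirty data so the unlink phase starts clean

    start = time.perf_counter()
    for n in range(count):
        os.unlink(os.path.join(staging, "%d." % n))
    return time.perf_counter() - start
```

Pointing `root` at an empty directory on the filesystem under test and comparing the result at different fill levels, while watching iostat, would be one way to check whether the read traffic tracks the unlink phase.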

Ideally there would be a nice filesystem API Cyrus could call that said "delete all the files in this directory"!  Failing that, is there anything we can do to improve this use case?  Real-time production use isn't QUITE as bad as an initial sync, but lmtp delivery uses the same method - spool to a staging file, parse it there, then hard-link to all the delivery targets before unlinking the original.

Thanks,

Bron.
-- 
  Bron Gondwana
  brong@fastmail.fm



Thread overview: 12+ messages
2010-11-13  3:25 Poor performance unlinking hard-linked files Bron Gondwana
2010-11-16 12:54 ` Poor performance unlinking hard-linked files (repost) Bron Gondwana
2010-11-16 13:38   ` Chris Mason
2010-11-17  4:11     ` Bron Gondwana
2010-11-17  9:56       ` Bron Gondwana
2010-11-18 15:30       ` Chris Mason
2010-11-18 21:46         ` Bron Gondwana
2010-11-19 14:10           ` Chris Mason
2010-11-19 21:58             ` Bron Gondwana
2010-11-30  9:35               ` Bron Gondwana
2010-11-30 12:49                 ` Chris Mason
2010-11-30 23:24                   ` Bron Gondwana
