From: "Bron Gondwana" <brong@fastmail.fm>
To: linux-btrfs@vger.kernel.org
Subject: Poor performance unlinking hard-linked files
Date: Sat, 13 Nov 2010 14:25:24 +1100
Message-ID: <1289618724.28645.1405062363@webmail.messagingengine.com>
I had a spare piece of hardware sitting around, so I thought I'd test btrfs performance with the Cyrus IMAPd server by setting up an extra replica target on the spare machine.
Some background on Cyrus replication: when copying a folder, the replication system first "reserves" all the messages it's going to need. It tries to maintain what Cyrus terminology calls "single instance store": hard links between identical messages on disk.
This is done in the latest version of Cyrus by storing the sha1 of each file in an index, and scanning the currently active mailboxes on the replica to see if they already have a copy of the file. If so, a hard link is made in the data/sync./$pid/ directory back to the original file in the mailbox directory.
Cyrus stores one file per email, which pushes filesystems pretty hard. We used reiser3 until recently, and are part way through converting to ext4.
If the file is not already available on the replica, a new copy is uploaded directly into the sync./$pid directory.
Either way, when the mailbox is then created or updated, the files get hardlinked from the sync./$pid directory to their final location.
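The reserve-then-link flow described above can be sketched roughly as follows. This is not Cyrus code: `reserve_message`, `commit_message`, and the `sha1_index` dictionary are hypothetical names standing in for the real index and sync_server logic, and the "upload" is just a local write.

```python
import hashlib
import os


def reserve_message(sha1_index, staging_dir, name, data):
    """Place one message in staging_dir, reusing an existing copy if known.

    sha1_index is a hypothetical dict mapping hex sha1 -> path of a file
    already present on the replica. Returns (staged_path, reused).
    """
    digest = hashlib.sha1(data).hexdigest()
    staged = os.path.join(staging_dir, name)
    existing = sha1_index.get(digest)
    if existing is not None and os.path.exists(existing):
        # Single instance store: the staging entry shares the inode
        # of the copy that is already in a mailbox directory.
        os.link(existing, staged)
        return staged, True
    # Not on the replica yet: write a fresh copy into the staging directory.
    with open(staged, "wb") as f:
        f.write(data)
    return staged, False


def commit_message(staged, final_path):
    """Hard link the staged file to its final location in the mailbox."""
    os.link(staged, final_path)
```

Either branch leaves an entry in the staging directory that is later hard linked into place, so every staged file ends up with at least two links before the staging directory is cleaned out.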
They get kept around for a little while, until the sync_server decides it's time for a reset because it's using too much memory keeping all the tracking data. Then it unlinks all the files in sync./$pid and starts searching for necessary files again.
Most of the time, this means single instance store works - the source and destination mailboxes always get heated up by adding both of them to the sync log, so the duplication will be found.
-----------------
Anyway, that's the background - a daemon that creates a pile of files in one directory, hard links them out all over the file system, then unlinks all the original files later.
We're finding that as the filesystem grows (currently about 30% full on a 300GB filesystem) the unlink performance becomes horrible. Watching iostat, there's a lot of reading going on during the unlinks as well. It really looks like unlink performs pretty badly in this one case.
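A rough way to reproduce this access pattern outside Cyrus is sketched below. The function name, directory names, file count, and payload size are all made up for illustration; it only times the unlink pass over a staging directory whose files have already been hard linked elsewhere.

```python
import os
import time


def time_staging_unlinks(root, n_files=1000, payload=b"x" * 4096):
    """Create files in one directory, hard link each one elsewhere,
    then time unlinking the originals. Returns elapsed seconds."""
    staging = os.path.join(root, "sync.test")
    targets = os.path.join(root, "mailboxes")
    os.makedirs(staging)
    os.makedirs(targets)
    for i in range(n_files):
        src = os.path.join(staging, f"{i}.")
        with open(src, "wb") as f:
            f.write(payload)
        # Spread the second link across several directories.
        dest_dir = os.path.join(targets, f"user{i % 16}")
        os.makedirs(dest_dir, exist_ok=True)
        os.link(src, os.path.join(dest_dir, f"{i}."))
    os.sync()  # flush pending writes; this does not drop any caches
    # Collect the names first so the directory is not modified mid-scan.
    paths = [entry.path for entry in os.scandir(staging)]
    start = time.perf_counter()
    for path in paths:
        os.unlink(path)
    return time.perf_counter() - start
```

Calling `time_staging_unlinks("/mnt/test")` on an otherwise empty directory of the filesystem under test gives one number per run. The figure depends heavily on cache state and on how full the filesystem is, so it is only useful for comparing runs on the same machine.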
Ideally there would be a nice filesystem API Cyrus could call that said "delete all the files in this directory"! Failing that, is there anything we can do to improve this use case? Real-time production use isn't QUITE so bad as an initial sync, but lmtp delivery uses the same method - spool to a staging file, parse it there, then hard link it to all the delivery targets before unlinking the original.
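The delivery pattern mentioned above (spool, link to every target, unlink the spool file) looks roughly like this. `deliver` and its arguments are hypothetical, and the parsing step is omitted.

```python
import os


def deliver(staging_dir, name, data, target_paths):
    """Spool one message, hard link it to each target, drop the spool file.

    Returns the number of targets linked.
    """
    staged = os.path.join(staging_dir, name)
    with open(staged, "wb") as f:
        f.write(data)
    linked = 0
    try:
        for target in target_paths:
            os.link(staged, target)
            linked += 1
    finally:
        # This unlink is the operation the post is asking about: the
        # inode survives because the target links still reference it.
        os.unlink(staged)
    return linked
```

With N recipients, each delivery costs one create, N links, and one unlink of a file whose link count is N + 1 at the time it is removed.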
Thanks,
Bron.
--
Bron Gondwana
brong@fastmail.fm