From: Michael Haggerty <mhagger@alum.mit.edu>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Karl Moskowski <kmoskowski@me.com>,
Jeff King <peff@peff.net>, Mike Hommey <mh@glandium.org>,
David Turner <dturner@twopensource.com>,
Michael Haggerty <mhagger@alum.mit.edu>
Subject: [PATCH 00/20] Delete directories left empty after ref deletion
Date: Tue, 16 Feb 2016 14:22:13 +0100
Message-ID: <cover.1455626201.git.mhagger@alum.mit.edu>
Previously, we were pretty sloppy about leaving empty directories
behind (under both $GIT_DIR/refs and $GIT_DIR/logs) when deleting
references. Such directories could accumulate essentially forever.
It's true that `pack-refs` deletes directories that it empties, but if
a directory is *already* empty, then `pack-refs` doesn't remove it. It
is also true that if an empty directory gets in the way of the
creation of a *new* reference, then it is deleted. But otherwise there
is no systematic cleanup of empty directories.
Aside from being messy, wasting disk space, and costing extra time
when enumerating references, the proliferation of empty directories
was triggering a libgit2 bug [1] on our servers. (We use temporary,
uniquely-named references for internal purposes, causing a lot of
empty directories to accumulate in some cases.)
This problem was also recently reported by Karl Moskowski [2] to be
the cause of a problem on case-insensitive filesystems.
This patch series makes the reference update machinery more aggressive
about deleting empty directories under $GIT_DIR/refs and under
$GIT_DIR/logs when deleting references and/or reflogs. This doesn't
eliminate all situations where empty directories can be left behind
[3], but it covers the worst offenders.
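To make the cleanup idea concrete, here is a minimal sketch in C. It is
not the code from this series: remove_empty_parents_sketch(), its
base/relpath parameters, and the demo function are hypothetical names
invented for illustration, and the sketch ignores details such as which
top-level directories must always be preserved. It only shows the
general technique of walking upward from a just-deleted file and calling
rmdir() until a directory turns out to be non-empty.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper, not git's try_remove_empty_parents(): after the
 * file base/relpath has been deleted, rmdir() each parent directory of
 * relpath, deepest first, stopping at the first failure (typically
 * because the directory is not empty). "base" itself is never removed.
 * Returns the number of directories removed.
 */
static int remove_empty_parents_sketch(const char *base, const char *relpath)
{
	char rel[4096], full[8192];
	char *slash;
	int n, removed = 0;

	assert(base && relpath);
	if (strlen(relpath) >= sizeof(rel))
		return 0;
	strcpy(rel, relpath);

	/* Strip one trailing path component per iteration. */
	while ((slash = strrchr(rel, '/')) != NULL) {
		*slash = '\0';
		if (!*rel)
			break;
		n = snprintf(full, sizeof(full), "%s/%s", base, rel);
		if (n < 0 || (size_t)n >= sizeof(full))
			break;
		if (rmdir(full) < 0)
			break; /* not empty, or changed by a racing process */
		removed++;
	}
	return removed;
}

/* Usage sketch: build tmp/a/b, then prune as if tmp/a/b/ref was just deleted. */
static int demo_remove_empty_parents(void)
{
	char base[] = "/tmp/empty-parents-XXXXXX";
	char path[4096];
	int removed;

	if (!mkdtemp(base))
		return -1;
	snprintf(path, sizeof(path), "%s/a", base);
	if (mkdir(path, 0777) < 0)
		return -1;
	snprintf(path, sizeof(path), "%s/a/b", base);
	if (mkdir(path, 0777) < 0)
		return -1;
	removed = remove_empty_parents_sketch(base, "a/b/ref");
	rmdir(base);
	return removed;
}
```

Relying on rmdir() to fail on a non-empty directory, rather than
listing the directory first, keeps the check and the removal in a single
system call, so another process cannot add an entry between the two.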
As a prelude to the main change, there are a number of patches that make
the *creation* of reflog directories more robust against races. Since
we want to delete such directories more aggressively, we have to worry
more about a race between a process that is creating a new reflog, and
another process that might be deleting the containing directory at the
same time. (We already had protection against this sort of race for
reference creation, but not for reflog creation.)
And since I got tired of writing the same code over and over, I
abstracted the code for retrying directory creation into a new
function, raceproof_create_file(). This function replaces similar code
that appeared in two other places and is now also used for creating
reflog files. (Can anybody think of any other code that has to deal
with the same kind of race and could maybe benefit?)
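For readers who want a picture of the retry pattern being discussed,
here is a rough, self-contained sketch in C. It is an illustration
under assumptions, not the raceproof_create_file() from this series:
the function names, the fixed retry bound, and the choice to handle
only ENOENT are all made up for the example. The idea shown is simply
"try to create the file; if a leading directory is missing (perhaps
because another process just deleted it), recreate the directories and
try again a bounded number of times".

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define MAX_ATTEMPTS 4 /* arbitrary bound chosen for this sketch */

/* Create every leading directory of path; an existing one is fine. */
static int make_leading_dirs(const char *path)
{
	char buf[4096];
	char *p;

	if (strlen(path) >= sizeof(buf)) {
		errno = ENAMETOOLONG;
		return -1;
	}
	strcpy(buf, path);
	for (p = buf; *p; p++) {
		if (*p != '/' || p == buf)
			continue;
		*p = '\0';
		if (mkdir(buf, 0777) < 0 && errno != EEXIST)
			return -1;
		*p = '/';
	}
	return 0;
}

/*
 * Hypothetical illustration: open path for appending, creating it if
 * needed. If open() fails with ENOENT, a parent directory is missing,
 * so recreate the parents and retry. Returns a file descriptor or -1.
 */
static int create_file_raceproof_sketch(const char *path)
{
	int attempt;

	assert(path);
	for (attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
		int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0666);
		if (fd >= 0)
			return fd;
		if (errno != ENOENT)
			return -1; /* not the race this sketch guards against */
		if (make_leading_dirs(path) < 0)
			return -1;
	}
	return -1; /* lost the race MAX_ATTEMPTS times in a row */
}

/* Usage sketch (cleanup of the temporary directory omitted for brevity). */
static int demo_raceproof_create(void)
{
	char base[] = "/tmp/raceproof-XXXXXX";
	char path[4096];
	int fd;

	if (!mkdtemp(base))
		return -1;
	snprintf(path, sizeof(path), "%s/logs/refs/heads/topic", base);
	fd = create_file_raceproof_sketch(path);
	if (fd < 0)
		return -1;
	close(fd);
	return access(path, F_OK);
}
```

The retry bound matters: without it, two processes that keep creating
and deleting the same directory could in principle keep each other
looping indefinitely.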
This patch series is also available from my GitHub account [4] as
branch delete-empty-refs-dirs.
As you might imagine, this patch series conflicts with David Turner's
dt/refs-backend-lmdb series [5]. But the conflicts are not too bad [6].
When one or the other of these series is ready to progress, I'd be
happy to help resolve the conflicts with the other.
Michael
[1] https://github.com/libgit2/libgit2/issues/3576
[2] http://thread.gmane.org/gmane.comp.version-control.git/283504
[3] Off the top of my head, one case that is not covered is when a
reflog is expired: a lock is acquired under $GIT_DIR/refs, but the
directory that held it is not deleted when the lock is released.
[4] http://github.com/mhagger/git
[5] http://thread.gmane.org/gmane.comp.version-control.git/285604
[6] For a merge of this patch series with dt/refs-backend-lmdb v4, see
branch merge-delete-empty-refs-dirs-refs-backend-lmdb at my GitHub
repository [4].
Michael Haggerty (20):
safe_create_leading_directories_const(): preserve errno
safe_create_leading_directories(): set errno on SCLD_EXISTS
raceproof_create_file(): new function
lock_ref_sha1_basic(): use raceproof_create_file()
rename_tmp_log(): use raceproof_create_file()
rename_tmp_log(): improve error reporting
log_ref_setup(): separate code for create vs non-create
log_ref_setup(): improve robustness against races
log_ref_setup(): pass the open file descriptor back to the caller
log_ref_write_1(): don't depend on logfile
log_ref_setup(): manage the name of the reflog file internally
log_ref_write_1(): inline function
try_remove_empty_parents(): rename parameter "name" -> "refname"
try_remove_empty_parents(): don't trash argument contents
try_remove_empty_parents(): don't accommodate consecutive slashes
t5505: use "for-each-ref" to test for the non-existence of references
delete_ref_loose(): derive loose reference path from lock
delete_ref_loose(): inline function
try_remove_empty_parents(): teach to remove parents of reflogs, too
ref_transaction_commit(): clean up empty directories
 cache.h               |  26 +++-
 refs/files-backend.c  | 370 ++++++++++++++++++++++++++------------------------
 refs/refs-internal.h  |   9 +-
 sha1_file.c           |  77 ++++++++++-
 t/t1400-update-ref.sh |  27 ++++
 t/t5505-remote.sh     |   2 +-
 6 files changed, 325 insertions(+), 186 deletions(-)
--
2.7.0