From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Elijah Newren <newren@gmail.com>
Subject: [PATCH v2 0/2] Clean up some memory leaks in and around dir.c
Date: Tue, 18 Aug 2020 22:58:24 +0000 [thread overview]
Message-ID: <pull.831.v2.git.git.1597791506.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.831.git.git.1597561152.gitgitgadget@gmail.com>
Some memory leaks in dir.c were making it hard for me to find my own leaks
in merge-ort. I followed a few threads and tried to both clean up the leaks
and make the API less prone to incorrect use.
Changes since v1:
* dropped the first patch, since Peff submitted an identical patch a day
before me
* implemented Peff's suggestions to rename dir_free() to dir_clear() and
have it call dir_init() at the end so that folks can re-use the struct
after dir_clear() if they so choose.
Elijah Newren (2):
dir: make clear_directory() free all relevant memory
dir: fix problematic API to avoid memory leaks
builtin/add.c | 4 ++--
builtin/check-ignore.c | 4 ++--
builtin/clean.c | 12 ++++--------
builtin/grep.c | 3 ++-
builtin/ls-files.c | 4 ++--
builtin/stash.c | 7 ++-----
dir.c | 20 +++++++++++++++++---
dir.h | 21 +++++++++++----------
merge.c | 3 ++-
wt-status.c | 8 ++------
10 files changed, 46 insertions(+), 40 deletions(-)
base-commit: 878e727637ec5815ccb3301eb994a54df95b21b8
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-831%2Fnewren%2Fdir_memory_cleanup-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-831/newren/dir_memory_cleanup-v2
Pull-Request: https://github.com/git/git/pull/831
Range-diff vs v1:
1: 932741d759 < -: ---------- dir: fix leak of parent_hashmap and recursive_hashmap
2: 068e097e22 ! 1: f890ac0279 dir: make clear_directory() free all relevant memory
@@ Commit message
The calling convention for the dir API is supposed to end with a call to
clear_directory() to free up no longer needed memory. However,
clear_directory() didn't free dir->entries or dir->ignored. I believe
- this was oversight, but a number of callers noticed memory leaks and
- started free'ing these, but often somewhat haphazardly (sometimes
- freeing the entries in the arrays, and sometimes only free'ing the
- arrays themselves). This suggests the callers weren't trying to make
- sure any possible memory used might be free'd, but just the memory they
- noticed their usecase definitely had allocated. This also caused the
- extra memory deallocations to be duplicated in many places.
+ this was an oversight, but a number of callers noticed memory leaks and
+ started free'ing these. Unfortunately, they did so somewhat haphazardly
+ (sometimes freeing the entries in the arrays, and sometimes only
+ free'ing the arrays themselves). This suggests the callers weren't
+ trying to make sure any possible memory used might be free'd, but just
+ the memory they noticed their usecase definitely had allocated.
Fix this mess by moving all the duplicated free'ing logic into
- clear_directory().
+ clear_directory(). End by resetting dir to a pristine state so it could
+ be reused if desired.
Signed-off-by: Elijah Newren <newren@gmail.com>
@@ builtin/stash.c: static int get_untracked_files(const struct pathspec *ps, int i
## dir.c ##
@@ dir.c: int remove_path(const char *name)
- return 0;
}
--/*
+ /*
- * Frees memory within dir which was allocated for exclude lists and
- * the exclude_stack. Does not free dir itself.
-- */
-+/* Frees memory within dir which was allocated. Does not free dir itself. */
++ * Frees memory within dir which was allocated, and resets fields for further
++ * use. Does not free dir itself.
+ */
void clear_directory(struct dir_struct *dir)
{
- int i, j;
@@ dir.c: void clear_directory(struct dir_struct *dir)
free(group->pl);
}
@@ dir.c: void clear_directory(struct dir_struct *dir)
stk = dir->exclude_stack;
while (stk) {
struct exclude_stack *prev = stk->prev;
+@@ dir.c: void clear_directory(struct dir_struct *dir)
+ stk = prev;
+ }
+ strbuf_release(&dir->basebuf);
++
++ memset(&dir, 0, sizeof(*dir));
+ }
+
+ struct ondisk_untracked_cache {
## dir.h ##
@@
3: b9310e9941 ! 2: dcd72eef48 dir: fix problematic API to avoid memory leaks
@@ Metadata
## Commit message ##
dir: fix problematic API to avoid memory leaks
- Two commits ago, we noticed that parent_hashmap and recursive_hashmap
- were being leaked and fixed it by making clear_pattern_list() free those
- hashmaps. One commit ago, we noticed that clear_directory() was only
- taking responsibility for a subset of fields within dir_struct, despite
- the fact that entries[] and ignored[] we allocated internally to dir.c,
- resulting in many callers either leaking or haphazardly trying to free
- these arrays and their contents.
+ The dir structure seemed to have a number of leaks and problems around
+ it. First I noticed that parent_hashmap and recursive_hashmap were
+ being leaked (though Peff noticed and submitted fixes before me). Then
+ I noticed in the previous commit that clear_directory() was only taking
+ responsibility for a subset of fields within dir_struct, despite the
+ fact that entries[] and ignored[] we allocated internally to dir.c.
+ That, of course, resulted in many callers either leaking or haphazardly
+ trying to free these arrays and their contents.
Digging further, I found that despite the pretty clear documentation
near the top of dir.h that folks were supposed to call clear_directory()
@@ Commit message
which then snowballed into further problems with the entries[],
ignored[], parent_hashmap, and recursive_hashmap problems.
- Rename clear_directory() to dir_free() to be more in line with other
+ Rename clear_directory() to dir_clear() to be more in line with other
data structures in git, and introduce a dir_init() to handle the
suggested memsetting of dir_struct to all zeroes. I hope that a name
- like "dir_free()" is more clear, and that the presence of dir_init()
- will provide a hint to those looking at the code that there may be a
- corresponding dir_free() that they need to call.
+ like "dir_clear()" is more clear, and that the presence of dir_init()
+ will provide a hint to those looking at the code that they need to look
+ for either a dir_clear() or a dir_free() and lead them to find
+ dir_clear().
Signed-off-by: Elijah Newren <newren@gmail.com>
@@ builtin/add.c: int cmd_add(int argc, const char **argv, const char *prefix)
COMMIT_LOCK | SKIP_IF_UNCHANGED))
die(_("Unable to write new index file"));
-+ dir_free(&dir);
++ dir_clear(&dir);
UNLEAK(pathspec);
- UNLEAK(dir);
return exit_status;
@@ builtin/check-ignore.c: int cmd_check_ignore(int argc, const char **argv, const
}
- clear_directory(&dir);
-+ dir_free(&dir);
++ dir_clear(&dir);
return !num_ignored;
}
@@ builtin/clean.c: static int filter_by_patterns_cmd(void)
strbuf_list_free(ignore_list);
- clear_directory(&dir);
-+ dir_free(&dir);
++ dir_clear(&dir);
}
strbuf_release(&confirm);
@@ builtin/clean.c: int cmd_clean(int argc, const char **argv, const char *prefix)
}
- clear_directory(&dir);
-+ dir_free(&dir);
++ dir_clear(&dir);
if (interactive && del_list.nr > 0)
interactive_main_loop();
@@ builtin/grep.c: static int grep_directory(struct grep_opt *opt, const struct pat
if (hit && opt->status_only)
break;
}
-+ dir_free(&dir);
++ dir_clear(&dir);
return hit;
}
@@ builtin/ls-files.c: int cmd_ls_files(int argc, const char **argv, const char *cm
}
- UNLEAK(dir);
-+ dir_free(&dir);
++ dir_clear(&dir);
return 0;
}
@@ builtin/stash.c: static int get_untracked_files(const struct pathspec *ps, int i
}
- clear_directory(&dir);
-+ dir_free(&dir);
++ dir_clear(&dir);
return found;
}
@@ dir.c: static enum path_treatment read_directory_recursive(struct dir_struct *di
{
int cnt = 0;
@@ dir.c: int remove_path(const char *name)
- }
-
- /* Frees memory within dir which was allocated. Does not free dir itself. */
+ * Frees memory within dir which was allocated, and resets fields for further
+ * use. Does not free dir itself.
+ */
-void clear_directory(struct dir_struct *dir)
-+void dir_free(struct dir_struct *dir)
++void dir_clear(struct dir_struct *dir)
{
int i, j;
struct exclude_list_group *group;
+@@ dir.c: void clear_directory(struct dir_struct *dir)
+ }
+ strbuf_release(&dir->basebuf);
+
+- memset(&dir, 0, sizeof(*dir));
++ dir_init(dir);
+ }
+
+ struct ondisk_untracked_cache {
## dir.h ##
@@
@@ dir.h
- * - Use `dir.entries[]`.
+ * - Use `dir.entries[]` and `dir.ignored[]`.
*
- * - Call `clear_directory()` when the contained elements are no longer in use.
+- * - Call `clear_directory()` when the contained elements are no longer in use.
++ * - Call `dir_clear()` when the contained elements are no longer in use.
*
+ */
+
@@ dir.h: int match_pathspec(const struct index_state *istate,
int report_path_error(const char *ps_matched, const struct pathspec *pathspec);
int within_depth(const char *name, int namelen, int depth, int max_depth);
@@ dir.h: void parse_path_pattern(const char **string, int *patternlen, unsigned *f
int baselen, struct pattern_list *pl, int srcpos);
void clear_pattern_list(struct pattern_list *pl);
-void clear_directory(struct dir_struct *dir);
-+void dir_free(struct dir_struct *dir);
++void dir_clear(struct dir_struct *dir);
int repo_file_exists(struct repository *repo, const char *path);
int file_exists(const char *);
@@ merge.c: int checkout_fast_forward(struct repository *r,
clear_unpack_trees_porcelain(&opts);
return -1;
}
-+ dir_free(&dir);
++ dir_clear(&dir);
clear_unpack_trees_porcelain(&opts);
if (write_locked_index(r->index, &lock_file, COMMIT_LOCK))
@@ wt-status.c: static void wt_status_collect_untracked(struct wt_status *s)
}
- clear_directory(&dir);
-+ dir_free(&dir);
++ dir_clear(&dir);
if (advice_status_u_option)
s->untracked_in_ms = (getnanotime() - t_begin) / 1000000;
--
gitgitgadget
next prev parent reply other threads:[~2020-08-18 22:58 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-08-16 6:59 [PATCH 0/3] Clean up some memory leaks in and around dir.c Elijah Newren via GitGitGadget
2020-08-16 6:59 ` [PATCH 1/3] dir: fix leak of parent_hashmap and recursive_hashmap Elijah Newren via GitGitGadget
2020-08-16 8:43 ` Jeff King
2020-08-17 16:57 ` Elijah Newren
2020-08-16 6:59 ` [PATCH 2/3] dir: make clear_directory() free all relevant memory Elijah Newren via GitGitGadget
2020-08-16 8:54 ` Jeff King
2020-08-17 16:58 ` Elijah Newren
2020-08-16 6:59 ` [PATCH 3/3] dir: fix problematic API to avoid memory leaks Elijah Newren via GitGitGadget
2020-08-16 9:11 ` Jeff King
2020-08-17 17:19 ` Elijah Newren
2020-08-17 19:00 ` Jeff King
2020-08-18 22:58 ` Elijah Newren via GitGitGadget [this message]
2020-08-18 22:58 ` [PATCH v2 1/2] dir: make clear_directory() free all relevant memory Elijah Newren via GitGitGadget
2020-08-18 22:58 ` [PATCH v2 2/2] dir: fix problematic API to avoid memory leaks Elijah Newren via GitGitGadget
2020-08-19 13:51 ` [PATCH v2 0/2] Clean up some memory leaks in and around dir.c Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=pull.831.v2.git.git.1597791506.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=git@vger.kernel.org \
--cc=newren@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.