From: Daniel Ferreira <bnmvco@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, sbeller@google.com, pclouds@gmail.com,
mhagger@alum.mit.edu, Daniel Ferreira <bnmvco@gmail.com>
Subject: [PATCH v7 4/5] dir_iterator: refactor state machine model
Date: Sun, 2 Apr 2017 17:03:07 -0300 [thread overview]
Message-ID: <1491163388-41255-5-git-send-email-bnmvco@gmail.com> (raw)
In-Reply-To: <1491163388-41255-1-git-send-email-bnmvco@gmail.com>
Perform major refactor of dir_iterator_advance(). dir_iterator has
ceased to rely on a convoluted state machine mechanism of two loops and
two state variables (level.initialized and level.dir_state). This serves
to ease comprehension of the iterator mechanism and ease addition of new
features to the iterator.
Create an option for the dir_iterator API to iterate over subdirectories
only after having iterated through their contents. This feature was
predicted, although not implemented by 0fe5043 ("dir_iterator: new API
for iterating over a directory tree", 2016-06-18). This is useful for
recursively removing a directory and calling rmdir() on a directory only
after all of its contents have been wiped.
Add an option for the dir_iterator API to iterate over the root
directory (the one it was initialized with) as well.
Add the "flags" parameter to dir_iterator_create, allowing for the
aforementioned new features to be enabled. The new default behavior
(i.e. flags set to 0) does not iterate over directories. Flag
DIR_ITERATOR_PRE_ORDER_TRAVERSAL iterates over a directory before doing
so over its contents. DIR_ITERATOR_POST_ORDER_TRAVERSAL iterates over a
directory after doing so over its contents. DIR_ITERATOR_LIST_ROOT_DIR
iterates over the root directory. These flags do not conflict with each
other and may be used simultaneously.
Amend a call to dir_iterator_begin() in refs/files-backend.c to pass
the flags parameter introduced.
Improve t/t0065-dir-iterator.sh and t/helper/test-dir-iterator.c to
test "post-order" and "iterate-over-root" modes.
Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
---
dir-iterator.c | 155 +++++++++++++++++++++++++++----------------
dir-iterator.h | 28 ++++++--
refs/files-backend.c | 2 +-
t/helper/test-dir-iterator.c | 6 +-
t/t0065-dir-iterator.sh | 61 ++++++++++++++++-
5 files changed, 183 insertions(+), 69 deletions(-)
diff --git a/dir-iterator.c b/dir-iterator.c
index ce8bf81..18b7e68 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -4,8 +4,6 @@
#include "dir-iterator.h"
struct dir_iterator_level {
- int initialized;
-
DIR *dir;
/*
@@ -20,8 +18,11 @@ struct dir_iterator_level {
* iteration and also iterated into):
*/
enum {
- DIR_STATE_ITER,
- DIR_STATE_RECURSE
+ DIR_STATE_PUSHED,
+ DIR_STATE_PRE_ITERATION,
+ DIR_STATE_ITERATING,
+ DIR_STATE_POST_ITERATION,
+ DIR_STATE_EXHAUSTED
} dir_state;
};
@@ -48,15 +49,20 @@ struct dir_iterator_int {
* that will be included in this iteration.
*/
struct dir_iterator_level *levels;
+
+ /* Holds the flags passed to dir_iterator_begin(). */
+ unsigned flags;
};
static inline void push_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level)
{
- level->dir_state = DIR_STATE_RECURSE;
ALLOC_GROW(iter->levels, iter->levels_nr + 1,
iter->levels_alloc);
+
+ /* Push a new level */
level = &iter->levels[iter->levels_nr++];
- level->initialized = 0;
+ level->dir = NULL;
+ level->dir_state = DIR_STATE_PUSHED;
}
static inline int pop_dir_level(struct dir_iterator_int *iter)
@@ -75,18 +81,35 @@ static inline int set_iterator_data(struct dir_iterator_int *iter, struct dir_it
}
/*
- * We have to set these each time because
- * the path strbuf might have been realloc()ed.
+ * Check if we are dealing with the root directory as an
+ * item that's being iterated through.
*/
- iter->base.relative_path =
- iter->base.path.buf + iter->levels[0].prefix_len;
+ if (level->dir_state != DIR_STATE_ITERATING &&
+ iter->levels_nr == 1) {
+ iter->base.relative_path = ".";
+ }
+ else {
+ iter->base.relative_path =
+ iter->base.path.buf + iter->levels[0].prefix_len;
+ }
iter->base.basename =
iter->base.path.buf + level->prefix_len;
- level->dir_state = DIR_STATE_ITER;
return 0;
}
+/*
+ * This function uses a state machine with the following states:
+ * -> DIR_STATE_PUSHED: the directory has been pushed to the
+ * iterator traversal tree.
+ * -> DIR_STATE_PRE_ITERATION: the directory is *NOT* initialized. The
+ * dirpath has already been returned if pre-order traversal is set.
+ * -> DIR_STATE_ITERATING: the directory is initialized. We are traversing
+ * through it.
+ * -> DIR_STATE_POST_ITERATION: the directory has been iterated through.
+ * We are ready to close it.
+ * -> DIR_STATE_EXHAUSTED: the directory is closed and ready to be popped.
+ */
int dir_iterator_advance(struct dir_iterator *dir_iterator)
{
struct dir_iterator_int *iter =
@@ -97,7 +120,18 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
&iter->levels[iter->levels_nr - 1];
struct dirent *de;
- if (!level->initialized) {
+ if (level->dir_state == DIR_STATE_PUSHED) {
+ level->dir_state = DIR_STATE_PRE_ITERATION;
+
+ if (iter->flags & DIR_ITERATOR_PRE_ORDER_TRAVERSAL) {
+ /* We may not want the root directory to be iterated over */
+ if (iter->levels_nr != 1 ||
+ (iter->flags & DIR_ITERATOR_LIST_ROOT_DIR)) {
+ set_iterator_data(iter, level);
+ return ITER_OK;
+ }
+ }
+ } else if (level->dir_state == DIR_STATE_PRE_ITERATION) {
/*
* Note: dir_iterator_begin() ensures that
* path is not the empty string.
@@ -107,64 +141,35 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
level->prefix_len = iter->base.path.len;
level->dir = opendir(iter->base.path.buf);
- if (!level->dir && errno != ENOENT) {
- warning("error opening directory %s: %s",
- iter->base.path.buf, strerror(errno));
- /* Popping the level is handled below */
- }
-
- level->initialized = 1;
- } else if (S_ISDIR(iter->base.st.st_mode)) {
- if (level->dir_state == DIR_STATE_ITER) {
+ if (!level->dir) {
/*
- * The directory was just iterated
- * over; now prepare to iterate into
- * it.
+ * This level wasn't opened sucessfully; pretend we
+ * iterated through it already.
*/
- push_dir_level(iter, level);
+ if (errno != ENOENT) {
+ warning("error opening directory %s: %s",
+ iter->base.path.buf, strerror(errno));
+ }
+
+ level->dir_state = DIR_STATE_POST_ITERATION;
continue;
- } else {
- /*
- * The directory has already been
- * iterated over and iterated into;
- * we're done with it.
- */
}
- }
- if (!level->dir) {
- /*
- * This level is exhausted (or wasn't opened
- * successfully); pop up a level.
- */
- if (pop_dir_level(iter) == 0)
- return dir_iterator_abort(dir_iterator);
-
- continue;
- }
-
- /*
- * Loop until we find an entry that we can give back
- * to the caller:
- */
- while (1) {
+ level->dir_state = DIR_STATE_ITERATING;
+ } else if (level->dir_state == DIR_STATE_ITERATING) {
strbuf_setlen(&iter->base.path, level->prefix_len);
errno = 0;
de = readdir(level->dir);
if (!de) {
- /* This level is exhausted; pop up a level. */
+ /* In case of readdir() error */
if (errno) {
warning("error reading directory %s: %s",
iter->base.path.buf, strerror(errno));
- } else if (closedir(level->dir))
- warning("error closing directory %s: %s",
- iter->base.path.buf, strerror(errno));
+ }
- level->dir = NULL;
- if (pop_dir_level(iter) == 0)
- return dir_iterator_abort(dir_iterator);
- break;
+ level->dir_state = DIR_STATE_POST_ITERATION;
+ continue;
}
if (is_dot_or_dotdot(de->d_name))
@@ -175,7 +180,38 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
if (set_iterator_data(iter, level))
continue;
+ if (S_ISDIR(iter->base.st.st_mode)) {
+ push_dir_level(iter, level);
+ continue;
+ }
+
return ITER_OK;
+ } else if (level->dir_state == DIR_STATE_POST_ITERATION) {
+ if (level->dir != NULL && closedir(level->dir)) {
+ warning("error closing directory %s: %s",
+ iter->base.path.buf, strerror(errno));
+ }
+ level->dir_state = DIR_STATE_EXHAUSTED;
+
+ strbuf_setlen(&iter->base.path, level->prefix_len);
+ /*
+ * Since we are iterating through the dirpath
+ * after we have gone through it, we still need
+ * to get rid of the trailing slash we appended.
+ */
+ strbuf_strip_suffix(&iter->base.path, "/");
+
+ if (iter->flags & DIR_ITERATOR_POST_ORDER_TRAVERSAL) {
+ /* We may not want the root directory to be iterated over */
+ if (iter->levels_nr != 1 ||
+ (iter->flags & DIR_ITERATOR_LIST_ROOT_DIR)) {
+ set_iterator_data(iter, level);
+ return ITER_OK;
+ }
+ }
+ } else if (level->dir_state == DIR_STATE_EXHAUSTED) {
+ if (pop_dir_level(iter) == 0)
+ return dir_iterator_abort(dir_iterator);
}
}
}
@@ -201,7 +237,7 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
return ITER_DONE;
}
-struct dir_iterator *dir_iterator_begin(const char *path)
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags)
{
struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
struct dir_iterator *dir_iterator = &iter->base;
@@ -209,13 +245,16 @@ struct dir_iterator *dir_iterator_begin(const char *path)
if (!path || !*path)
die("BUG: empty path passed to dir_iterator_begin()");
+ iter->flags = flags;
+
strbuf_init(&iter->base.path, PATH_MAX);
strbuf_addstr(&iter->base.path, path);
ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
iter->levels_nr = 1;
- iter->levels[0].initialized = 0;
+ iter->levels[0].dir = NULL;
+ iter->levels[0].dir_state = DIR_STATE_PUSHED;
return dir_iterator;
}
diff --git a/dir-iterator.h b/dir-iterator.h
index 27739e6..0e82f36 100644
--- a/dir-iterator.h
+++ b/dir-iterator.h
@@ -11,13 +11,12 @@
* Every time dir_iterator_advance() is called, update the members of
* the dir_iterator structure to reflect the next path in the
* iteration. The order that paths are iterated over within a
- * directory is undefined, but directory paths are always iterated
- * over before the subdirectory contents.
+ * directory is undefined.
*
* A typical iteration looks like this:
*
* int ok;
- * struct iterator *iter = dir_iterator_begin(path);
+ * struct iterator *iter = dir_iterator_begin(path, flags);
*
* while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
* if (want_to_stop_iteration()) {
@@ -38,6 +37,22 @@
* dir_iterator_advance() again.
*/
+/*
+ * Possible flags for dir_iterator_begin().
+ *
+ * -> DIR_ITERATOR_PRE_ORDER_TRAVERSAL: the iterator shall return
+ * a dirpath it has found before iterating through that directory's
+ * contents.
+ * -> DIR_ITERATOR_POST_ORDER_TRAVERSAL: the iterator shall return
+ * a dirpath it has found after iterating through that directory's
+ * contents.
+ * -> DIR_ITERATOR_LIST_ROOT_DIR: the iterator shall return the dirpath
+ * of the root directory it is iterating through.
+ */
+#define DIR_ITERATOR_PRE_ORDER_TRAVERSAL (1 << 0)
+#define DIR_ITERATOR_POST_ORDER_TRAVERSAL (1 << 1)
+#define DIR_ITERATOR_LIST_ROOT_DIR (1 << 2)
+
struct dir_iterator {
/* The current path: */
struct strbuf path;
@@ -57,15 +72,16 @@ struct dir_iterator {
};
/*
- * Start a directory iteration over path. Return a dir_iterator that
- * holds the internal state of the iteration.
+ * Start a directory iteration over path, with options specified in
+ * 'flags'. Return a dir_iterator that holds the internal state of
+ * the iteration.
*
* The iteration includes all paths under path, not including path
* itself and not including "." or ".." entries.
*
* path is the starting directory. An internal copy will be made.
*/
-struct dir_iterator *dir_iterator_begin(const char *path);
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags);
/*
* Advance the iterator to the first or next item and return ITER_OK.
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 50188e9..c29dc68 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3346,7 +3346,7 @@ static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_st
files_downcast(ref_store, 0, "reflog_iterator_begin");
base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable);
- iter->dir_iterator = dir_iterator_begin(git_path("logs"));
+ iter->dir_iterator = dir_iterator_begin(git_path("logs"), DIR_ITERATOR_PRE_ORDER_TRAVERSAL);
return ref_iterator;
}
diff --git a/t/helper/test-dir-iterator.c b/t/helper/test-dir-iterator.c
index 06f03fc..a1b8c78 100644
--- a/t/helper/test-dir-iterator.c
+++ b/t/helper/test-dir-iterator.c
@@ -6,6 +6,7 @@
int cmd_main(int argc, const char **argv) {
struct strbuf path = STRBUF_INIT;
struct dir_iterator *diter;
+ unsigned flag = DIR_ITERATOR_PRE_ORDER_TRAVERSAL;
if (argc < 2) {
return 1;
@@ -13,7 +14,10 @@ int cmd_main(int argc, const char **argv) {
strbuf_add(&path, argv[1], strlen(argv[1]));
- diter = dir_iterator_begin(path.buf);
+ if (argc == 3)
+ flag = atoi(argv[2]);
+
+ diter = dir_iterator_begin(path.buf, flag);
while (dir_iterator_advance(diter) == ITER_OK) {
if (S_ISDIR(diter->st.st_mode))
diff --git a/t/t0065-dir-iterator.sh b/t/t0065-dir-iterator.sh
index b857c07..ade3ee0 100755
--- a/t/t0065-dir-iterator.sh
+++ b/t/t0065-dir-iterator.sh
@@ -28,12 +28,28 @@ test_expect_success 'dir-iterator should iterate through all files' '
>dir/a/e &&
>dir/d/e/d/a &&
- test-dir-iterator ./dir | sort >./actual-pre-order-sorted-output &&
+ test-dir-iterator ./dir 1 | sort >./actual-pre-order-sorted-output &&
rm -rf dir &&
test_cmp expect-sorted-output actual-pre-order-sorted-output
'
+test_expect_success 'dir-iterator should iterate through all files on post-order mode' '
+ mkdir -p dir &&
+ mkdir -p dir/a/b/c/ &&
+ >dir/b &&
+ >dir/c &&
+ mkdir -p dir/d/e/d/ &&
+ >dir/a/b/c/d &&
+ >dir/a/e &&
+ >dir/d/e/d/a &&
+
+ test-dir-iterator ./dir 2 | sort >actual-post-order-sorted-output &&
+ rm -rf dir &&
+
+ test_cmp expect-sorted-output actual-post-order-sorted-output
+'
+
cat >expect-pre-order-output <<-\EOF &&
[d] (a) ./dir/a
[d] (a/b) ./dir/a/b
@@ -41,14 +57,53 @@ cat >expect-pre-order-output <<-\EOF &&
[f] (a/b/c/d) ./dir/a/b/c/d
EOF
-test_expect_success 'dir-iterator should list files in the correct order' '
+test_expect_success 'dir-iterator should list files properly on pre-order mode' '
mkdir -p dir/a/b/c/ &&
>dir/a/b/c/d &&
- test-dir-iterator ./dir >actual-pre-order-output &&
+ test-dir-iterator ./dir 1 >actual-pre-order-output &&
rm -rf dir &&
test_cmp expect-pre-order-output actual-pre-order-output
'
+cat >expect-post-order-output <<-\EOF &&
+[f] (a/b/c/d) ./dir/a/b/c/d
+[d] (a/b/c) ./dir/a/b/c
+[d] (a/b) ./dir/a/b
+[d] (a) ./dir/a
+EOF
+
+test_expect_success 'dir-iterator should list files properly on post-order mode' '
+ mkdir -p dir/a/b/c/ &&
+ >dir/a/b/c/d &&
+
+ test-dir-iterator ./dir 2 >actual-post-order-output &&
+ rm -rf dir &&
+
+ test_cmp expect-post-order-output actual-post-order-output
+'
+
+cat >expect-pre-order-post-order-root-dir-output <<-\EOF &&
+[d] (.) ./dir
+[d] (a) ./dir/a
+[d] (a/b) ./dir/a/b
+[d] (a/b/c) ./dir/a/b/c
+[f] (a/b/c/d) ./dir/a/b/c/d
+[d] (a/b/c) ./dir/a/b/c
+[d] (a/b) ./dir/a/b
+[d] (a) ./dir/a
+[d] (.) ./dir
+EOF
+
+test_expect_success 'dir-iterator should list files properly on pre-order + post-order + root-dir mode' '
+ mkdir -p dir/a/b/c/ &&
+ >dir/a/b/c/d &&
+
+ test-dir-iterator ./dir 7 >actual-pre-order-post-order-root-dir-output &&
+ rm -rf dir &&
+
+ test_cmp expect-pre-order-post-order-root-dir-output actual-pre-order-post-order-root-dir-output
+'
+
test_done
--
2.7.4 (Apple Git-66)
next prev parent reply other threads:[~2017-04-02 20:05 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-04-02 20:03 [PATCH v7 0/5] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira
2017-04-02 20:03 ` [PATCH v7 1/5] dir_iterator: add tests for dir_iterator API Daniel Ferreira
2017-04-03 22:31 ` Stefan Beller
2017-04-02 20:03 ` [PATCH v7 2/5] remove_subtree(): test removing nested directories Daniel Ferreira
2017-04-03 22:35 ` Stefan Beller
2017-04-02 20:03 ` [PATCH v7 3/5] dir_iterator: add helpers to dir_iterator_advance Daniel Ferreira
2017-04-03 22:48 ` Stefan Beller
2017-04-02 20:03 ` Daniel Ferreira [this message]
2017-04-03 3:36 ` [PATCH v7 4/5] dir_iterator: refactor state machine model Michael Haggerty
2017-04-03 18:03 ` Daniel Ferreira (theiostream)
2017-04-04 6:36 ` Michael Haggerty
2017-04-02 20:03 ` [PATCH v7 5/5] remove_subtree(): reimplement using iterators Daniel Ferreira
2017-04-03 3:42 ` Michael Haggerty
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1491163388-41255-5-git-send-email-bnmvco@gmail.com \
--to=bnmvco@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=mhagger@alum.mit.edu \
--cc=pclouds@gmail.com \
--cc=sbeller@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).