* [PATCH v3 0/2] [GSoC] remove_subtree(): reimplement using iterators @ 2017-03-25 18:12 Daniel Ferreira 2017-03-25 18:12 ` [PATCH v3 1/2] [GSoC] dir_iterator: iterate over dir after its contents Daniel Ferreira ` (2 more replies) 0 siblings, 3 replies; 6+ messages in thread From: Daniel Ferreira @ 2017-03-25 18:12 UTC (permalink / raw) To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira This is the third version of the GSoC microproject of refactoring remove_subtree() from recursively using readdir() to use dir_iterator. Below are the threads for other versions: v1: https://public-inbox.org/git/CAGZ79kZwT-9mHTiOJ5CEjk2wDFkn6+NcogjX0=vjhsAh16ANYg@mail.gmail.com/T/#mae023e7a7d7626f00e0923833c4359f5af493730 v2: https://public-inbox.org/git/CACsJy8Dxh-QPBBLfaFWPAWUsbA9GVXA7x+mXLjEvYKhk1zOpig@mail.gmail.com/T/#t Duy suggested adding features to dir_iterator might go beyond the intention of a microproject, but I figured I might go for it to learn more about the project. The dir_iterator reimplementation has been tested in a separate binary I created (and linked with libgit.a) to reproduce remove_subtree()'s contents. As pointed out in the last thread, git's tests for this function were unable to catch a daunting bug I had introduced, and I still haven't been able to come up with a way to reproduce remove_subtree() being called. Any help? Thank you all again for all the reviews. Daniel Ferreira (2): dir_iterator: iterate over dir after its contents remove_subtree(): reimplement using iterators dir-iterator.c | 100 ++++++++++++++++++++++++++++++++++++++++++++------------- dir-iterator.h | 7 ++++ entry.c | 32 +++++++----------- 3 files changed, 95 insertions(+), 44 deletions(-) -- 2.7.4 (Apple Git-66) ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v3 1/2] [GSoC] dir_iterator: iterate over dir after its contents 2017-03-25 18:12 [PATCH v3 0/2] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira @ 2017-03-25 18:12 ` Daniel Ferreira 2017-03-26 22:40 ` Michael Haggerty 2017-03-26 22:48 ` Michael Haggerty 2017-03-25 18:12 ` [PATCH v3 2/2] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira 2017-03-28 17:13 ` [PATCH v3 0/2] " Stefan Beller 2 siblings, 2 replies; 6+ messages in thread From: Daniel Ferreira @ 2017-03-25 18:12 UTC (permalink / raw) To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira Create an option for the dir_iterator API to iterate over a directory path only after having iterated through its contents. This feature was predicted, although not implemented by 0fe5043 ("dir_iterator: new API for iterating over a directory tree", 2016-06-18). This is useful for recursively removing a directory and calling rmdir() on a directory only after all of its contents have been wiped. An "options" member has been added to the dir_iterator struct. It contains the "iterate_dirs_after_files" flag, that enables the feature when set to 1. Default behavior continues to be iterating over directory paths before its contents. Two inline functions have been added to dir_iterator's code to avoid code repetition inside dir_iterator_advance() and make code more clear. No particular functions or wrappers for setting the options struct's fields have been added to avoid either breaking the current dir_iterator API or over-engineering an extremely simple option architecture. Signed-off-by: Daniel Ferreira <bnmvco@gmail.com> --- dir-iterator.c | 100 ++++++++++++++++++++++++++++++++++++++++++++------------- dir-iterator.h | 7 ++++ 2 files changed, 84 insertions(+), 23 deletions(-) diff --git a/dir-iterator.c b/dir-iterator.c index 34182a9..833d56a 100644 --- a/dir-iterator.c +++ b/dir-iterator.c @@ -50,6 +50,43 @@ struct dir_iterator_int { struct dir_iterator_level *levels; }; +static inline void push_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level) +{ + level->dir_state = DIR_STATE_RECURSE; + ALLOC_GROW(iter->levels, iter->levels_nr + 1, + iter->levels_alloc); + level = &iter->levels[iter->levels_nr++]; + level->initialized = 0; +} + +static inline int pop_dir_level(struct dir_iterator_int *iter, struct dir_iterator_level *level) +{ + return --iter->levels_nr; +} + +static inline int set_iterator_data(struct dir_iterator_int *iter, struct dir_iterator_level *level) +{ + if (lstat(iter->base.path.buf, &iter->base.st) < 0) { + if (errno != ENOENT) + warning("error reading path '%s': %s", + iter->base.path.buf, + strerror(errno)); + return -1; + } + + /* + * We have to set these each time because + * the path strbuf might have been realloc()ed. + */ + iter->base.relative_path = + iter->base.path.buf + iter->levels[0].prefix_len; + iter->base.basename = + iter->base.path.buf + level->prefix_len; + level->dir_state = DIR_STATE_ITER; + + return 0; +} + int dir_iterator_advance(struct dir_iterator *dir_iterator) { struct dir_iterator_int *iter = @@ -77,18 +114,16 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) } level->initialized = 1; - } else if (S_ISDIR(iter->base.st.st_mode)) { + } else if (S_ISDIR(iter->base.st.st_mode) && + !iter->base.options.iterate_dirs_after_files) { if (level->dir_state == DIR_STATE_ITER) { /* * The directory was just iterated * over; now prepare to iterate into - * it. + * it (unless an option is set for us + * to do otherwise). */ - level->dir_state = DIR_STATE_RECURSE; - ALLOC_GROW(iter->levels, iter->levels_nr + 1, - iter->levels_alloc); - level = &iter->levels[iter->levels_nr++]; - level->initialized = 0; + push_dir_level(iter, level); continue; } else { /* @@ -104,7 +139,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) * This level is exhausted (or wasn't opened * successfully); pop up a level. */ - if (--iter->levels_nr == 0) + if (pop_dir_level(iter, level) == 0) return dir_iterator_abort(dir_iterator); continue; @@ -120,16 +155,33 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) de = readdir(level->dir); if (!de) { - /* This level is exhausted; pop up a level. */ + /* This level is exhausted */ if (errno) { warning("error reading directory %s: %s", iter->base.path.buf, strerror(errno)); + } else if (iter->base.options.iterate_dirs_after_files) { + /* If we are handling dirpaths after their contents, + * we have to iterate over the directory now that we'll + * have finished iterating into it. */ + level->dir = NULL; + + if (pop_dir_level(iter, level) == 0) + return dir_iterator_abort(dir_iterator); + + level = &iter->levels[iter->levels_nr - 1]; + /* Remove a trailing slash */ + strbuf_strip_suffix(&iter->base.path, "/"); + + if (set_iterator_data(iter, level)) + continue; + + return ITER_OK; } else if (closedir(level->dir)) warning("error closing directory %s: %s", iter->base.path.buf, strerror(errno)); level->dir = NULL; - if (--iter->levels_nr == 0) + if (pop_dir_level(iter, level) == 0) return dir_iterator_abort(dir_iterator); break; } @@ -138,26 +190,26 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator) continue; strbuf_addstr(&iter->base.path, de->d_name); - if (lstat(iter->base.path.buf, &iter->base.st) < 0) { - if (errno != ENOENT) - warning("error reading path '%s': %s", - iter->base.path.buf, - strerror(errno)); + + if (set_iterator_data(iter, level)) continue; - } /* - * We have to set these each time because - * the path strbuf might have been realloc()ed. + * If we want to iterate dirs after files, we shall + * begin looking into them *before* we return the dir + * itself. */ - iter->base.relative_path = - iter->base.path.buf + iter->levels[0].prefix_len; - iter->base.basename = - iter->base.path.buf + level->prefix_len; - level->dir_state = DIR_STATE_ITER; + if (S_ISDIR(iter->base.st.st_mode) && + iter->base.options.iterate_dirs_after_files) { + push_dir_level(iter, level); + goto continue_outer_loop; + } return ITER_OK; } + +continue_outer_loop: + ; } } @@ -190,6 +242,8 @@ struct dir_iterator *dir_iterator_begin(const char *path) if (!path || !*path) die("BUG: empty path passed to dir_iterator_begin()"); + iter->base.options.iterate_dirs_after_files = 0; + strbuf_init(&iter->base.path, PATH_MAX); strbuf_addstr(&iter->base.path, path); diff --git a/dir-iterator.h b/dir-iterator.h index 27739e6..4304913 100644 --- a/dir-iterator.h +++ b/dir-iterator.h @@ -38,7 +38,14 @@ * dir_iterator_advance() again. */ +struct dir_iterator_options { + unsigned iterate_dirs_after_files : 1; +}; + struct dir_iterator { + /* Options for dir_iterator */ + struct dir_iterator_options options; + /* The current path: */ struct strbuf path; -- 2.7.4 (Apple Git-66) ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v3 1/2] [GSoC] dir_iterator: iterate over dir after its contents 2017-03-25 18:12 ` [PATCH v3 1/2] [GSoC] dir_iterator: iterate over dir after its contents Daniel Ferreira @ 2017-03-26 22:40 ` Michael Haggerty 2017-03-26 22:48 ` Michael Haggerty 1 sibling, 0 replies; 6+ messages in thread From: Michael Haggerty @ 2017-03-26 22:40 UTC (permalink / raw) To: Daniel Ferreira, git; +Cc: gitster, sbeller, pclouds On 03/25/2017 07:12 PM, Daniel Ferreira wrote: > Create an option for the dir_iterator API to iterate over a directory > path only after having iterated through its contents. This feature was > predicted, although not implemented by 0fe5043 ("dir_iterator: new API > for iterating over a directory tree", 2016-06-18). > > This is useful for recursively removing a directory and calling rmdir() > on a directory only after all of its contents have been wiped. > > An "options" member has been added to the dir_iterator struct. It > contains the "iterate_dirs_after_files" flag, that enables the feature > when set to 1. Default behavior continues to be iterating over directory > paths before its contents. > > Two inline functions have been added to dir_iterator's code to avoid > code repetition inside dir_iterator_advance() and make code more clear. > > No particular functions or wrappers for setting the options struct's > fields have been added to avoid either breaking the current dir_iterator > API or over-engineering an extremely simple option architecture. This patch would be easier to read if it were split into two: one extracting the new functions and changing old code to use them, and a second adding the new functionality. As one patch, is is hard to see quickly which changes have what purpose. I also suggest adding a new `unsigned int flags` parameter to `dir_iterator_begin`. I think that's more natural, because it doesn't make sense to change the iteration order during an iteration. It's not much of a problem to change the API given that all callers are in the same codebase. If you were to forget to convert any callers (or if a different in-flight patch series were to add a new caller using the old call style), the compiler would complain, and the problem would be obvious and easy to fix. I didn't actually read the patch carefully yet because I don't have time this evening to seek out the interesting parts in the long diff. Michael > [...] ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v3 1/2] [GSoC] dir_iterator: iterate over dir after its contents 2017-03-25 18:12 ` [PATCH v3 1/2] [GSoC] dir_iterator: iterate over dir after its contents Daniel Ferreira 2017-03-26 22:40 ` Michael Haggerty @ 2017-03-26 22:48 ` Michael Haggerty 1 sibling, 0 replies; 6+ messages in thread From: Michael Haggerty @ 2017-03-26 22:48 UTC (permalink / raw) To: Daniel Ferreira, git; +Cc: gitster, sbeller, pclouds On 03/25/2017 07:12 PM, Daniel Ferreira wrote: > Create an option for the dir_iterator API to iterate over a directory > path only after having iterated through its contents. This feature was > predicted, although not implemented by 0fe5043 ("dir_iterator: new API > for iterating over a directory tree", 2016-06-18). > > This is useful for recursively removing a directory and calling rmdir() > on a directory only after all of its contents have been wiped. > > An "options" member has been added to the dir_iterator struct. It > contains the "iterate_dirs_after_files" flag, that enables the feature > when set to 1. Default behavior continues to be iterating over directory > paths before its contents. > > Two inline functions have been added to dir_iterator's code to avoid > code repetition inside dir_iterator_advance() and make code more clear. > > No particular functions or wrappers for setting the options struct's > fields have been added to avoid either breaking the current dir_iterator > API or over-engineering an extremely simple option architecture. > > Signed-off-by: Daniel Ferreira <bnmvco@gmail.com> > --- > dir-iterator.c | 100 ++++++++++++++++++++++++++++++++++++++++++++------------- > dir-iterator.h | 7 ++++ > 2 files changed, 84 insertions(+), 23 deletions(-) > > [...] > diff --git a/dir-iterator.h b/dir-iterator.h > index 27739e6..4304913 100644 > --- a/dir-iterator.h > +++ b/dir-iterator.h > @@ -38,7 +38,14 @@ > * dir_iterator_advance() again. > */ > > +struct dir_iterator_options { > + unsigned iterate_dirs_after_files : 1; > +}; > + > struct dir_iterator { > + /* Options for dir_iterator */ > + struct dir_iterator_options options; > + > /* The current path: */ > struct strbuf path; Another thing I noticed: the name of this option, `iterate_dirs_after_files`, is a little bit misleading. If I understand correctly, it doesn't make the iteration process files before directories within a single directory; rather, it ensures that subdirectories and their contents are processed before the containing directory. Therefore, a better name might be something like "depth_first". I should mention that I like the overall idea to add this new feature and use it to simplify `remove_subtree()`. Michael ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v3 2/2] [GSoC] remove_subtree(): reimplement using iterators 2017-03-25 18:12 [PATCH v3 0/2] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira 2017-03-25 18:12 ` [PATCH v3 1/2] [GSoC] dir_iterator: iterate over dir after its contents Daniel Ferreira @ 2017-03-25 18:12 ` Daniel Ferreira 2017-03-28 17:13 ` [PATCH v3 0/2] " Stefan Beller 2 siblings, 0 replies; 6+ messages in thread From: Daniel Ferreira @ 2017-03-25 18:12 UTC (permalink / raw) To: git; +Cc: gitster, sbeller, pclouds, mhagger, Daniel Ferreira Use dir_iterator to traverse through remove_subtree()'s directory tree, avoiding the need for recursive calls to readdir(). Simplify remove_subtree()'s code. A conversion similar in purpose was previously done at 46d092a ("for_each_reflog(): reimplement using iterators", 2016-05-21). Signed-off-by: Daniel Ferreira <bnmvco@gmail.com> --- entry.c | 32 +++++++++++--------------------- 1 file changed, 11 insertions(+), 21 deletions(-) diff --git a/entry.c b/entry.c index c6eea24..670ffeb 100644 --- a/entry.c +++ b/entry.c @@ -2,6 +2,8 @@ #include "blob.h" #include "dir.h" #include "streaming.h" +#include "iterator.h" +#include "dir-iterator.h" static void create_directories(const char *path, int path_len, const struct checkout *state) @@ -46,29 +48,17 @@ static void create_directories(const char *path, int path_len, static void remove_subtree(struct strbuf *path) { - DIR *dir = opendir(path->buf); - struct dirent *de; - int origlen = path->len; - - if (!dir) - die_errno("cannot opendir '%s'", path->buf); - while ((de = readdir(dir)) != NULL) { - struct stat st; - - if (is_dot_or_dotdot(de->d_name)) - continue; - - strbuf_addch(path, '/'); - strbuf_addstr(path, de->d_name); - if (lstat(path->buf, &st)) - die_errno("cannot lstat '%s'", path->buf); - if (S_ISDIR(st.st_mode)) - remove_subtree(path); - else if (unlink(path->buf)) + struct dir_iterator *diter = dir_iterator_begin(path->buf); + diter->options.iterate_dirs_after_files = 1; + + while (dir_iterator_advance(diter) == ITER_OK) { + if (S_ISDIR(diter->st.st_mode)) { + if (rmdir(diter->path.buf)) + die_errno("cannot rmdir '%s'", path->buf); + } else if (unlink(diter->path.buf)) die_errno("cannot unlink '%s'", path->buf); - strbuf_setlen(path, origlen); } - closedir(dir); + if (rmdir(path->buf)) die_errno("cannot rmdir '%s'", path->buf); } -- 2.7.4 (Apple Git-66) ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v3 0/2] [GSoC] remove_subtree(): reimplement using iterators 2017-03-25 18:12 [PATCH v3 0/2] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira 2017-03-25 18:12 ` [PATCH v3 1/2] [GSoC] dir_iterator: iterate over dir after its contents Daniel Ferreira 2017-03-25 18:12 ` [PATCH v3 2/2] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira @ 2017-03-28 17:13 ` Stefan Beller 2 siblings, 0 replies; 6+ messages in thread From: Stefan Beller @ 2017-03-28 17:13 UTC (permalink / raw) To: Daniel Ferreira Cc: git@vger.kernel.org, Junio C Hamano, Duy Nguyen, Michael Haggerty On Sat, Mar 25, 2017 at 11:12 AM, Daniel Ferreira <bnmvco@gmail.com> wrote: > This is the third version of the GSoC microproject > of refactoring remove_subtree() from recursively using > readdir() to use dir_iterator. Below are the threads for > other versions: > > v1: https://public-inbox.org/git/CAGZ79kZwT-9mHTiOJ5CEjk2wDFkn6+NcogjX0=vjhsAh16ANYg@mail.gmail.com/T/#mae023e7a7d7626f00e0923833c4359f5af493730 > v2: https://public-inbox.org/git/CACsJy8Dxh-QPBBLfaFWPAWUsbA9GVXA7x+mXLjEvYKhk1zOpig@mail.gmail.com/T/#t > > Duy suggested adding features to dir_iterator might go > beyond the intention of a microproject, but I figured I > might go for it to learn more about the project. > > The dir_iterator reimplementation has been tested in a > separate binary I created (and linked with libgit.a) to > reproduce remove_subtree()'s contents. As pointed out in the > last thread, git's tests for this function were unable to > catch a daunting bug I had introduced, and I still haven't > been able to come up with a way to reproduce remove_subtree() > being called. Any help? > I would think a test llike the following would work: test_expect_success 'remove nested subtrees' ' test_commit initial && mkdir -p dir/with/nested/dir && echo content >dir/with/nested/dir/file && echo content >dir/file && git add dir/with/nested/dir/file dir/file && git commit -a -m "commit directory structure" && git checkout initial && ! test dir ' ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2017-03-28 17:13 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-03-25 18:12 [PATCH v3 0/2] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira 2017-03-25 18:12 ` [PATCH v3 1/2] [GSoC] dir_iterator: iterate over dir after its contents Daniel Ferreira 2017-03-26 22:40 ` Michael Haggerty 2017-03-26 22:48 ` Michael Haggerty 2017-03-25 18:12 ` [PATCH v3 2/2] [GSoC] remove_subtree(): reimplement using iterators Daniel Ferreira 2017-03-28 17:13 ` [PATCH v3 0/2] " Stefan Beller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).