Re: [PATCH v2 12/13] dir_iterator: new API for iterating over a directory tree

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Michael Haggerty <mhagger@alum.mit.edu>
To: Eric Sunshine <sunshine@sunshineco.com>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"David Turner" <dturner@twopensource.com>,
	"Ramsay Jones" <ramsay@ramsayjones.plus.com>,
	"Jeff King" <peff@peff.net>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Git List" <git@vger.kernel.org>
Subject: Re: [PATCH v2 12/13] dir_iterator: new API for iterating over a directory tree
Date: Thu, 9 Jun 2016 13:46:03 +0200	[thread overview]
Message-ID: <575956FB.9010207@alum.mit.edu> (raw)
In-Reply-To: <CAPig+cRKa2QF9fp0in7wbPBzyY3UQbvNrioFWjxPjQQsoY=F9A@mail.gmail.com>

On 06/07/2016 07:13 AM, Eric Sunshine wrote:
> On Fri, Jun 3, 2016 at 8:33 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>> The iterator interface is modeled on that for references, though no
>> vtable is necessary because there is (so far?) only one type of
>> dir_iterator.
>> [...]
> 
> Some minor comments below, though probably nothing demanding a re-roll...
> 
>> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
>> ---
>> diff --git a/dir-iterator.c b/dir-iterator.c
>> @@ -0,0 +1,185 @@
>> +int dir_iterator_advance(struct dir_iterator *dir_iterator)
>> +{
>> +       struct dir_iterator_int *iter =
>> +               (struct dir_iterator_int *)dir_iterator;
>> +
>> +       while (1) {
>> +               struct dir_iterator_level *level =
>> +                       &iter->levels[iter->levels_nr - 1];
>> +               struct dirent *de;
>> +
>> +               if (!level->initialized) {
>> +                       if (!is_dir_sep(iter->base.path.buf[iter->base.path.len - 1]))
> 
> I realize that dir_iterator_begin() ensures that we never get this far
> if path has zero length, but that check is so (textually and perhaps
> mentally) distant from this chunk of code that it's still a little bit
> scary to see this *potential* access of base.path.buf[-1]. Perhaps an
> assert(iter->base.path.len) just before this line would document that
> this condition was taken into consideration?

An assert() seems a little bit heavy for checking this invariant, which
is easy to check by inspection. I find assert() more useful for checking
state that results from more complicated algorithms, where inspection
doesn't so quickly make it clear that the invariant is held.

If you have no objections, I'll add a comment instead.

>> +                               strbuf_addch(&iter->base.path, '/');
>> +                       level->prefix_len = iter->base.path.len;
>> +
>> +                       /* opendir() errors are handled below */
>> +                       level->dir = opendir(iter->base.path.buf);
>> +[...]
>> +               if (!level->dir) {
>> +                       /*
>> +                        * This level is exhausted (or wasn't opened
>> +                        * successfully); pop up a level.
>> +                        */
>> +                       if (--iter->levels_nr == 0) {
>> +                               return dir_iterator_abort(dir_iterator);
>> +                       }
> 
> Style: unnecessary braces

Thanks.

>> +                       continue;
>> +               }
>> +
>> +               /*
>> +                * Loop until we find an entry that we can give back
>> +                * to the caller:
>> +                */
>> +               while (1) {
>> +[...]
>> +                       strbuf_addstr(&iter->base.path, de->d_name);
>> +                       if (lstat(iter->base.path.buf, &iter->base.st) < 0) {
>> +                               if (errno != ENOENT)
>> +                                       warning("error reading path '%s': %s",
>> +                                               iter->base.path.buf,
>> +                                               strerror(errno));
> 
> Duy's warning_errno() series is already in 'master'...

This patch series is based on a version of master from before Duy's
series was merged. I think it's better to leave this the way it is, and
fix it up in a separate patch sometime after it is merged, than to make
this patch series depend on both mh/split-under-lock *and* a
post-warning_errno() version of master.

>> +                               continue;
>> +                       }
>> +
>> +                       /*
>> +                        * We have to set these each time because
>> +                        * the path strbuf might have been realloc()ed.
>> +                        */
>> +
> 
> Maybe drop the blank line between the comment and the code to which it applies.

Thanks.

>> +                       iter->base.relative_path =
>> +                               iter->base.path.buf + iter->levels[0].prefix_len;
>> +                       iter->base.basename =
>> +                               iter->base.path.buf + level->prefix_len;
>> +                       level->dir_state = DIR_STATE_ITER;
>> +
>> +                       return ITER_OK;
>> +               }
>> +       }
>> +}
>> +
>> +int dir_iterator_abort(struct dir_iterator *dir_iterator)
>> +{
>> +       struct dir_iterator_int *iter = (struct dir_iterator_int *)dir_iterator;
>> +
>> +       while (iter->levels_nr) {
>> +               struct dir_iterator_level *level =
>> +                       &iter->levels[--iter->levels_nr];
> 
> It's a bit difficult to locate the loop control embedded within this
> (textually) noisy expression. I wonder if it would be easier to read
> as a plain for-loop instead:
> 
>     for (; iter->levels_nr; iter->levels_nr--) {
> 
>> +               if (level->dir)
>> +                       closedir(level->dir);

Yes, that's a bit clearer. I'll change it.

> The documentation talks about this function returning ITER_ERROR upon
> abort failure. This implementation doesn't care about closedir()
> errors, thus never returns ITER_ERROR, which I guess agrees with
> dir_iterator_advance() which also doesn't worry about opendir() or
> closedir() errors. Do opendir() and closedir() errors at least deserve
> a warning, as lstat() failure does? (Genuine question; I haven't
> thought too hard about it.)

readdir() and closedir() can only fail with EBADF, which (I think) can
only happen in the case of a bug. But we do want to learn about bugs
*sometime*, so I'll add warnings for them.

opendir(), on the other hand, can fail for several reasons. The code
that this is replacing was completely silent in the case of opendir()
failures. I agree that some things are worth warning about, though I
don't think we should get really chatty about errors that could have
been caused by, e.g., a race. So I'll add a warning for errno != ENOENT.

>> +       }
>> +
>> +       free(iter->levels);
>> +       strbuf_release(&iter->base.path);
>> +       free(iter);
>> +       return ITER_DONE;
>> +}
>> diff --git a/dir-iterator.h b/dir-iterator.h
>> @@ -0,0 +1,86 @@
>> +/*
>> + * Advance the iterator to the first or next item and return ITER_OK.
>> + * If the iteration is exhausted, free the resources associated with
>> + * the iterator and return ITER_DONE. On error, return ITER_ERROR. It
>> + * is a bug to use iterator or call this function again after it has
>> + * returned false.
>> + */
>> +int dir_iterator_advance(struct dir_iterator *iterator);
> 
> The documentation makes it clear that the iterator is freed upon
> ITER_OK, but I'm having trouble determining if it is freed for
> ITER_ERROR, as well. The documentation for ref_iterator_advance(), on
> the other hand, makes it clear that the iterator is freed in all
> cases.

Iterators are always freed when these functions return ITER_DONE or
ITER_ERROR. I'll mention this in the docstring.

Thanks for all your great comments!

Michael

next prev parent reply	other threads:[~2016-06-09 11:46 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-03 12:33 [PATCH v2 00/13] Reference iterators Michael Haggerty
2016-06-03 12:33 ` [PATCH v2 01/13] refs: remove unnecessary "extern" keywords Michael Haggerty
2016-06-03 12:33 ` [PATCH v2 02/13] do_for_each_ref(): move docstring to the header file Michael Haggerty
2016-06-03 12:33 ` [PATCH v2 03/13] refs: use name "prefix" consistently Michael Haggerty
2016-06-03 12:33 ` [PATCH v2 04/13] delete_refs(): add a flags argument Michael Haggerty
2016-06-03 12:33 ` [PATCH v2 05/13] remote rm: handle symbolic refs correctly Michael Haggerty
2016-06-03 12:33 ` [PATCH v2 06/13] get_ref_cache(): only create an instance if there is a submodule Michael Haggerty
2016-06-03 12:33 ` [PATCH v2 07/13] entry_resolves_to_object(): rename function from ref_resolves_to_object() Michael Haggerty
2016-06-03 12:33 ` [PATCH v2 08/13] ref_resolves_to_object(): new function Michael Haggerty
2016-06-03 12:33 ` [PATCH v2 09/13] refs: introduce an iterator interface Michael Haggerty
2016-06-03 12:33 ` [PATCH v2 10/13] do_for_each_ref(): reimplement using reference iteration Michael Haggerty
2016-06-03 12:33 ` [PATCH v2 11/13] for_each_reflog(): don't abort for bad references Michael Haggerty
2016-06-03 12:33 ` [PATCH v2 12/13] dir_iterator: new API for iterating over a directory tree Michael Haggerty
2016-06-07  5:13   ` Eric Sunshine
2016-06-09 11:46     ` Michael Haggerty [this message]
2016-06-09 12:35       ` Michael Haggerty
2016-06-09 16:19         ` Junio C Hamano
2016-06-03 12:33 ` [PATCH v2 13/13] for_each_reflog(): reimplement using iterators Michael Haggerty
2016-06-03 18:02 ` [PATCH v2 00/13] Reference iterators David Turner
2016-06-03 21:33 ` Junio C Hamano
2016-06-03 21:55   ` Michael Haggerty
2016-06-03 23:40     ` Junio C Hamano
2016-06-07 11:58       ` Michael Haggerty

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=575956FB.9010207@alum.mit.edu \
    --to=mhagger@alum.mit.edu \
    --cc=dturner@twopensource.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=ramsay@ramsayjones.plus.com \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).