All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tanay Abhra <tanayabh@gmail.com>
To: Eric Sunshine <sunshine@sunshineco.com>
Cc: Git List <git@vger.kernel.org>,
	Ramkumar Ramachandra <artagnon@gmail.com>,
	Matthieu Moy <Matthieu.Moy@grenoble-inp.fr>,
	Jeff King <peff@peff.net>, Torsten Bogershausen <tboegi@web.de>
Subject: Re: [PATCH v1] config: Add hashtable for config parsing & retrival
Date: Tue, 10 Jun 2014 05:35:08 -0700	[thread overview]
Message-ID: <5396FB7C.6000709@gmail.com> (raw)
In-Reply-To: <CAPig+cTR5SdDF0QnKN8GFEKFkNEK_HoCDHj_vTnAbp37HK3kDA@mail.gmail.com>

Hi,

Thanks for the review, Eric. I have replied to your comments below,
I will try to reply early and more promptly now.

On 06/10/2014 04:27 AM, Eric Sunshine wrote:
>> ---
>> diff --git a/Documentation/technical/api-config.txt b/Documentation/technical/api-config.txt
>> index 230b3a0..5b6e376 100644
>> --- a/Documentation/technical/api-config.txt
>> +++ b/Documentation/technical/api-config.txt
>> @@ -77,6 +77,24 @@ To read a specific file in git-config format, use
>>  `git_config_from_file`. This takes the same callback and data parameters
>>  as `git_config`.
>>
>> +Querying For Specific Variables
>> +-------------------------------
>> +
>> +For programs wanting to query for specific variables in a non-callback
>> +manner, the config API provides two functions `git_config_get_string`
>> +and `git_config_get_string_multi`. They both take a single parameter,
>> +
>> +- a key string in canonical flat form for which the corresponding values
>> +  will be retrieved and returned.
> 
> It's strange to have a bulleted list for a single item. It can be
> stated more naturally as a full sentence, without the list.

Point Noted.

>> +They both read values from an internal cache generated previously from
>> +reading the config files. `git_config_get_string` returns the value with
>> +the highest priority(i.e. for the same variable value in the repo config
>> +will be preferred over value in user wide config).
>> +
>> +`git_config_get_string_multi` returns a `string_list` containing all the
>> +values for that particular key, sorted in order of increasing priority.
>> +
>>  Value Parsing Helpers
>>  ---------------------
>>
>> diff --git a/config.c b/config.c
>> index a30cb5c..6f6b04e 100644
>> --- a/config.c
>> +++ b/config.c
>> @@ -9,6 +9,8 @@
>>  #include "exec_cmd.h"
>>  #include "strbuf.h"
>>  #include "quote.h"
>> +#include "hashmap.h"
>> +#include "string-list.h"
>>
>>  struct config_source {
>>         struct config_source *prev;
>> @@ -37,6 +39,120 @@ static struct config_source *cf;
>>
>>  static int zlib_compression_seen;
>>
>> +struct config_cache_entry {
>> +       struct hashmap_entry ent;
>> +       char *key;
>> +       struct string_list *value_list;
> 
> Same question as in my v1 and v2 reviews [1][2], and reiterated by
> Matthieu [3]: Why is 'value_list' a pointer to a structure rather than
> just a structure?
> 

Sorry for the lack of response on this part. I thought choosing a pointer to a
structure or just the structure itself was a stylistic choice. Since most of the
functions just pass the pointer to this struct, I thought it would be more natural to
use it. But I have changed my mind on this one. In the next iteration I will be using
a plane old struct.

> 
>> +};
>> +
>> +static int hashmap_is_init;
>> +
>> +static int config_cache_set_value(const char *key, const char *value);
>> +
>> +static int config_cache_entry_cmp_case(const struct config_cache_entry *e1,
>> +                                const struct config_cache_entry *e2, const void *unused)
>> +{
>> +       return strcmp(e1->key, e2->key);
> 
> As suggested by Peff [4], this comparison is now case-sensitive,
> instead of case-insensitive as in the previous version. Rather than
> changing the appended '_icase' to '_case', a more typical function
> name would be simply config_cache_entry_cmp().

Noted.

>> +}
>> +
>> +static void config_cache_init(struct hashmap *config_cache)
>> +{
>> +       hashmap_init(config_cache, (hashmap_cmp_fn) config_cache_entry_cmp_case, 0);
> 
> In his review [4], Peff suggested giving config_cache_entry_cmp_case()
> the correct hashmap_cmp_fn signature so that this cast can be dropped.
> It's not clear whether you disagree with his advice or overlooked or
> forgot about it. If you disagree with his suggestion, it's okay to say
> so and explain why you feel the way you do, but without feedback from
> you, he or another reviewer is going to continue suggesting the same
> change.

Now on this one, I checked the code thoroughly. Every single hashmap_init()
incantation in git code has a hashmap_cmp_fn cast. I don't think that it is
necessary to do so. Is it?

>> +}
>> +
>> +static int config_cache_callback(const char *key, const char *value, void *unused)
>> +{
>> +       config_cache_set_value(key, value);
>> +       return 0;
>> +}
>> +
>> +static struct hashmap *get_config_cache(void)
>> +{
>> +       static struct hashmap config_cache;
>> +       if (!hashmap_is_init) {
>> +               config_cache_init(&config_cache);
>> +               hashmap_is_init = 1;
>> +               git_config(config_cache_callback, NULL);
>> +               return &config_cache;
> 
> Why do you 'return' here rather than just falling through to the
> 'return' outside the conditional?

Noted.

>> +static struct config_cache_entry *config_cache_find_entry(const char *key)
>> +{
>> +       struct hashmap *config_cache;
>> +       struct config_cache_entry k;
>> +       char *normalized_key;
>> +       int baselength = 0, ret;
>> +       config_cache = get_config_cache();
>> +       ret = git_config_parse_key(key, &normalized_key, &baselength);
> 
> Since you neither care about nor ever reference 'baselength', you
> should just pass NULL as the third argument.
> 

Noted.

>> +       if (ret)
>> +               return NULL;
>> +
>> +       hashmap_entry_init(&k, strhash(normalized_key));
>> +       k.key = (char*) key;
> 
> This looks fishy. You're hashing based upon 'normalized_key' but then
> comparing against the unnormalized 'key'. Peff's suggestion [4] was to
> store the normalized key in the hash, which means that you should use
> 'normalized_key' for both hashing and comparing. (See additional
> commentary about this issue below in config_cache_set_value().)

Ouch, this I had corrected in a future commit. But forgot to include in
this patch.

> Style: (char *)key

Noted. In function arguments the code uses (char*) key. I copied it from there.
:)

>> +       return hashmap_get(config_cache, &k, NULL);
> 
> You're leaking 'normalized_key', which git_config_parse_key()
> allocated on your behalf.
>
Noted. I will now check with valgrind before sending any future series.

>> +}
>> +
>> +static struct string_list *config_cache_get_value(const char *key)
>> +{
>> +       struct config_cache_entry *e = config_cache_find_entry(key);
>> +       return e ? e->value_list : NULL;
>> +}
>> +
>> +
>> +static int config_cache_set_value(const char *key, const char *value)
>> +{
>> +       struct hashmap *config_cache;
>> +       struct config_cache_entry *e;
>> +
>> +       config_cache = get_config_cache();
>> +       e = config_cache_find_entry(key);
>> +       if (!e) {
>> +               e = xmalloc(sizeof(*e));
>> +               hashmap_entry_init(e, strhash(key));
> 
> The hash computed to store the key should be the same as the hash to
> look it up. In config_cache_find_entry(), you're correctly computing
> the hash based upon the normalized key, but here you're doing so from
> the unnormalized key.
> 
>> +               e->key = xstrdup(key);
> 
> Likewise. Normalized keys should be stored in the hash, not unnormalized.
> 
> You should be invoking git_config_parse_key() to normalize the key
> both for hashing and storing.
> 
> Note, also, that call git_config_parse_key() allocates the normalize
> key on your behalf, so you do *not* want to xstrdup() it here.

config_cache_set_value() gets its values from git_config() as it the callback.
git_config() feeds the callback only normalized values by using functions like
get_extended_base_var(), get_base_var() etc. Thus, I didn't use
git_config_parse_key(). Please correct me if I am wrong, but I tested this case
thoroughly.

>> +               e->value_list = xcalloc(sizeof(struct string_list), 1);
>> +               e->value_list->strdup_strings = 1;
> 
> This code is perhaps too intimate with the implementation of
> string_list. It may work today (because it is safe to initialize all
> string_list fields to 0 via xcalloc()), but a future change to the
> string_list implementation may invalidate that assumption. The
> initialization macros in string-list.h exist to preserve the
> abstraction, so you don't have to know the details of initialization.
> The macros are not suitable for your use-case, but an initialization
> function, such as string_list_init(), would be suitable, and it would
> be appropriate to add such a function in a preparatory patch (as
> already suggested by [1]).

Noted. As I am going to use a simple struct for the list, this won't be
a problem.

>> +               string_list_append(e->value_list, value);
>> +               hashmap_add(config_cache, e);
>> +       } else {
>> +               string_list_append(e->value_list, value);
>> +       }
>> +       return 0;
>> +}
>> +
>> +extern const char *git_config_get_string(const char *key)
> 
> Drop the 'extern'. The header already declares it such.
>

Noted.

>> +{
>> +       struct string_list *values;
>> +       values = config_cache_get_value(key);
>> +       if (!values)
>> +               return NULL;
>> +       assert(values->nr > 0);
>> +       return values->items[values->nr - 1].string;
>> +}
>> +
>> +extern const struct string_list *git_config_get_string_multi(const char *key)
> 
> Drop the 'extern'. The header already declares it such.
> 

Noted.

>> +{
>> +       return config_cache_get_value(key);
>> +}
>> +
>>  static int config_file_fgetc(struct config_source *conf)
>>  {
>>         return fgetc(conf->u.file);
>> @@ -1700,6 +1816,12 @@ int git_config_set_multivar_in_file(const char *config_filename,
>>         lock = NULL;
>>         ret = 0;
>>
>> +       /* contents of config file has changed, so invalidate the
>> +        * config cache used by non-callback based query functions.
>> +        */
>> +       if (hashmap_is_init)
>> +               config_cache_free();
>> +
>>  out_free:
>>         if (lock)
>>                 rollback_lock_file(lock);
>> --
>> 1.9.0.GIT
> 
> [1]: http://git.661346.n2.nabble.com/RFC-PATCH-0-2-Git-config-cache-amp-special-querying-api-utilizing-the-cache-td7611691.html#a7611860
> [2]: http://thread.gmane.org/gmane.comp.version-control.git/250566/focus=251058
> [3]: http://thread.gmane.org/gmane.comp.version-control.git/251073/focus=251079
> [4]: http://thread.gmane.org/gmane.comp.version-control.git/250566/focus=250618
> 

  reply	other threads:[~2014-06-10 12:35 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-09 12:49 [PATCH v1] Git config cache & special querying api utilizing the cache Tanay Abhra
2014-06-09 12:49 ` [PATCH v1] config: Add hashtable for config parsing & retrival Tanay Abhra
2014-06-09 14:24   ` Matthieu Moy
2014-06-10 11:51     ` Eric Sunshine
2014-06-10 11:27   ` Eric Sunshine
2014-06-10 12:35     ` Tanay Abhra [this message]
2014-06-10 21:28       ` Eric Sunshine
2014-06-10 11:45   ` Eric Sunshine
2014-06-10 12:37     ` Tanay Abhra
2014-06-10 23:30   ` Eric Sunshine
2014-06-11 14:01 ` [PATCH v1] Git config cache & special querying api utilizing the cache Matthieu Moy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5396FB7C.6000709@gmail.com \
    --to=tanayabh@gmail.com \
    --cc=Matthieu.Moy@grenoble-inp.fr \
    --cc=artagnon@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=sunshine@sunshineco.com \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.