From: Tanay Abhra <tanayabh@gmail.com>
To: Eric Sunshine <sunshine@sunshineco.com>
Cc: Git List <git@vger.kernel.org>,
Ramkumar Ramachandra <artagnon@gmail.com>,
Matthieu Moy <Matthieu.Moy@grenoble-inp.fr>,
Jeff King <peff@peff.net>, Torsten Bogershausen <tboegi@web.de>
Subject: Re: [PATCH v1] config: Add hashtable for config parsing & retrival
Date: Tue, 10 Jun 2014 05:35:08 -0700 [thread overview]
Message-ID: <5396FB7C.6000709@gmail.com> (raw)
In-Reply-To: <CAPig+cTR5SdDF0QnKN8GFEKFkNEK_HoCDHj_vTnAbp37HK3kDA@mail.gmail.com>
Hi,
Thanks for the review, Eric. I have replied to your comments below,
I will try to reply early and more promptly now.
On 06/10/2014 04:27 AM, Eric Sunshine wrote:
>> ---
>> diff --git a/Documentation/technical/api-config.txt b/Documentation/technical/api-config.txt
>> index 230b3a0..5b6e376 100644
>> --- a/Documentation/technical/api-config.txt
>> +++ b/Documentation/technical/api-config.txt
>> @@ -77,6 +77,24 @@ To read a specific file in git-config format, use
>> `git_config_from_file`. This takes the same callback and data parameters
>> as `git_config`.
>>
>> +Querying For Specific Variables
>> +-------------------------------
>> +
>> +For programs wanting to query for specific variables in a non-callback
>> +manner, the config API provides two functions `git_config_get_string`
>> +and `git_config_get_string_multi`. They both take a single parameter,
>> +
>> +- a key string in canonical flat form for which the corresponding values
>> + will be retrieved and returned.
>
> It's strange to have a bulleted list for a single item. It can be
> stated more naturally as a full sentence, without the list.
Point Noted.
>> +They both read values from an internal cache generated previously from
>> +reading the config files. `git_config_get_string` returns the value with
>> +the highest priority(i.e. for the same variable value in the repo config
>> +will be preferred over value in user wide config).
>> +
>> +`git_config_get_string_multi` returns a `string_list` containing all the
>> +values for that particular key, sorted in order of increasing priority.
>> +
>> Value Parsing Helpers
>> ---------------------
>>
>> diff --git a/config.c b/config.c
>> index a30cb5c..6f6b04e 100644
>> --- a/config.c
>> +++ b/config.c
>> @@ -9,6 +9,8 @@
>> #include "exec_cmd.h"
>> #include "strbuf.h"
>> #include "quote.h"
>> +#include "hashmap.h"
>> +#include "string-list.h"
>>
>> struct config_source {
>> struct config_source *prev;
>> @@ -37,6 +39,120 @@ static struct config_source *cf;
>>
>> static int zlib_compression_seen;
>>
>> +struct config_cache_entry {
>> + struct hashmap_entry ent;
>> + char *key;
>> + struct string_list *value_list;
>
> Same question as in my v1 and v2 reviews [1][2], and reiterated by
> Matthieu [3]: Why is 'value_list' a pointer to a structure rather than
> just a structure?
>
Sorry for the lack of response on this part. I thought choosing a pointer to a
structure or just the structure itself was a stylistic choice. Since most of the
functions just pass the pointer to this struct, I thought it would be more natural to
use it. But I have changed my mind on this one. In the next iteration I will be using
a plane old struct.
>
>> +};
>> +
>> +static int hashmap_is_init;
>> +
>> +static int config_cache_set_value(const char *key, const char *value);
>> +
>> +static int config_cache_entry_cmp_case(const struct config_cache_entry *e1,
>> + const struct config_cache_entry *e2, const void *unused)
>> +{
>> + return strcmp(e1->key, e2->key);
>
> As suggested by Peff [4], this comparison is now case-sensitive,
> instead of case-insensitive as in the previous version. Rather than
> changing the appended '_icase' to '_case', a more typical function
> name would be simply config_cache_entry_cmp().
Noted.
>> +}
>> +
>> +static void config_cache_init(struct hashmap *config_cache)
>> +{
>> + hashmap_init(config_cache, (hashmap_cmp_fn) config_cache_entry_cmp_case, 0);
>
> In his review [4], Peff suggested giving config_cache_entry_cmp_case()
> the correct hashmap_cmp_fn signature so that this cast can be dropped.
> It's not clear whether you disagree with his advice or overlooked or
> forgot about it. If you disagree with his suggestion, it's okay to say
> so and explain why you feel the way you do, but without feedback from
> you, he or another reviewer is going to continue suggesting the same
> change.
Now on this one, I checked the code thoroughly. Every single hashmap_init()
incantation in git code has a hashmap_cmp_fn cast. I don't think that it is
necessary to do so. Is it?
>> +}
>> +
>> +static int config_cache_callback(const char *key, const char *value, void *unused)
>> +{
>> + config_cache_set_value(key, value);
>> + return 0;
>> +}
>> +
>> +static struct hashmap *get_config_cache(void)
>> +{
>> + static struct hashmap config_cache;
>> + if (!hashmap_is_init) {
>> + config_cache_init(&config_cache);
>> + hashmap_is_init = 1;
>> + git_config(config_cache_callback, NULL);
>> + return &config_cache;
>
> Why do you 'return' here rather than just falling through to the
> 'return' outside the conditional?
Noted.
>> +static struct config_cache_entry *config_cache_find_entry(const char *key)
>> +{
>> + struct hashmap *config_cache;
>> + struct config_cache_entry k;
>> + char *normalized_key;
>> + int baselength = 0, ret;
>> + config_cache = get_config_cache();
>> + ret = git_config_parse_key(key, &normalized_key, &baselength);
>
> Since you neither care about nor ever reference 'baselength', you
> should just pass NULL as the third argument.
>
Noted.
>> + if (ret)
>> + return NULL;
>> +
>> + hashmap_entry_init(&k, strhash(normalized_key));
>> + k.key = (char*) key;
>
> This looks fishy. You're hashing based upon 'normalized_key' but then
> comparing against the unnormalized 'key'. Peff's suggestion [4] was to
> store the normalized key in the hash, which means that you should use
> 'normalized_key' for both hashing and comparing. (See additional
> commentary about this issue below in config_cache_set_value().)
Ouch, this I had corrected in a future commit. But forgot to include in
this patch.
> Style: (char *)key
Noted. In function arguments the code uses (char*) key. I copied it from there.
:)
>> + return hashmap_get(config_cache, &k, NULL);
>
> You're leaking 'normalized_key', which git_config_parse_key()
> allocated on your behalf.
>
Noted. I will now check with valgrind before sending any future series.
>> +}
>> +
>> +static struct string_list *config_cache_get_value(const char *key)
>> +{
>> + struct config_cache_entry *e = config_cache_find_entry(key);
>> + return e ? e->value_list : NULL;
>> +}
>> +
>> +
>> +static int config_cache_set_value(const char *key, const char *value)
>> +{
>> + struct hashmap *config_cache;
>> + struct config_cache_entry *e;
>> +
>> + config_cache = get_config_cache();
>> + e = config_cache_find_entry(key);
>> + if (!e) {
>> + e = xmalloc(sizeof(*e));
>> + hashmap_entry_init(e, strhash(key));
>
> The hash computed to store the key should be the same as the hash to
> look it up. In config_cache_find_entry(), you're correctly computing
> the hash based upon the normalized key, but here you're doing so from
> the unnormalized key.
>
>> + e->key = xstrdup(key);
>
> Likewise. Normalized keys should be stored in the hash, not unnormalized.
>
> You should be invoking git_config_parse_key() to normalize the key
> both for hashing and storing.
>
> Note, also, that call git_config_parse_key() allocates the normalize
> key on your behalf, so you do *not* want to xstrdup() it here.
config_cache_set_value() gets its values from git_config() as it the callback.
git_config() feeds the callback only normalized values by using functions like
get_extended_base_var(), get_base_var() etc. Thus, I didn't use
git_config_parse_key(). Please correct me if I am wrong, but I tested this case
thoroughly.
>> + e->value_list = xcalloc(sizeof(struct string_list), 1);
>> + e->value_list->strdup_strings = 1;
>
> This code is perhaps too intimate with the implementation of
> string_list. It may work today (because it is safe to initialize all
> string_list fields to 0 via xcalloc()), but a future change to the
> string_list implementation may invalidate that assumption. The
> initialization macros in string-list.h exist to preserve the
> abstraction, so you don't have to know the details of initialization.
> The macros are not suitable for your use-case, but an initialization
> function, such as string_list_init(), would be suitable, and it would
> be appropriate to add such a function in a preparatory patch (as
> already suggested by [1]).
Noted. As I am going to use a simple struct for the list, this won't be
a problem.
>> + string_list_append(e->value_list, value);
>> + hashmap_add(config_cache, e);
>> + } else {
>> + string_list_append(e->value_list, value);
>> + }
>> + return 0;
>> +}
>> +
>> +extern const char *git_config_get_string(const char *key)
>
> Drop the 'extern'. The header already declares it such.
>
Noted.
>> +{
>> + struct string_list *values;
>> + values = config_cache_get_value(key);
>> + if (!values)
>> + return NULL;
>> + assert(values->nr > 0);
>> + return values->items[values->nr - 1].string;
>> +}
>> +
>> +extern const struct string_list *git_config_get_string_multi(const char *key)
>
> Drop the 'extern'. The header already declares it such.
>
Noted.
>> +{
>> + return config_cache_get_value(key);
>> +}
>> +
>> static int config_file_fgetc(struct config_source *conf)
>> {
>> return fgetc(conf->u.file);
>> @@ -1700,6 +1816,12 @@ int git_config_set_multivar_in_file(const char *config_filename,
>> lock = NULL;
>> ret = 0;
>>
>> + /* contents of config file has changed, so invalidate the
>> + * config cache used by non-callback based query functions.
>> + */
>> + if (hashmap_is_init)
>> + config_cache_free();
>> +
>> out_free:
>> if (lock)
>> rollback_lock_file(lock);
>> --
>> 1.9.0.GIT
>
> [1]: http://git.661346.n2.nabble.com/RFC-PATCH-0-2-Git-config-cache-amp-special-querying-api-utilizing-the-cache-td7611691.html#a7611860
> [2]: http://thread.gmane.org/gmane.comp.version-control.git/250566/focus=251058
> [3]: http://thread.gmane.org/gmane.comp.version-control.git/251073/focus=251079
> [4]: http://thread.gmane.org/gmane.comp.version-control.git/250566/focus=250618
>
next prev parent reply other threads:[~2014-06-10 12:35 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-06-09 12:49 [PATCH v1] Git config cache & special querying api utilizing the cache Tanay Abhra
2014-06-09 12:49 ` [PATCH v1] config: Add hashtable for config parsing & retrival Tanay Abhra
2014-06-09 14:24 ` Matthieu Moy
2014-06-10 11:51 ` Eric Sunshine
2014-06-10 11:27 ` Eric Sunshine
2014-06-10 12:35 ` Tanay Abhra [this message]
2014-06-10 21:28 ` Eric Sunshine
2014-06-10 11:45 ` Eric Sunshine
2014-06-10 12:37 ` Tanay Abhra
2014-06-10 23:30 ` Eric Sunshine
2014-06-11 14:01 ` [PATCH v1] Git config cache & special querying api utilizing the cache Matthieu Moy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5396FB7C.6000709@gmail.com \
--to=tanayabh@gmail.com \
--cc=Matthieu.Moy@grenoble-inp.fr \
--cc=artagnon@gmail.com \
--cc=git@vger.kernel.org \
--cc=peff@peff.net \
--cc=sunshine@sunshineco.com \
--cc=tboegi@web.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).