* RFC GSoC idea: git configuration caching (needs co-mentor!)
From: Michael Haggerty @ 2014-03-06 5:57 UTC
To: git discussion list; +Cc: Jeff King, Junio C Hamano, Matthieu Moy

I just wrote up the idea that fell out of the discussion [1] about the
other configuration features that I proposed. As far as I am concerned,
it can be merged as soon as somebody volunteers as a co-mentor. The
idea is embodied in a pull request against the git.github.io repository
[2]; the text is also appended below for your convenience.

Michael

[1] http://article.gmane.org/gmane.comp.version-control.git/242952
[2] https://github.com/git/git.github.io/pull/7

### git configuration API improvements

There are many places in Git that need to read a configuration value.
Currently, each such site calls `git_config()`, which reads and parses
the configuration files every time that it is called. This is
wasteful, because it results in the configuration files being
processed multiple times during a single `git` invocation. It also
prevents the implementation of potential new features, like adding
syntax to allow a configuration file to unset a previously-set value.

The goal of this project is to make configuration work as follows:

* Read the configuration from files once and cache the results in an
  appropriate data structure in memory.

* Change `git_config()` to iterate through the pre-read values in
  memory rather than re-reading the configuration files.

* Add new API calls that allow the cache to be queried easily and
  efficiently. Rewrite other functions like `git_config_int()` to be
  cache-aware.

* Rewrite callers to use the new API wherever possible.

You will need to consider how to handle other config API entry points
like `git_config_early()` and `git_config_from_file()`, as well as how
to invalidate the cache correctly in the case that the configuration
is changed while `git` is executing.

See
[this mailing list
thread](http://article.gmane.org/gmane.comp.version-control.git/242952)
for some discussion about this and related ideas.

 - Language: C
 - Difficulty: medium
 - Possible mentors: Michael Haggerty

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/
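
For illustration only, here is a minimal sketch of the first two bullets
of the proposal: values are stored once in memory and then replayed to a
callback instead of re-reading the files. The callback type has the same
shape as the one `git_config()` takes; everything else (`struct
config_cache`, `config_cache_add()`, `config_cache_for_each()`,
`config_cache_invalidate()`, `count_entries()`) is a hypothetical name
invented for this sketch and not part of Git. Stopping at the first
negative return is an assumption about how errors would propagate.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same shape as the callback type that git_config() accepts. */
typedef int (*config_fn_t)(const char *name, const char *value, void *data);

/* Hypothetical cache: entries are kept in the order they were read. */
struct config_entry {
	char *name;
	char *value; /* NULL for a key that was given without a value */
};

struct config_cache {
	struct config_entry *entries;
	size_t nr, alloc;
};

static char *dup_or_null(const char *s)
{
	char *copy;

	if (!s)
		return NULL;
	copy = malloc(strlen(s) + 1);
	if (!copy)
		abort();
	strcpy(copy, s);
	return copy;
}

/* Would be called by the one-time file parser for each key it reads. */
void config_cache_add(struct config_cache *cache,
		      const char *name, const char *value)
{
	assert(name);
	if (cache->nr == cache->alloc) {
		cache->alloc = cache->alloc ? 2 * cache->alloc : 16;
		cache->entries = realloc(cache->entries,
					 cache->alloc * sizeof(*cache->entries));
		if (!cache->entries)
			abort();
	}
	cache->entries[cache->nr].name = dup_or_null(name);
	cache->entries[cache->nr].value = dup_or_null(value);
	cache->nr++;
}

/* Replay the cached entries in order; stop at the first negative return. */
int config_cache_for_each(const struct config_cache *cache,
			  config_fn_t fn, void *data)
{
	size_t i;

	for (i = 0; i < cache->nr; i++) {
		int ret = fn(cache->entries[i].name,
			     cache->entries[i].value, data);
		if (ret < 0)
			return ret;
	}
	return 0;
}

/* Drop all entries so that the next reader has to re-parse the files. */
void config_cache_invalidate(struct config_cache *cache)
{
	size_t i;

	for (i = 0; i < cache->nr; i++) {
		free(cache->entries[i].name);
		free(cache->entries[i].value);
	}
	free(cache->entries);
	cache->entries = NULL;
	cache->nr = cache->alloc = 0;
}

/* Example callback: counts the entries it is shown. */
int count_entries(const char *name, const char *value, void *data)
{
	(void)name;
	(void)value;
	++*(int *)data;
	return 0;
}
```

A caller would fill the cache once, pass callbacks such as
`count_entries()` to `config_cache_for_each()`, and call
`config_cache_invalidate()` whenever the configuration may have changed
underneath the running process. When and how to detect such a change is
left open here, as it is in the proposal.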
* Re: RFC GSoC idea: git configuration caching (needs co-mentor!)
From: Junio C Hamano @ 2014-03-06 19:24 UTC
To: Michael Haggerty; +Cc: git discussion list, Jeff King, Matthieu Moy

Michael Haggerty <mhagger@alum.mit.edu> writes:

> I just wrote up the idea that fell out of the discussion [1] about the
> other configuration features that I proposed. As far as I am concerned,
> it can be merged as soon as somebody volunteers as a co-mentor. The
> idea is embodied in a pull request against the git.github.io repository
> [2]; the text is also appended below for your convenience.
>
> Michael
>
> [1] http://article.gmane.org/gmane.comp.version-control.git/242952
> [2] https://github.com/git/git.github.io/pull/7
>
> ### git configuration API improvements
>
> There are many places in Git that need to read a configuration value.
> Currently, each such site calls `git_config()`, which reads and parses
> the configuration files every time that it is called. This is
> wasteful, because it results in the configuration files being
> processed multiple times during a single `git` invocation. It also
> prevents the implementation of potential new features, like adding
> syntax to allow a configuration file to unset a previously-set value.
>
> The goal of this project is to make configuration work as follows:
>
> * Read the configuration from files once and cache the results in an
>   appropriate data structure in memory.
>
> * Change `git_config()` to iterate through the pre-read values in
>   memory rather than re-reading the configuration files.
>
> * Add new API calls that allow the cache to be queried easily and
>   efficiently. Rewrite other functions like `git_config_int()` to be
>   cache-aware.

Are you sure the second sentence of this item is what you want?

git_config_<type>(name, value) are all about parsing "value" (string
or NULL) as <type>, returning the parsed value or complaining about a
bad value for "name". They do not care where these "name" and
"value" come from right now, and there is no reason for them to
start caring about caching. They will still be the underlying
helper functions the git_config() callbacks will depend on even
after the second item in your list happens.

A set of new API calls would look more like this, I would think:

	extern int git_get_config_string_multi(const char *, int *, const char ***);

	const char **values;
	int i, num_values;

	/* fetch every value recorded for the key, in order */
	if (git_get_config_string_multi("sample.str", &num_values, &values))
		return -1;
	printf("[sample]\n");
	for (i = 0; i < num_values; i++)
		printf(" str = %s\n", values[i]);
	printf("\n");
	free(values);

with a "singleton" wrapper that may be in essence:

	const char *git_get_config_string(const char *name)
	{
		const char **values, *result;
		int num_values;

		if (git_get_config_string_multi(name, &num_values, &values))
			return NULL;
		/* the last value listed for the key wins */
		result = num_values ? values[num_values - 1] : NULL;
		free(values);
		return result;
	}

that implements the "last one wins" semantics. The real thing would
need to avoid allocation and free overhead.

> * Rewrite callers to use the new API wherever possible.
>
> You will need to consider how to handle other config API entry points
> like `git_config_early()` and `git_config_from_file()`, as well as how
> to invalidate the cache correctly in the case that the configuration
> is changed while `git` is executing.
>
> See
> [this mailing list
> thread](http://article.gmane.org/gmane.comp.version-control.git/242952)
> for some discussion about this and related ideas.
>
> - Language: C
> - Difficulty: medium
> - Possible mentors: Michael Haggerty
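
The layering described in this message (pure parsing helpers
underneath, new lookup calls on top) might be sketched roughly as
follows. All names here (`parse_config_int()`, `lookup_config_string()`,
`lookup_config_int()`, `sample_config`) are hypothetical and chosen so
as not to be mistaken for existing Git functions. The fixed table stands
in for whatever cache the project ends up with, and the parser is
deliberately simpler than a real one (for instance, it accepts no unit
suffixes).

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Pure parser in the spirit of git_config_<type>(name, value): it looks
 * only at the string it is given and neither knows nor cares where the
 * value came from. Returns 0 on success and -1 on a bad value.
 */
int parse_config_int(const char *name, const char *value, int *out)
{
	char *end;
	long n;

	assert(name); /* a real helper would mention "name" in its complaint */
	if (!value || !*value)
		return -1;
	errno = 0;
	n = strtol(value, &end, 10);
	if (errno || *end || n < INT_MIN || n > INT_MAX)
		return -1;
	*out = (int)n;
	return 0;
}

/* Stand-in for the cache: a fixed table in the order values were read. */
struct kv {
	const char *name;
	const char *value;
};

static const struct kv sample_config[] = {
	{ "core.compression", "3" },
	{ "sample.str", "one" },
	{ "core.compression", "9" },
};

/* Last one wins: scan from the end. Returns NULL if the key is not set. */
const char *lookup_config_string(const char *name)
{
	size_t i = sizeof(sample_config) / sizeof(sample_config[0]);

	while (i--)
		if (!strcmp(sample_config[i].name, name))
			return sample_config[i].value;
	return NULL;
}

/*
 * New-style, non-callback helper: look the key up, then hand the string
 * to the unchanged parser. Returns 0 on success, 1 if the key is not
 * set, and -1 if the value does not parse.
 */
int lookup_config_int(const char *name, int *out)
{
	const char *value = lookup_config_string(name);

	if (!value)
		return 1;
	return parse_config_int(name, value, out);
}
```

The point of the split is that `parse_config_int()` can keep serving
`git_config()` callbacks unchanged, while only the lookup layer needs to
know about the cache.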
* Re: RFC GSoC idea: git configuration caching (needs co-mentor!)
From: Jeff King @ 2014-03-06 19:46 UTC
To: Junio C Hamano; +Cc: Michael Haggerty, git discussion list, Matthieu Moy

On Thu, Mar 06, 2014 at 11:24:18AM -0800, Junio C Hamano wrote:

> > * Add new API calls that allow the cache to be queried easily and
> >   efficiently. Rewrite other functions like `git_config_int()` to be
> >   cache-aware.
>
> Are you sure the second sentence of this item is what you want?
>
> git_config_<type>(name, value) are all about parsing "value" (string
> or NULL) as <type>, returning the parsed value or complaining about a
> bad value for "name". They do not care where these "name" and
> "value" come from right now, and there is no reason for them to
> start caring about caching. They will still be the underlying
> helper functions the git_config() callbacks will depend on even
> after the second item in your list happens.

Yeah, I agree we want a _new_ set of helpers for retrieving values in a
non-callback way. We could call those helpers "git_config_int" (and
rename the existing pure functions), but I'd rather not, as it simply
invites confusion with the existing ones.

> A set of new API calls would look more like this, I would think:
>
> 	extern int git_get_config_string_multi(const char *, int *, const char ***);

Not important at this stage, but I was hoping we could keep the names of
the new helpers shorter. :)

> 	const char *git_get_config_string(const char *name)
> 	{
> 		const char **values, *result;
> 		int num_values;
>
> 		if (git_get_config_string_multi(name, &num_values, &values))
> 			return NULL;
> 		result = num_values ? values[num_values - 1] : NULL;
> 		free(values);
> 		return result;
> 	}
>
> that implements the "last one wins" semantics. The real thing would
> need to avoid allocation and free overhead.

One of the things that needs to be figured out by the student is the
format of the internal cache. I had actually envisioned a mapping of
keys to values, where values are represented as a full list of strings.
Then your "string_multi" can just return a pointer to that list, and a
last-one-wins lookup can grab the final value, with no allocation or
ownership complexity.

We'd lose the relative order of different config keys, but that order
should never matter (only the order of values for a single key does,
and that is reflected in the order of the value list).

Another approach would be to actually represent the syntax tree of the
config file in memory. That would make lookups of individual keys more
expensive, but would enable other manipulation. E.g., if your syntax
tree included nodes for comments and other non-semantic constructs, then
we can use it for a complete rewrite. And "git config" becomes:

  1. Read the tree.

  2. Perform operations on the tree (add nodes, delete nodes, etc).

  3. Write out the tree.

and things like "remove the section header when the last item in the
section is removed" become trivial during step 2.

But comparing those approaches is something for the student to figure
out, I think.

-Peff
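
The first approach above (a mapping from each key to its full list of
values) could be sketched as below. This is only an illustration under
stated assumptions: `struct config_map`, `config_map_add()`,
`config_map_get_multi()` and `config_map_get()` are hypothetical names,
the linear key search stands in for whatever hash or sorted structure a
real implementation would pick, and the map is assumed to live for the
rest of the process, so nothing is freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One key together with every value seen for it, in file order. */
struct config_values {
	char *name;
	const char **values;
	size_t nr, alloc;
};

/* Hypothetical cache: a list of keys, each owning its value list. */
struct config_map {
	struct config_values *items;
	size_t nr, alloc;
};

static char *dup_or_null(const char *s)
{
	char *copy;

	if (!s)
		return NULL;
	copy = malloc(strlen(s) + 1);
	if (!copy)
		abort();
	strcpy(copy, s);
	return copy;
}

/* Linear search keeps the sketch short; a real cache would hash. */
static struct config_values *find_key(const struct config_map *map,
				      const char *name)
{
	size_t i;

	for (i = 0; i < map->nr; i++)
		if (!strcmp(map->items[i].name, name))
			return &map->items[i];
	return NULL;
}

/* Append one value to the list for "name", creating the key if needed. */
void config_map_add(struct config_map *map,
		    const char *name, const char *value)
{
	struct config_values *e;

	assert(name);
	e = find_key(map, name);
	if (!e) {
		if (map->nr == map->alloc) {
			map->alloc = map->alloc ? 2 * map->alloc : 16;
			map->items = realloc(map->items,
					     map->alloc * sizeof(*map->items));
			if (!map->items)
				abort();
		}
		e = &map->items[map->nr++];
		e->name = dup_or_null(name);
		e->values = NULL;
		e->nr = e->alloc = 0;
	}
	if (e->nr == e->alloc) {
		e->alloc = e->alloc ? 2 * e->alloc : 4;
		e->values = realloc(e->values, e->alloc * sizeof(*e->values));
		if (!e->values)
			abort();
	}
	e->values[e->nr++] = dup_or_null(value);
}

/*
 * Return a pointer to the cache's own list: no allocation, and nothing
 * for the caller to free. *nr is set to 0 if the key is not present.
 */
const char *const *config_map_get_multi(const struct config_map *map,
					const char *name, size_t *nr)
{
	const struct config_values *e = find_key(map, name);

	*nr = e ? e->nr : 0;
	return e ? e->values : NULL;
}

/* Last one wins: the final element of the key's value list. */
const char *config_map_get(const struct config_map *map, const char *name)
{
	size_t nr;
	const char *const *values = config_map_get_multi(map, name, &nr);

	return nr ? values[nr - 1] : NULL;
}
```

Note that a pointer returned by `config_map_get_multi()` is only good
until the next `config_map_add()`, which may reallocate the list. That
is acceptable if the cache is filled once and then only read, and it is
one more reason why invalidation needs careful thought.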
* Re: RFC GSoC idea: git configuration caching (needs co-mentor!)
From: Michael Haggerty @ 2014-03-06 21:33 UTC
To: Junio C Hamano; +Cc: git discussion list, Jeff King, Matthieu Moy

On 03/06/2014 08:24 PM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
>
>> I just wrote up the idea that fell out of the discussion [1] about the
>> other configuration features that I proposed. As far as I am concerned,
>> it can be merged as soon as somebody volunteers as a co-mentor. The
>> idea is embodied in a pull request against the git.github.io repository
>> [2]; the text is also appended below for your convenience.
>>
>> Michael
>>
>> [1] http://article.gmane.org/gmane.comp.version-control.git/242952
>> [2] https://github.com/git/git.github.io/pull/7
>>
>> ### git configuration API improvements
>>
>> There are many places in Git that need to read a configuration value.
>> Currently, each such site calls `git_config()`, which reads and parses
>> the configuration files every time that it is called. This is
>> wasteful, because it results in the configuration files being
>> processed multiple times during a single `git` invocation. It also
>> prevents the implementation of potential new features, like adding
>> syntax to allow a configuration file to unset a previously-set value.
>>
>> The goal of this project is to make configuration work as follows:
>>
>> * Read the configuration from files once and cache the results in an
>>   appropriate data structure in memory.
>>
>> * Change `git_config()` to iterate through the pre-read values in
>>   memory rather than re-reading the configuration files.
>>
>> * Add new API calls that allow the cache to be queried easily and
>>   efficiently. Rewrite other functions like `git_config_int()` to be
>>   cache-aware.
>
> Are you sure the second sentence of this item is what you want?
>
> git_config_<type>(name, value) are all about parsing "value" (string
> or NULL) as <type>, returning the parsed value or complaining about a
> bad value for "name". They do not care where these "name" and
> "value" come from right now, and there is no reason for them to
> start caring about caching. They will still be the underlying
> helper functions the git_config() callbacks will depend on even
> after the second item in your list happens.

You're right, of course. For some reason I had it in my brain that
these functions retrieved *and* parsed values, as opposed to just
parsing them.

I just fixed the text and pushed it live.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/