* Design of multiple hash support
From: brian m. carlson @ 2018-11-05 1:00 UTC
To: git

I'm currently working on getting Git to support multiple hash algorithms
in the same binary (SHA-1 and SHA-256). In order to have a fully
functional binary, we'll need to have some way of indicating to certain
commands (such as init and show-index) that they should assume a certain
hash algorithm.

There are basically two approaches I can take. The first is to provide
each command that needs to learn about this with its own --hash
argument. So we'd have:

  git init --hash=sha256
  git show-index --hash=sha256 <some-file

The other alternative is that we provide a global option to git, which
is parsed by all programs, like so:

  git --hash=sha256 init
  git --hash=sha256 show-index <some-file

There's also the question of what we want to call the option. The
obvious name is --hash, which is intuitive and straightforward.
However, the transition plan names the config option
extensions.objectFormat, so --object-format is also a possibility. If
we ever decide to support, say, zstd compression instead of zlib, we
could leverage the same option (say, --object-format=sha256:zstd) and
avoid the need for an additional option. This might be planning for a
future that never occurs, though.

I'd like to write this code in the way most acceptable to the list, so
I'd appreciate input from others on what they'd like to see in the final
series.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

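
The combined value floated in the message above (--object-format=sha256:zstd) could be parsed along the following lines. This is only a sketch of the idea in Python, not Git code: the HASH[:COMPRESSION] syntax is the speculative one from the message, and the names parse_object_format, ObjectFormat, KNOWN_HASHES and KNOWN_COMPRESSION, as well as the zlib default, are assumptions made for illustration.

```python
from typing import NamedTuple

# Hypothetical value sets: only sha1/sha256 and zlib/zstd are mentioned
# in the message; nothing here reflects an actual Git interface.
KNOWN_HASHES = {"sha1", "sha256"}
KNOWN_COMPRESSION = {"zlib", "zstd"}


class ObjectFormat(NamedTuple):
    hash_algo: str
    compression: str


def parse_object_format(value: str) -> ObjectFormat:
    """Parse 'HASH' or 'HASH:COMPRESSION' (speculative syntax)."""
    hash_algo, sep, compression = value.partition(":")
    if not sep:
        # Assumption: with no suffix, fall back to zlib.
        compression = "zlib"
    if hash_algo not in KNOWN_HASHES:
        raise ValueError(f"unknown hash algorithm: {hash_algo!r}")
    if compression not in KNOWN_COMPRESSION:
        raise ValueError(f"unknown compression: {compression!r}")
    return ObjectFormat(hash_algo, compression)
```

The point of the sketch is that a plain "sha256" stays valid if a suffix is ever added, which is the argument for the more general option name.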
* Re: Design of multiple hash support
From: Junio C Hamano @ 2018-11-05 2:36 UTC
To: brian m. carlson; +Cc: git

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> I'm currently working on getting Git to support multiple hash algorithms
> in the same binary (SHA-1 and SHA-256). In order to have a fully
> functional binary, we'll need to have some way of indicating to certain
> commands (such as init and show-index) that they should assume a certain
> hash algorithm.
>
> There are basically two approaches I can take. The first is to provide
> each command that needs to learn about this with its own --hash
> argument. So we'd have:
>
>   git init --hash=sha256
>   git show-index --hash=sha256 <some-file
>
> The other alternative is that we provide a global option to git, which
> is parsed by all programs, like so:
>
>   git --hash=sha256 init
>   git --hash=sha256 show-index <some-file

I am assuming that "show-index" above is a typo for something like
"hash-object"?

It is hard to answer the question without knowing what exactly does
"(to) support multiple hash algorithms" mean. For example, inside
today's repository, what should this command do?

    git --hash=sha256 cat-file commit HEAD

It can work this way:

 - read HEAD, discover that I am on 'master' branch, read refs/heads/master
   to learn the object name in 40-hex, realize that it cannot be
   sha256 and report "corrupt ref".

Or it can work this way:

 - read repository format, realize it is a good old sha1 repository.

 - do the usual thing to get to read_object() to read the commit
   object data for the commit at HEAD, doing all of it in sha1.

 - in the commit object data, locate references to other objects
   that use sha1 name.

 - replace these sha1 references with their sha256 counterparts and
   show the result.

I am guessing that you are doing the former as a good first step, in
which case, as an option that changes/affects the behaviour of git
globally, I think "git --hash=sha256" would make sense, like other
global options like --literal-pathspecs and --no-replace-objects.

Thanks.

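
The first scenario in the message above (a 40-hex name found where a sha256 name is expected) comes down to a length check on the ref's contents. The following Python sketch illustrates just that check; it is not how Git implements ref parsing, and check_ref_value and HEX_LENGTH are hypothetical names.

```python
import re

# Hex digest lengths: SHA-1 is 160 bits (40 hex digits),
# SHA-256 is 256 bits (64 hex digits).
HEX_LENGTH = {"sha1": 40, "sha256": 64}


def check_ref_value(ref_value: str, algo: str) -> str:
    """Return the object name stored in a ref, or raise if its length
    does not match the algorithm the caller asked for."""
    name = ref_value.strip()  # ref files end with a newline
    expected = HEX_LENGTH[algo]
    if not re.fullmatch(r"[0-9a-f]{%d}" % expected, name):
        raise ValueError(
            f"corrupt ref: expected a {expected}-hex {algo} object name"
        )
    return name
```

Under this reading, forcing sha256 on a sha1 repository fails at the first ref lookup, which is the behaviour described as "report corrupt ref".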
* Re: Design of multiple hash support
From: Stefan Beller @ 2018-11-05 18:03 UTC
To: Junio C Hamano; +Cc: brian m. carlson, git

On Sun, Nov 4, 2018 at 6:36 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>
> > I'm currently working on getting Git to support multiple hash algorithms
> > in the same binary (SHA-1 and SHA-256). In order to have a fully
> > functional binary, we'll need to have some way of indicating to certain
> > commands (such as init and show-index) that they should assume a certain
> > hash algorithm.
> >
> > There are basically two approaches I can take. The first is to provide
> > each command that needs to learn about this with its own --hash
> > argument. So we'd have:
> >
> >   git init --hash=sha256
> >   git show-index --hash=sha256 <some-file
> >
> > The other alternative is that we provide a global option to git, which
> > is parsed by all programs, like so:
> >
> >   git --hash=sha256 init
> >   git --hash=sha256 show-index <some-file
>
> I am assuming that "show-index" above is a typo for something like
> "hash-object"?

Actually both seem plausible, as both do not require
RUN_SETUP, which means they cannot rely on the
extensions.objectFormat setting.

When having a global setting, would that override the configured
object format extension in a repository, or do we error out?

So maybe

  git -c extensions.objectFormat=sha256 init

is the way to go, for now? (Are repository format extensions parsed
just like normal config, such that non-RUN_SETUP commands
can rely on the (non-)existence to determine whether to use
the default or the given hash function?)

> It is hard to answer the question without knowing what exactly does
> "(to) support multiple hash algorithms" mean. For example, inside
> today's repository, what should this command do?
>
>     git --hash=sha256 cat-file commit HEAD

There is a section "Object names on the command line"
in Documentation/technical/hash-function-transition.txt
and I assume that this is before the "dark launch"
phase, so I would expect the latter to work (no error
but conversion/translation on the fly) eventually as a goal.
But the former might be in scope of one series.

> It can work this way:
>
> - read HEAD, discover that I am on 'master' branch, read refs/heads/master
>   to learn the object name in 40-hex, realize that it cannot be
>   sha256 and report "corrupt ref".
>
> Or it can work this way:
>
> - read repository format, realize it is a good old sha1 repository.
>
> - do the usual thing to get to read_object() to read the commit
>   object data for the commit at HEAD, doing all of it in sha1.
>
> - in the commit object data, locate references to other objects
>   that use sha1 name.
>
> - replace these sha1 references with their sha256 counterparts and
>   show the result.
>
> I am guessing that you are doing the former as a good first step, in
> which case, as an option that changes/affects the behaviour of git
> globally, I think "git --hash=sha256" would make sense, like other
> global options like --literal-pathspecs and --no-replace-objects.
>
> Thanks.

* Re: Design of multiple hash support
From: brian m. carlson @ 2018-11-05 23:54 UTC
To: Stefan Beller; +Cc: Junio C Hamano, git

On Mon, Nov 05, 2018 at 10:03:21AM -0800, Stefan Beller wrote:
> On Sun, Nov 4, 2018 at 6:36 PM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> >
> > > I'm currently working on getting Git to support multiple hash algorithms
> > > in the same binary (SHA-1 and SHA-256). In order to have a fully
> > > functional binary, we'll need to have some way of indicating to certain
> > > commands (such as init and show-index) that they should assume a certain
> > > hash algorithm.
> > >
> > > There are basically two approaches I can take. The first is to provide
> > > each command that needs to learn about this with its own --hash
> > > argument. So we'd have:
> > >
> > >   git init --hash=sha256
> > >   git show-index --hash=sha256 <some-file
> > >
> > > The other alternative is that we provide a global option to git, which
> > > is parsed by all programs, like so:
> > >
> > >   git --hash=sha256 init
> > >   git --hash=sha256 show-index <some-file
> >
> > I am assuming that "show-index" above is a typo for something like
> > "hash-object"?
>
> Actually both seem plausible, as both do not require
> RUN_SETUP, which means they cannot rely on the
> extensions.objectFormat setting.

Correct. In general, I assume that options that want a repository will
use the repository for that information. There are a small number of
programs, such as init, that need to either set up a repository (without
reference to another repository) or need to inspect files without
necessarily being in a repository.

For example, we will want to have a way of indicating which hash we
would like to use in a fresh repository. I am for the moment assuming
that we're in a stage 4 configuration: that is, that we're all SHA-1 or
all SHA-256. A clone will provide this for us, but a git init will not.

Also, our pack index v3 format knows about which hash algorithm is in
use, but packs are not labeled with the algorithm they use. This isn't
really a problem in normal use, since we always know from context which
algorithm is in use, but we'll need to indicate to index-pack (which
technically need not run in a repository) which algorithm it should use.

show-index will eventually learn to parse the index itself to learn
which algorithms are in use, so it is technically not required here.

> When having a global setting, would that override the configured
> object format extension in a repository, or do we error out?
>
> So maybe
>
>   git -c extensions.objectFormat=sha256 init
>
> is the way to go, for now? (Are repository format extensions parsed
> just like normal config, such that non-RUN_SETUP commands
> can rely on the (non-)existence to determine whether to use
> the default or the given hash function?)

The extensions callbacks are only handled in check_repo_format, so they
necessarily require a repository. This is not new with my code.
Furthermore, one would have to specify "-c
core.repositoryformatversion=1" as well, as extensions require that
version in order to have any effect.

My current approach for the testsuite is to have git init honor a new
GIT_DEFAULT_HASH environment variable so we need not modify every place
in the testsuite that calls git init (of which there are many). That
may or may not be greeted with joy by reviewers, but it seemed to be the
minimum viable approach.

> There is a section "Object names on the command line"
> in Documentation/technical/hash-function-transition.txt
> and I assume that this is before the "dark launch"
> phase, so I would expect the latter to work (no error
> but conversion/translation on the fly) eventually as a goal.
> But the former might be in scope of one series.

Currently, I'm not implementing the stage 1-3 implementations. I'm
merely going from the point where we have a binary that does only
SHA-256 and cannot perform SHA-1 operations at all to a stage 4
implementation, where the binary can do either, but a repository is
wholly one or the other.

> > It can work this way:
> >
> > - read HEAD, discover that I am on 'master' branch, read refs/heads/master
> >   to learn the object name in 40-hex, realize that it cannot be
> >   sha256 and report "corrupt ref".
> >
> > Or it can work this way:
> >
> > - read repository format, realize it is a good old sha1 repository.
> >
> > - do the usual thing to get to read_object() to read the commit
> >   object data for the commit at HEAD, doing all of it in sha1.
> >
> > - in the commit object data, locate references to other objects
> >   that use sha1 name.
> >
> > - replace these sha1 references with their sha256 counterparts and
> >   show the result.
> >
> > I am guessing that you are doing the former as a good first step, in
> > which case, as an option that changes/affects the behaviour of git
> > globally, I think "git --hash=sha256" would make sense, like other
> > global options like --literal-pathspecs and --no-replace-objects.

Right now, we always read the repository configuration when possible,
and honor that. I'm not planning, even when we have a full
implementation, to let the configuration of input and output format be
modified by command-line options. That's a configuration of the
repository in the current transition plan, and I have no intention of
changing that (apart from possibly honoring "git -c").
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

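
The lookup order described in the message above (repository configuration first, something else only for commands that run without a repository) can be sketched in Python as follows. Several points are assumptions rather than statements from the thread: that an explicit per-command option takes precedence over GIT_DEFAULT_HASH, that SHA-1 is the fallback, and that a conflicting option is ignored rather than rejected. resolve_hash and DEFAULT_HASH are hypothetical names.

```python
import os
from typing import Mapping, Optional

# Assumption: SHA-1 remains the fallback when nothing else is specified.
DEFAULT_HASH = "sha1"


def resolve_hash(
    repo_format: Optional[str],
    option: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Pick the hash algorithm for one command invocation.

    repo_format: value of extensions.objectFormat, or None when the
                 command runs without a repository (e.g. init).
    option:      value of a per-command option, if one was given.
    env:         environment to consult for GIT_DEFAULT_HASH.
    """
    if repo_format is not None:
        # The repository configuration is always honored when present.
        return repo_format
    if option is not None:
        # Assumed precedence: explicit option before the environment.
        return option
    return env.get("GIT_DEFAULT_HASH", DEFAULT_HASH)
```

For example, resolve_hash(None, env={"GIT_DEFAULT_HASH": "sha256"}) models a test suite run where every "git init" produces a SHA-256 repository without any call site being edited.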
* Re: Design of multiple hash support
From: Duy Nguyen @ 2018-11-05 19:03 UTC
To: brian m. carlson, Git Mailing List

On Mon, Nov 5, 2018 at 2:02 AM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> I'm currently working on getting Git to support multiple hash algorithms
> in the same binary (SHA-1 and SHA-256). In order to have a fully
> functional binary, we'll need to have some way of indicating to certain
> commands (such as init and show-index) that they should assume a certain
> hash algorithm.
>
> There are basically two approaches I can take. The first is to provide
> each command that needs to learn about this with its own --hash
> argument. So we'd have:
>
>   git init --hash=sha256
>   git show-index --hash=sha256 <some-file
>
> The other alternative is that we provide a global option to git, which
> is parsed by all programs, like so:
>
>   git --hash=sha256 init
>   git --hash=sha256 show-index <some-file
>

I suppose this is about the "no repository/standalone" mode, because

- it's hard to pass global arguments down to builtin commands (we
  often have to rely on global variables which are on the way out)

- global options confuse new people and also harder to reorder (if you
  forget it, you have to alt-b all the way back to near the beginning
  of the command line and add it there, instead of near the end)

- there aren't that many standalone commands

I'm leaning towards "git foo --hash=".

> There's also the question of what we want to call the option. The
> obvious name is --hash, which is intuitive and straightforward.
> However, the transition plan names the config option
> extensions.objectFormat, so --object-format is also a possibility. If
> we ever decide to support, say, zstd compression instead of zlib, we
> could leverage the same option (say, --object-format=sha256:zstd) and
> avoid the need for an additional option. This might be planning for a
> future that never occurs, though.

--object-format is less vague than --hash. The downside is it's longer
(more to type) but I'm counting on git-completion.bash and the guess
that people rarely need to use this option.
-- 
Duy

* Re: Design of multiple hash support
From: Jonathan Nieder @ 2018-11-05 22:00 UTC
To: Duy Nguyen; +Cc: brian m. carlson, Git Mailing List

Hi,

Duy Nguyen wrote:
> On Mon, Nov 5, 2018 at 2:02 AM brian m. carlson
> <sandals@crustytoothpaste.net> wrote:

>> There are basically two approaches I can take. The first is to provide
>> each command that needs to learn about this with its own --hash
>> argument. So we'd have:
>>
>>   git init --hash=sha256
>>   git show-index --hash=sha256 <some-file
>>
>> The other alternative is that we provide a global option to git, which
>> is parsed by all programs, like so:
>>
>>   git --hash=sha256 init
>>   git --hash=sha256 show-index <some-file
[...]
> I'm leaning towards "git foo --hash=".

Can you say a little more about the semantics of the option? For
commands like "git init", I tend to agree with Duy here, since it
allows each command's manual to describe what the option means in the
context of that command.

For "git show-index", ideally Git should use the object format named
in the idx file.

>> There's also the question of what we want to call the option. The
>> obvious name is --hash, which is intuitive and straightforward.
>> However, the transition plan names the config option
>> extensions.objectFormat,
[...]
> --object-format is less vague than --hash. The downside is it's longer
> (more to type) but I'm counting on git-completion.bash and the guess
> that people rarely need to use this option.

Agreed. --object-format makes more sense to me than --hash, since
it's more precise about what the option affects.

Thanks for looking into this.

Sincerely,
Jonathan

* Re: Design of multiple hash support
From: brian m. carlson @ 2018-11-06 0:13 UTC
To: Jonathan Nieder; +Cc: Duy Nguyen, Git Mailing List

On Mon, Nov 05, 2018 at 02:00:42PM -0800, Jonathan Nieder wrote:
> Hi,
>
> Duy Nguyen wrote:
> > On Mon, Nov 5, 2018 at 2:02 AM brian m. carlson
> > <sandals@crustytoothpaste.net> wrote:
>
> >> There are basically two approaches I can take. The first is to provide
> >> each command that needs to learn about this with its own --hash
> >> argument. So we'd have:
> >>
> >>   git init --hash=sha256
> >>   git show-index --hash=sha256 <some-file
> >>
> >> The other alternative is that we provide a global option to git, which
> >> is parsed by all programs, like so:
> >>
> >>   git --hash=sha256 init
> >>   git --hash=sha256 show-index <some-file
> [...]
> > I'm leaning towards "git foo --hash=".
>
> Can you say a little more about the semantics of the option? For
> commands like "git init", I tend to agree with Duy here, since it
> allows each command's manual to describe what the option means in the
> context of that command.

Sure. The semantics for git init are "produce a repository with this
hash algorithm". The semantics for git index-pack are "the pack I want
you to index uses this hash algorithm". Essentially, more generically,
the semantics are "the repository or data object uses this hash
algorithm".

> For "git show-index", ideally Git should use the object format named
> in the idx file.

I agree that will be the eventual goal. It will also be what I ship in
the final series, in all likelihood. I have most of pack v3
implemented, but it's not complete yet.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

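
Why index-pack has to be told the algorithm can be seen from the pack trailer: a pack ends with a checksum of everything before it, and the length and meaning of that trailer depend on the hash in use, while the pack itself carries no label saying which one applies. The Python sketch below checks only that trailer on an arbitrary byte string; it is an illustration, not what index-pack does, and pack_trailer_ok is a hypothetical helper.

```python
import hashlib


def pack_trailer_ok(pack: bytes, algo: str) -> bool:
    """Check the trailing checksum of pack-like data under a given hash.

    algo is a hashlib name such as "sha1" or "sha256". The last
    digest_size bytes are treated as the checksum of all earlier bytes.
    """
    h = hashlib.new(algo)
    n = h.digest_size  # 20 for SHA-1, 32 for SHA-256
    if len(pack) <= n:
        return False
    body, trailer = pack[:-n], pack[-n:]
    h.update(body)
    return h.digest() == trailer


# Usage: the same bytes verify under one algorithm and not the other.
body = b"PACK" + b"\x00" * 8  # stand-in content, not a valid pack
data = body + hashlib.sha256(body).digest()
print(pack_trailer_ok(data, "sha256"), pack_trailer_ok(data, "sha1"))
```

A reader could try both algorithms and keep whichever verifies, but an explicit option (or the repository configuration) avoids that guessing, which matches the semantics given above: "the pack I want you to index uses this hash algorithm".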