* [PATCH 00/13] builtin: implement, document and test url-parse
@ 2024-04-28 22:30 Matheus Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 01/13] url: move helper function to URL header and source Matheus Afonso Martins Moreira via GitGitGadget
` (14 more replies)
0 siblings, 15 replies; 44+ messages in thread
From: Matheus Moreira via GitGitGadget @ 2024-04-28 22:30 UTC (permalink / raw)
To: git; +Cc: Matheus Moreira
Git commands accept a wide variety of URL syntaxes, not just standard URLs.
This can make parsing git URLs difficult since standard URL parsers cannot
be used. Even if an external parser were implemented, it would have to track
git's development closely in case support for any new URL scheme is added.
These patches introduce a new url-parse builtin command that exposes git's
native URL parsing algorithms as a plumbing command, allowing other programs
to then call upon git itself to parse the git URLs and their components.
This should be quite useful for scripts. For example, a script might want to
add remotes to repositories, naming them according to the domain name where
the repository is hosted. This new builtin allows it to parse the git URL
and extract its host name which can then be used as input for other
operations. This would be difficult to implement otherwise due to git's
support for scp style URLs.
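To illustrate why scp style URLs get in the way of a standard parser, here
is a minimal standalone sketch of the kind of check involved. It is not
code from this series: looks_like_scp_style() is a hypothetical name, and
the sketch assumes a simplified rule (a colon that appears before any
slash, and no "://") while ignoring DOS drive prefixes such as "C:\repo".

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: returns 1 when `url`
 * looks like an scp style address such as "host:path", 0 otherwise.
 * Assumption: DOS drive prefixes are not considered here.
 */
static int looks_like_scp_style(const char *url)
{
	const char *colon, *slash;

	assert(url != NULL);

	/* Anything containing "://" carries an explicit scheme. */
	if (strstr(url, "://"))
		return 0;

	colon = strchr(url, ':');
	slash = strchr(url, '/');

	/* No colon at all: treat it as a plain local path. */
	if (!colon)
		return 0;

	/* A slash before the first colon also means a local path. */
	if (slash && slash < colon)
		return 0;

	return 1;
}
```

Under these assumptions, "example.com:user/repo" is classified as scp
style, while "https://example.com/user/repo" and "./foo:bar" are not.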
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
Matheus Afonso Martins Moreira (13):
url: move helper function to URL header and source
urlmatch: define url_parse function
builtin: create url-parse command
url-parse: add URL parsing helper function
url-parse: enumerate possible URL components
url-parse: define component extraction helper fn
url-parse: define string to component converter fn
url-parse: define usage and options
url-parse: parse options given on the command line
url-parse: validate all given git URLs
url-parse: output URL components selected by user
Documentation: describe the url-parse builtin
tests: add tests for the new url-parse builtin
.gitignore | 1 +
Documentation/git-url-parse.txt | 59 ++++++++++
Makefile | 1 +
builtin.h | 1 +
builtin/url-parse.c | 132 ++++++++++++++++++++++
command-list.txt | 1 +
connect.c | 8 --
connect.h | 1 -
git.c | 1 +
remote.c | 1 +
t/t9904-url-parse.sh | 194 ++++++++++++++++++++++++++++++++
url.c | 8 ++
url.h | 2 +
urlmatch.c | 90 +++++++++++++++
urlmatch.h | 1 +
15 files changed, 492 insertions(+), 9 deletions(-)
create mode 100644 Documentation/git-url-parse.txt
create mode 100644 builtin/url-parse.c
create mode 100755 t/t9904-url-parse.sh
base-commit: e326e520101dcf43a0499c3adc2df7eca30add2d
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1715%2Fmatheusmoreira%2Furl-parse-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1715/matheusmoreira/url-parse-v1
Pull-Request: https://github.com/git/git/pull/1715
--
gitgitgadget
* [PATCH 01/13] url: move helper function to URL header and source
2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
@ 2024-04-28 22:30 ` Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 02/13] urlmatch: define url_parse function Matheus Afonso Martins Moreira via GitGitGadget
` (13 subsequent siblings)
14 siblings, 0 replies; 44+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2024-04-28 22:30 UTC (permalink / raw)
To: git; +Cc: Matheus Moreira, Matheus Afonso Martins Moreira
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
url_is_local_not_ssh() will be used in more places, so move it from
connect.c to url.c and declare it in url.h.
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
connect.c | 8 --------
connect.h | 1 -
remote.c | 1 +
url.c | 8 ++++++++
url.h | 2 ++
5 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/connect.c b/connect.c
index 0d77737a536..0cd9439501b 100644
--- a/connect.c
+++ b/connect.c
@@ -693,14 +693,6 @@ enum protocol {
PROTO_GIT
};
-int url_is_local_not_ssh(const char *url)
-{
- const char *colon = strchr(url, ':');
- const char *slash = strchr(url, '/');
- return !colon || (slash && slash < colon) ||
- (has_dos_drive_prefix(url) && is_valid_path(url));
-}
-
static const char *prot_name(enum protocol protocol)
{
switch (protocol) {
diff --git a/connect.h b/connect.h
index 1645126c17f..8d84f6656b1 100644
--- a/connect.h
+++ b/connect.h
@@ -13,7 +13,6 @@ int git_connection_is_socket(struct child_process *conn);
int server_supports(const char *feature);
int parse_feature_request(const char *features, const char *feature);
const char *server_feature_value(const char *feature, size_t *len_ret);
-int url_is_local_not_ssh(const char *url);
struct packet_reader;
enum protocol_version discover_version(struct packet_reader *reader);
diff --git a/remote.c b/remote.c
index 2b650b813b7..2425dbc4660 100644
--- a/remote.c
+++ b/remote.c
@@ -5,6 +5,7 @@
#include "gettext.h"
#include "hex.h"
#include "remote.h"
+#include "url.h"
#include "urlmatch.h"
#include "refs.h"
#include "refspec.h"
diff --git a/url.c b/url.c
index 282b12495ae..c36818c3037 100644
--- a/url.c
+++ b/url.c
@@ -119,3 +119,11 @@ void str_end_url_with_slash(const char *url, char **dest)
free(*dest);
*dest = strbuf_detach(&buf, NULL);
}
+
+int url_is_local_not_ssh(const char *url)
+{
+ const char *colon = strchr(url, ':');
+ const char *slash = strchr(url, '/');
+ return !colon || (slash && slash < colon) ||
+ (has_dos_drive_prefix(url) && is_valid_path(url));
+}
diff --git a/url.h b/url.h
index 2a27c342776..867d3af6691 100644
--- a/url.h
+++ b/url.h
@@ -21,4 +21,6 @@ char *url_decode_parameter_value(const char **query);
void end_url_with_slash(struct strbuf *buf, const char *url);
void str_end_url_with_slash(const char *url, char **dest);
+int url_is_local_not_ssh(const char *url);
+
#endif /* URL_H */
--
gitgitgadget
* [PATCH 02/13] urlmatch: define url_parse function
2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 01/13] url: move helper function to URL header and source Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-04-28 22:30 ` Matheus Afonso Martins Moreira via GitGitGadget
2024-05-01 22:18 ` Ghanshyam Thakkar
2024-04-28 22:30 ` [PATCH 03/13] builtin: create url-parse command Matheus Afonso Martins Moreira via GitGitGadget
` (12 subsequent siblings)
14 siblings, 1 reply; 44+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2024-04-28 22:30 UTC (permalink / raw)
To: git; +Cc: Matheus Moreira, Matheus Afonso Martins Moreira
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
Define a general parsing function that supports all Git URLs,
including scp style URLs such as hostname:~user/repo.
It has the same interface as the URL normalization function
and uses the same data structures, which makes it easy to adopt.
It is adapted from the algorithm used to process URLs in connect.c,
so it should support the same inputs.
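As a rough illustration of the scp style handling described above, the
standalone sketch below rewrites "host:path" into an "ssh://" URL that a
regular URL normalizer could then process. It is not the code from this
patch: scp_to_ssh_url() is a hypothetical name, it uses plain malloc()
instead of git's strbuf API, and it assumes the first ':' separates host
from path (so bracketed IPv6 hosts are not handled).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: rewrites "host:path" as
 * "ssh://host/path". Returns a malloc'd string that the caller must
 * free, or NULL if there is no ':' or the allocation fails.
 */
static char *scp_to_ssh_url(const char *scp)
{
	static const char prefix[] = "ssh://";
	const size_t prefix_len = sizeof(prefix) - 1;
	const char *colon, *path;
	size_t host_len, path_len;
	char *out, *p;

	assert(scp != NULL);

	colon = strchr(scp, ':');
	if (!colon)
		return NULL;

	host_len = (size_t)(colon - scp);
	path = colon + 1;
	path_len = strlen(path);

	/* prefix + host + optional '/' + path + NUL */
	out = malloc(prefix_len + host_len + 1 + path_len + 1);
	if (!out)
		return NULL;

	p = out;
	memcpy(p, prefix, prefix_len);
	p += prefix_len;
	memcpy(p, scp, host_len);
	p += host_len;

	/*
	 * "host:/abs/path": the colon is simply dropped.
	 * "host:rel/path":  the colon is replaced by '/'.
	 */
	if (*path != '/')
		*p++ = '/';
	memcpy(p, path, path_len + 1); /* includes the trailing NUL */

	return out;
}
```

With this sketch, "example.com:~user/repo" becomes
"ssh://example.com/~user/repo" and "example.com:/srv/repo" becomes
"ssh://example.com/srv/repo".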
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
urlmatch.c | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
urlmatch.h | 1 +
2 files changed, 91 insertions(+)
diff --git a/urlmatch.c b/urlmatch.c
index 1d0254abacb..5a442e31fa2 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -3,6 +3,7 @@
#include "hex-ll.h"
#include "strbuf.h"
#include "urlmatch.h"
+#include "url.h"
#define URL_ALPHA "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
#define URL_DIGIT "0123456789"
@@ -438,6 +439,95 @@ char *url_normalize(const char *url, struct url_info *out_info)
return url_normalize_1(url, out_info, 0);
}
+enum protocol {
+ PROTO_UNKNOWN = 0,
+ PROTO_LOCAL,
+ PROTO_FILE,
+ PROTO_SSH,
+ PROTO_GIT,
+};
+
+static enum protocol url_get_protocol(const char *name, size_t n)
+{
+ if (n == strlen("ssh") && !strncmp(name, "ssh", n))
+ return PROTO_SSH;
+ if (n == strlen("git") && !strncmp(name, "git", n))
+ return PROTO_GIT;
+ if (n == strlen("git+ssh") && !strncmp(name, "git+ssh", n)) /* deprecated - do not use */
+ return PROTO_SSH;
+ if (n == strlen("ssh+git") && !strncmp(name, "ssh+git", n)) /* deprecated - do not use */
+ return PROTO_SSH;
+ if (n == strlen("file") && !strncmp(name, "file", n))
+ return PROTO_FILE;
+ return PROTO_UNKNOWN;
+}
+
+char *url_parse(const char *url_orig, struct url_info *out_info)
+{
+ struct strbuf url;
+ char *host, *separator;
+ char *detached, *normalized;
+ enum protocol protocol = PROTO_LOCAL;
+ struct url_info local_info;
+ struct url_info *info = out_info? out_info : &local_info;
+ bool scp_syntax = false;
+
+ if (is_url(url_orig)) {
+ url_orig = url_decode(url_orig);
+ } else {
+ url_orig = xstrdup(url_orig);
+ }
+
+ strbuf_init(&url, strlen(url_orig) + sizeof("ssh://"));
+ strbuf_addstr(&url, url_orig);
+
+ host = strstr(url.buf, "://");
+ if (host) {
+ protocol = url_get_protocol(url.buf, host - url.buf);
+ host += 3;
+ } else {
+ if (!url_is_local_not_ssh(url.buf)) {
+ scp_syntax = true;
+ protocol = PROTO_SSH;
+ strbuf_insertstr(&url, 0, "ssh://");
+ host = url.buf + 6;
+ }
+ }
+
+ /* path starts after ':' in scp style SSH URLs */
+ if (scp_syntax) {
+ separator = strchr(host, ':');
+ if (separator) {
+ if (separator[1] == '/')
+ strbuf_remove(&url, separator - url.buf, 1);
+ else
+ *separator = '/';
+ }
+ }
+
+ detached = strbuf_detach(&url, NULL);
+ normalized = url_normalize(detached, info);
+ free(detached);
+
+ if (!normalized) {
+ return NULL;
+ }
+
+ /* point path to ~ for URLs like this:
+ *
+ * ssh://host.xz/~user/repo
+ * git://host.xz/~user/repo
+ * host.xz:~user/repo
+ *
+ */
+ if (protocol == PROTO_GIT || protocol == PROTO_SSH) {
+ if (normalized[info->path_off + 1] == '~')
+ info->path_off++;
+ }
+
+ return normalized;
+}
+
static size_t url_match_prefix(const char *url,
const char *url_prefix,
size_t url_prefix_len)
diff --git a/urlmatch.h b/urlmatch.h
index 5ba85cea139..6b3ce428582 100644
--- a/urlmatch.h
+++ b/urlmatch.h
@@ -35,6 +35,7 @@ struct url_info {
};
char *url_normalize(const char *, struct url_info *);
+char *url_parse(const char *, struct url_info *);
struct urlmatch_item {
size_t hostmatch_len;
--
gitgitgadget
* [PATCH 03/13] builtin: create url-parse command
2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 01/13] url: move helper function to URL header and source Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 02/13] urlmatch: define url_parse function Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-04-28 22:30 ` Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 04/13] url-parse: add URL parsing helper function Matheus Afonso Martins Moreira via GitGitGadget
` (11 subsequent siblings)
14 siblings, 0 replies; 44+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2024-04-28 22:30 UTC (permalink / raw)
To: git; +Cc: Matheus Moreira, Matheus Afonso Martins Moreira
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
Git commands can accept a rather wide variety of URL syntaxes.
The range of accepted inputs might expand even more in the future.
This makes the parsing of URL components difficult since standard URL
parsers cannot be used. Extracting the components of a git URL would
require implementing all the schemes that git itself supports, not to
mention tracking its development continuously in case new URL schemes
are added.
The url-parse builtin command is designed to solve this problem
by exposing git's native URL parsing facilities as a plumbing command.
Other programs can then call upon git itself to parse the git URLs and
extract their components. This should be quite useful for scripts.
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
.gitignore | 1 +
Makefile | 1 +
builtin.h | 1 +
builtin/url-parse.c | 18 ++++++++++++++++++
command-list.txt | 1 +
git.c | 1 +
6 files changed, 23 insertions(+)
create mode 100644 builtin/url-parse.c
diff --git a/.gitignore b/.gitignore
index 612c0f6a0ff..4f8dde600a5 100644
--- a/.gitignore
+++ b/.gitignore
@@ -174,6 +174,7 @@
/git-update-server-info
/git-upload-archive
/git-upload-pack
+/git-url-parse
/git-var
/git-verify-commit
/git-verify-pack
diff --git a/Makefile b/Makefile
index 1e31acc72ec..b6054b5c1f4 100644
--- a/Makefile
+++ b/Makefile
@@ -1326,6 +1326,7 @@ BUILTIN_OBJS += builtin/update-ref.o
BUILTIN_OBJS += builtin/update-server-info.o
BUILTIN_OBJS += builtin/upload-archive.o
BUILTIN_OBJS += builtin/upload-pack.o
+BUILTIN_OBJS += builtin/url-parse.o
BUILTIN_OBJS += builtin/var.o
BUILTIN_OBJS += builtin/verify-commit.o
BUILTIN_OBJS += builtin/verify-pack.o
diff --git a/builtin.h b/builtin.h
index 28280636da8..e8858808943 100644
--- a/builtin.h
+++ b/builtin.h
@@ -240,6 +240,7 @@ int cmd_update_server_info(int argc, const char **argv, const char *prefix);
int cmd_upload_archive(int argc, const char **argv, const char *prefix);
int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix);
int cmd_upload_pack(int argc, const char **argv, const char *prefix);
+int cmd_url_parse(int argc, const char **argv, const char *prefix);
int cmd_var(int argc, const char **argv, const char *prefix);
int cmd_verify_commit(int argc, const char **argv, const char *prefix);
int cmd_verify_tag(int argc, const char **argv, const char *prefix);
diff --git a/builtin/url-parse.c b/builtin/url-parse.c
new file mode 100644
index 00000000000..994ccec4b2e
--- /dev/null
+++ b/builtin/url-parse.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0-only
+ *
+ * url-parse - parses git URLs and extracts their components
+ *
+ * Copyright © 2024 Matheus Afonso Martins Moreira
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; version 2.
+ */
+
+#include "builtin.h"
+#include "gettext.h"
+
+int cmd_url_parse(int argc, const char **argv, const char *prefix)
+{
+ return 0;
+}
diff --git a/command-list.txt b/command-list.txt
index c4cd0f352b8..6d89b6c4dc6 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -196,6 +196,7 @@ git-update-ref plumbingmanipulators
git-update-server-info synchingrepositories
git-upload-archive synchelpers
git-upload-pack synchelpers
+git-url-parse plumbinginterrogators
git-var plumbinginterrogators
git-verify-commit ancillaryinterrogators
git-verify-pack plumbinginterrogators
diff --git a/git.c b/git.c
index 654d615a188..7aac812d9d4 100644
--- a/git.c
+++ b/git.c
@@ -625,6 +625,7 @@ static struct cmd_struct commands[] = {
{ "upload-archive", cmd_upload_archive, NO_PARSEOPT },
{ "upload-archive--writer", cmd_upload_archive_writer, NO_PARSEOPT },
{ "upload-pack", cmd_upload_pack },
+ { "url-parse", cmd_url_parse, NO_PARSEOPT },
{ "var", cmd_var, RUN_SETUP_GENTLY | NO_PARSEOPT },
{ "verify-commit", cmd_verify_commit, RUN_SETUP },
{ "verify-pack", cmd_verify_pack },
--
gitgitgadget
* [PATCH 04/13] url-parse: add URL parsing helper function
2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
` (2 preceding siblings ...)
2024-04-28 22:30 ` [PATCH 03/13] builtin: create url-parse command Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-04-28 22:30 ` Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 05/13] url-parse: enumerate possible URL components Matheus Afonso Martins Moreira via GitGitGadget
` (10 subsequent siblings)
14 siblings, 0 replies; 44+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2024-04-28 22:30 UTC (permalink / raw)
To: git; +Cc: Matheus Moreira, Matheus Afonso Martins Moreira
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
This function either successfully parses a URL
or dies with an error message. Since this is a
plumbing command, the error message is not translated.
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
builtin/url-parse.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/builtin/url-parse.c b/builtin/url-parse.c
index 994ccec4b2e..933e63aaa0a 100644
--- a/builtin/url-parse.c
+++ b/builtin/url-parse.c
@@ -11,6 +11,16 @@
#include "builtin.h"
#include "gettext.h"
+#include "urlmatch.h"
+
+static void parse_or_die(const char *url, struct url_info *info)
+{
+ if (url_parse(url, info)) {
+ return;
+ } else {
+ die("invalid git URL '%s', %s", url, info->err);
+ }
+}
int cmd_url_parse(int argc, const char **argv, const char *prefix)
{
--
gitgitgadget
* [PATCH 05/13] url-parse: enumerate possible URL components
2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
` (3 preceding siblings ...)
2024-04-28 22:30 ` [PATCH 04/13] url-parse: add URL parsing helper function Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-04-28 22:30 ` Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 06/13] url-parse: define component extraction helper fn Matheus Afonso Martins Moreira via GitGitGadget
` (9 subsequent siblings)
14 siblings, 0 replies; 44+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2024-04-28 22:30 UTC (permalink / raw)
To: git; +Cc: Matheus Moreira, Matheus Afonso Martins Moreira
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
Create an enumeration containing all possible git URL components
which may be selected by the user. The URL_NONE component is used
when the user did not request the parsing of any component.
In this case, the command will return successfully if the URL parses.
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
builtin/url-parse.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/builtin/url-parse.c b/builtin/url-parse.c
index 933e63aaa0a..d250338422e 100644
--- a/builtin/url-parse.c
+++ b/builtin/url-parse.c
@@ -13,6 +13,16 @@
#include "gettext.h"
#include "urlmatch.h"
+enum url_component {
+ URL_NONE = 0,
+ URL_PROTOCOL,
+ URL_USER,
+ URL_PASSWORD,
+ URL_HOST,
+ URL_PORT,
+ URL_PATH,
+};
+
static void parse_or_die(const char *url, struct url_info *info)
{
if (url_parse(url, info)) {
--
gitgitgadget
* [PATCH 06/13] url-parse: define component extraction helper fn
2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
` (4 preceding siblings ...)
2024-04-28 22:30 ` [PATCH 05/13] url-parse: enumerate possible URL components Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-04-28 22:30 ` Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 07/13] url-parse: define string to component converter fn Matheus Afonso Martins Moreira via GitGitGadget
` (8 subsequent siblings)
14 siblings, 0 replies; 44+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2024-04-28 22:30 UTC (permalink / raw)
To: git; +Cc: Matheus Moreira, Matheus Afonso Martins Moreira
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
The extract function returns a newly allocated string
whose contents are the specified git URL component.
The string must be freed later.
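The extraction boils down to copying an (offset, length) slice out of the
normalized URL into a fresh buffer. A minimal standalone sketch of that
idea follows. slice_dup() is a hypothetical name used only here, and
plain malloc() stands in for git's allocation helpers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: copies `len` bytes of `s`
 * starting at `off` into a newly allocated NUL-terminated string.
 * The caller owns the result and must free() it.
 */
static char *slice_dup(const char *s, size_t off, size_t len)
{
	char *out;

	assert(s != NULL);
	assert(off + len <= strlen(s)); /* the slice must lie inside s */

	out = malloc(len + 1);
	if (!out)
		return NULL;

	memcpy(out, s + off, len);
	out[len] = '\0';
	return out;
}
```

For "https://example.com/user/repo", the host is the slice at offset 8
with length 11, and the path is the slice at offset 19 with length 10.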
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
builtin/url-parse.c | 36 ++++++++++++++++++++++++++++++++++++
1 file changed, 36 insertions(+)
diff --git a/builtin/url-parse.c b/builtin/url-parse.c
index d250338422e..b8ac46dcdeb 100644
--- a/builtin/url-parse.c
+++ b/builtin/url-parse.c
@@ -32,6 +32,42 @@ static void parse_or_die(const char *url, struct url_info *info)
}
}
+static char *extract(enum url_component component, struct url_info *info)
+{
+ size_t offset, length;
+
+ switch (component) {
+ case URL_PROTOCOL:
+ offset = 0;
+ length = info->scheme_len;
+ break;
+ case URL_USER:
+ offset = info->user_off;
+ length = info->user_len;
+ break;
+ case URL_PASSWORD:
+ offset = info->passwd_off;
+ length = info->passwd_len;
+ break;
+ case URL_HOST:
+ offset = info->host_off;
+ length = info->host_len;
+ break;
+ case URL_PORT:
+ offset = info->port_off;
+ length = info->port_len;
+ break;
+ case URL_PATH:
+ offset = info->path_off;
+ length = info->path_len;
+ break;
+ case URL_NONE:
+ return NULL;
+ }
+
+ return xstrndup(info->url + offset, length);
+}
+
int cmd_url_parse(int argc, const char **argv, const char *prefix)
{
return 0;
--
gitgitgadget
* [PATCH 07/13] url-parse: define string to component converter fn
2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
` (5 preceding siblings ...)
2024-04-28 22:30 ` [PATCH 06/13] url-parse: define component extraction helper fn Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-04-28 22:30 ` Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 08/13] url-parse: define usage and options Matheus Afonso Martins Moreira via GitGitGadget
` (7 subsequent siblings)
14 siblings, 0 replies; 44+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2024-04-28 22:30 UTC (permalink / raw)
To: git; +Cc: Matheus Moreira, Matheus Afonso Martins Moreira
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
Converts a git URL component name to its corresponding
enumeration value so that it can be conveniently used
internally by the url-parse command.
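One possible alternative shape for such a converter is a lookup table, as
in the standalone sketch below. This is not the code from this patch: the
names comp, comp_names and comp_from_name() are hypothetical, and the
sketch returns COMP_NONE for unknown names, whereas the function added
here dies with an error message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the url_component enumeration. */
enum comp {
	COMP_NONE = 0,
	COMP_PROTOCOL,
	COMP_USER,
	COMP_PASSWORD,
	COMP_HOST,
	COMP_PORT,
	COMP_PATH,
};

/* Maps each accepted component name to its enumeration value. */
static const struct {
	const char *name;
	enum comp value;
} comp_names[] = {
	{ "protocol", COMP_PROTOCOL },
	{ "user",     COMP_USER },
	{ "password", COMP_PASSWORD },
	{ "host",     COMP_HOST },
	{ "port",     COMP_PORT },
	{ "path",     COMP_PATH },
};

/* Returns the matching value, or COMP_NONE if `name` is not known. */
static enum comp comp_from_name(const char *name)
{
	size_t i;

	assert(name != NULL);

	for (i = 0; i < sizeof(comp_names) / sizeof(comp_names[0]); i++)
		if (!strcmp(comp_names[i].name, name))
			return comp_names[i].value;

	return COMP_NONE;
}
```

The comparison is exact and case sensitive, so "host" is accepted while
"Host" is not.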
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
builtin/url-parse.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/builtin/url-parse.c b/builtin/url-parse.c
index b8ac46dcdeb..15923460a78 100644
--- a/builtin/url-parse.c
+++ b/builtin/url-parse.c
@@ -32,6 +32,23 @@ static void parse_or_die(const char *url, struct url_info *info)
}
}
+static enum url_component get_component_or_die(const char *arg)
+{
+ if (!strcmp("path", arg))
+ return URL_PATH;
+ if (!strcmp("host", arg))
+ return URL_HOST;
+ if (!strcmp("protocol", arg))
+ return URL_PROTOCOL;
+ if (!strcmp("user", arg))
+ return URL_USER;
+ if (!strcmp("password", arg))
+ return URL_PASSWORD;
+ if (!strcmp("port", arg))
+ return URL_PORT;
+ die("invalid git URL component '%s'", arg);
+}
+
static char *extract(enum url_component component, struct url_info *info)
{
size_t offset, length;
--
gitgitgadget
* [PATCH 08/13] url-parse: define usage and options
2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
` (6 preceding siblings ...)
2024-04-28 22:30 ` [PATCH 07/13] url-parse: define string to component converter fn Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-04-28 22:30 ` Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 09/13] url-parse: parse options given on the command line Matheus Afonso Martins Moreira via GitGitGadget
` (6 subsequent siblings)
14 siblings, 0 replies; 44+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2024-04-28 22:30 UTC (permalink / raw)
To: git; +Cc: Matheus Moreira, Matheus Afonso Martins Moreira
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
Create the data structures expected by the git option parser.
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
builtin/url-parse.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/builtin/url-parse.c b/builtin/url-parse.c
index 15923460a78..c6095b37ede 100644
--- a/builtin/url-parse.c
+++ b/builtin/url-parse.c
@@ -11,8 +11,22 @@
#include "builtin.h"
#include "gettext.h"
+#include "parse-options.h"
#include "urlmatch.h"
+static const char * const builtin_url_parse_usage[] = {
+ N_("git url-parse [<options>] [--] <url>..."),
+ NULL
+};
+
+static char *component_arg = NULL;
+
+static struct option builtin_url_parse_options[] = {
+ OPT_STRING('c', "component", &component_arg, "<component>", \
+ N_("which URL component to extract")),
+ OPT_END(),
+};
+
enum url_component {
URL_NONE = 0,
URL_PROTOCOL,
--
gitgitgadget
* [PATCH 09/13] url-parse: parse options given on the command line
2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
` (7 preceding siblings ...)
2024-04-28 22:30 ` [PATCH 08/13] url-parse: define usage and options Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-04-28 22:30 ` Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 10/13] url-parse: validate all given git URLs Matheus Afonso Martins Moreira via GitGitGadget
` (5 subsequent siblings)
14 siblings, 0 replies; 44+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2024-04-28 22:30 UTC (permalink / raw)
To: git; +Cc: Matheus Moreira, Matheus Afonso Martins Moreira
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
Prepare to handle input by parsing the command line options
and removing them from the arguments vector.
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
builtin/url-parse.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/builtin/url-parse.c b/builtin/url-parse.c
index c6095b37ede..03030035b4f 100644
--- a/builtin/url-parse.c
+++ b/builtin/url-parse.c
@@ -101,5 +101,10 @@ static char *extract(enum url_component component, struct url_info *info)
int cmd_url_parse(int argc, const char **argv, const char *prefix)
{
+ argc = parse_options(argc, argv, prefix,
+ builtin_url_parse_options,
+ builtin_url_parse_usage,
+ 0);
+
return 0;
}
--
gitgitgadget
* [PATCH 10/13] url-parse: validate all given git URLs
2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
` (8 preceding siblings ...)
2024-04-28 22:30 ` [PATCH 09/13] url-parse: parse options given on the command line Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-04-28 22:30 ` Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 11/13] url-parse: output URL components selected by user Matheus Afonso Martins Moreira via GitGitGadget
` (4 subsequent siblings)
14 siblings, 0 replies; 44+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2024-04-28 22:30 UTC (permalink / raw)
To: git; +Cc: Matheus Moreira, Matheus Afonso Martins Moreira
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
Parse all the git URLs given as input on the command line.
Die if a URL cannot be parsed.
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
builtin/url-parse.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/builtin/url-parse.c b/builtin/url-parse.c
index 03030035b4f..ab996eadf38 100644
--- a/builtin/url-parse.c
+++ b/builtin/url-parse.c
@@ -101,10 +101,18 @@ static char *extract(enum url_component component, struct url_info *info)
int cmd_url_parse(int argc, const char **argv, const char *prefix)
{
+ struct url_info info;
+ int i;
+
argc = parse_options(argc, argv, prefix,
builtin_url_parse_options,
builtin_url_parse_usage,
0);
+ for (i = 0; i < argc; ++i) {
+ parse_or_die(argv[i], &info);
+ free(info.url);
+ }
+
return 0;
}
--
gitgitgadget
* [PATCH 11/13] url-parse: output URL components selected by user
2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
` (9 preceding siblings ...)
2024-04-28 22:30 ` [PATCH 10/13] url-parse: validate all given git URLs Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-04-28 22:30 ` Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:31 ` [PATCH 12/13] Documentation: describe the url-parse builtin Matheus Afonso Martins Moreira via GitGitGadget
` (3 subsequent siblings)
14 siblings, 0 replies; 44+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2024-04-28 22:30 UTC (permalink / raw)
To: git; +Cc: Matheus Moreira, Matheus Afonso Martins Moreira
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
Extract the specified component from each of the given git URLs
and print the results to standard output, one per line.
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
builtin/url-parse.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/builtin/url-parse.c b/builtin/url-parse.c
index ab996eadf38..6c1a8676bad 100644
--- a/builtin/url-parse.c
+++ b/builtin/url-parse.c
@@ -102,6 +102,8 @@ static char *extract(enum url_component component, struct url_info *info)
int cmd_url_parse(int argc, const char **argv, const char *prefix)
{
struct url_info info;
+ enum url_component selected = URL_NONE;
+ char *extracted;
int i;
argc = parse_options(argc, argv, prefix,
@@ -109,8 +111,20 @@ int cmd_url_parse(int argc, const char **argv, const char *prefix)
builtin_url_parse_usage,
0);
+ if (component_arg)
+ selected = get_component_or_die(component_arg);
+
for (i = 0; i < argc; ++i) {
parse_or_die(argv[i], &info);
+
+ if (selected != URL_NONE) {
+ extracted = extract(selected, &info);
+ if (extracted) {
+ puts(extracted);
+ free(extracted);
+ }
+ }
+
free(info.url);
}
--
gitgitgadget
* [PATCH 12/13] Documentation: describe the url-parse builtin
2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
` (10 preceding siblings ...)
2024-04-28 22:30 ` [PATCH 11/13] url-parse: output URL components selected by user Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-04-28 22:31 ` Matheus Afonso Martins Moreira via GitGitGadget
2024-04-30 7:37 ` Ghanshyam Thakkar
2024-04-28 22:31 ` [PATCH 13/13] tests: add tests for the new " Matheus Afonso Martins Moreira via GitGitGadget
` (2 subsequent siblings)
14 siblings, 1 reply; 44+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2024-04-28 22:31 UTC (permalink / raw)
To: git; +Cc: Matheus Moreira, Matheus Afonso Martins Moreira
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
The new url-parse builtin validates git URLs
and optionally extracts their components.
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
Documentation/git-url-parse.txt | 59 +++++++++++++++++++++++++++++++++
1 file changed, 59 insertions(+)
create mode 100644 Documentation/git-url-parse.txt
diff --git a/Documentation/git-url-parse.txt b/Documentation/git-url-parse.txt
new file mode 100644
index 00000000000..bfbbad6c033
--- /dev/null
+++ b/Documentation/git-url-parse.txt
@@ -0,0 +1,59 @@
+git-url-parse(1)
+================
+
+NAME
+----
+git-url-parse - Parse and extract git URL components
+
+SYNOPSIS
+--------
+[verse]
+'git url-parse' [<options>] [--] <url>...
+
+DESCRIPTION
+-----------
+
+Git supports many ways to specify URLs, some of them non-standard.
+For example, git supports the scp style [user@]host:[path] format.
+This command eases interoperability with git URLs by enabling the
+parsing and extraction of the components of all git URLs.
+
+OPTIONS
+-------
+
+-c <arg>::
+--component <arg>::
+ Extract the `<arg>` component from the given git URLs.
+ `<arg>` can be one of:
+ `protocol`, `user`, `password`, `host`, `port`, `path`.
+
+EXAMPLES
+--------
+
+* Print the host name:
++
+------------
+$ git url-parse --component host https://example.com/user/repo
+example.com
+------------
+
+* Print the path:
++
+------------
+$ git url-parse --component path https://example.com/user/repo
+/usr/repo
+$ git url-parse --component path example.com:~user/repo
+~user/repo
+$ git url-parse --component path example.com:user/repo
+/user/repo
+------------
+
+* Validate URLs without outputting anything:
++
+------------
+$ git url-parse https://example.com/user/repo example.com:~user/repo
+------------
+
+GIT
+---
+Part of the linkgit:git[1] suite
--
gitgitgadget
* [PATCH 13/13] tests: add tests for the new url-parse builtin
2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
` (11 preceding siblings ...)
2024-04-28 22:31 ` [PATCH 12/13] Documentation: describe the url-parse builtin Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-04-28 22:31 ` Matheus Afonso Martins Moreira via GitGitGadget
2024-04-29 20:53 ` [PATCH 00/13] builtin: implement, document and test url-parse Torsten Bögershausen
2026-05-01 23:15 ` [PATCH v2 0/8] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
14 siblings, 0 replies; 44+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2024-04-28 22:31 UTC (permalink / raw)
To: git; +Cc: Matheus Moreira, Matheus Afonso Martins Moreira
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
Test git URL parsing, validation and component extraction
on all documented git URL schemes and syntaxes.
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
t/t9904-url-parse.sh | 194 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 194 insertions(+)
create mode 100755 t/t9904-url-parse.sh
diff --git a/t/t9904-url-parse.sh b/t/t9904-url-parse.sh
new file mode 100755
index 00000000000..f147f00591c
--- /dev/null
+++ b/t/t9904-url-parse.sh
@@ -0,0 +1,194 @@
+#!/bin/sh
+#
+# Copyright © 2024 Matheus Afonso Martins Moreira
+#
+
+test_description='git url-parse tests'
+
+. ./test-lib.sh
+
+test_expect_success 'git url-parse -- ssh syntax' '
+ git url-parse "ssh://user@example.com:1234/repository/path" &&
+ git url-parse "ssh://user@example.com/repository/path" &&
+ git url-parse "ssh://example.com:1234/repository/path" &&
+ git url-parse "ssh://example.com/repository/path"
+'
+
+test_expect_success 'git url-parse -- git syntax' '
+ git url-parse "git://example.com:1234/repository/path" &&
+ git url-parse "git://example.com/repository/path"
+'
+
+test_expect_success 'git url-parse -- http syntax' '
+ git url-parse "https://example.com:1234/repository/path" &&
+ git url-parse "https://example.com/repository/path" &&
+ git url-parse "http://example.com:1234/repository/path" &&
+ git url-parse "http://example.com/repository/path"
+'
+
+test_expect_success 'git url-parse -- scp syntax' '
+ git url-parse "user@example.com:/repository/path" &&
+ git url-parse "example.com:/repository/path"
+'
+
+test_expect_success 'git url-parse -- username expansion - ssh syntax' '
+ git url-parse "ssh://user@example.com:1234/~user/repository" &&
+ git url-parse "ssh://user@example.com/~user/repository" &&
+ git url-parse "ssh://example.com:1234/~user/repository" &&
+ git url-parse "ssh://example.com/~user/repository"
+'
+
+test_expect_success 'git url-parse -- username expansion - git syntax' '
+ git url-parse "git://example.com:1234/~user/repository" &&
+ git url-parse "git://example.com/~user/repository"
+'
+
+test_expect_success 'git url-parse -- username expansion - scp syntax' '
+ git url-parse "user@example.com:~user/repository" &&
+ git url-parse "example.com:~user/repository"
+'
+
+test_expect_success 'git url-parse -- file urls' '
+ git url-parse "file:///repository/path" &&
+ git url-parse "file:///" &&
+ git url-parse "file://"
+'
+
+test_expect_success 'git url-parse -c protocol -- ssh syntax' '
+ test ssh = "$(git url-parse -c protocol "ssh://user@example.com:1234/repository/path")" &&
+ test ssh = "$(git url-parse -c protocol "ssh://user@example.com/repository/path")" &&
+ test ssh = "$(git url-parse -c protocol "ssh://example.com:1234/repository/path")" &&
+ test ssh = "$(git url-parse -c protocol "ssh://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c protocol -- git syntax' '
+ test git = "$(git url-parse -c protocol "git://example.com:1234/repository/path")" &&
+ test git = "$(git url-parse -c protocol "git://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c protocol -- http syntax' '
+ test https = "$(git url-parse -c protocol "https://example.com:1234/repository/path")" &&
+ test https = "$(git url-parse -c protocol "https://example.com/repository/path")" &&
+ test http = "$(git url-parse -c protocol "http://example.com:1234/repository/path")" &&
+ test http = "$(git url-parse -c protocol "http://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c protocol -- scp syntax' '
+ test ssh = "$(git url-parse -c protocol "user@example.com:/repository/path")" &&
+ test ssh = "$(git url-parse -c protocol "example.com:/repository/path")"
+'
+
+test_expect_success 'git url-parse -c user -- ssh syntax' '
+ test user = "$(git url-parse -c user "ssh://user@example.com:1234/repository/path")" &&
+ test user = "$(git url-parse -c user "ssh://user@example.com/repository/path")" &&
+ test "" = "$(git url-parse -c user "ssh://example.com:1234/repository/path")" &&
+ test "" = "$(git url-parse -c user "ssh://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c user -- git syntax' '
+ test "" = "$(git url-parse -c user "git://example.com:1234/repository/path")" &&
+ test "" = "$(git url-parse -c user "git://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c user -- http syntax' '
+ test "" = "$(git url-parse -c user "https://example.com:1234/repository/path")" &&
+ test "" = "$(git url-parse -c user "https://example.com/repository/path")" &&
+ test "" = "$(git url-parse -c user "http://example.com:1234/repository/path")" &&
+ test "" = "$(git url-parse -c user "http://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c user -- scp syntax' '
+ test user = "$(git url-parse -c user "user@example.com:/repository/path")" &&
+ test "" = "$(git url-parse -c user "example.com:/repository/path")"
+'
+
+test_expect_success 'git url-parse -c host -- ssh syntax' '
+ test example.com = "$(git url-parse -c host "ssh://user@example.com:1234/repository/path")" &&
+ test example.com = "$(git url-parse -c host "ssh://user@example.com/repository/path")" &&
+ test example.com = "$(git url-parse -c host "ssh://example.com:1234/repository/path")" &&
+ test example.com = "$(git url-parse -c host "ssh://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c host -- git syntax' '
+ test example.com = "$(git url-parse -c host "git://example.com:1234/repository/path")" &&
+ test example.com = "$(git url-parse -c host "git://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c host -- http syntax' '
+ test example.com = "$(git url-parse -c host "https://example.com:1234/repository/path")" &&
+ test example.com = "$(git url-parse -c host "https://example.com/repository/path")" &&
+ test example.com = "$(git url-parse -c host "http://example.com:1234/repository/path")" &&
+ test example.com = "$(git url-parse -c host "http://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c host -- scp syntax' '
+ test example.com = "$(git url-parse -c host "user@example.com:/repository/path")" &&
+ test example.com = "$(git url-parse -c host "example.com:/repository/path")"
+'
+
+test_expect_success 'git url-parse -c port -- ssh syntax' '
+ test 1234 = "$(git url-parse -c port "ssh://user@example.com:1234/repository/path")" &&
+ test "" = "$(git url-parse -c port "ssh://user@example.com/repository/path")" &&
+ test 1234 = "$(git url-parse -c port "ssh://example.com:1234/repository/path")" &&
+ test "" = "$(git url-parse -c port "ssh://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c port -- git syntax' '
+ test 1234 = "$(git url-parse -c port "git://example.com:1234/repository/path")" &&
+ test "" = "$(git url-parse -c port "git://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c port -- http syntax' '
+ test 1234 = "$(git url-parse -c port "https://example.com:1234/repository/path")" &&
+ test "" = "$(git url-parse -c port "https://example.com/repository/path")" &&
+ test 1234 = "$(git url-parse -c port "http://example.com:1234/repository/path")" &&
+ test "" = "$(git url-parse -c port "http://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c port -- scp syntax' '
+ test "" = "$(git url-parse -c port "user@example.com:/repository/path")" &&
+ test "" = "$(git url-parse -c port "example.com:/repository/path")"
+'
+
+test_expect_success 'git url-parse -c path -- ssh syntax' '
+ test "/repository/path" = "$(git url-parse -c path "ssh://user@example.com:1234/repository/path")" &&
+ test "/repository/path" = "$(git url-parse -c path "ssh://user@example.com/repository/path")" &&
+ test "/repository/path" = "$(git url-parse -c path "ssh://example.com:1234/repository/path")" &&
+ test "/repository/path" = "$(git url-parse -c path "ssh://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c path -- git syntax' '
+ test "/repository/path" = "$(git url-parse -c path "git://example.com:1234/repository/path")" &&
+ test "/repository/path" = "$(git url-parse -c path "git://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c path -- http syntax' '
+ test "/repository/path" = "$(git url-parse -c path "https://example.com:1234/repository/path")" &&
+ test "/repository/path" = "$(git url-parse -c path "https://example.com/repository/path")" &&
+ test "/repository/path" = "$(git url-parse -c path "http://example.com:1234/repository/path")" &&
+ test "/repository/path" = "$(git url-parse -c path "http://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c path -- scp syntax' '
+ test "/repository/path" = "$(git url-parse -c path "user@example.com:/repository/path")" &&
+ test "/repository/path" = "$(git url-parse -c path "example.com:/repository/path")"
+'
+
+test_expect_success 'git url-parse -c path -- username expansion - ssh syntax' '
+ test "~user/repository" = "$(git url-parse -c path "ssh://user@example.com:1234/~user/repository")" &&
+ test "~user/repository" = "$(git url-parse -c path "ssh://user@example.com/~user/repository")" &&
+ test "~user/repository" = "$(git url-parse -c path "ssh://example.com:1234/~user/repository")" &&
+ test "~user/repository" = "$(git url-parse -c path "ssh://example.com/~user/repository")"
+'
+
+test_expect_success 'git url-parse -c path -- username expansion - git syntax' '
+ test "~user/repository" = "$(git url-parse -c path "git://example.com:1234/~user/repository")" &&
+ test "~user/repository" = "$(git url-parse -c path "git://example.com/~user/repository")"
+'
+
+test_expect_success 'git url-parse -c path -- username expansion - scp syntax' '
+ test "~user/repository" = "$(git url-parse -c path "user@example.com:~user/repository")" &&
+ test "~user/repository" = "$(git url-parse -c path "example.com:~user/repository")"
+'
+
+test_done
--
gitgitgadget
* Re: [PATCH 00/13] builtin: implement, document and test url-parse
2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
` (12 preceding siblings ...)
2024-04-28 22:31 ` [PATCH 13/13] tests: add tests for the new " Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-04-29 20:53 ` Torsten Bögershausen
2024-04-29 22:04 ` Reply to community feedback Matheus Afonso Martins Moreira
2026-05-01 23:15 ` [PATCH v2 0/8] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
14 siblings, 1 reply; 44+ messages in thread
From: Torsten Bögershausen @ 2024-04-29 20:53 UTC (permalink / raw)
To: Matheus Moreira via GitGitGadget; +Cc: git, Matheus Moreira
On Sun, Apr 28, 2024 at 10:30:48PM +0000, Matheus Moreira via GitGitGadget wrote:
> Git commands accept a wide variety of URLs syntaxes, not just standard URLs.
> This can make parsing git URLs difficult since standard URL parsers cannot
> be used. Even if an external parser were implemented, it would have to track
> git's development closely in case support for any new URL schemes are added.
>
> These patches introduce a new url-parse builtin command that exposes git's
> native URL parsing algorithms as a plumbing command, allowing other programs
> to then call upon git itself to parse the git URLs and their components.
>
> This should be quite useful for scripts. For example, a script might want to
> add remotes to repositories, naming them according to the domain name where
> the repository is hosted. This new builtin allows it to parse the git URL
> and extract its host name which can then be used as input for other
> operations. This would be difficult to implement otherwise due to git's
> support for scp style URLs.
>
All in all, having a URL parser as such is a good thing, thanks for working
on that.
There are, however, some notes and questions, up for discussion:
- are there any plans to integrate the parser into connect.c and fetch ?
Speaking as a person who managed to break the parsing of URLs once,
with the good intention of improving things, I had to learn that
test cases are important.
Some work can be seen in t5601-clone.sh
Especially, when dealing with literal IPv6 addresses, the ones with []
and the simplified ssh syntax 'myhost:src' are interesting to test.
Git itself strives to be RFC compliant when parsing URLs, but
we do not fully guarantee to be "fully certified".
And some features using the [] syntax to embed a port number
inside the simplified ssh syntax had not been documented,
but were used in practice, and are now part of the test suite.
See "[myhost:123]:src" in t5601
- Or is this new tool just a helper, to verify "good" URLs,
and not accepting our legacy parser quirks ?
Then we still should see some IPv6 tests ?
Or maybe not, as we prefer hostnames these days ?
- One minor comment:
in 02/13 we read:
+enum protocol {
+ PROTO_UNKNOWN = 0,
+ PROTO_LOCAL,
+ PROTO_FILE,
+ PROTO_SSH,
+ PROTO_GIT,
The RFC 1738 uses the term "scheme" here, and using the very generic
term "protocol" may lead to name clashes later.
Would something like "git_scheme" or so be better ?
- One minor comment:
In 13/13 we read:
+ git url-parse "file:///" &&
+ git url-parse "file://"
I think that the "///" version is superfluous; it should already
be covered by the "//" version.
* Reply to community feedback
2024-04-29 20:53 ` [PATCH 00/13] builtin: implement, document and test url-parse Torsten Bögershausen
@ 2024-04-29 22:04 ` Matheus Afonso Martins Moreira
2024-04-30 6:51 ` Torsten Bögershausen
0 siblings, 1 reply; 44+ messages in thread
From: Matheus Afonso Martins Moreira @ 2024-04-29 22:04 UTC (permalink / raw)
To: tboegi; +Cc: git
Thank you for your feedback.
> are there any plans to integrate the parser into connect.c and fetch ?
Yes.
That was my intention but I was not confident enough to touch connect.c
before getting feedback from the community, since it's critical code
and it is my first contribution.
I do want to merge all URL parsing in git into this one function though,
thereby creating a "single point of truth". This is so that if the algorithm
is modified the changes are visible to the URL parser builtin as well.
> Speaking as a person who managed to break the parsing of URLs once,
> with the good intention of improving things, I had to learn that
> test cases are important.
Absolutely agree.
When adding test cases, I looked at the possibilities enumerated in urls.txt
and generated test cases based on those. I also looked at the urlmatch.h
test cases. However...
> Some work can be seen in t5601-clone.sh
... I did not think to check those.
> Especially, when dealing with literal IPv6 addresses,
> the ones with [] and the simplified ssh syntax 'myhost:src'
> are interesting to test.
You're right about that. I shall prepare an updated v2 patchset
with more test cases, and also any other changes/improvements
requested by maintainers.
> And some features using the [] syntax to embed a port number
> inside the simplified ssh syntax had not been documented,
> but were used in practice, and are now part of the test suite.
> See "[myhost:123]:src" in t5601
Indeed, I did not read anything of the sort when I checked it.
Would you like me to commit a note to this effect to urls.txt ?
> Or is this new tool just a helper, to verify "good" URL's,
> and not accepting our legacy parser quirks ?
It is my intention that this builtin be able to accept, parse
and decompose all types of URLs that git itself can accept.
> Then we still should see some IPv6 tests ?
I will add them!
> Or may be not, as we prefer hostnames these days ?
I would have to defer that choice to someone more experienced
with the codebase. Please advise on how to proceed.
> The RFC 1738 uses the term "scheme" here, and using the very generic
> term "protocol" may lead to name clashes later.
> Would something like "git_scheme" or so be better ?
Scheme does seem like a better word if it's the terminology used by RFCs.
I can change that in a new version if necessary.
That code is based on the existing connect.c parsing code though.
> I think that the "///" version is superfluous; it should already
> be covered by the "//" version.
I thought it was a good idea because of existing precedent:
my first approach to creating the test cases was to copy the
ones from t0110-urlmatch-normalization.sh which did have many
cases such as those. Then as I developed the code I came to
believe that it was not necessary: I call url_normalize
in the url_parse function and url_normalize is already being
tested. I think I just forgot to delete those lines.
Reading that file over once again, it does have IPv6 address
test cases. So I should probably go over it again.
Thanks again for the feedback,
Matheus
* Re: Reply to community feedback
2024-04-29 22:04 ` Reply to community feedback Matheus Afonso Martins Moreira
@ 2024-04-30 6:51 ` Torsten Bögershausen
0 siblings, 0 replies; 44+ messages in thread
From: Torsten Bögershausen @ 2024-04-30 6:51 UTC (permalink / raw)
To: Matheus Afonso Martins Moreira; +Cc: git
On Mon, Apr 29, 2024 at 07:04:40PM -0300, Matheus Afonso Martins Moreira wrote:
> Thank you for your feedback.
>
> > are there any plans to integrate the parser into connect.c and fetch ?
>
> Yes.
>
> That was my intention but I was not confident enough to touch connect.c
> before getting feedback from the community, since it's critical code
> and it is my first contribution.
Welcome to the Git community.
I wasn't aware of t0110 as a test case...
>
> I do want to merge all URL parsing in git into this one function though,
> thereby creating a "single point of truth". This is so that if the algorithm
> is modified the changes are visible to the URL parser builtin as well.
>
That is a good thing to do. Be prepared for a longer journey, since we have
this legacy stuff to deal with. But I am happy to help with reviews, even
if that may take some days.
[]
> When adding test cases, I looked at the possibilities enumerated in urls.txt
> and generated test cases based on those. I also looked at the urlmatch.h
> test cases. However...
>
> > Some work can be seen in t5601-clone.sh
>
> ... I did not think to check those.
>
> > Especially, when dealing with literal IPv6 addresses,
> > the ones with [] and the simplified ssh syntax 'myhost:src'
> > are interesting to test.
>
> You're right about that. I shall prepare an updated v2 patchset
> with more test cases, and also any other changes/improvements
> requested by maintainers.
>
> > And some features using the [] syntax to embed a port number
> > inside the simplified ssh syntax had not been documented,
> > but were used in practice, and are now part of the test suite.
> > See "[myhost:123]:src" in t5601
>
> Indeed, I did not read anything of the sort when I checked it.
> Would you like me to commit a note to this effect to urls.txt ?
In short: please don't.
This kind of syntax was never meant to be used.
The official "ssh://myhost:123/src" is recommended.
When IPv6 parsing was added, people discovered that it could be
used to "protect" the ':' from being a separator between the hostname
and the path, and can be used to separate the hostname from the port.
Once that was used in real life, it was too late to change it.
If we now get a better debug tool, it could mention that this is
a legacy feature, and recommend the longer "ssh://" syntax.
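As a rough illustration of how that legacy form decomposes, here is a
small shell sketch. The helper name split_bracketed_scp is hypothetical,
and the splitting rule (the brackets protect the host part; an all-digit
field after the first colon inside them is taken as the port) is a
simplification for discussion, not the exact code in connect.c.

```sh
# Hypothetical helper: split "[host:port]:path" or "[ipv6]:path".
# Prints the pieces on one line; returns 1 for anything else.
split_bracketed_scp() {
	spec=$1
	case "$spec" in
	"["*"]:"*) ;;
	*) return 1 ;;
	esac

	inner=${spec#"["}          # drop the leading bracket
	inner=${inner%%"]:"*}      # keep what was inside the brackets
	path=${spec#*"]:"}         # everything after the first "]:"

	host=$inner
	port=
	case "$inner" in
	*:*)
		maybe_port=${inner#*:}     # text after the first colon
		case "$maybe_port" in
		''|*[!0-9]*) ;;            # not a number: keep it in the host
		*)
			host=${inner%%:*}
			port=$maybe_port ;;
		esac ;;
	esac

	printf 'host=%s port=%s path=%s\n' "$host" "$port" "$path"
}
```

With this sketch, "[myhost:123]:src" prints "host=myhost port=123 path=src",
while an IPv6 literal such as "[::1]:src" keeps its colons in the host and
prints "host=::1 port= path=src".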
>
> > Or is this new tool just a helper, to verify "good" URL's,
> > and not accepting our legacy parser quirks ?
>
> It is my intention that this builtin be able to accept, parse
> and decompose all types of URLs that git itself can accept.
>
> > Then we still should see some IPv6 tests ?
>
> I will add them!
>
> > Or may be not, as we prefer hostnames these days ?
>
> I would have to defer that choice to someone more experienced
> with the codebase. Please advise on how to proceed.
Re-reading this email conversation,
I think that we should support (in the future)
what we support today.
Having a new parser tool means that there is a chance to reject
those URLs with a note/hint that they are deprecated and should
be replaced by a proper one.
From my point of view this means that all existing test cases should pass
even with the new parser, as a general approach.
Deprecating things is hard, may take years, and may be done in a separate
task/patch series. Or may be part of this one, in separate commits.
>
> > The RFC 1738 uses the term "scheme" here, and using the very generic
> > term "protocol" may lead to name clashes later.
> > Would something like "git_scheme" or so be better ?
>
> Scheme does seem like a better word if it's the terminology used by RFCs.
> I can change that in a new version if necessary.
> That code is based on the existing connect.c parsing code though.
>
> > I think that the "///" version is superfluous; it should already
> > be covered by the "//" version.
>
> I thought it was a good idea because of existing precedent:
> my first approach to creating the test cases was to copy the
> ones from t0110-urlmatch-normalization.sh which did have many
> cases such as those. Then as I developed the code I came to
> believe that it was not necessary: I call url_normalize
> in the url_parse function and url_normalize is already being
> tested. I think I just forgot to delete those lines.
>
> Reading that file over once again, it does have IPv6 address
> test cases. So I should probably go over it again.
>
> Thanks again for the feedback,
>
> Matheus
>
* Re: [PATCH 12/13] Documentation: describe the url-parse builtin
2024-04-28 22:31 ` [PATCH 12/13] Documentation: describe the url-parse builtin Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-04-30 7:37 ` Ghanshyam Thakkar
0 siblings, 0 replies; 44+ messages in thread
From: Ghanshyam Thakkar @ 2024-04-30 7:37 UTC (permalink / raw)
To: Matheus Afonso Martins Moreira via GitGitGadget
Cc: git, Matheus Moreira, Matheus Afonso Martins Moreira
On Sun, 28 Apr 2024, Matheus Afonso Martins Moreira via GitGitGadget <gitgitgadget@gmail.com> wrote:
> +* Print the path:
> ++
> +------------
> +$ git url-parse --component path https://example.com/user/repo
> +/usr/repo
s/usr/user/
Thanks.
* Re: [PATCH 02/13] urlmatch: define url_parse function
2024-04-28 22:30 ` [PATCH 02/13] urlmatch: define url_parse function Matheus Afonso Martins Moreira via GitGitGadget
@ 2024-05-01 22:18 ` Ghanshyam Thakkar
2024-05-02 4:02 ` Torsten Bögershausen
0 siblings, 1 reply; 44+ messages in thread
From: Ghanshyam Thakkar @ 2024-05-01 22:18 UTC (permalink / raw)
To: Matheus Afonso Martins Moreira via GitGitGadget
Cc: git, Matheus Moreira, Matheus Afonso Martins Moreira
On Sun, 28 Apr 2024, Matheus Afonso Martins Moreira via GitGitGadget <gitgitgadget@gmail.com> wrote:
> From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
>
> Define general parsing function that supports all Git URLs
> including scp style URLs such as hostname:~user/repo.
> Has the same interface as the URL normalization function
> and uses the same data structures, facilitating its use.
> It's adapted from the algorithm used to process URLs in connect.c,
> so it should support the same inputs.
>
> Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
> ---
> urlmatch.c | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> urlmatch.h | 1 +
> 2 files changed, 91 insertions(+)
>
> diff --git a/urlmatch.c b/urlmatch.c
> index 1d0254abacb..5a442e31fa2 100644
> --- a/urlmatch.c
> +++ b/urlmatch.c
> @@ -3,6 +3,7 @@
> #include "hex-ll.h"
> #include "strbuf.h"
> #include "urlmatch.h"
> +#include "url.h"
>
> #define URL_ALPHA "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
> #define URL_DIGIT "0123456789"
> @@ -438,6 +439,95 @@ char *url_normalize(const char *url, struct url_info *out_info)
> return url_normalize_1(url, out_info, 0);
> }
>
> +enum protocol {
> + PROTO_UNKNOWN = 0,
> + PROTO_LOCAL,
> + PROTO_FILE,
> + PROTO_SSH,
> + PROTO_GIT,
> +};
> +
> +static enum protocol url_get_protocol(const char *name, size_t n)
> +{
> + if (!strncmp(name, "ssh", n))
> + return PROTO_SSH;
> + if (!strncmp(name, "git", n))
> + return PROTO_GIT;
> + if (!strncmp(name, "git+ssh", n)) /* deprecated - do not use */
> + return PROTO_SSH;
> + if (!strncmp(name, "ssh+git", n)) /* deprecated - do not use */
> + return PROTO_SSH;
> + if (!strncmp(name, "file", n))
> + return PROTO_FILE;
> + return PROTO_UNKNOWN;
> +}
> +
> +char *url_parse(const char *url_orig, struct url_info *out_info)
> +{
> + struct strbuf url;
> + char *host, *separator;
> + char *detached, *normalized;
> + enum protocol protocol = PROTO_LOCAL;
> + struct url_info local_info;
> + struct url_info *info = out_info? out_info : &local_info;
> + bool scp_syntax = false;
> +
> + if (is_url(url_orig)) {
> + url_orig = url_decode(url_orig);
> + } else {
> + url_orig = xstrdup(url_orig);
> + }
> +
> + strbuf_init(&url, strlen(url_orig) + sizeof("ssh://"));
> + strbuf_addstr(&url, url_orig);
> +
> + host = strstr(url.buf, "://");
> + if (host) {
> + protocol = url_get_protocol(url.buf, host - url.buf);
> + host += 3;
> + } else {
> + if (!url_is_local_not_ssh(url.buf)) {
> + scp_syntax = true;
> + protocol = PROTO_SSH;
> + strbuf_insertstr(&url, 0, "ssh://");
> + host = url.buf + 6;
> + }
> + }
Interesting.
`
$ ./git url-parse -c protocol file:/test/test
ssh
`
seems like only having a single slash after the 'protocol:' prints
'ssh' always (I think this may not even be a valid url). After this 'else'
block, the url turns into 'ssh://file/test/test'. Will examine the details
later. Not that it's your code's doing, and rather the result of
url_is_local_not_ssh(). But just wanted to point this out and ask if this
should error out or is this an intended behavior that I can't figure out.
Thanks.
* Re: [PATCH 02/13] urlmatch: define url_parse function
2024-05-01 22:18 ` Ghanshyam Thakkar
@ 2024-05-02 4:02 ` Torsten Bögershausen
0 siblings, 0 replies; 44+ messages in thread
From: Torsten Bögershausen @ 2024-05-02 4:02 UTC (permalink / raw)
To: Ghanshyam Thakkar
Cc: Matheus Afonso Martins Moreira via GitGitGadget, git,
Matheus Moreira, Matheus Afonso Martins Moreira
[]
> Interesting.
>
> `
> $ ./git url-parse -c protocol file:/test/test
> ssh
> `
>
> seems like only having a single slash after the 'protocol:' prints
> 'ssh' always (I think this may not even be a valid url). After this 'else'
> block, the url turns into 'ssh://file/test/test'. Will examine the details
> later. Not that it's your code's doing, and rather the result of
> url_is_local_not_ssh(). But just wanted to point this out and ask if this
> should error out or is this an intended behavior that I can't figure out.
ssh is the correct answer, try something like
`git clone localhost:/home/myself/project/git.git`
It is the scp syntax, supported by Git as well.
From `man scp`
scp copies files between hosts on a network.
[]
The source and target may be specified as a local pathname,
a remote host with optional path in the form
[user@]host:[path],
or a URI in the form scp://[user@]host[:port][/path].
Local file names can be made explicit using absolute or relative pathnames
to avoid scp treating file names containing ‘:’ as host specifiers.
So yes, they share similar problems
with the ':' that could mean different things when using the short form.
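To make the rule concrete, here is a rough shell sketch of that decision,
for illustration only. The function name classify_url is made up, and the
logic is a simplification of what url_is_local_not_ssh() decides (it
ignores DOS drive prefixes, for example): anything containing "://"
reports its scheme, a colon before the first slash means scp syntax and
therefore ssh, and everything else is treated as a local path.

```sh
# Hypothetical helper: report the scheme git would assume for a string.
classify_url() {
	url=$1
	case "$url" in
	*://*)
		# Proper URL form: the scheme is whatever precedes "://".
		printf '%s\n' "${url%%://*}" ;;
	*:*)
		# A colon but no "://": scp syntax unless a slash comes first.
		before_colon=${url%%:*}
		case "$before_colon" in
		*/*) echo local ;;
		*)   echo ssh ;;
		esac ;;
	*)
		# No colon at all: a plain local path.
		echo local ;;
	esac
}
```

Under these assumptions, "file:/test/test" is read as host "file" with
path "/test/test" and so reports ssh, exactly like
"localhost:/home/myself/project/git.git", whereas "./foo:bar" reports
local because the slash comes before the colon.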
* [PATCH v2 0/8] builtin: implement, document and test url-parse
2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
` (13 preceding siblings ...)
2024-04-29 20:53 ` [PATCH 00/13] builtin: implement, document and test url-parse Torsten Bögershausen
@ 2026-05-01 23:15 ` Matheus Moreira via GitGitGadget
2026-05-01 23:15 ` [PATCH v2 1/8] connect: rename enum protocol to url_scheme Matheus Afonso Martins Moreira via GitGitGadget
` (8 more replies)
14 siblings, 9 replies; 44+ messages in thread
From: Matheus Moreira via GitGitGadget @ 2026-05-01 23:15 UTC (permalink / raw)
To: git; +Cc: Torsten Bögershausen, Ghanshyam Thakkar, Matheus Moreira
This series adds git url-parse, a plumbing builtin for inspecting git URLs.
Git accepts a wider variety of URL forms than any standard parser handles.
The supported forms include RFC URLs, file:// URLs, scp-style
[user@]host:path for SSH, and IPv6 in brackets. Tools wanting to reason
about them have historically had to reimplement git's parsing or shell out
indirectly. With git url-parse, scripts can ask git directly: validate a
URL, extract a component (scheme, user, host, port, path, password), or
both.
The series consists of eight commits.
The first four are preparatory. They rename enum protocol to enum url_scheme
for RFC alignment, move url_is_local_not_ssh and the scheme-detection
routines from connect.c to url.h/url.c, and stop url_get_scheme from dying
on unknown schemes so other parsers can handle unknowns gracefully.
The fifth commit defines the new parser, url_parse, in urlmatch.c. It is
adapted from parse_connect_url and uses the same data structures as
url_normalize. The parser returns NULL on failure with err populated, and
exposes URL components as offset/length pairs into the normalized URL
buffer.
The sixth commit adds the user-facing command, with a helpful error when the
input looks like a local path rather than a URL.
The last two commits are documentation (a manpage) and 53 tests covering URL
form, scp form, IPv6 in URL and scp forms, bracket forms, username
expansion, query/fragment stripping, the local-path error, and
validation-only mode.
Several choices in this series are judgment calls. Happy to amend or follow
up on any of them.
The component name is scheme, not protocol. RFC 1738/3986 calls them
schemes. The series renames enum protocol to enum url_scheme internally, and
the user-facing component name follows the same direction. I considered
accepting both as aliases but decided against the precedent for a new
command. If you would rather see protocol, or both protocol and scheme, that
is easy to change.
Local paths are deliberately not URLs. parse_connect_url accepts bare paths
like /abs/path or ./rel as URL_SCHEME_LOCAL. url_parse rejects them, since
url_normalize requires a scheme://host form, and silent conversion to
file:// has no good answer for relative or tilde forms. The builtin emits a
helpful error suggesting the explicit file:// form. If full git clone parity
is preferred (bare paths accepted via auto-conversion or a new flag), that
could be added.
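For readers unfamiliar with how a bare path is told apart from an scp-style URL, the sketch below shows a simplified version of the heuristic. It is an approximation, not the series' code: the name looks_like_local_path is hypothetical, and the DOS drive prefix check that url_is_local_not_ssh also performs is left out.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical, simplified classifier for strings without "://".
 * Returns nonzero when the string looks like a local path:
 * either it has no ':' at all, or a '/' appears before the first ':'.
 * Otherwise the first ':' is taken as the scp host/path separator.
 */
int looks_like_local_path(const char *s)
{
	const char *colon, *slash;

	assert(s != NULL);

	colon = strchr(s, ':');
	slash = strchr(s, '/');

	return !colon || (slash && slash < colon);
}
```

Under this rule /abs/path, ./rel and repo count as local, while host:path and user@host:~user/repo count as scp style. A path containing a colon stays local when a slash comes first, as in ./dir:name.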
Absent and empty components are conflated in output. --component user
http://host/ and --component user http://@host/ both produce empty lines.
The underlying struct url_info preserves the distinction: *_off == 0 vs
*_off != 0 with *_len == 0. A future option can expose it without a breaking
change. I can amend this patch set if necessary.
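A future option could recover the absent/empty distinction from the offset and length alone. The sketch below assumes exactly the convention stated above and nothing more. The enum and function names are hypothetical and are not part of this series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-way classification of a URL component. */
enum component_presence {
	COMPONENT_ABSENT,  /* off == 0: the component does not appear */
	COMPONENT_EMPTY,   /* off != 0, len == 0: present but empty */
	COMPONENT_PRESENT  /* len != 0 */
};

/*
 * Classify a non-scheme component from its offset/length pair.
 * The scheme is excluded because it legitimately starts at offset 0.
 */
enum component_presence classify_component(size_t off, size_t len)
{
	/* A non-empty, non-scheme component can never start at offset 0. */
	assert(len == 0 || off != 0);

	if (len)
		return COMPONENT_PRESENT;
	return off ? COMPONENT_EMPTY : COMPONENT_ABSENT;
}
```

With this, http://host/ would report the user as absent and http://@host/ as empty, while the current builtin prints an empty line for both.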
Changes since v1:
* Bug fix: ~user paths with a query string or fragment were leaking the ?
or # into the path output. The ~user-skip logic in url_parse previously
ran only for file://. It now runs for git/ssh/scp URLs as well, matching
what parse_connect_url does and what users expect.
* Helpful error for local paths instead of the cryptic "invalid URL scheme
name or missing '://' suffix".
* -c protocol renamed to -c scheme for consistency with the internal rename
and the RFC.
* Documented the deliberate divergence from parse_connect_url (local paths
and unknown schemes) in the urlmatch commit message.
* Doc and command-list polish: purehelpers category, asciidoc placeholder
convention, [synopsis] form.
* The original staged buildup of the builtin across several micro commits is
collapsed into a single self-contained commit. The rest of the series is
unchanged in shape.
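The ~user fix from the first item in the list above can be sketched in isolation. In the range-diff it appears as decrementing path_len together with incrementing path_off. Without the decrement the span would run one byte past the path and pick up the ? or #. The struct and function names below are hypothetical, and the check for the leading slash is an extra guard added for this sketch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for path_off / path_len in struct url_info. */
struct path_span {
	size_t off;
	size_t len;
};

/*
 * If the path starts with "/~", drop the leading slash so the path
 * begins at '~'. Offset and length move together, so the end of the
 * span stays where it was.
 */
void skip_slash_before_tilde(const char *url, struct path_span *p)
{
	assert(p->off + p->len <= strlen(url));

	if (p->len >= 2 && url[p->off] == '/' && url[p->off + 1] == '~') {
		p->off++;
		p->len--;
	}
}
```

For ssh://host/~user/repo?q the path span starts as offset 10, length 11 (/~user/repo). After the adjustment it is offset 11, length 10 (~user/repo), and the byte just past the span is still the ?.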
Matheus Afonso Martins Moreira (8):
connect: rename enum protocol to url_scheme
url: move url_is_local_not_ssh to url.h
url: move scheme detection to URL header/source
url: return URL_SCHEME_UNKNOWN instead of dying
urlmatch: define url_parse function
builtin: create url-parse command
doc: describe the url-parse builtin
t9904: add tests for the new url-parse builtin
.gitignore | 1 +
Documentation/git-url-parse.adoc | 80 ++++++
Documentation/meson.build | 1 +
Makefile | 1 +
builtin.h | 1 +
builtin/url-parse.c | 132 ++++++++++
command-list.txt | 1 +
connect.c | 78 ++----
connect.h | 1 -
git.c | 1 +
meson.build | 1 +
remote.c | 1 +
t/meson.build | 1 +
t/t9904-url-parse.sh | 319 ++++++++++++++++++++++++
t/unit-tests/u-urlmatch-normalization.c | 45 ++++
url.c | 23 ++
url.h | 16 ++
urlmatch.c | 127 ++++++++++
urlmatch.h | 1 +
19 files changed, 777 insertions(+), 54 deletions(-)
create mode 100644 Documentation/git-url-parse.adoc
create mode 100644 builtin/url-parse.c
create mode 100755 t/t9904-url-parse.sh
base-commit: 94f057755b7941b321fd11fec1b2e3ca5313a4e0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1715%2Fmatheusmoreira%2Furl-parse-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1715/matheusmoreira/url-parse-v2
Pull-Request: https://github.com/git/git/pull/1715
Range-diff vs v1:
-: ---------- > 1: 38f797362d connect: rename enum protocol to url_scheme
1: 42eb0cbf68 ! 2: a4153e1d24 url: move helper function to URL header and source
@@ Metadata
Author: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
## Commit message ##
- url: move helper function to URL header and source
+ url: move url_is_local_not_ssh to url.h
- It will be used in more places so it should be placed in url.h.
+ Move url_is_local_not_ssh from connect.c/connect.h
+ to url.c/url.h so that the new url_parse function
+ in urlmatch.c, and any future code that needs to
+ distinguish a local path from an scp style SSH URL,
+ can reuse the heuristic without depending on connect.c.
+
+ No behavior change.
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
## connect.c ##
-@@ connect.c: enum protocol {
- PROTO_GIT
+@@ connect.c: enum url_scheme {
+ URL_SCHEME_GIT
};
-int url_is_local_not_ssh(const char *url)
@@ connect.c: enum protocol {
- (has_dos_drive_prefix(url) && is_valid_path(url));
-}
-
- static const char *prot_name(enum protocol protocol)
+ static const char *url_scheme_name(enum url_scheme scheme)
{
- switch (protocol) {
+ switch (scheme) {
## connect.h ##
@@ connect.h: int git_connection_is_socket(struct child_process *conn);
@@ url.h: char *url_decode_parameter_value(const char **query);
+int url_is_local_not_ssh(const char *url);
+
- #endif /* URL_H */
+ /*
+ * The set of unreserved characters as per STD66 (RFC3986) is
+ * '[A-Za-z0-9-._~]'. These characters are safe to appear in URI
-: ---------- > 3: e584fb03f3 url: move scheme detection to URL header/source
-: ---------- > 4: 7381704c38 url: return URL_SCHEME_UNKNOWN instead of dying
2: 13b81b8aa0 ! 5: 89932a70f3 urlmatch: define url_parse function
@@ Metadata
## Commit message ##
urlmatch: define url_parse function
- Define general parsing function that supports all Git URLs
+ Define url_parse, a general parsing function that supports all Git URLs
including scp style URLs such as hostname:~user/repo.
- Has the same interface as the URL normalization function
- and uses the same data structures, facilitating its use.
- It's adapted from the algorithm used to process URLs in connect.c,
- so it should support the same inputs.
+
+ It is adapted from the algorithm in connect.c's parse_connect_url
+ and reuses the shared enum url_scheme and url_get_scheme function
+ that previous commits made available in url.h. The new parser and
+ the connect path agree on scheme classification. url_parse has the
+ same interface as url_normalize and uses the same data structures.
+
+ Both functions accept the same URL forms with one deliberate
+ exception. Bare local paths such as "/abs/path", "./rel"
+ or "repo" are accepted by parse_connect_url as URL_SCHEME_LOCAL,
+ but rejected by url_parse because url_normalize requires a URL
+ with a scheme://host form. A consumer that wants to handle both
+ URLs and local paths needs to dispatch on url_is_local_not_ssh
+ before calling url_parse, just as the connect path does internally.
+
+ The duplication with parse_connect_url is intentional.
+ The two functions have different contracts:
+
+ - parse_connect_url
+
+ Calls die() on an unknown scheme
+ and returns NUL-terminated host/path
+ strings for the connect path
+
+ - url_parse
+
+ Returns NULL on failure while populating
+ out_info->err, and exposes components
+ as offset/length pairs into the normalized
+ URL buffer, matching url_normalize.
+
+ Reconciling both is possible, but not in the scope
+ of the current patch set.
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
+ ## t/unit-tests/u-urlmatch-normalization.c ##
+@@ t/unit-tests/u-urlmatch-normalization.c: void test_urlmatch_normalization__equivalents(void)
+ compare_normalized_urls("https://@x.y/^/../abc", "httpS://@x.y:0443/abc", 1);
+ compare_normalized_urls("https://@x.y/^/..", "httpS://@x.y:0443/", 1);
+ }
++
++static void check_parsed_path(const char *url, const char *expected_path)
++{
++ struct url_info info;
++ char *parsed = url_parse(url, &info);
++ char *path;
++
++ cl_assert(parsed != NULL);
++ path = xstrndup(parsed + info.path_off, info.path_len);
++ cl_assert_equal_s(path, expected_path);
++ free(path);
++ free(parsed);
++}
++
++void test_urlmatch_normalization__parse_scp(void)
++{
++ check_parsed_path("host:path", "/path");
++ check_parsed_path("user@host:path", "/path");
++ check_parsed_path("host:~user/repo", "~user/repo");
++ check_parsed_path("user@host:~user/repo", "~user/repo");
++ check_parsed_path("[host]:src", "/src");
++ check_parsed_path("[host:123]:src", "/src");
++ check_parsed_path("[::1]:repo", "/repo");
++ check_parsed_path("user@[::1]:repo", "/repo");
++}
++
++void test_urlmatch_normalization__parse_url_form(void)
++{
++ check_parsed_path("ssh://host/repo", "/repo");
++ check_parsed_path("ssh://host/~user/repo", "~user/repo");
++ check_parsed_path("git://host:9418/repo", "/repo");
++ check_parsed_path("git://host/~user/repo", "~user/repo");
++ check_parsed_path("ssh://[::1]:1234/repo", "/repo");
++ check_parsed_path("http://[2001:db8::1]/repo", "/repo");
++}
++
++void test_urlmatch_normalization__parse_strips_query_and_fragment(void)
++{
++ check_parsed_path("ssh://host/~user/repo?q", "~user/repo");
++ check_parsed_path("ssh://host/~user/repo#frag", "~user/repo");
++ check_parsed_path("git://host/~user/repo?q", "~user/repo");
++ check_parsed_path("user@host:~user/repo?q", "~user/repo");
++ check_parsed_path("https://host/repo?q", "/repo");
++ check_parsed_path("https://host/repo#frag", "/repo");
++}
+
## urlmatch.c ##
@@
#include "hex-ll.h"
@@ urlmatch.c: char *url_normalize(const char *url, struct url_info *out_info)
return url_normalize_1(url, out_info, 0);
}
-+enum protocol {
-+ PROTO_UNKNOWN = 0,
-+ PROTO_LOCAL,
-+ PROTO_FILE,
-+ PROTO_SSH,
-+ PROTO_GIT,
-+};
-+
-+static enum protocol url_get_protocol(const char *name, size_t n)
-+{
-+ if (!strncmp(name, "ssh", n))
-+ return PROTO_SSH;
-+ if (!strncmp(name, "git", n))
-+ return PROTO_GIT;
-+ if (!strncmp(name, "git+ssh", n)) /* deprecated - do not use */
-+ return PROTO_SSH;
-+ if (!strncmp(name, "ssh+git", n)) /* deprecated - do not use */
-+ return PROTO_SSH;
-+ if (!strncmp(name, "file", n))
-+ return PROTO_FILE;
-+ return PROTO_UNKNOWN;
-+}
-+
+char *url_parse(const char *url_orig, struct url_info *out_info)
+{
+ struct strbuf url;
+ char *host, *separator;
+ char *detached, *normalized;
-+ enum protocol protocol = PROTO_LOCAL;
++ char *url_decoded;
++ enum url_scheme scheme = URL_SCHEME_LOCAL;
+ struct url_info local_info;
-+ struct url_info *info = out_info? out_info : &local_info;
++ struct url_info *info = out_info ? out_info : &local_info;
+ bool scp_syntax = false;
+
-+ if (is_url(url_orig)) {
-+ url_orig = url_decode(url_orig);
-+ } else {
-+ url_orig = xstrdup(url_orig);
-+ }
++ if (is_url(url_orig))
++ url_decoded = url_decode(url_orig);
++ else
++ url_decoded = xstrdup(url_orig);
+
-+ strbuf_init(&url, strlen(url_orig) + sizeof("ssh://"));
-+ strbuf_addstr(&url, url_orig);
++ strbuf_init(&url, strlen(url_decoded) + sizeof("ssh://"));
++ strbuf_addstr(&url, url_decoded);
++ free(url_decoded);
+
+ host = strstr(url.buf, "://");
+ if (host) {
-+ protocol = url_get_protocol(url.buf, host - url.buf);
++ /*
++ * Temporarily NUL-terminate the scheme name
++ * so we can pass it to url_get_scheme(),
++ * then restore the ':' so the buffer
++ * is intact for url_normalize() below.
++ */
++ char saved = *host;
++ *host = '\0';
++ scheme = url_get_scheme(url.buf);
++ *host = saved;
+ host += 3;
+ } else {
+ if (!url_is_local_not_ssh(url.buf)) {
+ scp_syntax = true;
-+ protocol = PROTO_SSH;
++ scheme = URL_SCHEME_SSH;
+ strbuf_insertstr(&url, 0, "ssh://");
-+ host = url.buf + 6;
++ host = url.buf + strlen("ssh://");
+ }
+ }
+
-+ /* path starts after ':' in scp style SSH URLs */
++ /*
++ * Path starts after ':' in scp style SSH URLs.
++ *
++ * The host portion can begin with an optional "user@",
++ * and the host itself can be wrapped in '[' ']' brackets.
++ * The bracket form is git's legacy way of supporting:
++ *
++ * - IPv6 literals: [::1]:repo
++ * - host:port pairs in the short form: [myhost:123]:src
++ * - Plain hostnames that happen to need bracketing: [host]:path
++ *
++ * Treat '[' followed by 0 or 1 inner colons as the host:port
++ * or plain hostname form and strip the brackets so url_normalize
++ * sees host[:port] natively. Two or more inner colons mark an
++ * IPv6 literal: keep the brackets for url_normalize to recognize.
++ *
++ * The scp path separator is the ':' that follows the host part,
++ * and we must skip over user@ and any '[...]' before searching.
++ */
+ if (scp_syntax) {
-+ separator = strchr(host, ':');
++ char *user_at;
++ char *host_start;
++ char *bracket_end;
++
++ user_at = strchr(host, '@');
++ host_start = user_at ? user_at + 1 : host;
++
++ if (*host_start == '[') {
++ char *p;
++ int inner_colons;
++
++ bracket_end = strchr(host_start, ']');
++ inner_colons = 0;
++ for (p = host_start + 1; bracket_end && p < bracket_end; p++)
++ if (*p == ':')
++ inner_colons++;
++
++ if (bracket_end && inner_colons <= 1) {
++ size_t close_off = bracket_end - url.buf;
++ size_t open_off = host_start - url.buf;
++ strbuf_remove(&url, close_off, 1);
++ strbuf_remove(&url, open_off, 1);
++ separator = url.buf + close_off - 1;
++ } else if (bracket_end) {
++ separator = strchr(bracket_end + 1, ':');
++ } else {
++ separator = strchr(host_start, ':');
++ }
++ } else {
++ separator = strchr(host_start, ':');
++ }
++
+ if (separator) {
+ if (separator[1] == '/')
+ strbuf_remove(&url, separator - url.buf, 1);
@@ urlmatch.c: char *url_normalize(const char *url, struct url_info *out_info)
+ normalized = url_normalize(detached, info);
+ free(detached);
+
-+ if (!normalized) {
++ if (!normalized)
+ return NULL;
-+ }
+
-+ /* point path to ~ for URL's like this:
++ /*
++ * Point path to ~ for URLs like this:
+ *
+ * ssh://host.xz/~user/repo
+ * git://host.xz/~user/repo
+ * host.xz:~user/repo
-+ *
+ */
-+ if (protocol == PROTO_GIT || protocol == PROTO_SSH) {
-+ if (normalized[info->path_off + 1] == '~')
++ if (scheme == URL_SCHEME_GIT || scheme == URL_SCHEME_SSH) {
++ if (normalized[info->path_off + 1] == '~') {
+ info->path_off++;
++ info->path_len--;
++ }
+ }
+
+ return normalized;
3: e4781b36d5 ! 6: 886a7d659e builtin: create url-parse command
@@ Commit message
The url-parse builtin command is designed to solve this problem
by exposing git's native URL parsing facilities as a plumbing command.
- Other programs can then call upon git itself to parse the git URLs and
- extract their components. This should be quite useful for scripts.
+ Other programs can then call upon git itself to parse the git URLs
+ and extract their components. This should be quite useful for scripts.
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
@@ Makefile: BUILTIN_OBJS += builtin/update-ref.o
BUILTIN_OBJS += builtin/verify-pack.o
## builtin.h ##
-@@ builtin.h: int cmd_update_server_info(int argc, const char **argv, const char *prefix);
- int cmd_upload_archive(int argc, const char **argv, const char *prefix);
- int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix);
- int cmd_upload_pack(int argc, const char **argv, const char *prefix);
-+int cmd_url_parse(int argc, const char **argv, const char *prefix);
- int cmd_var(int argc, const char **argv, const char *prefix);
- int cmd_verify_commit(int argc, const char **argv, const char *prefix);
- int cmd_verify_tag(int argc, const char **argv, const char *prefix);
+@@ builtin.h: int cmd_update_server_info(int argc, const char **argv, const char *prefix, stru
+ int cmd_upload_archive(int argc, const char **argv, const char *prefix, struct repository *repo);
+ int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix, struct repository *repo);
+ int cmd_upload_pack(int argc, const char **argv, const char *prefix, struct repository *repo);
++int cmd_url_parse(int argc, const char **argv, const char *prefix, struct repository *repo);
+ int cmd_var(int argc, const char **argv, const char *prefix, struct repository *repo);
+ int cmd_verify_commit(int argc, const char **argv, const char *prefix, struct repository *repo);
+ int cmd_verify_tag(int argc, const char **argv, const char *prefix, struct repository *repo);
## builtin/url-parse.c (new) ##
@@
-+/* SPDX-License-Identifier: GPL-2.0-only
-+ *
-+ * url-parse - parses git URLs and extracts their components
-+ *
-+ * Copyright © 2024 Matheus Afonso Martins Moreira
-+ *
-+ * This program is free software; you can redistribute it and/or modify
-+ * it under the terms of the GNU General Public License as published by
-+ * the Free Software Foundation; version 2.
-+ */
-+
+#include "builtin.h"
+#include "gettext.h"
++#include "parse-options.h"
++#include "url.h"
++#include "urlmatch.h"
++
++static const char * const builtin_url_parse_usage[] = {
++ N_("git url-parse [-c <component>] [--] <url>..."),
++ NULL
++};
++
++static char *component_arg;
++
++static struct option builtin_url_parse_options[] = {
++ OPT_STRING('c', "component", &component_arg, N_("component"),
++ N_("which URL component to extract")),
++ OPT_END(),
++};
++
++enum url_component {
++ URL_NONE = 0,
++ URL_SCHEME,
++ URL_USER,
++ URL_PASSWORD,
++ URL_HOST,
++ URL_PORT,
++ URL_PATH,
++};
++
++static void parse_or_die(const char *url, struct url_info *info)
++{
++ if (url_is_local_not_ssh(url)) {
++ if (*url == '/')
++ die("'%s' is not a URL; if you meant a local "
++ "repository, use 'file://%s'", url, url);
++ die("'%s' is not a URL; if you meant a local repository, "
++ "use a 'file://' URL with an absolute path", url);
++ }
++ if (!url_parse(url, info))
++ die("invalid git URL '%s': %s", url, info->err);
++}
++
++static enum url_component get_component_or_die(const char *arg)
++{
++ if (!strcmp("path", arg))
++ return URL_PATH;
++ if (!strcmp("host", arg))
++ return URL_HOST;
++ if (!strcmp("scheme", arg))
++ return URL_SCHEME;
++ if (!strcmp("user", arg))
++ return URL_USER;
++ if (!strcmp("password", arg))
++ return URL_PASSWORD;
++ if (!strcmp("port", arg))
++ return URL_PORT;
++ die("invalid git URL component '%s'", arg);
++}
++
++static char *extract_component(enum url_component component,
++ struct url_info *info)
++{
++ size_t offset, length;
++
++ switch (component) {
++ case URL_SCHEME:
++ offset = 0;
++ length = info->scheme_len;
++ break;
++ case URL_USER:
++ offset = info->user_off;
++ length = info->user_len;
++ break;
++ case URL_PASSWORD:
++ offset = info->passwd_off;
++ length = info->passwd_len;
++ break;
++ case URL_HOST:
++ offset = info->host_off;
++ length = info->host_len;
++ break;
++ case URL_PORT:
++ offset = info->port_off;
++ length = info->port_len;
++ break;
++ case URL_PATH:
++ offset = info->path_off;
++ length = info->path_len;
++ break;
++ case URL_NONE:
++ return NULL;
++ }
++
++ return xstrndup(info->url + offset, length);
++}
+
-+int cmd_url_parse(int argc, const char **argv, const char *prefix)
++int cmd_url_parse(int argc,
++ const char **argv,
++ const char *prefix,
++ struct repository *repo UNUSED)
+{
++ struct url_info info;
++ enum url_component selected = URL_NONE;
++ char *extracted;
++ int i;
++
++ argc = parse_options(argc, argv, prefix, builtin_url_parse_options,
++ builtin_url_parse_usage, 0);
++
++ if (argc == 0)
++ usage_with_options(builtin_url_parse_usage,
++ builtin_url_parse_options);
++
++ if (component_arg)
++ selected = get_component_or_die(component_arg);
++
++ for (i = 0; i < argc; i++) {
++ parse_or_die(argv[i], &info);
++
++ if (selected != URL_NONE) {
++ extracted = extract_component(selected, &info);
++ if (extracted) {
++ puts(extracted);
++ free(extracted);
++ }
++ }
++
++ free(info.url);
++ }
++
+ return 0;
+}
@@ command-list.txt: git-update-ref plumbingmanipulators
git-update-server-info synchingrepositories
git-upload-archive synchelpers
git-upload-pack synchelpers
-+git-url-parse plumbinginterrogators
++git-url-parse purehelpers
git-var plumbinginterrogators
git-verify-commit ancillaryinterrogators
git-verify-pack plumbinginterrogators
@@ git.c: static struct cmd_struct commands[] = {
{ "upload-archive", cmd_upload_archive, NO_PARSEOPT },
{ "upload-archive--writer", cmd_upload_archive_writer, NO_PARSEOPT },
{ "upload-pack", cmd_upload_pack },
-+ { "url-parse", cmd_url_parse, NO_PARSEOPT },
++ { "url-parse", cmd_url_parse },
{ "var", cmd_var, RUN_SETUP_GENTLY | NO_PARSEOPT },
{ "verify-commit", cmd_verify_commit, RUN_SETUP },
{ "verify-pack", cmd_verify_pack },
+
+ ## meson.build ##
+@@ meson.build: builtin_sources = [
+ 'builtin/update-server-info.c',
+ 'builtin/upload-archive.c',
+ 'builtin/upload-pack.c',
++ 'builtin/url-parse.c',
+ 'builtin/var.c',
+ 'builtin/verify-commit.c',
+ 'builtin/verify-pack.c',
4: 1e0895651c < -: ---------- url-parse: add URL parsing helper function
5: 0bf83ee122 < -: ---------- url-parse: enumerate possible URL components
6: 149c476b1e < -: ---------- url-parse: define component extraction helper fn
7: eb9ef8a17b < -: ---------- url-parse: define string to component converter fn
8: a2acfdbc76 < -: ---------- url-parse: define usage and options
9: 5de00324fb < -: ---------- url-parse: parse options given on the command line
10: 15d355a43c < -: ---------- url-parse: validate all given git URLs
11: 4e93509c80 < -: ---------- url-parse: output URL components selected by user
12: abda074aee ! 7: 3c44e0f478 Documentation: describe the url-parse builtin
@@ Metadata
Author: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
## Commit message ##
- Documentation: describe the url-parse builtin
+ doc: describe the url-parse builtin
The new url-parse builtin validates git URLs
and optionally extracts their components.
+ Helped-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
- ## Documentation/git-url-parse.txt (new) ##
+ ## Documentation/git-url-parse.adoc (new) ##
@@
+git-url-parse(1)
+================
@@ Documentation/git-url-parse.txt (new)
+
+SYNOPSIS
+--------
-+[verse]
-+'git url-parse' [<options>] [--] <url>...
++[synopsis]
++git url-parse [-c <component>] [--] <url>...
+
+DESCRIPTION
+-----------
@@ Documentation/git-url-parse.txt (new)
+This command eases interoperability with git URLs by enabling the
+parsing and extraction of the components of all git URLs.
+
++Any syntactically valid URL is parsed, even if the scheme is not one
++git supports for fetching or pushing.
++
+OPTIONS
+-------
+
-+-c <arg>::
-+--component <arg>::
-+ Extract the `<arg>` component from the given git URLs.
-+ `<arg>` can be one of:
-+ `protocol`, `user`, `password`, `host`, `port`, `path`.
++`-c <component>`::
++`--component <component>`::
++ Extract the _<component>_ component from the given Git URLs.
++ _<component>_ can be one of:
++ `scheme`, `user`, `password`, `host`, `port`, `path`.
++
++OUTPUT
++------
++
++When `--component` is given, the requested component of each URL
++is printed on its own line, in the order the URLs were given. If
++the URL has no such component (for example, a port in a URL that
++does not specify one), an empty line is printed in its place.
++
++When `--component` is not given, no output is produced. The exit
++status is zero if every URL parses successfully and non-zero
++otherwise, allowing the command to be used purely as a validator.
+
+EXAMPLES
+--------
@@ Documentation/git-url-parse.txt (new)
++
+------------
+$ git url-parse --component path https://example.com/user/repo
-+/usr/repo
++/user/repo
+$ git url-parse --component path example.com:~user/repo
+~user/repo
+$ git url-parse --component path example.com:user/repo
@@ Documentation/git-url-parse.txt (new)
+$ git url-parse https://example.com/user/repo example.com:~user/repo
+------------
+
++SEE ALSO
++--------
++linkgit:git-clone[1],
++linkgit:git-fetch[1],
++linkgit:git-config[1]
++
+GIT
+---
+Part of the linkgit:git[1] suite
+
+ ## Documentation/meson.build ##
+@@ Documentation/meson.build: manpages = {
+ 'git-update-server-info.adoc' : 1,
+ 'git-upload-archive.adoc' : 1,
+ 'git-upload-pack.adoc' : 1,
++ 'git-url-parse.adoc' : 1,
+ 'git-var.adoc' : 1,
+ 'git-verify-commit.adoc' : 1,
+ 'git-verify-pack.adoc' : 1,
13: 33e128496b ! 8: cf2ae409e6 tests: add tests for the new url-parse builtin
@@ Metadata
Author: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
## Commit message ##
- tests: add tests for the new url-parse builtin
+ t9904: add tests for the new url-parse builtin
Test git URL parsing, validation and component extraction
on all documented git URL schemes and syntaxes.
+ Add IPv6 host coverage in URL form:
+
+ ssh://[::1]/path
+ ssh://user@[::1]:1234/path
+ git://[::1]:9418/path
+ http://[2001:db8::1]/path
+ https://[2001:db8::1]/path
+
+ In URL form the brackets are kept in the host component (RFC 3986
+ syntax for IPv6 literals).
+
+ Also exercise the bracketed scp short forms that t5601-clone.sh
+ covers via parse_connect_url:
+
+ [host]:path
+ [host:port]:path
+ [::1]:repo
+ user@[::1]:repo
+ user@[host:port]:path
+
+ In scp form, brackets are kept for IPv6 literals (two or more inner
+ colons) and stripped for plain hostnames or host:port pairs.
+
+ Suggested-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
+ ## t/meson.build ##
+@@ t/meson.build: integration_tests = [
+ 't9901-git-web--browse.sh',
+ 't9902-completion.sh',
+ 't9903-bash-prompt.sh',
++ 't9904-url-parse.sh',
+ ]
+
+ benchmarks = [
+
## t/t9904-url-parse.sh (new) ##
@@
+#!/bin/sh
+#
-+# Copyright © 2024 Matheus Afonso Martins Moreira
++# Copyright (c) 2024 Matheus Afonso Martins Moreira
+#
+
+test_description='git url-parse tests'
@@ t/t9904-url-parse.sh (new)
+
+test_expect_success 'git url-parse -- file urls' '
+ git url-parse "file:///repository/path" &&
-+ git url-parse "file:///" &&
+ git url-parse "file://"
+'
+
-+test_expect_success 'git url-parse -c protocol -- ssh syntax' '
-+ test ssh = "$(git url-parse -c protocol "ssh://user@example.com:1234/repository/path")" &&
-+ test ssh = "$(git url-parse -c protocol "ssh://user@example.com/repository/path")" &&
-+ test ssh = "$(git url-parse -c protocol "ssh://example.com:1234/repository/path")" &&
-+ test ssh = "$(git url-parse -c protocol "ssh://example.com/repository/path")"
++test_expect_success 'git url-parse -c scheme -- ssh syntax' '
++ test ssh = "$(git url-parse -c scheme "ssh://user@example.com:1234/repository/path")" &&
++ test ssh = "$(git url-parse -c scheme "ssh://user@example.com/repository/path")" &&
++ test ssh = "$(git url-parse -c scheme "ssh://example.com:1234/repository/path")" &&
++ test ssh = "$(git url-parse -c scheme "ssh://example.com/repository/path")"
+'
+
-+test_expect_success 'git url-parse -c protocol -- git syntax' '
-+ test git = "$(git url-parse -c protocol "git://example.com:1234/repository/path")" &&
-+ test git = "$(git url-parse -c protocol "git://example.com/repository/path")"
++test_expect_success 'git url-parse -c scheme -- git syntax' '
++ test git = "$(git url-parse -c scheme "git://example.com:1234/repository/path")" &&
++ test git = "$(git url-parse -c scheme "git://example.com/repository/path")"
+'
+
-+test_expect_success 'git url-parse -c protocol -- http syntax' '
-+ test https = "$(git url-parse -c protocol "https://example.com:1234/repository/path")" &&
-+ test https = "$(git url-parse -c protocol "https://example.com/repository/path")" &&
-+ test http = "$(git url-parse -c protocol "http://example.com:1234/repository/path")" &&
-+ test http = "$(git url-parse -c protocol "http://example.com/repository/path")"
++test_expect_success 'git url-parse -c scheme -- http syntax' '
++ test https = "$(git url-parse -c scheme "https://example.com:1234/repository/path")" &&
++ test https = "$(git url-parse -c scheme "https://example.com/repository/path")" &&
++ test http = "$(git url-parse -c scheme "http://example.com:1234/repository/path")" &&
++ test http = "$(git url-parse -c scheme "http://example.com/repository/path")"
+'
+
-+test_expect_success 'git url-parse -c protocol -- scp syntax' '
-+ test ssh = "$(git url-parse -c protocol "user@example.com:/repository/path")" &&
-+ test ssh = "$(git url-parse -c protocol "example.com:/repository/path")"
++test_expect_success 'git url-parse -c scheme -- scp syntax' '
++ test ssh = "$(git url-parse -c scheme "user@example.com:/repository/path")" &&
++ test ssh = "$(git url-parse -c scheme "example.com:/repository/path")"
+'
+
+test_expect_success 'git url-parse -c user -- ssh syntax' '
@@ t/t9904-url-parse.sh (new)
+ test "" = "$(git url-parse -c user "example.com:/repository/path")"
+'
+
++test_expect_success 'git url-parse -c password -- http syntax' '
++ test secret = "$(git url-parse -c password "https://user:secret@example.com:1234/repository/path")" &&
++ test secret = "$(git url-parse -c password "http://user:secret@example.com/repository/path")" &&
++ test "" = "$(git url-parse -c password "https://user@example.com/repository/path")" &&
++ test "" = "$(git url-parse -c password "https://example.com/repository/path")"
++'
++
+test_expect_success 'git url-parse -c host -- ssh syntax' '
+ test example.com = "$(git url-parse -c host "ssh://user@example.com:1234/repository/path")" &&
+ test example.com = "$(git url-parse -c host "ssh://user@example.com/repository/path")" &&
@@ t/t9904-url-parse.sh (new)
+ test "~user/repository" = "$(git url-parse -c path "example.com:~user/repository")"
+'
+
++test_expect_success 'git url-parse -c path -- username expansion strips query and fragment' '
++ test "~user/repository" = "$(git url-parse -c path "ssh://example.com/~user/repository?query")" &&
++ test "~user/repository" = "$(git url-parse -c path "ssh://example.com/~user/repository#fragment")" &&
++ test "~user/repository" = "$(git url-parse -c path "git://example.com/~user/repository?query")" &&
++ test "~user/repository" = "$(git url-parse -c path "user@example.com:~user/repository?query")"
++'
++
++test_expect_success 'git url-parse -- ssh syntax with IPv6' '
++ git url-parse "ssh://user@[::1]:1234/repository/path" &&
++ git url-parse "ssh://user@[::1]/repository/path" &&
++ git url-parse "ssh://[::1]:1234/repository/path" &&
++ git url-parse "ssh://[::1]/repository/path" &&
++ git url-parse "ssh://[2001:db8::1]/repository/path"
++'
++
++test_expect_success 'git url-parse -- git syntax with IPv6' '
++ git url-parse "git://[::1]:9418/repository/path" &&
++ git url-parse "git://[::1]/repository/path"
++'
++
++test_expect_success 'git url-parse -- http syntax with IPv6' '
++ git url-parse "https://[::1]:1234/repository/path" &&
++ git url-parse "https://[::1]/repository/path" &&
++ git url-parse "http://[2001:db8::1]/repository/path"
++'
++
++test_expect_success 'git url-parse -c host -- IPv6 in URL form' '
++ test "[::1]" = "$(git url-parse -c host "ssh://user@[::1]:1234/repository/path")" &&
++ test "[::1]" = "$(git url-parse -c host "ssh://[::1]/repository/path")" &&
++ test "[2001:db8::1]" = "$(git url-parse -c host "ssh://[2001:db8::1]/repository/path")" &&
++ test "[::1]" = "$(git url-parse -c host "git://[::1]/repository/path")" &&
++ test "[2001:db8::1]" = "$(git url-parse -c host "https://[2001:db8::1]/repository/path")"
++'
++
++test_expect_success 'git url-parse -c port -- IPv6 in URL form' '
++ test 1234 = "$(git url-parse -c port "ssh://user@[::1]:1234/repository/path")" &&
++ test "" = "$(git url-parse -c port "ssh://[::1]/repository/path")" &&
++ test 9418 = "$(git url-parse -c port "git://[::1]:9418/repository/path")"
++'
++
++test_expect_success 'git url-parse -- scp syntax with IPv6' '
++ git url-parse "[::1]:repository/path" &&
++ git url-parse "user@[::1]:repository/path" &&
++ git url-parse "[2001:db8::1]:repo"
++'
++
++test_expect_success 'git url-parse -- scp syntax with bracketed hostname' '
++ git url-parse "[myhost]:src" &&
++ git url-parse "user@[myhost]:src"
++'
++
++test_expect_success 'git url-parse -- scp syntax with bracketed host:port' '
++ git url-parse "[myhost:123]:src" &&
++ git url-parse "user@[myhost:123]:src"
++'
++
++test_expect_success 'git url-parse -c host -- scp+IPv6' '
++ test "[::1]" = "$(git url-parse -c host "[::1]:repository/path")" &&
++ test "[::1]" = "$(git url-parse -c host "user@[::1]:repository/path")" &&
++ test "[2001:db8::1]" = "$(git url-parse -c host "[2001:db8::1]:repo")"
++'
++
++test_expect_success 'git url-parse -c path -- scp+IPv6' '
++ test "/repository/path" = "$(git url-parse -c path "[::1]:/repository/path")" &&
++ test "/repository/path" = "$(git url-parse -c path "[::1]:repository/path")" &&
++ test "/repo" = "$(git url-parse -c path "[2001:db8::1]:repo")"
++'
++
++test_expect_success 'git url-parse -c host,port,path -- scp [host:port]:src' '
++ test myhost = "$(git url-parse -c host "[myhost:123]:src")" &&
++ test 123 = "$(git url-parse -c port "[myhost:123]:src")" &&
++ test "/src" = "$(git url-parse -c path "[myhost:123]:src")"
++'
++
++test_expect_success 'git url-parse -c host,path -- scp [host]:src' '
++ test myhost = "$(git url-parse -c host "[myhost]:src")" &&
++ test "/src" = "$(git url-parse -c path "[myhost]:src")"
++'
++
++test_expect_success 'git url-parse -c user -- scp with user@ and brackets' '
++ test user = "$(git url-parse -c user "user@[::1]:repo")" &&
++ test user = "$(git url-parse -c user "user@[myhost:123]:src")" &&
++ test user = "$(git url-parse -c user "user@[myhost]:src")"
++'
++
++test_expect_success 'git url-parse -- scp+IPv6 with username expansion' '
++ test "~user/repo" = "$(git url-parse -c path "[::1]:~user/repo")" &&
++ test "~user/repo" = "$(git url-parse -c path "user@[::1]:~user/repo")"
++'
++
++test_expect_success 'git url-parse fails on invalid URL' '
++ test_must_fail git url-parse "not a url"
++'
++
++test_expect_success 'git url-parse helpful error for absolute local path' '
++ test_must_fail git url-parse "/abs/path" 2>err &&
++ test_grep "is not a URL" err &&
++ test_grep "file:///abs/path" err
++'
++
++test_expect_success 'git url-parse helpful error for relative local path' '
++ test_must_fail git url-parse "./rel" 2>err &&
++ test_grep "is not a URL" err &&
++ test_grep "absolute path" err
++'
++
++test_expect_success 'git url-parse fails on unknown -c component name' '
++ test_must_fail git url-parse -c bogus "https://example.com/repo"
++'
++
++test_expect_success 'git url-parse fails on URL missing host' '
++ test_must_fail git url-parse "https://"
++'
++
++test_expect_success 'git url-parse with no URL prints usage' '
++ test_must_fail git url-parse 2>err &&
++ test_grep "usage:" err
++'
++
+test_done
--
gitgitgadget
* [PATCH v2 1/8] connect: rename enum protocol to url_scheme
2026-05-01 23:15 ` [PATCH v2 0/8] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
@ 2026-05-01 23:15 ` Matheus Afonso Martins Moreira via GitGitGadget
2026-05-01 23:15 ` [PATCH v2 2/8] url: move url_is_local_not_ssh to url.h Matheus Afonso Martins Moreira via GitGitGadget
` (7 subsequent siblings)
8 siblings, 0 replies; 44+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2026-05-01 23:15 UTC (permalink / raw)
To: git
Cc: Torsten Bögershausen, Ghanshyam Thakkar, Matheus Moreira,
Matheus Afonso Martins Moreira
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
RFC 1738 names the part of a URL before the colon a "scheme".
connect.c calls it "protocol", which is more generic
and collides with the unrelated enum protocol_version.
Rename:
enum protocol -> enum url_scheme
PROTO_* -> URL_SCHEME_*
prot_name -> url_scheme_name
get_protocol -> url_get_scheme
The local variables in parse_connect_url and git_connect
are renamed accordingly, from protocol to scheme.
No behavior change. The user-visible diagnostics
and translated error messages are preserved:
"Diag: protocol=..."
"protocol '%s' is not supported"
"unknown protocol"
This rename also prepares for moving the scheme-detection functions
to a shared header so that a future plumbing command can parse URLs
using the same logic as the connect path.
Suggested-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
connect.c | 68 +++++++++++++++++++++++++++----------------------------
1 file changed, 34 insertions(+), 34 deletions(-)
diff --git a/connect.c b/connect.c
index fcd35c5539..46da89905e 100644
--- a/connect.c
+++ b/connect.c
@@ -700,11 +700,11 @@ int server_supports(const char *feature)
return !!server_feature_value(feature, NULL);
}
-enum protocol {
- PROTO_LOCAL = 1,
- PROTO_FILE,
- PROTO_SSH,
- PROTO_GIT
+enum url_scheme {
+ URL_SCHEME_LOCAL = 1,
+ URL_SCHEME_FILE,
+ URL_SCHEME_SSH,
+ URL_SCHEME_GIT
};
int url_is_local_not_ssh(const char *url)
@@ -715,33 +715,33 @@ int url_is_local_not_ssh(const char *url)
(has_dos_drive_prefix(url) && is_valid_path(url));
}
-static const char *prot_name(enum protocol protocol)
+static const char *url_scheme_name(enum url_scheme scheme)
{
- switch (protocol) {
- case PROTO_LOCAL:
- case PROTO_FILE:
+ switch (scheme) {
+ case URL_SCHEME_LOCAL:
+ case URL_SCHEME_FILE:
return "file";
- case PROTO_SSH:
+ case URL_SCHEME_SSH:
return "ssh";
- case PROTO_GIT:
+ case URL_SCHEME_GIT:
return "git";
default:
return "unknown protocol";
}
}
-static enum protocol get_protocol(const char *name)
+static enum url_scheme url_get_scheme(const char *name)
{
if (!strcmp(name, "ssh"))
- return PROTO_SSH;
+ return URL_SCHEME_SSH;
if (!strcmp(name, "git"))
- return PROTO_GIT;
+ return URL_SCHEME_GIT;
if (!strcmp(name, "git+ssh")) /* deprecated - do not use */
- return PROTO_SSH;
+ return URL_SCHEME_SSH;
if (!strcmp(name, "ssh+git")) /* deprecated - do not use */
- return PROTO_SSH;
+ return URL_SCHEME_SSH;
if (!strcmp(name, "file"))
- return PROTO_FILE;
+ return URL_SCHEME_FILE;
die(_("protocol '%s' is not supported"), name);
}
@@ -1083,14 +1083,14 @@ static char *get_port(char *host)
* Extract protocol and relevant parts from the specified connection URL.
* The caller must free() the returned strings.
*/
-static enum protocol parse_connect_url(const char *url_orig, char **ret_host,
- char **ret_path)
+static enum url_scheme parse_connect_url(const char *url_orig, char **ret_host,
+ char **ret_path)
{
char *url;
char *host, *path;
char *end;
int separator = '/';
- enum protocol protocol = PROTO_LOCAL;
+ enum url_scheme scheme = URL_SCHEME_LOCAL;
if (is_url(url_orig))
url = url_decode(url_orig);
@@ -1100,12 +1100,12 @@ static enum protocol parse_connect_url(const char *url_orig, char **ret_host,
host = strstr(url, "://");
if (host) {
*host = '\0';
- protocol = get_protocol(url);
+ scheme = url_get_scheme(url);
host += 3;
} else {
host = url;
if (!url_is_local_not_ssh(url)) {
- protocol = PROTO_SSH;
+ scheme = URL_SCHEME_SSH;
separator = ':';
}
}
@@ -1116,13 +1116,13 @@ static enum protocol parse_connect_url(const char *url_orig, char **ret_host,
*/
end = host_end(&host, 0);
- if (protocol == PROTO_LOCAL)
+ if (scheme == URL_SCHEME_LOCAL)
path = end;
- else if (protocol == PROTO_FILE && *host != '/' &&
+ else if (scheme == URL_SCHEME_FILE && *host != '/' &&
!has_dos_drive_prefix(host) &&
offset_1st_component(host - 2) > 1)
path = host - 2; /* include the leading "//" */
- else if (protocol == PROTO_FILE && has_dos_drive_prefix(end))
+ else if (scheme == URL_SCHEME_FILE && has_dos_drive_prefix(end))
path = end; /* "file://$(pwd)" may be "file://C:/projects/repo" */
else
path = strchr(end, separator);
@@ -1138,7 +1138,7 @@ static enum protocol parse_connect_url(const char *url_orig, char **ret_host,
end = path; /* Need to \0 terminate host here */
if (separator == ':')
path++; /* path starts after ':' */
- if (protocol == PROTO_GIT || protocol == PROTO_SSH) {
+ if (scheme == URL_SCHEME_GIT || scheme == URL_SCHEME_SSH) {
if (path[1] == '~')
path++;
}
@@ -1149,7 +1149,7 @@ static enum protocol parse_connect_url(const char *url_orig, char **ret_host,
*ret_host = xstrdup(host);
*ret_path = path;
free(url);
- return protocol;
+ return scheme;
}
static const char *get_ssh_command(void)
@@ -1434,7 +1434,7 @@ struct child_process *git_connect(int fd[2], const char *url,
{
char *hostandport, *path;
struct child_process *conn;
- enum protocol protocol;
+ enum url_scheme scheme;
enum protocol_version version = get_protocol_version_config();
/*
@@ -1451,14 +1451,14 @@ struct child_process *git_connect(int fd[2], const char *url,
*/
signal(SIGCHLD, SIG_DFL);
- protocol = parse_connect_url(url, &hostandport, &path);
- if ((flags & CONNECT_DIAG_URL) && (protocol != PROTO_SSH)) {
+ scheme = parse_connect_url(url, &hostandport, &path);
+ if ((flags & CONNECT_DIAG_URL) && (scheme != URL_SCHEME_SSH)) {
printf("Diag: url=%s\n", url ? url : "NULL");
- printf("Diag: protocol=%s\n", prot_name(protocol));
+ printf("Diag: protocol=%s\n", url_scheme_name(scheme));
printf("Diag: hostandport=%s\n", hostandport ? hostandport : "NULL");
printf("Diag: path=%s\n", path ? path : "NULL");
conn = NULL;
- } else if (protocol == PROTO_GIT) {
+ } else if (scheme == URL_SCHEME_GIT) {
conn = git_connect_git(fd, hostandport, path, prog, version, flags);
conn->trace2_child_class = "transport/git";
} else {
@@ -1481,7 +1481,7 @@ struct child_process *git_connect(int fd[2], const char *url,
conn->use_shell = 1;
conn->in = conn->out = -1;
- if (protocol == PROTO_SSH) {
+ if (scheme == URL_SCHEME_SSH) {
char *ssh_host = hostandport;
const char *port = NULL;
transport_check_allowed("ssh");
@@ -1492,7 +1492,7 @@ struct child_process *git_connect(int fd[2], const char *url,
if (flags & CONNECT_DIAG_URL) {
printf("Diag: url=%s\n", url ? url : "NULL");
- printf("Diag: protocol=%s\n", prot_name(protocol));
+ printf("Diag: protocol=%s\n", url_scheme_name(scheme));
printf("Diag: userandhost=%s\n", ssh_host ? ssh_host : "NULL");
printf("Diag: port=%s\n", port ? port : "NONE");
printf("Diag: path=%s\n", path ? path : "NULL");
--
gitgitgadget
* [PATCH v2 2/8] url: move url_is_local_not_ssh to url.h
2026-05-01 23:15 ` [PATCH v2 0/8] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
2026-05-01 23:15 ` [PATCH v2 1/8] connect: rename enum protocol to url_scheme Matheus Afonso Martins Moreira via GitGitGadget
@ 2026-05-01 23:15 ` Matheus Afonso Martins Moreira via GitGitGadget
2026-05-01 23:15 ` [PATCH v2 3/8] url: move scheme detection to URL header/source Matheus Afonso Martins Moreira via GitGitGadget
` (6 subsequent siblings)
8 siblings, 0 replies; 44+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2026-05-01 23:15 UTC (permalink / raw)
To: git
Cc: Torsten Bögershausen, Ghanshyam Thakkar, Matheus Moreira,
Matheus Afonso Martins Moreira
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
Move url_is_local_not_ssh from connect.c/connect.h
to url.c/url.h so that the new url_parse function
in urlmatch.c, and any future code that needs to
distinguish a local path from an scp style SSH URL,
can reuse the heuristic without depending on connect.c.
No behavior change.
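
For readers unfamiliar with the heuristic being moved, a minimal standalone sketch follows. It is not the function from url.c: the name looks_local_not_ssh is hypothetical, and the DOS drive prefix branch (has_dos_drive_prefix and is_valid_path) is deliberately left out, so it only models POSIX-style input.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Simplified sketch of the url_is_local_not_ssh() heuristic.
 * Assumption: the DOS drive prefix check is omitted, and the
 * function name is made up for this illustration.
 */
bool looks_local_not_ssh(const char *url)
{
	const char *colon = strchr(url, ':');
	const char *slash = strchr(url, '/');

	/* No colon at all: cannot be "host:path", so treat it as a path. */
	if (!colon)
		return true;

	/* A slash before the first colon ("./a:b", "/srv/a:b") is a path. */
	return slash && slash < colon;
}
```

Under these assumptions "/abs/path", "./rel" and "repo" are classified as local, while "host:path" and "user@host:~user/repo" are treated as scp style SSH URLs.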
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
connect.c | 8 --------
connect.h | 1 -
remote.c | 1 +
url.c | 8 ++++++++
url.h | 2 ++
5 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/connect.c b/connect.c
index 46da89905e..cb145de30e 100644
--- a/connect.c
+++ b/connect.c
@@ -707,14 +707,6 @@ enum url_scheme {
URL_SCHEME_GIT
};
-int url_is_local_not_ssh(const char *url)
-{
- const char *colon = strchr(url, ':');
- const char *slash = strchr(url, '/');
- return !colon || (slash && slash < colon) ||
- (has_dos_drive_prefix(url) && is_valid_path(url));
-}
-
static const char *url_scheme_name(enum url_scheme scheme)
{
switch (scheme) {
diff --git a/connect.h b/connect.h
index 1645126c17..8d84f6656b 100644
--- a/connect.h
+++ b/connect.h
@@ -13,7 +13,6 @@ int git_connection_is_socket(struct child_process *conn);
int server_supports(const char *feature);
int parse_feature_request(const char *features, const char *feature);
const char *server_feature_value(const char *feature, size_t *len_ret);
-int url_is_local_not_ssh(const char *url);
struct packet_reader;
enum protocol_version discover_version(struct packet_reader *reader);
diff --git a/remote.c b/remote.c
index a664cd166a..24a8118d25 100644
--- a/remote.c
+++ b/remote.c
@@ -8,6 +8,7 @@
#include "gettext.h"
#include "hex.h"
#include "remote.h"
+#include "url.h"
#include "urlmatch.h"
#include "refs.h"
#include "refspec.h"
diff --git a/url.c b/url.c
index 3ca5987e90..057576042a 100644
--- a/url.c
+++ b/url.c
@@ -132,3 +132,11 @@ void str_end_url_with_slash(const char *url, char **dest)
free(*dest);
*dest = strbuf_detach(&buf, NULL);
}
+
+int url_is_local_not_ssh(const char *url)
+{
+ const char *colon = strchr(url, ':');
+ const char *slash = strchr(url, '/');
+ return !colon || (slash && slash < colon) ||
+ (has_dos_drive_prefix(url) && is_valid_path(url));
+}
diff --git a/url.h b/url.h
index cd9140e994..39d621312f 100644
--- a/url.h
+++ b/url.h
@@ -21,6 +21,8 @@ char *url_decode_parameter_value(const char **query);
void end_url_with_slash(struct strbuf *buf, const char *url);
void str_end_url_with_slash(const char *url, char **dest);
+int url_is_local_not_ssh(const char *url);
+
/*
* The set of unreserved characters as per STD66 (RFC3986) is
* '[A-Za-z0-9-._~]'. These characters are safe to appear in URI
--
gitgitgadget
* [PATCH v2 3/8] url: move scheme detection to URL header/source
2026-05-01 23:15 ` [PATCH v2 0/8] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
2026-05-01 23:15 ` [PATCH v2 1/8] connect: rename enum protocol to url_scheme Matheus Afonso Martins Moreira via GitGitGadget
2026-05-01 23:15 ` [PATCH v2 2/8] url: move url_is_local_not_ssh to url.h Matheus Afonso Martins Moreira via GitGitGadget
@ 2026-05-01 23:15 ` Matheus Afonso Martins Moreira via GitGitGadget
2026-05-01 23:15 ` [PATCH v2 4/8] url: return URL_SCHEME_UNKNOWN instead of dying Matheus Afonso Martins Moreira via GitGitGadget
` (5 subsequent siblings)
8 siblings, 0 replies; 44+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2026-05-01 23:15 UTC (permalink / raw)
To: git
Cc: Torsten Bögershausen, Ghanshyam Thakkar, Matheus Moreira,
Matheus Afonso Martins Moreira
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
Move enum url_scheme and url_get_scheme()
from connect.c to url.h and url.c
so that other code can identify
a URL's scheme without depending
on connect.c.
No behavior change. url_get_scheme() still dies
on an unrecognized scheme name, with the same
translated message as before.
url_scheme_name() stays in connect.c
because it has no other callers.
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
connect.c | 22 ----------------------
url.c | 16 ++++++++++++++++
url.h | 13 +++++++++++++
3 files changed, 29 insertions(+), 22 deletions(-)
diff --git a/connect.c b/connect.c
index cb145de30e..1ac7acc6e8 100644
--- a/connect.c
+++ b/connect.c
@@ -700,13 +700,6 @@ int server_supports(const char *feature)
return !!server_feature_value(feature, NULL);
}
-enum url_scheme {
- URL_SCHEME_LOCAL = 1,
- URL_SCHEME_FILE,
- URL_SCHEME_SSH,
- URL_SCHEME_GIT
-};
-
static const char *url_scheme_name(enum url_scheme scheme)
{
switch (scheme) {
@@ -722,21 +715,6 @@ static const char *url_scheme_name(enum url_scheme scheme)
}
}
-static enum url_scheme url_get_scheme(const char *name)
-{
- if (!strcmp(name, "ssh"))
- return URL_SCHEME_SSH;
- if (!strcmp(name, "git"))
- return URL_SCHEME_GIT;
- if (!strcmp(name, "git+ssh")) /* deprecated - do not use */
- return URL_SCHEME_SSH;
- if (!strcmp(name, "ssh+git")) /* deprecated - do not use */
- return URL_SCHEME_SSH;
- if (!strcmp(name, "file"))
- return URL_SCHEME_FILE;
- die(_("protocol '%s' is not supported"), name);
-}
-
static char *host_end(char **hoststart, int removebrackets)
{
char *host = *hoststart;
diff --git a/url.c b/url.c
index 057576042a..300acf98fe 100644
--- a/url.c
+++ b/url.c
@@ -1,4 +1,5 @@
#include "git-compat-util.h"
+#include "gettext.h"
#include "hex-ll.h"
#include "strbuf.h"
#include "url.h"
@@ -140,3 +141,18 @@ int url_is_local_not_ssh(const char *url)
return !colon || (slash && slash < colon) ||
(has_dos_drive_prefix(url) && is_valid_path(url));
}
+
+enum url_scheme url_get_scheme(const char *name)
+{
+ if (!strcmp(name, "ssh"))
+ return URL_SCHEME_SSH;
+ if (!strcmp(name, "git"))
+ return URL_SCHEME_GIT;
+ if (!strcmp(name, "git+ssh")) /* deprecated - do not use */
+ return URL_SCHEME_SSH;
+ if (!strcmp(name, "ssh+git")) /* deprecated - do not use */
+ return URL_SCHEME_SSH;
+ if (!strcmp(name, "file"))
+ return URL_SCHEME_FILE;
+ die(_("protocol '%s' is not supported"), name);
+}
diff --git a/url.h b/url.h
index 39d621312f..24c8cd91d0 100644
--- a/url.h
+++ b/url.h
@@ -23,6 +23,19 @@ void str_end_url_with_slash(const char *url, char **dest);
int url_is_local_not_ssh(const char *url);
+enum url_scheme {
+ URL_SCHEME_LOCAL = 1,
+ URL_SCHEME_FILE,
+ URL_SCHEME_SSH,
+ URL_SCHEME_GIT,
+};
+
+/*
+ * Identify the URL scheme by name. Dies if the name does not match
+ * any scheme that Git knows about.
+ */
+enum url_scheme url_get_scheme(const char *name);
+
/*
* The set of unreserved characters as per STD66 (RFC3986) is
* '[A-Za-z0-9-._~]'. These characters are safe to appear in URI
--
gitgitgadget
* [PATCH v2 4/8] url: return URL_SCHEME_UNKNOWN instead of dying
2026-05-01 23:15 ` [PATCH v2 0/8] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
` (2 preceding siblings ...)
2026-05-01 23:15 ` [PATCH v2 3/8] url: move scheme detection to URL header/source Matheus Afonso Martins Moreira via GitGitGadget
@ 2026-05-01 23:15 ` Matheus Afonso Martins Moreira via GitGitGadget
2026-05-01 23:15 ` [PATCH v2 5/8] urlmatch: define url_parse function Matheus Afonso Martins Moreira via GitGitGadget
` (4 subsequent siblings)
8 siblings, 0 replies; 44+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2026-05-01 23:15 UTC (permalink / raw)
To: git
Cc: Torsten Bögershausen, Ghanshyam Thakkar, Matheus Moreira,
Matheus Afonso Martins Moreira
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
Add a URL_SCHEME_UNKNOWN enumerator with value 0.
Have url_get_scheme() return it for unrecognized
schemes instead of calling die() itself.
Move the die() call to parse_connect_url()
where url_get_scheme() is used.
This lets url_get_scheme() be used from contexts
that need to identify a URL's scheme without aborting
the program, such as a future plumbing command
that validates URLs.
No external behavior change. parse_connect_url() still dies
with the same translated message for unrecognized schemes.
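
To illustrate the new contract, here is a standalone sketch of a caller that reports an unknown scheme instead of aborting. The enum and the lookup mirror what this patch puts in url.h and url.c; check_scheme is a hypothetical caller invented for this example and is not part of the series.

```c
#include <string.h>

/* Mirrors the enum in url.h as of this patch. */
enum url_scheme {
	URL_SCHEME_UNKNOWN = 0,
	URL_SCHEME_LOCAL,
	URL_SCHEME_FILE,
	URL_SCHEME_SSH,
	URL_SCHEME_GIT,
};

/* Same lookup as url_get_scheme(): unknown names map to 0, no die(). */
enum url_scheme url_get_scheme(const char *name)
{
	if (!strcmp(name, "ssh"))
		return URL_SCHEME_SSH;
	if (!strcmp(name, "git"))
		return URL_SCHEME_GIT;
	if (!strcmp(name, "git+ssh")) /* deprecated */
		return URL_SCHEME_SSH;
	if (!strcmp(name, "ssh+git")) /* deprecated */
		return URL_SCHEME_SSH;
	if (!strcmp(name, "file"))
		return URL_SCHEME_FILE;
	return URL_SCHEME_UNKNOWN;
}

/*
 * Hypothetical validating caller: returns 0 for a known scheme
 * and -1 otherwise, leaving error handling to its own caller.
 */
int check_scheme(const char *name)
{
	return url_get_scheme(name) == URL_SCHEME_UNKNOWN ? -1 : 0;
}
```

Giving the unknown value 0 also means a zero-initialized enum url_scheme starts out as "unknown" rather than as a valid scheme.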
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
connect.c | 2 ++
url.c | 3 +--
url.h | 7 ++++---
3 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/connect.c b/connect.c
index 1ac7acc6e8..73d7a6b8d0 100644
--- a/connect.c
+++ b/connect.c
@@ -1071,6 +1071,8 @@ static enum url_scheme parse_connect_url(const char *url_orig, char **ret_host,
if (host) {
*host = '\0';
scheme = url_get_scheme(url);
+ if (scheme == URL_SCHEME_UNKNOWN)
+ die(_("protocol '%s' is not supported"), url);
host += 3;
} else {
host = url;
diff --git a/url.c b/url.c
index 300acf98fe..a59818278f 100644
--- a/url.c
+++ b/url.c
@@ -1,5 +1,4 @@
#include "git-compat-util.h"
-#include "gettext.h"
#include "hex-ll.h"
#include "strbuf.h"
#include "url.h"
@@ -154,5 +153,5 @@ enum url_scheme url_get_scheme(const char *name)
return URL_SCHEME_SSH;
if (!strcmp(name, "file"))
return URL_SCHEME_FILE;
- die(_("protocol '%s' is not supported"), name);
+ return URL_SCHEME_UNKNOWN;
}
diff --git a/url.h b/url.h
index 24c8cd91d0..7289523605 100644
--- a/url.h
+++ b/url.h
@@ -24,15 +24,16 @@ void str_end_url_with_slash(const char *url, char **dest);
int url_is_local_not_ssh(const char *url);
enum url_scheme {
- URL_SCHEME_LOCAL = 1,
+ URL_SCHEME_UNKNOWN = 0,
+ URL_SCHEME_LOCAL,
URL_SCHEME_FILE,
URL_SCHEME_SSH,
URL_SCHEME_GIT,
};
/*
- * Identify the URL scheme by name. Dies if the name does not match
- * any scheme that Git knows about.
+ * Identify the URL scheme by name. Returns URL_SCHEME_UNKNOWN
+ * if the name does not match any scheme that Git knows about.
*/
enum url_scheme url_get_scheme(const char *name);
--
gitgitgadget
* [PATCH v2 5/8] urlmatch: define url_parse function
2026-05-01 23:15 ` [PATCH v2 0/8] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
` (3 preceding siblings ...)
2026-05-01 23:15 ` [PATCH v2 4/8] url: return URL_SCHEME_UNKNOWN instead of dying Matheus Afonso Martins Moreira via GitGitGadget
@ 2026-05-01 23:15 ` Matheus Afonso Martins Moreira via GitGitGadget
2026-05-01 23:15 ` [PATCH v2 6/8] builtin: create url-parse command Matheus Afonso Martins Moreira via GitGitGadget
` (3 subsequent siblings)
8 siblings, 0 replies; 44+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2026-05-01 23:15 UTC (permalink / raw)
To: git
Cc: Torsten Bögershausen, Ghanshyam Thakkar, Matheus Moreira,
Matheus Afonso Martins Moreira
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
Define url_parse, a general parsing function that supports all Git URLs
including scp style URLs such as hostname:~user/repo.
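
As a rough illustration of the scp style rewrite, the sketch below turns "[user@]host:path" into "ssh://[user@]host/path". It is a simplification under stated assumptions: the function name scp_to_ssh_url is hypothetical, bracketed hosts such as "[::1]:repo" are not handled, and plain malloc is used in place of git's strbuf.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Rewrite "[user@]host:path" as "ssh://[user@]host/path".
 * Assumption: no bracketed hosts. Returns NULL if no ':' separator
 * is found or on allocation failure. The caller frees the result.
 */
char *scp_to_ssh_url(const char *scp)
{
	const char *at = strchr(scp, '@');
	/* Skip over "user@" before looking for the path separator. */
	const char *colon = strchr(at ? at + 1 : scp, ':');
	const char *path;
	size_t host_len, size;
	char *out;

	if (!colon)
		return NULL;

	host_len = (size_t)(colon - scp);
	path = colon + 1;
	if (*path == '/')
		path++; /* "host:/abs" and "host:abs" both become "host/abs" */

	size = strlen("ssh://") + host_len + 1 + strlen(path) + 1;
	out = malloc(size);
	if (!out)
		return NULL;
	snprintf(out, size, "ssh://%.*s/%s", (int)host_len, scp, path);
	return out;
}
```

With this sketch, "host:path" becomes "ssh://host/path" and "user@host:~user/repo" becomes "ssh://user@host/~user/repo", which a scheme://host parser can then process.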
It is adapted from the algorithm in connect.c's parse_connect_url
and reuses the shared enum url_scheme and url_get_scheme function
that previous commits made available in url.h. The new parser and
the connect path agree on scheme classification. url_parse has the
same interface as url_normalize and uses the same data structures.
url_parse and parse_connect_url accept the same URL forms with one deliberate
exception. Bare local paths such as "/abs/path", "./rel"
or "repo" are accepted by parse_connect_url as URL_SCHEME_LOCAL,
but rejected by url_parse because url_normalize requires a URL
with a scheme://host form. A consumer that wants to handle both
URLs and local paths needs to dispatch on url_is_local_not_ssh
before calling url_parse, just as the connect path does internally.
The duplication with parse_connect_url is intentional.
The two functions have different contracts:
- parse_connect_url
Calls die() on an unknown scheme
and returns NUL-terminated host/path
strings for the connect path
- url_parse
Returns NULL on failure while populating
out_info->err, and exposes components
as offset/length pairs into the normalized
URL buffer, matching url_normalize.
Reconciling both is possible, but not in the scope
of the current patch set.
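
The offset/length representation can be sketched as follows. This is an illustration only: struct mini_url_info is a hypothetical, reduced stand-in for struct url_info, copy_component and point_path_at_tilde are made-up helper names, and the length guard in the tilde adjustment is an addition that url_parse itself does not have.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical, reduced stand-in for git's struct url_info:
 * only the fields this sketch needs.
 */
struct mini_url_info {
	const char *url; /* normalized URL buffer */
	size_t host_off, host_len;
	size_t path_off, path_len;
};

/* Copy one component out of the buffer; the caller frees the result. */
char *copy_component(const char *buf, size_t off, size_t len)
{
	char *out = malloc(len + 1);

	if (!out)
		return NULL;
	memcpy(out, buf + off, len);
	out[len] = '\0';
	return out;
}

/*
 * Adjustment modeled on the one url_parse() applies for ssh and git
 * URLs: expose "~user/repo" without its leading '/'. The path_len
 * guard is an extra safety check added for this sketch.
 */
void point_path_at_tilde(struct mini_url_info *info)
{
	if (info->path_len > 1 && info->url[info->path_off + 1] == '~') {
		info->path_off++;
		info->path_len--;
	}
}
```

Because the components are described by offsets into a single buffer, adjusting the path costs two integer updates and no extra allocation. A copy is made only when a caller needs a NUL-terminated string.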
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
t/unit-tests/u-urlmatch-normalization.c | 45 +++++++++
urlmatch.c | 127 ++++++++++++++++++++++++
urlmatch.h | 1 +
3 files changed, 173 insertions(+)
diff --git a/t/unit-tests/u-urlmatch-normalization.c b/t/unit-tests/u-urlmatch-normalization.c
index 39f6e1ba26..3595d893a2 100644
--- a/t/unit-tests/u-urlmatch-normalization.c
+++ b/t/unit-tests/u-urlmatch-normalization.c
@@ -245,3 +245,48 @@ void test_urlmatch_normalization__equivalents(void)
compare_normalized_urls("https://@x.y/^/../abc", "httpS://@x.y:0443/abc", 1);
compare_normalized_urls("https://@x.y/^/..", "httpS://@x.y:0443/", 1);
}
+
+static void check_parsed_path(const char *url, const char *expected_path)
+{
+ struct url_info info;
+ char *parsed = url_parse(url, &info);
+ char *path;
+
+ cl_assert(parsed != NULL);
+ path = xstrndup(parsed + info.path_off, info.path_len);
+ cl_assert_equal_s(path, expected_path);
+ free(path);
+ free(parsed);
+}
+
+void test_urlmatch_normalization__parse_scp(void)
+{
+ check_parsed_path("host:path", "/path");
+ check_parsed_path("user@host:path", "/path");
+ check_parsed_path("host:~user/repo", "~user/repo");
+ check_parsed_path("user@host:~user/repo", "~user/repo");
+ check_parsed_path("[host]:src", "/src");
+ check_parsed_path("[host:123]:src", "/src");
+ check_parsed_path("[::1]:repo", "/repo");
+ check_parsed_path("user@[::1]:repo", "/repo");
+}
+
+void test_urlmatch_normalization__parse_url_form(void)
+{
+ check_parsed_path("ssh://host/repo", "/repo");
+ check_parsed_path("ssh://host/~user/repo", "~user/repo");
+ check_parsed_path("git://host:9418/repo", "/repo");
+ check_parsed_path("git://host/~user/repo", "~user/repo");
+ check_parsed_path("ssh://[::1]:1234/repo", "/repo");
+ check_parsed_path("http://[2001:db8::1]/repo", "/repo");
+}
+
+void test_urlmatch_normalization__parse_strips_query_and_fragment(void)
+{
+ check_parsed_path("ssh://host/~user/repo?q", "~user/repo");
+ check_parsed_path("ssh://host/~user/repo#frag", "~user/repo");
+ check_parsed_path("git://host/~user/repo?q", "~user/repo");
+ check_parsed_path("user@host:~user/repo?q", "~user/repo");
+ check_parsed_path("https://host/repo?q", "/repo");
+ check_parsed_path("https://host/repo#frag", "/repo");
+}
diff --git a/urlmatch.c b/urlmatch.c
index eea8300489..bf8cce6de9 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -5,6 +5,7 @@
#include "hex-ll.h"
#include "strbuf.h"
#include "urlmatch.h"
+#include "url.h"
#define URL_ALPHA "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
#define URL_DIGIT "0123456789"
@@ -440,6 +441,132 @@ char *url_normalize(const char *url, struct url_info *out_info)
return url_normalize_1(url, out_info, 0);
}
+char *url_parse(const char *url_orig, struct url_info *out_info)
+{
+ struct strbuf url;
+ char *host, *separator;
+ char *detached, *normalized;
+ char *url_decoded;
+ enum url_scheme scheme = URL_SCHEME_LOCAL;
+ struct url_info local_info;
+ struct url_info *info = out_info ? out_info : &local_info;
+ bool scp_syntax = false;
+
+ if (is_url(url_orig))
+ url_decoded = url_decode(url_orig);
+ else
+ url_decoded = xstrdup(url_orig);
+
+ strbuf_init(&url, strlen(url_decoded) + sizeof("ssh://"));
+ strbuf_addstr(&url, url_decoded);
+ free(url_decoded);
+
+ host = strstr(url.buf, "://");
+ if (host) {
+ /*
+ * Temporarily NUL-terminate the scheme name
+ * so we can pass it to url_get_scheme(),
+ * then restore the ':' so the buffer
+ * is intact for url_normalize() below.
+ */
+ char saved = *host;
+ *host = '\0';
+ scheme = url_get_scheme(url.buf);
+ *host = saved;
+ host += 3;
+ } else {
+ if (!url_is_local_not_ssh(url.buf)) {
+ scp_syntax = true;
+ scheme = URL_SCHEME_SSH;
+ strbuf_insertstr(&url, 0, "ssh://");
+ host = url.buf + strlen("ssh://");
+ }
+ }
+
+ /*
+ * Path starts after ':' in scp style SSH URLs.
+ *
+ * The host portion can begin with an optional "user@",
+ * and the host itself can be wrapped in '[' ']' brackets.
+ * The bracket form is git's legacy way of supporting:
+ *
+ * - IPv6 literals: [::1]:repo
+ * - host:port pairs in the short form: [myhost:123]:src
+ * - Plain hostnames that happen to need bracketing: [host]:path
+ *
+ * Treat '[' followed by 0 or 1 inner colons as the host:port
+ * or plain hostname form and strip the brackets so url_normalize
+ * sees host[:port] natively. Two or more inner colons mark an
+ * IPv6 literal: keep the brackets for url_normalize to recognize.
+ *
+ * The scp path separator is the ':' that follows the host part,
+ * and we must skip over user@ and any '[...]' before searching.
+ */
+ if (scp_syntax) {
+ char *user_at;
+ char *host_start;
+ char *bracket_end;
+
+ user_at = strchr(host, '@');
+ host_start = user_at ? user_at + 1 : host;
+
+ if (*host_start == '[') {
+ char *p;
+ int inner_colons;
+
+ bracket_end = strchr(host_start, ']');
+ inner_colons = 0;
+ for (p = host_start + 1; bracket_end && p < bracket_end; p++)
+ if (*p == ':')
+ inner_colons++;
+
+ if (bracket_end && inner_colons <= 1) {
+ size_t close_off = bracket_end - url.buf;
+ size_t open_off = host_start - url.buf;
+ strbuf_remove(&url, close_off, 1);
+ strbuf_remove(&url, open_off, 1);
+ separator = url.buf + close_off - 1;
+ } else if (bracket_end) {
+ separator = strchr(bracket_end + 1, ':');
+ } else {
+ separator = strchr(host_start, ':');
+ }
+ } else {
+ separator = strchr(host_start, ':');
+ }
+
+ if (separator) {
+ if (separator[1] == '/')
+ strbuf_remove(&url, separator - url.buf, 1);
+ else
+ *separator = '/';
+ }
+ }
+
+ detached = strbuf_detach(&url, NULL);
+ normalized = url_normalize(detached, info);
+ free(detached);
+
+ if (!normalized)
+ return NULL;
+
+ /*
+ * Point path to ~ for URLs like this:
+ *
+ * ssh://host.xz/~user/repo
+ * git://host.xz/~user/repo
+ * host.xz:~user/repo
+ */
+ if (scheme == URL_SCHEME_GIT || scheme == URL_SCHEME_SSH) {
+ if (normalized[info->path_off + 1] == '~') {
+ info->path_off++;
+ info->path_len--;
+ }
+ }
+
+ return normalized;
+}
+
static size_t url_match_prefix(const char *url,
const char *url_prefix,
size_t url_prefix_len)
diff --git a/urlmatch.h b/urlmatch.h
index 5ba85cea13..6b3ce42858 100644
--- a/urlmatch.h
+++ b/urlmatch.h
@@ -35,6 +35,7 @@ struct url_info {
};
char *url_normalize(const char *, struct url_info *);
+char *url_parse(const char *, struct url_info *);
struct urlmatch_item {
size_t hostmatch_len;
--
gitgitgadget
* [PATCH v2 6/8] builtin: create url-parse command
2026-05-01 23:15 ` [PATCH v2 0/8] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
` (4 preceding siblings ...)
2026-05-01 23:15 ` [PATCH v2 5/8] urlmatch: define url_parse function Matheus Afonso Martins Moreira via GitGitGadget
@ 2026-05-01 23:15 ` Matheus Afonso Martins Moreira via GitGitGadget
2026-05-01 23:15 ` [PATCH v2 7/8] doc: describe the url-parse builtin Matheus Afonso Martins Moreira via GitGitGadget
` (2 subsequent siblings)
8 siblings, 0 replies; 44+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2026-05-01 23:15 UTC (permalink / raw)
To: git
Cc: Torsten Bögershausen, Ghanshyam Thakkar, Matheus Moreira,
Matheus Afonso Martins Moreira
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
Git commands can accept a rather wide variety of URL syntaxes.
The range of accepted inputs might expand even more in the future.
This makes the parsing of URL components difficult since standard URL
parsers cannot be used. Extracting the components of a git URL would
require implementing all the schemes that git itself supports, not to
mention tracking its development continuously in case new URL schemes
are added.
The url-parse builtin command is designed to solve this problem
by exposing git's native URL parsing facilities as a plumbing command.
Other programs can then call upon git itself to parse the git URLs
and extract their components. This should be quite useful for scripts.
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
.gitignore | 1 +
Makefile | 1 +
builtin.h | 1 +
builtin/url-parse.c | 132 ++++++++++++++++++++++++++++++++++++++++++++
command-list.txt | 1 +
git.c | 1 +
meson.build | 1 +
7 files changed, 138 insertions(+)
create mode 100644 builtin/url-parse.c
diff --git a/.gitignore b/.gitignore
index 24635cf2d6..c5673daa6e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -182,6 +182,7 @@
/git-update-server-info
/git-upload-archive
/git-upload-pack
+/git-url-parse
/git-var
/git-verify-commit
/git-verify-pack
diff --git a/Makefile b/Makefile
index cedc234173..1c757a1aa0 100644
--- a/Makefile
+++ b/Makefile
@@ -1497,6 +1497,7 @@ BUILTIN_OBJS += builtin/update-ref.o
BUILTIN_OBJS += builtin/update-server-info.o
BUILTIN_OBJS += builtin/upload-archive.o
BUILTIN_OBJS += builtin/upload-pack.o
+BUILTIN_OBJS += builtin/url-parse.o
BUILTIN_OBJS += builtin/var.o
BUILTIN_OBJS += builtin/verify-commit.o
BUILTIN_OBJS += builtin/verify-pack.o
diff --git a/builtin.h b/builtin.h
index 235c51f30e..c6f7672991 100644
--- a/builtin.h
+++ b/builtin.h
@@ -271,6 +271,7 @@ int cmd_update_server_info(int argc, const char **argv, const char *prefix, stru
int cmd_upload_archive(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_upload_pack(int argc, const char **argv, const char *prefix, struct repository *repo);
+int cmd_url_parse(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_var(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_verify_commit(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_verify_tag(int argc, const char **argv, const char *prefix, struct repository *repo);
diff --git a/builtin/url-parse.c b/builtin/url-parse.c
new file mode 100644
index 0000000000..6c70c131e1
--- /dev/null
+++ b/builtin/url-parse.c
@@ -0,0 +1,132 @@
+#include "builtin.h"
+#include "gettext.h"
+#include "parse-options.h"
+#include "url.h"
+#include "urlmatch.h"
+
+static const char * const builtin_url_parse_usage[] = {
+ N_("git url-parse [-c <component>] [--] <url>..."),
+ NULL
+};
+
+static char *component_arg;
+
+static struct option builtin_url_parse_options[] = {
+ OPT_STRING('c', "component", &component_arg, N_("component"),
+ N_("which URL component to extract")),
+ OPT_END(),
+};
+
+enum url_component {
+ URL_NONE = 0,
+ URL_SCHEME,
+ URL_USER,
+ URL_PASSWORD,
+ URL_HOST,
+ URL_PORT,
+ URL_PATH,
+};
+
+static void parse_or_die(const char *url, struct url_info *info)
+{
+ if (url_is_local_not_ssh(url)) {
+ if (*url == '/')
+ die("'%s' is not a URL; if you meant a local "
+ "repository, use 'file://%s'", url, url);
+ die("'%s' is not a URL; if you meant a local repository, "
+ "use a 'file://' URL with an absolute path", url);
+ }
+ if (!url_parse(url, info))
+ die("invalid git URL '%s': %s", url, info->err);
+}
+
+static enum url_component get_component_or_die(const char *arg)
+{
+ if (!strcmp("path", arg))
+ return URL_PATH;
+ if (!strcmp("host", arg))
+ return URL_HOST;
+ if (!strcmp("scheme", arg))
+ return URL_SCHEME;
+ if (!strcmp("user", arg))
+ return URL_USER;
+ if (!strcmp("password", arg))
+ return URL_PASSWORD;
+ if (!strcmp("port", arg))
+ return URL_PORT;
+ die("invalid git URL component '%s'", arg);
+}
+
+static char *extract_component(enum url_component component,
+ struct url_info *info)
+{
+ size_t offset, length;
+
+ switch (component) {
+ case URL_SCHEME:
+ offset = 0;
+ length = info->scheme_len;
+ break;
+ case URL_USER:
+ offset = info->user_off;
+ length = info->user_len;
+ break;
+ case URL_PASSWORD:
+ offset = info->passwd_off;
+ length = info->passwd_len;
+ break;
+ case URL_HOST:
+ offset = info->host_off;
+ length = info->host_len;
+ break;
+ case URL_PORT:
+ offset = info->port_off;
+ length = info->port_len;
+ break;
+ case URL_PATH:
+ offset = info->path_off;
+ length = info->path_len;
+ break;
+ case URL_NONE:
+ return NULL;
+ }
+
+ return xstrndup(info->url + offset, length);
+}
+
+int cmd_url_parse(int argc,
+ const char **argv,
+ const char *prefix,
+ struct repository *repo UNUSED)
+{
+ struct url_info info;
+ enum url_component selected = URL_NONE;
+ char *extracted;
+ int i;
+
+ argc = parse_options(argc, argv, prefix, builtin_url_parse_options,
+ builtin_url_parse_usage, 0);
+
+ if (argc == 0)
+ usage_with_options(builtin_url_parse_usage,
+ builtin_url_parse_options);
+
+ if (component_arg)
+ selected = get_component_or_die(component_arg);
+
+ for (i = 0; i < argc; i++) {
+ parse_or_die(argv[i], &info);
+
+ if (selected != URL_NONE) {
+ extracted = extract_component(selected, &info);
+ if (extracted) {
+ puts(extracted);
+ free(extracted);
+ }
+ }
+
+ free(info.url);
+ }
+
+ return 0;
+}
diff --git a/command-list.txt b/command-list.txt
index f9005cf459..1ede48186f 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -202,6 +202,7 @@ git-update-ref plumbingmanipulators
git-update-server-info synchingrepositories
git-upload-archive synchelpers
git-upload-pack synchelpers
+git-url-parse purehelpers
git-var plumbinginterrogators
git-verify-commit ancillaryinterrogators
git-verify-pack plumbinginterrogators
diff --git a/git.c b/git.c
index 5a40eab8a2..a073eed931 100644
--- a/git.c
+++ b/git.c
@@ -670,6 +670,7 @@ static struct cmd_struct commands[] = {
{ "upload-archive", cmd_upload_archive, NO_PARSEOPT },
{ "upload-archive--writer", cmd_upload_archive_writer, NO_PARSEOPT },
{ "upload-pack", cmd_upload_pack },
+ { "url-parse", cmd_url_parse },
{ "var", cmd_var, RUN_SETUP_GENTLY | NO_PARSEOPT },
{ "verify-commit", cmd_verify_commit, RUN_SETUP },
{ "verify-pack", cmd_verify_pack },
diff --git a/meson.build b/meson.build
index 11488623bf..dc3cf68ee5 100644
--- a/meson.build
+++ b/meson.build
@@ -686,6 +686,7 @@ builtin_sources = [
'builtin/update-server-info.c',
'builtin/upload-archive.c',
'builtin/upload-pack.c',
+ 'builtin/url-parse.c',
'builtin/var.c',
'builtin/verify-commit.c',
'builtin/verify-pack.c',
--
gitgitgadget
* [PATCH v2 7/8] doc: describe the url-parse builtin
2026-05-01 23:15 ` [PATCH v2 0/8] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
` (5 preceding siblings ...)
2026-05-01 23:15 ` [PATCH v2 6/8] builtin: create url-parse command Matheus Afonso Martins Moreira via GitGitGadget
@ 2026-05-01 23:15 ` Matheus Afonso Martins Moreira via GitGitGadget
2026-05-01 23:15 ` [PATCH v2 8/8] t9904: add tests for the new " Matheus Afonso Martins Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 0/8] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
8 siblings, 0 replies; 44+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2026-05-01 23:15 UTC (permalink / raw)
To: git
Cc: Torsten Bögershausen, Ghanshyam Thakkar, Matheus Moreira,
Matheus Afonso Martins Moreira
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
The new url-parse builtin validates git URLs
and optionally extracts their components.
Helped-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
Documentation/git-url-parse.adoc | 80 ++++++++++++++++++++++++++++++++
Documentation/meson.build | 1 +
2 files changed, 81 insertions(+)
create mode 100644 Documentation/git-url-parse.adoc
diff --git a/Documentation/git-url-parse.adoc b/Documentation/git-url-parse.adoc
new file mode 100644
index 0000000000..9d0d93da4a
--- /dev/null
+++ b/Documentation/git-url-parse.adoc
@@ -0,0 +1,80 @@
+git-url-parse(1)
+================
+
+NAME
+----
+git-url-parse - Parse and extract git URL components
+
+SYNOPSIS
+--------
+[synopsis]
+git url-parse [-c <component>] [--] <url>...
+
+DESCRIPTION
+-----------
+
+Git supports many ways to specify URLs, some of them non-standard.
+For example, git supports the scp style [user@]host:[path] format.
+This command eases interoperability with git URLs by enabling the
+parsing and extraction of the components of all git URLs.
+
+Any syntactically valid URL is parsed, even if the scheme is not one
+git supports for fetching or pushing.
+
+OPTIONS
+-------
+
+`-c <component>`::
+`--component <component>`::
+ Extract the _<component>_ component from the given Git URLs.
+ _<component>_ can be one of:
+ `scheme`, `user`, `password`, `host`, `port`, `path`.
+
+OUTPUT
+------
+
+When `--component` is given, the requested component of each URL
+is printed on its own line, in the order the URLs were given. If
+the URL has no such component (for example, a port in a URL that
+does not specify one), an empty line is printed in its place.
+
+When `--component` is not given, no output is produced. The exit
+status is zero if every URL parses successfully and non-zero
+otherwise, allowing the command to be used purely as a validator.
+
+EXAMPLES
+--------
+
+* Print the host name:
++
+------------
+$ git url-parse --component host https://example.com/user/repo
+example.com
+------------
+
+* Print the path:
++
+------------
+$ git url-parse --component path https://example.com/user/repo
+/user/repo
+$ git url-parse --component path example.com:~user/repo
+~user/repo
+$ git url-parse --component path example.com:user/repo
+/user/repo
+------------
+
+* Validate URLs without outputting anything:
++
+------------
+$ git url-parse https://example.com/user/repo example.com:~user/repo
+------------
+
+SEE ALSO
+--------
+linkgit:git-clone[1],
+linkgit:git-fetch[1],
+linkgit:git-config[1]
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Documentation/meson.build b/Documentation/meson.build
index d6365b888b..32c8606a80 100644
--- a/Documentation/meson.build
+++ b/Documentation/meson.build
@@ -155,6 +155,7 @@ manpages = {
'git-update-server-info.adoc' : 1,
'git-upload-archive.adoc' : 1,
'git-upload-pack.adoc' : 1,
+ 'git-url-parse.adoc' : 1,
'git-var.adoc' : 1,
'git-verify-commit.adoc' : 1,
'git-verify-pack.adoc' : 1,
--
gitgitgadget
* [PATCH v2 8/8] t9904: add tests for the new url-parse builtin
2026-05-01 23:15 ` [PATCH v2 0/8] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
` (6 preceding siblings ...)
2026-05-01 23:15 ` [PATCH v2 7/8] doc: describe the url-parse builtin Matheus Afonso Martins Moreira via GitGitGadget
@ 2026-05-01 23:15 ` Matheus Afonso Martins Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 0/8] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
8 siblings, 0 replies; 44+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2026-05-01 23:15 UTC (permalink / raw)
To: git
Cc: Torsten Bögershausen, Ghanshyam Thakkar, Matheus Moreira,
Matheus Afonso Martins Moreira
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
Test git URL parsing, validation and component extraction
on all documented git URL schemes and syntaxes.
Add IPv6 host coverage in URL form:
ssh://[::1]/path
ssh://user@[::1]:1234/path
git://[::1]:9418/path
http://[2001:db8::1]/path
https://[2001:db8::1]/path
In URL form the brackets are kept in the host component (RFC 3986
syntax for IPv6 literals).
Also exercise the bracketed scp short forms that t5601-clone.sh
covers via parse_connect_url:
[host]:path
[host:port]:path
[::1]:repo
user@[::1]:repo
user@[host:port]:path
In scp form, brackets are kept for IPv6 literals (two or more inner
colons) and stripped for plain hostnames or host:port pairs.
Suggested-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
t/meson.build | 1 +
t/t9904-url-parse.sh | 319 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 320 insertions(+)
create mode 100755 t/t9904-url-parse.sh
diff --git a/t/meson.build b/t/meson.build
index 7528e5cda5..41b389a472 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -1114,6 +1114,7 @@ integration_tests = [
't9901-git-web--browse.sh',
't9902-completion.sh',
't9903-bash-prompt.sh',
+ 't9904-url-parse.sh',
]
benchmarks = [
diff --git a/t/t9904-url-parse.sh b/t/t9904-url-parse.sh
new file mode 100755
index 0000000000..32b3f4a286
--- /dev/null
+++ b/t/t9904-url-parse.sh
@@ -0,0 +1,319 @@
+#!/bin/sh
+#
+# Copyright (c) 2024 Matheus Afonso Martins Moreira
+#
+
+test_description='git url-parse tests'
+
+. ./test-lib.sh
+
+test_expect_success 'git url-parse -- ssh syntax' '
+ git url-parse "ssh://user@example.com:1234/repository/path" &&
+ git url-parse "ssh://user@example.com/repository/path" &&
+ git url-parse "ssh://example.com:1234/repository/path" &&
+ git url-parse "ssh://example.com/repository/path"
+'
+
+test_expect_success 'git url-parse -- git syntax' '
+ git url-parse "git://example.com:1234/repository/path" &&
+ git url-parse "git://example.com/repository/path"
+'
+
+test_expect_success 'git url-parse -- http syntax' '
+ git url-parse "https://example.com:1234/repository/path" &&
+ git url-parse "https://example.com/repository/path" &&
+ git url-parse "http://example.com:1234/repository/path" &&
+ git url-parse "http://example.com/repository/path"
+'
+
+test_expect_success 'git url-parse -- scp syntax' '
+ git url-parse "user@example.com:/repository/path" &&
+ git url-parse "example.com:/repository/path"
+'
+
+test_expect_success 'git url-parse -- username expansion - ssh syntax' '
+ git url-parse "ssh://user@example.com:1234/~user/repository" &&
+ git url-parse "ssh://user@example.com/~user/repository" &&
+ git url-parse "ssh://example.com:1234/~user/repository" &&
+ git url-parse "ssh://example.com/~user/repository"
+'
+
+test_expect_success 'git url-parse -- username expansion - git syntax' '
+ git url-parse "git://example.com:1234/~user/repository" &&
+ git url-parse "git://example.com/~user/repository"
+'
+
+test_expect_success 'git url-parse -- username expansion - scp syntax' '
+ git url-parse "user@example.com:~user/repository" &&
+ git url-parse "example.com:~user/repository"
+'
+
+test_expect_success 'git url-parse -- file urls' '
+ git url-parse "file:///repository/path" &&
+ git url-parse "file://"
+'
+
+test_expect_success 'git url-parse -c scheme -- ssh syntax' '
+ test ssh = "$(git url-parse -c scheme "ssh://user@example.com:1234/repository/path")" &&
+ test ssh = "$(git url-parse -c scheme "ssh://user@example.com/repository/path")" &&
+ test ssh = "$(git url-parse -c scheme "ssh://example.com:1234/repository/path")" &&
+ test ssh = "$(git url-parse -c scheme "ssh://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c scheme -- git syntax' '
+ test git = "$(git url-parse -c scheme "git://example.com:1234/repository/path")" &&
+ test git = "$(git url-parse -c scheme "git://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c scheme -- http syntax' '
+ test https = "$(git url-parse -c scheme "https://example.com:1234/repository/path")" &&
+ test https = "$(git url-parse -c scheme "https://example.com/repository/path")" &&
+ test http = "$(git url-parse -c scheme "http://example.com:1234/repository/path")" &&
+ test http = "$(git url-parse -c scheme "http://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c scheme -- scp syntax' '
+ test ssh = "$(git url-parse -c scheme "user@example.com:/repository/path")" &&
+ test ssh = "$(git url-parse -c scheme "example.com:/repository/path")"
+'
+
+test_expect_success 'git url-parse -c user -- ssh syntax' '
+ test user = "$(git url-parse -c user "ssh://user@example.com:1234/repository/path")" &&
+ test user = "$(git url-parse -c user "ssh://user@example.com/repository/path")" &&
+ test "" = "$(git url-parse -c user "ssh://example.com:1234/repository/path")" &&
+ test "" = "$(git url-parse -c user "ssh://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c user -- git syntax' '
+ test "" = "$(git url-parse -c user "git://example.com:1234/repository/path")" &&
+ test "" = "$(git url-parse -c user "git://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c user -- http syntax' '
+ test "" = "$(git url-parse -c user "https://example.com:1234/repository/path")" &&
+ test "" = "$(git url-parse -c user "https://example.com/repository/path")" &&
+ test "" = "$(git url-parse -c user "http://example.com:1234/repository/path")" &&
+ test "" = "$(git url-parse -c user "http://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c user -- scp syntax' '
+ test user = "$(git url-parse -c user "user@example.com:/repository/path")" &&
+ test "" = "$(git url-parse -c user "example.com:/repository/path")"
+'
+
+test_expect_success 'git url-parse -c password -- http syntax' '
+ test secret = "$(git url-parse -c password "https://user:secret@example.com:1234/repository/path")" &&
+ test secret = "$(git url-parse -c password "http://user:secret@example.com/repository/path")" &&
+ test "" = "$(git url-parse -c password "https://user@example.com/repository/path")" &&
+ test "" = "$(git url-parse -c password "https://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c host -- ssh syntax' '
+ test example.com = "$(git url-parse -c host "ssh://user@example.com:1234/repository/path")" &&
+ test example.com = "$(git url-parse -c host "ssh://user@example.com/repository/path")" &&
+ test example.com = "$(git url-parse -c host "ssh://example.com:1234/repository/path")" &&
+ test example.com = "$(git url-parse -c host "ssh://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c host -- git syntax' '
+ test example.com = "$(git url-parse -c host "git://example.com:1234/repository/path")" &&
+ test example.com = "$(git url-parse -c host "git://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c host -- http syntax' '
+ test example.com = "$(git url-parse -c host "https://example.com:1234/repository/path")" &&
+ test example.com = "$(git url-parse -c host "https://example.com/repository/path")" &&
+ test example.com = "$(git url-parse -c host "http://example.com:1234/repository/path")" &&
+ test example.com = "$(git url-parse -c host "http://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c host -- scp syntax' '
+ test example.com = "$(git url-parse -c host "user@example.com:/repository/path")" &&
+ test example.com = "$(git url-parse -c host "example.com:/repository/path")"
+'
+
+test_expect_success 'git url-parse -c port -- ssh syntax' '
+ test 1234 = "$(git url-parse -c port "ssh://user@example.com:1234/repository/path")" &&
+ test "" = "$(git url-parse -c port "ssh://user@example.com/repository/path")" &&
+ test 1234 = "$(git url-parse -c port "ssh://example.com:1234/repository/path")" &&
+ test "" = "$(git url-parse -c port "ssh://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c port -- git syntax' '
+ test 1234 = "$(git url-parse -c port "git://example.com:1234/repository/path")" &&
+ test "" = "$(git url-parse -c port "git://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c port -- http syntax' '
+ test 1234 = "$(git url-parse -c port "https://example.com:1234/repository/path")" &&
+ test "" = "$(git url-parse -c port "https://example.com/repository/path")" &&
+ test 1234 = "$(git url-parse -c port "http://example.com:1234/repository/path")" &&
+ test "" = "$(git url-parse -c port "http://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c port -- scp syntax' '
+ test "" = "$(git url-parse -c port "user@example.com:/repository/path")" &&
+ test "" = "$(git url-parse -c port "example.com:/repository/path")"
+'
+
+test_expect_success 'git url-parse -c path -- ssh syntax' '
+ test "/repository/path" = "$(git url-parse -c path "ssh://user@example.com:1234/repository/path")" &&
+ test "/repository/path" = "$(git url-parse -c path "ssh://user@example.com/repository/path")" &&
+ test "/repository/path" = "$(git url-parse -c path "ssh://example.com:1234/repository/path")" &&
+ test "/repository/path" = "$(git url-parse -c path "ssh://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c path -- git syntax' '
+ test "/repository/path" = "$(git url-parse -c path "git://example.com:1234/repository/path")" &&
+ test "/repository/path" = "$(git url-parse -c path "git://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c path -- http syntax' '
+ test "/repository/path" = "$(git url-parse -c path "https://example.com:1234/repository/path")" &&
+ test "/repository/path" = "$(git url-parse -c path "https://example.com/repository/path")" &&
+ test "/repository/path" = "$(git url-parse -c path "http://example.com:1234/repository/path")" &&
+ test "/repository/path" = "$(git url-parse -c path "http://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c path -- scp syntax' '
+ test "/repository/path" = "$(git url-parse -c path "user@example.com:/repository/path")" &&
+ test "/repository/path" = "$(git url-parse -c path "example.com:/repository/path")"
+'
+
+test_expect_success 'git url-parse -c path -- username expansion - ssh syntax' '
+ test "~user/repository" = "$(git url-parse -c path "ssh://user@example.com:1234/~user/repository")" &&
+ test "~user/repository" = "$(git url-parse -c path "ssh://user@example.com/~user/repository")" &&
+ test "~user/repository" = "$(git url-parse -c path "ssh://example.com:1234/~user/repository")" &&
+ test "~user/repository" = "$(git url-parse -c path "ssh://example.com/~user/repository")"
+'
+
+test_expect_success 'git url-parse -c path -- username expansion - git syntax' '
+ test "~user/repository" = "$(git url-parse -c path "git://example.com:1234/~user/repository")" &&
+ test "~user/repository" = "$(git url-parse -c path "git://example.com/~user/repository")"
+'
+
+test_expect_success 'git url-parse -c path -- username expansion - scp syntax' '
+ test "~user/repository" = "$(git url-parse -c path "user@example.com:~user/repository")" &&
+ test "~user/repository" = "$(git url-parse -c path "example.com:~user/repository")"
+'
+
+test_expect_success 'git url-parse -c path -- username expansion strips query and fragment' '
+ test "~user/repository" = "$(git url-parse -c path "ssh://example.com/~user/repository?query")" &&
+ test "~user/repository" = "$(git url-parse -c path "ssh://example.com/~user/repository#fragment")" &&
+ test "~user/repository" = "$(git url-parse -c path "git://example.com/~user/repository?query")" &&
+ test "~user/repository" = "$(git url-parse -c path "user@example.com:~user/repository?query")"
+'
+
+test_expect_success 'git url-parse -- ssh syntax with IPv6' '
+ git url-parse "ssh://user@[::1]:1234/repository/path" &&
+ git url-parse "ssh://user@[::1]/repository/path" &&
+ git url-parse "ssh://[::1]:1234/repository/path" &&
+ git url-parse "ssh://[::1]/repository/path" &&
+ git url-parse "ssh://[2001:db8::1]/repository/path"
+'
+
+test_expect_success 'git url-parse -- git syntax with IPv6' '
+ git url-parse "git://[::1]:9418/repository/path" &&
+ git url-parse "git://[::1]/repository/path"
+'
+
+test_expect_success 'git url-parse -- http syntax with IPv6' '
+ git url-parse "https://[::1]:1234/repository/path" &&
+ git url-parse "https://[::1]/repository/path" &&
+ git url-parse "http://[2001:db8::1]/repository/path"
+'
+
+test_expect_success 'git url-parse -c host -- IPv6 in URL form' '
+ test "[::1]" = "$(git url-parse -c host "ssh://user@[::1]:1234/repository/path")" &&
+ test "[::1]" = "$(git url-parse -c host "ssh://[::1]/repository/path")" &&
+ test "[2001:db8::1]" = "$(git url-parse -c host "ssh://[2001:db8::1]/repository/path")" &&
+ test "[::1]" = "$(git url-parse -c host "git://[::1]/repository/path")" &&
+ test "[2001:db8::1]" = "$(git url-parse -c host "https://[2001:db8::1]/repository/path")"
+'
+
+test_expect_success 'git url-parse -c port -- IPv6 in URL form' '
+ test 1234 = "$(git url-parse -c port "ssh://user@[::1]:1234/repository/path")" &&
+ test "" = "$(git url-parse -c port "ssh://[::1]/repository/path")" &&
+ test 9418 = "$(git url-parse -c port "git://[::1]:9418/repository/path")"
+'
+
+test_expect_success 'git url-parse -- scp syntax with IPv6' '
+ git url-parse "[::1]:repository/path" &&
+ git url-parse "user@[::1]:repository/path" &&
+ git url-parse "[2001:db8::1]:repo"
+'
+
+test_expect_success 'git url-parse -- scp syntax with bracketed hostname' '
+ git url-parse "[myhost]:src" &&
+ git url-parse "user@[myhost]:src"
+'
+
+test_expect_success 'git url-parse -- scp syntax with bracketed host:port' '
+ git url-parse "[myhost:123]:src" &&
+ git url-parse "user@[myhost:123]:src"
+'
+
+test_expect_success 'git url-parse -c host -- scp+IPv6' '
+ test "[::1]" = "$(git url-parse -c host "[::1]:repository/path")" &&
+ test "[::1]" = "$(git url-parse -c host "user@[::1]:repository/path")" &&
+ test "[2001:db8::1]" = "$(git url-parse -c host "[2001:db8::1]:repo")"
+'
+
+test_expect_success 'git url-parse -c path -- scp+IPv6' '
+ test "/repository/path" = "$(git url-parse -c path "[::1]:/repository/path")" &&
+ test "/repository/path" = "$(git url-parse -c path "[::1]:repository/path")" &&
+ test "/repo" = "$(git url-parse -c path "[2001:db8::1]:repo")"
+'
+
+test_expect_success 'git url-parse -c host,port,path -- scp [host:port]:src' '
+ test myhost = "$(git url-parse -c host "[myhost:123]:src")" &&
+ test 123 = "$(git url-parse -c port "[myhost:123]:src")" &&
+ test "/src" = "$(git url-parse -c path "[myhost:123]:src")"
+'
+
+test_expect_success 'git url-parse -c host,path -- scp [host]:src' '
+ test myhost = "$(git url-parse -c host "[myhost]:src")" &&
+ test "/src" = "$(git url-parse -c path "[myhost]:src")"
+'
+
+test_expect_success 'git url-parse -c user -- scp with user@ and brackets' '
+ test user = "$(git url-parse -c user "user@[::1]:repo")" &&
+ test user = "$(git url-parse -c user "user@[myhost:123]:src")" &&
+ test user = "$(git url-parse -c user "user@[myhost]:src")"
+'
+
+test_expect_success 'git url-parse -- scp+IPv6 with username expansion' '
+ test "~user/repo" = "$(git url-parse -c path "[::1]:~user/repo")" &&
+ test "~user/repo" = "$(git url-parse -c path "user@[::1]:~user/repo")"
+'
+
+test_expect_success 'git url-parse fails on invalid URL' '
+ test_must_fail git url-parse "not a url"
+'
+
+test_expect_success 'git url-parse helpful error for absolute local path' '
+ test_must_fail git url-parse "/abs/path" 2>err &&
+ test_grep "is not a URL" err &&
+ test_grep "file:///abs/path" err
+'
+
+test_expect_success 'git url-parse helpful error for relative local path' '
+ test_must_fail git url-parse "./rel" 2>err &&
+ test_grep "is not a URL" err &&
+ test_grep "absolute path" err
+'
+
+test_expect_success 'git url-parse fails on unknown -c component name' '
+ test_must_fail git url-parse -c bogus "https://example.com/repo"
+'
+
+test_expect_success 'git url-parse fails on URL missing host' '
+ test_must_fail git url-parse "https://"
+'
+
+test_expect_success 'git url-parse with no URL prints usage' '
+ test_must_fail git url-parse 2>err &&
+ test_grep "usage:" err
+'
+
+test_done
--
gitgitgadget
* [PATCH v3 0/8] builtin: implement, document and test url-parse
2026-05-01 23:15 ` [PATCH v2 0/8] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
` (7 preceding siblings ...)
2026-05-01 23:15 ` [PATCH v2 8/8] t9904: add tests for the new " Matheus Afonso Martins Moreira via GitGitGadget
@ 2026-05-02 5:28 ` Matheus Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 1/8] connect: rename enum protocol to url_scheme Matheus Afonso Martins Moreira via GitGitGadget
` (9 more replies)
8 siblings, 10 replies; 44+ messages in thread
From: Matheus Moreira via GitGitGadget @ 2026-05-02 5:28 UTC (permalink / raw)
To: git; +Cc: Torsten Bögershausen, Ghanshyam Thakkar, Matheus Moreira
This series adds git url-parse, a plumbing builtin for inspecting git URLs.
Git accepts a wider variety of URL forms than any standard parser handles.
The supported forms include RFC URLs, file:// URLs, scp-style
[user@]host:path for SSH, and IPv6 in brackets. Tools wanting to reason
about them have historically had to reimplement git's parsing or shell out
indirectly. With git url-parse, scripts can ask git directly: validate a
URL, extract a component (scheme, user, host, port, path, password), or
both.
The series consists of eight commits.
The first four are preparatory. They rename enum protocol to enum url_scheme
for RFC alignment, move url_is_local_not_ssh and the scheme-detection
routines from connect.c to url.h/url.c, and stop url_get_scheme from dying
on unknown schemes so other parsers can handle unknowns gracefully.
The fifth commit defines the new parser, url_parse, in urlmatch.c. It is
adapted from parse_connect_url and uses the same data structures as
url_normalize. The parser returns NULL on failure with err populated, and
exposes URL components as offset/length pairs into the normalized URL
buffer.
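To make the offset/length representation concrete, here is a minimal
standalone sketch of how a consumer might copy one component out of the URL
buffer. The names demo_span and demo_extract are hypothetical and are not
part of this series; the real fields live in struct url_info, and the
builtin uses xstrndup rather than malloc.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one offset/length pair; not git's struct url_info. */
struct demo_span {
	size_t off; /* byte offset of the component in the URL buffer */
	size_t len; /* number of bytes the component occupies */
};

/* Copy one component into a fresh NUL-terminated string; the caller frees it. */
static char *demo_extract(const char *url, struct demo_span span)
{
	char *out = malloc(span.len + 1);

	if (!out)
		return NULL;
	memcpy(out, url + span.off, span.len);
	out[span.len] = '\0';
	return out;
}
```

For "https://example.com:1234/user/repo", a span of offset 8 and length 11
would select the host, assuming the parser recorded those values. A zero
length yields an empty string, which is why the builtin prints an empty line
for a component the URL does not have.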
The sixth commit adds the user-facing command, with a helpful error when the
input looks like a local path rather than a URL.
The last two commits are documentation (a manpage) and 53 tests covering URL
form, scp form, IPv6 in URL and scp forms, bracket forms, username
expansion, query/fragment stripping, the local-path error, and
validation-only mode.
Several choices in this series are judgment calls. Happy to amend or follow
up on any of them.
The component name is scheme, not protocol. RFC 1738 and RFC 3986 call them
schemes. The series renames enum protocol to enum url_scheme internally, and
the user-facing component name follows the same direction. I considered
accepting both as aliases but decided against setting that precedent in a
new command. If you would rather see protocol, or both protocol and scheme,
that is easy to change.
Local paths are deliberately not URLs. parse_connect_url accepts bare paths
like /abs/path or ./rel as URL_SCHEME_LOCAL. url_parse rejects them, since
url_normalize requires a scheme://host form, and silent conversion to
file:// has no good answer for relative or tilde forms. The builtin emits a
helpful error suggesting the explicit file:// form. If full git clone parity
is preferred (bare paths accepted via auto-conversion or a new flag), that
could be added.
Absent and empty components are conflated in output. --component user
http://host/ and --component user http://@host/ both produce empty lines.
The underlying struct url_info preserves the distinction: *_off == 0 for an
absent component vs *_off != 0 with *_len == 0 for an empty one. A future
option can expose it without a breaking change. I can amend this patch set
if necessary.
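As an illustration of how such a future option could tell the two cases
apart, the sketch below classifies a component from its offset and length
using the convention stated above (a zero offset means absent). The enum and
function names are hypothetical and do not exist in this series. The rule
only makes sense for components that follow the scheme, since the scheme
itself legitimately starts at offset 0.

```c
#include <stddef.h>

/* Hypothetical three-way classification; the names are illustrative only. */
enum demo_presence {
	DEMO_ABSENT,  /* the component does not appear in the URL at all */
	DEMO_EMPTY,   /* its delimiter is there, but the component has no bytes */
	DEMO_PRESENT, /* the component has at least one byte */
};

/* Classify a non-scheme component: off == 0 is taken to mean "absent". */
static enum demo_presence demo_classify(size_t off, size_t len)
{
	if (!off)
		return DEMO_ABSENT;
	return len ? DEMO_PRESENT : DEMO_EMPTY;
}
```

Under this assumption, the user component of http://host/ would classify as
absent, while that of http://@host/ would classify as empty, even though
both print as an empty line today.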
Changes since v1:
* Bug fix: ~user paths with a query string or fragment were leaking the ?
or # into the path output. The ~user-skip logic in url_parse previously
ran only for file://. It now runs for git/ssh/scp URLs as well, matching
what parse_connect_url does and what users expect.
* Helpful error for local paths instead of the cryptic "invalid URL scheme
name or missing '://' suffix".
* -c protocol renamed to -c scheme for consistency with the internal rename
and the RFC.
* Documented the deliberate divergence from parse_connect_url (local paths
and unknown schemes) in the urlmatch commit message.
* Doc and command-list polish: purehelpers category, asciidoc placeholder
convention, [synopsis] form.
* The original micro-commit-style staged buildup of the builtin is collapsed
into a single self-contained commit. The rest of the series is unchanged in
shape.
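The visible effect of the ~user fix in the list above can be sketched in a
few lines. This is not the logic in urlmatch.c, only a hypothetical helper
(demo_path_len) that reproduces what the t9904 tests expect: the reported
path ends where a query or fragment begins.

```c
#include <string.h>

/*
 * Hypothetical helper: length of a path once any trailing query ("?...")
 * or fragment ("#...") is cut off. It returns the full length when the
 * path contains neither character.
 */
static size_t demo_path_len(const char *path)
{
	return strcspn(path, "?#");
}
```

With this helper, "~user/repository?query" and "~user/repository#fragment"
both yield the length of "~user/repository", which matches the outputs the
tests assert for the ssh, git and scp forms.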
Changes since v2:
* Fix Windows CI failure: handle DOS drive prefix in the helpful local-path
error. With this, the message for a drive-letter input like C:/repo (or
an MSYS-mangled /abs/path that bash rewrites to D:/.../abs/path before
git sees it) gets the specific file:///<input> suggestion rather than the
generic fallback. No effect on Linux or macOS, since has_dos_drive_prefix
is a no-op on non-Windows builds.
* t9904: relax the grep on the absolute-path test from the literal
file:///abs/path to the structural file:/// (three slashes). The original
assertion depended on the input being preserved verbatim, which MSYS does
not do. The relaxed grep verifies the structurally meaningful property
(specific URL suggestion was produced, not the generic fallback) and runs
cross-platform.
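The choice between the three local-path messages can be sketched as below.
This is a simplified standalone model, assuming the input has already been
identified as a local path. demo_has_drive_prefix is a hypothetical stand-in
for git's has_dos_drive_prefix, which is platform-specific and, as noted
above, a no-op on non-Windows builds; the stand-in here always checks for a
drive letter.

```c
#include <ctype.h>

/* Which suggestion to print; the names are illustrative, not from the patch. */
enum demo_hint {
	DEMO_HINT_FILE_2SLASH, /* "file://" + input, e.g. file:///abs/path */
	DEMO_HINT_FILE_3SLASH, /* "file:///" + input, e.g. file:///C:/repo */
	DEMO_HINT_GENERIC,     /* ask for a file:// URL with an absolute path */
};

/* Hypothetical drive-letter check: one ASCII letter followed by a colon. */
static int demo_has_drive_prefix(const char *s)
{
	return isalpha((unsigned char)s[0]) && s[1] == ':';
}

/* Pick the hint in the same order as the checks in the builtin. */
static enum demo_hint demo_pick_hint(const char *input)
{
	if (input[0] == '/')
		return DEMO_HINT_FILE_2SLASH;
	if (demo_has_drive_prefix(input))
		return DEMO_HINT_FILE_3SLASH;
	return DEMO_HINT_GENERIC;
}
```

Both specific hints produce a URL with three slashes after "file:", which
is the property the relaxed t9904 grep relies on.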
Matheus Afonso Martins Moreira (8):
connect: rename enum protocol to url_scheme
url: move url_is_local_not_ssh to url.h
url: move scheme detection to URL header/source
url: return URL_SCHEME_UNKNOWN instead of dying
urlmatch: define url_parse function
builtin: create url-parse command
doc: describe the url-parse builtin
t9904: add tests for the new url-parse builtin
.gitignore | 1 +
Documentation/git-url-parse.adoc | 80 ++++++
Documentation/meson.build | 1 +
Makefile | 1 +
builtin.h | 1 +
builtin/url-parse.c | 135 ++++++++++
command-list.txt | 1 +
connect.c | 78 ++----
connect.h | 1 -
git.c | 1 +
meson.build | 1 +
remote.c | 1 +
t/meson.build | 1 +
t/t9904-url-parse.sh | 319 ++++++++++++++++++++++++
t/unit-tests/u-urlmatch-normalization.c | 45 ++++
url.c | 23 ++
url.h | 16 ++
urlmatch.c | 127 ++++++++++
urlmatch.h | 1 +
19 files changed, 780 insertions(+), 54 deletions(-)
create mode 100644 Documentation/git-url-parse.adoc
create mode 100644 builtin/url-parse.c
create mode 100755 t/t9904-url-parse.sh
base-commit: 94f057755b7941b321fd11fec1b2e3ca5313a4e0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1715%2Fmatheusmoreira%2Furl-parse-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1715/matheusmoreira/url-parse-v3
Pull-Request: https://github.com/git/git/pull/1715
Range-diff vs v2:
1: 38f797362d = 1: 38f797362d connect: rename enum protocol to url_scheme
2: a4153e1d24 = 2: a4153e1d24 url: move url_is_local_not_ssh to url.h
3: e584fb03f3 = 3: e584fb03f3 url: move scheme detection to URL header/source
4: 7381704c38 = 4: 7381704c38 url: return URL_SCHEME_UNKNOWN instead of dying
5: 89932a70f3 = 5: 89932a70f3 urlmatch: define url_parse function
6: 886a7d659e ! 6: af6c71227b builtin: create url-parse command
@@ builtin/url-parse.c (new)
+ if (*url == '/')
+ die("'%s' is not a URL; if you meant a local "
+ "repository, use 'file://%s'", url, url);
++ if (has_dos_drive_prefix(url))
++ die("'%s' is not a URL; if you meant a local "
++ "repository, use 'file:///%s'", url, url);
+ die("'%s' is not a URL; if you meant a local repository, "
+ "use a 'file://' URL with an absolute path", url);
+ }
7: 3c44e0f478 = 7: 2b32cb71a3 doc: describe the url-parse builtin
8: cf2ae409e6 ! 8: ce41d2ec50 t9904: add tests for the new url-parse builtin
@@ t/t9904-url-parse.sh (new)
+test_expect_success 'git url-parse helpful error for absolute local path' '
+ test_must_fail git url-parse "/abs/path" 2>err &&
+ test_grep "is not a URL" err &&
-+ test_grep "file:///abs/path" err
++ test_grep "file:///" err
+'
+
+test_expect_success 'git url-parse helpful error for relative local path' '
--
gitgitgadget
* [PATCH v3 1/8] connect: rename enum protocol to url_scheme
2026-05-02 5:28 ` [PATCH v3 0/8] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
@ 2026-05-02 5:28 ` Matheus Afonso Martins Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 2/8] url: move url_is_local_not_ssh to url.h Matheus Afonso Martins Moreira via GitGitGadget
` (8 subsequent siblings)
9 siblings, 0 replies; 44+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2026-05-02 5:28 UTC (permalink / raw)
To: git
Cc: Torsten Bögershausen, Ghanshyam Thakkar, Matheus Moreira,
Matheus Afonso Martins Moreira
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
RFC 1738 names the part of a URL before the colon a "scheme".
connect.c calls it "protocol", which is more generic
and collides with the unrelated enum protocol_version.
Rename:
enum protocol -> enum url_scheme
PROTO_* -> URL_SCHEME_*
prot_name -> url_scheme_name
get_protocol -> url_get_scheme
The local variables in parse_connect_url and git_connect
are renamed accordingly, from protocol to scheme.
No behavior change. The user-visible diagnostics
and translated error messages are preserved:
"Diag: protocol=..."
"protocol '%s' is not supported"
"unknown protocol"
This rename also prepares for moving the scheme-detection functions
to a shared header so that a future plumbing command can parse URLs
using the same logic as the connect path.
Suggested-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
connect.c | 68 +++++++++++++++++++++++++++----------------------------
1 file changed, 34 insertions(+), 34 deletions(-)
diff --git a/connect.c b/connect.c
index fcd35c5539..46da89905e 100644
--- a/connect.c
+++ b/connect.c
@@ -700,11 +700,11 @@ int server_supports(const char *feature)
return !!server_feature_value(feature, NULL);
}
-enum protocol {
- PROTO_LOCAL = 1,
- PROTO_FILE,
- PROTO_SSH,
- PROTO_GIT
+enum url_scheme {
+ URL_SCHEME_LOCAL = 1,
+ URL_SCHEME_FILE,
+ URL_SCHEME_SSH,
+ URL_SCHEME_GIT
};
int url_is_local_not_ssh(const char *url)
@@ -715,33 +715,33 @@ int url_is_local_not_ssh(const char *url)
(has_dos_drive_prefix(url) && is_valid_path(url));
}
-static const char *prot_name(enum protocol protocol)
+static const char *url_scheme_name(enum url_scheme scheme)
{
- switch (protocol) {
- case PROTO_LOCAL:
- case PROTO_FILE:
+ switch (scheme) {
+ case URL_SCHEME_LOCAL:
+ case URL_SCHEME_FILE:
return "file";
- case PROTO_SSH:
+ case URL_SCHEME_SSH:
return "ssh";
- case PROTO_GIT:
+ case URL_SCHEME_GIT:
return "git";
default:
return "unknown protocol";
}
}
-static enum protocol get_protocol(const char *name)
+static enum url_scheme url_get_scheme(const char *name)
{
if (!strcmp(name, "ssh"))
- return PROTO_SSH;
+ return URL_SCHEME_SSH;
if (!strcmp(name, "git"))
- return PROTO_GIT;
+ return URL_SCHEME_GIT;
if (!strcmp(name, "git+ssh")) /* deprecated - do not use */
- return PROTO_SSH;
+ return URL_SCHEME_SSH;
if (!strcmp(name, "ssh+git")) /* deprecated - do not use */
- return PROTO_SSH;
+ return URL_SCHEME_SSH;
if (!strcmp(name, "file"))
- return PROTO_FILE;
+ return URL_SCHEME_FILE;
die(_("protocol '%s' is not supported"), name);
}
@@ -1083,14 +1083,14 @@ static char *get_port(char *host)
* Extract protocol and relevant parts from the specified connection URL.
* The caller must free() the returned strings.
*/
-static enum protocol parse_connect_url(const char *url_orig, char **ret_host,
- char **ret_path)
+static enum url_scheme parse_connect_url(const char *url_orig, char **ret_host,
+ char **ret_path)
{
char *url;
char *host, *path;
char *end;
int separator = '/';
- enum protocol protocol = PROTO_LOCAL;
+ enum url_scheme scheme = URL_SCHEME_LOCAL;
if (is_url(url_orig))
url = url_decode(url_orig);
@@ -1100,12 +1100,12 @@ static enum protocol parse_connect_url(const char *url_orig, char **ret_host,
host = strstr(url, "://");
if (host) {
*host = '\0';
- protocol = get_protocol(url);
+ scheme = url_get_scheme(url);
host += 3;
} else {
host = url;
if (!url_is_local_not_ssh(url)) {
- protocol = PROTO_SSH;
+ scheme = URL_SCHEME_SSH;
separator = ':';
}
}
@@ -1116,13 +1116,13 @@ static enum protocol parse_connect_url(const char *url_orig, char **ret_host,
*/
end = host_end(&host, 0);
- if (protocol == PROTO_LOCAL)
+ if (scheme == URL_SCHEME_LOCAL)
path = end;
- else if (protocol == PROTO_FILE && *host != '/' &&
+ else if (scheme == URL_SCHEME_FILE && *host != '/' &&
!has_dos_drive_prefix(host) &&
offset_1st_component(host - 2) > 1)
path = host - 2; /* include the leading "//" */
- else if (protocol == PROTO_FILE && has_dos_drive_prefix(end))
+ else if (scheme == URL_SCHEME_FILE && has_dos_drive_prefix(end))
path = end; /* "file://$(pwd)" may be "file://C:/projects/repo" */
else
path = strchr(end, separator);
@@ -1138,7 +1138,7 @@ static enum protocol parse_connect_url(const char *url_orig, char **ret_host,
end = path; /* Need to \0 terminate host here */
if (separator == ':')
path++; /* path starts after ':' */
- if (protocol == PROTO_GIT || protocol == PROTO_SSH) {
+ if (scheme == URL_SCHEME_GIT || scheme == URL_SCHEME_SSH) {
if (path[1] == '~')
path++;
}
@@ -1149,7 +1149,7 @@ static enum protocol parse_connect_url(const char *url_orig, char **ret_host,
*ret_host = xstrdup(host);
*ret_path = path;
free(url);
- return protocol;
+ return scheme;
}
static const char *get_ssh_command(void)
@@ -1434,7 +1434,7 @@ struct child_process *git_connect(int fd[2], const char *url,
{
char *hostandport, *path;
struct child_process *conn;
- enum protocol protocol;
+ enum url_scheme scheme;
enum protocol_version version = get_protocol_version_config();
/*
@@ -1451,14 +1451,14 @@ struct child_process *git_connect(int fd[2], const char *url,
*/
signal(SIGCHLD, SIG_DFL);
- protocol = parse_connect_url(url, &hostandport, &path);
- if ((flags & CONNECT_DIAG_URL) && (protocol != PROTO_SSH)) {
+ scheme = parse_connect_url(url, &hostandport, &path);
+ if ((flags & CONNECT_DIAG_URL) && (scheme != URL_SCHEME_SSH)) {
printf("Diag: url=%s\n", url ? url : "NULL");
- printf("Diag: protocol=%s\n", prot_name(protocol));
+ printf("Diag: protocol=%s\n", url_scheme_name(scheme));
printf("Diag: hostandport=%s\n", hostandport ? hostandport : "NULL");
printf("Diag: path=%s\n", path ? path : "NULL");
conn = NULL;
- } else if (protocol == PROTO_GIT) {
+ } else if (scheme == URL_SCHEME_GIT) {
conn = git_connect_git(fd, hostandport, path, prog, version, flags);
conn->trace2_child_class = "transport/git";
} else {
@@ -1481,7 +1481,7 @@ struct child_process *git_connect(int fd[2], const char *url,
conn->use_shell = 1;
conn->in = conn->out = -1;
- if (protocol == PROTO_SSH) {
+ if (scheme == URL_SCHEME_SSH) {
char *ssh_host = hostandport;
const char *port = NULL;
transport_check_allowed("ssh");
@@ -1492,7 +1492,7 @@ struct child_process *git_connect(int fd[2], const char *url,
if (flags & CONNECT_DIAG_URL) {
printf("Diag: url=%s\n", url ? url : "NULL");
- printf("Diag: protocol=%s\n", prot_name(protocol));
+ printf("Diag: protocol=%s\n", url_scheme_name(scheme));
printf("Diag: userandhost=%s\n", ssh_host ? ssh_host : "NULL");
printf("Diag: port=%s\n", port ? port : "NONE");
printf("Diag: path=%s\n", path ? path : "NULL");
--
gitgitgadget
* [PATCH v3 2/8] url: move url_is_local_not_ssh to url.h
2026-05-02 5:28 ` [PATCH v3 0/8] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 1/8] connect: rename enum protocol to url_scheme Matheus Afonso Martins Moreira via GitGitGadget
@ 2026-05-02 5:28 ` Matheus Afonso Martins Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 3/8] url: move scheme detection to URL header/source Matheus Afonso Martins Moreira via GitGitGadget
` (7 subsequent siblings)
9 siblings, 0 replies; 44+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2026-05-02 5:28 UTC (permalink / raw)
To: git
Cc: Torsten Bögershausen, Ghanshyam Thakkar, Matheus Moreira,
Matheus Afonso Martins Moreira
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
Move url_is_local_not_ssh from connect.c/connect.h
to url.c/url.h so that the new url_parse function
in urlmatch.c, and any future code that needs to
distinguish a local path from an scp style SSH URL,
can reuse the heuristic without depending on connect.c.
No behavior change.
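For readers unfamiliar with the heuristic, it can be illustrated in
isolation. The sketch below is a simplified standalone version, not the
code in url.c: the name looks_local_not_ssh is hypothetical, and the
has_dos_drive_prefix()/is_valid_path() clause is left out on the
assumption that the reader has no access to Git's compatibility layer.

```c
#include <string.h>

/*
 * Simplified, hypothetical stand-in for url_is_local_not_ssh().
 * Assumption: the DOS drive prefix clause of the real function is
 * omitted because it depends on Git's platform compatibility layer.
 */
int looks_local_not_ssh(const char *url)
{
	const char *colon = strchr(url, ':');
	const char *slash = strchr(url, '/');

	/* No colon at all, or a slash before the first colon: local path. */
	return !colon || (slash && slash < colon);
}
```

Under this sketch "/abs/path", "./rel:name" and "repo" are classified
as local, while "host:path" and "user@host:~user/repo" are treated as
scp style SSH URLs.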
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
connect.c | 8 --------
connect.h | 1 -
remote.c | 1 +
url.c | 8 ++++++++
url.h | 2 ++
5 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/connect.c b/connect.c
index 46da89905e..cb145de30e 100644
--- a/connect.c
+++ b/connect.c
@@ -707,14 +707,6 @@ enum url_scheme {
URL_SCHEME_GIT
};
-int url_is_local_not_ssh(const char *url)
-{
- const char *colon = strchr(url, ':');
- const char *slash = strchr(url, '/');
- return !colon || (slash && slash < colon) ||
- (has_dos_drive_prefix(url) && is_valid_path(url));
-}
-
static const char *url_scheme_name(enum url_scheme scheme)
{
switch (scheme) {
diff --git a/connect.h b/connect.h
index 1645126c17..8d84f6656b 100644
--- a/connect.h
+++ b/connect.h
@@ -13,7 +13,6 @@ int git_connection_is_socket(struct child_process *conn);
int server_supports(const char *feature);
int parse_feature_request(const char *features, const char *feature);
const char *server_feature_value(const char *feature, size_t *len_ret);
-int url_is_local_not_ssh(const char *url);
struct packet_reader;
enum protocol_version discover_version(struct packet_reader *reader);
diff --git a/remote.c b/remote.c
index a664cd166a..24a8118d25 100644
--- a/remote.c
+++ b/remote.c
@@ -8,6 +8,7 @@
#include "gettext.h"
#include "hex.h"
#include "remote.h"
+#include "url.h"
#include "urlmatch.h"
#include "refs.h"
#include "refspec.h"
diff --git a/url.c b/url.c
index 3ca5987e90..057576042a 100644
--- a/url.c
+++ b/url.c
@@ -132,3 +132,11 @@ void str_end_url_with_slash(const char *url, char **dest)
free(*dest);
*dest = strbuf_detach(&buf, NULL);
}
+
+int url_is_local_not_ssh(const char *url)
+{
+ const char *colon = strchr(url, ':');
+ const char *slash = strchr(url, '/');
+ return !colon || (slash && slash < colon) ||
+ (has_dos_drive_prefix(url) && is_valid_path(url));
+}
diff --git a/url.h b/url.h
index cd9140e994..39d621312f 100644
--- a/url.h
+++ b/url.h
@@ -21,6 +21,8 @@ char *url_decode_parameter_value(const char **query);
void end_url_with_slash(struct strbuf *buf, const char *url);
void str_end_url_with_slash(const char *url, char **dest);
+int url_is_local_not_ssh(const char *url);
+
/*
* The set of unreserved characters as per STD66 (RFC3986) is
* '[A-Za-z0-9-._~]'. These characters are safe to appear in URI
--
gitgitgadget
* [PATCH v3 3/8] url: move scheme detection to URL header/source
2026-05-02 5:28 ` [PATCH v3 0/8] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 1/8] connect: rename enum protocol to url_scheme Matheus Afonso Martins Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 2/8] url: move url_is_local_not_ssh to url.h Matheus Afonso Martins Moreira via GitGitGadget
@ 2026-05-02 5:28 ` Matheus Afonso Martins Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 4/8] url: return URL_SCHEME_UNKNOWN instead of dying Matheus Afonso Martins Moreira via GitGitGadget
` (6 subsequent siblings)
9 siblings, 0 replies; 44+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2026-05-02 5:28 UTC (permalink / raw)
To: git
Cc: Torsten Bögershausen, Ghanshyam Thakkar, Matheus Moreira,
Matheus Afonso Martins Moreira
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
Move enum url_scheme and url_get_scheme()
from connect.c to url.h and url.c
so that other code can identify
a URL's scheme without depending
on connect.c.
No behavior change. url_get_scheme() still dies
on an unrecognized scheme name, with the same
translated message as before.
url_scheme_name() stays in connect.c
because it has no other callers.
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
connect.c | 22 ----------------------
url.c | 16 ++++++++++++++++
url.h | 13 +++++++++++++
3 files changed, 29 insertions(+), 22 deletions(-)
diff --git a/connect.c b/connect.c
index cb145de30e..1ac7acc6e8 100644
--- a/connect.c
+++ b/connect.c
@@ -700,13 +700,6 @@ int server_supports(const char *feature)
return !!server_feature_value(feature, NULL);
}
-enum url_scheme {
- URL_SCHEME_LOCAL = 1,
- URL_SCHEME_FILE,
- URL_SCHEME_SSH,
- URL_SCHEME_GIT
-};
-
static const char *url_scheme_name(enum url_scheme scheme)
{
switch (scheme) {
@@ -722,21 +715,6 @@ static const char *url_scheme_name(enum url_scheme scheme)
}
}
-static enum url_scheme url_get_scheme(const char *name)
-{
- if (!strcmp(name, "ssh"))
- return URL_SCHEME_SSH;
- if (!strcmp(name, "git"))
- return URL_SCHEME_GIT;
- if (!strcmp(name, "git+ssh")) /* deprecated - do not use */
- return URL_SCHEME_SSH;
- if (!strcmp(name, "ssh+git")) /* deprecated - do not use */
- return URL_SCHEME_SSH;
- if (!strcmp(name, "file"))
- return URL_SCHEME_FILE;
- die(_("protocol '%s' is not supported"), name);
-}
-
static char *host_end(char **hoststart, int removebrackets)
{
char *host = *hoststart;
diff --git a/url.c b/url.c
index 057576042a..300acf98fe 100644
--- a/url.c
+++ b/url.c
@@ -1,4 +1,5 @@
#include "git-compat-util.h"
+#include "gettext.h"
#include "hex-ll.h"
#include "strbuf.h"
#include "url.h"
@@ -140,3 +141,18 @@ int url_is_local_not_ssh(const char *url)
return !colon || (slash && slash < colon) ||
(has_dos_drive_prefix(url) && is_valid_path(url));
}
+
+enum url_scheme url_get_scheme(const char *name)
+{
+ if (!strcmp(name, "ssh"))
+ return URL_SCHEME_SSH;
+ if (!strcmp(name, "git"))
+ return URL_SCHEME_GIT;
+ if (!strcmp(name, "git+ssh")) /* deprecated - do not use */
+ return URL_SCHEME_SSH;
+ if (!strcmp(name, "ssh+git")) /* deprecated - do not use */
+ return URL_SCHEME_SSH;
+ if (!strcmp(name, "file"))
+ return URL_SCHEME_FILE;
+ die(_("protocol '%s' is not supported"), name);
+}
diff --git a/url.h b/url.h
index 39d621312f..24c8cd91d0 100644
--- a/url.h
+++ b/url.h
@@ -23,6 +23,19 @@ void str_end_url_with_slash(const char *url, char **dest);
int url_is_local_not_ssh(const char *url);
+enum url_scheme {
+ URL_SCHEME_LOCAL = 1,
+ URL_SCHEME_FILE,
+ URL_SCHEME_SSH,
+ URL_SCHEME_GIT,
+};
+
+/*
+ * Identify the URL scheme by name. Dies if the name does not match
+ * any scheme that Git knows about.
+ */
+enum url_scheme url_get_scheme(const char *name);
+
/*
* The set of unreserved characters as per STD66 (RFC3986) is
* '[A-Za-z0-9-._~]'. These characters are safe to appear in URI
--
gitgitgadget
* [PATCH v3 4/8] url: return URL_SCHEME_UNKNOWN instead of dying
2026-05-02 5:28 ` [PATCH v3 0/8] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
` (2 preceding siblings ...)
2026-05-02 5:28 ` [PATCH v3 3/8] url: move scheme detection to URL header/source Matheus Afonso Martins Moreira via GitGitGadget
@ 2026-05-02 5:28 ` Matheus Afonso Martins Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 5/8] urlmatch: define url_parse function Matheus Afonso Martins Moreira via GitGitGadget
` (5 subsequent siblings)
9 siblings, 0 replies; 44+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2026-05-02 5:28 UTC (permalink / raw)
To: git
Cc: Torsten Bögershausen, Ghanshyam Thakkar, Matheus Moreira,
Matheus Afonso Martins Moreira
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
Add a URL_SCHEME_UNKNOWN enumerator with value 0.
Have url_get_scheme() return it for unrecognized
schemes instead of calling die() itself.
Move the die() call to parse_connect_url()
where url_get_scheme() is used.
This lets url_get_scheme() be used from contexts
that need to identify a URL's scheme without aborting
the program, for example from a future plumbing command
that validates URLs.
No external behavior change. parse_connect_url() still dies
with the same translated message for unrecognized schemes.
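As a rough sketch of what the new contract allows, a caller can now
test the result and decide for itself how to react. The enum and
url_get_scheme() below mirror the definitions in this patch so the
example stands alone; scheme_is_supported() is a hypothetical caller
that is not part of the series.

```c
#include <string.h>

/* Mirrors the enum in url.h after this patch. */
enum url_scheme {
	URL_SCHEME_UNKNOWN = 0,
	URL_SCHEME_LOCAL,
	URL_SCHEME_FILE,
	URL_SCHEME_SSH,
	URL_SCHEME_GIT,
};

/* Mirrors url_get_scheme() in url.c after this patch. */
enum url_scheme url_get_scheme(const char *name)
{
	if (!strcmp(name, "ssh"))
		return URL_SCHEME_SSH;
	if (!strcmp(name, "git"))
		return URL_SCHEME_GIT;
	if (!strcmp(name, "git+ssh")) /* deprecated - do not use */
		return URL_SCHEME_SSH;
	if (!strcmp(name, "ssh+git")) /* deprecated - do not use */
		return URL_SCHEME_SSH;
	if (!strcmp(name, "file"))
		return URL_SCHEME_FILE;
	return URL_SCHEME_UNKNOWN;
}

/*
 * Hypothetical caller: report whether a scheme name is recognized,
 * leaving the decision to abort (or not) to the code that asked.
 */
int scheme_is_supported(const char *name)
{
	return url_get_scheme(name) != URL_SCHEME_UNKNOWN;
}
```

Because URL_SCHEME_UNKNOWN is 0, the result can also be tested directly
as a truth value, which is presumably why that value was chosen.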
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
connect.c | 2 ++
url.c | 3 +--
url.h | 7 ++++---
3 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/connect.c b/connect.c
index 1ac7acc6e8..73d7a6b8d0 100644
--- a/connect.c
+++ b/connect.c
@@ -1071,6 +1071,8 @@ static enum url_scheme parse_connect_url(const char *url_orig, char **ret_host,
if (host) {
*host = '\0';
scheme = url_get_scheme(url);
+ if (scheme == URL_SCHEME_UNKNOWN)
+ die(_("protocol '%s' is not supported"), url);
host += 3;
} else {
host = url;
diff --git a/url.c b/url.c
index 300acf98fe..a59818278f 100644
--- a/url.c
+++ b/url.c
@@ -1,5 +1,4 @@
#include "git-compat-util.h"
-#include "gettext.h"
#include "hex-ll.h"
#include "strbuf.h"
#include "url.h"
@@ -154,5 +153,5 @@ enum url_scheme url_get_scheme(const char *name)
return URL_SCHEME_SSH;
if (!strcmp(name, "file"))
return URL_SCHEME_FILE;
- die(_("protocol '%s' is not supported"), name);
+ return URL_SCHEME_UNKNOWN;
}
diff --git a/url.h b/url.h
index 24c8cd91d0..7289523605 100644
--- a/url.h
+++ b/url.h
@@ -24,15 +24,16 @@ void str_end_url_with_slash(const char *url, char **dest);
int url_is_local_not_ssh(const char *url);
enum url_scheme {
- URL_SCHEME_LOCAL = 1,
+ URL_SCHEME_UNKNOWN = 0,
+ URL_SCHEME_LOCAL,
URL_SCHEME_FILE,
URL_SCHEME_SSH,
URL_SCHEME_GIT,
};
/*
- * Identify the URL scheme by name. Dies if the name does not match
- * any scheme that Git knows about.
+ * Identify the URL scheme by name. Returns URL_SCHEME_UNKNOWN
+ * if the name does not match any scheme that Git knows about.
*/
enum url_scheme url_get_scheme(const char *name);
--
gitgitgadget
* [PATCH v3 5/8] urlmatch: define url_parse function
2026-05-02 5:28 ` [PATCH v3 0/8] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
` (3 preceding siblings ...)
2026-05-02 5:28 ` [PATCH v3 4/8] url: return URL_SCHEME_UNKNOWN instead of dying Matheus Afonso Martins Moreira via GitGitGadget
@ 2026-05-02 5:28 ` Matheus Afonso Martins Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 6/8] builtin: create url-parse command Matheus Afonso Martins Moreira via GitGitGadget
` (4 subsequent siblings)
9 siblings, 0 replies; 44+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2026-05-02 5:28 UTC (permalink / raw)
To: git
Cc: Torsten Bögershausen, Ghanshyam Thakkar, Matheus Moreira,
Matheus Afonso Martins Moreira
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
Define url_parse, a general parsing function that supports all Git URLs
including scp style URLs such as hostname:~user/repo.
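The core of the scp style handling is a rewrite into ssh:// form before
normalization. The sketch below shows that step in isolation. It is a
simplification, not the code in urlmatch.c: scp_to_ssh_url is a
hypothetical name, it uses malloc() instead of strbuf, it assumes the
input was already classified as scp style, and it does not handle hosts
wrapped in '[' ']' brackets.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: rewrite "host:path" or "user@host:path" as an
 * ssh:// URL. Assumptions: the input is scp style and the host is not
 * bracketed. Returns a malloc()ed string that the caller must free(),
 * or NULL if no separator is found or allocation fails.
 */
char *scp_to_ssh_url(const char *scp)
{
	static const char prefix[] = "ssh://";
	const char *at = strchr(scp, '@');
	const char *host = at ? at + 1 : scp;
	const char *colon = strchr(host, ':');
	size_t head, len;
	char *out;

	if (!colon)
		return NULL;

	/* Everything before the separator: "host" or "user@host". */
	head = colon - scp;
	/* Prefix, head, one '/', the path, and the trailing NUL. */
	len = strlen(prefix) + head + 1 + strlen(colon + 1) + 1;
	out = malloc(len);
	if (!out)
		return NULL;

	/* "host:/abs" drops the ':'; "host:rel" turns it into '/'. */
	snprintf(out, len, "%s%.*s/%s", prefix, (int)head, scp,
		 colon[1] == '/' ? colon + 2 : colon + 1);
	return out;
}
```

With this sketch "host:path" becomes "ssh://host/path",
"user@host:/abs" becomes "ssh://user@host/abs", and "host:~user/repo"
becomes "ssh://host/~user/repo".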
It is adapted from the algorithm in connect.c's parse_connect_url
and reuses the shared enum url_scheme and url_get_scheme function
that previous commits made available in url.h. The new parser and
the connect path agree on scheme classification. url_parse has the
same interface as url_normalize and uses the same data structures.
url_parse and parse_connect_url accept the same URL forms with one deliberate
exception. Bare local paths such as "/abs/path", "./rel"
or "repo" are accepted by parse_connect_url as URL_SCHEME_LOCAL,
but rejected by url_parse because url_normalize requires a URL
with a scheme://host form. A consumer that wants to handle both
URLs and local paths needs to dispatch on url_is_local_not_ssh
before calling url_parse, just as the connect path does internally.
The duplication with parse_connect_url is intentional.
The two functions have different contracts:
- parse_connect_url
Calls die() on an unknown scheme
and returns NUL-terminated host/path
strings for the connect path
- url_parse
Returns NULL on failure while populating
out_info->err, and exposes components
as offset/length pairs into the normalized
URL buffer, matching url_normalize.
Reconciling both is possible, but not in the scope
of the current patch set.
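To make the offset/length contract concrete, the sketch below copies a
component out of a normalized URL buffer and applies the same '~'
adjustment that url_parse performs for SSH and Git URLs. The struct
mini_url_info and both function names are hypothetical; only the
path_off and path_len fields are taken from struct url_info, and the
length guard is an addition of this sketch.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical, reduced stand-in for struct url_info: only the
 * path fields are modeled here.
 */
struct mini_url_info {
	size_t path_off; /* offset of the path in the normalized URL */
	size_t path_len; /* length of the path in bytes */
};

/*
 * Copy one component out of the normalized URL buffer without
 * modifying the buffer. The caller must free() the result.
 */
char *copy_component(const char *normalized, size_t off, size_t len)
{
	char *out = malloc(len + 1);

	if (!out)
		return NULL;
	memcpy(out, normalized + off, len);
	out[len] = '\0';
	return out;
}

/*
 * Same idea as the final step of url_parse(): when the path is
 * "/~user/repo", skip the leading '/' so it reads "~user/repo".
 */
void point_path_at_tilde(const char *normalized, struct mini_url_info *info)
{
	if (info->path_len > 1 && normalized[info->path_off + 1] == '~') {
		info->path_off++;
		info->path_len--;
	}
}
```

For "ssh://host/~user/repo" the path starts at offset 10 with length
11; after the adjustment it starts at offset 11 with length 10, and
copy_component() yields "~user/repo".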
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
t/unit-tests/u-urlmatch-normalization.c | 45 +++++++++
urlmatch.c | 127 ++++++++++++++++++++++++
urlmatch.h | 1 +
3 files changed, 173 insertions(+)
diff --git a/t/unit-tests/u-urlmatch-normalization.c b/t/unit-tests/u-urlmatch-normalization.c
index 39f6e1ba26..3595d893a2 100644
--- a/t/unit-tests/u-urlmatch-normalization.c
+++ b/t/unit-tests/u-urlmatch-normalization.c
@@ -245,3 +245,48 @@ void test_urlmatch_normalization__equivalents(void)
compare_normalized_urls("https://@x.y/^/../abc", "httpS://@x.y:0443/abc", 1);
compare_normalized_urls("https://@x.y/^/..", "httpS://@x.y:0443/", 1);
}
+
+static void check_parsed_path(const char *url, const char *expected_path)
+{
+ struct url_info info;
+ char *parsed = url_parse(url, &info);
+ char *path;
+
+ cl_assert(parsed != NULL);
+ path = xstrndup(parsed + info.path_off, info.path_len);
+ cl_assert_equal_s(path, expected_path);
+ free(path);
+ free(parsed);
+}
+
+void test_urlmatch_normalization__parse_scp(void)
+{
+ check_parsed_path("host:path", "/path");
+ check_parsed_path("user@host:path", "/path");
+ check_parsed_path("host:~user/repo", "~user/repo");
+ check_parsed_path("user@host:~user/repo", "~user/repo");
+ check_parsed_path("[host]:src", "/src");
+ check_parsed_path("[host:123]:src", "/src");
+ check_parsed_path("[::1]:repo", "/repo");
+ check_parsed_path("user@[::1]:repo", "/repo");
+}
+
+void test_urlmatch_normalization__parse_url_form(void)
+{
+ check_parsed_path("ssh://host/repo", "/repo");
+ check_parsed_path("ssh://host/~user/repo", "~user/repo");
+ check_parsed_path("git://host:9418/repo", "/repo");
+ check_parsed_path("git://host/~user/repo", "~user/repo");
+ check_parsed_path("ssh://[::1]:1234/repo", "/repo");
+ check_parsed_path("http://[2001:db8::1]/repo", "/repo");
+}
+
+void test_urlmatch_normalization__parse_strips_query_and_fragment(void)
+{
+ check_parsed_path("ssh://host/~user/repo?q", "~user/repo");
+ check_parsed_path("ssh://host/~user/repo#frag", "~user/repo");
+ check_parsed_path("git://host/~user/repo?q", "~user/repo");
+ check_parsed_path("user@host:~user/repo?q", "~user/repo");
+ check_parsed_path("https://host/repo?q", "/repo");
+ check_parsed_path("https://host/repo#frag", "/repo");
+}
diff --git a/urlmatch.c b/urlmatch.c
index eea8300489..bf8cce6de9 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -5,6 +5,7 @@
#include "hex-ll.h"
#include "strbuf.h"
#include "urlmatch.h"
+#include "url.h"
#define URL_ALPHA "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
#define URL_DIGIT "0123456789"
@@ -440,6 +441,132 @@ char *url_normalize(const char *url, struct url_info *out_info)
return url_normalize_1(url, out_info, 0);
}
+char *url_parse(const char *url_orig, struct url_info *out_info)
+{
+ struct strbuf url;
+ char *host, *separator;
+ char *detached, *normalized;
+ char *url_decoded;
+ enum url_scheme scheme = URL_SCHEME_LOCAL;
+ struct url_info local_info;
+ struct url_info *info = out_info ? out_info : &local_info;
+ bool scp_syntax = false;
+
+ if (is_url(url_orig))
+ url_decoded = url_decode(url_orig);
+ else
+ url_decoded = xstrdup(url_orig);
+
+ strbuf_init(&url, strlen(url_decoded) + sizeof("ssh://"));
+ strbuf_addstr(&url, url_decoded);
+ free(url_decoded);
+
+ host = strstr(url.buf, "://");
+ if (host) {
+ /*
+ * Temporarily NUL-terminate the scheme name
+ * so we can pass it to url_get_scheme(),
+ * then restore the ':' so the buffer
+ * is intact for url_normalize() below.
+ */
+ char saved = *host;
+ *host = '\0';
+ scheme = url_get_scheme(url.buf);
+ *host = saved;
+ host += 3;
+ } else {
+ if (!url_is_local_not_ssh(url.buf)) {
+ scp_syntax = true;
+ scheme = URL_SCHEME_SSH;
+ strbuf_insertstr(&url, 0, "ssh://");
+ host = url.buf + strlen("ssh://");
+ }
+ }
+
+ /*
+ * Path starts after ':' in scp style SSH URLs.
+ *
+ * The host portion can begin with an optional "user@",
+ * and the host itself can be wrapped in '[' ']' brackets.
+ * The bracket form is git's legacy way of supporting:
+ *
+ * - IPv6 literals: [::1]:repo
+ * - host:port pairs in the short form: [myhost:123]:src
+ * - Plain hostnames that happen to need bracketing: [host]:path
+ *
+ * Treat '[' followed by 0 or 1 inner colons as the host:port
+ * or plain hostname form and strip the brackets so url_normalize
+ * sees host[:port] natively. Two or more inner colons mark an
+ * IPv6 literal: keep the brackets for url_normalize to recognize.
+ *
+ * The scp path separator is the ':' that follows the host part,
+ * and we must skip over user@ and any '[...]' before searching.
+ */
+ if (scp_syntax) {
+ char *user_at;
+ char *host_start;
+ char *bracket_end;
+
+ user_at = strchr(host, '@');
+ host_start = user_at ? user_at + 1 : host;
+
+ if (*host_start == '[') {
+ char *p;
+ int inner_colons;
+
+ bracket_end = strchr(host_start, ']');
+ inner_colons = 0;
+ for (p = host_start + 1; bracket_end && p < bracket_end; p++)
+ if (*p == ':')
+ inner_colons++;
+
+ if (bracket_end && inner_colons <= 1) {
+ size_t close_off = bracket_end - url.buf;
+ size_t open_off = host_start - url.buf;
+ strbuf_remove(&url, close_off, 1);
+ strbuf_remove(&url, open_off, 1);
+ separator = url.buf + close_off - 1;
+ } else if (bracket_end) {
+ separator = strchr(bracket_end + 1, ':');
+ } else {
+ separator = strchr(host_start, ':');
+ }
+ } else {
+ separator = strchr(host_start, ':');
+ }
+
+ if (separator) {
+ if (separator[1] == '/')
+ strbuf_remove(&url, separator - url.buf, 1);
+ else
+ *separator = '/';
+ }
+ }
+
+ detached = strbuf_detach(&url, NULL);
+ normalized = url_normalize(detached, info);
+ free(detached);
+
+ if (!normalized)
+ return NULL;
+
+ /*
+ * Point path to ~ for URLs like this:
+ *
+ * ssh://host.xz/~user/repo
+ * git://host.xz/~user/repo
+ * host.xz:~user/repo
+ */
+ if (scheme == URL_SCHEME_GIT || scheme == URL_SCHEME_SSH) {
+ if (normalized[info->path_off + 1] == '~') {
+ info->path_off++;
+ info->path_len--;
+ }
+ }
+
+ return normalized;
+}
+
static size_t url_match_prefix(const char *url,
const char *url_prefix,
size_t url_prefix_len)
diff --git a/urlmatch.h b/urlmatch.h
index 5ba85cea13..6b3ce42858 100644
--- a/urlmatch.h
+++ b/urlmatch.h
@@ -35,6 +35,7 @@ struct url_info {
};
char *url_normalize(const char *, struct url_info *);
+char *url_parse(const char *, struct url_info *);
struct urlmatch_item {
size_t hostmatch_len;
--
gitgitgadget
* [PATCH v3 6/8] builtin: create url-parse command
2026-05-02 5:28 ` [PATCH v3 0/8] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
` (4 preceding siblings ...)
2026-05-02 5:28 ` [PATCH v3 5/8] urlmatch: define url_parse function Matheus Afonso Martins Moreira via GitGitGadget
@ 2026-05-02 5:28 ` Matheus Afonso Martins Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 7/8] doc: describe the url-parse builtin Matheus Afonso Martins Moreira via GitGitGadget
` (3 subsequent siblings)
9 siblings, 0 replies; 44+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2026-05-02 5:28 UTC (permalink / raw)
To: git
Cc: Torsten Bögershausen, Ghanshyam Thakkar, Matheus Moreira,
Matheus Afonso Martins Moreira
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
Git commands can accept a rather wide variety of URL syntaxes.
The range of accepted inputs might expand even more in the future.
This makes the parsing of URL components difficult since standard URL
parsers cannot be used. Extracting the components of a git URL would
require implementing all the schemes that git itself supports, not to
mention tracking its development continuously in case new URL schemes
are added.
The url-parse builtin command is designed to solve this problem
by exposing git's native URL parsing facilities as a plumbing command.
Other programs can then call upon git itself to parse the git URLs
and extract their components. This should be quite useful for scripts.
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
.gitignore | 1 +
Makefile | 1 +
builtin.h | 1 +
builtin/url-parse.c | 135 ++++++++++++++++++++++++++++++++++++++++++++
command-list.txt | 1 +
git.c | 1 +
meson.build | 1 +
7 files changed, 141 insertions(+)
create mode 100644 builtin/url-parse.c
diff --git a/.gitignore b/.gitignore
index 24635cf2d6..c5673daa6e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -182,6 +182,7 @@
/git-update-server-info
/git-upload-archive
/git-upload-pack
+/git-url-parse
/git-var
/git-verify-commit
/git-verify-pack
diff --git a/Makefile b/Makefile
index cedc234173..1c757a1aa0 100644
--- a/Makefile
+++ b/Makefile
@@ -1497,6 +1497,7 @@ BUILTIN_OBJS += builtin/update-ref.o
BUILTIN_OBJS += builtin/update-server-info.o
BUILTIN_OBJS += builtin/upload-archive.o
BUILTIN_OBJS += builtin/upload-pack.o
+BUILTIN_OBJS += builtin/url-parse.o
BUILTIN_OBJS += builtin/var.o
BUILTIN_OBJS += builtin/verify-commit.o
BUILTIN_OBJS += builtin/verify-pack.o
diff --git a/builtin.h b/builtin.h
index 235c51f30e..c6f7672991 100644
--- a/builtin.h
+++ b/builtin.h
@@ -271,6 +271,7 @@ int cmd_update_server_info(int argc, const char **argv, const char *prefix, stru
int cmd_upload_archive(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_upload_pack(int argc, const char **argv, const char *prefix, struct repository *repo);
+int cmd_url_parse(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_var(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_verify_commit(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_verify_tag(int argc, const char **argv, const char *prefix, struct repository *repo);
diff --git a/builtin/url-parse.c b/builtin/url-parse.c
new file mode 100644
index 0000000000..7e705538c0
--- /dev/null
+++ b/builtin/url-parse.c
@@ -0,0 +1,135 @@
+#include "builtin.h"
+#include "gettext.h"
+#include "parse-options.h"
+#include "url.h"
+#include "urlmatch.h"
+
+static const char * const builtin_url_parse_usage[] = {
+ N_("git url-parse [-c <component>] [--] <url>..."),
+ NULL
+};
+
+static char *component_arg;
+
+static struct option builtin_url_parse_options[] = {
+ OPT_STRING('c', "component", &component_arg, N_("component"),
+ N_("which URL component to extract")),
+ OPT_END(),
+};
+
+enum url_component {
+ URL_NONE = 0,
+ URL_SCHEME,
+ URL_USER,
+ URL_PASSWORD,
+ URL_HOST,
+ URL_PORT,
+ URL_PATH,
+};
+
+static void parse_or_die(const char *url, struct url_info *info)
+{
+ if (url_is_local_not_ssh(url)) {
+ if (*url == '/')
+ die("'%s' is not a URL; if you meant a local "
+ "repository, use 'file://%s'", url, url);
+ if (has_dos_drive_prefix(url))
+ die("'%s' is not a URL; if you meant a local "
+ "repository, use 'file:///%s'", url, url);
+ die("'%s' is not a URL; if you meant a local repository, "
+ "use a 'file://' URL with an absolute path", url);
+ }
+ if (!url_parse(url, info))
+ die("invalid git URL '%s': %s", url, info->err);
+}
+
+static enum url_component get_component_or_die(const char *arg)
+{
+ if (!strcmp("path", arg))
+ return URL_PATH;
+ if (!strcmp("host", arg))
+ return URL_HOST;
+ if (!strcmp("scheme", arg))
+ return URL_SCHEME;
+ if (!strcmp("user", arg))
+ return URL_USER;
+ if (!strcmp("password", arg))
+ return URL_PASSWORD;
+ if (!strcmp("port", arg))
+ return URL_PORT;
+ die("invalid git URL component '%s'", arg);
+}
+
+static char *extract_component(enum url_component component,
+ struct url_info *info)
+{
+ size_t offset, length;
+
+ switch (component) {
+ case URL_SCHEME:
+ offset = 0;
+ length = info->scheme_len;
+ break;
+ case URL_USER:
+ offset = info->user_off;
+ length = info->user_len;
+ break;
+ case URL_PASSWORD:
+ offset = info->passwd_off;
+ length = info->passwd_len;
+ break;
+ case URL_HOST:
+ offset = info->host_off;
+ length = info->host_len;
+ break;
+ case URL_PORT:
+ offset = info->port_off;
+ length = info->port_len;
+ break;
+ case URL_PATH:
+ offset = info->path_off;
+ length = info->path_len;
+ break;
+ case URL_NONE:
+ return NULL;
+ }
+
+ return xstrndup(info->url + offset, length);
+}
+
+int cmd_url_parse(int argc,
+ const char **argv,
+ const char *prefix,
+ struct repository *repo UNUSED)
+{
+ struct url_info info;
+ enum url_component selected = URL_NONE;
+ char *extracted;
+ int i;
+
+ argc = parse_options(argc, argv, prefix, builtin_url_parse_options,
+ builtin_url_parse_usage, 0);
+
+ if (argc == 0)
+ usage_with_options(builtin_url_parse_usage,
+ builtin_url_parse_options);
+
+ if (component_arg)
+ selected = get_component_or_die(component_arg);
+
+ for (i = 0; i < argc; i++) {
+ parse_or_die(argv[i], &info);
+
+ if (selected != URL_NONE) {
+ extracted = extract_component(selected, &info);
+ if (extracted) {
+ puts(extracted);
+ free(extracted);
+ }
+ }
+
+ free(info.url);
+ }
+
+ return 0;
+}
diff --git a/command-list.txt b/command-list.txt
index f9005cf459..1ede48186f 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -202,6 +202,7 @@ git-update-ref plumbingmanipulators
git-update-server-info synchingrepositories
git-upload-archive synchelpers
git-upload-pack synchelpers
+git-url-parse purehelpers
git-var plumbinginterrogators
git-verify-commit ancillaryinterrogators
git-verify-pack plumbinginterrogators
diff --git a/git.c b/git.c
index 5a40eab8a2..a073eed931 100644
--- a/git.c
+++ b/git.c
@@ -670,6 +670,7 @@ static struct cmd_struct commands[] = {
{ "upload-archive", cmd_upload_archive, NO_PARSEOPT },
{ "upload-archive--writer", cmd_upload_archive_writer, NO_PARSEOPT },
{ "upload-pack", cmd_upload_pack },
+ { "url-parse", cmd_url_parse },
{ "var", cmd_var, RUN_SETUP_GENTLY | NO_PARSEOPT },
{ "verify-commit", cmd_verify_commit, RUN_SETUP },
{ "verify-pack", cmd_verify_pack },
diff --git a/meson.build b/meson.build
index 11488623bf..dc3cf68ee5 100644
--- a/meson.build
+++ b/meson.build
@@ -686,6 +686,7 @@ builtin_sources = [
'builtin/update-server-info.c',
'builtin/upload-archive.c',
'builtin/upload-pack.c',
+ 'builtin/url-parse.c',
'builtin/var.c',
'builtin/verify-commit.c',
'builtin/verify-pack.c',
--
gitgitgadget
* [PATCH v3 7/8] doc: describe the url-parse builtin
2026-05-02 5:28 ` [PATCH v3 0/8] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
` (5 preceding siblings ...)
2026-05-02 5:28 ` [PATCH v3 6/8] builtin: create url-parse command Matheus Afonso Martins Moreira via GitGitGadget
@ 2026-05-02 5:28 ` Matheus Afonso Martins Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 8/8] t9904: add tests for the new " Matheus Afonso Martins Moreira via GitGitGadget
` (2 subsequent siblings)
9 siblings, 0 replies; 44+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2026-05-02 5:28 UTC (permalink / raw)
To: git
Cc: Torsten Bögershausen, Ghanshyam Thakkar, Matheus Moreira,
Matheus Afonso Martins Moreira
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
The new url-parse builtin validates git URLs
and optionally extracts their components.
Helped-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
Documentation/git-url-parse.adoc | 80 ++++++++++++++++++++++++++++++++
Documentation/meson.build | 1 +
2 files changed, 81 insertions(+)
create mode 100644 Documentation/git-url-parse.adoc
diff --git a/Documentation/git-url-parse.adoc b/Documentation/git-url-parse.adoc
new file mode 100644
index 0000000000..9d0d93da4a
--- /dev/null
+++ b/Documentation/git-url-parse.adoc
@@ -0,0 +1,80 @@
+git-url-parse(1)
+================
+
+NAME
+----
+git-url-parse - Parse and extract git URL components
+
+SYNOPSIS
+--------
+[synopsis]
+git url-parse [-c <component>] [--] <url>...
+
+DESCRIPTION
+-----------
+
+Git supports many ways to specify URLs, some of them non-standard.
+For example, git supports the scp style [user@]host:[path] format.
+This command eases interoperability with git URLs by enabling the
+parsing and extraction of the components of all git URLs.
+
+Any syntactically valid URL is parsed, even if the scheme is not one
+git supports for fetching or pushing.
+
+OPTIONS
+-------
+
+`-c <component>`::
+`--component <component>`::
+ Extract the _<component>_ component from the given Git URLs.
+ _<component>_ can be one of:
+ `scheme`, `user`, `password`, `host`, `port`, `path`.
+
+OUTPUT
+------
+
+When `--component` is given, the requested component of each URL
+is printed on its own line, in the order the URLs were given. If
+the URL has no such component (for example, a port in a URL that
+does not specify one), an empty line is printed in its place.
+
+When `--component` is not given, no output is produced. The exit
+status is zero if every URL parses successfully and non-zero
+otherwise, allowing the command to be used purely as a validator.
+
+EXAMPLES
+--------
+
+* Print the host name:
++
+------------
+$ git url-parse --component host https://example.com/user/repo
+example.com
+------------
+
+* Print the path:
++
+------------
+$ git url-parse --component path https://example.com/user/repo
+/user/repo
+$ git url-parse --component path example.com:~user/repo
+~user/repo
+$ git url-parse --component path example.com:user/repo
+/user/repo
+------------
+
+* Validate URLs without outputting anything:
++
+------------
+$ git url-parse https://example.com/user/repo example.com:~user/repo
+------------
+
+SEE ALSO
+--------
+linkgit:git-clone[1],
+linkgit:git-fetch[1],
+linkgit:git-config[1]
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Documentation/meson.build b/Documentation/meson.build
index d6365b888b..32c8606a80 100644
--- a/Documentation/meson.build
+++ b/Documentation/meson.build
@@ -155,6 +155,7 @@ manpages = {
'git-update-server-info.adoc' : 1,
'git-upload-archive.adoc' : 1,
'git-upload-pack.adoc' : 1,
+ 'git-url-parse.adoc' : 1,
'git-var.adoc' : 1,
'git-verify-commit.adoc' : 1,
'git-verify-pack.adoc' : 1,
--
gitgitgadget
* [PATCH v3 8/8] t9904: add tests for the new url-parse builtin
2026-05-02 5:28 ` [PATCH v3 0/8] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
` (6 preceding siblings ...)
2026-05-02 5:28 ` [PATCH v3 7/8] doc: describe the url-parse builtin Matheus Afonso Martins Moreira via GitGitGadget
@ 2026-05-02 5:28 ` Matheus Afonso Martins Moreira via GitGitGadget
2026-05-03 3:49 ` [PATCH v3 0/8] builtin: implement, document and test url-parse Junio C Hamano
2026-05-03 17:28 ` Torsten Bögershausen
9 siblings, 0 replies; 44+ messages in thread
From: Matheus Afonso Martins Moreira via GitGitGadget @ 2026-05-02 5:28 UTC (permalink / raw)
To: git
Cc: Torsten Bögershausen, Ghanshyam Thakkar, Matheus Moreira,
Matheus Afonso Martins Moreira
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
Test git URL parsing, validation and component extraction
on all documented git URL schemes and syntaxes.
Add IPv6 host coverage in URL form:
ssh://[::1]/path
ssh://user@[::1]:1234/path
git://[::1]:9418/path
http://[2001:db8::1]/path
https://[2001:db8::1]/path
In URL form the brackets are kept in the host component (RFC 3986
syntax for IPv6 literals).
Also exercise the bracketed scp short forms that t5601-clone.sh
covers via parse_connect_url:
[host]:path
[host:port]:path
[::1]:repo
user@[::1]:repo
user@[host:port]:path
In scp form, brackets are kept for IPv6 literals (two or more inner
colons) and stripped for plain hostnames or host:port pairs.
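The scp-form bracket rule described above can be sketched in C as
follows. This is only an illustration under stated assumptions, not
the code from this series: split_bracketed_host() and its buffer-based
interface are hypothetical names invented for the example, and the
only behavior taken from the description is the "two or more inner
colons means IPv6 literal" test.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only (not part of the series).
 * "in" is the bracketed host part of an scp-style URL, for example
 * "[::1]", "[myhost]" or "[myhost:123]".
 *
 * Returns 0 on success and -1 if the input is not bracketed or does
 * not fit in the output buffers.
 */
static int split_bracketed_host(const char *in,
				char *host, size_t host_size,
				char *port, size_t port_size)
{
	const char *inner, *colon = NULL;
	size_t len, inner_len, colons = 0, i;

	assert(in && host && port);
	if (!host_size || !port_size)
		return -1;

	len = strlen(in);
	if (len < 3 || in[0] != '[' || in[len - 1] != ']')
		return -1;

	inner = in + 1;
	inner_len = len - 2;
	for (i = 0; i < inner_len; i++) {
		if (inner[i] == ':') {
			colons++;
			colon = inner + i; /* remember the last colon */
		}
	}

	port[0] = '\0';

	if (colons >= 2) {
		/* IPv6 literal: keep the brackets in the host. */
		if (len >= host_size)
			return -1;
		memcpy(host, in, len + 1);
		return 0;
	}

	if (colons == 1) {
		/* host:port pair: strip the brackets and split. */
		size_t host_len = (size_t)(colon - inner);
		size_t port_len = inner_len - host_len - 1;

		if (host_len >= host_size || port_len >= port_size)
			return -1;
		memcpy(host, inner, host_len);
		host[host_len] = '\0';
		memcpy(port, colon + 1, port_len);
		port[port_len] = '\0';
		return 0;
	}

	/* Plain hostname: strip the brackets. */
	if (inner_len >= host_size)
		return -1;
	memcpy(host, inner, inner_len);
	host[inner_len] = '\0';
	return 0;
}
```

With this sketch, "[::1]" yields the host "[::1]" and an empty port,
while "[myhost:123]" yields the host "myhost" and the port "123",
which matches the expectations exercised by the tests in this patch.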
Suggested-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
t/meson.build | 1 +
t/t9904-url-parse.sh | 319 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 320 insertions(+)
create mode 100755 t/t9904-url-parse.sh
diff --git a/t/meson.build b/t/meson.build
index 7528e5cda5..41b389a472 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -1114,6 +1114,7 @@ integration_tests = [
't9901-git-web--browse.sh',
't9902-completion.sh',
't9903-bash-prompt.sh',
+ 't9904-url-parse.sh',
]
benchmarks = [
diff --git a/t/t9904-url-parse.sh b/t/t9904-url-parse.sh
new file mode 100755
index 0000000000..8a369d2040
--- /dev/null
+++ b/t/t9904-url-parse.sh
@@ -0,0 +1,319 @@
+#!/bin/sh
+#
+# Copyright (c) 2024 Matheus Afonso Martins Moreira
+#
+
+test_description='git url-parse tests'
+
+. ./test-lib.sh
+
+test_expect_success 'git url-parse -- ssh syntax' '
+ git url-parse "ssh://user@example.com:1234/repository/path" &&
+ git url-parse "ssh://user@example.com/repository/path" &&
+ git url-parse "ssh://example.com:1234/repository/path" &&
+ git url-parse "ssh://example.com/repository/path"
+'
+
+test_expect_success 'git url-parse -- git syntax' '
+ git url-parse "git://example.com:1234/repository/path" &&
+ git url-parse "git://example.com/repository/path"
+'
+
+test_expect_success 'git url-parse -- http syntax' '
+ git url-parse "https://example.com:1234/repository/path" &&
+ git url-parse "https://example.com/repository/path" &&
+ git url-parse "http://example.com:1234/repository/path" &&
+ git url-parse "http://example.com/repository/path"
+'
+
+test_expect_success 'git url-parse -- scp syntax' '
+ git url-parse "user@example.com:/repository/path" &&
+ git url-parse "example.com:/repository/path"
+'
+
+test_expect_success 'git url-parse -- username expansion - ssh syntax' '
+ git url-parse "ssh://user@example.com:1234/~user/repository" &&
+ git url-parse "ssh://user@example.com/~user/repository" &&
+ git url-parse "ssh://example.com:1234/~user/repository" &&
+ git url-parse "ssh://example.com/~user/repository"
+'
+
+test_expect_success 'git url-parse -- username expansion - git syntax' '
+ git url-parse "git://example.com:1234/~user/repository" &&
+ git url-parse "git://example.com/~user/repository"
+'
+
+test_expect_success 'git url-parse -- username expansion - scp syntax' '
+ git url-parse "user@example.com:~user/repository" &&
+ git url-parse "example.com:~user/repository"
+'
+
+test_expect_success 'git url-parse -- file urls' '
+ git url-parse "file:///repository/path" &&
+ git url-parse "file://"
+'
+
+test_expect_success 'git url-parse -c scheme -- ssh syntax' '
+ test ssh = "$(git url-parse -c scheme "ssh://user@example.com:1234/repository/path")" &&
+ test ssh = "$(git url-parse -c scheme "ssh://user@example.com/repository/path")" &&
+ test ssh = "$(git url-parse -c scheme "ssh://example.com:1234/repository/path")" &&
+ test ssh = "$(git url-parse -c scheme "ssh://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c scheme -- git syntax' '
+ test git = "$(git url-parse -c scheme "git://example.com:1234/repository/path")" &&
+ test git = "$(git url-parse -c scheme "git://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c scheme -- http syntax' '
+ test https = "$(git url-parse -c scheme "https://example.com:1234/repository/path")" &&
+ test https = "$(git url-parse -c scheme "https://example.com/repository/path")" &&
+ test http = "$(git url-parse -c scheme "http://example.com:1234/repository/path")" &&
+ test http = "$(git url-parse -c scheme "http://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c scheme -- scp syntax' '
+ test ssh = "$(git url-parse -c scheme "user@example.com:/repository/path")" &&
+ test ssh = "$(git url-parse -c scheme "example.com:/repository/path")"
+'
+
+test_expect_success 'git url-parse -c user -- ssh syntax' '
+ test user = "$(git url-parse -c user "ssh://user@example.com:1234/repository/path")" &&
+ test user = "$(git url-parse -c user "ssh://user@example.com/repository/path")" &&
+ test "" = "$(git url-parse -c user "ssh://example.com:1234/repository/path")" &&
+ test "" = "$(git url-parse -c user "ssh://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c user -- git syntax' '
+ test "" = "$(git url-parse -c user "git://example.com:1234/repository/path")" &&
+ test "" = "$(git url-parse -c user "git://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c user -- http syntax' '
+ test "" = "$(git url-parse -c user "https://example.com:1234/repository/path")" &&
+ test "" = "$(git url-parse -c user "https://example.com/repository/path")" &&
+ test "" = "$(git url-parse -c user "http://example.com:1234/repository/path")" &&
+ test "" = "$(git url-parse -c user "http://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c user -- scp syntax' '
+ test user = "$(git url-parse -c user "user@example.com:/repository/path")" &&
+ test "" = "$(git url-parse -c user "example.com:/repository/path")"
+'
+
+test_expect_success 'git url-parse -c password -- http syntax' '
+ test secret = "$(git url-parse -c password "https://user:secret@example.com:1234/repository/path")" &&
+ test secret = "$(git url-parse -c password "http://user:secret@example.com/repository/path")" &&
+ test "" = "$(git url-parse -c password "https://user@example.com/repository/path")" &&
+ test "" = "$(git url-parse -c password "https://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c host -- ssh syntax' '
+ test example.com = "$(git url-parse -c host "ssh://user@example.com:1234/repository/path")" &&
+ test example.com = "$(git url-parse -c host "ssh://user@example.com/repository/path")" &&
+ test example.com = "$(git url-parse -c host "ssh://example.com:1234/repository/path")" &&
+ test example.com = "$(git url-parse -c host "ssh://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c host -- git syntax' '
+ test example.com = "$(git url-parse -c host "git://example.com:1234/repository/path")" &&
+ test example.com = "$(git url-parse -c host "git://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c host -- http syntax' '
+ test example.com = "$(git url-parse -c host "https://example.com:1234/repository/path")" &&
+ test example.com = "$(git url-parse -c host "https://example.com/repository/path")" &&
+ test example.com = "$(git url-parse -c host "http://example.com:1234/repository/path")" &&
+ test example.com = "$(git url-parse -c host "http://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c host -- scp syntax' '
+ test example.com = "$(git url-parse -c host "user@example.com:/repository/path")" &&
+ test example.com = "$(git url-parse -c host "example.com:/repository/path")"
+'
+
+test_expect_success 'git url-parse -c port -- ssh syntax' '
+ test 1234 = "$(git url-parse -c port "ssh://user@example.com:1234/repository/path")" &&
+ test "" = "$(git url-parse -c port "ssh://user@example.com/repository/path")" &&
+ test 1234 = "$(git url-parse -c port "ssh://example.com:1234/repository/path")" &&
+ test "" = "$(git url-parse -c port "ssh://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c port -- git syntax' '
+ test 1234 = "$(git url-parse -c port "git://example.com:1234/repository/path")" &&
+ test "" = "$(git url-parse -c port "git://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c port -- http syntax' '
+ test 1234 = "$(git url-parse -c port "https://example.com:1234/repository/path")" &&
+ test "" = "$(git url-parse -c port "https://example.com/repository/path")" &&
+ test 1234 = "$(git url-parse -c port "http://example.com:1234/repository/path")" &&
+ test "" = "$(git url-parse -c port "http://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c port -- scp syntax' '
+ test "" = "$(git url-parse -c port "user@example.com:/repository/path")" &&
+ test "" = "$(git url-parse -c port "example.com:/repository/path")"
+'
+
+test_expect_success 'git url-parse -c path -- ssh syntax' '
+ test "/repository/path" = "$(git url-parse -c path "ssh://user@example.com:1234/repository/path")" &&
+ test "/repository/path" = "$(git url-parse -c path "ssh://user@example.com/repository/path")" &&
+ test "/repository/path" = "$(git url-parse -c path "ssh://example.com:1234/repository/path")" &&
+ test "/repository/path" = "$(git url-parse -c path "ssh://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c path -- git syntax' '
+ test "/repository/path" = "$(git url-parse -c path "git://example.com:1234/repository/path")" &&
+ test "/repository/path" = "$(git url-parse -c path "git://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c path -- http syntax' '
+ test "/repository/path" = "$(git url-parse -c path "https://example.com:1234/repository/path")" &&
+ test "/repository/path" = "$(git url-parse -c path "https://example.com/repository/path")" &&
+ test "/repository/path" = "$(git url-parse -c path "http://example.com:1234/repository/path")" &&
+ test "/repository/path" = "$(git url-parse -c path "http://example.com/repository/path")"
+'
+
+test_expect_success 'git url-parse -c path -- scp syntax' '
+ test "/repository/path" = "$(git url-parse -c path "user@example.com:/repository/path")" &&
+ test "/repository/path" = "$(git url-parse -c path "example.com:/repository/path")"
+'
+
+test_expect_success 'git url-parse -c path -- username expansion - ssh syntax' '
+ test "~user/repository" = "$(git url-parse -c path "ssh://user@example.com:1234/~user/repository")" &&
+ test "~user/repository" = "$(git url-parse -c path "ssh://user@example.com/~user/repository")" &&
+ test "~user/repository" = "$(git url-parse -c path "ssh://example.com:1234/~user/repository")" &&
+ test "~user/repository" = "$(git url-parse -c path "ssh://example.com/~user/repository")"
+'
+
+test_expect_success 'git url-parse -c path -- username expansion - git syntax' '
+ test "~user/repository" = "$(git url-parse -c path "git://example.com:1234/~user/repository")" &&
+ test "~user/repository" = "$(git url-parse -c path "git://example.com/~user/repository")"
+'
+
+test_expect_success 'git url-parse -c path -- username expansion - scp syntax' '
+ test "~user/repository" = "$(git url-parse -c path "user@example.com:~user/repository")" &&
+ test "~user/repository" = "$(git url-parse -c path "example.com:~user/repository")"
+'
+
+test_expect_success 'git url-parse -c path -- username expansion strips query and fragment' '
+ test "~user/repository" = "$(git url-parse -c path "ssh://example.com/~user/repository?query")" &&
+ test "~user/repository" = "$(git url-parse -c path "ssh://example.com/~user/repository#fragment")" &&
+ test "~user/repository" = "$(git url-parse -c path "git://example.com/~user/repository?query")" &&
+ test "~user/repository" = "$(git url-parse -c path "user@example.com:~user/repository?query")"
+'
+
+test_expect_success 'git url-parse -- ssh syntax with IPv6' '
+ git url-parse "ssh://user@[::1]:1234/repository/path" &&
+ git url-parse "ssh://user@[::1]/repository/path" &&
+ git url-parse "ssh://[::1]:1234/repository/path" &&
+ git url-parse "ssh://[::1]/repository/path" &&
+ git url-parse "ssh://[2001:db8::1]/repository/path"
+'
+
+test_expect_success 'git url-parse -- git syntax with IPv6' '
+ git url-parse "git://[::1]:9418/repository/path" &&
+ git url-parse "git://[::1]/repository/path"
+'
+
+test_expect_success 'git url-parse -- http syntax with IPv6' '
+ git url-parse "https://[::1]:1234/repository/path" &&
+ git url-parse "https://[::1]/repository/path" &&
+ git url-parse "http://[2001:db8::1]/repository/path"
+'
+
+test_expect_success 'git url-parse -c host -- IPv6 in URL form' '
+ test "[::1]" = "$(git url-parse -c host "ssh://user@[::1]:1234/repository/path")" &&
+ test "[::1]" = "$(git url-parse -c host "ssh://[::1]/repository/path")" &&
+ test "[2001:db8::1]" = "$(git url-parse -c host "ssh://[2001:db8::1]/repository/path")" &&
+ test "[::1]" = "$(git url-parse -c host "git://[::1]/repository/path")" &&
+ test "[2001:db8::1]" = "$(git url-parse -c host "https://[2001:db8::1]/repository/path")"
+'
+
+test_expect_success 'git url-parse -c port -- IPv6 in URL form' '
+ test 1234 = "$(git url-parse -c port "ssh://user@[::1]:1234/repository/path")" &&
+ test "" = "$(git url-parse -c port "ssh://[::1]/repository/path")" &&
+ test 9418 = "$(git url-parse -c port "git://[::1]:9418/repository/path")"
+'
+
+test_expect_success 'git url-parse -- scp syntax with IPv6' '
+ git url-parse "[::1]:repository/path" &&
+ git url-parse "user@[::1]:repository/path" &&
+ git url-parse "[2001:db8::1]:repo"
+'
+
+test_expect_success 'git url-parse -- scp syntax with bracketed hostname' '
+ git url-parse "[myhost]:src" &&
+ git url-parse "user@[myhost]:src"
+'
+
+test_expect_success 'git url-parse -- scp syntax with bracketed host:port' '
+ git url-parse "[myhost:123]:src" &&
+ git url-parse "user@[myhost:123]:src"
+'
+
+test_expect_success 'git url-parse -c host -- scp+IPv6' '
+ test "[::1]" = "$(git url-parse -c host "[::1]:repository/path")" &&
+ test "[::1]" = "$(git url-parse -c host "user@[::1]:repository/path")" &&
+ test "[2001:db8::1]" = "$(git url-parse -c host "[2001:db8::1]:repo")"
+'
+
+test_expect_success 'git url-parse -c path -- scp+IPv6' '
+ test "/repository/path" = "$(git url-parse -c path "[::1]:/repository/path")" &&
+ test "/repository/path" = "$(git url-parse -c path "[::1]:repository/path")" &&
+ test "/repo" = "$(git url-parse -c path "[2001:db8::1]:repo")"
+'
+
+test_expect_success 'git url-parse -c host,port,path -- scp [host:port]:src' '
+ test myhost = "$(git url-parse -c host "[myhost:123]:src")" &&
+ test 123 = "$(git url-parse -c port "[myhost:123]:src")" &&
+ test "/src" = "$(git url-parse -c path "[myhost:123]:src")"
+'
+
+test_expect_success 'git url-parse -c host,path -- scp [host]:src' '
+ test myhost = "$(git url-parse -c host "[myhost]:src")" &&
+ test "/src" = "$(git url-parse -c path "[myhost]:src")"
+'
+
+test_expect_success 'git url-parse -c user -- scp with user@ and brackets' '
+ test user = "$(git url-parse -c user "user@[::1]:repo")" &&
+ test user = "$(git url-parse -c user "user@[myhost:123]:src")" &&
+ test user = "$(git url-parse -c user "user@[myhost]:src")"
+'
+
+test_expect_success 'git url-parse -- scp+IPv6 with username expansion' '
+ test "~user/repo" = "$(git url-parse -c path "[::1]:~user/repo")" &&
+ test "~user/repo" = "$(git url-parse -c path "user@[::1]:~user/repo")"
+'
+
+test_expect_success 'git url-parse fails on invalid URL' '
+ test_must_fail git url-parse "not a url"
+'
+
+test_expect_success 'git url-parse helpful error for absolute local path' '
+ test_must_fail git url-parse "/abs/path" 2>err &&
+ test_grep "is not a URL" err &&
+ test_grep "file:///" err
+'
+
+test_expect_success 'git url-parse helpful error for relative local path' '
+ test_must_fail git url-parse "./rel" 2>err &&
+ test_grep "is not a URL" err &&
+ test_grep "absolute path" err
+'
+
+test_expect_success 'git url-parse fails on unknown -c component name' '
+ test_must_fail git url-parse -c bogus "https://example.com/repo"
+'
+
+test_expect_success 'git url-parse fails on URL missing host' '
+ test_must_fail git url-parse "https://"
+'
+
+test_expect_success 'git url-parse with no URL prints usage' '
+ test_must_fail git url-parse 2>err &&
+ test_grep "usage:" err
+'
+
+test_done
--
gitgitgadget
* Re: [PATCH v3 0/8] builtin: implement, document and test url-parse
2026-05-02 5:28 ` [PATCH v3 0/8] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
` (7 preceding siblings ...)
2026-05-02 5:28 ` [PATCH v3 8/8] t9904: add tests for the new " Matheus Afonso Martins Moreira via GitGitGadget
@ 2026-05-03 3:49 ` Junio C Hamano
2026-05-03 4:29 ` Matheus Afonso Martins Moreira
2026-05-03 17:28 ` Torsten Bögershausen
9 siblings, 1 reply; 44+ messages in thread
From: Junio C Hamano @ 2026-05-03 3:49 UTC (permalink / raw)
To: Matheus Moreira via GitGitGadget
Cc: git, Torsten Bögershausen, Ghanshyam Thakkar,
Matheus Moreira
"Matheus Moreira via GitGitGadget" <gitgitgadget@gmail.com> writes:
> ... Tools wanting to reason
> about them have historically had to reimplement git's parsing or shell out
> indirectly. With git url-parse, scripts can ask git directly: validate a
> URL, extract a component (scheme, user, host, port, path, password), or
> both.
Nitpick. With "git url-parse", these scripts have to do what they
traditionally have always done, i.e., shell out to the command, no?
* Re: [PATCH v3 0/8] builtin: implement, document and test url-parse
2026-05-03 3:49 ` [PATCH v3 0/8] builtin: implement, document and test url-parse Junio C Hamano
@ 2026-05-03 4:29 ` Matheus Afonso Martins Moreira
0 siblings, 0 replies; 44+ messages in thread
From: Matheus Afonso Martins Moreira @ 2026-05-03 4:29 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Torsten Bögershausen, Ghanshyam Thakkar
Junio C Hamano <gitster@pobox.com> writes:
> Nitpick. With "git url-parse", these scripts have to do what they
> traditionally have always done, i.e., shell out to the command, no?
It's a good point. I should have worded it better:
Tools wanting to reason about them have historically had to
reimplement git's parsing logic externally. With git url-parse,
scripts can delegate URL parsing to git's own parser: validate
a URL, extract a component (scheme, user, host, port, path,
password), or both.
What I meant to say is that git's URL parsing was never exposed
as a standalone operation, leading external tools to reimplement
the logic themselves.
For example:
npm/git-url-parse: millions of weekly downloads
crates/git-url-parse: half a million downloads
With this builtin, scripts can rely on git's own parser
instead of a third party reimplementation.
References:
https://www.npmjs.com/package/git-url-parse
https://crates.io/crates/git-url-parse
* Re: [PATCH v3 0/8] builtin: implement, document and test url-parse
2026-05-02 5:28 ` [PATCH v3 0/8] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
` (8 preceding siblings ...)
2026-05-03 3:49 ` [PATCH v3 0/8] builtin: implement, document and test url-parse Junio C Hamano
@ 2026-05-03 17:28 ` Torsten Bögershausen
2026-05-03 19:36 ` Matheus Afonso Martins Moreira
9 siblings, 1 reply; 44+ messages in thread
From: Torsten Bögershausen @ 2026-05-03 17:28 UTC (permalink / raw)
To: Matheus Moreira via GitGitGadget; +Cc: git, Ghanshyam Thakkar, Matheus Moreira
>
> The series consists of eight commits.
Reviewer's comment: Nicely done.
> Changes since v2:
>
> * Fix Windows CI failure: handle DOS drive prefix in the helpful local-path
> error. With this, the message for a drive-letter input like C:/repo (or
> an MSYS-mangled /abs/path that bash rewrites to D:/.../abs/path before
> git sees it) gets the specific file:///<input> suggestion rather than the
> generic fallback. No effect on Linux or macOS, since has_dos_drive_prefix
> is a no-op on non-Windows builds.
>
> * t9904: relax the grep on the absolute-path test from the literal
> file:///abs/path to the structural file:/// (three slashes). The original
> assertion depended on the input being preserved verbatim, which MSYS does
> not do. The relaxed grep verifies the structurally meaningful property
> (specific URL suggestion was produced, not the generic fallback) and runs
> cross-platform.
More a question to myself, maybe, about t9904 (and maybe other parts):
I have in mind that the parser learned to handle
file://server/share/repo
correctly under Windows.
I don't know if this needs to be addressed here or in a follow-up commit?
\\server\share\repo is a UNC name, which is handled by the
Windows file system. Backslashes must be used towards Windows (which we do),
and '/' may be used outside Git.
commit ebb8d2c90fb0840a0803935804e37e2205505f23
Author: Torsten Bögershausen <tboegi@web.de>
Date: Sat Aug 24 15:07:59 2019 -0700
mingw: support UNC in git clone file://server/share/repo
Extend the parser to accept file://server/share/repo in the way that
Windows users expect it to be parsed who are used to referring to file
shares by UNC paths of the form \\server\share\folder.
[jes: tightened check to avoid handling file://C:/some/path as a UNC
path.]
This closes https://github.com/git-for-windows/git/issues/1264.
* Re: [PATCH v3 0/8] builtin: implement, document and test url-parse
2026-05-03 17:28 ` Torsten Bögershausen
@ 2026-05-03 19:36 ` Matheus Afonso Martins Moreira
2026-05-12 3:50 ` Junio C Hamano
0 siblings, 1 reply; 44+ messages in thread
From: Matheus Afonso Martins Moreira @ 2026-05-03 19:36 UTC (permalink / raw)
To: Torsten Bögershausen
Cc: Matheus Moreira via GitGitGadget, git, Ghanshyam Thakkar
> Reviewer's comment: Nicely done.
Thank you!
> More a question to myself, maybe, about t9904 (and maybe other parts):
> I have in mind that the parser learned to handle
>
> file://server/share/repo
> correctly under Windows.
> I don't know if this needs to be addressed here or in a follow-up commit?
I'd be happy to revisit this in a follow-up. It's been a while
since I used MSYS but I do remember the fact it rewrites paths
internally. I wasn't sure how to handle it properly in the tests.
The problematic test case is:
test_must_fail git url-parse "/abs/path" 2>err &&
test_grep "is not a URL" err &&
test_grep "file:///abs/path" err
MSYS bash rewrites /abs/path to C:/Program Files/Git/abs/path
before git even runs. This edge case caused the error message:
    fatal: 'C:/Program Files/Git/abs/path' is not a URL;
    if you meant a local repository, use a 'file://' URL
    with an absolute path
The test_grep "is not a URL" passed but test_grep "file:///abs/path"
failed because the suggestion did not contain the literal string
"file:///abs/path". The drive letter defeated the tool's absolute
path recognition, so it printed the generic error message instead.
The fix was to use has_dos_drive_prefix() to recognize the edge case.
However, that led to error messages containing paths that I wasn't
sure I could depend on in the test suite, such as:
    file:///C:/Program Files/Git/abs/path
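As a rough illustration of that check, the sketch below approximates a drive-prefix test in shell and builds a suggested file:// URL from it. The names has_drive_prefix and suggest_file_url are hypothetical, and the pattern is a simplification: git's real has_dos_drive_prefix() is a C helper whose details are not reproduced here.

```shell
# has_drive_prefix PATH: succeed if PATH starts with a letter and a colon,
# as in "C:/...". A simplified stand-in, not git's implementation.
has_drive_prefix() {
	case $1 in
	[A-Za-z]:*) return 0 ;;
	*) return 1 ;;
	esac
}

# suggest_file_url PATH: print a file:// URL for an absolute path, or
# fail if PATH is neither drive-prefixed nor rooted at "/".
suggest_file_url() {
	if has_drive_prefix "$1"
	then
		printf 'file:///%s\n' "$1"
		return 0
	fi
	case $1 in
	/*) printf 'file://%s\n' "$1" ;;
	*) return 1 ;;
	esac
}

suggest_file_url "/abs/path"
suggest_file_url "C:/Program Files/Git/abs/path"
```

Under these assumptions both inputs produce a URL that begins with "file:///", but only the first contains the literal "file:///abs/path".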
So I decided to relax the test case just a little:
    test_must_fail git url-parse "/abs/path" 2>err &&
    test_grep "is not a URL" err &&
    test_grep "file:///" err
The "file:///" pattern checks that the path was properly recognized
and that the friendlier error message was printed, while avoiding
a hard-coded "C:/Program Files/Git" prefix that may vary
depending on the testing environment.
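To see why the looser pattern is enough, here is a small sketch that runs both patterns against two messages. The generic message is the one quoted earlier, joined onto one line. The wording of the friendlier message is an assumption made for illustration, since only the file:/// path embedded in it is known. The mentions and report helpers are hypothetical and only mimic a fixed-string grep, not test_grep itself.

```shell
generic="fatal: 'C:/Program Files/Git/abs/path' is not a URL; if you meant a local repository, use a 'file://' URL with an absolute path"
# Assumed wording: only the file:/// path inside it comes from the thread.
friendly="fatal: 'C:/Program Files/Git/abs/path' is not a URL; did you mean 'file:///C:/Program Files/Git/abs/path'?"

# mentions PATTERN MESSAGE: succeed if MESSAGE contains PATTERN literally.
mentions() {
	printf '%s\n' "$2" | grep -F -q -e "$1"
}

# report PATTERN LABEL MESSAGE: say whether PATTERN occurs in MESSAGE.
report() {
	if mentions "$1" "$3"
	then
		echo "$2: '$1' found"
	else
		echo "$2: '$1' not found"
	fi
}

report "file:///abs/path" generic "$generic"
report "file:///abs/path" friendly "$friendly"
report "file:///" generic "$generic"
report "file:///" friendly "$friendly"
```

Only the last call reports a match: the generic message mentions 'file://' with two slashes, and the MSYS-rewritten path never contains "file:///abs/path".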
In any case, the parser already handles it correctly.
It decomposes:
    file://server/share/repo
as:
    - scheme: file
    - host: server
    - path: /share/repo
Which is the correct interpretation.
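The same split can be sketched with plain shell parameter expansion. This is only an illustration of the decomposition, not git's url_parse(): the split_url name is hypothetical, and the sketch ignores user names, ports and scp-style addresses.

```shell
# split_url URL: print the scheme, host and path of a scheme://host/path
# URL, or fail if URL has no "://" separator.
split_url() {
	url=$1
	case $url in
	*://*) ;;
	*) return 1 ;;
	esac
	scheme=${url%%://*}	# everything before the first "://"
	rest=${url#*://}	# everything after it
	case $rest in
	*/*)
		host=${rest%%/*}	# up to the first slash
		path=/${rest#*/}	# the first slash and what follows
		;;
	*)
		host=$rest
		path=
		;;
	esac
	printf 'scheme: %s\nhost: %s\npath: %s\n' "$scheme" "$host" "$path"
}

split_url "file://server/share/repo"
```

With an empty authority, as in file:///abs/path, the same code yields an empty host and the path /abs/path.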
On Windows, connect.c then takes that data and reconstructs
the UNC path \\server\share\repo for the filesystem.
So the UNC reconstruction happens downstream in connect.c,
not directly in the url-parse builtin or url_parse logic.
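For illustration, the reconstruction step amounts to prefixing the host with two backslashes and flipping the slashes in the path. The to_unc function below is a hypothetical sketch of that idea in shell and does not reproduce the code in connect.c.

```shell
# to_unc HOST PATH: print the UNC name \\HOST\PATH with every "/" in
# PATH turned into a backslash.
to_unc() {
	host=$1
	path=$2
	# "\\\\" inside double quotes is two literal backslashes.
	printf '%s\n' "\\\\$host$path" | tr '/' '\\'
}

to_unc server /share/repo
```

Given the host and path from the decomposition above, this prints \\server\share\repo.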
Matheus
* Re: [PATCH v3 0/8] builtin: implement, document and test url-parse
2026-05-03 19:36 ` Matheus Afonso Martins Moreira
@ 2026-05-12 3:50 ` Junio C Hamano
2026-05-12 8:57 ` Torsten Bögershausen
0 siblings, 1 reply; 44+ messages in thread
From: Junio C Hamano @ 2026-05-12 3:50 UTC (permalink / raw)
To: Matheus Afonso Martins Moreira
Cc: Torsten Bögershausen, Matheus Moreira via GitGitGadget, git,
Ghanshyam Thakkar
Matheus Afonso Martins Moreira <matheus@matheusmoreira.com> writes:
>> Reviewers comment: Nicely done.
>
> Thank you!
>
>> More a question to myself, may be, about t9904 (and may be other parts)
>> I have in mind that the parser learned to handle
>>
>> file://server/share/repo
>> correctly under Windows.
>> I don't know if this needs to be addressed here or in a follow-up commit ?
>
> I'd be happy to revisit this in a follow-up. It's been a while
> since I used MSYS but I do remember the fact it rewrites paths
> internally. I wasn't sure how to handle it properly in the tests.
So the only potential thing that is missing from the series is the
above, which we are fine to postpone in a follow-up series? I think
that is a good stopping point. Given that this command is new, it
is fine that it has known and documented short-comings that will be
improved (of course on the other hand, we are not in any urgent need
for this new command, so we do not have to ship it half-baked).
Is everybody happy with the patches in the current shape and should
I mark it for 'next'?
Thanks.
* Re: [PATCH v3 0/8] builtin: implement, document and test url-parse
2026-05-12 3:50 ` Junio C Hamano
@ 2026-05-12 8:57 ` Torsten Bögershausen
0 siblings, 0 replies; 44+ messages in thread
From: Torsten Bögershausen @ 2026-05-12 8:57 UTC (permalink / raw)
To: Junio C Hamano
Cc: Matheus Afonso Martins Moreira, Matheus Moreira via GitGitGadget,
git, Ghanshyam Thakkar
On Tue, May 12, 2026 at 12:50:47PM +0900, Junio C Hamano wrote:
> Matheus Afonso Martins Moreira <matheus@matheusmoreira.com> writes:
>
> >> Reviewers comment: Nicely done.
> >
> > Thank you!
> >
> >> More a question to myself, may be, about t9904 (and may be other parts)
> >> I have in mind that the parser learned to handle
> >>
> >> file://server/share/repo
> >> correctly under Windows.
> >> I don't know if this needs to be addressed here or in a follow-up commit ?
> >
> > I'd be happy to revisit this in a follow-up. It's been a while
> > since I used MSYS but I do remember the fact it rewrites paths
> > internally. I wasn't sure how to handle it properly in the tests.
>
> So the only potential thing that is missing from the series is the
> above, which we are fine to postpone in a follow-up series? I think
> that is a good stopping point. Given that this command is new, it
> is fine that it has known and documented short-comings that will be
> improved (of course on the other hand, we are not in any urgent need
> for this new command, so we do not have to ship it half-baked).
>
> Is everybody happy with the patches in the current shape and should
> I mark it for 'next'?
>
> Thanks.
>
I am happy with merging to next.
Thread overview: 44+ messages
2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 01/13] url: move helper function to URL header and source Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 02/13] urlmatch: define url_parse function Matheus Afonso Martins Moreira via GitGitGadget
2024-05-01 22:18 ` Ghanshyam Thakkar
2024-05-02 4:02 ` Torsten Bögershausen
2024-04-28 22:30 ` [PATCH 03/13] builtin: create url-parse command Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 04/13] url-parse: add URL parsing helper function Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 05/13] url-parse: enumerate possible URL components Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 06/13] url-parse: define component extraction helper fn Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 07/13] url-parse: define string to component converter fn Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 08/13] url-parse: define usage and options Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 09/13] url-parse: parse options given on the command line Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 10/13] url-parse: validate all given git URLs Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 11/13] url-parse: output URL components selected by user Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:31 ` [PATCH 12/13] Documentation: describe the url-parse builtin Matheus Afonso Martins Moreira via GitGitGadget
2024-04-30 7:37 ` Ghanshyam Thakkar
2024-04-28 22:31 ` [PATCH 13/13] tests: add tests for the new " Matheus Afonso Martins Moreira via GitGitGadget
2024-04-29 20:53 ` [PATCH 00/13] builtin: implement, document and test url-parse Torsten Bögershausen
2024-04-29 22:04 ` Reply to community feedback Matheus Afonso Martins Moreira
2024-04-30 6:51 ` Torsten Bögershausen
2026-05-01 23:15 ` [PATCH v2 0/8] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
2026-05-01 23:15 ` [PATCH v2 1/8] connect: rename enum protocol to url_scheme Matheus Afonso Martins Moreira via GitGitGadget
2026-05-01 23:15 ` [PATCH v2 2/8] url: move url_is_local_not_ssh to url.h Matheus Afonso Martins Moreira via GitGitGadget
2026-05-01 23:15 ` [PATCH v2 3/8] url: move scheme detection to URL header/source Matheus Afonso Martins Moreira via GitGitGadget
2026-05-01 23:15 ` [PATCH v2 4/8] url: return URL_SCHEME_UNKNOWN instead of dying Matheus Afonso Martins Moreira via GitGitGadget
2026-05-01 23:15 ` [PATCH v2 5/8] urlmatch: define url_parse function Matheus Afonso Martins Moreira via GitGitGadget
2026-05-01 23:15 ` [PATCH v2 6/8] builtin: create url-parse command Matheus Afonso Martins Moreira via GitGitGadget
2026-05-01 23:15 ` [PATCH v2 7/8] doc: describe the url-parse builtin Matheus Afonso Martins Moreira via GitGitGadget
2026-05-01 23:15 ` [PATCH v2 8/8] t9904: add tests for the new " Matheus Afonso Martins Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 0/8] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 1/8] connect: rename enum protocol to url_scheme Matheus Afonso Martins Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 2/8] url: move url_is_local_not_ssh to url.h Matheus Afonso Martins Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 3/8] url: move scheme detection to URL header/source Matheus Afonso Martins Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 4/8] url: return URL_SCHEME_UNKNOWN instead of dying Matheus Afonso Martins Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 5/8] urlmatch: define url_parse function Matheus Afonso Martins Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 6/8] builtin: create url-parse command Matheus Afonso Martins Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 7/8] doc: describe the url-parse builtin Matheus Afonso Martins Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 8/8] t9904: add tests for the new " Matheus Afonso Martins Moreira via GitGitGadget
2026-05-03 3:49 ` [PATCH v3 0/8] builtin: implement, document and test url-parse Junio C Hamano
2026-05-03 4:29 ` Matheus Afonso Martins Moreira
2026-05-03 17:28 ` Torsten Bögershausen
2026-05-03 19:36 ` Matheus Afonso Martins Moreira
2026-05-12 3:50 ` Junio C Hamano
2026-05-12 8:57 ` Torsten Bögershausen