From: "Matheus Moreira via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "Torsten Bögershausen" <tboegi@web.de>,
"Ghanshyam Thakkar" <shyamthakkar001@gmail.com>,
"Matheus Moreira" <matheus@matheusmoreira.com>
Subject: [PATCH v2 0/8] builtin: implement, document and test url-parse
Date: Fri, 01 May 2026 23:15:02 +0000
Message-ID: <pull.1715.v2.git.git.1777677310.gitgitgadget@gmail.com>
In-Reply-To: <pull.1715.git.git.1714343461.gitgitgadget@gmail.com>
This series adds git url-parse, a plumbing builtin for inspecting git URLs.
Git accepts a wider variety of URL forms than standard URL parsers handle.
The supported forms include RFC 3986 URLs, file:// URLs, scp-style
[user@]host:path for SSH, and IPv6 literals in brackets. Tools that need to
reason about them have historically had to reimplement git's parsing or
shell out to git indirectly. With git url-parse, scripts can ask git
directly to validate a URL, extract a component (scheme, user, password,
host, port, path), or both.
The series consists of eight commits.
The first four are preparatory. They rename enum protocol to enum url_scheme
to match RFC terminology, move url_is_local_not_ssh and the scheme-detection
routines from connect.c to url.h/url.c, and stop url_get_scheme from dying
on unknown schemes so that other parsers can handle them gracefully.
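To make the last preparatory change concrete, here is a minimal sketch of
scheme classification that reports unknown names instead of dying. The name
table is taken from the v1 url_get_protocol shown in the range-diff below;
sketch_get_scheme and the enumerator values are hypothetical stand-ins, not
the code in url.c.

```c
#include <assert.h>
#include <string.h>

/*
 * Illustrative re-declaration of the scheme enum; the enumerator
 * order and values here are an assumption, not copied from url.h.
 */
enum url_scheme {
	URL_SCHEME_UNKNOWN = 0,
	URL_SCHEME_LOCAL,
	URL_SCHEME_FILE,
	URL_SCHEME_SSH,
	URL_SCHEME_GIT,
};

/*
 * Hypothetical stand-in for url_get_scheme(): classify a NUL-terminated
 * scheme name, returning URL_SCHEME_UNKNOWN rather than calling die().
 */
static enum url_scheme sketch_get_scheme(const char *name)
{
	assert(name != NULL);

	if (!strcmp(name, "ssh"))
		return URL_SCHEME_SSH;
	if (!strcmp(name, "git"))
		return URL_SCHEME_GIT;
	if (!strcmp(name, "git+ssh")) /* deprecated spelling */
		return URL_SCHEME_SSH;
	if (!strcmp(name, "ssh+git")) /* deprecated spelling */
		return URL_SCHEME_SSH;
	if (!strcmp(name, "file"))
		return URL_SCHEME_FILE;
	return URL_SCHEME_UNKNOWN; /* e.g. "https": left to the caller */
}
```

Returning an "unknown" value lets url_parse hand a URL such as
https://example.com/repo on to url_normalize, which fits the manpage's
statement that any syntactically valid URL is parsed.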
The fifth commit defines the new parser, url_parse, in urlmatch.c. It is
adapted from parse_connect_url and uses the same data structures as
url_normalize. On failure it returns NULL and populates out_info->err; on
success it exposes URL components as offset/length pairs into the
normalized URL buffer.
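A minimal sketch of how a caller consumes those offset/length pairs,
assuming a cut-down, hypothetical struct url_info_sketch with only the host
and path fields. The builtin in this series uses
xstrndup(info->url + offset, length); the sketch uses malloc and memcpy so
it compiles without git's headers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical subset of struct url_info from urlmatch.h: each
 * component is an offset/length window into the normalized buffer.
 */
struct url_info_sketch {
	const char *url; /* normalized URL */
	size_t host_off, host_len;
	size_t path_off, path_len;
};

/* Copy buf[off, off + len) into a fresh NUL-terminated string. */
static char *copy_component(const char *buf, size_t off, size_t len)
{
	char *out;

	assert(buf != NULL);
	out = malloc(len + 1);
	if (!out)
		return NULL;
	memcpy(out, buf + off, len);
	out[len] = '\0';
	return out;
}
```

Nothing is copied until a component is requested, which is why the parser
can hand back one buffer plus integers instead of several allocated
strings.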
The sixth commit adds the user-facing command, with a helpful error when the
input looks like a local path rather than a URL.
The last two commits add documentation (a manpage) and 53 tests covering
the URL form, the scp form, IPv6 in URL and scp forms, bracket forms,
username expansion, query/fragment stripping, the local-path error, and
validation-only mode.
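The bracket forms follow the rule spelled out in the url_parse comment in
the range-diff below: zero or one colon inside the brackets marks a plain
host or host:port pair whose brackets are stripped, while two or more colons
mark an IPv6 literal whose brackets are kept. A minimal sketch of just that
classification; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classification of a bracketed scp-style host. */
enum scp_bracket_kind {
	SCP_BRACKET_UNTERMINATED, /* no ']': fall back to the first ':' */
	SCP_BRACKET_STRIP,        /* "[host]" or "[host:port]" */
	SCP_BRACKET_KEEP_IPV6,    /* "[::1]", "[2001:db8::1]" */
};

/*
 * host_start points at the '[' that follows an optional "user@".
 * Count the colons strictly between '[' and ']' and apply the rule:
 * 0 or 1 inner colons -> strip brackets, 2 or more -> IPv6 literal.
 */
static enum scp_bracket_kind classify_scp_bracket(const char *host_start)
{
	const char *end, *p;
	int inner_colons = 0;

	assert(host_start[0] == '[');
	end = strchr(host_start, ']');
	if (!end)
		return SCP_BRACKET_UNTERMINATED;
	for (p = host_start + 1; p < end; p++)
		if (*p == ':')
			inner_colons++;
	return inner_colons <= 1 ? SCP_BRACKET_STRIP : SCP_BRACKET_KEEP_IPV6;
}
```

In the patch, the stripped case also has to recompute the scp separator
after both brackets are removed from the buffer; the sketch leaves that
bookkeeping out.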
Several choices in this series are judgment calls. Happy to amend or follow
up on any of them.
The component name is scheme, not protocol. RFC 1738 and RFC 3986 call them
schemes. The series renames enum protocol to enum url_scheme internally, and
the user-facing component name follows suit. I considered accepting both
names as aliases but decided against setting that precedent in a new
command. If you would rather see protocol, or both protocol and scheme, that
is easy to change.
Local paths are deliberately not URLs. parse_connect_url accepts bare paths
like /abs/path or ./rel as URL_SCHEME_LOCAL. url_parse rejects them, since
url_normalize requires a scheme://host form, and silent conversion to
file:// has no good answer for relative or tilde forms. The builtin emits a
helpful error suggesting the explicit file:// form. If full git clone parity
is preferred (bare paths accepted via auto-conversion or a new flag), that
could be added.
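A consumer that wants to accept both forms can dispatch on the local-path
check before parsing, as the urlmatch commit message suggests. The sketch
below assumes a simplified, POSIX-only copy of url_is_local_not_ssh (git's
version also honors DOS drive prefixes) and reproduces the two hints from
the builtin's parse_or_die, formatting them into a buffer instead of calling
die(). sketch_is_local_not_ssh and local_path_hint are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Simplified, POSIX-only url_is_local_not_ssh(): a string is a local
 * path if it has no ':' or a '/' appears before the first ':'.
 */
static int sketch_is_local_not_ssh(const char *url)
{
	const char *colon = strchr(url, ':');
	const char *slash = strchr(url, '/');

	return !colon || (slash && slash < colon);
}

/*
 * Write the hint url-parse reports for a local path and return 1, or
 * return 0 if the input should be handed to url_parse() instead.
 */
static int local_path_hint(const char *url, char *buf, size_t size)
{
	assert(url != NULL && buf != NULL && size > 0);
	if (!sketch_is_local_not_ssh(url))
		return 0;
	if (*url == '/')
		snprintf(buf, size, "'%s' is not a URL; if you meant a local "
			 "repository, use 'file://%s'", url, url);
	else
		snprintf(buf, size, "'%s' is not a URL; if you meant a local "
			 "repository, use a 'file://' URL with an absolute path",
			 url);
	return 1;
}
```

Only absolute paths get a ready-made file:// suggestion, since prefixing a
relative or tilde path with file:// would not produce a meaningful URL.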
Absent and empty components are conflated in the output: --component user
http://host/ and --component user http://@host/ both print an empty line.
The underlying struct url_info preserves the distinction (*_off == 0 for
absent, *_off != 0 with *_len == 0 for empty), so a future option can
expose it without a breaking change. I can amend this series if necessary.
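The three states that struct url_info already distinguishes can be read off
an offset/length pair as in this minimal sketch; the enum and
classify_component are hypothetical names for what such a future option
might report.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the states a url_info component can be in. */
enum component_state {
	COMPONENT_ABSENT,  /* *_off == 0: not present in the URL */
	COMPONENT_EMPTY,   /* *_off != 0, *_len == 0: e.g. user in http://@host/ */
	COMPONENT_PRESENT, /* *_off != 0, *_len > 0 */
};

static enum component_state classify_component(size_t off, size_t len)
{
	if (off == 0)
		return COMPONENT_ABSENT;
	return len == 0 ? COMPONENT_EMPTY : COMPONENT_PRESENT;
}
```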
Changes since v1:
* Bug fix: ~user paths with a query string or fragment were leaking the ?
or # into the path output. The ~user adjustment in url_parse advanced
path_off past the leading '/' without decrementing path_len, so the
extracted path ran one character past its end. It now decrements path_len
as well, and the path stops before the query or fragment.
* Helpful error for local paths instead of the cryptic "invalid URL scheme
name or missing '://' suffix".
* -c protocol renamed to -c scheme for consistency with the internal rename
and the RFC.
* Documented the deliberate divergence from parse_connect_url (local paths
and unknown schemes) in the urlmatch commit message.
* Doc and command-list polish: purehelpers category, asciidoc placeholder
convention, [synopsis] form.
* The staged, micro-commit buildup of the builtin from v1 is collapsed
into a single self-contained commit. The rest of the series is unchanged
in shape.
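The ~user fix in the first item above comes down to keeping an
offset/length window consistent. A minimal sketch with a hand-built buffer
and a hypothetical helper name: for ssh://host/~user/repo?q, moving only the
offset past the '/' yields "~user/repo?", while moving the offset and
shrinking the length together yields "~user/repo". The len >= 2 check is a
defensive addition of the sketch, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: given a path window [*off, *off + *len) that
 * starts with '/', drop that '/' when the next character is '~'.
 * Both the offset and the length move, so the window still ends
 * where the path ends, before any "?query" or "#fragment".
 */
static void skip_slash_before_tilde(const char *buf, size_t *off, size_t *len)
{
	assert(*len >= 1 && buf[*off] == '/');
	/* len >= 2 keeps the '~' check inside the path window (sketch only). */
	if (*len >= 2 && buf[*off + 1] == '~') {
		(*off)++;
		(*len)--;
	}
}
```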
Matheus Afonso Martins Moreira (8):
connect: rename enum protocol to url_scheme
url: move url_is_local_not_ssh to url.h
url: move scheme detection to URL header/source
url: return URL_SCHEME_UNKNOWN instead of dying
urlmatch: define url_parse function
builtin: create url-parse command
doc: describe the url-parse builtin
t9904: add tests for the new url-parse builtin
.gitignore | 1 +
Documentation/git-url-parse.adoc | 80 ++++++
Documentation/meson.build | 1 +
Makefile | 1 +
builtin.h | 1 +
builtin/url-parse.c | 132 ++++++++++
command-list.txt | 1 +
connect.c | 78 ++----
connect.h | 1 -
git.c | 1 +
meson.build | 1 +
remote.c | 1 +
t/meson.build | 1 +
t/t9904-url-parse.sh | 319 ++++++++++++++++++++++++
t/unit-tests/u-urlmatch-normalization.c | 45 ++++
url.c | 23 ++
url.h | 16 ++
urlmatch.c | 127 ++++++++++
urlmatch.h | 1 +
19 files changed, 777 insertions(+), 54 deletions(-)
create mode 100644 Documentation/git-url-parse.adoc
create mode 100644 builtin/url-parse.c
create mode 100755 t/t9904-url-parse.sh
base-commit: 94f057755b7941b321fd11fec1b2e3ca5313a4e0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1715%2Fmatheusmoreira%2Furl-parse-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1715/matheusmoreira/url-parse-v2
Pull-Request: https://github.com/git/git/pull/1715
Range-diff vs v1:
-: ---------- > 1: 38f797362d connect: rename enum protocol to url_scheme
1: 42eb0cbf68 ! 2: a4153e1d24 url: move helper function to URL header and source
@@ Metadata
Author: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
## Commit message ##
- url: move helper function to URL header and source
+ url: move url_is_local_not_ssh to url.h
- It will be used in more places so it should be placed in url.h.
+ Move url_is_local_not_ssh from connect.c/connect.h
+ to url.c/url.h so that the new url_parse function
+ in urlmatch.c, and any future code that needs to
+ distinguish a local path from an scp style SSH URL,
+ can reuse the heuristic without depending on connect.c.
+
+ No behavior change.
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
## connect.c ##
-@@ connect.c: enum protocol {
- PROTO_GIT
+@@ connect.c: enum url_scheme {
+ URL_SCHEME_GIT
};
-int url_is_local_not_ssh(const char *url)
@@ connect.c: enum protocol {
- (has_dos_drive_prefix(url) && is_valid_path(url));
-}
-
- static const char *prot_name(enum protocol protocol)
+ static const char *url_scheme_name(enum url_scheme scheme)
{
- switch (protocol) {
+ switch (scheme) {
## connect.h ##
@@ connect.h: int git_connection_is_socket(struct child_process *conn);
@@ url.h: char *url_decode_parameter_value(const char **query);
+int url_is_local_not_ssh(const char *url);
+
- #endif /* URL_H */
+ /*
+ * The set of unreserved characters as per STD66 (RFC3986) is
+ * '[A-Za-z0-9-._~]'. These characters are safe to appear in URI
-: ---------- > 3: e584fb03f3 url: move scheme detection to URL header/source
-: ---------- > 4: 7381704c38 url: return URL_SCHEME_UNKNOWN instead of dying
2: 13b81b8aa0 ! 5: 89932a70f3 urlmatch: define url_parse function
@@ Metadata
## Commit message ##
urlmatch: define url_parse function
- Define general parsing function that supports all Git URLs
+ Define url_parse, a general parsing function that supports all Git URLs
including scp style URLs such as hostname:~user/repo.
- Has the same interface as the URL normalization function
- and uses the same data structures, facilitating its use.
- It's adapted from the algorithm used to process URLs in connect.c,
- so it should support the same inputs.
+
+ It is adapted from the algorithm in connect.c's parse_connect_url
+ and reuses the shared enum url_scheme and url_get_scheme function
+ that previous commits made available in url.h. The new parser and
+ the connect path agree on scheme classification. url_parse has the
+ same interface as url_normalize and uses the same data structures.
+
+ Both functions accept the same URL forms with one deliberate
+ exception. Bare local paths such as "/abs/path", "./rel"
+ or "repo" are accepted by parse_connect_url as URL_SCHEME_LOCAL,
+ but rejected by url_parse because url_normalize requires a URL
+ with a scheme://host form. A consumer that wants to handle both
+ URLs and local paths needs to dispatch on url_is_local_not_ssh
+ before calling url_parse, just as the connect path does internally.
+
+ The duplication with parse_connect_url is intentional.
+ The two functions have different contracts:
+
+ - parse_connect_url
+
+ Calls die() on an unknown scheme
+ and returns NUL-terminated host/path
+ strings for the connect path
+
+ - url_parse
+
+ Returns NULL on failure while populating
+ out_info->err, and exposes components
+ as offset/length pairs into the normalized
+ URL buffer, matching url_normalize.
+
+ Reconciling both is possible, but not in the scope
+ of the current patch set.
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
+ ## t/unit-tests/u-urlmatch-normalization.c ##
+@@ t/unit-tests/u-urlmatch-normalization.c: void test_urlmatch_normalization__equivalents(void)
+ compare_normalized_urls("https://@x.y/^/../abc", "httpS://@x.y:0443/abc", 1);
+ compare_normalized_urls("https://@x.y/^/..", "httpS://@x.y:0443/", 1);
+ }
++
++static void check_parsed_path(const char *url, const char *expected_path)
++{
++ struct url_info info;
++ char *parsed = url_parse(url, &info);
++ char *path;
++
++ cl_assert(parsed != NULL);
++ path = xstrndup(parsed + info.path_off, info.path_len);
++ cl_assert_equal_s(path, expected_path);
++ free(path);
++ free(parsed);
++}
++
++void test_urlmatch_normalization__parse_scp(void)
++{
++ check_parsed_path("host:path", "/path");
++ check_parsed_path("user@host:path", "/path");
++ check_parsed_path("host:~user/repo", "~user/repo");
++ check_parsed_path("user@host:~user/repo", "~user/repo");
++ check_parsed_path("[host]:src", "/src");
++ check_parsed_path("[host:123]:src", "/src");
++ check_parsed_path("[::1]:repo", "/repo");
++ check_parsed_path("user@[::1]:repo", "/repo");
++}
++
++void test_urlmatch_normalization__parse_url_form(void)
++{
++ check_parsed_path("ssh://host/repo", "/repo");
++ check_parsed_path("ssh://host/~user/repo", "~user/repo");
++ check_parsed_path("git://host:9418/repo", "/repo");
++ check_parsed_path("git://host/~user/repo", "~user/repo");
++ check_parsed_path("ssh://[::1]:1234/repo", "/repo");
++ check_parsed_path("http://[2001:db8::1]/repo", "/repo");
++}
++
++void test_urlmatch_normalization__parse_strips_query_and_fragment(void)
++{
++ check_parsed_path("ssh://host/~user/repo?q", "~user/repo");
++ check_parsed_path("ssh://host/~user/repo#frag", "~user/repo");
++ check_parsed_path("git://host/~user/repo?q", "~user/repo");
++ check_parsed_path("user@host:~user/repo?q", "~user/repo");
++ check_parsed_path("https://host/repo?q", "/repo");
++ check_parsed_path("https://host/repo#frag", "/repo");
++}
+
## urlmatch.c ##
@@
#include "hex-ll.h"
@@ urlmatch.c: char *url_normalize(const char *url, struct url_info *out_info)
return url_normalize_1(url, out_info, 0);
}
-+enum protocol {
-+ PROTO_UNKNOWN = 0,
-+ PROTO_LOCAL,
-+ PROTO_FILE,
-+ PROTO_SSH,
-+ PROTO_GIT,
-+};
-+
-+static enum protocol url_get_protocol(const char *name, size_t n)
-+{
-+ if (!strncmp(name, "ssh", n))
-+ return PROTO_SSH;
-+ if (!strncmp(name, "git", n))
-+ return PROTO_GIT;
-+ if (!strncmp(name, "git+ssh", n)) /* deprecated - do not use */
-+ return PROTO_SSH;
-+ if (!strncmp(name, "ssh+git", n)) /* deprecated - do not use */
-+ return PROTO_SSH;
-+ if (!strncmp(name, "file", n))
-+ return PROTO_FILE;
-+ return PROTO_UNKNOWN;
-+}
-+
+char *url_parse(const char *url_orig, struct url_info *out_info)
+{
+ struct strbuf url;
+ char *host, *separator;
+ char *detached, *normalized;
-+ enum protocol protocol = PROTO_LOCAL;
++ char *url_decoded;
++ enum url_scheme scheme = URL_SCHEME_LOCAL;
+ struct url_info local_info;
-+ struct url_info *info = out_info? out_info : &local_info;
++ struct url_info *info = out_info ? out_info : &local_info;
+ bool scp_syntax = false;
+
-+ if (is_url(url_orig)) {
-+ url_orig = url_decode(url_orig);
-+ } else {
-+ url_orig = xstrdup(url_orig);
-+ }
++ if (is_url(url_orig))
++ url_decoded = url_decode(url_orig);
++ else
++ url_decoded = xstrdup(url_orig);
+
-+ strbuf_init(&url, strlen(url_orig) + sizeof("ssh://"));
-+ strbuf_addstr(&url, url_orig);
++ strbuf_init(&url, strlen(url_decoded) + sizeof("ssh://"));
++ strbuf_addstr(&url, url_decoded);
++ free(url_decoded);
+
+ host = strstr(url.buf, "://");
+ if (host) {
-+ protocol = url_get_protocol(url.buf, host - url.buf);
++ /*
++ * Temporarily NUL-terminate the scheme name
++ * so we can pass it to url_get_scheme(),
++ * then restore the ':' so the buffer
++ * is intact for url_normalize() below.
++ */
++ char saved = *host;
++ *host = '\0';
++ scheme = url_get_scheme(url.buf);
++ *host = saved;
+ host += 3;
+ } else {
+ if (!url_is_local_not_ssh(url.buf)) {
+ scp_syntax = true;
-+ protocol = PROTO_SSH;
++ scheme = URL_SCHEME_SSH;
+ strbuf_insertstr(&url, 0, "ssh://");
-+ host = url.buf + 6;
++ host = url.buf + strlen("ssh://");
+ }
+ }
+
-+ /* path starts after ':' in scp style SSH URLs */
++ /*
++ * Path starts after ':' in scp style SSH URLs.
++ *
++ * The host portion can begin with an optional "user@",
++ * and the host itself can be wrapped in '[' ']' brackets.
++ * The bracket form is git's legacy way of supporting:
++ *
++ * - IPv6 literals: [::1]:repo
++ * - host:port pairs in the short form: [myhost:123]:src
++ * - Plain hostnames that happen to need bracketing: [host]:path
++ *
++ * Treat '[' followed by 0 or 1 inner colons as the host:port
++ * or plain hostname form and strip the brackets so url_normalize
++ * sees host[:port] natively. Two or more inner colons mark an
++ * IPv6 literal: keep the brackets for url_normalize to recognize.
++ *
++ * The scp path separator is the ':' that follows the host part,
++ * and we must skip over user@ and any '[...]' before searching.
++ */
+ if (scp_syntax) {
-+ separator = strchr(host, ':');
++ char *user_at;
++ char *host_start;
++ char *bracket_end;
++
++ user_at = strchr(host, '@');
++ host_start = user_at ? user_at + 1 : host;
++
++ if (*host_start == '[') {
++ char *p;
++ int inner_colons;
++
++ bracket_end = strchr(host_start, ']');
++ inner_colons = 0;
++ for (p = host_start + 1; bracket_end && p < bracket_end; p++)
++ if (*p == ':')
++ inner_colons++;
++
++ if (bracket_end && inner_colons <= 1) {
++ size_t close_off = bracket_end - url.buf;
++ size_t open_off = host_start - url.buf;
++ strbuf_remove(&url, close_off, 1);
++ strbuf_remove(&url, open_off, 1);
++ separator = url.buf + close_off - 1;
++ } else if (bracket_end) {
++ separator = strchr(bracket_end + 1, ':');
++ } else {
++ separator = strchr(host_start, ':');
++ }
++ } else {
++ separator = strchr(host_start, ':');
++ }
++
+ if (separator) {
+ if (separator[1] == '/')
+ strbuf_remove(&url, separator - url.buf, 1);
@@ urlmatch.c: char *url_normalize(const char *url, struct url_info *out_info)
+ normalized = url_normalize(detached, info);
+ free(detached);
+
-+ if (!normalized) {
++ if (!normalized)
+ return NULL;
-+ }
+
-+ /* point path to ~ for URL's like this:
++ /*
++ * Point path to ~ for URLs like this:
+ *
+ * ssh://host.xz/~user/repo
+ * git://host.xz/~user/repo
+ * host.xz:~user/repo
-+ *
+ */
-+ if (protocol == PROTO_GIT || protocol == PROTO_SSH) {
-+ if (normalized[info->path_off + 1] == '~')
++ if (scheme == URL_SCHEME_GIT || scheme == URL_SCHEME_SSH) {
++ if (normalized[info->path_off + 1] == '~') {
+ info->path_off++;
++ info->path_len--;
++ }
+ }
+
+ return normalized;
3: e4781b36d5 ! 6: 886a7d659e builtin: create url-parse command
@@ Commit message
The url-parse builtin command is designed to solve this problem
by exposing git's native URL parsing facilities as a plumbing command.
- Other programs can then call upon git itself to parse the git URLs and
- extract their components. This should be quite useful for scripts.
+ Other programs can then call upon git itself to parse the git URLs
+ and extract their components. This should be quite useful for scripts.
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
@@ Makefile: BUILTIN_OBJS += builtin/update-ref.o
BUILTIN_OBJS += builtin/verify-pack.o
## builtin.h ##
-@@ builtin.h: int cmd_update_server_info(int argc, const char **argv, const char *prefix);
- int cmd_upload_archive(int argc, const char **argv, const char *prefix);
- int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix);
- int cmd_upload_pack(int argc, const char **argv, const char *prefix);
-+int cmd_url_parse(int argc, const char **argv, const char *prefix);
- int cmd_var(int argc, const char **argv, const char *prefix);
- int cmd_verify_commit(int argc, const char **argv, const char *prefix);
- int cmd_verify_tag(int argc, const char **argv, const char *prefix);
+@@ builtin.h: int cmd_update_server_info(int argc, const char **argv, const char *prefix, stru
+ int cmd_upload_archive(int argc, const char **argv, const char *prefix, struct repository *repo);
+ int cmd_upload_archive_writer(int argc, const char **argv, const char *prefix, struct repository *repo);
+ int cmd_upload_pack(int argc, const char **argv, const char *prefix, struct repository *repo);
++int cmd_url_parse(int argc, const char **argv, const char *prefix, struct repository *repo);
+ int cmd_var(int argc, const char **argv, const char *prefix, struct repository *repo);
+ int cmd_verify_commit(int argc, const char **argv, const char *prefix, struct repository *repo);
+ int cmd_verify_tag(int argc, const char **argv, const char *prefix, struct repository *repo);
## builtin/url-parse.c (new) ##
@@
-+/* SPDX-License-Identifier: GPL-2.0-only
-+ *
-+ * url-parse - parses git URLs and extracts their components
-+ *
-+ * Copyright © 2024 Matheus Afonso Martins Moreira
-+ *
-+ * This program is free software; you can redistribute it and/or modify
-+ * it under the terms of the GNU General Public License as published by
-+ * the Free Software Foundation; version 2.
-+ */
-+
+#include "builtin.h"
+#include "gettext.h"
++#include "parse-options.h"
++#include "url.h"
++#include "urlmatch.h"
++
++static const char * const builtin_url_parse_usage[] = {
++ N_("git url-parse [-c <component>] [--] <url>..."),
++ NULL
++};
++
++static char *component_arg;
++
++static struct option builtin_url_parse_options[] = {
++ OPT_STRING('c', "component", &component_arg, N_("component"),
++ N_("which URL component to extract")),
++ OPT_END(),
++};
++
++enum url_component {
++ URL_NONE = 0,
++ URL_SCHEME,
++ URL_USER,
++ URL_PASSWORD,
++ URL_HOST,
++ URL_PORT,
++ URL_PATH,
++};
++
++static void parse_or_die(const char *url, struct url_info *info)
++{
++ if (url_is_local_not_ssh(url)) {
++ if (*url == '/')
++ die("'%s' is not a URL; if you meant a local "
++ "repository, use 'file://%s'", url, url);
++ die("'%s' is not a URL; if you meant a local repository, "
++ "use a 'file://' URL with an absolute path", url);
++ }
++ if (!url_parse(url, info))
++ die("invalid git URL '%s': %s", url, info->err);
++}
++
++static enum url_component get_component_or_die(const char *arg)
++{
++ if (!strcmp("path", arg))
++ return URL_PATH;
++ if (!strcmp("host", arg))
++ return URL_HOST;
++ if (!strcmp("scheme", arg))
++ return URL_SCHEME;
++ if (!strcmp("user", arg))
++ return URL_USER;
++ if (!strcmp("password", arg))
++ return URL_PASSWORD;
++ if (!strcmp("port", arg))
++ return URL_PORT;
++ die("invalid git URL component '%s'", arg);
++}
++
++static char *extract_component(enum url_component component,
++ struct url_info *info)
++{
++ size_t offset, length;
++
++ switch (component) {
++ case URL_SCHEME:
++ offset = 0;
++ length = info->scheme_len;
++ break;
++ case URL_USER:
++ offset = info->user_off;
++ length = info->user_len;
++ break;
++ case URL_PASSWORD:
++ offset = info->passwd_off;
++ length = info->passwd_len;
++ break;
++ case URL_HOST:
++ offset = info->host_off;
++ length = info->host_len;
++ break;
++ case URL_PORT:
++ offset = info->port_off;
++ length = info->port_len;
++ break;
++ case URL_PATH:
++ offset = info->path_off;
++ length = info->path_len;
++ break;
++ case URL_NONE:
++ return NULL;
++ }
++
++ return xstrndup(info->url + offset, length);
++}
+
-+int cmd_url_parse(int argc, const char **argv, const char *prefix)
++int cmd_url_parse(int argc,
++ const char **argv,
++ const char *prefix,
++ struct repository *repo UNUSED)
+{
++ struct url_info info;
++ enum url_component selected = URL_NONE;
++ char *extracted;
++ int i;
++
++ argc = parse_options(argc, argv, prefix, builtin_url_parse_options,
++ builtin_url_parse_usage, 0);
++
++ if (argc == 0)
++ usage_with_options(builtin_url_parse_usage,
++ builtin_url_parse_options);
++
++ if (component_arg)
++ selected = get_component_or_die(component_arg);
++
++ for (i = 0; i < argc; i++) {
++ parse_or_die(argv[i], &info);
++
++ if (selected != URL_NONE) {
++ extracted = extract_component(selected, &info);
++ if (extracted) {
++ puts(extracted);
++ free(extracted);
++ }
++ }
++
++ free(info.url);
++ }
++
+ return 0;
+}
@@ command-list.txt: git-update-ref plumbingmanipulators
git-update-server-info synchingrepositories
git-upload-archive synchelpers
git-upload-pack synchelpers
-+git-url-parse plumbinginterrogators
++git-url-parse purehelpers
git-var plumbinginterrogators
git-verify-commit ancillaryinterrogators
git-verify-pack plumbinginterrogators
@@ git.c: static struct cmd_struct commands[] = {
{ "upload-archive", cmd_upload_archive, NO_PARSEOPT },
{ "upload-archive--writer", cmd_upload_archive_writer, NO_PARSEOPT },
{ "upload-pack", cmd_upload_pack },
-+ { "url-parse", cmd_url_parse, NO_PARSEOPT },
++ { "url-parse", cmd_url_parse },
{ "var", cmd_var, RUN_SETUP_GENTLY | NO_PARSEOPT },
{ "verify-commit", cmd_verify_commit, RUN_SETUP },
{ "verify-pack", cmd_verify_pack },
+
+ ## meson.build ##
+@@ meson.build: builtin_sources = [
+ 'builtin/update-server-info.c',
+ 'builtin/upload-archive.c',
+ 'builtin/upload-pack.c',
++ 'builtin/url-parse.c',
+ 'builtin/var.c',
+ 'builtin/verify-commit.c',
+ 'builtin/verify-pack.c',
4: 1e0895651c < -: ---------- url-parse: add URL parsing helper function
5: 0bf83ee122 < -: ---------- url-parse: enumerate possible URL components
6: 149c476b1e < -: ---------- url-parse: define component extraction helper fn
7: eb9ef8a17b < -: ---------- url-parse: define string to component converter fn
8: a2acfdbc76 < -: ---------- url-parse: define usage and options
9: 5de00324fb < -: ---------- url-parse: parse options given on the command line
10: 15d355a43c < -: ---------- url-parse: validate all given git URLs
11: 4e93509c80 < -: ---------- url-parse: output URL components selected by user
12: abda074aee ! 7: 3c44e0f478 Documentation: describe the url-parse builtin
@@ Metadata
Author: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
## Commit message ##
- Documentation: describe the url-parse builtin
+ doc: describe the url-parse builtin
The new url-parse builtin validates git URLs
and optionally extracts their components.
+ Helped-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
- ## Documentation/git-url-parse.txt (new) ##
+ ## Documentation/git-url-parse.adoc (new) ##
@@
+git-url-parse(1)
+================
@@ Documentation/git-url-parse.txt (new)
+
+SYNOPSIS
+--------
-+[verse]
-+'git url-parse' [<options>] [--] <url>...
++[synopsis]
++git url-parse [-c <component>] [--] <url>...
+
+DESCRIPTION
+-----------
@@ Documentation/git-url-parse.txt (new)
+This command eases interoperability with git URLs by enabling the
+parsing and extraction of the components of all git URLs.
+
++Any syntactically valid URL is parsed, even if the scheme is not one
++git supports for fetching or pushing.
++
+OPTIONS
+-------
+
-+-c <arg>::
-+--component <arg>::
-+ Extract the `<arg>` component from the given git URLs.
-+ `<arg>` can be one of:
-+ `protocol`, `user`, `password`, `host`, `port`, `path`.
++`-c <component>`::
++`--component <component>`::
++ Extract the _<component>_ component from the given Git URLs.
++ _<component>_ can be one of:
++ `scheme`, `user`, `password`, `host`, `port`, `path`.
++
++OUTPUT
++------
++
++When `--component` is given, the requested component of each URL
++is printed on its own line, in the order the URLs were given. If
++the URL has no such component (for example, a port in a URL that
++does not specify one), an empty line is printed in its place.
++
++When `--component` is not given, no output is produced. The exit
++status is zero if every URL parses successfully and non-zero
++otherwise, allowing the command to be used purely as a validator.
+
+EXAMPLES
+--------
@@ Documentation/git-url-parse.txt (new)
++
+------------
+$ git url-parse --component path https://example.com/user/repo
-+/usr/repo
++/user/repo
+$ git url-parse --component path example.com:~user/repo
+~user/repo
+$ git url-parse --component path example.com:user/repo
@@ Documentation/git-url-parse.txt (new)
+$ git url-parse https://example.com/user/repo example.com:~user/repo
+------------
+
++SEE ALSO
++--------
++linkgit:git-clone[1],
++linkgit:git-fetch[1],
++linkgit:git-config[1]
++
+GIT
+---
+Part of the linkgit:git[1] suite
+
+ ## Documentation/meson.build ##
+@@ Documentation/meson.build: manpages = {
+ 'git-update-server-info.adoc' : 1,
+ 'git-upload-archive.adoc' : 1,
+ 'git-upload-pack.adoc' : 1,
++ 'git-url-parse.adoc' : 1,
+ 'git-var.adoc' : 1,
+ 'git-verify-commit.adoc' : 1,
+ 'git-verify-pack.adoc' : 1,
13: 33e128496b ! 8: cf2ae409e6 tests: add tests for the new url-parse builtin
@@ Metadata
Author: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
## Commit message ##
- tests: add tests for the new url-parse builtin
+ t9904: add tests for the new url-parse builtin
Test git URL parsing, validation and component extraction
on all documented git URL schemes and syntaxes.
+ Add IPv6 host coverage in URL form:
+
+ ssh://[::1]/path
+ ssh://user@[::1]:1234/path
+ git://[::1]:9418/path
+ http://[2001:db8::1]/path
+ https://[2001:db8::1]/path
+
+ In URL form the brackets are kept in the host component (RFC 3986
+ syntax for IPv6 literals).
+
+ Also exercise the bracketed scp short forms that t5601-clone.sh
+ covers via parse_connect_url:
+
+ [host]:path
+ [host:port]:path
+ [::1]:repo
+ user@[::1]:repo
+ user@[host:port]:path
+
+ In scp form, brackets are kept for IPv6 literals (two or more inner
+ colons) and stripped for plain hostnames or host:port pairs.
+
+ Suggested-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
+ ## t/meson.build ##
+@@ t/meson.build: integration_tests = [
+ 't9901-git-web--browse.sh',
+ 't9902-completion.sh',
+ 't9903-bash-prompt.sh',
++ 't9904-url-parse.sh',
+ ]
+
+ benchmarks = [
+
## t/t9904-url-parse.sh (new) ##
@@
+#!/bin/sh
+#
-+# Copyright © 2024 Matheus Afonso Martins Moreira
++# Copyright (c) 2024 Matheus Afonso Martins Moreira
+#
+
+test_description='git url-parse tests'
@@ t/t9904-url-parse.sh (new)
+
+test_expect_success 'git url-parse -- file urls' '
+ git url-parse "file:///repository/path" &&
-+ git url-parse "file:///" &&
+ git url-parse "file://"
+'
+
-+test_expect_success 'git url-parse -c protocol -- ssh syntax' '
-+ test ssh = "$(git url-parse -c protocol "ssh://user@example.com:1234/repository/path")" &&
-+ test ssh = "$(git url-parse -c protocol "ssh://user@example.com/repository/path")" &&
-+ test ssh = "$(git url-parse -c protocol "ssh://example.com:1234/repository/path")" &&
-+ test ssh = "$(git url-parse -c protocol "ssh://example.com/repository/path")"
++test_expect_success 'git url-parse -c scheme -- ssh syntax' '
++ test ssh = "$(git url-parse -c scheme "ssh://user@example.com:1234/repository/path")" &&
++ test ssh = "$(git url-parse -c scheme "ssh://user@example.com/repository/path")" &&
++ test ssh = "$(git url-parse -c scheme "ssh://example.com:1234/repository/path")" &&
++ test ssh = "$(git url-parse -c scheme "ssh://example.com/repository/path")"
+'
+
-+test_expect_success 'git url-parse -c protocol -- git syntax' '
-+ test git = "$(git url-parse -c protocol "git://example.com:1234/repository/path")" &&
-+ test git = "$(git url-parse -c protocol "git://example.com/repository/path")"
++test_expect_success 'git url-parse -c scheme -- git syntax' '
++ test git = "$(git url-parse -c scheme "git://example.com:1234/repository/path")" &&
++ test git = "$(git url-parse -c scheme "git://example.com/repository/path")"
+'
+
-+test_expect_success 'git url-parse -c protocol -- http syntax' '
-+ test https = "$(git url-parse -c protocol "https://example.com:1234/repository/path")" &&
-+ test https = "$(git url-parse -c protocol "https://example.com/repository/path")" &&
-+ test http = "$(git url-parse -c protocol "http://example.com:1234/repository/path")" &&
-+ test http = "$(git url-parse -c protocol "http://example.com/repository/path")"
++test_expect_success 'git url-parse -c scheme -- http syntax' '
++ test https = "$(git url-parse -c scheme "https://example.com:1234/repository/path")" &&
++ test https = "$(git url-parse -c scheme "https://example.com/repository/path")" &&
++ test http = "$(git url-parse -c scheme "http://example.com:1234/repository/path")" &&
++ test http = "$(git url-parse -c scheme "http://example.com/repository/path")"
+'
+
-+test_expect_success 'git url-parse -c protocol -- scp syntax' '
-+ test ssh = "$(git url-parse -c protocol "user@example.com:/repository/path")" &&
-+ test ssh = "$(git url-parse -c protocol "example.com:/repository/path")"
++test_expect_success 'git url-parse -c scheme -- scp syntax' '
++ test ssh = "$(git url-parse -c scheme "user@example.com:/repository/path")" &&
++ test ssh = "$(git url-parse -c scheme "example.com:/repository/path")"
+'
+
+test_expect_success 'git url-parse -c user -- ssh syntax' '
@@ t/t9904-url-parse.sh (new)
+ test "" = "$(git url-parse -c user "example.com:/repository/path")"
+'
+
++test_expect_success 'git url-parse -c password -- http syntax' '
++ test secret = "$(git url-parse -c password "https://user:secret@example.com:1234/repository/path")" &&
++ test secret = "$(git url-parse -c password "http://user:secret@example.com/repository/path")" &&
++ test "" = "$(git url-parse -c password "https://user@example.com/repository/path")" &&
++ test "" = "$(git url-parse -c password "https://example.com/repository/path")"
++'
++
+test_expect_success 'git url-parse -c host -- ssh syntax' '
+ test example.com = "$(git url-parse -c host "ssh://user@example.com:1234/repository/path")" &&
+ test example.com = "$(git url-parse -c host "ssh://user@example.com/repository/path")" &&
@@ t/t9904-url-parse.sh (new)
+ test "~user/repository" = "$(git url-parse -c path "example.com:~user/repository")"
+'
+
++test_expect_success 'git url-parse -c path -- username expansion strips query and fragment' '
++ test "~user/repository" = "$(git url-parse -c path "ssh://example.com/~user/repository?query")" &&
++ test "~user/repository" = "$(git url-parse -c path "ssh://example.com/~user/repository#fragment")" &&
++ test "~user/repository" = "$(git url-parse -c path "git://example.com/~user/repository?query")" &&
++ test "~user/repository" = "$(git url-parse -c path "user@example.com:~user/repository?query")"
++'
++
++test_expect_success 'git url-parse -- ssh syntax with IPv6' '
++ git url-parse "ssh://user@[::1]:1234/repository/path" &&
++ git url-parse "ssh://user@[::1]/repository/path" &&
++ git url-parse "ssh://[::1]:1234/repository/path" &&
++ git url-parse "ssh://[::1]/repository/path" &&
++ git url-parse "ssh://[2001:db8::1]/repository/path"
++'
++
++test_expect_success 'git url-parse -- git syntax with IPv6' '
++ git url-parse "git://[::1]:9418/repository/path" &&
++ git url-parse "git://[::1]/repository/path"
++'
++
++test_expect_success 'git url-parse -- http syntax with IPv6' '
++ git url-parse "https://[::1]:1234/repository/path" &&
++ git url-parse "https://[::1]/repository/path" &&
++ git url-parse "http://[2001:db8::1]/repository/path"
++'
++
++test_expect_success 'git url-parse -c host -- IPv6 in URL form' '
++ test "[::1]" = "$(git url-parse -c host "ssh://user@[::1]:1234/repository/path")" &&
++ test "[::1]" = "$(git url-parse -c host "ssh://[::1]/repository/path")" &&
++ test "[2001:db8::1]" = "$(git url-parse -c host "ssh://[2001:db8::1]/repository/path")" &&
++ test "[::1]" = "$(git url-parse -c host "git://[::1]/repository/path")" &&
++ test "[2001:db8::1]" = "$(git url-parse -c host "https://[2001:db8::1]/repository/path")"
++'
++
++test_expect_success 'git url-parse -c port -- IPv6 in URL form' '
++ test 1234 = "$(git url-parse -c port "ssh://user@[::1]:1234/repository/path")" &&
++ test "" = "$(git url-parse -c port "ssh://[::1]/repository/path")" &&
++ test 9418 = "$(git url-parse -c port "git://[::1]:9418/repository/path")"
++'
++
++test_expect_success 'git url-parse -- scp syntax with IPv6' '
++ git url-parse "[::1]:repository/path" &&
++ git url-parse "user@[::1]:repository/path" &&
++ git url-parse "[2001:db8::1]:repo"
++'
++
++test_expect_success 'git url-parse -- scp syntax with bracketed hostname' '
++ git url-parse "[myhost]:src" &&
++ git url-parse "user@[myhost]:src"
++'
++
++test_expect_success 'git url-parse -- scp syntax with bracketed host:port' '
++ git url-parse "[myhost:123]:src" &&
++ git url-parse "user@[myhost:123]:src"
++'
++
++test_expect_success 'git url-parse -c host -- scp+IPv6' '
++ test "[::1]" = "$(git url-parse -c host "[::1]:repository/path")" &&
++ test "[::1]" = "$(git url-parse -c host "user@[::1]:repository/path")" &&
++ test "[2001:db8::1]" = "$(git url-parse -c host "[2001:db8::1]:repo")"
++'
++
++test_expect_success 'git url-parse -c path -- scp+IPv6' '
++ test "/repository/path" = "$(git url-parse -c path "[::1]:/repository/path")" &&
++ test "/repository/path" = "$(git url-parse -c path "[::1]:repository/path")" &&
++ test "/repo" = "$(git url-parse -c path "[2001:db8::1]:repo")"
++'
++
++test_expect_success 'git url-parse -c host,port,path -- scp [host:port]:src' '
++ test myhost = "$(git url-parse -c host "[myhost:123]:src")" &&
++ test 123 = "$(git url-parse -c port "[myhost:123]:src")" &&
++ test "/src" = "$(git url-parse -c path "[myhost:123]:src")"
++'
++
++test_expect_success 'git url-parse -c host,path -- scp [host]:src' '
++ test myhost = "$(git url-parse -c host "[myhost]:src")" &&
++ test "/src" = "$(git url-parse -c path "[myhost]:src")"
++'
++
++test_expect_success 'git url-parse -c user -- scp with user@ and brackets' '
++ test user = "$(git url-parse -c user "user@[::1]:repo")" &&
++ test user = "$(git url-parse -c user "user@[myhost:123]:src")" &&
++ test user = "$(git url-parse -c user "user@[myhost]:src")"
++'
++
++test_expect_success 'git url-parse -- scp+IPv6 with username expansion' '
++ test "~user/repo" = "$(git url-parse -c path "[::1]:~user/repo")" &&
++ test "~user/repo" = "$(git url-parse -c path "user@[::1]:~user/repo")"
++'
++
++test_expect_success 'git url-parse fails on invalid URL' '
++ test_must_fail git url-parse "not a url"
++'
++
++test_expect_success 'git url-parse helpful error for absolute local path' '
++ test_must_fail git url-parse "/abs/path" 2>err &&
++ test_grep "is not a URL" err &&
++ test_grep "file:///abs/path" err
++'
++
++test_expect_success 'git url-parse helpful error for relative local path' '
++ test_must_fail git url-parse "./rel" 2>err &&
++ test_grep "is not a URL" err &&
++ test_grep "absolute path" err
++'
++
++test_expect_success 'git url-parse fails on unknown -c component name' '
++ test_must_fail git url-parse -c bogus "https://example.com/repo"
++'
++
++test_expect_success 'git url-parse fails on URL missing host' '
++ test_must_fail git url-parse "https://"
++'
++
++test_expect_success 'git url-parse with no URL prints usage' '
++ test_must_fail git url-parse 2>err &&
++ test_grep "usage:" err
++'
++
+test_done
--
gitgitgadget
Thread overview: 44+ messages
2024-04-28 22:30 [PATCH 00/13] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 01/13] url: move helper function to URL header and source Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 02/13] urlmatch: define url_parse function Matheus Afonso Martins Moreira via GitGitGadget
2024-05-01 22:18 ` Ghanshyam Thakkar
2024-05-02 4:02 ` Torsten Bögershausen
2024-04-28 22:30 ` [PATCH 03/13] builtin: create url-parse command Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 04/13] url-parse: add URL parsing helper function Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 05/13] url-parse: enumerate possible URL components Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 06/13] url-parse: define component extraction helper fn Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 07/13] url-parse: define string to component converter fn Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 08/13] url-parse: define usage and options Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 09/13] url-parse: parse options given on the command line Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 10/13] url-parse: validate all given git URLs Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:30 ` [PATCH 11/13] url-parse: output URL components selected by user Matheus Afonso Martins Moreira via GitGitGadget
2024-04-28 22:31 ` [PATCH 12/13] Documentation: describe the url-parse builtin Matheus Afonso Martins Moreira via GitGitGadget
2024-04-30 7:37 ` Ghanshyam Thakkar
2024-04-28 22:31 ` [PATCH 13/13] tests: add tests for the new url-parse builtin Matheus Afonso Martins Moreira via GitGitGadget
2024-04-29 20:53 ` [PATCH 00/13] builtin: implement, document and test url-parse Torsten Bögershausen
2024-04-29 22:04 ` Reply to community feedback Matheus Afonso Martins Moreira
2024-04-30 6:51 ` Torsten Bögershausen
2026-05-01 23:15 ` Matheus Moreira via GitGitGadget [this message]
2026-05-01 23:15 ` [PATCH v2 1/8] connect: rename enum protocol to url_scheme Matheus Afonso Martins Moreira via GitGitGadget
2026-05-01 23:15 ` [PATCH v2 2/8] url: move url_is_local_not_ssh to url.h Matheus Afonso Martins Moreira via GitGitGadget
2026-05-01 23:15 ` [PATCH v2 3/8] url: move scheme detection to URL header/source Matheus Afonso Martins Moreira via GitGitGadget
2026-05-01 23:15 ` [PATCH v2 4/8] url: return URL_SCHEME_UNKNOWN instead of dying Matheus Afonso Martins Moreira via GitGitGadget
2026-05-01 23:15 ` [PATCH v2 5/8] urlmatch: define url_parse function Matheus Afonso Martins Moreira via GitGitGadget
2026-05-01 23:15 ` [PATCH v2 6/8] builtin: create url-parse command Matheus Afonso Martins Moreira via GitGitGadget
2026-05-01 23:15 ` [PATCH v2 7/8] doc: describe the url-parse builtin Matheus Afonso Martins Moreira via GitGitGadget
2026-05-01 23:15 ` [PATCH v2 8/8] t9904: add tests for the new url-parse builtin Matheus Afonso Martins Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 0/8] builtin: implement, document and test url-parse Matheus Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 1/8] connect: rename enum protocol to url_scheme Matheus Afonso Martins Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 2/8] url: move url_is_local_not_ssh to url.h Matheus Afonso Martins Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 3/8] url: move scheme detection to URL header/source Matheus Afonso Martins Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 4/8] url: return URL_SCHEME_UNKNOWN instead of dying Matheus Afonso Martins Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 5/8] urlmatch: define url_parse function Matheus Afonso Martins Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 6/8] builtin: create url-parse command Matheus Afonso Martins Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 7/8] doc: describe the url-parse builtin Matheus Afonso Martins Moreira via GitGitGadget
2026-05-02 5:28 ` [PATCH v3 8/8] t9904: add tests for the new url-parse builtin Matheus Afonso Martins Moreira via GitGitGadget
2026-05-03 3:49 ` [PATCH v3 0/8] builtin: implement, document and test url-parse Junio C Hamano
2026-05-03 4:29 ` Matheus Afonso Martins Moreira
2026-05-03 17:28 ` Torsten Bögershausen
2026-05-03 19:36 ` Matheus Afonso Martins Moreira
2026-05-12 3:50 ` Junio C Hamano
2026-05-12 8:57 ` Torsten Bögershausen