From: "Matheus Moreira via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "Torsten Bögershausen" <tboegi@web.de>,
	"Ghanshyam Thakkar" <shyamthakkar001@gmail.com>,
	"Matheus Moreira" <matheus@matheusmoreira.com>
Subject: [PATCH v3 0/8] builtin: implement, document and test url-parse
Date: Sat, 02 May 2026 05:28:34 +0000
Message-ID: <pull.1715.v3.git.git.1777699722.gitgitgadget@gmail.com>
In-Reply-To: <pull.1715.v2.git.git.1777677310.gitgitgadget@gmail.com>

This series adds git url-parse, a plumbing builtin for inspecting git URLs.
Git accepts a wider variety of URL forms than any standard parser handles.
The supported forms include RFC URLs, file:// URLs, scp-style
[user@]host:path for SSH, and IPv6 in brackets. Tools wanting to reason
about them have historically had to reimplement git's parsing or shell out
indirectly. With git url-parse, scripts can ask git directly: validate a
URL, extract a component (scheme, user, host, port, path, password), or
both.
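
To make the three input families concrete, here is a minimal sketch of how
they can be told apart. It is not the code from this series: classify_input
and enum input_form are hypothetical names, and the rule (a "scheme://"
prefix, else a colon before the first slash) is a simplification that
ignores DOS drive letters.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical labels for this sketch; these are not git's enum values. */
enum input_form {
	FORM_URL,	/* has a "scheme://" prefix */
	FORM_SCP,	/* [user@]host:path, colon before any slash */
	FORM_LOCAL	/* everything else, e.g. /abs/path or ./rel */
};

/*
 * Simplified classification. Drive-letter inputs such as C:/repo are
 * deliberately not handled here and would be reported as FORM_SCP.
 */
static enum input_form classify_input(const char *s)
{
	const char *sep, *colon, *slash;

	assert(s != NULL);

	/* "scheme://" with a non-empty scheme that contains no slash */
	sep = strstr(s, "://");
	if (sep && sep > s && !memchr(s, '/', (size_t)(sep - s)))
		return FORM_URL;

	/* scp-style: the first colon comes before the first slash */
	colon = strchr(s, ':');
	slash = strchr(s, '/');
	if (colon && (!slash || colon < slash))
		return FORM_SCP;

	return FORM_LOCAL;
}
```

Under these assumptions, git@example.com:user/repo.git and [::1]:repo.git
both classify as scp-style, while /abs/path and ./rel classify as local.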

The series consists of eight commits.

The first four are preparatory. They rename enum protocol to enum url_scheme
for RFC alignment, move url_is_local_not_ssh and the scheme-detection
routines from connect.c to url.h/url.c, and stop url_get_scheme from dying
on unknown schemes so other parsers can handle unknowns gracefully.

The fifth commit defines the new parser, url_parse, in urlmatch.c. It is
adapted from parse_connect_url and uses the same data structures as
url_normalize. The parser returns NULL on failure with err populated, and
exposes URL components as offset/length pairs into the normalized URL
buffer.
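
As an illustration of what "offset/length pairs into the normalized URL
buffer" means for a caller, here is a minimal sketch. struct parsed_url is a
trimmed-down, hypothetical stand-in and copy_component is a hypothetical
helper; neither is the interface added by this series.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a parse result; field names are illustrative. */
struct parsed_url {
	const char *url;		/* normalized URL buffer */
	size_t host_off, host_len;	/* host component within url */
	size_t path_off, path_len;	/* path component within url */
};

/*
 * Copy one component into out as a NUL-terminated string.
 * Returns 0 on success, or -1 if out is too small to hold it.
 */
static int copy_component(const struct parsed_url *p, size_t off, size_t len,
			  char *out, size_t outsz)
{
	assert(p && p->url && out);

	if (len + 1 > outsz)
		return -1;
	memcpy(out, p->url + off, len);
	out[len] = '\0';
	return 0;
}
```

The point of the representation is that components are slices of a single
buffer, so a caller copies only when it needs a NUL-terminated string.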

The sixth commit adds the user-facing command, with a helpful error when the
input looks like a local path rather than a URL.

The last two commits are documentation (a manpage) and 53 tests covering URL
form, scp form, IPv6 in URL and scp forms, bracket forms, username
expansion, query/fragment stripping, the local-path error, and
validation-only mode.

Several choices in this series are judgment calls. Happy to amend or follow
up on any of them.

The component name is scheme, not protocol. RFCs 1738 and 3986 call them
schemes. The series renames enum protocol to enum url_scheme internally, and
the user-facing component name follows the same direction. I considered
accepting both as aliases but decided against setting that precedent for a
new command. If you would rather see protocol, or both protocol and scheme,
that is easy to change.

Local paths are deliberately not URLs. parse_connect_url accepts bare paths
like /abs/path or ./rel as URL_SCHEME_LOCAL. url_parse rejects them, since
url_normalize requires a scheme://host form, and silent conversion to
file:// has no good answer for relative or tilde forms. The builtin emits a
helpful error suggesting the explicit file:// form. If full git clone parity
is preferred (bare paths accepted via auto-conversion or a new flag), that
could be added.

Absent and empty components are conflated in output. --component user
http://host/ and --component user http://@host/ both produce empty lines.
The underlying struct url_info preserves the distinction: *_off == 0 means
the component is absent, while *_off != 0 with *_len == 0 means it is
present but empty. A future option can expose it without a breaking change.
I can amend this patch set if necessary.
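
For reference, the distinction can be recovered from an offset/length pair
as in the sketch below. enum presence and component_presence are
hypothetical names used only for illustration; they are not part of this
series.

```c
#include <stddef.h>

/* Hypothetical three-way result for this sketch. */
enum presence {
	COMPONENT_ABSENT,	/* off == 0: the component does not appear */
	COMPONENT_EMPTY,	/* off != 0, len == 0: delimiter seen, no text */
	COMPONENT_PRESENT	/* off != 0, len > 0 */
};

/*
 * Offset 0 can serve as the "absent" marker because every component
 * other than the scheme starts after the "scheme://" prefix.
 */
static enum presence component_presence(size_t off, size_t len)
{
	if (off == 0)
		return COMPONENT_ABSENT;
	return len == 0 ? COMPONENT_EMPTY : COMPONENT_PRESENT;
}
```

With this reading, the user component of http://host/ is absent, while in
http://@host/ it is empty at offset 7.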

Changes since v1:

 * Bug fix: ~user paths with a query string or fragment were leaking the ?
   or # into the path output. The ~user-skip logic in url_parse previously
   ran only for file://. It now runs for git/ssh/scp URLs as well, matching
   what parse_connect_url does and what users expect.

 * Helpful error for local paths instead of the cryptic "invalid URL scheme
   name or missing '://' suffix".

 * -c protocol renamed to -c scheme for consistency with the internal rename
   and the RFC.

 * Documented the deliberate divergence from parse_connect_url (local paths
   and unknown schemes) in the urlmatch commit message.

 * Doc and command-list polish: purehelpers category, asciidoc placeholder
   convention, [synopsis] form.

 * The original micro-commit-style staged buildup of the builtin is collapsed
   into a single self-contained commit. The rest of the series is unchanged
   in shape.

Changes since v2:

 * Fix Windows CI failure: handle DOS drive prefix in the helpful local-path
   error. With this, the message for a drive-letter input like C:/repo (or
   an MSYS-mangled /abs/path that bash rewrites to D:/.../abs/path before
   git sees it) gets the specific file:///<input> suggestion rather than the
   generic fallback. No effect on Linux or macOS, since has_dos_drive_prefix
   is a no-op on non-Windows builds.

 * t9904: relax the grep on the absolute-path test from the literal
   file:///abs/path to the structural file:/// (three slashes). The original
   assertion depended on the input being preserved verbatim, which MSYS does
   not do. The relaxed grep verifies the structurally meaningful property
   (specific URL suggestion was produced, not the generic fallback) and runs
   cross-platform.
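
The branch order of the local-path error can be sketched as a standalone
function. This is an illustration under stated assumptions, not the code in
builtin/url-parse.c: format_local_path_hint and looks_like_dos_drive are
hypothetical names, the message is written to a buffer instead of being
passed to die(), and the drive check here always runs, whereas
has_dos_drive_prefix is a no-op on non-Windows builds.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Portable stand-in for a drive-letter check such as "C:". */
static int looks_like_dos_drive(const char *s)
{
	return isalpha((unsigned char)s[0]) && s[1] == ':';
}

/*
 * Write the hint for a non-URL input into out.
 * Returns the snprintf() result (the untruncated message length).
 */
static int format_local_path_hint(const char *input, char *out, size_t outsz)
{
	assert(input && out);

	/* Absolute POSIX path: the leading slash supplies the third '/'. */
	if (input[0] == '/')
		return snprintf(out, outsz,
				"'%s' is not a URL; if you meant a local "
				"repository, use 'file://%s'", input, input);

	/* Drive-letter path: add the third slash explicitly. */
	if (looks_like_dos_drive(input))
		return snprintf(out, outsz,
				"'%s' is not a URL; if you meant a local "
				"repository, use 'file:///%s'", input, input);

	/* Relative or tilde forms: no specific URL can be suggested. */
	return snprintf(out, outsz,
			"'%s' is not a URL; if you meant a local repository, "
			"use a 'file://' URL with an absolute path", input);
}
```

Both specific branches yield a suggestion containing file:/// (three
slashes), which is the property the relaxed grep checks.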

Matheus Afonso Martins Moreira (8):
  connect: rename enum protocol to url_scheme
  url: move url_is_local_not_ssh to url.h
  url: move scheme detection to URL header/source
  url: return URL_SCHEME_UNKNOWN instead of dying
  urlmatch: define url_parse function
  builtin: create url-parse command
  doc: describe the url-parse builtin
  t9904: add tests for the new url-parse builtin

 .gitignore                              |   1 +
 Documentation/git-url-parse.adoc        |  80 ++++++
 Documentation/meson.build               |   1 +
 Makefile                                |   1 +
 builtin.h                               |   1 +
 builtin/url-parse.c                     | 135 ++++++++++
 command-list.txt                        |   1 +
 connect.c                               |  78 ++----
 connect.h                               |   1 -
 git.c                                   |   1 +
 meson.build                             |   1 +
 remote.c                                |   1 +
 t/meson.build                           |   1 +
 t/t9904-url-parse.sh                    | 319 ++++++++++++++++++++++++
 t/unit-tests/u-urlmatch-normalization.c |  45 ++++
 url.c                                   |  23 ++
 url.h                                   |  16 ++
 urlmatch.c                              | 127 ++++++++++
 urlmatch.h                              |   1 +
 19 files changed, 780 insertions(+), 54 deletions(-)
 create mode 100644 Documentation/git-url-parse.adoc
 create mode 100644 builtin/url-parse.c
 create mode 100755 t/t9904-url-parse.sh


base-commit: 94f057755b7941b321fd11fec1b2e3ca5313a4e0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1715%2Fmatheusmoreira%2Furl-parse-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1715/matheusmoreira/url-parse-v3
Pull-Request: https://github.com/git/git/pull/1715

Range-diff vs v2:

 1:  38f797362d = 1:  38f797362d connect: rename enum protocol to url_scheme
 2:  a4153e1d24 = 2:  a4153e1d24 url: move url_is_local_not_ssh to url.h
 3:  e584fb03f3 = 3:  e584fb03f3 url: move scheme detection to URL header/source
 4:  7381704c38 = 4:  7381704c38 url: return URL_SCHEME_UNKNOWN instead of dying
 5:  89932a70f3 = 5:  89932a70f3 urlmatch: define url_parse function
 6:  886a7d659e ! 6:  af6c71227b builtin: create url-parse command
     @@ builtin/url-parse.c (new)
      +		if (*url == '/')
      +			die("'%s' is not a URL; if you meant a local "
      +			    "repository, use 'file://%s'", url, url);
     ++		if (has_dos_drive_prefix(url))
     ++			die("'%s' is not a URL; if you meant a local "
     ++			    "repository, use 'file:///%s'", url, url);
      +		die("'%s' is not a URL; if you meant a local repository, "
      +		    "use a 'file://' URL with an absolute path", url);
      +	}
 7:  3c44e0f478 = 7:  2b32cb71a3 doc: describe the url-parse builtin
 8:  cf2ae409e6 ! 8:  ce41d2ec50 t9904: add tests for the new url-parse builtin
     @@ t/t9904-url-parse.sh (new)
      +test_expect_success 'git url-parse helpful error for absolute local path' '
      +	test_must_fail git url-parse "/abs/path" 2>err &&
      +	test_grep "is not a URL" err &&
     -+	test_grep "file:///abs/path" err
     ++	test_grep "file:///" err
      +'
      +
      +test_expect_success 'git url-parse helpful error for relative local path' '

-- 
gitgitgadget
