From: "Matheus Afonso Martins Moreira via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Matheus Moreira <matheus.a.m.moreira@gmail.com>,
Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
Subject: [PATCH 02/13] urlmatch: define url_parse function
Date: Sun, 28 Apr 2024 22:30:50 +0000
Message-ID: <13b81b8aa06cfd63a5fd9d1acbaf21a8b388ff47.1714343461.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.1715.git.git.1714343461.gitgitgadget@gmail.com>
From: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>

Define a general parsing function that supports all Git URLs,
including scp-style URLs such as hostname:~user/repo.
It has the same interface as the URL normalization function
and uses the same data structures, which makes it easy to adopt.

It is adapted from the algorithm used to process URLs in connect.c,
so it should support the same inputs.

Signed-off-by: Matheus Afonso Martins Moreira <matheus@matheusmoreira.com>
---
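
The scp-style handling described above amounts to turning "host:path"
into an ssh:// URL before normalization. The following is a minimal
standalone sketch of that rewrite, not code from this patch:
scp_to_ssh_url() is a hypothetical name, it uses plain malloc() instead
of strbuf, and it assumes the host part contains no ':' (so bracketed
IPv6 hosts are out of scope).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: rewrite an scp-style
 * address ("host:path") as an ssh:// URL. Returns a malloc'd string
 * that the caller must free, or NULL when there is no ':' or when
 * allocation fails.
 */
static char *scp_to_ssh_url(const char *scp)
{
	const char *colon, *path;
	size_t host_len, len;
	char *out;

	assert(scp != NULL);
	colon = strchr(scp, ':');
	if (!colon)
		return NULL;

	host_len = (size_t)(colon - scp);
	path = colon + 1;
	/* "host:/abs" already supplies the slash; "host:~user" needs one */
	if (*path == '/')
		path++;

	/* scheme + host + '/' + path + terminating NUL */
	len = strlen("ssh://") + host_len + 1 + strlen(path) + 1;
	out = malloc(len);
	if (!out)
		return NULL;
	snprintf(out, len, "ssh://%.*s/%s", (int)host_len, scp, path);
	return out;
}
```

With this sketch, "host.xz:~user/repo" becomes "ssh://host.xz/~user/repo"
and "host.xz:/srv/repo.git" becomes "ssh://host.xz/srv/repo.git", which
is the shape url_normalize() can then process.
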
urlmatch.c | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
urlmatch.h | 1 +
2 files changed, 91 insertions(+)
diff --git a/urlmatch.c b/urlmatch.c
index 1d0254abacb..5a442e31fa2 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -3,6 +3,7 @@
 #include "hex-ll.h"
 #include "strbuf.h"
 #include "urlmatch.h"
+#include "url.h"
 
 #define URL_ALPHA "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
 #define URL_DIGIT "0123456789"
@@ -438,6 +439,95 @@ char *url_normalize(const char *url, struct url_info *out_info)
 	return url_normalize_1(url, out_info, 0);
 }
 
+enum protocol {
+	PROTO_UNKNOWN = 0,
+	PROTO_LOCAL,
+	PROTO_FILE,
+	PROTO_SSH,
+	PROTO_GIT,
+};
+
+static enum protocol url_get_protocol(const char *name, size_t n)
+{
+	if (n == strlen("ssh") && !strncmp(name, "ssh", n))
+		return PROTO_SSH;
+	if (n == strlen("git") && !strncmp(name, "git", n))
+		return PROTO_GIT;
+	if (n == strlen("git+ssh") && !strncmp(name, "git+ssh", n))
+		return PROTO_SSH; /* deprecated - do not use */
+	if (n == strlen("ssh+git") && !strncmp(name, "ssh+git", n))
+		return PROTO_SSH; /* deprecated - do not use */
+	if (n == strlen("file") && !strncmp(name, "file", n))
+		return PROTO_FILE;
+	return PROTO_UNKNOWN;
+}
+
+char *url_parse(const char *url_orig, struct url_info *out_info)
+{
+	struct strbuf url;
+	char *host, *separator;
+	char *copy, *detached, *normalized;
+	enum protocol protocol = PROTO_LOCAL;
+	struct url_info local_info;
+	struct url_info *info = out_info ? out_info : &local_info;
+	bool scp_syntax = false;
+
+	/* url_decode() and xstrdup() both allocate; copy is freed below */
+	if (is_url(url_orig))
+		copy = url_decode(url_orig);
+	else
+		copy = xstrdup(url_orig);
+
+	strbuf_init(&url, strlen(copy) + sizeof("ssh://"));
+	strbuf_addstr(&url, copy);
+	free(copy);
+
+	host = strstr(url.buf, "://");
+	if (host) {
+		protocol = url_get_protocol(url.buf, host - url.buf);
+		host += 3;
+	} else {
+		if (!url_is_local_not_ssh(url.buf)) {
+			scp_syntax = true;
+			protocol = PROTO_SSH;
+			strbuf_insertstr(&url, 0, "ssh://");
+			host = url.buf + 6;
+		}
+	}
+
+	/* path starts after ':' in scp style SSH URLs */
+	if (scp_syntax) {
+		separator = strchr(host, ':');
+		if (separator) {
+			if (separator[1] == '/')
+				strbuf_remove(&url, separator - url.buf, 1);
+			else
+				*separator = '/';
+		}
+	}
+
+	detached = strbuf_detach(&url, NULL);
+	normalized = url_normalize(detached, info);
+	free(detached);
+
+	if (!normalized)
+		return NULL;
+
+	/*
+	 * Point path to ~ for URLs like this:
+	 *
+	 *     ssh://host.xz/~user/repo
+	 *     git://host.xz/~user/repo
+	 *     host.xz:~user/repo
+	 */
+	if (protocol == PROTO_GIT || protocol == PROTO_SSH) {
+		if (normalized[info->path_off + 1] == '~')
+			info->path_off++;
+	}
+
+	return normalized;
+}
+
 static size_t url_match_prefix(const char *url,
 			       const char *url_prefix,
 			       size_t url_prefix_len)
diff --git a/urlmatch.h b/urlmatch.h
index 5ba85cea139..6b3ce428582 100644
--- a/urlmatch.h
+++ b/urlmatch.h
@@ -35,6 +35,7 @@ struct url_info {
 };
 
 char *url_normalize(const char *, struct url_info *);
+char *url_parse(const char *, struct url_info *);
 
 struct urlmatch_item {
 	size_t hostmatch_len;
--
gitgitgadget
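
A note on the scheme lookup in url_get_protocol(): the scheme arrives as
a pointer plus a length and is not NUL-terminated at that length, so a
bounded comparison on its own is only a prefix test. For example,
strncmp("s://h/p", "ssh", 1) returns 0. Below is a minimal standalone
sketch of an exact match, not code from this patch; scheme_is() and
scheme_len() are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: return 1 only when the
 * first n bytes of name are exactly the scheme `want`. The caller
 * guarantees that name has at least n readable bytes.
 */
static int scheme_is(const char *name, size_t n, const char *want)
{
	assert(name != NULL && want != NULL);
	return n == strlen(want) && !memcmp(name, want, n);
}

/* Length of the scheme in front of "://", or 0 when there is none. */
static size_t scheme_len(const char *url)
{
	const char *sep = strstr(url, "://");

	return sep ? (size_t)(sep - url) : 0;
}
```

Checking the length first also rejects the empty scheme in "://host",
which a comparison over zero bytes would otherwise accept.
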