From: Junio C Hamano <gitster@pobox.com>
To: "Georgios Kontaxis via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"brian m. carlson" <sandals@crustytoothpaste.net>,
"Georgios Kontaxis" <geko1702+commits@99rst.org>
Subject: Re: [PATCH v6] gitweb: redacted e-mail addresses feature.
Date: Mon, 29 Mar 2021 13:00:17 -0700
Message-ID: <xmqq5z19k9wu.fsf@gitster.g>
In-Reply-To: <pull.910.v6.git.1616973963862.gitgitgadget@gmail.com> (Georgios Kontaxis via GitGitGadget's message of "Sun, 28 Mar 2021 23:26:03 +0000")
"Georgios Kontaxis via GitGitGadget" <gitgitgadget@gmail.com>
writes:
> From: Georgios Kontaxis <geko1702+commits@99rst.org>
>
> Gitweb extracts content from the Git log and makes it accessible
> over HTTP. As a result, e-mail addresses found in commits are
> exposed to web crawlers and they may not respect robots.txt.
> This can result in unsolicited messages.
>
> Introduce an 'email-privacy' feature which redacts e-mail addresses
> from the generated HTML content. Specifically, obscure addresses
> retrieved from the author/committer and comment sections of the
> Git log. The feature is off by default.
>
> This feature does not prevent someone from downloading the
> unredacted commit log, e.g., by cloning the repository, and
> extracting information from it. It aims to hinder the low-
> effort, bulk collection of e-mail addresses by web crawlers.
>
> Signed-off-by: Georgios Kontaxis <geko1702+commits@99rst.org>
> ---
> @@ -751,6 +751,17 @@ default font sizes or lineheights are changed (e.g. via adding extra
> CSS stylesheet in `@stylesheets`), it may be appropriate to change
> these values.
>
> +email-privacy::
> + Redact e-mail addresses from the generated HTML, etc. content.
> + This obscures e-mail addresses retrieved from the author/committer
> + and comment sections of the Git log.
> + It is meant to hinder web crawlers that harvest and abuse addresses.
> + Such crawlers may not respect robots.txt.
> + Note that users and user tools also see the addresses as redacted.
> + If Gitweb is not the final step in a workflow then subsequent steps
> + may misbehave because of the redacted information they receive.
> + Disabled by default.
OK. I still think everything after "Note that" is a bit redundant
(as any intelligent reader can read what the feature does and reach
its natural consequence themselves), but I do not strongly oppose
leaving it in.
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 0959a782eccb..01c6faf88006 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -569,6 +569,15 @@ sub evaluate_uri {
> 'sub' => \&feature_extra_branch_refs,
> 'override' => 0,
> 'default' => []},
> +
> + # Redact e-mail addresses.
> +
> + # To enable system wide have in $GITWEB_CONFIG
> + # $feature{'email-privacy'}{'default'} = [1];
> + 'email-privacy' => {
> + 'sub' => sub { feature_bool('email-privacy', @_) },
> + 'override' => 1,
> + 'default' => [0]},
> );
Sensible.
While reviewing this part, I noticed a few things:

 - A few other features explain how to toggle them system wide and
   nothing else, like this one does.

 - Others, in addition, explain how to toggle them per repository
   (which may necessitate explaining how to toggle the override bit
   if the default is false).

It might be a good idea to standardise the explanation somehow.
This is just an observation (in other words, nothing needs to be
done in this patch).
> @@ -3449,6 +3458,13 @@ sub parse_date {
> return %date;
> }
>
> +sub hide_mailaddrs_if_private {
> + my $line = shift;
> + return $line unless gitweb_check_feature('email-privacy');
> + $line =~ s/<[^@>]+@[^>]+>/<redacted>/ig;
> + return $line;
> +}
OK. The "this catches AUTHOR_EMAIL" pattern is a bit embarrassing
(my fault); "s/<[^@>]+@[-a-z0-9.]+>/<redacted>/ig" might be less
embarrassing but I do not care deeply enough either way. The version
in this patch is probably closer to what ident.c uses anyway ;-)
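
To make the difference between the two patterns concrete, here is a
minimal standalone sketch. The sample lines and the redact() helper
are made up for illustration and are not part of gitweb; the only
things taken from the discussion above are the two regular
expressions themselves.

```perl
use strict;
use warnings;

# The pattern used in the patch, and the stricter variant that only
# accepts hostname-like characters after the '@'.
my $loose  = qr/<[^@>]+@[^>]+>/i;
my $strict = qr/<[^@>]+@[-a-z0-9.]+>/i;

# Hypothetical helper: apply one pattern to one line.
sub redact {
	my ($line, $re) = @_;
	$line =~ s/$re/<redacted>/g;
	return $line;
}

# Hypothetical sample lines, not taken from any real repository.
my @samples = (
	'A U Thor <author@example.com>',
	'text <not@an address> here',
	'no address in <this line>',
);

for my $line (@samples) {
	print "input:  $line\n";
	print "loose:  ", redact($line, $loose), "\n";
	print "strict: ", redact($line, $strict), "\n\n";
}
```

Both patterns redact the first sample and leave the third alone.
They differ on the second one: the looser pattern accepts anything
other than '>' after the '@', so it redacts text with a space in it,
while the stricter one leaves it as-is.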
> @@ -7489,7 +7506,8 @@ sub git_log_generic {
> -accesskey => "n", -title => "Alt-n"}, "next");
> }
> my $patch_max = gitweb_get_feature('patches');
> - if ($patch_max && !defined $file_name) {
> + if ($patch_max && !defined $file_name &&
> + !gitweb_check_feature('email-privacy')) {
> if ($patch_max < 0 || @commitlist <= $patch_max) {
An observation unrelated to this change. I think checking for
negative patch_max is a bug in the original code. Everywhere else,
the way to disable the 'patch' view is to set it to 0, not negative,
and this block is already protected with "if ($patch_max".
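
For reference, the two nested conditions can be restated as a
standalone sketch. It leaves out the "!defined $file_name" and
email-privacy checks, and patches_link_shown() is a made-up name
that takes plain arguments in place of gitweb_get_feature('patches')
and @commitlist.

```perl
use strict;
use warnings;

# Hypothetical restatement of the outer and inner "if" in the hunk.
# $patch_max stands in for gitweb_get_feature('patches') and
# $ncommits for scalar(@commitlist).
sub patches_link_shown {
	my ($patch_max, $ncommits) = @_;
	return 0 unless $patch_max;    # 0 (or undef) fails the outer test
	return ($patch_max < 0 || $ncommits <= $patch_max) ? 1 : 0;
}

for my $case ([0, 5], [16, 5], [16, 20], [-1, 20]) {
	my ($max, $n) = @$case;
	printf "patch_max=%3d commits=%2d -> %d\n",
		$max, $n, patches_link_shown($max, $n);
}
```

As written, 0 fails the outer test, while a negative value passes it
(it is true in Perl) and then satisfies the inner test regardless of
the number of commits.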
> @@ -7550,7 +7568,8 @@ sub git_commit {
> } @$parents ) .
> ')';
> }
> - if (gitweb_check_feature('patches') && @$parents <= 1) {
> + if (gitweb_check_feature('patches') && @$parents <= 1 &&
> + !gitweb_check_feature('email-privacy')) {
> $formats_nav .= " | " .
> $cgi->a({-href => href(action=>"patch", -replay=>1)},
> "patch");
> @@ -7863,7 +7882,8 @@ sub git_commitdiff {
> $formats_nav =
> $cgi->a({-href => href(action=>"commitdiff_plain", -replay=>1)},
> "raw");
> - if ($patch_max && @{$co{'parents'}} <= 1) {
> + if ($patch_max && @{$co{'parents'}} <= 1 &&
> + !gitweb_check_feature('email-privacy')) {
> $formats_nav .= " | " .
> $cgi->a({-href => href(action=>"patch", -replay=>1)},
> "patch");
I see your wish is "we do not show any link to a patch page, but
anybody who knows what a link to the patch page looks like can craft
and ask", but I am not sure if that is worth the above three hunks.
It still feels more robust to just disable the 'patches' feature
when the email-privacy feature is enabled, but that may be just me.
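
A rough sketch of what "just disable the 'patches' feature" could
look like, purely for illustration: effective_patch_max() is a
hypothetical helper working on a plain hash of feature values, not
something that exists in gitweb, and how it would hook into the real
feature lookup is left open.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of gitweb: compute the 'patches'
# limit in one place so that enabling email-privacy forces it to 0.
sub effective_patch_max {
	my ($feature) = @_;
	return 0 if $feature->{'email-privacy'};
	return $feature->{'patches'} // 0;
}

# With email-privacy off the configured limit is used unchanged;
# with it on, the patch views are treated as disabled.
print effective_patch_max({ 'patches' => 16, 'email-privacy' => 0 }), "\n";
print effective_patch_max({ 'patches' => 16, 'email-privacy' => 1 }), "\n";
```

With something along these lines, callers that already test
$patch_max would not each need their own email-privacy check.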
Will queue as-is. Input from those who are more adept at Perl
and/or interested in helping polish gitweb is still welcome, but at
my level of interest in the topic, this version looks as good as it
gets ;-)
Thanks.