From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Georgios Kontaxis via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org,
	"brian m. carlson" <sandals@crustytoothpaste.net>,
	Georgios Kontaxis <geko1702+commits@99rst.org>
Subject: Re: [PATCH v6] gitweb: redacted e-mail addresses feature.
Date: Fri, 09 Apr 2021 00:43:19 +0200
Message-ID: <87eefkieig.fsf@evledraar.gmail.com>
In-Reply-To: <pull.910.v6.git.1616973963862.gitgitgadget@gmail.com>


On Mon, Mar 29 2021, Georgios Kontaxis via GitGitGadget wrote:

> [...]
> +email-privacy::
> +	Redact e-mail addresses from the generated HTML, etc. content.
> +	This obscures e-mail addresses retrieved from the author/committer
> +	and comment sections of the Git log.
> +	It is meant to hinder web crawlers that harvest and abuse addresses.
> +	Such crawlers may not respect robots.txt.
> +	Note that users and user tools also see the addresses as redacted.
> +	If Gitweb is not the final step in a workflow then subsequent steps
> +	may misbehave because of the redacted information they receive.
> +	Disabled by default.
> +
>  highlight::
>  	Server-side syntax highlight support in "blob" view.  It requires
>  	`$highlight_bin` program to be available (see the description of
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 0959a782eccb..01c6faf88006 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -569,6 +569,15 @@ sub evaluate_uri {
>  		'sub' => \&feature_extra_branch_refs,
>  		'override' => 0,
>  		'default' => []},
> +
> +	# Redact e-mail addresses.
> +
> +	# To enable system wide have in $GITWEB_CONFIG
> +	# $feature{'email-privacy'}{'default'} = [1];
> +	'email-privacy' => {
> +		'sub' => sub { feature_bool('email-privacy', @_) },
> +		'override' => 1,
> +		'default' => [0]},
>  );
>  
>  sub gitweb_get_feature {
> @@ -3449,6 +3458,13 @@ sub parse_date {
>  	return %date;
>  }
>  
> +sub hide_mailaddrs_if_private {
> +	my $line = shift;
> +	return $line unless gitweb_check_feature('email-privacy');
> +	$line =~ s/<[^@>]+@[^>]+>/<redacted>/ig;

The /i here is redundant, since there is nothing on the LHS of the s///
that could case-fold. It doesn't harm anything either. Just a small note
since it's new in v6...
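
A minimal sketch of why the flag makes no difference, not taken from the
patch apart from the regex itself: the pattern consists only of the
literals "<", "@" and ">" plus negated character classes over those same
characters, none of which has an upper/lower-case variant. The helper
names and sample lines below are hypothetical, chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the substitution as written in v6, with /i.
sub redact_with_i {
	my $line = shift;
	$line =~ s/<[^@>]+@[^>]+>/<redacted>/ig;
	return $line;
}

# Hypothetical helper: the same substitution with /i dropped.
sub redact_without_i {
	my $line = shift;
	$line =~ s/<[^@>]+@[^>]+>/<redacted>/g;
	return $line;
}

# Made-up sample lines, mixing upper- and lower-case addresses.
my @samples = (
	'A U Thor <author@example.com>',
	'Signed-off-by: C O Mitter <COMMITTER@EXAMPLE.COM>',
	'Cc: <a@example.com>, <B@Example.Org>',
	'no address here, only <not-an-address>',
);

# Print whether the two variants agree, followed by the redacted line.
foreach my $line (@samples) {
	my ($with, $without) = (redact_with_i($line), redact_without_i($line));
	printf "%-4s %s\n", ($with eq $without ? 'same' : 'DIFF'), $without;
}
```

Both variants should agree on every input, so dropping /i would be a
purely cosmetic change.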

> +	return $line;
> +}
> +
>  sub parse_tag {
>  	my $tag_id = shift;
>  	my %tag;
> @@ -3465,7 +3481,7 @@ sub parse_tag {
>  		} elsif ($line =~ m/^tag (.+)$/) {
>  			$tag{'name'} = $1;
>  		} elsif ($line =~ m/^tagger (.*) ([0-9]+) (.*)$/) {
> -			$tag{'author'} = $1;
> +			$tag{'author'} = hide_mailaddrs_if_private($1);
>  			$tag{'author_epoch'} = $2;
>  			$tag{'author_tz'} = $3;
>  			if ($tag{'author'} =~ m/^([^<]+) <([^>]*)>/) {
> @@ -3513,7 +3529,7 @@ sub parse_commit_text {
>  		} elsif ((!defined $withparents) && ($line =~ m/^parent ($oid_regex)$/)) {
>  			push @parents, $1;
>  		} elsif ($line =~ m/^author (.*) ([0-9]+) (.*)$/) {
> -			$co{'author'} = to_utf8($1);
> +			$co{'author'} = hide_mailaddrs_if_private(to_utf8($1));
>  			$co{'author_epoch'} = $2;
>  			$co{'author_tz'} = $3;
>  			if ($co{'author'} =~ m/^([^<]+) <([^>]*)>/) {
> @@ -3523,7 +3539,7 @@ sub parse_commit_text {
>  				$co{'author_name'} = $co{'author'};
>  			}
>  		} elsif ($line =~ m/^committer (.*) ([0-9]+) (.*)$/) {
> -			$co{'committer'} = to_utf8($1);
> +			$co{'committer'} = hide_mailaddrs_if_private(to_utf8($1));
>  			$co{'committer_epoch'} = $2;
>  			$co{'committer_tz'} = $3;
>  			if ($co{'committer'} =~ m/^([^<]+) <([^>]*)>/) {
> @@ -3568,9 +3584,10 @@ sub parse_commit_text {
>  	if (! defined $co{'title'} || $co{'title'} eq "") {
>  		$co{'title'} = $co{'title_short'} = '(no commit message)';
>  	}
> -	# remove added spaces
> +	# remove added spaces, redact e-mail addresses if applicable.
>  	foreach my $line (@commit_lines) {
>  		$line =~ s/^    //;
> +		$line = hide_mailaddrs_if_private($line);
>  	}
>  	$co{'comment'} = \@commit_lines;
>  
> @@ -7489,7 +7506,8 @@ sub git_log_generic {
>  			         -accesskey => "n", -title => "Alt-n"}, "next");
>  	}
>  	my $patch_max = gitweb_get_feature('patches');
> -	if ($patch_max && !defined $file_name) {
> +	if ($patch_max && !defined $file_name &&
> +		!gitweb_check_feature('email-privacy')) {
>  		if ($patch_max < 0 || @commitlist <= $patch_max) {
>  			$paging_nav .= " &sdot; " .
>  				$cgi->a({-href => href(action=>"patches", -replay=>1)},
> @@ -7550,7 +7568,8 @@ sub git_commit {
>  			} @$parents ) .
>  			')';
>  	}
> -	if (gitweb_check_feature('patches') && @$parents <= 1) {
> +	if (gitweb_check_feature('patches') && @$parents <= 1 &&
> +		!gitweb_check_feature('email-privacy')) {
>  		$formats_nav .= " | " .
>  			$cgi->a({-href => href(action=>"patch", -replay=>1)},
>  				"patch");
> @@ -7863,7 +7882,8 @@ sub git_commitdiff {
>  		$formats_nav =
>  			$cgi->a({-href => href(action=>"commitdiff_plain", -replay=>1)},
>  			        "raw");
> -		if ($patch_max && @{$co{'parents'}} <= 1) {
> +		if ($patch_max && @{$co{'parents'}} <= 1 &&
> +			!gitweb_check_feature('email-privacy')) {
>  			$formats_nav .= " | " .
>  				$cgi->a({-href => href(action=>"patch", -replay=>1)},
>  					"patch");

I didn't run this, and hadn't kept up for a few rounds. I'm happy to see
the pos/while etc. looping gone, this LGTM.
