From: Andrey Albershteyn <aalbersh@redhat.com>
To: linux-xfs@vger.kernel.org
Cc: Andrey Albershteyn <aalbersh@kernel.org>,
"Darrick J. Wong" <djwong@kernel.org>
Subject: [PATCH v4 05/10] git-contributors: better handling of hash mark/multiple emails
Date: Thu, 13 Feb 2025 21:14:27 +0100
Message-ID: <20250213-update-release-v4-5-c06883a8bbd6@kernel.org>
In-Reply-To: <20250213-update-release-v4-0-c06883a8bbd6@kernel.org>
Handle hash marks, tags that list multiple email addresses, and
unquoted names in email addresses more robustly. See the comments in
the script for details.
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
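Below is a small, self-contained check of what the new `r2` heuristic accepts and of why it begins with `^.*`. `re.match()` only matches at the start of the string, so the `.*` is what lets the pattern reach the first comma. Under `re.match()` the `^` is redundant, and `re.search()` on the bare pattern gives the same yes/no answer. The sample addresses are made up for illustration.

```python
import re

# r2 exactly as added by the patch.
R2 = re.compile(r'^.*,[^,]*@[^@]*,[^,]*@', re.I)
# The same pattern with the leading "^.*" removed, for comparison.
R2_BARE = re.compile(r',[^,]*@[^@]*,[^,]*@', re.I)


def is_multi(rightside: str) -> bool:
    """True if r2 treats the text after the tag as an address list."""
    return R2.match(rightside) is not None


SAMPLES = [
    'a@example.com, b@example.com, c@example.com',  # three addresses
    'a@example.com, b@example.com',                 # only one comma
    'Xu, Wen <wen.xu@gatech.edu>',                  # comma inside a name
]

if __name__ == '__main__':
    for s in SAMPLES:
        # match() is anchored at position 0; search() scans forward.
        print(f'{s!r}: r2={is_multi(s)} '
              f'bare.match={R2_BARE.match(s) is not None} '
              f'bare.search={R2_BARE.search(s) is not None}')
```

The pattern needs at least two commas. As a result, a comma inside an unquoted name does not trigger splitting, but a tag that lists exactly two addresses is not split either and goes through the single-address path in `run()`.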
tools/git-contributors.py | 109 ++++++++++++++++++++++++++++++++++++++--------
1 file changed, 90 insertions(+), 19 deletions(-)
diff --git a/tools/git-contributors.py b/tools/git-contributors.py
index 70ac8abb26c8ce65de336c5ae48abcfee39508b2..1a0f2b80e3dad9124b86b29f8507389ef91fe813 100755
--- a/tools/git-contributors.py
+++ b/tools/git-contributors.py
@@ -37,35 +37,106 @@ class find_developers(object):
self.r1 = re.compile(regex1, re.I)
+ # regex to guess if this is a list of multiple addresses.
+ # The ".*" is needed because re.match() anchors at the start.
+ self.r2 = re.compile(r'^.*,[^,]*@[^@]*,[^,]*@', re.I)
+
+ # regex to match on anything inside a pair of angle brackets
+ self.r3 = re.compile(r'^.*<(.+)>', re.I)
+
+ def _handle_addr(self, addr):
+ # The next split removes everything after an octothorpe (hash
+ # mark), because someone could have provided an improperly
+ # formatted email address:
+ #
+ # Cc: stable@vger.kernel.org # v6.19+
+ #
+ # This, according to my reading of RFC5322, is allowed because
+ # octothorpes can be part of atom text. However, it is
+ # interpreted as if there weren't any whitespace
+ # ("stable@vger.kernel.org#v6.19+"). The grammar allows for
+ # this form, even though this is not a correct Internet domain
+ # name.
+ #
+ # Worse, if you follow the format specified in the kernel's
+ # SubmittingPatches file:
+ #
+ # Cc: <stable@vger.kernel.org> # v6.9
+ #
+ # emailutils will not know how to parse this, and returns empty
+ # strings. I think this is because the angle-addr
+ # specification allows only whitespace between the closing
+ # angle bracket and the CRLF.
+ #
+ # Hack around both problems by ignoring everything after an
+ # octothorpe, no matter where it occurs in the string. If
+ # someone has one in their name or the email address, too bad.
+ a = addr.split('#')[0]
+
+ # emailutils can extract email addresses from headers that
+ # roughly follow the destination address field format:
+ #
+ # Reviewed-by: Bogus J. Simpson <bogus@simpson.com>
+ # Reviewed-by: "Bogus J. Simpson" <bogus@simpson.com>
+ # Reviewed-by: bogus@simpson.com
+ #
+ # Use it to extract the email address, because we don't care
+ # about the display name.
+ (name, addr) = email.utils.parseaddr(a)
+ if DEBUG:
+ print(f'A:{a}:NAME:{name}:ADDR:{addr}:')
+ if len(addr) > 0:
+ return addr
+
+ # If emailutils fails to find anything, let's see if there's
+ # a sequence of characters within angle brackets and hope that
+ # is an email address. This works around things like:
+ #
+ # Reported-by: Xu, Wen <wen.xu@gatech.edu>
+ #
+ # which should have had the name in quotation marks because there's
+ # a comma.
+ m = self.r3.match(a)
+ if m:
+ addr = m.expand(r'\g<1>')
+ if DEBUG:
+ print(f"M3:{addr}:M:{m}:")
+ return addr
+
+ # No idea, just spit the whole thing out and hope for the best.
+ return a
+
def run(self, lines):
addr_list = []
for line in lines:
l = line.strip()
- # emailutils can handle abominations like:
- #
- # Reviewed-by: Bogus J. Simpson <bogus@simpson.com>
- # Reviewed-by: "Bogus J. Simpson" <bogus@simpson.com>
- # Reviewed-by: bogus@simpson.com
- # Cc: <stable@vger.kernel.org> # v6.9
- # Tested-by: Moo Cow <foo@bar.com> # powerpc
+ # First, does this line match any of the headers we
+ # know about?
m = self.r1.match(l)
if not m:
continue
- (name, addr) = email.utils.parseaddr(m.expand(r'\g<2>'))
+ rightside = m.expand(r'\g<2>')
- # This last split removes anything after a hash mark,
- # because someone could have provided an improperly
- # formatted email address:
- #
- # Cc: stable@vger.kernel.org # v6.19+
- #
- # emailutils doesn't seem to catch this, and I can't
- # fully tell from RFC2822 that this isn't allowed. I
- # think it is because dtext doesn't forbid spaces or
- # hash marks.
- addr_list.append(addr.split('#')[0])
+ n = self.r2.match(rightside)
+ if n:
+ # Break the line into an array of addresses,
+ # delimited by commas, then handle each
+ # address.
+ addrs = rightside.split(',')
+ if DEBUG:
+ print(f"0LINE:{rightside}:ADDRS:{addrs}:M:{n}")
+ for addr in addrs:
+ a = self._handle_addr(addr)
+ addr_list.append(a)
+ else:
+ # Otherwise treat the line as a single email
+ # address.
+ if DEBUG:
+ print(f"1LINE:{rightside}:M:{n}")
+ a = self._handle_addr(rightside)
+ addr_list.append(a)
return sorted(set(addr_list))
--
2.47.2
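The following is a condensed sketch of the extraction flow this patch adds, pulled out of the `find_developers` class so it can run on its own. It cuts everything after `#`, asks `email.utils.parseaddr()` for the address, falls back to the text inside angle brackets, and splits tags that `r2` flags as lists. `TRAILER` is a hypothetical stand-in for the script's `regex1`, which is defined outside this hunk. The sample lines are made up. `m.group(1)` replaces the equivalent `m.expand(r'\g<1>')`.

```python
import email.utils
import re

# Hypothetical stand-in for the script's regex1 (not shown in the patch):
# group 2 is the text to the right of the tag.
TRAILER = re.compile(
    r'^(Signed-off-by|Reviewed-by|Acked-by|Tested-by|Reported-by|Cc):\s*(.*)$',
    re.I)
# r2 and r3 as added by the patch.
MULTI = re.compile(r'^.*,[^,]*@[^@]*,[^,]*@', re.I)
ANGLE = re.compile(r'^.*<(.+)>', re.I)


def handle_addr(addr: str) -> str:
    """Best-effort address extraction, modeled on _handle_addr."""
    # Drop trailing comments such as "# v6.9"; any '#' is a cut point.
    a = addr.split('#')[0]
    # Let email.utils deal with display names and quoting.
    _name, parsed = email.utils.parseaddr(a)
    if parsed:
        return parsed
    # Fall back to the text inside angle brackets (the greedy ".*"
    # makes this the last '<' that still has a '>' after it).
    m = ANGLE.match(a)
    if m:
        return m.group(1)
    # Give up and return the comment-stripped text unchanged.
    return a


def extract(lines):
    """Collect unique addresses from tag lines, modeled on run()."""
    addrs = []
    for line in lines:
        m = TRAILER.match(line.strip())
        if not m:
            continue
        rightside = m.group(2)
        if MULTI.match(rightside):
            # Looks like a list: split on commas and handle each piece.
            addrs.extend(handle_addr(p) for p in rightside.split(','))
        else:
            addrs.append(handle_addr(rightside))
    return sorted(set(addrs))


SAMPLE = [
    'Reviewed-by: "Bogus J. Simpson" <bogus@simpson.com>',
    'Cc: stable@vger.kernel.org # v6.19+',
    'Cc: a@example.com, b@example.com, c@example.com',
    'Subject: not a tag',
]

if __name__ == '__main__':
    print(extract(SAMPLE))
```

`parseaddr()` has become stricter in recent Python releases and can return `('', '')` when its input looks like more than one address. Malformed inputs such as `Xu, Wen <wen.xu@gatech.edu>` may therefore take different paths depending on the interpreter. The sample sticks to inputs that parse the same way either way.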