All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: workflows@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Sasha Levin <sashal@kernel.org>
Subject: [PATCH] verify_pull_requests: initial pull request sanitizer
Date: Sat, 12 Apr 2025 08:29:11 -0400	[thread overview]
Message-ID: <20250412122911.327134-1-sashal@kernel.org> (raw)

I'm working on evolving the work I'm doing on the linus-next integration
branch, and this seemed like another useful tool.

Verify that either the sender of the pull request is listed as a
maintainer for the subsystem the patches are destined for. This provides
us two things:

1. Audit the correctness of the MAINTAINERS file, and provide an
   opportunity to correct and add missing "tribal knowledge" (folks who
   are the de-facto maintainers, but are not listed in MAINTAINERS).

2. Verify that inadvertent changes are not included in a pull request.

Below is an example output of the tool. Take note that for pull request
#3 we see a warning because Jens isn't listed as a maintainer for
drivers/nvme/ even though he is sending pull requests for it.

$ ./scripts/verify_pull_requests.sh --days 1
Number of pull requests in the last 1 day(s): 5
Processing pull requests...
Pull request #1: http://lore.kernel.org/all/CAH2r5mt3CCXVEwdsrqPe1VE+xebPSh2k4Wg5Zqqp_OCm+m7cPQ@mail.gmail.com/
  Sender: Steve French <smfrench@gmail.com>
  Repository: git://git.samba.org/sfrench/cifs-2.6.git
  Branch/Tag: tags/v6.15-rc1-smb3-client-fixes
  Fetching: git fetch "git://git.samba.org/sfrench/cifs-2.6.git" "tags/v6.15-rc1-smb3-client-fixes"
  Fetch: ✅ Successfully fetched
  Checking maintainer status for 10 commit(s)...
  ✅ Maintainer verification: Sender or a signer is listed as maintainer for all commits
------------------------
Pull request #2: http://lore.kernel.org/all/20250411181650.GA372618@bhelgaas/
  Sender: Bjorn Helgaas <helgaas@kernel.org>
  Repository: git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git
  Branch/Tag: tags/pci-v6.15-fixes-1
  Fetching: git fetch "git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git" "tags/pci-v6.15-fixes-1"
  Fetch: ✅ Successfully fetched
  Checking maintainer status for 1 commit(s)...
  ✅ Maintainer verification: Sender or a signer is listed as maintainer for all commits
------------------------
Pull request #3: http://lore.kernel.org/all/8d3e5d98-09b1-4274-af25-124c91342b7a@kernel.dk/
  Sender: Jens Axboe <axboe@kernel.dk>
  Repository: git://git.kernel.dk/linux.git
  Branch/Tag: tags/block-6.15-20250411
  Fetching: git fetch "git://git.kernel.dk/linux.git" "tags/block-6.15-20250411"
  Fetch: ✅ Successfully fetched
  Checking maintainer status for 13 commit(s)...
  ✅ Maintainer verification: Sender or a signer is listed as maintainer for all commits
  ⚠️  Warning: Sender is NOT listed as maintainer for these commits (but a signer is):
    - 70289ae5cac4d nvmet-fc: put ref when assoc->del_work is already scheduled
    - b0b26ad0e1943 nvmet-fc: take tgtport reference only once
    - 1a909565733ed nvmet-fc: update tgtport ref per assoc
    - 88517565b5929 nvmet-fc: inline nvmet_fc_free_hostport
    - aeaa0913a6994 nvmet-fc: inline nvmet_fc_delete_assoc
    - 72511b1dc4147 nvmet-fcloop: add ref counting to lport
    - f22c458f9495f nvmet-fcloop: replace kref with refcount
    - 2b5f0c5bc819a nvmet-fcloop: swap list_add_tail arguments
------------------------
Pull request #4: http://lore.kernel.org/all/Z_kntkZxksOfGwpt@8bytes.org/
  Sender: Joerg Roedel <joro@8bytes.org>
  Repository: git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux.git
  Branch/Tag: tags/iommu-fixes-v6.15-rc1
  Fetching: git fetch "git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux.git" "tags/iommu-fixes-v6.15-rc1"
  Fetch: ✅ Successfully fetched
  Checking maintainer status for 9 commit(s)...
  ✅ Maintainer verification: Sender or a signer is listed as maintainer for all commits
------------------------
Pull request #5: http://lore.kernel.org/all/CAJZ5v0iEn-Lyic6zxDehxF1HHfNfg11_S7COMsHnZeQ+TzZAsA@mail.gmail.com/
  Sender: "Rafael J. Wysocki" <rafael@kernel.org>
  Repository: git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
  Branch/Tag: acpi-6.15-rc2
  Fetching: git fetch "git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git" "tags/acpi-6.15-rc2"
  Fetch: ✅ Successfully fetched
  Checking maintainer status for 3 commit(s)...
  ✅ Maintainer verification: Sender or a signer is listed as maintainer for all commits

Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 scripts/verify_pull_requests.sh | 393 ++++++++++++++++++++++++++++++++
 1 file changed, 393 insertions(+)
 create mode 100755 scripts/verify_pull_requests.sh

diff --git a/scripts/verify_pull_requests.sh b/scripts/verify_pull_requests.sh
new file mode 100755
index 0000000000000..3dd6492a71d2f
--- /dev/null
+++ b/scripts/verify_pull_requests.sh
@@ -0,0 +1,393 @@
+#!/bin/bash
+#set -x
+
+# Default number of days to search
+days=1
+
+# Parse command line arguments
+while [ "$#" -gt 0 ]; do
+    case "$1" in
+        --days)
+            shift
+            if [[ "$1" =~ ^[0-9]+$ ]]; then
+                days="$1"
+            else
+                echo "Error: --days requires a numeric argument"
+                exit 1
+            fi
+            ;;
+        *)
+            echo "Unknown option: $1"
+            echo "Usage: $0 [--days N]"
+            exit 1
+            ;;
+    esac
+    shift
+done
+
+URL="https://lore.kernel.org/all/?q=s:%22GIT+PULL%22+AND+t:torvalds+AND+rt:${days}.day.ago...+AND+NOT+s:re:&x=A"
+
+temp_file=$(mktemp)
+curl -s "$URL" > "$temp_file"
+
+count=$(grep -c "<entry>" "$temp_file")
+echo "Number of pull requests in the last ${days} day(s): $count"
+
+# Extract message URLs and filter out query parameters and #related links
+message_urls=$(grep -o "http://lore.kernel.org/all/[^\"]*" "$temp_file" | grep -v "\\?" | grep -v "#related")
+
+echo "Processing pull requests..."
+
+count=0
+while read -r message_url; do
+    count=$((count + 1))
+    echo "Pull request #$count: $message_url"
+
+    message_content=$(mktemp)
+    curl -s -L "$message_url" > "$message_content"
+
+    email_content=$(cat "$message_content")
+
+    # Extract and clean sender information
+    from_line=$(echo "$email_content" | grep -o "From:.*" | head -1)
+    from_line=$(echo "$from_line" | sed 's/&lt;/</g' | sed 's/&gt;/>/g' | sed 's/&#34;/"/g' | sed 's/&quot;/"/g')
+
+    if [[ "$from_line" =~ From:[[:space:]]+(.*)[[:space:]]+\<([^>]+)\> ]]; then
+        sender_name="${BASH_REMATCH[1]}"
+        sender_email="${BASH_REMATCH[2]}"
+        sender_name=$(echo "$sender_name" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
+        sender_email=$(echo "$sender_email" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
+        echo "  Sender: $sender_name <$sender_email>"
+    else
+        echo "  Sender: $(echo "$from_line" | sed 's/From: //')"
+    fi
+
+    found_repo=false
+    repo=""
+    branch=""
+
+    # Try extraction methods in order of preference
+
+    # 1. Extract repo from HTML links
+    html_href_lines=$(echo "$email_content" | grep -n '<a[[:space:]]*href=".*git.*"')
+
+    if [ -n "$html_href_lines" ]; then
+        while read -r numbered_line; do
+            line_num=$(echo "$numbered_line" | cut -d: -f1)
+            line=$(echo "$numbered_line" | cut -d: -f2-)
+
+            if [[ $line =~ href=\"([^\"]*gitlab[^\"]*|[^\"]*git[^\"]*|[^\"]*kernel\.org[^\"]*)\" ]]; then
+                repo="${BASH_REMATCH[1]}"
+
+                # Check for branch on same line or next line
+                if [[ $line =~ \</a\>([[:space:]]*([[:alnum:]/_.-]+)) ]]; then
+                    branch="${BASH_REMATCH[2]}"
+                    echo "  Repository: $repo"
+                    echo "  Branch/Tag: $branch"
+                    found_repo=true
+                    break
+                else
+                    next_line_num=$((line_num + 1))
+                    next_line=$(echo "$email_content" | sed -n "${next_line_num}p")
+                    next_line=$(echo "$next_line" | sed 's/^[[:space:]]*//' | sed 's/[[:space:]]*$//')
+
+                    if [[ $next_line =~ ^[[:alnum:]/_.-]+$ ]]; then
+                        branch="$next_line"
+                        echo "  Repository: $repo"
+                        echo "  Branch/Tag: $branch"
+                        found_repo=true
+                        break
+                    elif [ "$found_repo" = false ]; then
+                        repo_no_branch=$repo
+                        line_no_branch=$line
+                    fi
+                fi
+            fi
+        done <<< "$html_href_lines"
+    fi
+
+    # 2. Extract repo from plain text if not found in HTML
+    if [ "$found_repo" = false ]; then
+        repo_lines=$(echo "$email_content" | grep -n -i "git://\|https://git\|git@" | grep -v "href=")
+
+        if [ -n "$repo_lines" ]; then
+            while read -r numbered_line; do
+                line_num=$(echo "$numbered_line" | cut -d: -f1)
+                line=$(echo "$numbered_line" | cut -d: -f2-)
+
+                if [[ $line =~ (git://|ssh://git|https://git|git@)[^[:space:]]+(/[^[:space:]]+)+ ]]; then
+                    repo="${BASH_REMATCH[0]}"
+                    repo=$(echo "$repo" | sed 's/[,.\\]$//' | sed 's/[[:space:]]*$//')
+
+                    if [[ $line =~ $repo[[:space:]]+([[:alnum:]/_.-]+) ]]; then
+                        branch="${BASH_REMATCH[1]}"
+                        echo "  Repository: $repo"
+                        echo "  Branch/Tag: $branch"
+                        found_repo=true
+                        break
+                    else
+                        next_line_num=$((line_num + 1))
+                        next_line=$(echo "$email_content" | sed -n "${next_line_num}p")
+                        next_line=$(echo "$next_line" | sed 's/^[[:space:]]*//' | sed 's/[[:space:]]*$//')
+
+                        if [[ $next_line =~ ^[[:alnum:]/_.-]+$ ]]; then
+                            branch="$next_line"
+                            echo "  Repository: $repo"
+                            echo "  Branch/Tag: $branch"
+                            found_repo=true
+                            break
+                        elif [ "$found_repo" = false ]; then
+                            repo_no_branch=$repo
+                            line_no_branch=$line
+                        fi
+                    fi
+                fi
+            done <<< "$repo_lines"
+        fi
+    fi
+
+    # 3. Try "available in the Git repository at:" section
+    if [ "$found_repo" = false ]; then
+        main_repo_section=$(echo "$email_content" | grep -A 10 "available in the Git repository at")
+
+        if [ -n "$main_repo_section" ]; then
+            if [[ $main_repo_section =~ href=\"([^\"]*gitlab[^\"]*|[^\"]*git[^\"]*|[^\"]*kernel\.org[^\"]*) ]]; then
+                repo="${BASH_REMATCH[1]}"
+                echo "  Repository: $repo"
+                found_repo=true
+
+                tags_line=$(echo "$main_repo_section" | grep -o "tags/[[:alnum:]/_.-]*" | head -1)
+                if [ -n "$tags_line" ]; then
+                    branch="$tags_line"
+                    echo "  Branch/Tag: $branch"
+                fi
+            fi
+        fi
+    fi
+
+    # 4. Use repo without branch if that's all we found
+    if [ "$found_repo" = false ] && [ -n "${repo_no_branch:-}" ]; then
+        repo="$repo_no_branch"
+        echo "  Repository: $repo"
+        echo "  Context: $line_no_branch"
+        found_repo=true
+    fi
+
+    if [ "$found_repo" = false ]; then
+        echo "  No repository URL found in this pull request."
+    else
+        # Convert ssh URLs to git URLs for verification
+        verification_repo="$repo"
+
+        # Handle different git URL formats for kernel.org
+        if [[ "$verification_repo" =~ ^ssh://git@gitolite\.kernel\.org(.*) ]]; then
+            verification_repo="git://git.kernel.org${BASH_REMATCH[1]}"
+            echo "  Using git URL for verification: $verification_repo"
+        fi
+
+        if [[ "$verification_repo" =~ ^git@gitolite\.kernel\.org:(.*) ]]; then
+            verification_repo="git://git.kernel.org/${BASH_REMATCH[1]}"
+            echo "  Using git URL for verification: $verification_repo"
+        fi
+
+        if [ -n "$verification_repo" ] && [ -n "$branch" ]; then
+            # Try fetching, first with tags/ prefix if needed
+            fetch_ref="$branch"
+            if [[ ! "$branch" =~ ^(refs/|tags/) ]] && [[ ! "$branch" =~ ^remotes/ ]]; then
+                fetch_ref="tags/$branch"
+            fi
+
+            echo "  Fetching: git fetch \"$verification_repo\" \"$fetch_ref\""
+            if git fetch "$verification_repo" "$fetch_ref" 2>/dev/null; then
+                echo "  Fetch: ✅ Successfully fetched"
+
+                # Check if there are any commits to verify
+                commit_hashes=$(git rev-list --no-merges origin/master..FETCH_HEAD 2>/dev/null)
+
+                if [ -z "$commit_hashes" ]; then
+                    echo "  ℹ️ No new commits found. Pull request likely already merged."
+                else
+                    total_commits=$(echo "$commit_hashes" | wc -l)
+                    echo "  Checking maintainer status for $total_commits commit(s)..."
+
+                    # Array to store problematic commits
+                    problematic_commits=()
+                    # Array to store commits where sender is not maintainer but a signer is
+                    sender_not_maintainer_commits=()
+
+                    # Check each commit silently
+                    while read -r commit_hash; do
+                        [ -z "$commit_hash" ] && continue
+
+                        commit_msg=$(git log -1 --pretty=format:"%h %s" "$commit_hash")
+
+                        if [ -f "scripts/get_maintainer.pl" ]; then
+                            maintainers=$(git show "$commit_hash" | ./scripts/get_maintainer.pl)
+                            signoffs=$(git show -s --format=%b "$commit_hash" | grep -i "Signed-off-by:" | sed 's/^[[:space:]]*Signed-off-by:[[:space:]]*//')
+
+                            valid_maintainer=false
+                            sender_is_maintainer=false
+
+                            # Check if sender is a maintainer
+                            if echo "$maintainers" | grep -q "$sender_email" || echo "$maintainers" | grep -q "$sender_name"; then
+                                valid_maintainer=true
+                                sender_is_maintainer=true
+                            else
+                                # Check if any signoff person is a maintainer
+                                while read -r signoff; do
+                                    [ -z "$signoff" ] && continue
+
+                                    # Extract name and email from signoff
+                                    if [[ "$signoff" =~ (.*)[[:space:]]+\<([^>]+)\> ]]; then
+                                        signer_name="${BASH_REMATCH[1]}"
+                                        signer_email="${BASH_REMATCH[2]}"
+                                        signer_name=$(echo "$signer_name" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
+                                        signer_email=$(echo "$signer_email" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
+
+                                        if echo "$maintainers" | grep -q "$signer_email" || echo "$maintainers" | grep -q "$signer_name"; then
+                                            valid_maintainer=true
+                                            break
+                                        fi
+                                    fi
+                                done <<< "$signoffs"
+                            fi
+
+                            # Add to problematic commits if no valid maintainer found
+                            if [ "$valid_maintainer" = false ]; then
+                                problematic_commits+=("$commit_msg")
+                            # Track commits where sender is not a maintainer but a signer is
+                            elif [ "$sender_is_maintainer" = false ]; then
+                                sender_not_maintainer_commits+=("$commit_msg")
+                            fi
+                        fi
+                    done <<< "$commit_hashes"
+
+                    # Display results based on problematic commits
+                    if [ ${#problematic_commits[@]} -eq 0 ]; then
+                        echo "  ✅ Maintainer verification: Sender or a signer is listed as maintainer for all commits"
+
+                        # Add warning if we found commits where sender is not a maintainer
+                        if [ ${#sender_not_maintainer_commits[@]} -gt 0 ]; then
+                            echo "  ⚠️  Warning: Sender is NOT listed as maintainer for these commits (but a signer is):"
+                            for commit in "${sender_not_maintainer_commits[@]}"; do
+                                echo "    - $commit"
+                            done
+                        fi
+                    else
+                        echo "  ❌ Maintainer verification: Neither sender nor any signers are listed as maintainers for these commits:"
+                        for commit in "${problematic_commits[@]}"; do
+                            echo "    - $commit"
+                        done
+                    fi
+                fi
+            else
+                # Try without tags/ prefix if the first attempt failed
+                if [[ "$fetch_ref" == tags/* ]]; then
+                    fetch_ref="${branch}"
+                    echo "  Fetching: git fetch \"$verification_repo\" \"$fetch_ref\""
+                    if git fetch "$verification_repo" "$fetch_ref" 2>/dev/null; then
+                        echo "  Fetch: ✅ Successfully fetched"
+
+                        # Check if there are any commits to verify
+                        commit_hashes=$(git rev-list --no-merges origin/master..FETCH_HEAD 2>/dev/null)
+
+                        if [ -z "$commit_hashes" ]; then
+                            echo "  ℹ️ No new commits found. Pull request likely already merged."
+                        else
+                            total_commits=$(echo "$commit_hashes" | wc -l)
+                            echo "  Checking maintainer status for $total_commits commit(s)..."
+
+                            # Array to store problematic commits
+                            problematic_commits=()
+                            # Array to store commits where sender is not maintainer but a signer is
+                            sender_not_maintainer_commits=()
+
+                            # Check each commit silently
+                            while read -r commit_hash; do
+                                [ -z "$commit_hash" ] && continue
+
+                                commit_msg=$(git log -1 --pretty=format:"%h %s" "$commit_hash")
+
+                                if [ -f "scripts/get_maintainer.pl" ]; then
+                                    maintainers=$(git show "$commit_hash" | ./scripts/get_maintainer.pl)
+                                    signoffs=$(git show -s --format=%b "$commit_hash" | grep -i "Signed-off-by:" | sed 's/^[[:space:]]*Signed-off-by:[[:space:]]*//')
+
+                                    valid_maintainer=false
+                                    sender_is_maintainer=false
+
+                                    # Check if sender is a maintainer
+                                    if echo "$maintainers" | grep -q "$sender_email" || echo "$maintainers" | grep -q "$sender_name"; then
+                                        valid_maintainer=true
+                                        sender_is_maintainer=true
+                                    else
+                                        # Check if any signoff person is a maintainer
+                                        while read -r signoff; do
+                                            [ -z "$signoff" ] && continue
+
+                                            # Extract name and email from signoff
+                                            if [[ "$signoff" =~ (.*)[[:space:]]+\<([^>]+)\> ]]; then
+                                                signer_name="${BASH_REMATCH[1]}"
+                                                signer_email="${BASH_REMATCH[2]}"
+                                                signer_name=$(echo "$signer_name" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
+                                                signer_email=$(echo "$signer_email" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
+
+                                                if echo "$maintainers" | grep -q "$signer_email" || echo "$maintainers" | grep -q "$signer_name"; then
+                                                    valid_maintainer=true
+                                                    break
+                                                fi
+                                            fi
+                                        done <<< "$signoffs"
+                                    fi
+
+                                    # Add to problematic commits if no valid maintainer found
+                                    if [ "$valid_maintainer" = false ]; then
+                                        problematic_commits+=("$commit_msg")
+                                    # Track commits where sender is not a maintainer but a signer is
+                                    elif [ "$sender_is_maintainer" = false ]; then
+                                        sender_not_maintainer_commits+=("$commit_msg")
+                                    fi
+                                fi
+                            done <<< "$commit_hashes"
+
+                            # Display results based on problematic commits
+                            if [ ${#problematic_commits[@]} -eq 0 ]; then
+                                echo "  ✅ Maintainer verification: Sender or a signer is listed as maintainer for all commits"
+
+                                # Add warning if we found commits where sender is not a maintainer
+                                if [ ${#sender_not_maintainer_commits[@]} -gt 0 ]; then
+                                    echo "  ⚠️ Warning: Sender is NOT listed as maintainer for these commits (but a signer is):"
+                                    for commit in "${sender_not_maintainer_commits[@]}"; do
+                                        echo "    - $commit"
+                                    done
+                                fi
+                            else
+                                echo "  ❌ Maintainer verification: Neither sender nor any signers are listed as maintainers for these commits:"
+                                for commit in "${problematic_commits[@]}"; do
+                                    echo "    - $commit"
+                                done
+                            fi
+                        fi
+                    else
+                        echo "  Fetch: ❌ Failed to fetch"
+                    fi
+                else
+                    echo "  Fetch: ❌ Failed to fetch"
+                fi
+            fi
+        elif [ -n "$verification_repo" ]; then
+            # If we only have the repository but no branch/tag, just verify the repository exists
+            echo "  Verifying: git ls-remote --exit-code \"$verification_repo\""
+            if git ls-remote --exit-code "$verification_repo" > /dev/null 2>&1; then
+                echo "  Verification: ✅ Repository exists"
+            else
+                echo "  Verification: ❌ Could not access repository"
+            fi
+        fi
+    fi
+
+    rm "$message_content"
+
+    echo "------------------------"
+done <<< "$message_urls"
+
+rm "$temp_file"
-- 
2.39.5


                 reply	other threads:[~2025-04-12 12:29 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250412122911.327134-1-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=workflows@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.