How to export GitHub pull requests

I recently needed to export GitHub pull requests from a repository to a CSV so that I could do some analysis.

I wasn’t able to find a simple way to do this in the GitHub UI. When I searched, I found several tools, but the API seemed quite simple, so I just wrote a script that dumps all pull requests from a GitHub repository to a CSV.

#!/bin/bash

# This script requires jq to be installed and available in the path.

TOKEN=$1
ORG=$2
REPO=$3
OUTPUT_RAW=""

get_pull_requests() {
	curl -s --location --request GET "https://api.github.com/repos/$ORG/$REPO/pulls?state=all&per_page=40&page=$1" \
		--header "Authorization: token $TOKEN" \
		--header "Accept: application/vnd.github+json"
}

get_raw_output() {
	# jq requires an explicit filter, so pass '.' to re-serialize the response.
	printf '%s' "$1" | jq -r '.'
}

i=1
while [ "$OUTPUT_RAW" != "[]" ] ; do
	OUTPUT=$( get_pull_requests "$i" )
	OUTPUT_RAW=$( get_raw_output "$OUTPUT" )

	i=$((i+1))

	printf '%s' "$OUTPUT" | jq -r '.[] | [ .created_at, .html_url, .user.login, .title ] | @csv'
done

To use the script, you’ll need to have jq installed. On a Mac, you can install it with brew install jq.

The only other prerequisite you’ll need to export pull requests from GitHub is a personal access token.

From there, you should just need to run the script with something like the following to export all of the pull requests for a given repository:

sh github_pulls_export.sh TOKEN ORG REPO

If you’d like to change what data gets exported, simply change the fields that are pulled in this section:

[ .created_at, .html_url, .user.login, .title ]
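
For instance, you could also export each pull request’s state and merge date. Here is a sketch using sample data shaped like the GitHub pulls API response (state and merged_at are standard fields on pull request objects; the values below are made up):

```shell
# Sample data shaped like the GitHub pulls API response (hypothetical values).
SAMPLE='[{"created_at":"2023-01-05T00:00:00Z","html_url":"https://github.com/org/repo/pull/1","user":{"login":"octocat"},"title":"Fix bug","state":"closed","merged_at":"2023-01-06T00:00:00Z"}]'

# Extended field list: adds .state and .merged_at to each CSV row.
printf '%s' "$SAMPLE" | jq -r '.[] | [ .created_at, .html_url, .user.login, .title, .state, .merged_at ] | @csv'
```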

You can modify that printf line to print the full response and see which fields are available to pull from.
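
One way to do that is to ask jq for the keys of the first object in the response. A sketch, with sample data standing in for the script’s $OUTPUT variable:

```shell
# Sample data standing in for the script's $OUTPUT (hypothetical values).
SAMPLE='[{"created_at":"2023-01-05T00:00:00Z","html_url":"https://github.com/org/repo/pull/1","user":{"login":"octocat"},"title":"Fix bug"}]'

# List the top-level field names of the first pull request, one per line.
# jq's keys builtin returns the names sorted alphabetically.
printf '%s' "$SAMPLE" | jq -r '.[0] | keys[]'
```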

Responses

  1. robin Avatar

    Hello and thanks for setting up the script in the first place. When I am running it now,it gives the below error:
    jq: error (at <stdin>:3): Cannot index string with string "created_at".

    1. Eric Binnion Avatar

      In my testing, this can happen when a bad response is returned. Here’s an updated version of the script that adds a basic HTTP status code check. Let me know if this helps.

      #!/bin/bash
      
      # This script requires jq to be installed and available in the path.
      
      TOKEN=$1
      ORG=$2
      REPO=$3
      OUTPUT_RAW=""
      
      get_pull_requests() {
          curl -s -w "\n%{http_code}" --location --request GET "https://api.github.com/repos/$ORG/$REPO/pulls?state=all&per_page=40&page=$1" \
              --header "Authorization: token $TOKEN" \
              --header "Accept: application/vnd.github+json"
      }
      
      get_raw_output() {
          # jq requires an explicit filter, so pass '.' to re-serialize the response.
          printf '%s' "$1" | jq -r '.'
      }
      
      i=1
      while [ "$OUTPUT_RAW" != "[]" ] ; do
          OUTPUT=$( get_pull_requests "$i" )
      
          HTTP_CODE=$(printf "%s" "$OUTPUT" | tail -n1)
          BODY=$(printf "%s" "$OUTPUT" | sed '$d')
      
          if [ "$HTTP_CODE" -lt 200 ] || [ "$HTTP_CODE" -ge 300 ]; then
              echo "HTTP code: $HTTP_CODE" >&2
              echo "$BODY" >&2
              exit 1
          fi
      
          printf '%s' "$BODY" | jq -r '.[] | [ .created_at, .html_url, .user.login, .title ] | @csv'
      
          OUTPUT_RAW=$( get_raw_output "$BODY" )
          i=$((i+1))
      done
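
      For reference, the failure above is easy to reproduce: on an auth failure, GitHub returns a JSON object rather than an array, and indexing its string values with .created_at triggers exactly that jq error. A sketch with a hypothetical error body:

      ```shell
      # On auth failures GitHub responds with an object, e.g. {"message":"Bad credentials"}.
      # Piping that through the CSV filter reproduces the error above.
      printf '%s' '{"message":"Bad credentials"}' \
          | jq -r '.[] | [ .created_at, .html_url, .user.login, .title ] | @csv' \
          || true  # jq fails: Cannot index string with string "created_at"
      ```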
      
