How to export GitHub pull requests

I recently needed to export GitHub pull requests from a repository to a CSV so that I could do some analysis.

I wasn't able to find a simple way to do this in the GitHub UI. When I searched, I found several tools, but the API seemed simple enough that I just wrote a script to dump all pull requests from a GitHub repository to a CSV.

#!/bin/bash

# This script requires jq to be installed and available in the path.

TOKEN=$1
ORG=$2
REPO=$3
OUTPUT_RAW=""

get_pull_requests() {
	curl -s --location --request GET "https://api.github.com/repos/$ORG/$REPO/pulls?state=all&per_page=40&page=$1" \
		--header "Authorization: token $TOKEN" \
		--header "Accept: application/vnd.github+json"
}

get_raw_output() {
	# jq needs an explicit identity filter; without one it exits with a usage error.
	printf '%s' "$1" | jq -r '.'
}

i=1
while [ "$OUTPUT_RAW" != "[]" ] ; do
	OUTPUT=$( get_pull_requests "$i" )
	OUTPUT_RAW=$( get_raw_output "$OUTPUT" )

	i=$((i+1))

	printf '%s' "$OUTPUT" | jq -r '.[] | [ .created_at, .html_url, .user.login, .title ] | @csv'
done
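The loop's exit condition relies on the API returning an empty array (`[]`) once you page past the last pull request. Here's a toy sketch of the same loop, with a mock page function standing in for the real API call (the page contents are made up):

```shell
# Toy version of the pagination loop above. get_page is a stand-in for the
# real API call: it returns one fake pull request per page, then an empty
# array for every page after the last.
get_page() {
	case "$1" in
		1) printf '[{"title":"First PR"}]' ;;
		2) printf '[{"title":"Second PR"}]' ;;
		*) printf '[]' ;;
	esac
}

i=1
PAGE=""
while [ "$PAGE" != "[]" ]; do
	PAGE=$( get_page "$i" )
	printf '%s' "$PAGE" | jq -r '.[] | .title'
	i=$((i+1))
done
```

This prints the two fake titles and stops on the third, empty page, which is exactly what the real script does against the API.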

To use the script, you'll need jq installed. On a Mac, you can install it with brew install jq.

The only other prerequisite for exporting pull requests from GitHub is a personal access token.

From there, you should just need to run the script with something like the following to export all pull requests for a given repository:

sh github_pulls_export.sh TOKEN ORG REPO

If you’d like to change what data gets exported, simply change the fields that are pulled in this section:

[ .created_at, .html_url, .user.login, .title ]
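For example, adding .state and .merged_at (both present in the pulls API response) gives you two extra columns. Here's a quick sketch of the extended filter, run against a made-up single-element response rather than the live API:

```shell
# Made-up sample of a pulls API response, trimmed to the fields we use.
SAMPLE='[{"created_at":"2023-01-01T00:00:00Z","html_url":"https://github.com/org/repo/pull/1","user":{"login":"octocat"},"title":"Fix a bug","state":"closed","merged_at":null}]'

# Same filter as in the script, extended with .state and .merged_at.
# @csv renders the null merged_at as an empty field.
printf '%s' "$SAMPLE" | jq -r '.[] | [ .created_at, .html_url, .user.login, .title, .state, .merged_at ] | @csv'
```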

To get an idea of which fields are available to pull from, you can modify that printf line to dump the full JSON response instead of a CSV row.
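Another way to see the available fields without touching the live API is to run jq's keys on a single element of the response. Using the same kind of made-up sample (the real response has many more fields than this):

```shell
# Made-up single-element pulls response for illustration.
SAMPLE='[{"created_at":"2023-01-01T00:00:00Z","html_url":"https://github.com/org/repo/pull/1","user":{"login":"octocat"},"title":"Fix a bug","state":"closed"}]'

# List the top-level field names of the first pull request, one per line.
printf '%s' "$SAMPLE" | jq -r '.[0] | keys[]'
```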

2 responses to “How to export GitHub pull requests”

  1. Hello and thanks for setting up the script in the first place. When I am running it now, it gives the below error:
    jq: error (at :3): Cannot index string with string “created_at”.

    1. In my testing, this can happen when a bad response is returned. Here’s an updated version of the script that adds a basic HTTP status code check in. Let me know if this helps.

      #!/bin/bash
      
      # This script requires jq to be installed and available in the path.
      
      TOKEN=$1
      ORG=$2
      REPO=$3
      OUTPUT_RAW=""
      
      get_pull_requests() {
          curl -s -w "\n%{http_code}" --location --request GET "https://api.github.com/repos/$ORG/$REPO/pulls?state=all&per_page=40&page=$1" \
              --header "Authorization: token $TOKEN" \
              --header "Accept: application/vnd.github+json"
      }
      
      get_raw_output() {
          # jq needs an explicit identity filter; without one it exits with a usage error.
          printf '%s' "$1" | jq -r '.'
      }
      
      i=1
      while [ "$OUTPUT_RAW" != "[]" ] ; do
          OUTPUT=$( get_pull_requests "$i" )
      
          HTTP_CODE=$(printf "%s" "$OUTPUT" | tail -n1)
          BODY=$(printf "%s" "$OUTPUT" | sed '$d')
      
          if [ "$HTTP_CODE" -lt 200 ] || [ "$HTTP_CODE" -ge 300 ]; then
              echo "HTTP code: $HTTP_CODE" >&2
              echo "$BODY" >&2
              exit 1
          fi
      
          printf '%s' "$BODY" | jq -r '.[] | [ .created_at, .html_url, .user.login, .title ] | @csv'
      
          OUTPUT_RAW=$( get_raw_output "$BODY" )
          i=$((i+1))
      done
      
