Development

  • Learning to Code with AI

    One of my son’s friends messaged me asking how to get started with coding.

    My answer was to start with AI. Use ChatGPT or Claude, probably the paid versions, and have it help you code something that you actually care about.

    That’s how I learned to code with WordPress. People wanted features that WordPress didn’t have out of the box. I figured out the code to make it work. That cycle of problem then solution kept me moving forward.

    AI makes this easier if you use it right. If you just ask for a finished game, you probably won’t learn anything. If you ask the model to explain what it built, piece by piece, then you start learning.

    To give my son’s friend a concrete example, I asked ChatGPT to build a Frogger-style game in JavaScript. One prompt and I had a working game in the browser.

    Starting from something that already works gives you many threads to pull on:

    • Explain why all of this works.
    • Break down the code section by section.
    • What code defines the cars, frog, and logs?
    • How can we use actual images?
    • How can we change the speed of the pieces?
    • How can we add a concept of leveling where the difficulty starts out easy and gets much harder?

    Each of these threads, or questions, is small enough that you can ask AI to walk you through it. Each time you walk through one of these questions, you’ll gain a bit more knowledge.

    You can even push further and ask it to build you a whole learning plan around extending Frogger. The game is not the point. The point is pulling on those threads until the pieces start to click.

  • Ember Shipping Code

    We were in Houston last week for some medical research appointments for Ember. What this really meant is that we spent a few hours at the hospital and then a lot of hours in the hotel room.

    While waiting around in the hotel room, I decided to deploy some code to WordPress.com. As I was walking through the steps, I had Ember type shift + y to handle the final deploy to hundreds of millions of sites. 😄

  • How We Migrated a Tumblr Blog in Two Weeks

    After six months of paternity leave, I returned in January to do a rotation with the team migrating hundreds of millions of Tumblr blogs to WordPress.com.

    I hadn’t touched code in six months. Had never worked on Tumblr. And I’d spent the last five years in engineering leadership.

    The prompt I got? Do a speed run project to get from zero to one site migrated… in 2 weeks.

    So yeah. I laughed. Then panicked.

    Fortunately, I was joining a team that had already done good foundational work. The challenge was just to get something across the finish line—fast.

    Taking Stock

    At first I got caught up in everything that might go wrong:

    • How do we migrate users?
    • What legal implications, such as terms of service, are involved?
    • How do we handle content moderation?
    • What about the differences in how Tumblr and WordPress store and render content?
    • How will we hydrate Tumblr’s APIs from this new data source?

    But the tight timeline was a gift, because it forced me to realize that none of this was essential for migrating a single site.

    We picked a Tumblr-owned blog with only one user: staff.tumblr.com. To handle content format differences, we decided to just export and migrate rendered HTML. To keep Tumblr’s APIs working, we double-wrote to both Tumblr and the migrated blog on WordPress.com.

    With that framing in place, we moved on to just the things that were absolutely necessary:

    • Theme
    • Content
    • Interactivity

    Moving Forward

    I then approached each of these with the minimal set of requirements.

    For the theme, I went to staff.tumblr.com, viewed the source, copied it into index.php, and added a quick style.css. And committed that to WordPress.com. 😂

    Dirty. But it worked.

    The team had already done some work exporting rendered HTML from Tumblr. We took that export and piped it into WordPress.com using WP-CLI. Now we had something to look at.

    Then I started cutting up index.php into real templates and added the WordPress loop. At that point, we had a mostly working theme that we could iterate on.

    For interactivity, we stepped through this one-by-one:

    • Reblogging: Tumblr stores references. WordPress.com duplicates. We deferred solving this.
    • Likes, reblogs, embeds: Tumblr uses iframes, so we copied them.
    • Archive pages: We quickly built a template that mimicked Tumblr production.
    • Notes: We added a new endpoint on Tumblr to fetch notes, cached the result, and skipped the complexity of partial migrations.

    With this approach, we were able to migrate a single site in about 2.5 weeks.

    Iterating Through Weirdness

    That initial migration wasn’t perfect—but it gave the team what we needed to build momentum. From there, we kept scaling and refining.

    Of course, weird things came up as we iterated. 🫠 But the team stuck with it and scaled the approach from that one messy win.

    Migrations were slow. As we added more content, we uncovered new edge cases—and had to migrate even more data to handle them.

    Functionality needed to be rewritten as we considered more edge cases.

    At times, this approach felt slower than designing the perfect system up front.

    Maybe it was.

    But it was real progress. And that mattered more.

    The lesson?

    When you’re working on something massive, don’t try to solve it all at once.

    Solve one piece. Then the next. Then the next. Move fast. Build momentum.

    How do you eat an elephant?

    One bite at a time.

  • Have you tried it to see what breaks?

    Last year, our CEO challenged my team to significantly improve our purchase flow, with a bit of time pressure attached.

    For several months, I’d had a good, high-level idea of how we might improve our purchase flow. But, as the idea lacked concrete implementation details, we kept prioritizing other work ahead of it.

    But, with a directive from our CEO and some time pressure, prioritizing other work ahead of this was no longer an option. Knowing this, I reached out to a friend who had experience in the part of the codebase I’d need to change and asked him for advice on how to move forward. His reply was succinct:

    Have you tried it yet to see what breaks?

    As soon as he asked me this question, as silly as this sounds, it occurred to me that I hadn’t actually tried it. I then had the realization that I had paralyzed myself by thinking of it as a large problem that would take months to untangle rather than a series of small problems that could be addressed as needed.

    Armed with this outlook, I immediately dug into the codebase and emerged with a proof of concept within a couple of days.

    I often think about this interaction when I’m a bit paralyzed with a problem.

  • Getting a list of all Google Fonts

    As part of a recent audit that I was doing of Google Font usage in a codebase, I had a need to get a list of all Google Fonts. But, after a bit of searching, I wasn’t finding an easily consumable list of these fonts.

    But, it turns out that if you get a Google Fonts API key from the Google Cloud Console, then it’s super simple to query and get a list of fonts. In fact, this is what getting a list of Google Fonts looks like with a single command from the terminal.

    curl --location --request GET 'https://www.googleapis.com/webfonts/v1/webfonts?key=...' | jq ".items[] | .family" --raw-output

    To get this to work, simply replace the ... with your actual key and then run the command.

    Of note, this command does assume that you’ve got jq installed, which is used to parse the JSON response. But, if you’d like, you can simply remove everything from the | jq onward to print the raw JSON.

  • Simple Github Webhook handler in PHP

    Over the past month or so, I spent a bit of time working on setting up a local development environment for a project that I’m working on. As part of that, I also set up a deployment flow using a simple Github webhook handler in PHP.

    I modeled this Github webhook handler very much after jplitza’s Gist here, but I simplified it even further since all I really cared about was whether an event happened since I’d already filtered on Github for push events.

    This is the handler that I ended up with.

    <?php
    
    define( 'LOGFILE', '/DIR' );
    define( 'SECRET', '0000000000' );
    define( 'PULL_CMD', 'CMD_HERE' );
    
    $post_data = file_get_contents( 'php://input' );
    $signature = hash_hmac( 'sha1', $post_data, SECRET );
    
    function log_msg( $message ) {
    	file_put_contents( LOGFILE, $message . "\n", FILE_APPEND );
    }
    
    if ( empty( $_SERVER['HTTP_X_HUB_SIGNATURE'] ) ) {
    	exit;
    }
    
    if ( ! hash_equals( 'sha1=' . $signature, $_SERVER['HTTP_X_HUB_SIGNATURE'] ) ) {
    	exit;
    }
    
    // At this point, we've verified the signature from Github, so we can do things.
    $date = date( 'm/d/Y h:i:s a' );
    log_msg( "Deploying at {$date}" );
    
    $output_lines = array();
    exec( PULL_CMD, $output_lines );
    
    if ( ! empty( $output_lines ) ) {
    	log_msg( implode( "\n", $output_lines ) );
    }
    
    exit;

    Basically, all this file is doing is:

    • Verifying that the request actually comes from Github by creating a signature using our SECRET and then comparing that to the signature that Github sent us with the constant-time hash_equals function.
    • Running whatever command is necessary. For me, this command is basically just cd dir && git pull origin main.
    • Writing logs so that we can keep track of how often the Github webhook handler is called and whether it’s successful or not.

    To get this to work, you’ll roughly do the following:

    • Create a log file and add the location to the LOGFILE constant.
    • Set the SECRET constant to a strong secret that you’ll also enter on Github so that the handler can validate that the webhook came from Github.
    • Update the PULL_CMD constant with the command that you’d like to run.
    • Upload this file to your server in a publicly accessible location and make note of what the URL will be.
    • Go to Settings > Webhooks for one of the Github repos that you manage and create a new webhook.
    • Ensure that you set the URL to the file that we uploaded earlier and ensure that you enter the value from the SECRET constant.

    That seems like several steps, but in just a few minutes, you could have your own Github webhook handler in PHP up and running!
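
    Before pointing Github at the handler, it can help to send it a signed request yourself. Here’s a minimal sketch of how that signature could be computed from the shell, assuming openssl is installed. The sign_payload function, the payload, and the example.com URL are all hypothetical; swap in your own SECRET and handler URL.

```shell
#!/bin/sh
# Compute the X-Hub-Signature value that the handler expects:
# "sha1=" followed by the hex HMAC-SHA1 of the raw request body.
sign_payload() {
	secret=$1
	payload=$2
	# openssl prints a label such as "(stdin)= <hex>" (the label varies by
	# version), so keep only the last field.
	digest=$(printf '%s' "$payload" | openssl dgst -sha1 -hmac "$secret" | awk '{print $NF}')
	printf 'sha1=%s\n' "$digest"
}

# Hypothetical values for a local test; use your real SECRET and URL.
SECRET='0000000000'
PAYLOAD='{"ref":"refs/heads/main"}'
HANDLER_URL='https://example.com/deploy.php'

# Print the header value that would accompany this payload.
sign_payload "$SECRET" "$PAYLOAD"

# Uncomment to send the signed request to your handler:
# curl -s -X POST "$HANDLER_URL" \
#	--header "X-Hub-Signature: $(sign_payload "$SECRET" "$PAYLOAD")" \
#	--data "$PAYLOAD"
```

    If the signature matches, the handler should write a "Deploying at" line to the log file. If it doesn’t match, the handler exits without logging anything.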

  • How to export Github pull requests

    I had the need recently to export Github pull requests from a repository to a CSV so that I could do some analysis.

    I wasn’t able to find a simple way to do this in the Github UI. When I searched, I found several tools. But, the API seemed quite simple, so I just wrote a script that would dump all pull requests from a Github repository to a CSV.

    #!/bin/bash
    
    # This script requires jq to be installed and available in the path.
    
    TOKEN=$1
    ORG=$2
    REPO=$3
    OUTPUT_RAW=""
    
    get_pull_requests() {
    	curl -s --location --request GET "https://api.github.com/repos/$ORG/$REPO/pulls?state=all&per_page=40&page=$1" \
    		--header "Authorization: token $TOKEN" \
    		--header "Accept: application/vnd.github+json"
    }
    
    # Re-serialize the response compactly so that an empty page is exactly "[]".
    get_raw_output() {
    	printf '%s' "$1" | jq -c '.'
    }
    
    i=1
    while [ "$OUTPUT_RAW" != "[]" ] ; do
    	OUTPUT=$( get_pull_requests "$i" )
    	OUTPUT_RAW=$( get_raw_output "$OUTPUT" )
    
    	i=$((i+1))
    
    	printf '%s' "$OUTPUT" | jq -r '.[] | [ .created_at, .html_url, .user.login, .title ] | @csv'
    done

    To use the script, you’ll need to have jq. On a mac, you can use brew install jq.

    The only other pre-requisite that you’ll need to export pull requests from Github is a personal access token.

    From there, you should just need to run the script with something like the following to export all of the pull requests for a given repository:

    sh github_pulls_export.sh TOKEN ORG REPO

    If you’d like to change what data gets exported, simply change the fields that are pulled in this section:

    [ .created_at, .html_url, .user.login, .title ]
    
    

    To get an idea of which fields are available, you can change the jq filter in that printf line (for example, to '.[0]') to print an entire pull request object.

  • On comparing large lists

    In the past, I’ve often had to generate email lists of users that fit specific conditions. This usually isn’t too difficult with some of our in-house data tools at Automattic. But, when I hit a case where I have to work across systems, it usually results in me dumping the data from each system and then comparing large lists.

    Comparing large lists on the command line isn’t that difficult. All it takes is a few commands which I’ll walk you through in this blog post!

    So, let’s come up with a theoretical example. Let’s say that I have two separate lists, one of users that have purchased Product A and one of users that have purchased Product B. Second, let’s also agree that these lists contain the user’s email address and the date of purchase. So, in a CSV, the data would look a bit like this:

    email_address,date
    "user@example.com","2022-03-01"

    Prepping the list of users

    Later on, we’re going to use the comm command for the actual comparing of the lists. Before we can use that command, there’s a bit of prep work that we first need to complete. Specifically, we need to:

    • Remove the CSV header row if there is one
    • Filter down the source data to whatever field we want to compare, email addresses in our theoretical example
    • Sort the list
    • Unique the list

    To cut the CSV header from file, we’re going to use the following:

    sed 1d product_a.csv

    This will remove the first line and print the rest of the file to standard out.

    Next, we’re going to pull out the column that contains email address, or the first column in our dataset above. To do this, we’re going to use the following:

    cut -d',' -f1 product_a.csv

    This command sets , as the delimiter and then pulls the first column of the file.

    From here, we simply need to sort and unique, which we can do with the sort and uniq commands. If we put all of the above together, we can run it in a single go for each file:

    sed 1d product_a.csv | cut -d',' -f1 | sort | uniq > product_a_output.csv

    At this point, we just need to take the files and actually compare them.

    Actually comparing large lists

    Alright, now that we have two processed files, we can get to the easy part, comparing. The easiest way I’ve found to compare files for these use cases is the comm command. You can man comm to get a detailed view of how that command works, but in summary, you use it like this:

    comm file1 file2

    That command then outputs three columns, where:

    • The first column is all values that are in file1 and not file2
    • The second column is all values in file2 and not file1
    • The third column is all values in both files

    If you’re only interested in the lines that are in both files, you can do the following, which will remove the first and second columns of values:

    comm -12 file1 file2

    You can then redirect that output to a file or use it however you’d like.
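
    If you’d like to see all of the steps together, here’s a minimal sketch that runs the prep and the comparison on two tiny sample files. The function names (prep_list and in_both) and the sample data are made up for illustration, and the sketch assumes the email address is in the first column.

```shell
#!/bin/sh
# Print the sorted, unique first column of a CSV, skipping the header row.
prep_list() {
	sed 1d "$1" | cut -d',' -f1 | sort | uniq
}

# Print the values that appear in the first column of both CSV files.
in_both() {
	tmp_a=$(mktemp)
	tmp_b=$(mktemp)
	prep_list "$1" > "$tmp_a"
	prep_list "$2" > "$tmp_b"
	# -12 suppresses columns 1 and 2, leaving only lines common to both files.
	comm -12 "$tmp_a" "$tmp_b"
	rm -f "$tmp_a" "$tmp_b"
}

# Sample data in a temporary directory so nothing is left in the current one.
demo_dir=$(mktemp -d)
printf 'email_address,date\n"a@example.com","2022-03-01"\n"b@example.com","2022-03-02"\n' > "$demo_dir/product_a.csv"
printf 'email_address,date\n"b@example.com","2022-03-05"\n"c@example.com","2022-03-06"\n' > "$demo_dir/product_b.csv"

in_both "$demo_dir/product_a.csv" "$demo_dir/product_b.csv"
```

    Since cut doesn’t strip the CSV quoting, the output keeps the quotes around each address. That’s fine for comparing, as long as both files are quoted the same way.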

  • How to expand tilde in bash script

    When I was working on a recent bash script, I was irritated when I wasn’t getting output to my desktop. After a while, I figured out that quoted tildes are not automatically expanded.

    From the Bash manual:

    If a word begins with an unquoted tilde character (‘~’), all of the characters up to the first unquoted slash (or all characters, if there is no unquoted slash) are considered a tilde-prefix.

    The “unquoted” part is the key phrase. Thus, if your tilde is in quotes, it will not be expanded. This issue is demonstrated by the following:

    ➜  ~ path="~/Desktop"
    ➜  ~ cd "$path"
    cd: no such file or directory: ~/Desktop

    Never fear though, there is an easy solution for this: parameter expansion.

    ➜  ~ path="~/Desktop"
    ➜  ~ path=${path/\~/$HOME}
    ➜  ~ cd "$path"
    ➜  Desktop

    What we’re doing here is basically a find and replace. We take in path, search for ~, and replace it with $HOME. This gives us a valid path like /Users/username/Desktop which we can then use in various commands.

    Now, before you actually implement this, maybe consider whether you need to. In my case, the issue was that I was quoting an argument to my script. Instead of using parameter expansion to expand the tilde, I could’ve simply unquoted the path as it was passed to the script.

    But, you know, lessons learned.
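
    If you do need to expand the tilde yourself, here’s a slightly more defensive sketch. The expand_tilde name is hypothetical, and it only handles a bare ~ or a leading ~/ (not ~username). Unlike a plain find and replace, it leaves a ~ that appears later in the path alone.

```shell
#!/bin/sh
# Expand a leading "~" or "~/" in a path to $HOME; print anything else unchanged.
expand_tilde() {
	case "$1" in
		"~")
			printf '%s\n' "$HOME"
			;;
		"~/"*)
			# Strip the literal "~/" prefix, then prepend $HOME.
			rest=${1#"~/"}
			printf '%s\n' "$HOME/$rest"
			;;
		*)
			printf '%s\n' "$1"
			;;
	esac
}

# A quoted tilde is passed through literally, so expand it by hand.
expand_tilde "~/Desktop"

# A tilde in the middle of a path is left untouched.
expand_tilde "/tmp/backup~old"
```

    You could then use it with something like cd "$(expand_tilde "$path")".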

  • Git checkout non-origin remote pull request

    I don’t often review pull requests that come from non-origin remotes. This is largely because Automattic tends to create pull requests in the root repository. So, when I reviewed a pull request from an open source contributor today, I found myself wondering if there was a simple way to check out the pull request locally for testing.

    I knew that I could add a new remote, fetch from the new remote, and then check out the pull request. But, my lazy developer brain figured there was an easier way.

    So, I asked my fellow Automatticians if they had any tips/tricks and I got a few good ones.

    First, and probably the best solution, is to use Github’s CLI tool. Once you’ve installed that, you can then check out a pull request locally with something like:

    gh pr checkout {<number> | <url> | <branch>}

    But, if you’re relatively happy with your git flow and are looking for a little helper for this specific case, then you may be interested in this blog post by Scott Lowe. In that post, Scott shares a tip for fetching the branch from the non-origin remote to your local machine in a single command:

    git fetch origin pull/1234/head:pr-1234

    In my testing, this worked very well for me. But, I wanted to be a bit lazier. So, I ended up throwing that command in a shell function that expects the pull request number as an argument and then:

    1. Fetches the branch from the non-origin remote
    2. Checks out the branch from the non-origin remote

    That function looks like this:

    function gcopr() {
    	# Fetch the pull request head into a local pr-<number> branch, then check it out.
    	git fetch origin pull/"$1"/head:pr-"$1" && git checkout pr-"$1"
    }

    I’ve got this function in my ~/.oh-my-zsh/custom directory and I use it like this:

    gcopr 42940

  • Recording completed tasks with Alfred

    At Automattic, many teams have a process where they post weekly, or biweekly, updates. One of the things that I’ve often found difficult, as I write my personal update, is remembering all of the little things that I did for the past week.

    Sure, since I work on the computer, there’s usually some paper trail for what I did. But, getting that paper trail meant that I needed to comb through various sources and then also try to remember the things that didn’t have a paper trail.

    One of my favorite tools for getting all of the tasks that I completed in one place, and minimizing the number of things that weren’t tracked, was iDoneThis. But, it’s got a lot more functionality than I need. So, I set out to implement something to track completed tasks locally.

    A simple Alfred workflow

    Introducing the Dones workflow for Alfred!

    This very simple workflow works by querying done {query}. The workflow will then take over and do the following:

    • Create a new file with a name like 2020-03-31.txt, where 2020-03-31 is the current date
    • Add the done as a new line in that file, prepended with a timestamp. Ex. 4:19:32 PM: hello world

    With this setup, you’ll get a single file for each day that you record dones. You can then browse through those in the standard Mac file browser.

    Installation

    To install the Dones workflow, simply download the workflow from Github and then double click to import it in Alfred.

    Future work

    At the moment, there is no definite future work planned. That being said, one nice-to-have that is on my mind is adding a command to sum up a period’s dones. For example, maybe something like dones_sum 7 that gets the past 7 days’ worth of dones.
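
    To make the behavior concrete, here’s a rough shell sketch of the same idea. This is not the workflow’s actual code. The record_done function and the DONES_DIR variable are hypothetical, and the timestamp is zero-padded (04:19:32 PM), which differs slightly from the example above.

```shell
#!/bin/sh
# Append a "done" to a per-day text file, prefixed with the current time.
# DONES_DIR is a hypothetical setting; it falls back to ~/dones when unset.
record_done() {
	dones_dir=${DONES_DIR:-"$HOME/dones"}
	mkdir -p "$dones_dir"
	# One file per day (for example 2020-03-31.txt), one line per done.
	printf '%s: %s\n' "$(date '+%I:%M:%S %p')" "$1" >> "$dones_dir/$(date '+%Y-%m-%d').txt"
}

# Use a temporary directory for the demo so nothing lands in $HOME.
DONES_DIR=$(mktemp -d)
record_done "hello world"
cat "$DONES_DIR/$(date '+%Y-%m-%d').txt"
```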

  • Install Unison 2.48.4 on Mac OS X with Homebrew

    I use Unison to sync code between my local machine and my dev servers. To sync between two machines, Unison requires that the same version be installed on both.

    Now, this isn’t usually a big deal, because once you get Unison set up, it’s set up. But, I usually get a bit frustrated when setting up a new development machine and ensuring that it has the same Unison version as my remote server.

    Most recently, I needed to get Unison 2.48.4 on my local Mac so that it matched my remote server. But, Homebrew didn’t support Unison 2.48.4.

    So, after getting some feedback from one of my coworkers, we came up with the following. Maybe you’ll find it helpful.

    # Get rid of existing Unison
    brew uninstall --force unison
    
    # Checkout version of homebrew with Unison 2.48.4
    cd /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core
    git checkout 05460e0bf3ae5f1a15ae40315940b2d39dd6ac52 Formula/unison.rb
    
    # Install
    brew install --force-bottle unison
    
    # Set homebrew-core back to normal
    git checkout master
    git reset HEAD .
    git checkout -- .

    NOTE: If git checkout 05460e0bf3ae5f1a15ae40315940b2d39dd6ac52 Formula/unison.rb fails with fatal: reference is not a tree: 05460e0bf3ae5f1a15ae40315940b2d39dd6ac52, we’ve been able to fix the issue by recloning homebrew-core. If you get the same error, run these steps first and then retry, starting at that git checkout command above.

    cd /usr/local/Homebrew/Library/Taps/homebrew
    rm -rf homebrew-core
    git clone https://github.com/Homebrew/homebrew-core.git
    cd homebrew-core

  • Recursively cast to array in PHP

    I recently ran into an issue where JSON encoding some objects in my code wasn’t working properly. After experimenting, I realized that casting everything to an array before JSON encoding magically fixed things. 

    Casting an object to an array is simple enough:

    $variable_to_array = (array) $object_var;

    But, what happens when an object or array contains references to other objects or arrays? The answer is that we then need to recursively cast a given input to an array. But, we don’t necessarily want to recursively cast everything to an array. For example, this is what happens when we cast 1 to an array:

    return (array) 1;
    => array(1) {
      [0]=>
      int(1)
    }

    A simple fix is to recursively cast non-scalar values to an array. Here’s an example of how we would do that:

    /**
     * Given mixed input, will recursively cast to an array if the input is an array or object.
     *
     * @param mixed $input Any input to possibly cast to array.
     * @return mixed
     */
    function recursive_cast_to_array( $input ) {
    	// Leave scalars and null untouched. is_scalar( null ) is false, so check null explicitly.
    	if ( is_scalar( $input ) || null === $input ) {
    		return $input;
    	}

    	return array_map( 'recursive_cast_to_array', (array) $input );
    }

  • How to remove files not tracked in SVN

    At Automattic, we use SVN and Phabricator for much of our source control needs. One issue that I often run into is a warning about untracked files when creating a Phabricator differential:

    You have untracked files in this working copy.
    
      Working copy: ~/public_html
    
      Untracked changes in working copy:
      (To ignore this change, add it to "svn:ignore".)
        test.txt
    
        Ignore this untracked file and continue? [y/N]

    This warning’s purpose is to make sure that the differential being created has ALL of the changes so that a file isn’t forgotten when a commit is made. 

    But, what if the untracked file(s) are from previously checking out and testing a patch? In that case, this warning is actually a bit annoying. 

    The simple fix is to clear out the file(s) that aren’t tracked by SVN, which is as simple as deleting the file(s) since they’re not tracked in SVN. For a single file, that might look like:

    rm test.txt

    But, what if there are dozens or hundreds of files? I know I certainly wouldn’t want to run the command above dozens or hundreds of times to remove all of the files that aren’t tracked in SVN. Of course, we can automate all of the work by running something like the following ONCE:

    svn st | grep '^?' | awk '{print $2}' | xargs rm -rf

    Simply run the above from the root of the project and the untracked files should be removed. The above command is a bit much, so I’d recommend throwing it in an alias, which would look something like this:

    alias clearuntracked='svn st | grep '\''^?'\'' | awk '\''{print $2}'\'' | xargs rm -rf'
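
    Since rm -rf is not forgiving, it may be worth previewing the list before deleting anything. Here’s a minimal sketch of that dry run. The list_untracked function name is hypothetical, and the sample input only simulates svn st output so that the sketch runs without a working copy.

```shell
#!/bin/sh
# Read `svn st` output on stdin and print only the untracked paths
# (lines whose status column is "?").
list_untracked() {
	grep '^?' | awk '{print $2}'
}

# Simulated `svn st` output: two untracked files and one modified file.
printf '?       test.txt\nM       index.php\n?       notes/todo.md\n' | list_untracked

# In a real working copy, preview first and delete only once the list looks right:
# svn st | list_untracked
# svn st | list_untracked | xargs rm -rf
```

    Like the one-liner above, this splits on whitespace, so paths that contain spaces would need different handling.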

  • Get unique values in file with shell command

    Over the past year, there have been a couple of times where I’ve needed to sort some large list of values, more than 100 million lines in one case.

    In each case, I was dealing with a data source where there were surely duplicate entries. For example, duplicate usernames, emails, or URLs. To address this, I decided to get the unique values from the file before I ran a final processing script over them. This would require sorting all of the values in the given file and then deduping the resulting groups of values.

    This sorting and deduping can be a bit challenging. There are various algorithms to consider and if the dataset is large enough, we also need to ensure that we’re handling the data in a way that we don’t run out of memory.

    Shell commands to the rescue 🙂

    Luckily, there are shell commands that make it quite simple to get the unique values in a file. Here’s what I ended up using to get the unique values in a file:

    cat $file | sort | uniq

    In this example, we are:

    • Opening the file at $file
    • Sorting the file so that duplicates end up in a contiguous block
    • Deduping so that only one value remains from each contiguous block
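
    Here’s a tiny, reproducible sketch of those steps using made-up sample values instead of a file. The unique_values function name is hypothetical. It also shows sort -u, which sorts and dedupes in one step and should give the same result for this kind of input.

```shell
#!/bin/sh
# Sort stdin so that duplicates are adjacent, then keep one line per group.
unique_values() {
	sort | uniq
}

# Five input lines, three distinct values.
printf 'b@example.com\na@example.com\nb@example.com\nc@example.com\na@example.com\n' | unique_values

# The same result in a single command.
printf 'b@example.com\na@example.com\nb@example.com\nc@example.com\na@example.com\n' | sort -u
```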

    Here’s another example of this command with piped input:

    php -r 'for ( $i = 0; $i < 1000000; $i++ ) { echo sprintf( "%d\n", random_int( 0, 100 ) ); }' | sort -n | uniq

    In this example, we are

    • Generating 1,000,000 random numbers (between 0 and 100), each on its own line
    • Sorting that output so that like numbers are together
      • Note that we’re using -n here to do an integer sort.
    • Deduping that so that we end up with a unique number on each line

    If we wanted to know how often each number occurred in the file, we could simply add -c to the end of the command above. The resulting command would be php -r 'for ( $i = 0; $i < 1000000; $i++ ) { echo sprintf( "%d\n", random_int( 0, 100 ) ); }' | sort -n | uniq -c and we would get some output that looked like this:

    9880 0
    10179 1
    9725 2
    10024 3
    9921 4
    9893 5
    9945 6
    9881 7
    9707 8
    9955 9
    9896 10
    9845 11
    9928 12
    10024 13
    10005 14
    9834 15
    9929 16
    9764 17
    9795 18
    9932 19
    9735 20
    10082 21
    9876 22
    9835 23
    9748 24
    9947 25
    9975 26
    9841 27
    9856 28
    9751 29
    10138 30
    10037 31
    10026 32
    10128 33
    9926 34
    9821 35
    9990 36
    9920 37
    9696 38
    9886 39
    9896 40
    9815 41
    9924 42
    9739 43
    9854 44
    9936 45
    9977 46
    9873 47
    9824 48
    10043 49
    10054 50
    9870 51
    9783 52
    9901 53
    9819 54
    9882 55
    10022 56
    9899 57
    9922 58
    9922 59
    9902 60
    10036 61
    9830 62
    9792 63
    9894 64
    10008 65
    9774 66
    9918 67
    9986 68
    9814 69
    9661 70
    10117 71
    10046 72
    9704 73
    10016 74
    9601 75
    9901 76
    9923 77
    9931 78
    9909 79
    9895 80
    9771 81
    10044 82
    10059 83
    9864 84
    9938 85
    9799 86
    10006 87
    9883 88
    9880 89
    9837 90
    9701 91
    9870 92
    9998 93
    9809 94
    9883 95
    10144 96
    9935 97
    9979 98
    9922 99
    9789 100

  • What is the JavaScript event loop?

    I remember the first time I saw a setTimeout( fn, 0 ) call in some React code. Luckily there was a comment with the code, so I kind of had an idea of why that code was there. Even with the comment though, it was still confusing.

    Since then, I’ve read several articles about the event loop and got to a point where I was fairly comfortable with my understanding. But, after watching this JSConf talk by Philip Roberts, I feel like I’ve got a much better understanding.

    In the talk, Philip uses a slowed down demonstration of the event loop to explain what’s going on to his audience. Philip also demonstrates a tool that he built which allows users to type in code and visualize all of the parts that make JavaScript asynchronous actions work.

    You can check out the tool at http://latentflip.com/loupe, but I’d recommend doing it after watching the video.

  • How to install Unison 2.48 on Ubuntu

    For developing on remote servers, but using a local IDE, I prefer to use Unison over other methods that rely on syncing files via rsync or SFTP.

    But, one issue with Unison is that two computers must have the same version to sync. And since Homebrew installs Unison 2.48.4 and apt-get install unison installs something like 2.0.x, this meant I couldn’t sync between my computer and a development machine if I wanted to install Unison via apt-get.

    No worries, by following the documentation, and a bit more searching, I was able to figure out how to build Unison 2.48.4 on my development server!

    Note: I did run into a warning at the end of the build. But, from what I can tell, the build actually succeeded. The second-to-last step below helps you test if the build succeeded.

    • apt-get install ocaml
    • apt-get install make
    • curl -O https://www.seas.upenn.edu/~bcpierce/unison//download/releases/stable/unison-2.48.4.tar.gz
    • tar -xvzf unison-2.48.4.tar.gz
    • cd src
    • make UISTYLE=text
    • ./unison to make sure it built correctly. You should see something like this:
      Usage: unison [options]
      or unison root1 root2 [options]
      or unison profilename [options]
      
      For a list of options, type "unison -help".
      For a tutorial on basic usage, type "unison -doc tutorial".
      For other documentation, type "unison -doc topics".
      
    • mv unison /usr/local/bin

    After going through these commands, unison should be in your path, so you should be able to use unison from any directory without specifying the location of the binary.

  • How to apply a filter to an aggregation in Elasticsearch

    When using Elasticsearch for reporting efforts, aggregations have been invaluable. Writing my first aggregation was pretty awesome. But, pretty soon after, I needed to figure out a way to run an aggregation over a filtered data set.

    As with learning all new things, I was clueless how to do this. Turns out, it’s quite easy. Within a few minutes, I came across some articles that recommended using a top-level query with a filtered argument, which seemed cool because I could just copy my filter up.

    That’d look something like:

    {
      "query": {
        "filtered": {}
      }
    }

    But, one of my coworkers pointed out that filtered queries have been deprecated and removed in 5.x. Womp womp. So, the alternative was to just convert the filter to a bool must query.

    Here’s an example:

    Example

    You can find the Shakespeare data set that I’m using, as well as instructions on how to install it here. Using real data and actually running the query seems to help me learn better, so hopefully you’ll find it helpful.

    Once you’ve got the data, let’s run a simple aggregation to get the list of unique plays.

    [code]
    GET shakespeare/_search
    {
      "aggs": {
        "play_name": {
          "terms": {
            "field": "play_name",
            "size": 200
          }
        },
        "play_count": {
          "cardinality": {
            "field": "play_name"
          }
        }
      },
      "size": 0
    }
    [/code]

    Based on this query, we can see that there are 36 plays in the dataset, which is one off from what a Google search suggested. I’ll chalk that up to slightly off data perhaps?

    Now, if we were to dig through the buckets, we could list out every single play that Shakespeare wrote, without having to iterate over every single doc in the dataset. Pretty cool, eh?

    But, what if we wanted to see all plays that Falstaff was a speaker in? We could easily update the query to be something like the following:

    [code]
    GET shakespeare/_search
    {
      "query": {
        "bool": {
          "must": {
            "term": {
              "speaker": "FALSTAFF"
            }
          }
        }
      },
      "aggs": {
        "play_name": {
          "terms": {
            "field": "play_name",
            "size": 200
          }
        }
      },
      "size": 0
    }
    [/code]

    In this case, we’ve simply added a top-level query that returns only docs where FALSTAFF is the speaker. Then, we take those docs and run the aggregation. This gives us results like this:

    [code]
    {
      "took": 5,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 1117,
        "max_score": 0,
        "hits": []
      },
      "aggregations": {
        "play_name": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "Henry IV",
              "doc_count": 654
            },
            {
              "key": "Merry Wives of Windsor",
              "doc_count": 463
            }
          ]
        }
      }
    }
    [/code]

    And based on that, we can see that FALSTAFF was in “Henry IV” and “Merry Wives of Windsor”.
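
    If you're running this from code rather than a console, here's a rough sketch in JavaScript of building that request body and pulling the play names out of the buckets. The helper names (buildSpeakerAggregation and bucketKeys) are made up for illustration and aren't part of any Elasticsearch client, and the sample response is trimmed down from the one above.

```javascript
// Hypothetical helper: builds the search body for
// "which plays does this speaker appear in?"
function buildSpeakerAggregation( speaker ) {
	return {
		query: {
			bool: {
				must: { term: { speaker: speaker } }
			}
		},
		aggs: {
			play_name: {
				terms: { field: 'play_name', size: 200 }
			}
		},
		// We only care about the aggregation, not the matching docs.
		size: 0
	};
}

// Hypothetical helper: returns the bucket keys for a named terms
// aggregation, or an empty array if the aggregation is missing.
function bucketKeys( response, aggName ) {
	var agg = response.aggregations && response.aggregations[ aggName ];
	if ( ! agg ) {
		return [];
	}
	return agg.buckets.map( function( bucket ) {
		return bucket.key;
	} );
}

// Trimmed copy of the response shown above.
var sampleResponse = {
	aggregations: {
		play_name: {
			buckets: [
				{ key: 'Henry IV', doc_count: 654 },
				{ key: 'Merry Wives of Windsor', doc_count: 463 }
			]
		}
	}
};

console.log( JSON.stringify( buildSpeakerAggregation( 'FALSTAFF' ), null, 2 ) );
console.log( bucketKeys( sampleResponse, 'play_name' ) );
```

    How you actually send the body depends on your client, so I've left that part out.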

    Comments

    Feel free to leave a comment below if you have critical feedback or if this helped you!

  • How to retry Selenium Webdriver tests in Mocha

    While working on some functional tests for a hosting provider, I kept running into an issue where the login test was failing due to a 500 error. It appeared as if the site hadn’t been fully provisioned by the time my test was trying to log in.

    Initially, I attempted adding timeouts to give the installation process more time, but that seemed prone to error as well since the delay was variable. Also, with a timeout, I would’ve had to make the timeout be the longest expected time, and waiting a minute or so in a test suite didn’t seem like a good idea.

    Getting it done

    You’d think it’d be a quick fix, right? If this errors, do it again.

    Within minutes, I had found a setting in Mocha that allowed retrying a test. So, I happily plugged that in, ran the test suite again, and it failed…

    The issue? The JS bindings for Selenium Webdriver work off of promises, so they don’t quite mesh with the built-in test retry logic. And not having dug into promises much yet, it definitely took me a bit to wrap my head around a solution.

    That being said, there are plenty of articles out there that talk about retries with JavaScript promises, which helped bring me up to speed. But, I didn’t find any that were for specifically retrying promises with Selenium Webdriver in a Mocha test suite.

    So, I learned from a couple of examples, and came up with a solution that’d work in my Selenium Webdriver Mocha tests.

    The Code

    You can find a repo with the code and dependencies here, but for convenience, I’m also copying the relevant snippets below:

    The retry logic

    The function below recursively calls itself, fetching a promise with the test assertions, and decrementing the number of retries each time.

    Each time the function is called, a new promise is created. In that promise, we use catch so that we can hook into the errors and decide whether to retry the test or throw the error.

    Note: The code looks a bit cleaner in ES6, but I didn’t want to set that up.

    [javascript]
    var handleRetries = function ( browser, fetchPromise, numRetries ) {
      numRetries = 'undefined' === typeof numRetries
        ? 1
        : numRetries;
      return fetchPromise().catch( function( err ) {
        if ( numRetries > 0 ) {
          return handleRetries( browser, fetchPromise, numRetries - 1 );
        }
        throw err;
      } );
    };
    [/javascript]
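
    If you want to see the retry behavior without spinning up a browser, here's a minimal sketch. It's written in ES6 and drops the unused browser argument, so it's not a copy of the function above. The flakyLogin function is a made-up stand-in for a Selenium call that fails twice before succeeding.

```javascript
// ES6 sketch of the same retry idea: call fetchPromise(), and if the
// returned promise rejects, try again until numRetries runs out.
const handleRetries = ( fetchPromise, numRetries = 1 ) =>
	fetchPromise().catch( ( err ) => {
		if ( numRetries > 0 ) {
			return handleRetries( fetchPromise, numRetries - 1 );
		}
		throw err;
	} );

// Hypothetical stand-in for a flaky Selenium call:
// rejects on the first two attempts, then resolves.
let attempts = 0;
const flakyLogin = () => {
	attempts++;
	return attempts < 3
		? Promise.reject( new Error( 'HTTP 500 on attempt ' + attempts ) )
		: Promise.resolve( 'logged in after ' + attempts + ' attempts' );
};

// One initial attempt plus up to three retries.
const result = handleRetries( flakyLogin, 3 );
result.then( ( message ) => console.log( message ) );
```

    The important bit is that handleRetries receives a function that returns a promise, not a promise itself. A promise that has already rejected can't be re-run, so each retry needs a fresh one.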

    The test

    The original test, without retries, looked something like this:

    [javascript]
    test.describe( 'Can fetch URL', function() {
      test.it( 'page contains something', function() {
        var selector = webdriver.By.name( 'ebinnion' ),
          i = 1;
        browser.get( 'https://google.com' );
        return browser.findElement( selector );
      } );
    } );
    [/javascript]

    After integrating with the retry logic, it now looks like this:

    [javascript]
    test.describe( 'Can fetch URL', function() {
      test.it( 'page contains something', function() {
        var selector = webdriver.By.name( 'ebinnion' ),
          i = 1;
        return handleRetries( browser, function() {
          console.log( 'Trying: ' + i++ );
          browser.get( 'https://google.com' );
          return browser.findElement( selector );
        }, 3 );
      } );
    } );
    [/javascript]

    Note that the only thing we did differently in the test was put the Selenium Webdriver calls (which return a promise) inside a callback that gets called from handleRetries. Putting the calls inside this callback allows us to get a new promise each time we retry.

    Comments?

    Feel free to leave a comment if you have input or questions. Admittedly, I may not be too much help if it’s a very technical testing question, but I can try.

    I’m also glad to accept critical feedback if there’s a better approach. Particularly an approach that doesn’t require an external module, although I’m glad to hear of those as well.

  • PHP – Get methods of a class along with arguments

    Lately, I’ve been using the command line a lot more often at work. I found two things hard about using the command line to interact with PHP files:

    1. Figuring out the require path every time I opened an interactive shell
    2. Remembering what methods were available in a class and what arguments each method expected

    The first was pretty easy to handle by writing a function that would require often used files. The second one turned out to not be too hard and is the subject of this post.

    The code

    Below is the code that I used to get the methods of an object as well as the arguments for each method.

    ```
    <?php
    function print_object_methods( $mgr ) {
      foreach ( get_class_methods( $mgr ) as $method ) {
        echo $method;
        $r = new ReflectionMethod( $mgr, $method );
        $params = $r->getParameters();

        if ( ! empty( $params ) ) {
          $param_names = array();
          foreach ( $params as $param ) {
            $param_names[] = sprintf( '$%s', $param->getName() );
          }
          echo sprintf( '( %s )', implode( ', ', $param_names ) );
        }
        echo "\n";
      }
    }
    ```

    An example

    Let’s use the Jetpack_Options class from Jetpack as an example. You can find it here: https://github.com/Automattic/jetpack/blob/master/class.jetpack-options.php

    For that class, the above code would output:

    get_option_names( $type )
    is_valid( $name, $group )
    is_network_option( $option_name )
    get_option( $name, $default )
    get_option_and_ensure_autoload( $name, $default )
    update_option( $name, $value, $autoload )
    update_options( $array )
    delete_option( $names )
    delete_raw_option( $name )
    update_raw_option( $name, $value, $autoload )
    get_raw_option( $name, $default )
    

    As a note, in this case, it could also be nice to print out the docblock for each method instead of just the arguments to add some context. But, I didn’t need too much context for a file that I’m in pretty often. Your mileage may vary.