Engineering 2020 In Review

Originally posted on Hatch Blog

Year 3 for the Hatch Engineering team was a massive year and not what we expected, but then 2020 was not what anyone expected. Hatch launched the Exchange within weeks of the COVID-19 lockdowns to help stood-down workers find temporary work. We helped over 1,500 people find employment and had over 12,000 people sign up. We then took what we learned and brought forward our plans to help companies hire junior talent, not only students, by launching a new version of Hatch.

2020 Highlights

Labour Exchange

The big highlight of 2020 was how the entire Hatch team pulled together in a herculean effort to build, launch, and operate the Labour Exchange in just a matter of weeks. It felt like the early days again and we generated a huge amount of learning from the experience.

Rapid Prototyping

The Exchange gave the team an opportunity to experiment with low- and no-code solutions and alternative architectures to get a product to market rapidly. We ended up delivering the Exchange with a mix of Airtable and S3 as the backend “UI” and data storage. This led us to keep using Airtable when we started building Hatch 2.0 later in the year.

Quality vs Speed Experiment

The focus on releasing the Exchange as quickly as possible gave us an unusual opportunity to build product very differently from how we normally do, and to compare the results! We didn’t have any CI/CD, we siloed engineers into the code areas where they were strongest, we limited collaboration on tech design, and we had minimal automated tests, although we did keep code reviews.

The result was that we delivered customer value the quickest we ever have. We also had more incidents in those few months than in the previous 3 years. Engineers had no opportunity to develop into other areas, focusing only on their core strengths. Code was siloed and quickly became difficult to change and stressful to deploy, and the support and operational load was far higher.

Despite the stressful and untenable environment it created, it was a great experience for the engineering team, and for Hatch as a whole, to see first-hand the trade-offs made for velocity. As we pivoted the product later in the year we took a lot of lessons from the Exchange, which allowed us to build on top of low-code platforms like Airtable in a way that was more maintainable than our first attempt.

Remote work

Coming into 2020, building a remote work culture was one of the engineering team’s stated goals:

With flexible work a continuing mega-trend this decade, we want to make sure remote working is built into our culture. Whilst we aren’t remote first Hatch does embrace flexible working (work from home Wednesday!) and a number of the engineering team are planning remote work trips this year. Whilst internally we are comfortable working together remotely, we want to lean into ensuring our remote practices and interactions with the wider team are equally as second nature as it is internally.

COVID-19 was a force multiplier for remote working everywhere. It helped us go deeper on this goal than we thought possible last year. Our previous flexible work arrangement (where everyone works from home on Wednesday) was mainly designed to create space for deep individual work, which can be difficult in a busy office. It didn’t force us to really develop remote-first habits, as in-person collaboration was simply scheduled on other days. Once thrust into fully remote working, we needed to develop clear agreements on sync and async communication, and effective ways of collaborating online.

The Engineering team felt far more productive remote than when in the office full-time, but there are still issues to work through. The main one is maintaining culture and team connection. We tried a number of options, from team Zoom lunches where we all cook the same dish to Friday drinks. Unfortunately, group social video is still not a solved problem, and for the extroverts nothing beats the real thing.

Miro was one of the key contributors to our success during this period, and the wider team has spent a lot of time in it. It’s especially good at facilitating remote collaborative sessions like retrospectives. Presenting ideas and designs over Zoom can still be daunting, though: everyone is often on mute, or giving non-verbal feedback that you can’t see while you are looking at a Miro board.

Product Pivot

After the learnings from the Labour Exchange we accelerated our roadmap towards being a platform for junior roles, not just students. Launching the MVP of Hatch 2.0 required rethinking how we capture, assess and present applicant data, and provided an opportunity to fulfil some of the 2020 goals the Labour Exchange had put on hold.

Splitting Site from App

Since I joined Hatch I’ve wanted to split the marketing site from the application code base. As our marketing team grew, so did support for improving the performance of the site. When we needed to relaunch the marketing site to account for our new offering, it was the perfect opportunity. Based on some rapid prototyping we’d done in Next.js for the Exchange, we decided to use Next.js again for its ability to easily mix statically rendered pages with server-side rendered pages.

We use FAB to run the site on AWS CloudFront. It’s still early days for FAB and there are a lot of workarounds to have it run our site, but I love where it’s going and the feature set it opens up.

Engineering and Data Science Responsibilities

We made great progress in defining the responsibilities and boundaries between Data Science and Engineering. After a few iterations on how to integrate Node and Python and where to host the code, we settled on models as a service backed by a virtual feature store (a thin wrapper over product databases for now), with clear definitions of which parts are maintained by Data Science versus Engineering. We’ll outline this in more detail in a future post.
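As a rough illustration of what a "virtual" feature store could look like, here is a minimal sketch. Everything in it is an assumption made for illustration: the row and feature shapes, the `isJunior` threshold, and the `createFeatureStore`, `scoreApplicant` and `Predict` names are hypothetical and not taken from Hatch's code.

```typescript
// Hypothetical product-database row; field names are illustrative only.
interface ApplicantRow {
  id: string;
  completedAssessments: number;
  yearsExperience: number;
}

// Features the model consumes, decoupled from the database schema.
interface ApplicantFeatures {
  applicantId: string;
  assessmentCount: number;
  isJunior: boolean;
}

// The "virtual" feature store: a thin mapping layer over product data
// with no storage of its own.
const createFeatureStore = (
  loadRow: (id: string) => Promise<ApplicantRow | undefined>
) => ({
  getFeatures: async (id: string): Promise<ApplicantFeatures | undefined> => {
    const row = await loadRow(id);
    if (!row) return undefined;
    return {
      applicantId: row.id,
      assessmentCount: row.completedAssessments,
      isJunior: row.yearsExperience < 3, // arbitrary threshold for the sketch
    };
  },
});

// Hypothetical client for a model served by Data Science.
type Predict = (features: ApplicantFeatures) => Promise<number>;

// Engineering-side glue: look up features, then ask the model for a score.
const scoreApplicant = async (
  store: ReturnType<typeof createFeatureStore>,
  predict: Predict,
  id: string
): Promise<number | undefined> => {
  const features = await store.getFeatures(id);
  return features ? predict(features) : undefined;
};
```

The point of the split is that the model only ever sees `ApplicantFeatures`, so the product schema can change without touching the model, and the wrapper can later be swapped for a real feature store.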

“Right-sized” Services and Orchestration

Hatch 1.0 was opinionated about how a hiring journey should work, and we didn’t invest heavily in edge cases or unhappy paths. This made experimenting with new recruitment processes, like 2.0 or the Exchange, difficult. It made sense at the time because we were building for a very specific use case, but as we moved into 2.0 we wanted to be able to test a number of different ways to add value to the recruitment process. We needed a more flexible set of tools we could orchestrate into processes to serve the current hypothesis.

This allowed us to accelerate the move to our “right-sized” services (rather than microservices) architecture: services with clear data boundaries and well-defined responsibilities that we glue together, where needed, with orchestration services that encode the particular workflow we are using. Rather than each component service needing to know about its neighbours, each provides an API and a set of events that the orchestration service uses to wire workflows together.
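A minimal sketch of that idea, assuming two hypothetical component services and two hypothetical events (none of the names below come from Hatch's code base):

```typescript
// Hypothetical events emitted by component services.
type ServiceEvent =
  | { type: "application.submitted"; applicationId: string }
  | { type: "assessment.completed"; applicationId: string; score: number };

// Each component service exposes only its own API and knows nothing
// about its neighbours.
interface AssessmentApi {
  startAssessment(applicationId: string): Promise<void>;
}
interface NotificationApi {
  notifyApplicant(applicationId: string, message: string): Promise<void>;
}

// The orchestrator encodes one workflow by mapping events from one
// service to API calls on another.
const createOrchestrator =
  (assessments: AssessmentApi, notifications: NotificationApi) =>
  async (event: ServiceEvent): Promise<void> => {
    switch (event.type) {
      case "application.submitted":
        await assessments.startAssessment(event.applicationId);
        break;
      case "assessment.completed":
        await notifications.notifyApplicant(
          event.applicationId,
          `Your assessment score is ${event.score}`
        );
        break;
    }
  };
```

Trying a different recruitment process then means writing a different orchestrator over the same services, rather than changing the services themselves.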

We continued building our services with Lerna in our monorepo which has been working well. We are still looking for a better solution or tooling for deploying the changed set of services more efficiently.

Low-code

We took what we learned about Airtable from the Exchange and applied it to our 2.0 MVP to rapidly spin up the operational aspects of role assessment definition and fulfilment. It allowed us to rapidly test data structures and hypotheses, and to iterate before we commit to building out a bespoke product. We are even doubling down on it this year with custom Airtable apps to provide a better UI over the data. We know we will eventually need to replace this, but we hope to have iterated enough that we really understand our needs once we start building.

We also started trialling Webflow to empower operations and marketing to own their own content and rapidly iterate. It’s still early days and the single concurrent user in the designer is proving to be a blocker, so it may not be Webflow in the future but empowering the entire team to iterate quickly without engineering is proving invaluable.

GraphQL

A UI challenge with service-oriented architectures is that a piece of UI always needs more data than a single service can provide. We end up writing facades, either as a service or in the UI itself, to aggregate the needed data. The promise of GraphQL alleviating this issue, and throwing in change notifications for free, has been something the team has wanted to test out for a while. This year we built the new version of our internal scoring tool, Astria, with GraphQL (AppSync) to kick the tyres in a contained way.
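To make the facade problem concrete, here is a minimal sketch of the kind of aggregation code this refers to. The two services, their data shapes and the `buildApplicantCard` name are assumptions for illustration only.

```typescript
// Hypothetical payloads from two separate services.
interface Applicant {
  id: string;
  name: string;
}
interface Score {
  applicantId: string;
  value: number;
}

// What a single piece of UI actually needs.
interface ApplicantCard {
  id: string;
  name: string;
  score: number | null;
}

type Fetch<T> = (id: string) => Promise<T | undefined>;

// The facade calls both services in parallel and merges the results
// into one view model for the UI.
const buildApplicantCard = async (
  id: string,
  fetchApplicant: Fetch<Applicant>,
  fetchScore: Fetch<Score>
): Promise<ApplicantCard | undefined> => {
  const [applicant, score] = await Promise.all([
    fetchApplicant(id),
    fetchScore(id),
  ]);
  if (!applicant) return undefined;
  return { id: applicant.id, name: applicant.name, score: score?.value ?? null };
};
```

Every new screen tends to need another function like this, which is the boilerplate a GraphQL layer promises to replace with a single query.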

Personally I was underwhelmed. The tooling is still immature, and the overhead of introducing another way of accessing data and building services (versus our REST APIs) wasn’t worth the value it delivered. Change notifications only work well if everything is mutated through GraphQL, and injecting mutations from other avenues is clunky. The experience left me thinking WebSockets would be just as easy for our current architecture.

Currently we are leaving it contained to this one service. However, Apollo Federated Schemas have the potential to solve the issue of data aggregation across our services. Once this is available for AppSync we may kick the tyres again.

2021 Focus

Hatch’s wider 2021 focus is outlined here. From an engineering perspective, this is what we want to focus on:

  • Growing the team – we’re searching for a talented Full stack Product Engineer to join our team
  • Improving Team Knowledge Sharing – The surface area of our product has grown: we maintain Hatch 1.0, the Labour Exchange, Hatch 2.0 and a marketing site for each. We are also growing our team. Keeping the team aligned on our best practices and on how everything fits together architecturally is a big challenge for 2021.
  • Data Science Collaboration – We have grown our Data Science team and will be focusing on our Matching Science in 2021. We laid good groundwork in 2020 for how engineering and science work together. This year we will be doubling down on this and building on that initial groundwork to create more maintainable systems.
  • Building a marketplace for junior talent – All of the above is in service of aligning our product offerings into a single marketplace that helps us execute on our mission.

Technology & Practices Radar

From this year we are also going to start tracking our Technology & Practices Radar.

Highlights of the new entrants for us include:

  • Linc and FABs – interesting front-end containerisation; Linc was recently acquired by Cloudflare.
  • Archium – which we hope will accelerate team understanding of the overall system and how it hangs together.

Some of the tools and tech we have put on hold include:

  • MobX – in favour of pure state management. The adoption of hooks really showed us we didn’t need centralised state management for our app.
  • GraphQL – detailed above

Achieving S3 Read-After-Update Consistency

Originally posted on the Hatch Blog

The team at Hatch spun up the Labour Exchange in a few days, repurposing our tech to help stood-down workers find employment during the COVID-19 crisis.

In order to get the system up and running in such a short time frame we decided to use S3 as a flat-file data store to maintain our serverless batch job states and caches. After a very cursory search to satisfy ourselves S3 would guarantee read-after-write consistency, we flew on.

From the documentation:

Amazon S3 provides read-after-write consistency for PUTS of new objects in your S3 bucket in all Regions with one caveat. The caveat is that if you make a HEAD or GET request to a key name before the object is created, then create the object shortly after that, a subsequent GET might not return the object due to eventual consistency.

Unfortunately, we missed that this guarantee only covers new objects, as well as the part further down the page that explicitly describes our use case:

A process replaces an existing object and immediately tries to read it. Until the change is fully propagated, Amazon S3 might return the previous data.

As these processes run on a schedule (not as part of a user-facing API), we could afford to spend some extra calls to S3 to roll our own read-after-update consistency. Knowing that S3 guarantees read-after-firstWrite, we can write a new file for every change, read the latest file and make sure we clean up.

So every time we write a file we:

  • Append a timestamp to the filename
  • Remove older files

When we read a file we:

  • List all files with the key prefix (S3 guarantees listing files will be ordered by ascending UTF-8 binary order)
  • Get the newest file in the list

import { writeS3Obj, getS3FileAsObj, listObjects, deleteS3Files } from "./S3";

/**
 * S3 does not provide read-after-update consistency.
 * It does provide read-after-firstWrite consistency (as long as no GET has been requested).
 * We write a new file every time the state changes, and we read the latest file.
 * S3 guarantees lists of files are sorted in ascending UTF-8 binary order.
 */

// Delete every version of the key except the newest (the last one listed).
const cleanUp = async (key: string) => {
  const response = await listObjects({
    MaxKeys: 1000,
    Bucket: process.env.BUCKET_NAME!,
    Prefix: key,
  });
  const keys = response.Contents?.map((c) => c.Key!) || [];
  await deleteS3Files(keys.slice(0, keys.length - 1));
};

// Write the state under a new, timestamped key, then remove older versions.
export const writeServiceState = async (key: string, state: any) => {
  await writeS3Obj(`${key}.${Date.now()}`, state);
  await cleanUp(key);
};

// Read the newest version of the state, or return the default when none exists.
export const getServiceState = async <T>(key: string, defaultVal: T): Promise<T> => {
  const response = await listObjects({
    MaxKeys: 1000,
    Bucket: process.env.BUCKET_NAME!,
    Prefix: key,
  });
  if (!response.Contents || response.Contents.length === 0) {
    console.log("No state file for key " + key);
    return defaultVal;
  }
  return getS3FileAsObj(response.Contents[response.Contents.length - 1].Key!);
};
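The key-selection logic can also be exercised without S3. The sketch below is not part of the original gist: `versionedKey`, `versionsOf`, `latestOf` and `staleOf` are hypothetical helper names. It also shows two optional hardening steps that are assumptions on my part. It filters on `<key>.<digits>`, so that a prefix match on `jobs` does not pick up an unrelated key such as `jobs-archive.5`, and it sorts by the numeric timestamp rather than relying on string order, which only matches numeric order while all timestamps have the same number of digits.

```typescript
// Build the key for a new version of the state (same scheme as above).
const versionedKey = (key: string, now: number): string => `${key}.${now}`;

// Extract the numeric timestamp suffix from a versioned key.
const timestampOf = (key: string, versioned: string): number =>
  Number(versioned.slice(key.length + 1));

// Keep only "<key>.<digits>" entries and order them oldest to newest.
const versionsOf = (key: string, listed: string[]): string[] =>
  listed
    .filter(
      (k) => k.startsWith(`${key}.`) && /^\d+$/.test(k.slice(key.length + 1))
    )
    .sort((a, b) => timestampOf(key, a) - timestampOf(key, b));

// The version to read: the newest one, if any exists.
const latestOf = (key: string, listed: string[]): string | undefined => {
  const versions = versionsOf(key, listed);
  return versions[versions.length - 1];
};

// The versions to delete during clean-up: everything except the newest.
const staleOf = (key: string, listed: string[]): string[] =>
  versionsOf(key, listed).slice(0, -1);
```

With these helpers, a write becomes "put `versionedKey(key, Date.now())`, list, delete `staleOf(...)`" and a read becomes "list, get `latestOf(...)`".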

Engineering Year in Review 2018

It was the first year at Hatch for the entire engineering team as we took over the reins to build out the MVP into a mature product. We’ve merged the 2018 pull-request and have already started work on the feature/2019 branch. Here are some highlights and struggles from our 2018 retro.

2018 Highlights

  • Launched the Hatch API
  • Migrated the Company User side of the product to React
  • Converted the front-end projects from Flow to Typescript
  • Added lights to all the hardware 😉
  • Laid a solid foundation for the Hatch Engineering Culture

2019 Focus

Code Quality

At the beginning of 2018 we pushed a little hard on shipping product which led to some quality issues both in the code and for the customer. A team focus for 2019 is ensuring we are leaving the code better than we found it and improving the effectiveness of our code reviews.

Product Engineering

At Hatch we have an amazing cross-functional team that focuses on user-centred design. All engineers at Hatch are Product Engineers and in 2019 we want to get better at that.

Specifically we want to:

  • improve the communication with design, partly by focusing on our design language
  • increase our interaction with our customers
  • improve how we break down and understand the complexity in our epics

2019 is going to be a big year for Hatch. If you’re interested in joining us, have a look here!