KoalaSafe says goodbye

A sad day today: after six years, KoalaSafe is closing down. It’s a huge disappointment that we didn’t get to fulfil our mission and continue serving the families that rely on us.

But we did have a huge impact on many families and achieved some modest success as a business:

  • Successful Kickstarter
  • 200,000+ devices managed
  • Sold in over 250 Target stores across the USA
  • Released two hardware versions

Engineering 2020 In Review

Originally posted on Hatch Blog

Year 3 for the Hatch Engineering team was a massive year and not what we expected, but then 2020 was not what anyone expected. Hatch launched the Exchange within weeks of the COVID-19 lockdowns to help stood-down workers find temporary work. We helped over 1,500 people find employment and had over 12,000 people sign up. We then took the learnings from that and brought forward our plans to help companies hire junior talent, not only students, launching a new version of Hatch.

2020 Highlights

Labour Exchange

The big highlight of 2020 was how the entire Hatch team pulled together in a herculean effort to build, launch, and operate the Labour Exchange in just a matter of weeks. It felt like the early days again and we generated a huge amount of learning from the experience.

Rapid Prototyping

The Exchange provided the team with an opportunity to experiment with low and no code solutions and alternate architectures to rapidly get a product to market. We ended up delivering the Exchange with a mix of Airtable and S3 as backend “UI” and data storage. This has led to us continuing to use Airtable as we started to build Hatch 2.0 later in the year.

Quality vs Speed Experiment

The focus on releasing the Exchange as quickly as possible gave us an unusual opportunity to build product in a very different way from how we normally do, and to compare the results! We didn’t have any CI/CD, we siloed engineers into the code areas where they were strongest, we limited the amount of collaboration on tech design, and we had minimal automated tests, although we did keep code reviews.

The result was that we delivered customer value the quickest we ever have. We also had more incidents in those few months than we have had in the last 3 years; engineers had no opportunity to develop into other areas, focusing only on their core strengths; code was siloed and quickly became difficult to change and stressful to deploy; and the support and operational load was far higher.

Despite the stressful and untenable environment it created, it was a great experience for the engineering team and Hatch as a whole to see first-hand the trade-offs involved in optimising for velocity. As we pivoted the product later in the year we took a lot of lessons from the Exchange, allowing us to build on top of low-code platforms like Airtable in a way that was more maintainable than our first attempt.

Remote work

Coming into 2020, building a remote work culture was one of the engineering team’s stated goals:

With flexible work a continuing mega-trend this decade, we want to make sure remote working is built into our culture. Whilst we aren’t remote-first, Hatch does embrace flexible working (work from home Wednesday!) and a number of the engineering team are planning remote work trips this year. Whilst internally we are comfortable working together remotely, we want to lean into ensuring our remote practices and interactions with the wider team become as second nature as they are internally.

COVID-19 was a force multiplier for remote working everywhere. It helped us go deeper on this goal than we thought possible last year. Our previous flexible work arrangement (where everyone works from home on Wednesday) was mainly designed to create space for deep individual work which can be difficult in a busy office. This didn’t force us to really develop remote first tendencies as in-person collaboration was just scheduled on other days.  Once thrust into full remote working we needed to develop clear agreements on sync and async communications, and effective ways of collaborating online.

The Engineering team felt far more productive remote than when in the office full-time, but there are still issues to work through. The main one is maintaining culture and team connection. We tried a number of options, from team Zoom lunches where we all cook the same dish, to Friday drinks. Unfortunately group social video is still not a solved problem, and for the extroverts nothing beats the real thing.

Miro was one of the key contributors to our success during this period, and the wider team has spent a lot of time in it. It’s especially good at facilitating remote collaborative sessions like retrospectives. Presenting ideas and designs over Zoom can still be daunting, though: everyone is often on mute, and the non-verbal feedback they give is invisible while you’re looking at a Miro board.

Product Pivot

After the learnings from the Labour Exchange we accelerated our roadmap to become a platform for junior roles, not just students. Launching the MVP of Hatch 2.0 required re-thinking how we capture, assess and present applicant data, and provided an opportunity to fulfil some of our 2020 goals the Labour Exchange had put on hold.

Splitting Site from App

Since I joined Hatch I’ve wanted to split the marketing site from the application code base. As our marketing team grew, so did support for the endeavour, as a way to improve the performance of the site. When we needed to re-launch the marketing site to account for our new offering, it was the perfect opportunity. Based on some rapid prototyping we’d done in Next.js for the Exchange, we decided to use Next.js again for its ability to easily mix statically rendered pages with SSR pages.
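As a rough illustration of that per-page mix (the page names and data here are invented, and real pages would use the types from the "next" package), the split looks something like this:

```typescript
// Sketch of Next.js's two data-fetching modes (types simplified; a real
// project would import GetStaticProps / GetServerSideProps from "next").

// pages/about.tsx — getStaticProps runs once at build time, so the page
// ships as pre-rendered static HTML.
export const getStaticProps = async () => ({
  props: { headline: "About Hatch" },
});

// pages/roles.tsx — getServerSideProps runs on every request, so the page
// can render live data such as the current search query.
export const getServerSideProps = async (ctx: { query: Record<string, string> }) => ({
  props: { headline: `Roles matching "${ctx.query.q || "all"}"` },
});
```

Pages that export neither function are statically rendered by default, which is what makes mixing the two approaches on a per-page basis so convenient.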

We use FAB to run the site on AWS CloudFront. It’s still early days for FAB and there are a lot of workarounds to have it run our site, but I love where it’s going and the feature set it opens up.

Engineering and Data Science Responsibilities

We made great progress in defining the responsibilities and boundaries between Data Science and Engineering. After a few iterations on how to integrate Node and Python, and where to host the code, we settled on models-as-a-service backed by a virtual feature store (a thin wrapper over product databases for now) and clear definitions of which parts are maintained by DS versus Engineering. We’ll outline this in more detail in a future post.
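As a sketch of what we mean by a virtual feature store (the class and feature names here are hypothetical, not our actual code), the idea is a thin, typed layer between the two teams:

```typescript
// Hypothetical sketch of a "virtual" feature store: a thin wrapper over
// existing product data sources rather than a dedicated store.
type FeatureFetcher = (entityId: string) => Promise<number>;

export class VirtualFeatureStore {
  private fetchers = new Map<string, FeatureFetcher>();

  // Engineering owns the fetchers: how each feature is derived from product data.
  register(feature: string, fetch: FeatureFetcher): void {
    this.fetchers.set(feature, fetch);
  }

  // Data Science asks for features by name, without knowing the source databases.
  async getFeatures(entityId: string, features: string[]): Promise<Record<string, number>> {
    const result: Record<string, number> = {};
    await Promise.all(
      features.map(async (name) => {
        const fetch = this.fetchers.get(name);
        if (!fetch) throw new Error(`Unknown feature: ${name}`);
        result[name] = await fetch(entityId);
      })
    );
    return result;
  }
}
```

A model service then depends only on feature names, so the backing queries can later move into a real feature store without changing the DS-owned code.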

“Right-sized” Services and Orchestration

Hatch 1.0 was opinionated about how a hiring journey should work, and we didn’t invest heavily in edge cases or unhappy paths. This made experimenting with new recruitment processes, like 2.0 or the Exchange, difficult. It made sense at the time: we were building for a very specific use case. But as we moved into 2.0 we wanted to be able to test a number of different ways to add value to the recruitment process, so we needed a more flexible set of tools we could orchestrate into processes to serve the current hypothesis.

This allowed us to accelerate into our “right-sized” services architecture: building services with clear data boundaries and well-defined responsibilities, glued together where needed by orchestration services that encode a particular workflow we are using. Rather than each component service needing to know about its neighbours, each provides an API and a set of events which the orchestration service uses to wire together workflows.
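A minimal sketch of that shape (the service and event names are invented for illustration, and an in-process event emitter stands in for whatever transport is actually used):

```typescript
// Component services expose an API and emit events; only the orchestration
// service knows how a particular workflow wires them together.
import { EventEmitter } from "events";

const bus = new EventEmitter();

// Component services know nothing about their neighbours.
const applications = {
  submit(id: string) {
    bus.emit("application.submitted", { id });
  },
};

const assessments = {
  start(applicationId: string) {
    bus.emit("assessment.started", { applicationId });
  },
};

// The orchestration service encodes one workflow by wiring events to API
// calls; trying a different recruitment process means changing only this glue.
export const startedAssessments: string[] = [];
bus.on("application.submitted", (e: { id: string }) => assessments.start(e.id));
bus.on("assessment.started", (e: { applicationId: string }) =>
  startedAssessments.push(e.applicationId)
);

applications.submit("app-1");
```

The payoff is that swapping the workflow means swapping one orchestrator, not touching every component service.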

We continued building our services with Lerna in our monorepo which has been working well. We are still looking for a better solution or tooling for deploying the changed set of services more efficiently.

Low-code

We took our learnings with Airtable from the Exchange and applied them to our 2.0 MVP to rapidly spin up the operational aspects of role assessment definition and fulfilment. It allowed us to rapidly test data structures and hypotheses, and to iterate before we commit to building out a bespoke product. We are even doubling down on it this year with custom Airtable apps to provide a better UI over the data. We know we will eventually need to replace this, but we hope to have iterated enough that we will really understand our needs once we start building.

We also started trialling Webflow to empower operations and marketing to own their own content and rapidly iterate. It’s still early days, and the single concurrent user in the designer is proving to be a blocker, so it may not be Webflow in the future; but empowering the entire team to iterate quickly without engineering is proving invaluable.

GraphQL

A UI challenge with service-oriented architectures is that a piece of UI always needs more data than a single service can provide, so we end up writing facades, either as a service or in the UI itself, to aggregate the needed data. The promise of GraphQL alleviating this issue, with change notifications thrown in for free, is something the team has wanted to test for a while. This year we built the new version of our internal scoring tool, Astria, with GraphQL (AppSync) to kick the tyres in a contained way.

Personally, I was underwhelmed. The tooling is still immature, and the overhead of introducing another way of accessing data and building services (versus our REST APIs) wasn’t worth the value it delivered. Change notifications only work well if everything is mutated through GraphQL; injecting mutations from other avenues is clunky, and the experience left me thinking WebSockets would be just as easy for our current architecture.

Currently we are leaving it contained to this one service; however, Apollo Federated Schemas have the potential to solve the issue of data aggregation across our services. Once this is available for AppSync we may kick the tyres again.

2021 Focus

Hatch’s wider 2021 focus is outlined here; from an engineering perspective, this is what we want to focus on:

  • Growing the team – we’re searching for a talented full-stack Product Engineer to join our team
  • Improving Team Knowledge Sharing – The surface area of our product has grown: we maintain Hatch 1.0, the Labour Exchange, Hatch 2.0, and a marketing site for each. Plus we are growing our team. Keeping the team aligned on our best practices and how everything fits together architecturally is a big challenge for 2021.
  • Data Science Collaboration – We have grown our Data Science team and will be focusing on our Matching Science in 2021. We laid good groundwork in 2020 for how engineering and science work together. This year we will be doubling down on this and building on that initial groundwork to create more maintainable systems.
  • Building a marketplace for junior talent – All of the above is in service of aligning our product offerings into a single marketplace that helps us execute on our mission.

Technology & Practices Radar

From this year we are also going to start tracking our Technology & Practices Radar.

Highlights of the new entrants for us include:

  • Linc and FABs – interesting front-end containerisation; Linc was recently acquired by Cloudflare.
  • Archium – Which we hope will accelerate team understanding of the overall system and how it hangs together.

Some of the tools and tech we have put on hold include:

  • MobX – in favour of pure state management. The adoption of hooks really showed us we didn’t need centralised state management for our app.
  • GraphQL – detailed above

Achieving S3 Read-After-Update Consistency

Originally posted on the Hatch Blog

The team at Hatch spun up the Labour Exchange in a few days, re-purposing our tech to help stood-down workers find employment during the COVID-19 crisis.

In order to get the system up and running in such a short time frame we decided to use S3 as a flat-file data store to maintain our serverless batch job states and caches. After a very cursory search to satisfy ourselves that S3 would guarantee read-after-write consistency, we flew on.

From the documentation:

Amazon S3 provides read-after-write consistency for PUTS of new objects in your S3 bucket in all Regions with one caveat. The caveat is that if you make a HEAD or GET request to a key name before the object is created, then create the object shortly after that, a subsequent GET might not return the object due to eventual consistency.

Unfortunately, we missed the caveat, and the part further down the page that explicitly describes our use case:

A process replaces an existing object and immediately tries to read it. Until the change is fully propagated, Amazon S3 might return the previous data.

As these processes run on a schedule (not as part of a user-facing API), we could afford to spend some extra calls to S3 to roll our own read-after-update consistency. Knowing that S3 guarantees read-after-firstWrite, we can write a new file for every change, read the latest file, and make sure we clean up.

So every time we write a file we:

  • Append a timestamp to the filename
  • Remove older files

When we read a file we:

  • List all files with the key prefix (S3 guarantees listing files will be ordered by ascending UTF-8 binary order)
  • Get the newest file in the list

import { writeS3Obj, getS3FileAsObj, listObjects, deleteS3Files } from "./S3";

/**
 * S3 does not provide read-after-update consistency.
 * It does provide read-after-firstWrite consistency (as long as no GET has been requested),
 * so we write a new file every time the state changes, and we read the latest file.
 * S3 guarantees listings are sorted in ascending UTF-8 binary order.
 */
const cleanUp = async (key: string) => {
  const response = await listObjects({
    MaxKeys: 1000,
    Bucket: process.env.BUCKET_NAME!,
    Prefix: key,
  });
  const keys = response.Contents?.map((c) => c.Key!) || [];
  // Delete everything except the newest file
  await deleteS3Files(keys.slice(0, keys.length - 1));
};

export const writeServiceState = async (key: string, state: any) => {
  await writeS3Obj(`${key}.${Date.now()}`, state);
  await cleanUp(key);
};

export const getServiceState = async <T>(key: string, defaultVal: T): Promise<T> => {
  const response = await listObjects({
    MaxKeys: 1000,
    Bucket: process.env.BUCKET_NAME!,
    Prefix: key,
  });
  if (!response.Contents || response.Contents.length === 0) {
    console.log("No state file for key " + key);
    return defaultVal;
  }
  return getS3FileAsObj(response.Contents[response.Contents.length - 1].Key!);
};

Engineering Year in Review 2019

Originally posted on Hatch Blog

Year 2 for the Hatch Engineering team is already in the rear-view mirror and work on feature/2020 has begun. We gained a new engineer, Steve, said goodbye to our Product Manager, John, and made strong progress on our 2019 focus; here are the highlights and what we want to focus on this year.

2019 Highlights

  • Grew the engineering team to 4 with the hire of Steve
  • Got very close to removing our legacy app Choo (2 views to go!)
  • Made great progress on our 2019 goal of Product Engineering. Engineers are participating much more in design and discovery
  • The investment in our design language paid dividends, improving the efficiency of front-end work as well as allowing design to hand over rough sketches for engineers to build.
  • Transitioning to React hooks
  • Our processes are always being retro’d and improved but we feel we have found a good balance of planning, rituals and GSD.
  • Breaking apart our code bases into packages  of a Lerna monorepo to aid in build times and code sharing between projects (api-client!!)
  • The entire happy path journey is now in product (for company users and students).

2020 Focus

  • Now that we have the happy path journey in product we have a much better understanding of what the journey looks like, as well as a lot of history of what our unhappy paths look like. This learning will help us to improve our modelling of the process to better facilitate future changes as well as facilitating the manual processes for our unhappy paths. These changes will also help reduce the complexity of the system.
  • At the moment we have a monolith API project and a monolith App project. CI build times are 13 minutes for the app (15 minutes to fully deploy) and 6 minutes to build the API (30-40 minutes to fully deploy due to deploying and running e2e tests). We have laid the groundwork converting them to Lerna monorepos and hope to reduce the build times (especially locally) by separating out packages in both projects.
  • As the team grows, focusing on maintaining developer productivity is crucial. A few pain points we know about and will work on are: our test data for the service tests, and re-working code to the current standard and paradigm when we touch it in a meaningful way.
  • With flexible work a continuing mega-trend this decade, we want to make sure remote working is built into our culture. Whilst we aren’t remote-first, Hatch does embrace flexible working (work from home Wednesday!) and a number of the engineering team are planning remote work trips this year. Whilst internally we are comfortable working together remotely, we want to lean into ensuring our remote practices and interactions with the wider team become as second nature as they are internally.
  • More blogging!!

Most excited for in 2020

  • 2020 is all about growth at Hatch and we are super excited about helping our new marketing team crush their growth goals!
  • As we transfer our matching learning into more robust systems we are starting to integrate our data science code into our primary code bases. This adds interesting and new technical challenges as we embrace polyglot code bases.
  • From a technical perspective, automating our back-office integrations in 2020 is an exciting new challenge.

Engineering Year in Review 2018

It was the first year at Hatch for the entire engineering team as we took over the reins to build out the MVP into a mature product. We’ve merged the 2018 pull-request and have already started work on the feature/2019 branch. Here are some highlights and struggles from our 2018 retro.

2018 Highlights

  • Launched the Hatch API
  • Migrated the Company User side of the product to React
  • Converted the front-end projects from Flow to Typescript
  • Added lights to all the hardware 😉
  • Laid a solid foundation for the Hatch Engineering Culture

2019 Focus

Code Quality

At the beginning of 2018 we pushed a little hard on shipping product which led to some quality issues both in the code and for the customer. A team focus for 2019 is ensuring we are leaving the code better than we found it and improving the effectiveness of our code reviews.

Product Engineering

At Hatch we have an amazing cross-functional team that focuses on user-centred design. All engineers at Hatch are Product Engineers and in 2019 we want to get better at that.  

Specifically we want to

  • improve the communication with design, partly by focusing on our design language
  • increase our interaction with our customers
  • improve how we break down and understand the complexity in our epics

2019 is going to be a big year for Hatch, if you’re interested in joining us have a look here!

KoalaSafe – Easier Parenting. Safer Kids.

I haven’t posted in over a year! I have been toiling away on my new baby KoalaSafe.

Are your kids obsessed with Minecraft? Clash of Clans?
Do you recognize that glazed look?

Technology moves fast, how can you as a parent keep up? It’s difficult to know which apps or sites to look out for, when they change every day.

Some things, once seen, can never be unseen. How do you ensure your kids don’t get tricked into following links to inappropriate content or videos, without being a helicopter parent? This stuff is hard.

KoalaSafe can help restore the balance and make parenting easier.


Logstash and IIS

Note: If you are also using Kibana as your front end, you will need to add a MimeType of “application/json” for the extension .json to IIS.

We are pushing all of our logs into Elasticsearch using Logstash. IIS was the most painful part of the process so I am writing up a few gotchas for Logstash 1.3.3 and IIS in general.

The process is relatively straightforward on paper:

  1. Logstash monitors the IIS log and pushes new entries into the pipeline
  2. Use a grok filter to split out the fields in the IIS log line (more on this below)
  3. Push the result into Elasticsearch

Firstly, there is a bug in the Logstash file input on Windows (it doesn’t handle files with the same name in different directories) which results in partial entries being read. To remedy this you need to get IIS to generate a single log file per server (the default is per website). Once that is done, we can read the IIS logs with this config:

input {
  file {
    type => "iis"
    path => "C:/inetpub/logs/LogFiles/W3SVC/*.log"
  }
}


Once we have IIS log lines pumping through the veins of Logstash, we need to break each line down into its component fields. To do this we use the Logstash grok filter. IIS logs in W3C format by default, but you are able to select the fields you want output. The following config works for the default fields plus [bytes sent], so we can see bandwidth usage. The Heroku Grok Debugger is a lifesaver for debugging the grok string (paste an entry from your log into it, then paste in your grok pattern).

filter {
  grok {
    match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{WORD:iisSite} %{IPORHOST:site} %{WORD:method} %{URIPATH:page} %{NOTSPACE:querystring} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clienthost} %{NOTSPACE:useragent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:scstatus} %{NUMBER:bytes:int} %{NUMBER:timetaken:int}"]
  }
}


Below is the complete IIS configuration for Logstash. There are a few other filters we use to enrich the event sent to Logstash, as well as a conditional to remove IIS log comments.

input {
  file {
    type => "iis"
    path => "C:/inetpub/logs/LogFiles/W3SVC/*.log"
  }
}
filter {
  # Ignore log comments
  if [message] =~ "^#" {
    drop {}
  }
  grok {
    match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{WORD:iisSite} %{IPORHOST:site} %{WORD:method} %{URIPATH:page} %{NOTSPACE:querystring} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clienthost} %{NOTSPACE:useragent} %{NOTSPACE:referer} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:scstatus} %{NUMBER:bytes:int} %{NUMBER:timetaken:int}"]
  }
  # Set the event timestamp from the log
  date {
    match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
    timezone => "Etc/UCT"
  }
  ruby { code => "event['kilobytes'] = event['bytes'] / 1024.0" }
  # https://logstash.jira.com/browse/LOGSTASH-1354
  #geoip {
  #  source => "clienthost"
  #  add_tag => [ "geoip" ]
  #}
  useragent {
    source => "useragent"
    prefix => "browser"
  }
  mutate {
    remove_field => [ "log_timestamp" ]
  }
}
output {
  elasticsearch {
    host => "127.0.0.1"
  }
}


Centralising Logs with Logstash and Kibana


We have recently centralised our logs (IIS, CRM, and our application of about 5 components) into Elasticsearch on Windows Server, using Logstash as the data transformation pipeline (over RabbitMQ) and Kibana as the UI. It allows us to see all our logs in one place (and, if needed, on a single timeline), and developers can access live logs in a way that lets them easily slice and dice the information without requiring server access. And the front end, Kibana, is damn sexy! It’s dead easy as well. All in all it took about a day to set up.

Architecture

Log Producers

All servers that produce file logs have Logstash installed as a service. Logstash monitors the log file and puts new entries onto a local RabbitMQ exchange. There are much lighter-weight shippers out there; however, they write directly to Elasticsearch, and we wanted something a little more fault tolerant.

Log producers which we control (i.e. our custom components) write directly to RabbitMQ. We use NLog and a modified version (I’ll post more about that later) of the NLog.RabbitMq Target to write our log messages directly (async) to the local RabbitMQ exchange.

Log Server

Our centralised log server runs Elasticsearch (the datastore) and Kibana (the UI). It also has another Logstash agent that reads messages off RabbitMQ, transforms them into more interesting events (extracting fields for search, geolocating IP addresses, etc.), and then dumps them into Elasticsearch.

Cozy Personal Finance Manager

An open-sourced Personal Finance Manager for Cozy Personal Cloud. Currently all my finances are run through Quicken Personal. I hate how closed it is (and how manual data entry is).

I have been trialling https://getpocketbook.com/ for a while, but its intelligence about tracking what a transaction is has been buggy since day 1, and I’m locking myself into a closed-data system.

May have to look at contributing some Australian Bank Interfaces in the near future.

Making Ajax play with Passive ADFS 2.1 (and 2.0) – Reactive Authentication

The first post described the issue of using ADFS and Ajax to create SSO between a WebApp and a WebAPI. This solution looks at changing the WebAPI to return a 401 if the request is not authorized, then using an iFrame to authenticate the user for subsequent calls.

The last solution pre-authorized on the first AJAX call per page load, which adds some overhead. This was because JSONP has no means of returning status codes (not entirely true: you can return a 200 and then have the real response inside a payload, but that is beyond this article). This solution makes use of normal AJAX calls and 401 responses to perform authorization only when it is required.

Caveats

  • This uses normal AJAX calls, so it requires CORS to be enabled on the WebAPI server for cross-domain requests. (See this guide)
  • IE8 & 9 do not support the passing of cookies with cross domain requests and therefore this method will not work as described. However, it should be possible to pass the token in the body of the AJAX request (use POST and HTTPS to maintain security) and write a customized AuthenticationModule to read the token and provide it to the WSFederatedAuthenticationModule. (This is outside the scope of this solution however)

Solution

By default, the WSFederationAuthenticationModule redirects the user to ADFS if the user is not currently authenticated (i.e. there is no valid session cookie). This can be changed with the following code:

FederatedAuthentication.WSFederationAuthenticationModule.AuthorizationFailed += (sender, e) =>
{
    if (Context.Request.RequestContext.HttpContext.Request.IsAjaxRequest())
    {
        e.RedirectToIdentityProvider = false;
    }
};

By adding this code to Application_Start, or an HttpModule, we can make the WebAPI return an HTTP status of 401 every time authentication is required during an AJAX request. We then handle this response in our JavaScript.

The following Gist shows some JavaScript that handles the 401 response and then uses the idea of authenticating in an iFrame from the last solution before retrying the AJAX call. The second attempt should now have the session cookies needed to authorize and succeed.

[gist https://gist.github.com/thejuan/4e535a0c468fa47fd9cc]
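For completeness, here is a minimal, transport-agnostic sketch of the retry logic (the function names are illustrative, and the iFrame authentication step is stubbed out as a callback rather than implemented):

```typescript
// On a 401, authenticate (in the browser this is the hidden-iFrame dance
// against ADFS described above), then retry the request once.
export type DoFetch = (url: string) => Promise<{ status: number; body?: string }>;

export async function withAuthRetry(
  doFetch: DoFetch,
  authenticate: () => Promise<void>,
  url: string
): Promise<{ status: number; body?: string }> {
  const first = await doFetch(url);
  if (first.status !== 401) return first;
  // Not authorized: establish the session cookie, then try again.
  await authenticate();
  return doFetch(url);
}
```

Retrying only once keeps a broken authentication flow from looping forever.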