

# Assignment

This assignment is designed to take a day's worth of work (8 hours).
If you find that some things are not possible to complete in that timeframe,
that's okay; prioritize as you see fit.

## Datawrapper Test Project: Backend
### Overview

The project is to create a web service that takes screenshots of websites.

The service should expose an API that a client can use to create a new
screenshot request, with any given URL as a parameter.
The web service should then return a status URL that the client can
periodically call to receive updates on the status of its request.
The web service should take a screenshot of the provided URL.
Once the screenshot has been taken, it should be possible for the client to retrieve it.
The service should be designed in a scalable fashion, so that it can handle
varying numbers of parallel requests in a stable and efficient way.

The API notation can be defined as you see fit.
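One possible notation is a create call that returns a job ID and a status URL, which the client then polls until the job ends. The sketch below shows that polling loop. Every type, field and URL shape in it is a hypothetical choice for illustration, not the API this repository actually exposes. The HTTP call is injected as `fetchStatus` so the loop does not depend on a particular HTTP client.

```typescript
// Hypothetical API shape for illustration; none of these names come from the project.
type JobStatus = 'waiting' | 'active' | 'completed' | 'failed';

interface CreateJobResponse {
  id: string;
  statusUrl: string; // URL the client polls for updates
}

interface JobStatusResponse {
  id: string;
  status: JobStatus;
  screenshotUrl?: string; // present once status === 'completed'
}

// Fetches one status resource; injected so the polling logic is transport-agnostic.
type FetchStatus = (statusUrl: string) => Promise<JobStatusResponse>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls the status URL until the job reaches a terminal state or attempts run out.
async function waitForScreenshot(
  job: CreateJobResponse,
  fetchStatus: FetchStatus,
  intervalMs = 1000,
  maxAttempts = 30,
): Promise<JobStatusResponse> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus(job.statusUrl);
    if (status.status === 'completed' || status.status === 'failed') {
      return status;
    }
    await sleep(intervalMs);
  }
  throw new Error(`Job ${job.id} did not finish after ${maxAttempts} attempts`);
}
```

Against a real server on Node 18+, `fetchStatus` could be `(url) => fetch(url).then((res) => res.json())`. The attempt limit keeps a stuck job from making the client poll forever.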

### Technology stack

* Server-side code should be written in JavaScript or TypeScript
* It should be possible to run the application locally (ideally as a
containerized application, but other technologies are allowed as well)
* Any other tools, technologies or frameworks can be used at your discretion
* A hosted version of the application to test is a plus, but not a must

### Delivery

The project should be delivered as a GitHub repository.
It should be possible to run the service locally.

# Solution

## Service boilerplate

I'd say that this is a very odd test assignment, since it requires one to
create an entirely new service, which involves a lot of boilerplate and takes
most of the allotted time.
This assignment effectively tests whether the applicant is capable of creating a new
service on their own, from scratch, which is (hopefully) very unrepresentative
of the actual job responsibilities; typically, companies have established
procedures for creating new services, often with templates and starters,
and they definitely have an established stack and dependencies.

At my current company we're using `tsoa` for microservices, with our own
templates and starters.
However, creating a new tsoa project from scratch, without using these
templates, and without looking at the code belonging to the company,
would demand a lot of time.
So I decided to search for an alternative, and settled on NestJS for the
purpose of this assignment.

Note that this is my first experience with NestJS, so I may violate some best
practices, just because I'm not aware of them.
NestJS also turned out to be surprisingly powerful for creating small
proof-of-concept services.

## Scalability

The requirement that the service should be scalable and handle high load calls
for an obvious solution: queues.
If this were an actual service and not an MVP, the architecture would probably
look like this:
* A public service accepts a screenshotting job and stores it in a queue;
* A number of private workers (serverless functions, or actual private services)
pick up jobs from the queue, perform computation-heavy page rendering and
screenshotting, and store screenshots in the same queue/DB;
* The public service, when queried for the status of a specific job, returns its
status along with the screenshot (if the job is already completed).
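The split above can be sketched in-process. In this sketch a `Map` plus an array stand in for the queue/DB, and `render` stands in for the Puppeteer step. The class and function names are hypothetical. The real design would replace the store with Redis (or similar) and run workers as separate processes.

```typescript
// In-memory stand-in for the shared queue/DB (hypothetical names throughout).
type JobState = 'waiting' | 'active' | 'completed' | 'failed';

interface Job {
  id: number;
  url: string;
  state: JobState;
  screenshot?: string; // placeholder for the stored image
}

class InMemoryJobStore {
  private nextId = 1;
  private readonly jobs = new Map<number, Job>();
  private readonly pending: number[] = [];

  // Public service: a cheap write that only records the job.
  enqueue(url: string): number {
    const id = this.nextId++;
    this.jobs.set(id, { id, url, state: 'waiting' });
    this.pending.push(id);
    return id;
  }

  // Public service: a cheap read that includes the screenshot once completed.
  status(id: number): Job | undefined {
    return this.jobs.get(id);
  }

  // Worker side: claim the oldest waiting job, if any.
  claim(): Job | undefined {
    const id = this.pending.shift();
    if (id === undefined) return undefined;
    const job = this.jobs.get(id)!;
    job.state = 'active';
    return job;
  }
}

// Stand-in for the heavy rendering/screenshotting step.
type Render = (url: string) => Promise<string>;

// One worker: claims jobs until the queue is empty, storing results back in the store.
async function runWorker(store: InMemoryJobStore, render: Render): Promise<void> {
  while (true) {
    const job = store.claim();
    if (job === undefined) return;
    try {
      job.screenshot = await render(job.url);
      job.state = 'completed';
    } catch {
      job.state = 'failed';
    }
  }
}
```

Here `claim()` is synchronous, so two workers in one Node process cannot take the same job. With an external queue, that atomic hand-off is the queue library's job.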

This way, the quality of service would never degrade for the public service:
since it only adds jobs to the queue and checks their status (which are relatively
lightweight tasks), it would always stay responsive.

And, depending on the load (and budget considerations), new workers could be
added to reduce the queue length, and the amount of time clients would need to
wait for their jobs to be processed.
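A rough worked estimate shows how adding workers shortens the wait. It assumes every job takes the same time and no worker sits idle while jobs are queued. Real page loads vary widely, so this only indicates the trend. The function name is hypothetical.

```typescript
// Time to drain a backlog, assuming uniform job duration and fully busy workers.
function drainTimeSeconds(queuedJobs: number, workers: number, secondsPerJob: number): number {
  if (workers < 1) throw new RangeError('need at least one worker');
  // Each worker handles ceil(jobs / workers) jobs back to back.
  return Math.ceil(queuedJobs / workers) * secondsPerJob;
}

drainTimeSeconds(120, 1, 3); // 360 seconds
drainTimeSeconds(120, 4, 3); // 90 seconds
```

Quadrupling the workers cuts the worst-case wait for the last job in the queue from six minutes to a minute and a half.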

In order to further reduce the load on the public service, it would make sense to
use a push model instead of polling: when submitting a new job, clients
could provide a callback URL to be notified once the job is completed,
instead of constantly checking its status.
(Unfortunately, I was unable to implement this within the 8-hour time limit.)
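Since this was not implemented, here is a minimal sketch of what the push model could look like on the worker side. It assumes a hypothetical optional `callbackUrl` field on the job. After finishing, the worker POSTs the result there, retrying a few times. The transport is injected as `post`.

```typescript
// Hypothetical job payload; `callbackUrl` is not part of the current API.
interface FinishedJob {
  id: string;
  status: 'completed' | 'failed';
  screenshotUrl?: string;
  callbackUrl?: string;
}

// Sends a JSON body to a URL; injected so the retry logic is transport-agnostic.
type PostJson = (url: string, body: unknown) => Promise<{ ok: boolean }>;

// Notifies the client, retrying on errors; returns whether delivery succeeded.
// A delivery failure does not change the job itself: the client can still poll.
async function notifyClient(job: FinishedJob, post: PostJson, maxAttempts = 3): Promise<boolean> {
  const url = job.callbackUrl;
  if (!url) return false; // client chose polling instead
  const body = { id: job.id, status: job.status, screenshotUrl: job.screenshotUrl };
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const res = await post(url, body);
      if (res.ok) return true;
    } catch {
      // Network error: retry. A production version would add a backoff delay here.
    }
  }
  return false;
}
```

On Node 18+, `post` could wrap the global `fetch` with `method: 'POST'` and a JSON body. The `Response` it returns already has the `ok` flag the sketch checks.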

The architecture of this proof-of-concept project roughly resembles the ideal
architecture described above, except that the private workers live in the same
public service, negating all the potential scalability benefits.
However, it seems that with NestJS, workers could be trivially extracted from
the public service, since the communication between public controllers and
workers goes through Redis in this project.

## Screenshotting

The first idea that came to my mind was that there ought to be a headless Chrome
browser intended for testing.
And sure enough, there is: the `puppeteer` package comes with a headless `Chrome`
and can take screenshots.
This should be enough for the proof-of-concept.

In order to showcase the API capabilities, I originally intended to let clients
set additional screenshotting options.
Unfortunately, I only had time to implement support for `imageType` (which
can be `jpeg` or `png`).
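Supporting an option like `imageType` mostly means validating client input before it reaches the renderer. Puppeteer's `page.screenshot()` takes the format as its `type` option. The sketch below shows one way to do the validation. The default of `png` and the MIME mapping used when serving the file are assumptions, not taken from this project.

```typescript
// Formats accepted by the service; the value would be passed to page.screenshot({ type }).
const IMAGE_TYPES = ['jpeg', 'png'] as const;
type ImageType = (typeof IMAGE_TYPES)[number];

// MIME type to send when the client downloads the screenshot.
const CONTENT_TYPES: Record<ImageType, string> = {
  jpeg: 'image/jpeg',
  png: 'image/png',
};

// Validates the client-supplied option; defaulting to png when omitted is an assumption.
function parseImageType(value: unknown): ImageType {
  if (value === undefined) return 'png';
  if (typeof value === 'string' && (IMAGE_TYPES as readonly string[]).includes(value)) {
    return value as ImageType;
  }
  throw new RangeError(`imageType must be one of: ${IMAGE_TYPES.join(', ')}`);
}
```

Keeping the accepted values in one constant lets the type, the validation and the error message stay in sync as formats are added.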

## Tests

Unfortunately, I didn't have enough time to ensure reasonable coverage, as I had
to spend most of these 8 hours getting myself acquainted with NestJS.

There is an end-to-end test that covers the basic positive scenario.
Unfortunately, it requires Redis to be up and running; I did not have enough
time to decouple it from Redis.
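One common way to decouple such a test from Redis is to have the controller depend on a small queue interface, then supply an in-memory fake in tests. The sketch below assumes a hypothetical `ScreenshotQueue` interface. The project currently talks to Redis directly, so this is not its code. The fake completes every job immediately.

```typescript
// Hypothetical abstraction over the Redis-backed queue.
type QueueJobStatus = 'waiting' | 'active' | 'completed' | 'failed';
type ImageType = 'jpeg' | 'png';

interface ScreenshotQueue {
  add(url: string, imageType: ImageType): Promise<string>;
  getStatus(id: string): Promise<QueueJobStatus | undefined>;
}

// Test double: stores jobs in memory and reports them as completed right away.
class FakeScreenshotQueue implements ScreenshotQueue {
  private readonly jobs = new Map<string, { url: string; imageType: ImageType }>();

  async add(url: string, imageType: ImageType): Promise<string> {
    const id = String(this.jobs.size + 1);
    this.jobs.set(id, { url, imageType });
    return id;
  }

  async getStatus(id: string): Promise<QueueJobStatus | undefined> {
    return this.jobs.has(id) ? 'completed' : undefined;
  }
}
```

If the real queue were registered as a NestJS provider, an e2e test could swap in the fake with `Test.createTestingModule(...).overrideProvider(token).useValue(new FakeScreenshotQueue())` from `@nestjs/testing`. That would let the HTTP layer be tested without a running Redis.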

Also note that the tests check some of the results against image snapshots stored
in the repository, and of course screenshots on your platform may look different
from those on mine, which can cause tests to fail (unless the snapshots are recreated).
Tests pass on my system.

## Usage

In addition to using this project as an API, you can also use the included
single-page application (available under `http://localhost:3000/spa` or similar).

Just enter the target URL, create the job, check its status.
Once the status changes to `'completed'`, you will see the screenshot.

## Other considerations

Since I did not have enough time to actually write the code (having spent
most of it on learning NestJS), there are some issues with the code.

In particular, there are basically no tests besides one e2e test for the most
basic scenario, and there is some unnecessary code duplication.
Additionally, class naming could be better, and `ScreenshotsController`
contains a lot of code that does not belong in it (ideally, it would all live in
some `ScreenshotsService`, with `ScreenshotsController` just being an adapter
between that service and the NestJS/API interface).
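The intended split might look roughly like this. The service holds the job logic, and the controller only translates requests and responses. This plain-TypeScript sketch leaves out the NestJS decorators (`@Injectable()` on the service, `@Controller()` and route decorators on the controller). Apart from the two class names, every field, method and URL in it is a hypothetical illustration, not the repository's code.

```typescript
type ImageType = 'jpeg' | 'png';

interface JobRecord {
  id: string;
  url: string;
  imageType: ImageType;
  status: 'waiting' | 'active' | 'completed' | 'failed';
}

// All business logic lives here; no HTTP or framework concerns.
class ScreenshotsService {
  private readonly jobs = new Map<string, JobRecord>();
  private nextId = 1;

  createJob(url: string, imageType: ImageType): JobRecord {
    if (!/^https?:\/\//.test(url)) throw new RangeError(`Unsupported URL: ${url}`);
    const job: JobRecord = { id: String(this.nextId++), url, imageType, status: 'waiting' };
    this.jobs.set(job.id, job);
    return job;
  }

  getJob(id: string): JobRecord | undefined {
    return this.jobs.get(id);
  }
}

// Thin adapter: maps request data to service calls and service results to responses.
class ScreenshotsController {
  private readonly service: ScreenshotsService;

  constructor(service: ScreenshotsService) {
    this.service = service; // in NestJS this would be constructor injection
  }

  create(body: { url: string; imageType?: ImageType }) {
    const job = this.service.createJob(body.url, body.imageType ?? 'png');
    return { id: job.id, statusUrl: `/screenshots/${job.id}` };
  }

  // Returns undefined for unknown IDs; a NestJS controller would throw NotFoundException.
  status(id: string) {
    const job = this.service.getJob(id);
    return job ? { id: job.id, status: job.status } : undefined;
  }
}
```

With this split, the service can be unit-tested without starting the HTTP server. Moving it into a separate worker later would not affect the controller.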

# Commands

## Configuration

Check `.env`

## Installation

```bash
$ npm install
```

## Running the app

```bash
# development
$ npm run start

# watch mode
$ npm run start:dev

# production mode
$ npm run start:prod
```

After starting the app, navigate to the displayed URL (`http://localhost:3000`)
to check out the API docs and the basic single-page screenshotting
application.

## Test

```bash
# unit tests
$ npm run test

# e2e tests
$ npm run test:e2e

# test coverage
$ npm run test:cov
```

**Note that you need to have Redis running locally in order to run e2e tests**