This assignment is designed to take a day's worth of work (8 hours). If you find that some things are not possible to complete in that timeframe, that's okay – prioritize as you see fit.
Datawrapper Test Project: Backend
The project is to create a web service that takes screenshots of websites.
The service should expose an API that a client can use to create a new screenshot request, with any given URL as a parameter. The web service should then return a status URL that the client can periodically call to receive updates on the status of its request. The web service should take a screenshot of the provided URL. Once the screenshot has been taken, it should be possible for the client to retrieve it. The service should be designed in a scalable fashion, so that it can handle varying amounts of parallel requests in a stable and efficient fashion.
The API notation can be defined as you see fit.
- It should be possible to run the application locally (ideally as a containerized application, but other technologies are allowed as well)
- Any other tools, technologies or frameworks can be used at your discretion
- A hosted version of the application to test is a plus, but not a must
The project should be delivered as a Github repository. It should be possible to run the service locally.
I'd say that this is a very odd test assignment, since it requires one to create an entirely new service, which involves a lot of boilerplate and takes most of the allotted time. This assignment effectively tests whether the applicant is capable of creating a new service on their own, from scratch, which is (hopefully) very unrepresentative of the actual job responsibilities; typically, companies have established procedures for creating new services, often with templates and starters, and they definitely have an established stack and dependencies.
At my current company we're using
tsoa for microservices, with our own
templates and starters.
However, creating a new tsoa project from scratch, without using these
templates, and without looking at the code belonging to the company,
would demand a lot of time.
So I decided to search for an alternative, and settled on NestJS for the
purpose of this assignment.
Note that this is my first experience with NestJS, so I may violate some best practices, just because I'm not aware of them. NestJS also turned out to be surprisingly powerful for creating small proof-of-concept services.
The requirement that the service should be scalable and handle high load calls for an obvious solution: queues. If this were an actual service and not an MVP, the architecture would probably look like this:
- Public service accepts a screenshotting job, stores it in queue;
- A number of private workers (serverless functions, or actual private services) pick up jobs from the queue, perform computation-heavy page rendering and screenshotting, and store screenshots in the same queue/DB;
- Public service, when queried the status of a specific job, returns its status along with the screenshot (if the job is completed already).
This way, quality of service would never degrade for the public service; since it just adds jobs to the queue and checks their status (which are relatively lightweight tasks), it would always be responsive.
And, depending on the load (and budget considerations), new workers could be added to reduce the queue length, and the amount of time clients would need to wait for their jobs to be processed.
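As a rough illustration of the queue-based flow described above, here is a minimal TypeScript sketch. It is not the code from this repository: an in-memory `Map` stands in for the real queue/DB, and all names (`JobQueue`, `runWorker`, the status values other than `completed`) are hypothetical.

```typescript
// Hypothetical job model; a real system would persist this in a queue/DB.
type JobStatus = "queued" | "active" | "completed" | "failed";

interface Job {
  id: string;
  url: string;
  status: JobStatus;
  result?: string; // stands in for the stored screenshot
}

class JobQueue {
  private jobs = new Map<string, Job>();
  private pending: string[] = [];
  private nextId = 1;

  // Public service: enqueue the job and return immediately.
  add(url: string): Job {
    const job: Job = { id: String(this.nextId++), url, status: "queued" };
    this.jobs.set(job.id, job);
    this.pending.push(job.id);
    return job;
  }

  // Public service: cheap status lookup.
  get(id: string): Job | undefined {
    return this.jobs.get(id);
  }

  // Worker: claim the oldest queued job, if there is one.
  take(): Job | undefined {
    const id = this.pending.shift();
    if (id === undefined) return undefined;
    const job = this.jobs.get(id)!;
    job.status = "active";
    return job;
  }
}

// Worker loop: drains the queue using a caller-supplied render function
// and returns the number of jobs it processed.
async function runWorker(
  queue: JobQueue,
  render: (url: string) => Promise<string>,
): Promise<number> {
  let processed = 0;
  for (let job = queue.take(); job !== undefined; job = queue.take()) {
    try {
      job.result = await render(job.url);
      job.status = "completed";
    } catch {
      job.status = "failed";
    }
    processed++;
  }
  return processed;
}
```

The point of the split is that `add` and `get` stay cheap regardless of how slow `render` is; scaling then means running more copies of `runWorker` against a shared queue.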
In order to further reduce load on the public service, it would make sense to use a push model instead of polling; when submitting a new job, clients could provide a callback URL to be notified once the job is completed, instead of constantly checking its status. (Unfortunately, I was unable to implement this within the 8-hour time limit.)
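Since the push model is not implemented in this project, the following is only a sketch of what the notification step might look like. The names (`notifyClient`, `callbackUrl`, `PostJson`) and the retry count are assumptions; the HTTP transport is passed in as a function so that the sketch does not depend on any particular HTTP client.

```typescript
// Hypothetical shape of a finished job that may carry a client callback URL.
interface FinishedJob {
  id: string;
  status: "completed" | "failed";
  callbackUrl?: string;
}

// Transport abstraction: any function that POSTs JSON and reports success.
type PostJson = (url: string, body: unknown) => Promise<{ ok: boolean }>;

// Notifies the client about a finished job, retrying a few times.
// Returns true if the callback was delivered, false otherwise.
async function notifyClient(
  job: FinishedJob,
  post: PostJson,
  maxAttempts = 3,
): Promise<boolean> {
  if (!job.callbackUrl) return false; // client chose polling instead
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const res = await post(job.callbackUrl, { id: job.id, status: job.status });
      if (res.ok) return true;
    } catch {
      // Network error: fall through and try again.
    }
  }
  return false;
}
```

A real implementation would probably add a delay between attempts and keep the status endpoint as a fallback for clients whose callback never succeeds.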
The architecture of this proof-of-concept project roughly resembles the ideal architecture as described above, except that private workers live in the same public service, negating all the potential scalability benefits. However, it seems that with NestJS, workers could be trivially extracted from the public service, as the communication between public controllers and workers is done via Redis in this project. (Using Redis was not a conscious choice; it is just that NestJS integrated queues use Redis internally and do not support other storage/queue backends).
The first idea that came to my mind was that there ought to be a headless Chrome
browser intended to be used for testing.
And sure enough, there is:
the puppeteer package comes with a headless Chrome and
allows taking screenshots.
This should be enough for the proof-of-concept.
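The basic screenshotting flow with a headless browser can be sketched as follows. This is not the code from this repository. To keep the sketch self-contained, it declares small hypothetical interfaces (`BrowserLike`, `PageLike`) that cover only the puppeteer methods it relies on (`newPage`, `goto`, `screenshot`, `close`), and takes the launcher as a parameter.

```typescript
// Minimal structural types covering only the calls this sketch uses.
interface PageLike {
  goto(url: string): Promise<unknown>;
  screenshot(options: { type: "png" | "jpeg" }): Promise<Uint8Array>;
}

interface BrowserLike {
  newPage(): Promise<PageLike>;
  close(): Promise<void>;
}

// Opens the URL in a fresh browser and returns the encoded image bytes.
async function takeScreenshot(
  launch: () => Promise<BrowserLike>,
  url: string,
  type: "png" | "jpeg" = "png",
): Promise<Uint8Array> {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await page.screenshot({ type });
  } finally {
    // Always release the browser process, even if navigation fails.
    await browser.close();
  }
}
```

With puppeteer installed, the launcher would be something like `() => puppeteer.launch()`; depending on the puppeteer version, a small adapter may be needed to satisfy these types. Launching a browser per screenshot is the simplest option but also the slowest, so a real worker would likely reuse one browser across jobs.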
In order to showcase the API capabilities, I originally intended to allow
clients to set additional options for screenshotting.
Unfortunately, I only had time to implement support for
the image type (which could be jpeg or png).
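To illustrate how such an option might be validated before a job is queued, here is a small sketch. The field name `imageType`, the function name `parseScreenshotRequest`, the `png` default, and the restriction to http/https URLs are all assumptions made for the example, not the project's actual API.

```typescript
type ImageType = "jpeg" | "png";

interface ScreenshotOptions {
  url: string;
  imageType: ImageType;
}

// Type guard for the two supported image formats.
function isImageType(value: unknown): value is ImageType {
  return value === "jpeg" || value === "png";
}

// Validates an untrusted request body and returns normalized options.
// Throws an Error describing the first problem it finds.
function parseScreenshotRequest(input: {
  url?: unknown;
  imageType?: unknown;
}): ScreenshotOptions {
  if (typeof input.url !== "string") {
    throw new Error("url is required and must be a string");
  }
  const parsed = new URL(input.url); // throws on malformed URLs
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error("only http and https URLs are supported");
  }
  // Assumed default: png when the client does not specify a type.
  const imageType = input.imageType === undefined ? "png" : input.imageType;
  if (!isImageType(imageType)) {
    throw new Error("imageType must be 'jpeg' or 'png'");
  }
  return { url: parsed.href, imageType };
}
```

Rejecting bad input at the API boundary keeps invalid jobs out of the queue, so workers never spend rendering time on a request that cannot succeed.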
Unfortunately, I didn't have enough time to ensure reasonable coverage, as I had to spend most of these 8 hours getting myself acquainted with NestJS.
There is an end-to-end test that covers the basic positive scenario. Unfortunately, it requires Redis to be up and running; I did not have enough time to decouple it from Redis.
Also note that tests check some of the results against image snapshots stored in the repository, and of course screenshots on your platform may look different from those on mine, which can cause tests to fail (unless snapshots are recreated). Tests pass on my system.
In addition to using this project as an API, you can also use the included
single-page application (available under
http://localhost:3000/spa or similar).
Just enter the target URL, create the job, check its status.
Once the status changes to
'completed', you will see the screenshot.
Note that this is not intended to be a nice or beautiful application; on the contrary, this is something very rough and barely working, quickly hacked together simply to demonstrate how the API can be used, with the goal of spending as little time on it as possible (so it had to be plain JS, without types, without any front-end frameworks, etc.).
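The create-then-poll flow that the single-page application performs by hand could be automated by a client roughly like this. It is a sketch under assumptions: the response shape (`JobStatusResponse`, `screenshotUrl`) and every status value other than `completed` are hypothetical, and the status request itself is passed in as a function instead of being tied to a specific endpoint.

```typescript
// Hypothetical shape of the status endpoint's response.
interface JobStatusResponse {
  status: "queued" | "active" | "completed" | "failed";
  screenshotUrl?: string;
}

// Polls the job status until it completes, fails, or the poll budget runs out.
// Resolves with the screenshot URL on success.
async function waitForScreenshot(
  getStatus: () => Promise<JobStatusResponse>,
  intervalMs = 1000,
  maxPolls = 30,
): Promise<string> {
  for (let poll = 0; poll < maxPolls; poll++) {
    const res = await getStatus();
    if (res.status === "completed" && res.screenshotUrl) {
      return res.screenshotUrl;
    }
    if (res.status === "failed") {
      throw new Error("screenshot job failed");
    }
    // Wait before asking again, to avoid hammering the service.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("timed out waiting for screenshot");
}
```

In practice `getStatus` would wrap a request to the status URL returned when the job was created.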
Since I did not have enough time to actually write the code (instead spending most of it basically on learning NestJS), there are some issues with the code.
In particular, there are basically no tests besides one e2e test for the most
basic scenario, and there is some unneeded code duplication.
Additionally, class naming could be better; and ScreenshotsController
contains a lot of code that does not belong to it (ideally, it would all be in
a separate service, with ScreenshotsController just being an adapter
between that service and the NestJS/API interface).
$ npm install
Running the app
# development
$ npm run start

# watch mode
$ npm run start:dev

# production mode
$ npm run start:prod
Note that you need to have Redis running locally in order to run the app.
After starting the app, navigate to the displayed URL
in order to check out the API docs and the basic single-page screenshotting
application.
# unit tests
$ npm run test

# e2e tests
$ npm run test:e2e

# test coverage
$ npm run test:cov
Note that you need to have Redis running locally in order to run e2e tests.