# Assignment

## 👉 Our Expectations

We greatly appreciate your commitment to this process.

The main goal of this task is to:

- Get to know how you work
- Assess your general understanding and expertise in relevant aspects of backend development
- Ultimately, have a conversation with you to discuss the actions you performed, results, and any issues encountered.

We do not want to create a stressful situation for you or interfere with your daily activities. Therefore, you should not invest more than a few hours in executing this task.

Please feel free to contact us at any time if you have any questions.

We are confident that you will do your best, and we are looking forward to reviewing the output with you. 😊

## 💡 **Track and Trace API**

The Track and Trace page enables end users (receivers of a shipment) to monitor the status of their shipment.

Your task is to create a backend API that serves shipment data and provides users with current weather conditions at their location.

### **Guideline**

You have been provided with a CSV file (below) containing sample shipment and article data to seed your data structures. Your task is to create an API application that performs the following:

- Provides a RESTful or GraphQL Endpoint that exposes shipment and article information along with corresponding weather information.
- Allows consumers to look up shipments via tracking number and carrier.

*Hints*:

- Integrate a suitable weather API: Choose a weather API (e.g., OpenWeatherMap, Weatherbit, or any other free API) and fetch the weather information for the respective locations.
- Limit weather data retrieval: Ensure that weather information for the same location (zip code) is fetched at most every 2 hours to minimize API calls.
- You can use any framework or library that you feel comfortable with.

*Nice to have:*

- Provide unit tests and/or integration tests for the application.
- OpenAPI docs
- Implement the API using a well-known backend framework.

### **Solution**

**Deliverables**:

- Application code, tests, and documentation needed to run the code.

Evaluation Criteria:

- Code quality and organization
- Functional correctness
- Efficiency, robustness, and adequate design choices

Discussion Points:

- What were the important design choices and trade-offs you made?
- What would be required to deploy this application to production?
- What would be required to scale this application to handle 1000 requests per second?

### Seed Data

NB: you don’t need to write an import if that is too time-consuming; you can just set up one or two sample items in the DB via any mechanism you like.

```csv
tracking_number,carrier,sender_address,receiver_address,article_name,article_quantity,article_price,SKU,status
TN12345678,DHL,"Street 1, 10115 Berlin, Germany","Street 10, 75001 Paris, France",Laptop,1,800,LP123,in-transit
TN12345678,DHL,"Street 1, 10115 Berlin, Germany","Street 10, 75001 Paris, France",Mouse,1,25,MO456,in-transit
TN12345679,UPS,"Street 2, 20144 Hamburg, Germany","Street 20, 1000 Brussels, Belgium",Monitor,2,200,MT789,inbound-scan
TN12345680,DPD,"Street 3, 80331 Munich, Germany","Street 5, 28013 Madrid, Spain",Keyboard,1,50,KB012,delivery
TN12345680,DPD,"Street 3, 80331 Munich, Germany","Street 5, 28013 Madrid, Spain",Mouse,1,25,MO456,delivery
TN12345681,FedEx,"Street 4, 50667 Cologne, Germany","Street 9, 1016 Amsterdam, Netherlands",Laptop,1,900,LP345,transit
TN12345681,FedEx,"Street 4, 50667 Cologne, Germany","Street 9, 1016 Amsterdam, Netherlands",Headphones,1,100,HP678,transit
TN12345682,GLS,"Street 5, 70173 Stuttgart, Germany","Street 15, 1050 Copenhagen, Denmark",Smartphone,1,500,SP901,scanned
TN12345682,GLS,"Street 5, 70173 Stuttgart, Germany","Street 15, 1050 Copenhagen, Denmark",Charger,1,20,CH234,scanned
```

## That is it - everything else is up to you! Happy coding!

# Solution

I spent around 6 hours implementing this.

## Domain model

The original assignment leaves some questions unanswered, so I tried to answer them myself:

* **Are tracking numbers unique across the entire system, or only within a single carrier?**

The sample data makes it seem as if these were not carriers' tracking numbers but parcellab's own tracking numbers
(serving as an abstraction over actual carriers).

On the other hand, the assignment says that customers should be able to look shipments up by tracking number and carrier.

In the end, I decided to use carrier + tracking number as the identifier (meaning that the carrier is required in the request,
and that requests with the wrong carrier will result in a 404).

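A minimal sketch of the carrier + tracking number identifier, assuming a plain in-memory `Map` as the store. The names `Shipment`, `shipmentKey`, and `findShipment` are hypothetical and are not taken from the repository.

```typescript
// Hypothetical shipment shape; field names are illustrative only.
interface Shipment {
  carrier: string;
  trackingNumber: string;
  status: string;
}

// Combine carrier and tracking number into a single lookup key.
function shipmentKey(carrier: string, trackingNumber: string): string {
  return `${carrier}/${trackingNumber}`;
}

// In-memory stand-in for a repository, seeded with one row from the sample data.
const shipments = new Map<string, Shipment>([
  [
    shipmentKey("UPS", "TN12345679"),
    { carrier: "UPS", trackingNumber: "TN12345679", status: "inbound-scan" },
  ],
]);

// Returns undefined for an unknown pair; an HTTP layer would turn that into a 404.
function findShipment(carrier: string, trackingNumber: string): Shipment | undefined {
  return shipments.get(shipmentKey(carrier, trackingNumber));
}
```

With this key, the right tracking number combined with the wrong carrier simply misses, which matches the 404 behaviour described above.
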
* **For what point exactly should I retrieve the weather?**

The assignment mentions the ZIP code in passing in an unrelated section.
But ZIP codes are not a great way to determine the weather in the receiver's area, because the areas they cover can be very large
(e.g. 0872 in Australia is over 1500 kilometers across, with an area larger than Germany and France combined).

On the other hand, addresses in the seed data are all fake ("Street 1" etc.), and one cannot get weather for them.

It is not clear why anybody would even need this data, or how they are going to use it.
(Using the location of a pickup point would probably make more sense than using the address of a recipient, but we don't have this data.)

In the end, I decided to use addresses, and to replace some of the fake addresses in the seed data with real ones.

Which brings me to the next question...

* **What is the supposed scenario in which caching weather data will be beneficial?**

Do we assume that the application should return data for a lot of different packages to different addresses,
with only infrequent requests (e.g. once a day) for the same package?
Then caching weather data by location will not be of any benefit, and will only increase our cache size
(because the requests will never hit the cache, as every request is for a different location).
Alternatively, we could use coarser coordinates for caching, e.g. rounding them to 0.1 degree (roughly 11km or less),
so that we could still reuse the weather data even for different (but close enough) addresses.
This can be done in `src/packages.service.ts`.

Or do we assume that the application is going to receive a lot of requests for a small number of packages?

For simplicity, in this implementation I went with the latter assumption.

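The coarser-coordinates alternative mentioned above could be sketched as follows. This is not part of the submitted code; `Coordinates` and `coarseWeatherKey` are hypothetical names, and rounding each axis to one decimal place is just one possible way to bucket nearby addresses.

```typescript
// Hypothetical coordinate pair in decimal degrees.
interface Coordinates {
  latitude: number;
  longitude: number;
}

// Round each axis to 0.1 degree and join the result into a cache key,
// so that nearby addresses share one weather cache entry.
function coarseWeatherKey({ latitude, longitude }: Coordinates): string {
  const round = (value: number): string => (Math.round(value * 10) / 10).toFixed(1);
  return `${round(latitude)},${round(longitude)}`;
}
```

Two addresses a few kilometers apart in central Berlin would usually map to the same key and reuse one weather lookup, although points on opposite sides of a bucket boundary still get different keys.
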
* **How to get weather, knowing only the recipient address (including the zip code)?**

Most, if not all, weather APIs accept a location as a parameter, not an address.
So in addition to integrating with a weather API, I also had to integrate with a map (geocoder) API
in order to resolve addresses to locations (latitude + longitude).

* **What do tracking numbers even mean? Why are they not unique?**

I only noticed this at the last moment: in the seed data, there are multiple rows
with the same carrier + tracking number, same addresses, same statuses, but different products.
I guess that this is because the data is supposed to come from some kind of SQL join query, returning one row per product.

I don't think this is a good way to get the data for such an API endpoint, but that's what the source data is.

Had I noticed it earlier, I would have written my code accordingly.
But I only noticed it at the very end.
So I just did a quick workaround to demonstrate that everything works, and changed tracking numbers to be unique.

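For reference, folding the one-row-per-product seed data into one shipment with a list of articles might look like the sketch below. The submitted code does not do this (it uses the workaround described above); `SeedRow`, `Article`, `Shipment`, and `groupRows` are hypothetical names.

```typescript
// One parsed CSV row; property names are illustrative.
interface SeedRow {
  trackingNumber: string;
  carrier: string;
  receiverAddress: string;
  status: string;
  articleName: string;
  articleQuantity: number;
  sku: string;
}

interface Article {
  name: string;
  quantity: number;
  sku: string;
}

interface Shipment {
  trackingNumber: string;
  carrier: string;
  receiverAddress: string;
  status: string;
  articles: Article[];
}

// Group rows that share carrier + tracking number into a single shipment.
function groupRows(rows: SeedRow[]): Shipment[] {
  const byKey = new Map<string, Shipment>();
  for (const row of rows) {
    const key = `${row.carrier}/${row.trackingNumber}`;
    let shipment = byKey.get(key);
    if (shipment === undefined) {
      // First row for this shipment: copy the shipment-level fields.
      shipment = {
        trackingNumber: row.trackingNumber,
        carrier: row.carrier,
        receiverAddress: row.receiverAddress,
        status: row.status,
        articles: [],
      };
      byKey.set(key, shipment);
    }
    shipment.articles.push({ name: row.articleName, quantity: row.articleQuantity, sku: row.sku });
  }
  return Array.from(byKey.values());
}
```
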
* **What weather data should we use?**

It is not clear why anybody would even need this data, or how they are going to use it.

I decided to use the current weather data instead of forecasts.
Since we're refetching the weather data if it's more than 2 hours old,
the "current weather" data will never be more than 2 hours out of date, and will still stay somewhat relevant.

* **What data to return? What if location or weather APIs are unavailable or return an error?**

I decided to always return package data, and to return weather data only if both the location and weather APIs resolved successfully.

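The "package data always, weather only on success" rule can be sketched as below. The function and type names (`buildResponse`, `Geocoder`, `WeatherClient`, and the response shape) are hypothetical; the actual response format of the application may differ.

```typescript
interface Coordinates {
  latitude: number;
  longitude: number;
}

interface Weather {
  temperatureCelsius: number;
}

interface PackageInfo {
  trackingNumber: string;
  receiverAddress: string;
}

// The weather field is simply absent when it could not be determined.
interface PackageResponse {
  package: PackageInfo;
  weather?: Weather;
}

type Geocoder = (address: string) => Promise<Coordinates>;
type WeatherClient = (location: Coordinates) => Promise<Weather>;

async function buildResponse(
  pkg: PackageInfo,
  geocode: Geocoder,
  getWeather: WeatherClient,
): Promise<PackageResponse> {
  try {
    // Both steps must succeed for the weather to be included.
    const location = await geocode(pkg.receiverAddress);
    const weather = await getWeather(location);
    return { package: pkg, weather };
  } catch {
    // Geocoding or weather lookup failed: still return the package itself.
    return { package: pkg };
  }
}
```
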
## Decisions made, implementation details

* **Which framework to use**

Nest.js provides a good starting point, and I have already built services with Nest.js in the past,
so I decided to use it here as well, to save time on learning new boilerplate.

Additionally, I switched the TS config to use the `@tsconfig/strictest` options,
and the eslint config to use the `@typescript-eslint/strict` ruleset.

* **Dependency injection**

All integrations, storages, etc. have proper generalized interfaces;
it is easy enough to switch to another weather / location provider or to another database,
simply by implementing another integration and switching to it as a drop-in replacement
in `src/app.module.ts`.

Additionally, since Nest.js unfortunately does not support typechecking for dependencies yet,
I implemented additional types as a drop-in replacement for some of the Nest.js ones
(`src/dependencies.ts`);
they're quite limited, but they have all the features I used from Nest.js,
and they do support typechecking for dependencies.

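The drop-in replacement idea can be illustrated without any framework. The sketch below is not the code from `src/app.module.ts` or `src/dependencies.ts`; `WeatherProvider`, `FixedWeatherProvider`, and `WeatherSummary` are hypothetical names chosen for the example.

```typescript
interface Weather {
  temperatureCelsius: number;
}

// The consumer depends only on this interface, never on a concrete provider.
interface WeatherProvider {
  getCurrentWeather(latitude: number, longitude: number): Promise<Weather>;
}

// A trivial provider, e.g. for tests; a real one would call a remote API.
class FixedWeatherProvider implements WeatherProvider {
  private readonly temperatureCelsius: number;

  constructor(temperatureCelsius: number) {
    this.temperatureCelsius = temperatureCelsius;
  }

  async getCurrentWeather(): Promise<Weather> {
    return { temperatureCelsius: this.temperatureCelsius };
  }
}

// Swapping providers only changes what is passed to this constructor.
class WeatherSummary {
  private readonly provider: WeatherProvider;

  constructor(provider: WeatherProvider) {
    this.provider = provider;
  }

  async describe(latitude: number, longitude: number): Promise<string> {
    const weather = await this.provider.getCurrentWeather(latitude, longitude);
    return `${weather.temperatureCelsius}°C`;
  }
}
```

Because the dependency is a constructor argument typed by the interface, the compiler rejects a provider that does not implement `getCurrentWeather`.
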
* **Which APIs to use**

I decided to use Open-Meteo for weather, and OSM Nominatim for resolving locations,
because they are free / open-source, don't require any sign-ups / API tokens,
and are easy enough to use.

However, they have strict request limits for the free tier, so in my integration
I limited interaction with them to one request per second.

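One way to keep outgoing calls at least one second apart is a small wrapper that hands out start slots. This is a sketch under my own assumptions, not the integration code from the repository; `rateLimited`, `Sleep`, and the injectable `now` / `sleep` parameters are hypothetical.

```typescript
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => {
    setTimeout(resolve, ms);
  });

// Wrap an async function so that consecutive calls start at least
// minIntervalMs apart. The clock and sleep are injectable for testing.
function rateLimited<A extends unknown[], R>(
  fn: (...args: A) => Promise<R>,
  minIntervalMs: number,
  now: () => number = Date.now,
  sleep: Sleep = realSleep,
): (...args: A) => Promise<R> {
  let nextSlot = 0; // earliest time at which the next call may start
  return async (...args: A): Promise<R> => {
    const startAt = Math.max(now(), nextSlot);
    nextSlot = startAt + minIntervalMs; // reserve the slot before waiting
    const waitMs = startAt - now();
    if (waitMs > 0) await sleep(waitMs);
    return fn(...args);
  };
}
```

A geocoding call could then be wrapped as `rateLimited(geocode, 1000)`, where `geocode` stands for whatever function performs the HTTP request. Callers that arrive in a burst are queued one second apart instead of being rejected.
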
* **Resolving addresses to locations**

One problem with OSM Nominatim is that it returns all objects that match the query.
And very often, there are several different objects matching the same address
(e.g. the building itself, plus all the organizations in it, or other buildings with address supplements).

So I take the mean of all the locations returned by OSM Nominatim, and check whether all returned locations lie
within 0.01 degree (1.1km or less) of the mean.

If they do, the results are close enough together that they probably refer
to more or less the same location, so I return the mean.
If they don't, the results probably refer to different locations, and we cannot determine
which of these locations is the one we're supposed to return (for example, imagine the address
"Hauptbahnhof, Germany"), so I throw an error.

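The mean-and-tolerance check described above could be sketched like this. The name `resolveSingleLocation` is hypothetical, and comparing latitude and longitude separately against the tolerance is my assumption; the repository may measure distance differently. Longitude wrap-around near ±180 degrees is ignored here.

```typescript
interface Coordinates {
  latitude: number;
  longitude: number;
}

// Collapse several geocoder results into one location, or fail if they
// are too spread out to plausibly describe the same place.
function resolveSingleLocation(
  candidates: Coordinates[],
  toleranceDegrees = 0.01,
): Coordinates {
  if (candidates.length === 0) {
    throw new Error("No geocoding results");
  }
  const mean: Coordinates = {
    latitude: candidates.reduce((sum, c) => sum + c.latitude, 0) / candidates.length,
    longitude: candidates.reduce((sum, c) => sum + c.longitude, 0) / candidates.length,
  };
  // Assumption: each axis is checked independently against the tolerance.
  const allClose = candidates.every(
    (c) =>
      Math.abs(c.latitude - mean.latitude) <= toleranceDegrees &&
      Math.abs(c.longitude - mean.longitude) <= toleranceDegrees,
  );
  if (!allClose) {
    throw new Error("Ambiguous address: results are too far apart");
  }
  return mean;
}
```
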
* **Caching**

The assignment says that the weather data should not be fetched more frequently than every two hours for the same location.
(It also says "zip code", but then again, 0872 Australia.)
So I'm caching the weather data for two hours (TTL in `src/clients/weather.ts`).

I'm also caching the location data for a day, because it's unlikely to change very often
(it certainly doesn't change as often as the weather),
and because I don't want to send too many requests to OSM.

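A TTL cache along these lines can be sketched as a wrapper around a loader function. This is an illustration, not the code in `src/clients/weather.ts`; `cachedWithTtl`, the constant names, and the injectable clock are hypothetical.

```typescript
const TWO_HOURS_MS = 2 * 60 * 60 * 1000; // weather TTL from the assignment
const ONE_DAY_MS = 24 * 60 * 60 * 1000; // location TTL chosen above

interface CacheEntry<V> {
  value: V;
  expiresAt: number;
}

// Wrap a loader so that each key is reloaded at most once per ttlMs.
// The clock is injectable so that expiry can be tested without waiting.
function cachedWithTtl<V>(
  load: (key: string) => Promise<V>,
  ttlMs: number,
  now: () => number = Date.now,
): (key: string) => Promise<V> {
  const entries = new Map<string, CacheEntry<V>>();
  return async (key: string): Promise<V> => {
    const hit = entries.get(key);
    if (hit !== undefined && hit.expiresAt > now()) {
      return hit.value; // still fresh
    }
    const value = await load(key);
    entries.set(key, { value, expiresAt: now() + ttlMs });
    return value;
  };
}
```

The same wrapper would serve both caches, for example with `TWO_HOURS_MS` for weather and `ONE_DAY_MS` for geocoding results. Expired entries are overwritten on the next load but never evicted, which a production cache would need to handle.
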
* **Database implementation**

For simplicity, and to make this solution self-contained, I decided to use an in-memory database
for the packages, with simulated 50ms latency (to make it feel more like a remote database).

If needed, it can be replaced by any other database, by creating a new implementation of `PackagesRepository`.

* **Cache implementation**

The same in-memory database (just a JS `Map`) is used, with simulated 20ms latency.

If needed, it can be replaced by any other caching solution,
by creating a new implementation of `ClearableKeyValueStorage`.

(I should also have created an intermediate caching layer handling the TTLs / expirations,
but I didn't have time for that, so this logic _and_ the logic of loading data and storing it in the cache
both currently live in `storage/cache.ts`.)

* **Throttling**

One problem with complicated things like caching is that naive implementations are not really concurrency-safe:
the code assumes that the related cache state will not change while we're working with that cache.

Of course it doesn't really work like that, but as long as there is only one copy of our application running,
and no other applications work with the same data, we can simulate this easily enough by making sure
that concurrency-unsafe code is only executed once at a time for every set of data it operates on.

Since all that code in this application is supposed to be more or less idempotent, there is no need to wait
for it to resolve before calling it again; if some data is requested again for the same key, we can simply
return the previous (not-yet-resolved) promise.

This is implemented in `src/utils/throttle.ts`.

Basically, this means that if e.g. the location for the same address is requested twice at the same time,
the underlying function (checking the cache, querying the API, and storing data in the cache on a cache miss)
will only be called once, and both callers will get the same promise.

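The promise-sharing behaviour described above can be sketched as follows. This is my own minimal version, not the contents of `src/utils/throttle.ts`; the name `shareInFlight` is hypothetical.

```typescript
// Wrap an async function so that concurrent calls with the same key
// share one in-flight promise instead of running the function twice.
function shareInFlight<V>(
  fn: (key: string) => Promise<V>,
): (key: string) => Promise<V> {
  const inFlight = new Map<string, Promise<V>>();
  return (key: string): Promise<V> => {
    const existing = inFlight.get(key);
    if (existing !== undefined) {
      return existing; // same not-yet-resolved promise for every caller
    }
    // Forget the promise once it settles, so later calls run fn again.
    const promise = fn(key).finally(() => {
      inFlight.delete(key);
    });
    inFlight.set(key, promise);
    return promise;
  };
}
```

This only deduplicates work inside a single process. With several application instances or a shared cache such as Redis, some other coordination mechanism would be needed.
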
## How to use the app

To lint: `npm run lint`.

To test: `npm run test` and `npm run test:e2e`.

To start: `npm run start`.

This is a RESTful API. To get package info, send a GET request to e.g. `/packages/UPS/TN12345679`.

## Discussion points

> What were the important design choices and trade-offs you made?

See above ("Domain model", "Decisions made").

> What would be required to deploy this application to production?

First of all, we would need to identify what problem we are solving,
because this application does not seem to actually solve any problem.
Why would anybody need such an endpoint? How are they going to use it?
Without answers to these questions, there is no point in deploying this application anywhere,
and there cannot be any clear understanding of the non-functional requirements.

But we'd also need to use some production-ready, reliable (and probably paid) APIs for geocoding and weather,
an actual database (or API) for retrieving package data (instead of an in-memory key-value record),
and an actual caching solution (e.g. Redis), which will also mean rethinking how `throttle` is used here.

> What would be required to scale this application to handle 1000 requests per second?

This application already handles over 3000 requests per second with ease,
even with an artificial 20ms cache access latency (and an artificial 50ms DB access latency).

```
❯ ab -c 500 -n 100000 http://127.0.0.1:3000/packages/UPS/TN12345679
This is ApacheBench, Version 2.3 <$Revision: 1903618 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests


Server Software:
Server Hostname:        127.0.0.1
Server Port:            3000

Document Path:          /packages/UPS/TN12345679
Document Length:        394 bytes

Concurrency Level:      500
Time taken for tests:   29.422 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      60300000 bytes
HTML transferred:       39400000 bytes
Requests per second:    3398.84 [#/sec] (mean)
Time per request:       147.109 [ms] (mean)
Time per request:       0.294 [ms] (mean, across all concurrent requests)
Transfer rate:          2001.47 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    7   3.5      7      19
Processing:    98  139  11.0    137     251
Waiting:       89  115  12.7    114     222
Total:        103  146  10.7    144     259

Percentage of the requests served within a certain time (ms)
  50%    144
  66%    146
  75%    149
  80%    151
  90%    159
  95%    166
  98%    170
  99%    172
 100%    259 (longest request)
```

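As a sanity check, the headline numbers in the ApacheBench output are consistent with each other. With $N$ completed requests, total time $T$, and concurrency level $c$ (symbols introduced here only for this check):

```math
\text{throughput} = \frac{N}{T} = \frac{100000}{29.422\ \text{s}} \approx 3398.8\ \text{requests/s}
\qquad
\text{mean time per request} = \frac{c}{\text{throughput}} = \frac{500}{3398.84\ \text{s}^{-1}} \approx 0.1471\ \text{s} \approx 147.1\ \text{ms}
```

So the measured throughput is roughly 3.4 times the target of 1000 requests per second, for this single-URL workload.
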
Granted, this is all for the same URL, and it may be slower when requesting different URLs.
But the only reasons why it could be slower (in terms of rps) for different URLs are:

* Sending more requests to remote APIs
  (this only affects local resources: more bandwidth is used,
  more requests are created and responses parsed, more sockets are open at the same time, etc.);
* Using larger caches
  (the performance of a JS `Map` naturally depends on how many entries it holds,
  and there are also memory constraints).

But, depending on the usage profile (how many different packages are requested, and how often?),
1000 requests per second should be doable.