minor rewordings, clarifications and fixes

main
Inga 🏳‍🌈 11 months ago
parent fc138837a7
commit c4abe2ff42
  1. 56
      solution.md

@ -15,10 +15,10 @@ either do a full scan or build some kind of search index.
**Were this a real-world task, I would say that, depending on what problem are we trying to solve,
the solution proposed in README is most likely a wrong solution.
And that enriching local data with OMDB data should be a responsibility of the service that populates the DB and not reads from it;
And that enriching local data with OMDB data should be a responsibility of the service that populates the DB, not of the one that reads from it;
that the right approach would be for DB to contain all the data used in search,
not for API returning data from that DB to fetch all the entries from DB and then fetch all the corresponding entries from OMDB one-by-one
(whether that happens on startup or on every search).**
(regardless of whether that happens on startup or on every search).**
But as I understand, this is supposed to be a coding challenge, not a requirements analysis challenge.
So let's get to the topic of implementing that potentially incorrect solution.
@ -26,7 +26,7 @@ So let's get to the topic of implementing that potentially incorrect solution.
Our "database" here only contains four movies, but README implies that this is just for (extreme) simplicity;
an actual database would presumably contain thousands of titles, and even simple test database would probably contain a hundred.
(If we could rely on the "database" only containing these four movies, then we wouldn't even need to call it a "database",
we could just hardcode enriched data inside the application and use it in searches, without making any external requests,
we could just hardcode enriched data in the application and use it in searches, without making any external requests,
but I guess that's not what this test assignment is about.)
So fetching data from OMDB on every search request is out of question; we must cache OMDB responses somehow,
@ -36,23 +36,27 @@ Additionally, since "Constraints" section mentions to not introduce any system d
Building an index on the cached data (and maintaining it) is clearly outside of the scope of this assignment,
so we'll have to do a full scan on every search request, and at this point we might as well
only do full scans in "database" as well.
It is enough for DB interface to only have three methods:
This means that it is enough for DB interface to only have three methods:
returning "our own data" for a specific movie by "our internal ID", by IMDB ID, and returning data for all movies.
At this point we already made so many assumptions that it won't hurt to make another one, for simplicity of implementation:
that data in local DB doesn't change while the service is running
(it physically cannot, with "local DB" actually being JSON files inside the project directory),
and that data in OMDB for the movies in the local DB doesn't change while the service is running.
At this point we already made so many assumptions that it won't hurt to make another couple, for simplicity of implementation:
1. That data in local DB doesn't change while the service is running
(it physically cannot, with "local DB" actually being JSON files inside the project directory).
Otherwise that would open an entirely new can of worms regarding "cache" invalidation:
e.g. what if IMDB ID of an existing movie changes between requests?
2. That data in OMDB for the movies in the local DB doesn't change while the service is running.
Otherwise we would have to constantly refetch all data for all movies just to update the searchable fields,
so that search provides the correct and relevant results.
So the final implementation idea is:
1. Preload all entries from the local "database" on startup;
2. For each of them, fetch the relevant data from OMDB;
2. For each of them, fetch the relevant data from OMDB on startup;
3. Create "merged movie objects", keep them in memory;
4. For every search request, go through all objects, find matching ones, return them
(with zero calls to OMDB API or to the local "database").
4. For every search request: go through all objects, find matching ones, return them
(with zero calls to OMDB API or to the local "database" in the process).
# Technical decisions made
@ -61,8 +65,8 @@ So the final implementation idea is:
### AWS Lambda Function
Considering everything that was said above regarding the search, I don't think this service
(or any other reasonable implementation of the assignment, as reasonable as it might be),
is suitable for deployment to AWS as Lambda.
(or any other reasonable implementation of the assignment, as reasonable as it might be)
is suitable for serverless deployment.
In order to have the search working, we should fetch data from OMDB for every movie in the local "database", one by one.
Either we do it on every search request (which is unreasonable, and can easily get us above 1000 requests/day quota,
@ -70,7 +74,7 @@ even for the sample database with only four movies in it), or we have to store t
Since the assignment said to not introduce any system dependencies, the only somewhat persistent state we can use in Lambda is in its memory.
However, we should not rely on the way Lambda instances are cold started and shut down.
However, we should not rely on the way Lambda instances are cold started and shut down; this is what "serverless" basically means.
And even if we would rely on that: if our Lambda will get too many requests at the same time, AWS will spin up several instances,
and each one will have to fetch all data from OMDB, which is not really a good idea.
@ -84,22 +88,22 @@ the reasons why the assignment as described in README seems to be a really wrong
Preloading everything into local memory on application startup can reasonably be called "caching".
In this implementation TTL is infinite, because of the assumptions described in "Search feature" section.
In this implementation TTL is infinite, because of the assumptions described near the end of "Search feature" section.
## Efficiency, application scaling
In this implementation, search and get request handlers just do a full scan of preloaded data.
This is not very efficient (the time scales as O(number of movies)), but good enough for low enough numbers of movies
(e.g. even in hundreds).
(definitely if it is below a thousand); latency on small enough local database should be low.
And if local database contains over a thousand movies, we have a bigger problem anyway because we won't be able to
fetch all the needed data from OMDB before hitting OMDB API usage limits.
I didn't do performance measurements because I didn't have enough time for that.
But at least this implementation can be scaled horizontally, improving performance (in terms of numbers of requests handled per second) linearly,
at the expense of extra requests to OMDB API from each instance.
But at least this implementation can be scaled horizontally, improving performance (in terms of numbers of requests handled per second)
linearly with the number of instances, at the expense of extra requests to OMDB API from each instance.
## framework
## Framework
I decided to implement this API with Nest.js, because I already used it on test assignments before,
and it's good and convenient enough for API prototyping.
@ -139,7 +143,7 @@ and that these fields are always present and non-empty, so I thought that it mig
(with `title` and `description` in merged result containing the English data from OMDB,
and `localTitle` and `localDescription` containing the German data from our "database").
Besides title, description and runtime, there is another field present both in OMDB response and in our "database": language.
Besides title, description, runtime and ratings, there is only one other field present both in OMDB response and in our "database": language.
There was no clear way to reconcile the data from OMDB with the data from sample JSONs:
sample JSONs only have one language per movie, as a two-letter code,
while IMDB data contains a list of languages, not all of them even having two-letter codes
@ -164,7 +168,7 @@ that are present in our "database" but not OMDB, or vice versa.
On practice, all four entries present in sample JSONs have corresponding entries in OMDB.
For simplicity (and in order to guarantee that `/api/movies/:id` endpoint always returns full merged `MovieData` object),
I decided to return 404 when at least one of the sources does not have any data.
Changing the implementation so that API returns partial data if only one of the sources knows about the requested movie would be trivial.
Changing the implementation so that API returns partial data when only one of the sources knows about the requested movie would be trivial.
The assignment seems to be a bit contradictory regarding the way how search should work;
it implies that search field name is the same as merged object field name, but then it provides an example as `?director=Frank Miller`
@ -180,12 +184,14 @@ Another thing regarding search is that it seems that for most fields, it does no
So I limited the search feature to a subset of fields, as described in `SearchFilters` type in [`types.ts`](src/types.ts)
and in `SearchDto` in [`movies.controller.ts`](src/movies.controller.ts).
And finally, it does not make a lot of sense to search by an empty query (we probably don't want to just list our entire database to whoever
did not enter any search filters), so I return a 400 Bad Request errors for request to search endpoint with empty query.
And finally, it does not make a lot of sense to search by an empty query (we probably don't want to just list our entire database to anybody
who did not enter any search filters), so I return a 400 Bad Request errors for request to search endpoint with empty query.
## Testing
I did not have any time to come up with a good test cases, so most tests work on samples of data from OMDB and on provided sample JSONs.
The code is mostly covered by tests.
I did not have any time to come up with good test cases, so most tests work on samples of data from OMDB and on provided sample JSONs.
## Git
@ -194,7 +200,7 @@ that I should use git to document my work ("_We will look at the git history of
without squashing or bundling;
but also that you will evaluate the proper use of git.
I decided to go with the first message, and use git to document work history, not to showcase git use that would be proper for PR in a professional environment.
I decided to go with the first message, and use git to document work history, not to showcase git use that would be proper for PRs in a professional environment.
# How to use

Loading…
Cancel
Save