VPN
We generally use a VPN (Virtual Private Network) to test whether a piece of content is accessible from another country. With a VPN, our servers are tricked into treating the request as if it came from another country. How does this happen?
On a simple level, when a VPN app is running on my device, it sends all my requests (app or browser requests) to a VPN server located in another country. The VPN server then forwards these requests to the destination server. Since the request now arrives from the VPN server’s address in that country, the destination server responds as it would to a user in that country. This response is then forwarded back to me. Thus the VPN acts as a middleman, or proxy, for my requests.
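To make the middleman idea concrete, here is a minimal TCP relay using Python’s asyncio. It accepts a connection, opens its own connection to the destination, and copies bytes in both directions, so the destination only ever sees the relay’s address. This is a sketch of the forwarding part only, under loose assumptions: a real VPN also encrypts traffic and usually tunnels at the IP level rather than per TCP connection. The names start_relay, handle_client and pipe are hypothetical.

```python
import asyncio


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while data := await reader.read(4096):
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass  # the other side went away; stop relaying
    finally:
        writer.close()


async def handle_client(client_reader, client_writer, dest_host, dest_port):
    """Act as the middleman: connect to the destination on the client's behalf."""
    try:
        dest_reader, dest_writer = await asyncio.open_connection(dest_host, dest_port)
    except OSError:
        client_writer.close()
        return
    # Relay both directions concurrently; the destination sees the relay's address.
    await asyncio.gather(
        pipe(client_reader, dest_writer),
        pipe(dest_reader, client_writer),
    )


async def start_relay(listen_host, listen_port, dest_host, dest_port):
    """Listen locally; every accepted connection is forwarded to dest_host:dest_port."""
    async def on_connect(reader, writer):
        await handle_client(reader, writer, dest_host, dest_port)

    return await asyncio.start_server(on_connect, listen_host, listen_port)
```

To keep it running, a main() coroutine could await start_relay(...) and then server.serve_forever(), started with asyncio.run(main()).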
Slack Integration
At scale, time is of the essence: issues must be identified and resolved quickly, and collating data from multiple sources to debug an issue is too slow. If your workspace is on Slack, getting alerts there increases visibility and saves time in a crisis, when it matters most. Slack can also be used to send alerts for background tasks like Celery tasks, Jenkins deployments, …
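A minimal sketch of sending such an alert through a Slack incoming webhook, which accepts a JSON body like {"text": "..."} via POST. It uses only the standard library. The webhook URL is a hypothetical placeholder, and build_slack_request and send_slack_alert are illustrative names, not part of any Slack SDK.

```python
import json
import urllib.request

# Hypothetical placeholder: a real incoming-webhook URL comes from your Slack app settings.
WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build a POST request carrying the JSON body an incoming webhook expects."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_alert(text: str, webhook_url: str = WEBHOOK_URL) -> int:
    """Send the alert over the network and return the HTTP status code."""
    req = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

Separating request construction from sending keeps the payload easy to check without hitting the network. A function like send_slack_alert could be called from a Celery task_failure signal handler or at the end of a Jenkins job.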
Gunicorn
During development, we run the Django application using python manage.py runserver. But that is not recommended for production use. Why so?
Firstly, with DEBUG enabled, as it usually is in development, detailed error pages are returned that expose the internals of the system. This is a security risk in production. Secondly, the application isn’t restarted if it crashes with an error. Memory leaks accumulate over time and eventually crash the server, because the process is never recycled. Load can’t be balanced across CPU cores, since everything runs in a single process. Static files are served by that same Python process, competing with application requests. The dev server does use threads by default, but a single Python process is still limited by the GIL for CPU-bound work, so it handles concurrent load poorly and overall response time goes up. There isn’t connection pooling for database connections either. Overall, it is not a reliable setup for production.
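Gunicorn configuration files are plain Python, so here is a minimal gunicorn.conf.py sketch showing the settings that address several of the points above. The bind address assumes a reverse proxy such as nginx in front, and the numbers are illustrative starting values, not tuned recommendations.

```python
# gunicorn.conf.py: illustrative values only; adjust for your own workload.
import multiprocessing

# Listen locally; assumes a reverse proxy (e.g. nginx) forwards traffic here.
bind = "127.0.0.1:8000"

# Several worker processes spread load across CPU cores. The master process
# respawns a worker that crashes. (2 x cores) + 1 is gunicorn's suggested starting point.
workers = multiprocessing.cpu_count() * 2 + 1

# Restart each worker after roughly this many requests, so slow memory leaks
# cannot accumulate forever. The jitter keeps all workers from restarting together.
max_requests = 1000
max_requests_jitter = 50

# Kill and replace a worker that is silent for longer than this many seconds.
timeout = 30

# Write the access log to stdout.
accesslog = "-"
```

With a hypothetical project module named myproject, it would be started with gunicorn -c gunicorn.conf.py myproject.wsgi:application. Note that gunicorn itself does not add database connection pooling or hide DEBUG output. Those still come from Django settings.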
Aspera
I heard about Aspera for the first time today. It is not file storage but a file transfer technology by IBM. But what is the benefit, and why do you need another technology? Let’s dig deeper.
Say you are a production house that produces movies or videos. The raw content is generally quite large, often hundreds of GB and sometimes even TBs. Transferring this much content over the network is so slow that physically shipping it on a hard disk can be faster. Aspera shines in such cases: regardless of the size of the content, you get consistent speed. This is what happens internally.
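To put the sizes in perspective, here is a back-of-the-envelope sketch. One commonly cited reason plain TCP transfers are slow over long distances is that a single connection’s throughput is capped at roughly window size / round-trip time, regardless of link speed. Aspera’s FASP protocol is UDP-based, which is how it sidesteps this limit. The 500 GB size, 64 KiB window (the maximum without TCP window scaling) and 200 ms RTT below are illustrative assumptions, not measurements.

```python
def transfer_hours(size_gb: float, throughput_mbps: float) -> float:
    """Hours to move size_gb gigabytes (10^9 bytes) at throughput_mbps (10^6 bits/s)."""
    bits = size_gb * 1e9 * 8
    return bits / (throughput_mbps * 1e6) / 3600


def tcp_window_limit_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Rough upper bound on one TCP connection's throughput: window / RTT, in Mbit/s."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1e6


# Illustrative numbers only.
capped = tcp_window_limit_mbps(64 * 1024, 200)  # about 2.6 Mbit/s
print(f"500 GB on a fully used 1 Gbit/s link: {transfer_hours(500, 1000):.1f} h")
print(f"500 GB capped at {capped:.2f} Mbit/s: {transfer_hours(500, capped):.0f} h")
```

On a fully used 1 Gbit/s link, 500 GB takes a little over an hour. Capped at about 2.6 Mbit/s, it takes roughly 424 hours, which is why shipping a disk can win.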
Apache Bench
I wanted to replace the python manage.py runserver command with uwsgi for production. But before making any decision, I wanted to assess it with some data. Even if some blog says uwsgi handles concurrent load, memory management and caching better, why should I believe it? How can I test it myself?
For starters, I could write a Python script that sends n requests concurrently using async or threads. But is there a readily available option? When I explored, I found one: Apache Bench (ab).
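For comparison, here is a minimal sketch of the do-it-yourself script described above. It uses a thread pool to send n GET requests with a fixed concurrency and reports throughput and latency. load_test and timed_get are hypothetical names, and the stdlib urllib client is an assumption. It is far less rigorous than a dedicated tool.

```python
from __future__ import annotations

import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def timed_get(url: str) -> tuple[int, float]:
    """Fetch url once; return (HTTP status, seconds taken). Status 0 means no response."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # server answered with a 4xx/5xx
    except urllib.error.URLError:
        status = 0  # connection-level failure
    return status, time.perf_counter() - start


def load_test(url: str, n: int = 100, concurrency: int = 10) -> dict:
    """Send n requests with at most `concurrency` in flight and summarize the results."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_get, [url] * n))
    elapsed = time.perf_counter() - start
    latencies = sorted(t for _, t in results)
    return {
        "requests": n,
        "failed": sum(1 for status, _ in results if status != 200),
        "requests_per_sec": n / elapsed,
        "median_ms": latencies[n // 2] * 1000,  # approximate median
        "p95_ms": latencies[min(n - 1, int(n * 0.95))] * 1000,
    }
```

The ready-made equivalent with Apache Bench is ab -n 1000 -c 50 http://127.0.0.1:8000/, which sends 1000 requests, 50 at a time. Running the same command against runserver and against uwsgi gives the comparison data.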
Building Jenkins
I have used Jenkins for deployment jobs, but that’s the extent of my usage. I was about to deep-dive into Jenkins, but before that a thought struck me: if I had to build Jenkins myself, how would I do it?
For me, Jenkins is an automated deployment tool. So, before automating anything, how would I do the deployment manually? Say I want to deploy a Django project from GitHub to production. I would follow these steps:
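Whatever the exact manual steps are, automating them comes down to running them in order and stopping at the first failure. Here is a minimal sketch of that core loop. Step and run_pipeline are hypothetical names, and DEPLOY_STEPS is an illustrative list, not the step list from this post.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    command: list[str]  # argv list, run without a shell


# Hypothetical example steps for a Django deploy; adapt to your own process.
DEPLOY_STEPS = [
    Step("pull code", ["git", "pull"]),
    Step("install deps", ["pip", "install", "-r", "requirements.txt"]),
    Step("migrate", ["python", "manage.py", "migrate"]),
    Step("collect static", ["python", "manage.py", "collectstatic", "--noinput"]),
]


def run_pipeline(steps: list[Step], cwd: str | None = None) -> list[str]:
    """Run steps in order, abort at the first non-zero exit, return completed step names."""
    completed = []
    for step in steps:
        print(f"==> {step.name}: {' '.join(step.command)}")
        result = subprocess.run(step.command, cwd=cwd)
        if result.returncode != 0:
            print(f"!! {step.name} failed with exit code {result.returncode}; aborting")
            break
        completed.append(step.name)
    return completed
```

Everything else Jenkins offers, such as triggers on a git push, build history, logs and notifications, can be seen as layers around a loop like this.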
Sentry - For The Black
“Where can I find the logs?” was one of the first questions I asked while getting to know our system. I came from a background where you read logs to find issues, so I was taken aback when I heard that the team doesn’t capture logs. I kept wondering how they resolved issues in production.
I have now worked on this system for 5+ years and have rarely used logs. It brings a smile to my face when a new joiner asks for logs. With Sentry, catching errors has become quite easy: enable Sentry on one of the servers, and whenever any code path throws an error, an alert is raised immediately. Whoever shipped the buggy code is responsible for fixing it right away. A Sentry alert gives you all the details you would ever need to fix the bug. Which API? Which machine? Which line? What error? What were the local variables? … Thus we have been able to scale to millions of users without ever depending on logs. Any bug that creeps in gets killed at a nascent stage.
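Sentry’s SDK collects this context automatically. To make the “Which line? What error? What were the local variables?” part concrete, here is a stdlib-only sketch of the kind of report an error tracker builds from an exception. It is not Sentry’s API. build_error_report and the buggy average function are hypothetical.

```python
import socket
import traceback


def build_error_report(exc: BaseException) -> dict:
    """Collect the kind of context an error tracker attaches to an alert."""
    tb = traceback.TracebackException.from_exception(exc, capture_locals=True)
    frame = tb.stack[-1]  # innermost frame: where the error was actually raised
    return {
        "host": socket.gethostname(),  # which machine
        "error": f"{type(exc).__name__}: {exc}",  # what error
        "file": frame.filename,
        "function": frame.name,  # which function / API handler
        "line": frame.lineno,  # which line
        "locals": dict(frame.locals or {}),  # local variables, as repr() strings
    }


def average(values):
    """Deliberately buggy example: crashes on an empty list."""
    total = sum(values)
    return total / len(values)


if __name__ == "__main__":
    try:
        average([])
    except ZeroDivisionError as err:
        print(build_error_report(err))
```

In a real Django project you would not hand-roll this. You would install sentry-sdk and call sentry_sdk.init(dsn=...) in settings.py, and unhandled exceptions would be reported with this context and more.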
Maintaining A Database
We use the Django framework in our system, which comes with its own ORM for relational databases. A few of our tables have been in use for 7+ years, so we decided to look into them.
To check how much space a table is using, we can run the following Postgres query:
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as total_size,
pg_size_pretty(pg_relation_size(schemaname||'.'||tablename)) as table_size,
pg_size_pretty(pg_indexes_size(schemaname||'.'||tablename)) as indexes_size
FROM pg_tables
WHERE tablename = '{table_name}';
It turned out that the table was using 1.5 GB of space and its indexes 6.5 GB. This came as a surprise, as I had always assumed that indexes are smaller than the table. Individually they usually are, but the number of indexes you create also matters, and we had 50+ indexes on this table. Obviously, not all of them were being used. So I checked the indexes and their usage with this query:
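One way to get this information, offered as a sketch and not necessarily the exact query used here, is Postgres’s pg_stat_user_indexes view. Its idx_scan column counts index scans since the last statistics reset. The SQL is wrapped in small Python helpers. INDEX_USAGE_SQL, fetch_index_usage and unused_indexes are hypothetical names. conn is assumed to be any DB-API connection with %s placeholders, such as psycopg2 or Django’s django.db.connection.

```python
INDEX_USAGE_SQL = """
SELECT
    indexrelname AS index_name,
    idx_scan,
    pg_relation_size(indexrelid) AS index_bytes
FROM pg_stat_user_indexes
WHERE relname = %s
ORDER BY pg_relation_size(indexrelid) DESC;
"""


def fetch_index_usage(conn, table_name):
    """Return (index_name, idx_scan, index_bytes) rows for one table."""
    with conn.cursor() as cur:
        cur.execute(INDEX_USAGE_SQL, (table_name,))
        return cur.fetchall()


def unused_indexes(rows, max_scans=0):
    """Return (index_name, index_bytes) for rarely scanned indexes, largest first."""
    candidates = [(name, size) for name, scans, size in rows if scans <= max_scans]
    return sorted(candidates, key=lambda item: item[1], reverse=True)
```

Treat the output as candidates, not a drop list. Primary key and unique indexes enforce constraints even if they are never scanned. The counters also reset with the statistics and are tracked per server, so an index unused on the primary may still serve queries on a read replica.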
The Conundrum: Sync Or Async
Suppose I migrate my code from synchronous to asynchronous. What changes? Let’s think it through.
Synchronous code execution is very predictable and linear. If it makes a network call, it relaxes there until the data arrives. Asynchronous code, on the contrary, is quite busy. The poor guy keeps running in an (event) loop, juggling multiple tasks concurrently. When a task makes a network call, it registers interest in the result and pauses at that point. Meanwhile, the event loop picks up and processes some other task that is ready.
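Here is a small asyncio sketch of the difference, where sleeping stands in for waiting on the network. The sync version waits for each call in turn. In the async version, each await hands control back to the event loop, so the waits overlap. fetch_sync, fetch_async and the 0.1 s delay are hypothetical stand-ins for real I/O.

```python
import asyncio
import time


def fetch_sync(name: str, delay: float) -> str:
    time.sleep(delay)  # blocks the whole thread while "waiting on the network"
    return name


async def fetch_async(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # pauses this task; the event loop runs others meanwhile
    return name


def run_sync(names, delay):
    start = time.perf_counter()
    results = [fetch_sync(n, delay) for n in names]  # one after another
    return results, time.perf_counter() - start


async def run_async(names, delay):
    start = time.perf_counter()
    results = await asyncio.gather(*(fetch_async(n, delay) for n in names))  # overlapping
    return list(results), time.perf_counter() - start
```

With three calls of 0.1 s each, run_sync takes about 0.3 s and run_async about 0.1 s. One practical consequence of migrating: a blocking call such as time.sleep or a synchronous database driver inside an async function stalls the entire event loop, so every library on the request path has to be async-aware.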
Understanding Video
Say I have a video.mp4 file on my system. What exactly does it contain? The outermost layer is called the Container. There are various types of containers: mp4, mkv, mov, flv, … The container holds metadata about the file, like its title, duration, creation date, … Then it contains streams of data. They can be video streams, audio streams, subtitle streams, … This is pictorially explained in the following image:
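To see this layering in an actual MP4 file, here is a sketch that walks the container’s top-level boxes. Each box starts with a 4-byte big-endian size and a 4-byte type. Typically, moov holds the metadata (mvhd, the movie header, plus one trak per stream) and mdat holds the actual audio and video data. iter_boxes and describe are hypothetical helpers. They cover the MP4/MOV box format only, since mkv uses a different (EBML-based) layout.

```python
import struct


def iter_boxes(data: bytes, start: int = 0, end=None):
    """Yield (box_type, payload_start, box_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if size == 1:  # a 64-bit "largesize" follows the type field
            if pos + 16 > end:
                raise ValueError(f"truncated box header at offset {pos}")
            size = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
            header = 16
        elif size == 0:  # the box extends to the end of the enclosing space
            size = end - pos
        if size < header:
            raise ValueError(f"corrupt box at offset {pos}")
        yield box_type.decode("latin-1"), pos + header, pos + size
        pos += size


def describe(data: bytes):
    """List top-level boxes; for moov (the metadata box), also list its child boxes."""
    tree = []
    for box_type, payload, box_end in iter_boxes(data):
        children = []
        if box_type == "moov":  # e.g. mvhd (movie header) and one trak per stream
            children = [t for t, _, _ in iter_boxes(data, payload, box_end)]
        tree.append((box_type, children))
    return tree
```

For a small file, describe(open("video.mp4", "rb").read()) would print something like [("ftyp", []), ("moov", ["mvhd", "trak", "trak"]), ("mdat", [])], though the order varies between files. Reading the whole file into memory is fine for a sketch but not for multi-GB videos.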