Your image upload system is wrong!


Most developers take image uploads too lightly and end up writing garbage code. A real image upload system is far more complex than they imagine. In this blog post I'm going to walk you through the flaws of the usual approach, so you can avoid them and not build trash.

The trash code

Let's talk about what most developers do for uploading images. For this post we're going to assume our server is running on port 3000 and has an /upload route for uploading images.

In most cases developers use a simple flow like this: the client uploads an image to the server, and the server stores the image on the filesystem after some processing and returns the path of that image. Only the path goes into the database, because SQL and NoSQL databases are a poor fit for large binary data like images or videos (they can store blobs, but doing so bloats the database and slows it down). This architecture looks good at a high level and works for hobby projects that only run locally, but it breaks in production.
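To make that flow concrete, here is a minimal Python sketch of it. The function name `handle_upload`, the in-memory `db` dict and the `.bin` file naming are assumptions made up for illustration, not code from a real project. The thing to notice is that the request handler itself does the processing and the blocking disk write before it can respond.

```python
import os
import tempfile
import uuid

# Hypothetical stand-in for a database table: image id -> stored path.
db: dict[str, str] = {}


def handle_upload(data: bytes, upload_dir: str) -> str:
    """Naive handler: process and write the file inside the request itself."""
    image_id = uuid.uuid4().hex
    processed = data  # placeholder for resizing/compression done inline
    path = os.path.join(upload_dir, f"{image_id}.bin")
    # Blocking disk I/O: the client waits until this write finishes.
    with open(path, "wb") as f:
        f.write(processed)
    db[image_id] = path  # only the path goes into the database
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(handle_upload(b"fake image bytes", tmp))
```

Every upload goes through this one code path, so a burst of uploads means a burst of inline processing and disk writes on the same server that handles everything else.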

Why does it break?

Now let us assume we have not a million but just a few thousand users on our software, and they all upload images at once. Now what?

  • Your system has to handle high I/O

  • It has to store thousands of binary files on the filesystem

  • The synchronous task (disk I/O) takes some time

As a result your system can be overwhelmed. Even if it can actually handle that concurrency, some users have to wait several seconds for their image to upload. If the operation takes too long, the user's session may expire, and no one likes to wait.

How do we fix it?

Simply put, we can't fix the problem while keeping this architecture, so we have to prepare ourselves for a new and slightly more complex architecture that actually fixes the issue.

This is the architecture diagram you should follow to fix this issue:

In this architecture we use object storage to hold the images temporarily (object storage is built for file handling) and create a database record, which is what actually serves the client.

So when a user uploads an image, store the binary file in object storage (like Amazon S3 or Cloudflare R2) under a unique key and create a record with a status and a URL field for that particular image. Then push a job containing the image's unique key to a message broker (like Kafka or RabbitMQ). Later, a separate consumer can pick up that job, depending on your pull/push configuration, and process the image (compression, conversion into different formats, metadata extraction and more). Here you can also rely on external APIs like ImageKit for a CDN-based image URL. Finally, the consumer marks the database record as completed.
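The whole flow can be sketched in a few lines of Python. Everything here is an in-memory stand-in chosen for illustration: `object_storage` plays the role of S3 or R2, `records` plays the database, and a `queue.Queue` plays the message broker. The names `upload` and `worker` are hypothetical, and a real system would use the actual client libraries for those services.

```python
import queue
import threading
import uuid

# In-memory stand-ins (assumptions): a real system would use S3/R2,
# a database, and a broker such as Kafka or RabbitMQ.
object_storage: dict[str, bytes] = {}
records: dict[str, dict] = {}
jobs: queue.Queue = queue.Queue()


def upload(data: bytes) -> str:
    """Request path: store the raw bytes, create a record, enqueue a job."""
    key = uuid.uuid4().hex
    object_storage[key] = data
    records[key] = {"status": "processing", "url": None}
    jobs.put(key)
    return key  # respond right away; processing happens elsewhere


def worker() -> None:
    """Consumer: pull jobs, process the image, mark the record completed."""
    while True:
        key = jobs.get()
        if key is None:  # sentinel used to stop the worker
            jobs.task_done()
            break
        raw = object_storage[key]
        processed = raw  # placeholder for compression/format conversion
        object_storage[key] = processed
        records[key] = {"status": "completed", "url": f"/images/{key}"}
        jobs.task_done()


if __name__ == "__main__":
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    key = upload(b"fake image bytes")
    jobs.join()  # wait until the worker has handled the job
    print(records[key])
    jobs.put(None)
    t.join()
```

The request path only does three cheap things (store, record, enqueue), so it can return quickly while the heavy work runs in the consumer.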

While processing is in progress, the UI can show an animation indicating that the image is being uploaded, without freezing the client's UI, which is more intuitive. The client-side code can use a polling strategy (long polling, for example) to fetch the image record from the server and show completion.
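A rough sketch of the client-side waiting logic, written in Python for consistency with the rest of the examples. Note that this is plain interval polling, not true long polling (where the server holds the request open until something changes). The helper `poll_until_done` and the `fetch_status` callback are hypothetical names; in a real client `fetch_status` would be an HTTP call that returns the record's status.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], str],
    interval: float = 0.5,
    timeout: float = 30.0,
) -> bool:
    """Call fetch_status repeatedly until it returns 'completed' or we time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if fetch_status() == "completed":
            return True
        time.sleep(interval)
    return False


if __name__ == "__main__":
    # Fake status source: reports 'processing' twice, then 'completed'.
    answers = iter(["processing", "processing", "completed"])
    print(poll_until_done(lambda: next(answers), interval=0.01))
```

The timeout matters: without it a job that never finishes would leave the UI spinning forever.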

Pros

  • Asynchronous approach instead of synchronous

  • Main server is relieved of the heavy processing work

  • Better UX

  • Latency decreases

Cons

  • More complex to manage and code

  • Main server is coupled to two storage layers (database and object storage) and a queue

This approach works out of the box, but companies still don't use this architecture. You may be wondering what the heck I am talking about, but it's true: this architecture works fine, but it is not scalable. So next we're going to talk about how to actually scale it.

How to scale it?

In this section I am not going to talk about scaling strategies like vertical or horizontal scaling. Instead I am going to talk about the mental model and the architecture an image upload system needs in order to scale properly.

So the architecture below is the one you should follow to scale:

In this architecture we have an event-driven design where our main server, or uploader server, only coordinates the image upload but doesn't actually upload anything itself.

Here, every time a client tries to upload an image, the main server returns a presigned URL, which carries an encoded signature and an expiry limit. Using that URL the client can PUT the image directly into object storage.
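To show the idea behind a presigned URL, here is a simplified Python sketch using an HMAC signature. This is not the scheme S3 or R2 actually use (their signing process is more involved); the secret, the `storage.example.com` host, the query parameter names and the functions `presign_put` and `verify_put` are all assumptions made up for illustration. In practice you would ask the storage provider's SDK to generate the URL.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # assumption: a key known to server and storage


def _sign(method: str, key: str, expires: int) -> str:
    msg = f"{method}\n{key}\n{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def presign_put(key: str, ttl_seconds: int = 300) -> str:
    """Main server: issue a URL that allows a PUT to `key` until it expires."""
    expires = int(time.time()) + ttl_seconds
    query = urlencode({"expires": expires, "signature": _sign("PUT", key, expires)})
    return f"https://storage.example.com/{key}?{query}"


def verify_put(url: str, now: Optional[float] = None) -> bool:
    """Storage side: accept the PUT only if the URL is unexpired and untampered."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    key = parsed.path.lstrip("/")
    return hmac.compare_digest(signature, _sign("PUT", key, expires))


if __name__ == "__main__":
    url = presign_put("images/cat.png", ttl_seconds=60)
    print(url)
    print(verify_put(url))
```

Because the object key and the expiry are both part of the signed message, a client can't reuse the URL for a different key or extend its lifetime without invalidating the signature.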

Now you can configure event notifications on the object storage so that on any PUT event the storage emits an event to the message broker, becoming a producer. The processor then does its own job. You can also set up a Pub/Sub configuration here and connect more listeners to the object storage.
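The storage-as-producer idea can be sketched with a tiny in-memory pub/sub in Python. The classes `EventBus` and `ObjectStore` and the topic name `object.put` are hypothetical and exist only for this illustration; with a real provider you would configure bucket event notifications that deliver to your broker instead.

```python
from collections import defaultdict
from typing import Callable

Listener = Callable[[str], None]


class EventBus:
    """Tiny in-memory pub/sub, standing in for a real message broker."""

    def __init__(self) -> None:
        self._listeners: dict[str, list[Listener]] = defaultdict(list)

    def subscribe(self, topic: str, listener: Listener) -> None:
        self._listeners[topic].append(listener)

    def publish(self, topic: str, key: str) -> None:
        for listener in self._listeners[topic]:
            listener(key)


class ObjectStore:
    """Object storage that emits an event on every PUT."""

    def __init__(self, bus: EventBus) -> None:
        self._bus = bus
        self.objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data
        self._bus.publish("object.put", key)  # storage acts as the producer


if __name__ == "__main__":
    bus = EventBus()
    store = ObjectStore(bus)
    processed: list[str] = []
    bus.subscribe("object.put", processed.append)  # image processor
    bus.subscribe("object.put", lambda k: print("audit:", k))  # extra listener
    store.put("images/cat.png", b"fake image bytes")
    print(processed)
```

Adding another consumer is just one more `subscribe` call, and neither the uploader server nor the client needs to know it exists.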

Pros

  • Much more scalable

  • Decoupled

Cons

  • Hard to implement

Now you know how to handle images and how to scale the system to handle bulk uploads. I have already tried coding it: the repo covers all three approaches, with 2 separate releases for the fixed and the scalable ones, so make sure to check out the repo. Here is the repo link: github. Follow for more interesting content and feel free to connect with me on my socials.