Designing a scalable URL Shortener like Tiny URL

Sunaina Goyal
8 min readMar 22, 2021

1. Functional Requirements

Url Shortener is a service that creates a short alias URL for a corresponding long URL. So, now whenever the user visits the short URL, she will be redirected to the original URL.

URL shortening services like bit.ly or TinyURL are very popular to generate shorter aliases for long URLs. You need to design this kind of web service where if a user gives a long URL then the service returns a short URL and if the user gives a short URL then it returns the original long URL. For example, shortening the given URL through bit.ly :

Example: Let the original URL be: 
Long URL: https://nlogn.in/horizontal-scaling-and-vertical-scaling/
Short URL is: https://bit.ly/2OYHewm
Whenever you will click the second short url, you will automatically
be redirected to the page referred by the long URL.
Try Now.

2. Design Goals

The URL shortening service should have the following features:

  1. Mandatory Features
  2. Given a long URL it should be able to generate Unique Short URL.
  3. Given a short URL it should redirect to the original URL.
  4. The URL should expire after a timespan.
  5. The system should be highly available. This is really important to consider because if the service goes down, all the URL redirection will start failing.
  6. URL redirection should happen in real-time with minimal latency.
  7. Shortened links should not be predictable.

2. Optional Features

  1. The service should be REST API accessible.
  2. It should provide analytics features i.e, how many times the URL is visited.
  3. A user should be able to pick a custom URL.

3. Capacity Estimation

Let’s start by making some assumptions about the traffic (for scalability) and the length of the URL.

The important thing to note is that the number of reading requests will be 1000 times more than the number of write requests.

Suppose, we are going to receive 1B(1 Billion) new URL’s shorting requests per month, with a 100:1 read/write ratio. So, the total number of redirections:

100*1B = 100B redirections

If we are going to store, each URL(long and corresponding short URL) for almost 10 years, then the numbers of URLs we are going to store are:

1B*120 = 120B URLs

Now if on average every URL is going to be of 200 bytes, then total storage required:

200B*120byte = 24TB storages (Say Whhaaaat!!! )

Be prepared for millions of user requests per day

4. Algorithm REST Endpoints

Let’s starts by making two functions accessible through REST API:

  1. create_shortURL( long_url, api_key, custom_url, expiry_date)
  2. long_url: A long URL that needs to be shortened.
  3. api_key: A unique API key provided to each user, to protect from the spammers, access, and resource control for the user, etc.
  4. custom_url(optional): The custom short link URL, user want to use.
  5. expiry_date(optional): The date after which the short URL becomes invalid.
  6. Return Value: The short Url generated, or error code in case of the inappropriate parameter.
  7. redirect_shortURL( short_url)
  8. short_url: The short URL generated from the above function.
  9. Return Value: The original long URL, or invalid URL error code.

5. Shortening Algorithm

The URL that we have to generate should be a short URL. But how much short? Also, what type of encoding should be done?

Let’s consider we are using 7 characters to generate a short URL. These characters are a combination of 62 characters [A-Z, a-z, 0–9] something like http://ad.com/abXdef2.

URL Encoding — The encoding can be Base36( characters allowed [a-z][0–9]) or Base62( characters allowed [A-Z][a-z][0–9]) or Base64(characters allowed [A-Z][a-z][0–9],’+’,’/’). Since all of the characters in Base64 are URL safe characters, we will choose Base64 as our encoding technique.

Url Length — Using Base64 encoding if we choose the

URL with length 6, will give 64^6 = ~68.7 Billion URLs
URL with length 7, will give 64^7 = ~5 Trillion URLs
URL with length 8, will give 64^8 = ~281 Trillion URLs

We have already estimated that we would be storing 120B URLs, we can safely choose the short URL to be 7 characters long.

Now, to generate a unique short URL, we can calculate the MD5 hash of the long URL, which will produce a 128-bit hash value. Now when we encode the MD5 result to Base64 encoding the resultant string will be 22 characters long.

(MD5 result = 32character output = 32*4bit = 128 bit output.
Base64 encoding will use 6 bit to represent each character,
MD5 -> Base64 give (128/6) ~ 22 character output)

For choosing the short URL, first, we can randomly swap some character of the Base64 encoding result and then pick any 7 characters randomly from the result.

MD5 hash generation

String generateMd5(){// Static getInstance method is called with hashing MD5MessageDigest md = MessageDigest.getInstance("MD5");// digest() method is called to calculate message digest of an input digest() return array of bytebyte[] messageDigest = md.digest(input.getBytes());// Convert byte array into signum representationBigInteger no = new BigInteger(1, messageDigest);// Convert message digest into hex valueString hashtext = no.toString(16);return hastext.substring(0,7);}

Potential Issues — Using the above method we can come across 2 potential issues:

  1. MD5 can create a lot of collisions. For two or many different long URL inputs we may get the same unique id for short URL and that could cause data corruption.
  2. If the user enters the URL in UTF-8 format or URL-encoded format, both are the same but encoding is different. Check here.
  3. MD5 can create a lot of collisions. For two or many different long URL inputs we may get the same unique id for short URL and that could cause data corruption.

Let’s solve the first and third issue:

We can append the user IP address to a the long URL and then do the shortening porcess, in this way we won’t get redundency or duplication. Correct?

No, two users can have same public IP addresses if they are connected via same router.

One solution is to add user-id or API key to the long URL and then do the shortening. This will work fine, but user have to be logged in to create short URL. Let’s stick to this for now.

Let’s solve the second issue:

For every request recieved we will use UTF-8 encoding only. If a URL is URL-encoded, we will first convert it into UTF-8 format. For this we can have our own service or we can use some 3rd party services.

6. Database Design & Choice

Let’s see the data we need to store:

Data Related to user

  1. User ID: A unique user id or API key to make user globally distinguishable.
  2. Name: The name of the user.
  3. Email: The email id of the user
  4. Password: Password of the user to facilitate login feature.
  5. Creation Date: The date on which the user was registered.

Data Related to ShortLink

  1. Short Url: 7 character long unique short URL.
  2. Original Url: The original long URL.
  3. UserId: The unique user id or API key of the user who created the short URL.
  4. Expiration Date: The date after which this short URL should become invalid.

Data Storage Observation

  1. We will be storing Billions of links.
  2. The short link redirection should be fast.
  3. Our service is going to be read heavily.
  4. The only relation that is going to exist is which user created which URL and that too is going to accessed very less.

We have two different choices of databases: 1) Relational Databases(MySQL) 2) NoSQL Databases( Cassandra).

In general, Relational Databases are good if we have lots of complex queires involving joins, but they are slow. NoSQL databases are pathetic at handling the relationship queries but they are faster.

Now, we don’t really need lots of relationship among data, but we do need the fast read and write speed. Hence we will choose NoSQL Database. The key for each row can be the shorturl, because it is going to be globally unique.

Database Sharding

To scale out our database, we need to partition it into several machines or nodes, so that it can store information about billions of URLs. Hence now we can store more data in memory because of more machines or nodes. For database sharding we will use hash based partition technique.

In this technique, we will find the hash of the shorturl we are going to store, and determine the machine/shard in which we are going to store this particular URL. The hash function will randomly distribute the URLs into different partation or shard. We can decide the number of shards we are going to make and then we can choose appropriate hash function that random number representing the partation/shard number.( Ex if we have chosen 512 shards, then hash function will generate a number between [1–512] or [0–511] ).

7. Speeding Up the Read Operation

We know that our database is going to be read heavily. Till now we have find the way to speed up the writing process, but the reads are still slow. So we have to find some way to speed up the reading process.

Caching is the solution. We can cache the URLs that are going to be accessed frequently. For example, a url that appears on the trending page of any social networking website. Hence many people are going to visit the url. We can use the caching servies like memcached.

Things we need to consider after adding the caching layer:

  1. When the cache is full, we need to replace the URLs in cache with the treanding ones. For this we will use LRU(Least Recently Used) policy. The URL in the cace which have been refered the least number of times will be removed.
  2. Synchronizing the cache with the original URL. If the user updates or deletes the original link, the corresponding changes has to be reflected in the cache too.
  3. We can shared the cache too. This will help us store more data in memory because of the more machines. For deciding which thing go to which shard can be done using “Hashing” or “Consistant Hashing”.

Now this is now we have speed up our read and write request, but still our system is prone to network bandwidth bottleneck and single point of failure.

8. Load Balancing

To overcome the problem of limited network bandwidth and single point of failue in url shortener service, we will use Load Balancers. Load Balancer does its magic by diving the traffic among a group of server thus resulting in improved response and availability of a website or application.

To distriubte the load among server we will use the Least Bandwidth Method. This algorithm will choose the server currently serving the least amount of traffic, measured in megabits per second (Mbps).

We can place the Load Balancers between:

  1. The client and the server.
  2. The server and the database.

This is how we will design a highly scalable URL Shortener that can work in realtime.

--

--