Lambda@Edge - A scalable proxy


You have hosted an application in an S3 bucket, pointed a domain at CloudFront, and started using the site. Everything is fine until you realize that refreshing a route returns a 404. The application runs on virtual (client-side) routes that do not exist as files on the server, so the 404 is expected.

 

If you are familiar with static hosting, you will quickly identify the solution: route 404s back to the index page. Simply setting index.html as the error document in the S3 website configuration solves the problem.

 

That works, but it comes at a cost: you end up using one bucket per website. The configuration becomes a nightmare once you want to host multiple sites in a single bucket.

 

One site per bucket is not ideal either, because there is a limit on the number of buckets you can create per account. You can request an increase, but there is a ceiling: you can't go beyond a thousand buckets. Also, some users look for cheaper alternatives to S3, such as Backblaze B2 or Wasabi, where the built-in capability to route the page does not exist.

 

This is where Lambda@Edge helps us resolve the challenge. Lambda@Edge lets us run our code at the edge, which means close to the user's location. The second piece of the puzzle is CloudFront events.

 

There are four types of CloudFront events that can trigger your Lambda@Edge function:

  • Viewer request
  • Origin request
  • Viewer response
  • Origin response
 

I won't go deep into each of them; in short:

  • Viewer request and viewer response - run on every request CloudFront receives
  • Origin request and origin response - run only on a CloudFront cache miss, when the request is forwarded to the origin
 

What we need here are the "Origin Request" and "Origin Response" events. When CloudFront invokes your Lambda function, the event object carries the request along with its headers. The two pieces we will use are the "host" header and the "request" object.

 

The request object contains the "uri" key, which tells us which path is being requested. The host header tells us which domain the request is for.
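
As a rough sketch of where these two values live: Lambda@Edge passes the CloudFront request under `event.Records[0].cf.request`, with header names lowercased and each header stored as an array of `{ key, value }` pairs. The helper name `getDomainAndUri` and the sample event are made up for illustration. Keep in mind that on an origin-request trigger the Host header only carries the viewer's domain if the distribution is set up to forward it; otherwise it holds the origin's domain name.

```javascript
// Hypothetical helper: pull the domain and path out of a CloudFront event.
function getDomainAndUri(event) {
  const request = event.Records[0].cf.request;
  // Header names are lowercased; each one maps to an array of { key, value }.
  const hostHeader = request.headers.host;
  const domain = hostHeader && hostHeader.length > 0 ? hostHeader[0].value : null;
  return { domain, uri: request.uri };
}

// Made-up event, trimmed to the fields used above.
const sampleEvent = {
  Records: [{
    cf: {
      request: {
        uri: '/dashboard/settings',
        headers: { host: [{ key: 'Host', value: 'example.com' }] }
      }
    }
  }]
};

console.log(getDomainAndUri(sampleEvent));
```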

 

First, host your website in the bucket under a root folder named after the website's domain.

Second, write your Lambda redirector/rewriter. For the Lambda, you create a rules file, rules.txt, in which each line holds one rule.

 

Parts of a rule in rules.txt

  • The pattern to match against the request URI
  • The status code: 200 to rewrite the URL, 302 to redirect
  • The URL to replace it with
 

You can also use regular expressions, so you do not need to write a separate rule for each path.
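
To make the three columns concrete, here is a small illustration of what such rules could look like and how a regex rule with a capture group behaves. The patterns, paths, and the `applyRule` helper are invented for this example and are not taken from a real rules.txt. Note that backslashes are doubled only because the rules sit inside JavaScript strings here; in the file itself you would write a single backslash.

```javascript
// Hypothetical rules, one per line: pattern, status code, replacement.
const sampleRules = [
  '^/old-blog/(.*)$ 302 /blog/$1',            // redirect, keeps the rest of the path
  '^/(?!.*\\.[a-z0-9]+$).*$ 200 /index.html'  // rewrite every path without a file extension
];

// Apply the first matching rule to a URI; returns null when nothing matches.
function applyRule(ruleLines, uri) {
  for (const line of ruleLines) {
    const [pattern, statusCode, replacement] = line.split(/\s+/);
    const regex = new RegExp(pattern, 'i');
    if (regex.test(uri)) {
      return { statusCode: Number(statusCode), uri: uri.replace(regex, replacement) };
    }
  }
  return null;
}

console.log(applyRule(sampleRules, '/old-blog/hello'));     // { statusCode: 302, uri: '/blog/hello' }
console.log(applyRule(sampleRules, '/dashboard/settings')); // { statusCode: 200, uri: '/index.html' }
console.log(applyRule(sampleRules, '/assets/app.js'));      // null
```

The second rule is the one that addresses the refresh problem from the start of the article: any virtual route is served index.html, while real asset files pass through untouched.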

 

Save this file as rules.txt in your project folder and deploy it to S3 along with your project.

The next step is to write the Lambda function that reads the file. Let's dive into the code.

Because your files are stored under a folder named after the domain, you can fetch this file from S3 with the key {domain}/rules.txt

 
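// AWS SDK for JavaScript v2, which provides the .promise() call used below.
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

const listParams = {
    Bucket: "your bucket",
    Key: `${domain}/rules.txt`
};
// Must run inside an async function, for example the Lambda handler.
const rulesFileContents = await s3.getObject(listParams).promise();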

Then we parse those rules and keep them per domain:

rules[domain] = parseRules(rulesFileContents.Body.toString('utf8'));
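
The `rules[domain]` assignment suggests the parsed rules are kept in memory per domain, so a warm Lambda container does not have to re-read S3 on every cache miss. One way this could be sketched, assuming a `loadRules` function (hypothetical here) that fetches and parses the file and returns a promise:

```javascript
// In-memory cache that lives as long as the Lambda container stays warm.
const rulesCache = new Map();

// `loadRules` is a stand-in for "fetch {domain}/rules.txt and parse it".
function getRules(domain, loadRules) {
  if (!rulesCache.has(domain)) {
    // Cache the promise itself so concurrent requests share one fetch.
    const pending = Promise.resolve()
      .then(() => loadRules(domain))
      .catch((err) => {
        rulesCache.delete(domain); // do not keep a failed load around
        throw err;
      });
    rulesCache.set(domain, pending);
  }
  return rulesCache.get(domain);
}

// Usage with a fake loader that counts how often it is called.
let loads = 0;
const fakeLoader = async (domain) => { loads += 1; return [`rules for ${domain}`]; };

const first = getRules('example.com', fakeLoader);
const second = getRules('example.com', fakeLoader);
first.then(() => console.log('loads:', loads)); // loads: 1
```

A cache like this never expires, so an updated rules.txt is only picked up when a new container starts. Whether that trade-off is acceptable depends on how often the rules change.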

The parser simply reads the file contents, iterates over each line, and pushes each rule onto an array, which it then returns.

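// Assumed definitions for helpers the original snippet does not show.
const LINE_BREAK = /\r?\n/;
const WHITESPACE = /\s+/;
// Assumption: lines starting with '#' are treated as comments.
const isEmptyOrComment = (line) => line === '' || line.startsWith('#');

function parseRules (fileContents) {
  const lines = fileContents.trim().split(LINE_BREAK);
  const rules = [];
  for (const line of lines) {
    const trimmedLine = line.trim();
    if (isEmptyOrComment(trimmedLine)) continue;

    // Each rule line is: pattern, status code, replacement.
    const [ pattern, statusCode, replacement ] = trimmedLine.split(WHITESPACE);

    let regex;
    try {
      regex = new RegExp(pattern, 'i'); // case insensitive
    } catch (err) {
      console.error(err.message);
      continue; // skip invalid patterns so matching never hits an undefined regex
    }

    rules.push({
      pattern,
      regex,
      statusCode,
      replacement
    });
  }
  return rules;
}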

The next step is to match a rule. If one matches, replace the URI using the rule's replacement and either redirect or rewrite, depending on the status code defined for the rule. If none matches, simply pass the request through unchanged.

const matchedRule = rules.find((rule) => rule.regex.test(request.uri));
if (matchedRule) {
    const { regex, replacement, statusCode } = matchedRule;
    const newLocation = request.uri.replace(regex, replacement);
    if (statusCode >= 300 && statusCode < 400) {
        const response = createRedirect(newLocation, statusCode);
        log('redirect', response);
        return response;
    } else if (IS_URL.test(newLocation)) {
        rewriteRequest(request, newLocation); // mutate request object
        log('full rewrite', request);
        return request;
    } else {
        request.uri = newLocation; // mutate request object
        log('uri rewrite', newLocation);
        return request;
    }
} else {
    log('no match', request.uri);
    return request;
}
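
The branches above depend on an `IS_URL` check that is not shown. A plausible reading, and it is only an assumption, is that it tests whether the replacement is an absolute http(s) URL. Under that assumption, the three outcomes can be summarised as a small pure function (`classify` is a name made up here):

```javascript
// Assumption: a "full rewrite" applies when the replacement is an absolute URL.
const IS_URL = /^https?:\/\//i;

// Hypothetical helper naming the three outcomes of a matched rule.
function classify(statusCode, newLocation) {
  const code = Number(statusCode); // status codes read from a text file are strings
  if (code >= 300 && code < 400) return 'redirect';
  if (IS_URL.test(newLocation)) return 'full rewrite';
  return 'uri rewrite';
}

console.log(classify('302', '/blog/hello'));                // redirect
console.log(classify('200', 'https://api.example.com/v1')); // full rewrite
console.log(classify('200', '/index.html'));                // uri rewrite
```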

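The redirect function looks like this:

function createRedirect(newLocation, statusCode) {
    // Describe the actual status instead of always saying 'Moved Permanently'.
    const descriptions = { 301: 'Moved Permanently', 302: 'Found', 307: 'Temporary Redirect' };
    return {
        status: String(statusCode), // Lambda@Edge expects the status as a string
        statusDescription: descriptions[statusCode] || 'Redirect',
        headers: {
            location: [
                { key: 'Location', value: newLocation }
            ]
        }
    };
}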

And the rewriter function looks like this

function rewriteRequest(request, newLocation) {
    const url = new URL(newLocation);
    const protocol = url.protocol.slice(0, -1); // remove trailing colon


    request.uri = url.pathname;
    request.origin = {
        custom: {
            domainName: url.hostname,
            protocol,
            port: (protocol === 'https') ? 443 : 80,
            path: '', // TODO: probably not ideal to put all this in request.uri
            sslProtocols: ['TLSv1.2', 'TLSv1.1'],
            readTimeout: 5,
            keepaliveTimeout: 5,
            customHeaders: {}
        }
    };
    request.headers.host = [
        { key: 'host', value: url.hostname }
    ];
}
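
One detail worth noting: `url.pathname` does not include the query string, so a replacement such as `https://api.example.com/search?q=edge` would lose `?q=edge` with the function above. Below is a minimal sketch of splitting the target with the standard `URL` class so the query can be carried over into the request's `querystring` field. The `splitTarget` helper and the sample URL are illustrative only.

```javascript
// Hypothetical helper: break an absolute URL into the parts a rewrite needs.
function splitTarget(newLocation) {
  const url = new URL(newLocation);
  const protocol = url.protocol.slice(0, -1); // 'https:' -> 'https'
  return {
    protocol,
    domainName: url.hostname,
    // url.port is '' when the URL uses the default port for its protocol.
    port: url.port ? Number(url.port) : (protocol === 'https' ? 443 : 80),
    uri: url.pathname,
    querystring: url.search.slice(1) // drop the leading '?'; '' when absent
  };
}

const target = splitTarget('https://api.example.com/search?q=edge');
console.log(target);
// Inside rewriteRequest one could then set:
//   request.uri = target.uri;
//   request.querystring = target.querystring;
```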

Deploy your Lambda@Edge function, then link it to the CloudFront events.

If you do not want to write your own Lambda, you can use a library instead: https://github.com/marksteele/edge-rewrite. I haven't tried it, but you are smart enough to figure it out.

 

What you have now is a fully functional proxy that can scale easily with increasing traffic.

 

That's all from me for now. I know I have left some things for you to figure out, but those are trivial; if you still find them difficult, please reach out. I will also write another article on the use of "Origin Response". For any questions, contact me on LinkedIn or send them to rathodm63@gmail.com

Mahendra Rathod
Developer from 🇮🇳
@maddygoround
© 2024 Mahendra Rathod