AWS and Jekyll

This first post will detail some of the steps I went through to set up a working Jekyll-based blog hosted on Amazon S3. The steps will be messy and some details might not be correct - I apologise in advance for these. I would like to go through my own instructions again from scratch with a clean setup to check that they all work exactly as written, but that’s a task for another day.

This article has been updated since first publication. I’ve fixed some errors in my original instructions and have updated other parts based on changes in AWS.

Overview

AWS

Amazon Web Services (AWS) is one of many competing cloud infrastructure platforms. I’ve chosen to use them as they offer a 12-month free period for basic computing and database instances, and also because they are used by my employer so I’m able to develop my skills with real-world technology and learn from the existing infrastructure set up there.

Jekyll

Jekyll is a Ruby-based static-site generator. Unlike heavyweight blogging platforms, it doesn’t need any server-side application to run and doesn’t require a database at all. This does add a lot of restrictions, but it’s a tradeoff that comes from the project’s focus on content instead of ‘bells-and-whistles’.

Local development

Before you go too far down this path, it’s worthwhile checking that Jekyll is suitable for your needs. The instructions under this heading cover installing it and running a basic development server. Note that I’m running Ubuntu 16.04.4 LTS, so the exact steps you need to follow may vary.

Installing Jekyll

To get Jekyll to work, I first had to install the Ruby development packages. You’ll also need to ensure you have gcc, g++ and make on your system - I already had these, so they’re not listed in the commands below.

sudo apt-get install ruby-dev
sudo -H gem install bundler jekyll
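
If you want to double-check that everything installed correctly, you can ask the tools to report their versions - the exact numbers you see will depend on when you install:

jekyll -v
bundler -v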

Creating a project

When Jekyll has finished installing, you’re ready to use it to generate the stub code for your blog. To create the basic structure of a Jekyll site, use the following command. This will create the stub code in a folder called my-blog-name - feel free to change this to suit your taste.

jekyll new my-blog-name
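
The exact contents of the new folder vary between Jekyll versions, but you should see something roughly like this:

cd my-blog-name
ls -a
# Typically includes _config.yml, _posts/, Gemfile, index.md, about.md,
# 404.html and .gitignore, though the details vary by Jekyll version.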

If you’re familiar with the Git version-control system, you may also like to initialise a Git repository at this time and create a baseline commit with all the autogenerated files. This will let you roll back to this point if you make any mistakes and can’t just press undo in your editor.

git init .
git add .gitignore *
git commit -m "Autogenerated Jekyll site"

Running Jekyll’s server

At this point, even without having made any changes whatsoever, you’re ready to run the development server and see how your blog looks! The command you need to execute is as follows:

bundle exec jekyll serve

Jekyll will crunch the numbers for a few seconds, then print the two lines below to report that it’s listening and ready for connections. Open up your web browser of choice, navigate to the server address shown, and you should see your Jekyll blog up and running!

    Server address: http://127.0.0.1:4000/
  Server running... press ctrl-c to stop.
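
The serve command also accepts a few handy flags - for example, if port 4000 is already taken or you want to preview unpublished drafts, something like this should work:

bundle exec jekyll serve --port 4001 --drafts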

To edit the content in your site, you can create or modify the markdown files in _posts, and the changes should be recomputed on-the-fly - so refreshing the page will show the new appearance of the document. Spend some time acquainting yourself with Jekyll and make sure it suits your needs before you do anything in AWS. There are also some additional options that we can pass to Jekyll on the command line - see the Building your Jekyll site heading below.
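
As a quick illustration, a new post is just a Markdown file in _posts whose name starts with a date and which opens with a small block of YAML front matter. The filename and title below are only examples - use whatever suits you:

cat > _posts/2018-06-01-my-first-post.md <<'EOF'
---
layout: post
title: "My first post"
---

Everything below the front matter is ordinary Markdown.
EOF

The ‘post’ layout comes with the default theme; if you’ve swapped themes, use whichever layout name yours provides.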

Serving in the Cloud

Building your Jekyll site

The command we’ve been executing up until now is fine for local development, but when we want to push our changes onto the web there’s a better command to use, along with several extra options. The base command to build a Jekyll site is this:

bundle exec jekyll build

It’s also possible to pass additional arguments to this command to adjust the processing behaviour, such as:

  • --lsi to enable Latent Semantic Indexing, which provides better results under the ‘related posts’ heading on post pages. Without this option, the default ‘related posts’ are simply the most recently created posts. Note that you may need to add classifier-reborn to your Gemfile (see the example after this list).
  • --strict_front_matter to turn on additional error checking, looking specifically at the YAML front-matter of each page. If any errors are found, this will cause the build to fail.
  • --drafts to enable processing of ‘draft’ posts. These are posts in the _drafts directory, which normally do not get processed into HTML pages in the site. Providing this option will generate HTML pages for each of these files, with the timestamp set to the time of the build.
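
Putting that together, a build that uses LSI and strict front-matter checking might look like this - the Gemfile line is only needed if classifier-reborn isn’t already present:

echo 'gem "classifier-reborn"' >> Gemfile
bundle install
bundle exec jekyll build --lsi --strict_front_matter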

There is a full list of the supported command-line arguments (and their equivalent config file values) in the Jekyll documentation.
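
If you’d rather not type the flags every time, my understanding is that the same behaviour can be switched on permanently in _config.yml with keys along these lines:

cat >> _config.yml <<'EOF'
lsi: true
strict_front_matter: true
show_drafts: true
EOF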

Amazon S3

Creating a bucket

The original instructions in this section indicated that the name of the bucket didn’t really matter — this is incorrect. The bucket should be named the same as the domain name at which you plan to host.

The first step to hosting our website in Amazon S3 is creating an ‘S3 bucket’ in the AWS management console. If you prefer the command line, there’s an equivalent after these steps.

  1. Click the ‘Create bucket’ button.
  2. Enter a name for your bucket. To be able to use S3 static website hosting, the bucket name must match the address of your website (such as blog.contoso.net).
  3. Pick a region for your bucket. This determines where Amazon will primarily locate the data, and where requests to your website will be served from. I suggest choosing the region nearest to you so you get the best speeds.
  4. Click next. If your bucket name is already taken, AWS will warn you now.
  5. On the ‘Configure options’ screen, leave everything at defaults unless you have specific needs. Click next.
  6. On the ‘Set permissions’ screen, uncheck the two options that talk about “ACLs”. Leave the two “bucket policies” boxes checked. Click next.
  7. The ‘Review’ screen will then appear, letting you check what settings you’ve entered. Check the bucket name and region, and that the permissions you’ve set are as instructed. Click ‘Create bucket’.
  8. To enable public access to the bucket, go to the ‘Permissions’ tab and then the new ‘Access Control List’ tab that appears.
  9. Under the ‘Public access’ heading, click the circle next to the ‘Everyone’ row. A panel will appear on the right hand side; check the ‘List objects’ box, then click ‘Save’ and you’re done!
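
For reference, the same bucket can be created from the command line if you have the AWS CLI installed and configured - blog.contoso.net and ap-southeast-2 are just example values:

aws s3 mb s3://blog.contoso.net --region ap-southeast-2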

Filling the bucket

You can stick data into an S3 bucket in a couple of ways - either manually using the web console, or automatically through some sort of tool. To simplify my deployment process, I chose to use the Python-based s3cmd utility. This isn’t installed by default, so you’ll need to run the following command:

sudo -H pip install s3cmd
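
If you’d rather not pass your AWS keys on the command line every time (as I do below), s3cmd also has an interactive setup wizard that stores them in a config file in your home directory:

s3cmd --configure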

Once s3cmd has been installed, you can upload a local folder into a bucket using the following command. I’ll also outline each of the options and why I’ve used them, and there’s a small wrapper script after the list.

s3cmd sync --acl-public --delete-removed -M --no-mime-magic --access_key=abc123 --secret_key=abcd1234 "./_site/" s3://bucket-name/

  • s3cmd sync is used to synchronise an entire directory tree into S3. It determines whether files have been modified by comparing their size and MD5 checksum.
  • --acl-public sets the objects with public read permissions, which is necessary for them to be visible on the internet.
  • --delete-removed will clean up any files in the bucket that are not present in the local filesystem. This is especially important: if you delete a blog post locally, this flag removes the corresponding files from the bucket so they’re no longer accessible.
  • -M --no-mime-magic together are used to set how file MIME types are detected. This affects the Content-Type header that the S3 web server will send, and can cause your pages to not render properly if omitted. If your page looks unstyled, it’s likely that your CSS files haven’t had their MIME type detected correctly - set these flags and the problem should be resolved.
  • --access_key and --secret_key are your AWS programmatic access credentials, and are used to identify you to their services. You’ll need to set these up prior to running this command. These can be configured in the IAM service, or by clicking on your username in the AWS console top banner and selecting ‘My Security Credentials’.
  • "./_site/" is the path to the local folder to upload - in this case, the Jekyll build output directory. If you want to upload something else, change this path.
  • s3://bucket-name/ is where you stick in the name of your S3 bucket. You can also specify a subdirectory for it to go into, if you wanted to upload other content; in our case, we need the data to go into the root of the bucket so it can be hosted properly by S3.
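
To save retyping all of that, I wrap the build and upload steps in a tiny shell script. This is just a sketch - the bucket name and the environment variables holding the credentials are placeholders you’d substitute with your own:

#!/bin/sh
# Build the site, then push the output to S3. BUCKET and the two
# credential variables are placeholders - substitute your own values.
set -e
BUCKET=blog.contoso.net
bundle exec jekyll build
s3cmd sync --acl-public --delete-removed -M --no-mime-magic \
  --access_key="$MY_AWS_ACCESS_KEY" --secret_key="$MY_AWS_SECRET_KEY" \
  "./_site/" "s3://$BUCKET/"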

Making it hosted

Once we’ve made a bucket, we need to configure it for static website hosting. Navigate to the S3 console as before to begin.

  1. Select your bucket in the list by clicking on its name, then click on the ‘Properties’ tab.
  2. Click on the ‘Static website hosting’ box, and then select the ‘Use this bucket to host a website’ option.
  3. Fill in the filename for your main index document, and the page that will be served when any errors are encountered (such as missing pages or forbidden access).
  4. Click ‘Save’.
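
If you’d rather script this step too, the AWS CLI can apply the same configuration - index.html and 404.html are just the filenames I use:

aws s3 website s3://blog.contoso.net/ --index-document index.html --error-document 404.html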

Your bucket will now be hosted! If you have already uploaded files into it, you can click the link at the top of the panel. To get your new Jekyll site to load at the domain name you’ve picked, you’ll need to set a CNAME record with your DNS provider — instructions vary, so consult their help docs.
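
The CNAME should point at the bucket’s S3 website endpoint - the same address shown at the top of the ‘Static website hosting’ panel. Once the record has been created and has had time to propagate, you can confirm it resolves with dig (the domain here is an example):

dig +short blog.contoso.net CNAME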

Wrap up

Congratulations, you’re done! You should have the foundations of a solid workflow for creating and editing posts, using Jekyll to build the pages into a set of HTML pages and resources, and s3cmd for uploading them into S3 where they can be accessed on the public internet.

At some point in the future I’ll write an article about setting up some automation for the site, so you can just push your commits into your Git host and have the site automatically built, tested and deployed for you.
