A PHP Framework for Hosting Images on Amazon CloudFront
June 11, 2010
Not interested in a story? Go Right to the Framework.
You’re the lead developer at a hot new startup called Woofbook. Elevator pitch: “It’s like Facebook meets Flickr, but for dogs.” The founder of Woofbook has started many companies, and run them to many different levels of success and failure, by the sheer force of his outgoing personality. In other words, he is a typical web entrepreneur. One day, while your boss is getting his goatee manscaped, a barber tells him that his nephew told him that Cloud Computing is the next big thing. Your boss rushes back to the office and announces “We need to harness the cloud!” After searching for “Cloud” on TechCrunch, he tells you “Cloudfront is the thing we need.”
You take a moment to marvel that a pretty good idea was derived from such ridiculous reasoning. You know that using a CDN is a very good practice, and you’re pleased that you will spend the next few days working on something that will actually provide long-term value to the project.
It just so happens that you were building the photo-sharing section of Woofbook when your boss interrupted you with his Cloudfront idea. After 15 minutes of cruising Amazon’s Cloudfront website and a few blog posts, you conclude that implementing Cloudfront is going to be easier than you thought. You just need to create a Cloudfront distribution, upload your images to it, then point the HTML image tags to the Cloudfront service. Perfect.
A week later, the photo-sharing feature goes live and it’s a big hit. Before long, users have uploaded over 100,000 photos, the pages load pretty fast, and you think you’re pretty darn sophisticated for making it look so easy.
Several weeks after that, your boss comes back from a meeting with a potential investor (those meetings always tend to result in bothersome new initiatives) and exclaims, “The photo sharing is great! We need to leverage it.”
It turns out that dogs have active social lives. They go out to dog gatherings every day at specially designated meeting places (known as dog parks) to socialize and conduct their. . . ahem – business.
“It would be great if dogs could tag their friends in the photos they uploaded,” your boss says. “Also, wouldn’t it be great if one dog could use a photo uploaded by another dog as his profile pic?”
“No problem!” you say confidently. You smartly designed the database architecture to make it easy to use the photos on the site for virtually any purpose. “I’ll have it done by COB tomorrow.” You link up the Photos table to the Users table by adding a “profilePhotoId” foreign-key field to the Users table. Then you build out the functionality to allow users to choose any photo as their profile pic.
But you soon realize there is a problem. Profile pics are displayed at 200×150 pixels and the photo album photos typically display at 500×375. You’re afraid you don’t have time to properly resize the photos without missing your self-imposed deadline. To speed things up you try resizing the actual photo images by using the WIDTH and HEIGHT attributes of the IMG tag, but that causes the images to appear warped because the aspect-ratios are different.
So now you need to run a script to resize the photos. Or maybe you’ll just resize the subset of the album photos that users have chosen as profile pics. You realize there’s a trade-off between these two options. It’s logically simpler to store all of the photos in two sizes, but it’s more efficient and elegant to only store the profile-sized images for images that will actually be used as profile pics. Unfortunately, you promised it would be done “by COB tomorrow.” To meet your deadline you settle on resizing all of the photos and uploading them all to Cloudfront. Even though it is less efficient to do it this way, it also involves less programming, which means you’ll meet your deadline. You run a process to resize the photos overnight and finish up the next day. Everybody’s happy.
Several months later, things are continuing to go well at Woofbook Inc. Users are loving the ability to use their friends’ photos as their profile pics. Woofbook is starting to get positive coverage on TechCrunch and even the regular business press is beginning to take notice. Imitators start to appear on the scene. One of them, BarkSquare, decides to capitalize on some bad press Woofbook received about some ill-advised privacy-policy changes to promote their new “Transfer to BarkSquare” feature. The feature allows users to copy their entire Woofbook profile, including all of their friends and photos, to BarkSquare.
Predictably, this causes a significant panic in the Woofbook offices. “How will we stop this?” your boss cries with an agony not unlike the pleadings of a bullied child. After some determined and serious discussion, he decides that all of the photos should be watermarked. “At least we’ll be able to keep track of where the images go and get some free advertising,” he says.
You roll your eyes and get to work figuring out how to add watermarks to the images. Once you’ve created and tested the script that will apply the watermarks, you kick it off and leave it running as you head home for the evening. Before going to bed, you check the script’s progress, see that it’s finished, and send a quick email to your boss to let him know. You’re quite pleased that the timestamp on your email says 11:32pm. You expect that your dedication will not go unnoticed.
When you arrive at work the next morning, rather than thanking you for your after-hours efforts, your boss wants to know why he’s not seeing any watermarks on the photos on the site. That’s when you remember. Because of the chaos and urgency in the office yesterday, you forgot something important. The images on Cloudfront are cached on the edge locations. Updating the original does not automatically update the cached versions residing around the world. Worse, there is no way to manually flush the cache. You can only wait for the Cloudfront system to refresh the cache on its own. Since you followed the best practice of setting a far-future Expires header on the images, it can potentially be a very long time before the cached objects get refreshed.
Now it’s even more complicated. In order to show the new images with the watermarks, you have to upload all of them with different file names, then point to the new versions. It’s going to be a long day.
The CloudfrontImageService is designed to eliminate the pain of managing images on Cloudfront distributions. The framework has built-in functionality for managing:
- The need to display images with different dimensions (e.g. thumbnails).
- The need to make changes to images (e.g. add watermarks).
- The optimal way to store images to S3 to maximize caching at the browser level and minimize your costs.
- Just-in-time uploading – so that only the specific images that are needed in your web application ever get uploaded to Cloudfront, saving you money and eliminating the need for a background process to do that work.
There are three database tables that are used to manage images in the CloudfrontImageService. They are tbl_image, tbl_imageDimensions, and tbl_imageDimensionsMap.
tbl_images – This table is used to track the original image file that is stored on your server’s filesystem or on an EBS volume (not on S3).
- filePath: A relative path to your image on the filesystem.
- version: A number indicating the version number of the image. If an image changes (for instance, if a user uploads a new profile picture), you would write the new image to the filePath then increment the version number by one.
tbl_imageDimensions – This table holds the definitions of the different dimension sizes, such as thumbnail, x-large, original and whatever else you need.
- keyName: This is a string that makes it easy to reference the image size you want in your code. For example “original”, “thumbnail”, “extra-large”.
- description: An informational field to help the site developer keep track of the purpose of the dimension. A place to store comments that are not actually used on your site.
- width, height: These are the maximum width and height for the dimension. When an image is rendered using a dimension, it will have a width and height no larger than these values. The image is resized to preserve the aspect ratio but still fit in these dimensions.
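The "no larger than these values, aspect ratio preserved" rule can be illustrated with a small calculation. This is only a sketch: fitWithinDimension() is a hypothetical helper, not a method of the framework, and the choice never to enlarge an image that is already smaller than the dimension is an assumption on my part.

```php
<?php
/**
 * Hypothetical helper: scale a source size down so that it fits inside
 * a dimension's maximum width and height while keeping the aspect ratio.
 * Assumption: images smaller than the dimension are left at their own size.
 *
 * @return int[] [width, height] of the resized image
 */
function fitWithinDimension(int $srcWidth, int $srcHeight, int $maxWidth, int $maxHeight): array
{
    if ($srcWidth <= 0 || $srcHeight <= 0 || $maxWidth <= 0 || $maxHeight <= 0) {
        throw new InvalidArgumentException('All sizes must be positive integers.');
    }

    // The more restrictive of the two ratios wins; capping at 1.0 prevents upscaling.
    $scale = min($maxWidth / $srcWidth, $maxHeight / $srcHeight, 1.0);

    return [
        max(1, (int) round($srcWidth * $scale)),
        max(1, (int) round($srcHeight * $scale)),
    ];
}

// A 1000x500 photo rendered with a 200x150 dimension.
[$width, $height] = fitWithinDimension(1000, 500, 200, 150);
printf("%dx%d\n", $width, $height); // prints 200x100
```

Here the width hits the 200-pixel limit first, so the height ends up at 100 rather than the 150 the dimension allows.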
tbl_imageDimensionsMap – This table holds the records of actual image objects that are currently residing in your Cloudfront distribution.
- imageId, imageDimensionsId: The primary key for this table and the foreign keys to tbl_image and tbl_imageDimensions.
- width, height: This is the actual width and height of the image resized to fit the dimensions. When the image is resized the aspect ratio is preserved, meaning that either the width or height is likely to be different than the value in tbl_imageDimensions. For this reason, we record the actual width and height here.
- version: This is the version number of the image that is currently stored on Cloudfront. When the version in tbl_image gets incremented, we don’t immediately upload a new version of the image to Cloudfront. When an image URL is requested (you request the URL in order to insert it into your HTML), the current version number is compared to the version number in tbl_images. If the version in tbl_images has been incremented but the version in tbl_imageDimensionsMap has not, the new version of the image in the requested dimension will be created and uploaded, and the URL to that new version is returned to the caller.
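The version comparison described above might look roughly like the following. It is a sketch under stated assumptions, not the framework's code: rows are modelled as plain arrays, the 'id' key and both function names are hypothetical, and the $upload callback merely stands in for the real resize-and-upload step.

```php
<?php
// Hypothetical sketch of the just-in-time check. Rows are plain arrays and
// $upload is a stand-in for the real resize-and-upload step.

/** True when no copy exists for this dimension or the stored copy is out of date. */
function needsUpload(array $imageRow, ?array $mapRow): bool
{
    return $mapRow === null || $mapRow['version'] < $imageRow['version'];
}

/** Returns an up-to-date map row, calling $upload only when it is needed. */
function syncMapRow(array $imageRow, ?array $mapRow, callable $upload): array
{
    if (!needsUpload($imageRow, $mapRow)) {
        return $mapRow;
    }

    // $upload returns the actual width and height of the resized image.
    [$width, $height] = $upload($imageRow);

    return [
        'imageId' => $imageRow['id'],
        'version' => $imageRow['version'],
        'width'   => $width,
        'height'  => $height,
    ];
}

$uploads = 0;
$upload = function (array $imageRow) use (&$uploads): array {
    $uploads++;          // count how often the expensive step runs
    return [200, 150];   // pretend these are the resized dimensions
};

$image = ['id' => 55, 'filePath' => 'flowers/flower.jpg', 'version' => 1];

$map = syncMapRow($image, null, $upload);  // first request: uploads
$map = syncMapRow($image, $map, $upload);  // second request: nothing to do
$image['version']++;                       // the image changed on disk
$map = syncMapRow($image, $map, $upload);  // next request: uploads again

echo $uploads, "\n"; // prints 2
```

The point of the design is that incrementing the version is cheap; the expensive work is deferred until somebody actually asks for the URL.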
Adding Images to the Framework
Before you can serve up images in your webpages, you need to add them to the framework. This is done through the createImageObjectFromFileSystemPath() method. Example:
<?php
$cfImgSvc = new CloudfrontImageService();

$imgObj = $cfImgSvc->createImageObjectFromFileSystemPath('/var/www/imagedrop/flower.jpg', 'flowers/flower.jpg');
?>
On line two we instantiate the CloudfrontImageService. On line four we use the createImageObjectFromFileSystemPath() method to take an image saved on the filesystem at /var/www/imagedrop/flower.jpg and create an Image object, which will be stored in the flowers subdirectory of the IMAGES_ROOT that is set in the config file. The createImageObjectFromFileSystemPath() method handles the following steps:
- Determines the type of the image (JPEG, GIF, PNG).
- Checks whether the relative path (e.g. flowers/flower.jpg) is already in use. If it is, it appends a digit to the file name until the path is unique (e.g. flowers/flower1.jpg, flowers/flower2.jpg).
- Makes sure that the sub-directories exist. If the flowers directory does not exist in the IMAGES_ROOT, then it will create it.
- Copies the flower.jpg file to its new location in the IMAGES_ROOT.
- Creates the tbl_image row in the database.
- Returns the newly created Image object.
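The duplicate-name step in the list above can be sketched as follows. The function name and the $isTaken callback are hypothetical; I am assuming the real method checks the database, the filesystem, or both, which a callback can stand in for here.

```php
<?php
/**
 * Hypothetical sketch: return $relativePath unchanged if it is free,
 * otherwise append 1, 2, 3, ... to the file name until $isTaken() says
 * the candidate is unused.
 */
function uniqueRelativePath(string $relativePath, callable $isTaken): string
{
    $info = pathinfo($relativePath);
    $dirname = $info['dirname'] ?? '.';

    // pathinfo() reports "." when the path has no directory part.
    $dir = ($dirname === '.' || $dirname === '') ? '' : $dirname . '/';
    $ext = isset($info['extension']) ? '.' . $info['extension'] : '';

    $candidate = $relativePath;
    for ($i = 1; $isTaken($candidate); $i++) {
        $candidate = $dir . $info['filename'] . $i . $ext;
    }

    return $candidate;
}

// Two flower images are already registered.
$existing = ['flowers/flower.jpg', 'flowers/flower1.jpg'];
$isTaken = function (string $path) use ($existing): bool {
    return in_array($path, $existing, true);
};

echo uniqueRelativePath('flowers/flower.jpg', $isTaken), "\n"; // prints flowers/flower2.jpg
```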
Inserting Cloudfront URLs into your HTML
Once you have added an image to the framework, you can use the CloudfrontImageService class to insert the image URL into your HTML. CloudfrontImageService has three methods that you can use to do this:
getUrlFromFilePath($filePath, $dimensionKeyName) – Takes the $filePath (which is one found in tbl_images) and a valid $dimensionKeyName (which is a keyName found in tbl_imageDimensions). It returns a URL to the image on Cloudfront. If the image doesn’t exist on Cloudfront, it automatically uploads it to your Cloudfront distribution before returning the URL.
<html>
<body>
<img src="<?php echo $cfImgSvc->getUrlFromFilePath('flowers/flower.jpg', 'thumbnail'); ?>"/>
</body>
</html>
This will insert the URL of the thumbnail-sized image into the HTML output. The dimension key (thumbnail) must be defined in tbl_imageDimensions. The resulting URL will look something like this:
The version, dimension key, width, and height are all part of the URL, both to ensure that the URL is unique and for your convenience.
getUrlFromImageId($imageId, $dimensionKeyName) – Same as getUrlFromFilePath(), except that you provide the image ID.
<html>
<body>
<img src="<?php echo $cfImgSvc->getUrlFromImageId(55, 'extra-large'); ?>"/>
</body>
</html>
getUrl(Image $imgObj, $dimensionsKeyName) – Same as the two above, except you can use it if you already have the Image object.
<html>
<body>
<img src="<?php $imgObj = Image::findById(55); echo $cfImgSvc->getUrl($imgObj, 'original'); ?>"/>
</body>
</html>
- All of the images uploaded to S3 have the Cache-Control header automatically set to “public, max-age=315360000”. The “public” setting is there so that browsers will cache the images even if they are served via HTTPS. The max-age is set to 10 years. The max-age has two effects. Having a long max-age setting increases the amount of time that the edge locations will cache your content, reducing the number of times that users will have to wait for the cache to be refreshed and potentially saving you money. It also tells the user’s web browser to cache the image for up to 10 years, making your site load faster. The 10-year expiration won’t have any adverse effects if an image changes, because the URL of the image will also change (the version number will increase), so Cloudfront and the user’s browser will consider it to be a brand new object.
- The framework relies on MagickWand to resize images. This is not typically installed by default.
- Amazon recently added an S3 feature called Reduced Redundancy Storage, which offers lower prices in exchange for lower durability of the file storage. Because the CloudfrontImageService stores your original image files on the file system (hopefully an EBS volume if you’re hosting on EC2), it is entirely feasible to use this cheaper alternative to store your CloudfrontImageService images. If you were to lose your S3 image storage, you could recover by just incrementing the version number of all of the records in the tbl_images table. The CloudfrontImageService will then handle the re-uploading of the image objects to S3 as needed.
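The max-age value in the first note above is simply ten 365-day years expressed in seconds. A minimal sketch of that arithmetic follows; the function and constant names are hypothetical, and how the framework actually attaches the header during upload is not shown here.

```php
<?php
const SECONDS_PER_DAY = 86400; // 24 * 60 * 60

/**
 * Hypothetical helper: build a Cache-Control value for a publicly
 * cacheable object that may be kept for $years 365-day years.
 */
function farFutureCacheControl(int $years = 10): string
{
    $maxAge = $years * 365 * SECONDS_PER_DAY;

    return 'public, max-age=' . $maxAge;
}

echo farFutureCacheControl(), "\n"; // prints public, max-age=315360000
```

A lifetime this long is only safe because every change to an image produces a new URL, so a stale copy is never requested again.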