Path: README
Last Update: Wed Jan 17 12:34:08 GMT 2007
This is my go at a Ruby library for talking to Amazon’s S3 service via REST. It is based on the sample Ruby code for S3 provided by Amazon, but has gone a long way beyond it now. Most of the remaining Amazon code is now in core.rb; the rest is mine.
s33r supports the following S3 features:
s33r is released under an MIT-style licence (see LICENCE.txt).
Note that this documentation is a work in progress. I’ve recently revised the API quite a bit, and am still rewriting this stuff. Please alert me via the project’s RubyForge tracker (rubyforge.org/tracker/?group_id=2106) if you come across any issues.
Install as a gem:
gem install s33r
Or get the Subversion trunk:
svn checkout svn://rubyforge.org/var/svn/s33r/trunk s33r
Before you can start using s33r, you’ll need to set up an Amazon S3 account at amazon.com/s3. None of the following will work without this!
s33r has some code dependencies, plus some optional features which require extra libraries, namely:
Builder is marked as a dependency when you install the gem, but libxml-ruby is not. This is because s33r will accept either the gem or the library version of libxml-ruby.
By the way, I’ve tested this on Linux, but not on Windows or Mac. If anyone is using it on either of these platforms successfully, please let me know.
To load s33r from inside your Ruby script (as a gem):
require 'rubygems'
require_gem 's33r'
Alternatively, load lib/s33r.rb if you’re using the svn version.
S3 provides the following components:
s33r presents these components through a fairly thin object layer. You interact with the S3 Service via either a Client or a Bucket. The difference is that to perform operations with a Client, you have to specify the bucket for each operation; with a Bucket, all operations are scoped to a particular bucket on S3. (N.B. Bucket inherits from Client, so you can also address other buckets from inside a Bucket instance.)
My preferred way of working with s33r depends on what I’m trying to do:
The Client class is hooked up to the S3 Service via your Amazon access key and secret access key.
require 'rubygems'
require_gem 's33r'
include S33r
client = Client.new(:access => 'accesskey', :secret => 'secretaccesskey')
The Client can also be initialised with the following options which cover how the client interacts with S3:
And these options, which govern the generic HTTP behaviour:
For example:
client = Client.new(:access => 'accesskey', :secret => 'secretaccesskey', :persistent => true, :dump_requests => true)
A client can also be initialised from a YAML file with this format:
aws_access_key: 'yourkey'
aws_secret_access_key: 'yoursecretkey'
options:
use_ssl: false
dump_requests: false
persistent: true
The YAML file is passed through ERb as it is loaded, so you can insert chunks of Ruby code into it if you like.
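To give a feel for what "passed through ERb" means, here is a minimal sketch using only Ruby's standard library. It is not s33r's actual loading code; the config text and the debug variable are made up for the example, and it only shows the general pattern of rendering ERb tags before parsing the YAML.

```ruby
require 'erb'
require 'yaml'

# Hypothetical config text in the format shown above. Nothing is read from
# disk here, and the debug variable is invented for this example.
raw = <<~CONFIG
  aws_access_key: 'yourkey'
  aws_secret_access_key: 'yoursecretkey'
  options:
    use_ssl: false
    dump_requests: <%= debug %>
    persistent: true
CONFIG

debug = true

# Render the ERb tags first, then parse the rendered text as YAML.
rendered = ERB.new(raw).result(binding)
config = YAML.load(rendered)

puts config['options'].inspect
```

Because the Ruby is evaluated before the YAML is parsed, whatever the ERb tag produces has to be valid YAML at that position.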
Note that you can add as many different options into the options section of the YAML file as you like: the whole options hash is added to the client and is accessible with client.created_with_options. You can get all of the standard S33r settings for a client by calling client.settings: this hash is updated as you set properties on the client, but created_with_options is frozen on creation, and represents the options used to create the client in the first place. So you can append your own properties to the Client class using this mechanism.
Once you have a client, you can start managing buckets. (You can have up to 100 of these per Amazon Web Services account.)
Creating a bucket:
client.create_bucket('fantastic-bucket-o-fun')
If you try to create a bucket which exists and which someone else owns, you’ll get false back from this; if the bucket exists and you own it, or the new bucket is created, you’ll get true back. You can then start working with the new bucket (see next section).
Deleting a bucket:
client.delete_bucket('fantastic-bucket-o-fun')
If a bucket has some content in it, you won’t be able to delete it with the above. If you want to clear out the content first THEN delete it, pass the :force => true option to delete_bucket:
client.delete_bucket('fantastic-bucket-o-fun', :force => true)
Note that you’ll get the same response if you try to delete a bucket that doesn’t exist in the first place:
client.delete_bucket('there-is-no-way-this-bucket-exists')
=> true
To see which buckets you have, you can do:
client.buckets
=> <Hash of Bucket instances (see next section)>
To get just the names of your buckets (as an Array), do:
client.bucket_names
=> ["elliotsmith-backup", "elliotsmith-instant-server", "elliotsmith-marvellous-bucket", "elliotsmith-test", "es-backup", "es-logs", "es-test", "moochlabs.com", "openadvantage", "openadvantage.org", "townx.org"]
While the client is nice for managing buckets, it’s a bit of a pain having to use a bucket name every time you want to perform an operation. That’s where the Bucket class comes in. A Bucket is similar to a Client, scoped to an individual bucket. This means you can perform a variety of operations on a single Bucket with more convenience and less mess. You also have access to all the Client methods if you want to carry on using those.
You can either create a fresh one:
bucket = Bucket.new('some-bucket', :access => 'accesskey', :secret => 'secretaccesskey')
Or create one from a YAML file:
bucket = Bucket.init('/home/you/s33r.yaml')
Or get a Bucket instance from a client:
bucket = client.get_bucket('some-bucket-name')
The last of these is actually creating a Bucket which hasn’t tried to connect to S3 yet; so there’s no guarantee that the bucket actually exists on S3.
All of these methods of getting a Bucket instance accept the same options as the Client constructor. In addition, you can specify a few more, including:
bucket = client.get_bucket('some-bucket-name', :create => true, :check => true)
(The check is done after the attempt to create the bucket has been made.)
By default, when you use get_bucket, the Bucket you get back will inherit any options from the parent (e.g. use_ssl, dump_requests). To turn this behaviour off, do:
bucket = client.get_bucket('some-bucket-name', :orphan => true)
If you want to use the parent’s options but just override some of them (e.g. the parent has :use_ssl => true but you want a plain HTTP connection):
bucket = client.get_bucket('some-bucket-name', :use_ssl => false)
Any options you pass to get_bucket will take precedence over the parent’s options.
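The precedence rules can be pictured as a plain hash merge. The helper below is hypothetical (it is not how s33r is implemented internally, as far as this README says); it just spells out the three cases described above: inherit everything, override some options, or orphan the bucket.

```ruby
# Hypothetical helper, not part of s33r: the option precedence described
# above, expressed as a hash merge.
def options_for_bucket(parent_options, bucket_options = {})
  # :orphan is a switch rather than a setting, so keep it out of the result.
  own = bucket_options.reject { |name, _value| name == :orphan }
  return own if bucket_options[:orphan]

  # Options passed for the bucket win over the parent's options.
  parent_options.merge(own)
end

parent = { :use_ssl => true, :dump_requests => true }

inherited  = options_for_bucket(parent)
overridden = options_for_bucket(parent, :use_ssl => false)
orphaned   = options_for_bucket(parent, :orphan => true)

puts inherited.inspect
puts overridden.inspect
puts orphaned.inspect
```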
Once you have a bucket instance, you can destroy it with:
bucket = client.get_bucket('some-bucket-name')
bucket.destroy
Or if the bucket has any content:
bucket.destroy :force => true
This will return false if the destroy failed, or true if it succeeded.
That’s all well and good, but the point of S3 is storing stuff. How does that work? Use a Bucket instance.
Get a bucket using client.get_bucket (see above), or just create one:
bucket = Bucket.new('elliots-bucket-o-fun', :access => 'accesskey',
:secret => 'secretaccesskey')
Then use the bucket’s put or put_file methods to stick stuff onto S3:
# Default content type is 'text/plain' so this works OK.
bucket.put('hello world', :key => 'my_text')
# Set a custom content type.
bucket.put('<p>Bad HTML.</p>', :key => 'bad_html', :content_type => 'text/html')
# Put a file, using the file name as the key.
bucket.put_file('/home/you/myface.jpg')
# which is just an alias for
# bucket.put('/home/you/myface.jpg', :file => true)
# Manually set key instead of using filename.
bucket.put_file('/home/you/myface.jpg', :key => 'my_face')
# Manually set content type (put_file infers it from the filename otherwise).
bucket.put_file('/home/you/myface', :content_type => 'image/jpeg')
To stream some generic data (e.g. if you’ve got a data stream from some method and want to save it as an image file on S3):
# data should respond to stat or length to be streamed up to S3.
bucket.put(data, :key => 'some_key', :content_type => 'image/jpeg')
The default behaviour when you put a file onto S3 is for the file to be rendered inline by the browser. Interestingly, if you send a ‘Content-Disposition’ header with your PUT request when putting a file onto S3, S3 will treat it as a download (attachment) instead when someone requests it. You can tell S3 you’d like the file to behave like this by adding a :render_as_attachment => true option:
bucket.put_file('/home/you/myface.jpg', :render_as_attachment => true)
When you download the file, the download file name is File.basename(resource_key). So if you manually set a key with the :key option, that’s what will be used for the downloadable file name, e.g.
bucket.put_file('/home/you/myface.jpg', :key => 'elliot.jpg',
:render_as_attachment => true)
creates a downloadable resource called ‘elliot.jpg’ which will save into a file ‘elliot.jpg’ when downloaded.
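As a rough sketch of what this amounts to on the wire: the download name comes from File.basename of the key and ends up in a Content-Disposition header. The helper name and the exact header format below are assumptions for illustration; the README does not show the header s33r actually sends.

```ruby
# Hypothetical helper: builds the kind of header described above.
# The exact value s33r sends is an assumption, not taken from its source.
def attachment_header(resource_key)
  filename = File.basename(resource_key)
  { 'Content-Disposition' => "attachment; filename=#{filename}" }
end

puts attachment_header('elliot.jpg').inspect
puts attachment_header('photos/2007/elliot.jpg').inspect
```

Note that a key containing slashes still produces a bare file name, since only the last path segment is used.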
(By the way, if you’re experimenting with this, and upload a file normally, then with :render_as_attachment, your browser might do some caching which makes it look like it hasn’t worked. Try clearing your cache if you are uploading the same file to the same key but with different :render_as_attachment settings to make sure you’re seeing the current behaviour.)
Note that in all cases, if the content can be streamed, it will be. By default, s33r streams in 1 MB chunks; you can change the chunk size by passing a :chunk_size option when creating a client or bucket.
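To illustrate what streaming in chunks looks like, here is a small standalone sketch. It does not use s33r or touch the network; each_chunk is a hypothetical helper, and the 1,048,576-byte default is my reading of the chunk size mentioned above.

```ruby
require 'stringio'

# Assumed default: 1 MB expressed in bytes.
DEFAULT_CHUNK_SIZE = 1_048_576

# Hypothetical helper: read an IO-like object piece by piece instead of
# loading it into memory in one go.
def each_chunk(io, chunk_size = DEFAULT_CHUNK_SIZE)
  # IO#read(length) returns nil once the end of the data is reached.
  while (chunk = io.read(chunk_size))
    yield chunk
  end
end

data = StringIO.new('x' * 2_500_000)
sizes = []
each_chunk(data) { |chunk| sizes << chunk.bytesize }

puts sizes.inspect
```

A smaller chunk size means more writes per upload but less memory held at any one time.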
Once you’ve got some stuff in a bucket, you want to be able to do things with it.
Listing the objects in a bucket:
bucket.listing
You can also pass the standard request parameters understood for bucket listing requests, e.g.
bucket.listing :max_keys => 10, :prefix => '/home'
(See docs.amazonwebservices.com/AmazonS3/2006-03-01/ for the list of available parameters.)
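For reference, S3 spells its listing parameters with hyphens (e.g. max-keys) in the query string. The sketch below shows one plausible way Ruby-style options could map onto that query string; listing_query is a hypothetical helper, not an s33r method.

```ruby
require 'uri'

# Hypothetical helper: turn Ruby-style listing options into an S3-style
# query string, swapping underscores for hyphens in the parameter names.
def listing_query(options)
  pairs = options.map { |name, value| [name.to_s.tr('_', '-'), value] }
  URI.encode_www_form(pairs)
end

puts listing_query(:max_keys => 10, :prefix => '/home')
# => max-keys=10&prefix=%2Fhome
```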
Deleting an object from a bucket:
bucket.delete('some-key')
Listing the keys in a bucket:
bucket.keys
You may notice that the output from this looks horrific: it actually returns a BucketListing instance, which is also a hash of objects (S3Object instances) as BucketListing is a subclass of Hash.
BucketListing is a wrapper around objects on S3 to make them easy to manipulate. You don’t really need to worry about this class, as it is typically only accessed through a Bucket, as covered in the next section.
You can get an individual object from a Bucket with:
obj = bucket['some-key']
Note that this will refresh the bucket listing, create an empty S3Object instance (i.e. no data), then populate the object with a second request. If you want to avoid doing two requests and just do the first one, you can do:
obj = bucket['some-key', :lazy]
Note that this will leave obj in an unsaveable state. To save an object when in this state, you will need to at least:
See the next section for more details.
The S3Object class provides a convenient wrapper around an object stored on S3. It can also be used to put together the data and metadata you want to store on S3 on the local filesystem, then save that up to S3.
An S3Object can be created from a file on the local filesystem:
obj = S3Object.from_file('/home/you/mugshot.jpg')
obj is not associated with a bucket if you use this method.
You can set a key if you don’t want to use the filename:
obj = S3Object.from_file('/home/you/mugshot.jpg', :key => 'elliot.jpg')
Or from some text:
obj = S3Object.from_text('my-key', 'my text to put into that key')
Again, obj is not associated with a bucket in this case.
You can also create an object from scratch (e.g. for custom content types, or if you want to read in data from some bizarre source):
obj = S3Object.new('my-key')
obj.value = "<p>Some HTML</p>"
obj.content_type = "text/html"
If you want to associate an object with a bucket during creation:
b = Bucket.new('some-bucket', :access => 'accesskey', :secret => 'secretkey')
obj = S3Object.from_text('some_key', 'hello world', :bucket => b)
You can generate an S3Object instance by retrieving an object from a bucket:
obj = bucket['my-key']
In this case, obj is automatically associated with bucket.
To associate an object with a bucket after you’ve created it (so it’s easy to save), use the object’s bucket= method:
obj.bucket = client.get_bucket('my-bucket')
To save an object to S3 (which is already attached to a bucket):
obj.save
Or put it into a Bucket (even if it’s associated with a different bucket, this will save the object into the bucket you specify):
bucket = client.get_bucket('some-bucket')
bucket.put(obj)
Note that the object in the bucket and the local one (obj) are different objects: an S3Object is just a local representation of some data and metadata which you could save onto S3.
The key when you save an S3Object into a bucket is derived from the key attribute on the object, returned by:
obj.key
To put an object into a bucket under a different key:
bucket.put(obj, :key => 'my-new-key')
(Note that obj [the local representation of the object] will still reference its old key.)
If you want to make an object render as an attachment:
obj.render_as_attachment = true
obj.save # or bucket.put(obj)
Or just:
obj.save :render_as_attachment => true
Note that if you retrieved an object which was originally intended to be rendered as an attachment, the @render_as_attachment instance variable is true. When the object is saved back to S3, it will continue to be rendered as an attachment. Call obj.render_as_attachment = false to treat the object as a standard object which is rendered inline.
You can move objects, either by retrieving them, deleting the old key, then putting them to a new key:
obj = bucket['old-key']
obj.delete # note this doesn't delete the local object
obj.key = 'new-key'
obj.save
Or you can do a renaming from an object:
obj.rename('new-key')
which just does the required sequence of actions in the background for you.
If you need to get a URL for a bucket or object, call the url method on it.
For example, for a bucket ‘some-bucket’:
bucket = Bucket.new('some-bucket', :access => 'accesskey', :secret => 'secretaccesskey')
bucket.url
# => "https://s3.amazonaws.com/some-bucket/"
bucket.url :use_ssl => false
# => "http://s3.amazonaws.com/some-bucket/"
# Get a subdomain version of the URL; note SSL must be turned off for this to work.
bucket.url :subdomain => true, :use_ssl => false
# => "http://some-bucket.s3.amazonaws.com/"
# Get an authenticated URL using the default expiry setting (now + 15 minutes)
bucket.url :authenticated => true
# => "https://s3.amazonaws.com/some-bucket/\
?Signature=KFVmPVdDoTV%2BXgYdo4g8VJP7Sls%3D&AWSAccessKeyId=accesskey\
&Expires=1168980021"
# Create an authenticated URL which expires in FAR_FUTURE (default=20) years time(!)
bucket.url :authenticated => true, :expires => :far_flung_future
# => "https://s3.amazonaws.com/some-bucket/?\
Signature=BaVFk7ILtMJs7kcOmHGRwX8dSCA%3D&AWSAccessKeyId=accesskey&Expires=2746860722"
(Note I used a bogus key here. Also note that the bucket doesn’t have to exist, doesn’t have to be public, and you don’t have to have access to it to create a URL. So you can use this part of s33r just for URL generation if you like.)
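If you are curious how an authenticated URL is put together, here is a standalone sketch of S3's query string authentication: an HMAC-SHA1 signature over a short string describing the request, Base64-encoded and URL-escaped. It uses only Ruby's standard library; sketch_authenticated_url is a made-up name and this is not s33r's own code. It also assumes the key needs no URL escaping.

```ruby
require 'openssl'
require 'uri'

# Hypothetical helper showing S3 query string authentication for a GET.
# Assumes bucket and key contain no characters that need escaping.
def sketch_authenticated_url(bucket, key, access, secret, expires)
  path = "/#{bucket}/#{key}"
  # String to sign: verb, Content-MD5, Content-Type (both blank here),
  # expiry as seconds since the epoch, then the resource path.
  string_to_sign = "GET\n\n\n#{expires}\n#{path}"
  digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new('sha1'), secret, string_to_sign)
  # pack('m0') is Base64 without line breaks.
  signature = URI.encode_www_form_component([digest].pack('m0'))
  "https://s3.amazonaws.com#{path}?Signature=#{signature}" \
    "&AWSAccessKeyId=#{access}&Expires=#{expires}"
end

# Expire 15 minutes from now, like the default described above.
expires = Time.now.to_i + (15 * 60)
puts sketch_authenticated_url('some-bucket', 'some-object', 'accesskey', 'secretaccesskey', expires)
```

Since the signature depends only on the secret key and the request description, nothing needs to be fetched from S3 to build the URL.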
For objects:
# Create a blank object attached to a bucket.
obj = S3Object.new('some-object', nil, :bucket => bucket)
# Generate some URLs...
obj.url
# => "https://s3.amazonaws.com/some-bucket/some-object"
obj.url :use_ssl => false
# => "http://s3.amazonaws.com/some-bucket/some-object"
obj.url :use_ssl => false, :subdomain => true
# => "http://some-bucket.s3.amazonaws.com/some-object"
obj.url :use_ssl => false, :subdomain => true, :authenticated => true
# => "http://s3.amazonaws.com/some-bucket/some-object?\
Signature=ya4daYhZYPfksY9aM60BJBsPAkU%3D&AWSAccessKeyId=accesskey&Expires=1168984218"
Once you’ve uploaded an object, you can use one of these methods to create an authenticated URL for it. Then send the link to that illegally uploaded file to your friends and family. (Forget I just said that.)
You can also generate URLs without a Bucket or S3Object instance:
s3_url(:bucket => 'some-bucket', :key => 'some-object')
# => "http://s3.amazonaws.com/some-bucket/some-object"
# Authenticated subdomain URL without SSL for the ACL of a /bucket/key
s3_url(:bucket => 'some-bucket', :key => 'some-object', :authenticated => true, \
:access => 'accesskey', :secret => 'secretkey', :expires => :far_flung_future, :acl => true)
# => "http://s3.amazonaws.com/some-bucket/some-object?acl\
&Signature=1FucCuBpVfGcC1kPS2OE1e8Tzic%3D&AWSAccessKeyId=accesskey&Expires=2746864614"
The full list of options you can pass to the url or s3_url methods:
Deleting an object from S3 is done with:
obj.delete
That’s it.
To set some metadata for an object:
obj['artist'] = "Wire"
obj['track'] = "You Hung Your Lights in the Trees"
obj['rating'] = 5
obj.save
(Metadata values are forced to strings before they are sent.)
This creates some special request headers (starting with x-amz-meta-) which are converted into metadata on the resource in S3. You don’t need to put x-amz-meta- in front of your metadata keys: s33r does that for you.
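As a sketch of that conversion (metadata_headers is a hypothetical helper, not the method s33r uses), each metadata key gets the x-amz-meta- prefix and each value is turned into a string:

```ruby
# Hypothetical helper: map a metadata hash onto x-amz-meta- request headers.
def metadata_headers(meta)
  meta.each_with_object({}) do |(key, value), headers|
    headers["x-amz-meta-#{key}"] = value.to_s
  end
end

headers = metadata_headers('artist' => 'Wire', 'rating' => 5)
puts headers.inspect
```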
You can see the metadata on an object with:
obj.meta
which returns a hash of metadata keys and their values.
Or access individual metadata fields with:
obj['artist']
S3 provides access controls which enable you to specify the permissions on buckets and objects: see docs.amazonwebservices.com/AmazonS3/2006-03-01/RESTAccessPolicy.html for the documentation. The implementation of this in s33r is still quite primitive, but does cover all of the options offered by S3. Currently, you can only set/get ACLs through the Client class, not from buckets or objects directly.
In brief, S3 supports the following permissions:
These permissions can be given to the following types of user or group:
By default, the Owner of the resource has full control, and no other permissions are applied.
You don’t need to work at a low level with permissions, as s33r provides several convenience methods which wrap the ACLs for you.
There are convenience methods for making a bucket or object publicly readable (i.e. giving the READ permission to the AllUsers group) or private:
# Initialise the client from a YAML file.
client = Client.init('/home/you/s33r.yaml')
bucket = client.get_bucket('my-bucket')
bucket.make_public
bucket.make_public(:key => 'my-key')
bucket.make_private
bucket.make_private(:key => 'my-key')
Note that even if you make a bucket public, it doesn’t necessarily mean that its contents are also public. You still have to set public permissions for objects inside the bucket if you want them to be public too.
You can find out whether a bucket is public:
bucket.public?
You can also set permissions from an object directly:
obj = bucket['my-object']
obj.make_public
obj.make_private
obj.public?
The next step up in complexity is to use one of S3’s canned ACL headers. These are a short-hand way of specifying simple access control for a resource (bucket or object), and include:
Note that in all of these cases, the Owner gets FULL_CONTROL permissions on the resource.
All of the following examples assume you have initialised a Client instance, e.g.
client = Client.init('/home/you/s33r.yaml')
You can apply a canned ACL to any PUT request by adding a :canned_acl => <canned acl string> option when you call put. <canned acl string> is one of the strings shown above, e.g. from a bucket:
bucket.put_file('/home/you/photos/me.jpg', :canned_acl => 'public-read')
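On the wire, a canned ACL travels as an x-amz-acl request header on the PUT. The sketch below is a hypothetical helper (not s33r code) that checks the string and builds that header; the four names listed are the canned ACLs I know from the S3 2006-03-01 API, so treat the list as an assumption if your version of S3 differs.

```ruby
# Canned ACL names from the S3 2006-03-01 API (an assumption here, since
# this README does not spell them out).
CANNED_ACLS = %w[private public-read public-read-write authenticated-read].freeze

# Hypothetical helper: build the request header for a canned ACL.
def canned_acl_header(canned_acl)
  unless CANNED_ACLS.include?(canned_acl)
    raise ArgumentError, "unknown canned ACL: #{canned_acl}"
  end

  { 'x-amz-acl' => canned_acl }
end

puts canned_acl_header('public-read').inspect
```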
Alternatively, you can pass a :canned_acl option to the client constructor, e.g.
client = Client.new(:access => 'accesskey', :secret => 'secretkey', :canned_acl => 'public-read')
This would ensure that everything you created with this client had a ‘public-read’ permission set. Also, remember that any buckets fetched via this client will inherit this canned acl, unless you orphan them.
The ACL for a particular resource can be accessed through a Client instance:
client.acl(:bucket => 'my-bucket', :key => 'my-key')
Or from a Bucket or S3Object:
bucket.acl
obj.acl
This returns an S3ACL::Policy instance which you can then add your own grants to, e.g.
bucket = Bucket.new('some-bucket', :access => 'accesskey', :secret => 'secretkey')
policy = bucket.acl
# Give the Amazon user elliot@example.com read access
# NB this needs to be a real Amazon user otherwise the grant is rejected
policy.add_grant(Grant.for_amazon_customer('elliot@example.com', :read))
bucket.acl = policy
At present, ACLs are always loaded or put to S3, and not stored locally with objects or buckets. This means that if you have an S3Object, you can’t operate on its ACL directly within the object: you have to manipulate the ACL on S3.
S3 gives you the facility to log access to a bucket. (Logging at the object level is not yet supported by S3.) Access is currently in a very raw form, but bear with me.
The first thing you need is a bucket to put logs into. This is just a standard bucket for which you turn on the "log receiver" (I made that up :)):
bucket = client.get_bucket('my-log-bucket')
# Can the bucket already receive logs?
bucket.log_receiver?
# Make the bucket capable of receiving logs
bucket.log_receiver :on
# Switch off log "receptivity"
bucket.log_receiver :off
Next, choose the bucket you want to log access to. Then set it up to log into your "log receiving" bucket:
another_bucket = client.get_bucket('sucker')
# Log into bucket, by specifying a Bucket instance
another_bucket.logs_to bucket
# Log into bucket by specifying a Bucket name.
another_bucket.logs_to 'my-log-bucket'
By default, logs (inside the log receiver bucket) are prefixed with ‘log-<bucket name>-’; you can set your own prefix like this:
# Specify a default log prefix
another_bucket.logs_to bucket, :prefix => 'crazy-log-'
# Turn off logging for a bucket
another_bucket.logs_off
As logs are put into the bucket, each file has a key within the bucket like this:
PrefixYYYY-mm-DD-HH-MM-SS-UniqueString
Where Prefix is the :prefix option you specified, and UniqueString is some random S3 stuff.
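Since there is no wrapper for logs yet, a key in this format can be picked apart by hand. The following is a rough sketch: parse_log_key and the sample key are made up, and it assumes the timestamp in the key is UTC.

```ruby
# Matches PrefixYYYY-mm-DD-HH-MM-SS-UniqueString.
LOG_KEY_FORMAT = /\A(?<prefix>.*?)(?<stamp>\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2})-(?<unique>.+)\z/

# Hypothetical helper: split a log key into prefix, time and unique string.
# Returns nil if the key does not look like a log key.
def parse_log_key(key)
  match = LOG_KEY_FORMAT.match(key)
  return nil unless match

  {
    :prefix => match[:prefix],
    # Assumes the timestamp is in UTC.
    :time   => Time.utc(*match[:stamp].split('-').map(&:to_i)),
    :unique => match[:unique]
  }
end

parsed = parse_log_key('crazy-log-2007-01-17-12-34-08-ABC123DEF')
puts parsed.inspect
```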
I haven’t written a wrapper around the actual logs themselves yet, but you can read more about S3 logging here: docs.amazonwebservices.com/AmazonS3/2006-03-01/ServerLogs.html.
Note that you can also change logging status for a bucket directly from a Client instance, if you prefer:
client.logs_to 'log-bucket', :for_bucket => 'some-bucket'
Or get the logging status for a bucket from a Client instance:
client.logging :for_bucket => 'some-bucket'
(I’ll write about these another day.)
The main problem with s33r is that it’s like the Perl of S3 libraries: you can do everything 100 different ways. This is partly because I’ve evolved the code over time and made it more and more convenient for me, and I’ve built the new layers over the old low-level code. I also don’t like using options for required arguments to methods (call me old-fashioned), so in some places the ordering of arguments to a function is not intuitive. This can make it a bit confusing to use.
What we all want is an easy way to make use of s33r. So this section is a tutorial on that topic. As an example, I’ll cover how to use it to write a really simple uploader which will:
I’ll use the highest-level abstractions provided by s33r to make this as simple as possible.
(And I’ll go through this another day too.)
s33r comes with command line examples, and a sample Rails application, fores33r, which can be used for basic management of your S3 buckets and files.
These are located in examples/cli. You can run any of them with:
ruby <script name>
where <script name> is the name of the *.rb file. All the files have variables you need to set first; at a minimum, these will be the following two:
The example scripts included are:
examples/fores33r contains a Rails application which makes use of the S33r library. It will display a bucket list and the contents of individual buckets. You can upload files to your buckets using it.
Before use, you will need to configure it by editing the file examples/s3.yaml (so fores33r knows your Amazon S3 keys) and copying it into the examples/fores33r/config folder.
This has been kept deliberately simple, and doesn’t provide any access control or logging management.
Thanks for bug reports to Keaka.
The parts of client.rb which deal with persistent HTTP connections are adapted from the AWS::S3 library by Marcel Molina.