6 things you should know about Amazon CloudSearch

Amazon CloudSearch is a fully-managed service in the AWS Cloud that makes it simple and cost-effective to set up, manage, and scale a custom search solution for your website or application.

Amazon CloudSearch supports a rich set of features including language-specific text processing for 34 languages, free text search, faceted search, geospatial search, customizable relevance ranking, highlighting, autocomplete and user configurable scaling and availability options [1].

Here are six key things you should know about Amazon CloudSearch.

1. How to build a search solution with Amazon CloudSearch?

There are three main steps:
Amazon-CloudSearch

  1. Create and configure a search domain: A search domain includes your searchable data and the search instances that handle your search requests. If you have multiple collections of data that you want to make searchable, you can create multiple search domains
  2. Upload the data you want to search to your domain: Amazon CloudSearch indexes your data and deploys the search index to one or more search instances,
  3. Search your domain: You send a search request to your domain’s search endpoint as an HTTP/HTTPS GET request.

2. How much data can you store in CloudSearch?

The number of partitions you need depends on your data and configuration. When you upload data, Amazon CloudSearch deploys one or more search instances. As your data volume grows, more search instances or larger search instances are deployed to contain your indexed data. The maximum number of search instances that can be deployed for a domain is 50, and a search index can be splited across a maximum of 10 partitions. For more information about CloudSearch limits, see Ref. [2].

3. How are you charged?

You are charged for Search instances, Document batch uploads, IndexDocuments requests and Data transfer. The Ref. [3] provides a useful resource to estimate your CloudSearch monthly bill.

4. Can you query for documents older than X and then send a batch delete request?

At the moment, Amazon CloudSearch does not provide this feature, but you can use one of their SDKs to list documents based on some parameters, and then delete them.

5. Do you need to keep a copy of the index anywhere?

You don’t really need, you can rely only on CloudSearch as source of the analytics data, as far as you respect AWS Security Best practices [5]. Amazon CloudSearch stores your data internally in high availability stores, you will not have to re-index your data, should a CloudSearch instance have an issue (this will be handled transparently). It offers key benefits including automatic node monitoring and recovery, built in data durability, easy setup and configuration and hands off auto scaling.

6. What’s the difference in the different instance sizes for CloudSearch?

A search instance is a single search engine that indexes documents and responds to search requests. It has a finite amount of RAM and CPU resources for indexing data and processing requests. There are four available instance types within CloudSearch: search.m1.small (2 Million documents), search.m1.large (8 Million documents), search.m2.xlarge (16 Million documents) and search.m2.2xlarge (32 Million documents).

Short Amazon CloudSearch Video

References:
[1] – Amazon CloudSearch
[2] – Understanding Amazon CloudSearch Limits
[3] – AWS Simple Monthly Calculator
[4] – AWS Security Best practices [pdf]