Cloud storage is typically more reliable, scalable and secure than traditional on-premises storage systems. AWS offers object storage, file storage, block storage and data transfer services.
In this article you will first learn an approach for understanding the requirements of your data storage, and then take a first look at the different kinds of storage services, including some common use cases. The goal is to give you a guideline for when to pick which service, a choice that can be quite overwhelming in the beginning.
Three Vs of big data
There are various guidelines for determining the requirements of your data storage. In the area of big data, the three Vs approach is an especially common and helpful one. It is easy to understand and will help you pick the right storage type.
Velocity
Velocity is the speed at which data is read or written, measured in reads or writes per second. It can range from batch and periodic processing to near-real-time and real-time speeds.
Variety
Variety describes how many different structures exist in the data. This can range from highly structured to semi-structured, unstructured or binary large object (BLOB) data.
Highly structured data is typically found in relational databases, where each entry has the same number and types of attributes. An advantage of highly structured data is its self-describing nature.
In semi-structured data, entities belonging to the same class may have different attributes even though they are grouped together, and the order of the attributes is not important. You will often find semi-structured data in object-oriented databases. This data is more difficult to analyse and process in an automated fashion because more of the reasoning about the data is left to the consumer or application.
Unstructured data has no predefined structure: there are no entities or attributes. It is typically text-heavy but may contain data such as dates, numbers and facts.
BLOB data describes large data objects stored in binary form, for example video, audio or image objects.
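The difference between these variety levels is easiest to see in a small sketch. The records below are hypothetical and only illustrate the point: structured records share one fixed schema, while semi-structured records of the same class may carry different attributes.

```python
# Hypothetical customer records illustrating the variety dimension.

# Highly structured: every record has the same attributes,
# like a row in a relational table.
structured = [
    {"id": 1, "name": "Alice", "country": "DE"},
    {"id": 2, "name": "Bob", "country": "US"},
]

# Semi-structured: records of the same class carry different
# attributes, so consumers must reason about each record individually.
semi_structured = [
    {"id": 1, "name": "Alice", "newsletter": True},
    {"id": 2, "name": "Bob", "phone": "+1-555-0100", "tags": ["vip"]},
]

# Unstructured: free text without entities or attributes.
unstructured = "Bob called on Monday and asked about his last invoice."

# A structured dataset has exactly one schema across all records;
# a semi-structured one does not.
schemas = {frozenset(record) for record in structured}
print(len(schemas))  # one shared schema

semi_schemas = {frozenset(record) for record in semi_structured}
print(len(semi_schemas))  # differing attribute sets
```

The more attribute variation your data has, the more schema handling shifts from the storage layer into your application.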
Volume
Volume is the total size of the dataset. There are two main use cases for data: developing valuable insights and storing it for later use. For developing insights, more data is often preferable because it leads to better models. For storage for later use, the less you need to keep, the cheaper and less complex it will be, so the focus is on keeping only the necessary data.
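To make the guideline idea concrete, here is a deliberately simplified sketch (my own mapping, not official AWS guidance) of how answers along the velocity and variety dimensions could narrow down the storage family. The function name and categories are assumptions for illustration.

```python
# Simplified decision sketch: map three-Vs answers to an AWS storage family.
# This is illustrative, not official AWS guidance.

def suggest_storage(velocity: str, variety: str) -> str:
    """Suggest a storage family from hypothetical velocity/variety answers."""
    if velocity in ("real-time", "near-real-time") and variety == "structured":
        # Dedicated low-latency, per-host storage, e.g. a database on EC2.
        return "block storage (Amazon EBS)"
    if variety in ("unstructured", "blob"):
        # Native-format objects at almost any volume.
        return "object storage (Amazon S3)"
    # Shared access across many hosts.
    return "file storage (Amazon EFS)"

print(suggest_storage("real-time", "structured"))  # block storage (Amazon EBS)
print(suggest_storage("batch", "blob"))            # object storage (Amazon S3)
```

Real decisions also weigh cost, durability and access patterns, but a first rough mapping like this helps structure the discussion.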
Block storage
Enterprise applications like databases or ERP systems can require dedicated, low-latency storage for each host. Block-based cloud storage solutions like Amazon EBS are provisioned with each Amazon Elastic Compute Cloud (EC2) instance and offer the ultra-low latency required for high-performance workloads.
Typical use cases for Amazon EBS:
- Boot volumes on EC2 instances
- Log processing applications
- Data warehousing applications
- Big data analytics engines like Hadoop and Amazon EMR clusters
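As a sketch of how such a volume is provisioned in practice, the boto3 snippet below creates a general-purpose SSD volume and attaches it to an instance. The size, availability zone, device name and instance ID are assumptions for illustration; actually running it requires AWS credentials.

```python
# Hedged sketch: creating and attaching an EBS volume with boto3.
# Size, zone, device name and instance ID are illustrative assumptions.

def volume_spec(size_gib: int, az: str) -> dict:
    """Build create_volume parameters for a general-purpose SSD (gp3) volume."""
    return {"Size": size_gib, "AvailabilityZone": az, "VolumeType": "gp3"}

def create_and_attach(instance_id: str, az: str, size_gib: int = 100) -> str:
    import boto3  # local import keeps the sketch importable without boto3

    ec2 = boto3.client("ec2")
    vol = ec2.create_volume(**volume_spec(size_gib, az))
    # Wait until the volume is ready before attaching it.
    ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])
    ec2.attach_volume(
        VolumeId=vol["VolumeId"],
        InstanceId=instance_id,
        Device="/dev/sdf",  # device name as seen by the instance
    )
    return vol["VolumeId"]

# Usage (requires AWS credentials and a real instance ID):
# create_and_attach("i-0123456789abcdef0", "eu-central-1a")
```

The volume must live in the same availability zone as the instance it is attached to, which is why the zone is passed explicitly.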
Object storage
Amazon S3 is an object storage service offering vast scalability and rich metadata. It is ideal for building modern applications from scratch that require scale and flexibility, and it can also be used to import existing data stores for analytics, backup or archiving. Cloud object storage makes it possible to store a nearly limitless amount of data in its native format. Unlike block storage, however, you cannot install an operating system or database system on it.
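A minimal backup-upload sketch shows the object storage model: data is stored in its native format under a key, optionally with metadata. Bucket name, key prefix and metadata below are illustrative assumptions, and running the upload requires credentials and an existing bucket.

```python
# Hedged sketch: uploading a backup file to S3 with boto3.
# Bucket name, prefix and metadata are illustrative assumptions.

def object_key(prefix: str, filename: str) -> str:
    """Compose the object key; S3 has no real directories, only key names."""
    return f"{prefix.rstrip('/')}/{filename}"

def upload_backup(path: str, bucket: str) -> str:
    import boto3  # local import keeps the sketch importable without boto3

    key = object_key("backups/2024", path.rsplit("/", 1)[-1])
    s3 = boto3.client("s3")
    # The object is stored in its native format; metadata travels with it.
    s3.upload_file(
        path, bucket, key,
        ExtraArgs={"Metadata": {"source": "db-host-1"}},
    )
    return key

# Usage (requires credentials and an existing bucket):
# upload_backup("/var/backups/db.dump", "example-backup-bucket")
```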
File storage
Sharing files is important in modern enterprise systems. File storage often supports network-attached storage (NAS) servers. Services like Amazon EFS are ideal for use cases such as large content repositories, development environments, media stores or user home directories, enabling multiple users to access and share files. Amazon EFS is commonly used for the following use cases:
- Web serving
- Database backups
- Container storage
- Home directories
- Content management
- Media and entertainment workflows
- Workflow management
- Shared state management
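For the home-directories case above, provisioning a shared file system with boto3 might look like the sketch below; the creation token and tag values are assumptions. Once created, the file system is mounted over NFS from any number of EC2 instances, which is what enables the shared access.

```python
# Hedged sketch: provisioning a shared EFS file system with boto3.
# Token and tag values are illustrative assumptions.

def filesystem_spec(token: str) -> dict:
    """Parameters for create_file_system; the token makes the call idempotent."""
    return {
        "CreationToken": token,
        "PerformanceMode": "generalPurpose",
        "Encrypted": True,
        "Tags": [{"Key": "Name", "Value": "home-directories"}],
    }

def create_shared_fs(token: str) -> str:
    import boto3  # local import keeps the sketch importable without boto3

    efs = boto3.client("efs")
    fs = efs.create_file_system(**filesystem_spec(token))
    return fs["FileSystemId"]

# Usage (requires AWS credentials; mount targets must then be created
# before instances can mount the file system over NFS):
# create_shared_fs("home-dirs-2024")
```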