AWS Costing for Deploying a Machine Learning Model with Data Processed in a Big Data Framework:

In this blog article, we analyze the cost and pricing of AWS S3, EC2, EMR, and SageMaker for a workload that produces about 1 GB of resultant data over 1 month.

This analysis uses pricing for the us-west-2 (Oregon) region.

The procedure below describes the end-to-end architecture:

1. Raw data is loaded into S3.

2. Using EMR, a 3-node cluster is launched on EC2 with one master node and two worker nodes.

3. Hive and Hadoop are installed on the EMR cluster.

4. To extract the data required for training the machine learning model, Hive queries are run on the raw data stored in S3.

5. The resultant data is stored back in an S3 bucket.

6. An instance is started in SageMaker for building a machine learning model.

7. The machine learning model consumes the resultant data stored in S3.

8. The model is deployed in SageMaker as an API endpoint.

This is the end-to-end architecture we considered for training a machine learning model on AWS. Based on it, we derived the cost of storing the data in Amazon S3, running EC2 instances for the computations, running an Amazon EMR cluster that executes Hive jobs on the data and writes the results back to S3, and using the SageMaker service for building, training, and hosting the machine learning model.

The individual costs considered are:

•  Cost for Amazon S3 (Simple Storage Service).

•  Cost for Amazon EC2 (Elastic Compute Cloud).

•  Cost for Amazon EMR (Elastic MapReduce).

•  Cost for Amazon SageMaker.

Cost for Amazon S3 (Simple Storage Service):

Amazon S3 (Simple Storage Service) is one of Amazon's storage services. It is an object store that can hold all types of data: structured, unstructured, and semi-structured.

Amazon charges for storing the data and for the requests to access the data.

Cost for data storage:

Amazon S3 offers storage classes such as:

  • Standard Storage.
  • Standard - Infrequent Access Storage.
  • Reduced Redundancy Storage.

Here we used S3 Standard storage, which is intended for frequently accessed data. Amazon follows a “pay only for what you use” model. The price list for S3 Standard storage is as follows.

Storage (per month)     Price per GB
First 50 TB             $0.0232
Next 450 TB             $0.022
Over 500 TB             $0.021


We used Amazon S3 to store 1 GB of data for 1 month, so the storage cost is

1 GB x 1 month x $0.0232 per GB-month = $0.0232
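The tiered price list can be applied mechanically. The following is a minimal sketch, not an official AWS calculator: the function name `s3_storage_cost` is hypothetical, the rates are the ones quoted in this article, and treating 1 TB as 1024 GB is an assumption.

```python
# Tier sizes in GB and price per GB-month, taken from the price list in this article.
# Assumption: 1 TB is treated as 1024 GB.
S3_STANDARD_TIERS = [
    (50 * 1024, 0.0232),    # first 50 TB
    (450 * 1024, 0.022),    # next 450 TB
    (float("inf"), 0.021),  # everything over 500 TB
]


def s3_storage_cost(gb_stored, months=1, tiers=S3_STANDARD_TIERS):
    """Return the storage charge in USD for gb_stored GB kept for `months` months."""
    remaining = gb_stored
    monthly_cost = 0.0
    for tier_size_gb, price_per_gb in tiers:
        if remaining <= 0:
            break
        # Bill as much as fits in this tier, then carry the rest to the next one.
        billed = min(remaining, tier_size_gb)
        monthly_cost += billed * price_per_gb
        remaining -= billed
    return monthly_cost * months


# Our use case: 1 GB stored for one month.
print(round(s3_storage_cost(1), 4))
```

For a workload as small as ours only the first tier applies, so the function reduces to a single multiplication; the tier loop matters only once storage passes 50 TB.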

Cost for requests:

Amazon also charges for the number of requests made to access the data in storage.

For PUT, COPY, POST, or LIST Requests Amazon charges $0.005 per 1,000 requests. For GET and all other Requests Amazon charges $0.0004 per 1,000 requests. DELETE requests are free.

We made 300 GET requests and 300 PUT, COPY, POST, or LIST requests against our 1 GB of data in 1 month, so the request cost is (300 / 1,000) x $0.0004 + (300 / 1,000) x $0.005 = $0.00162.

The total cost for storing and accessing 1 GB of data for 1 month is $0.0232 + $0.00162 = $0.02482, or about $0.02.
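To make the request arithmetic easy to check, here is a small sketch that recomputes the request charge from the per-1,000 rates quoted above and adds it to the storage charge. The function and constant names are hypothetical, chosen only for illustration.

```python
# Per-1,000-request rates quoted in this article (USD).
PUT_PRICE_PER_1000 = 0.005   # PUT, COPY, POST, or LIST
GET_PRICE_PER_1000 = 0.0004  # GET and all other requests (DELETE is free)


def s3_request_cost(put_requests, get_requests):
    """Return the request charge in USD for the given request counts."""
    put_cost = (put_requests / 1000) * PUT_PRICE_PER_1000
    get_cost = (get_requests / 1000) * GET_PRICE_PER_1000
    return put_cost + get_cost


request_cost = s3_request_cost(put_requests=300, get_requests=300)
storage_cost = 1 * 0.0232  # 1 GB for one month at the first-tier rate

print(f"requests: ${request_cost:.5f}")
print(f"storage + requests: ${storage_cost + request_cost:.5f}")
```

At this scale the request charge is a small fraction of a cent, so the S3 bill is dominated by storage and rounds to about $0.02.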

Cost for Amazon EC2 (Elastic Compute Cloud):

Amazon EC2 (Elastic Compute Cloud) is Amazon's cloud computing service. There are three ways to pay for Amazon EC2 instances: On-Demand, Reserved Instances, and Spot Instances. You can also pay for Dedicated Hosts, which provide EC2 instance capacity on physical servers dedicated for your use. For our setup, Amazon charges for two components: the EC2 instances and EBS.

Cost for EC2 instance:

Here we used On-Demand instances to create a virtual environment for computing in the cloud. For every instance, Amazon charges based on the hours used, the selected AMI (Amazon Machine Image), and the number of instances.

We used m4.large machines, for which Amazon charges $0.10 per hour of usage. We ran 3 m4.large instances for 100 hours in 1 month, so the total amount we paid is 3 instances x $0.10/hour x 100 hours = $30 for the month.

Cost for EBS (Elastic Block Store):

Amazon Elastic Block Store (Amazon EBS) provides persistent block storage volumes for use with Amazon EC2 instances in the AWS Cloud. Each Amazon EBS volume is automatically replicated within its Availability Zone to protect you from component failure. EBS pricing covers four volume types plus snapshots:

  • Amazon EBS General Purpose SSD (gp2) volumes.
  • Amazon EBS Provisioned IOPS SSD (io1) volumes.
  • Amazon EBS Throughput Optimized HDD (st1) volumes.
  • Amazon EBS Cold HDD (sc1) volumes.
  • Amazon EBS Snapshots to Amazon S3.

Amazon charges for EBS based on the volume type, the provisioned size, and the duration. Here we used a General Purpose SSD (gp2) volume as storage for the EC2 instances.

The total amount we paid for Amazon EBS is

$0.10 per GB-month x 32 GB x 1 month x 1 volume = $3.20

Finally, the total amount we spent on Amazon EC2 for one month is $30 + $3.20 = $33.20
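The EC2 figure combines an instance-hour charge and an EBS charge. The sketch below reproduces both parts from the rates quoted in this section; the helper names `ec2_instance_cost` and `ebs_cost` are hypothetical.

```python
def ec2_instance_cost(instances, hours, hourly_rate):
    """On-Demand charge in USD: every instance is billed for every hour it runs."""
    return instances * hours * hourly_rate


def ebs_cost(volumes, size_gb, months, price_per_gb_month):
    """EBS charge in USD: billed on provisioned size, not on data actually written."""
    return volumes * size_gb * months * price_per_gb_month


# Figures from this article: 3 x m4.large for 100 hours, one 32 GB gp2 volume.
instance_total = ec2_instance_cost(instances=3, hours=100, hourly_rate=0.10)
ebs_total = ebs_cost(volumes=1, size_gb=32, months=1, price_per_gb_month=0.10)

print(f"instances: ${instance_total:.2f}")
print(f"EBS:       ${ebs_total:.2f}")
print(f"EC2 total: ${instance_total + ebs_total:.2f}")
```

Keeping the two charges in separate functions makes it easy to see how the bill changes if, for example, the cluster runs longer while the volume size stays fixed.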

Cost for Amazon EMR (Elastic MapReduce):

Amazon EMR (Elastic MapReduce) is a managed Hadoop framework that processes vast amounts of data across Amazon EC2 instances. Here we used Amazon EMR to run a Hive query on the data in S3 and write the result back to S3.

The Amazon EMR price is charged in addition to the price of the underlying Amazon EC2 instances. In this section we consider only the EMR charge, since the EC2 cost is accounted for separately above.

Amazon charges for EMR according to the instance type, the number of hours used, and the number of instances. Here we used m4.large machines to run the Hadoop ecosystem, for which Amazon charges $0.03 per hour. In our process, we used 3 m4.large instances for 100 hours in a month.

So the total cost of Amazon EMR is 3 instances x $0.03/hour x 100 hours = $9
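Because the EMR charge sits on top of the EC2 charge for the same instance-hours, it can help to see both side by side. This is a sketch under the rates quoted in this article; `emr_cluster_cost` is a hypothetical helper, not part of any AWS SDK.

```python
EC2_RATE = 0.10  # m4.large On-Demand, USD per instance-hour (from the EC2 section)
EMR_RATE = 0.03  # EMR charge for m4.large, USD per instance-hour


def emr_cluster_cost(nodes, hours, ec2_rate=EC2_RATE, emr_rate=EMR_RATE):
    """Split a cluster's compute bill into its EC2 part and its EMR part."""
    instance_hours = nodes * hours
    ec2_part = instance_hours * ec2_rate
    emr_part = instance_hours * emr_rate
    return {"ec2": ec2_part, "emr": emr_part, "combined": ec2_part + emr_part}


# Our cluster: 1 master + 2 workers, 100 hours in the month.
costs = emr_cluster_cost(nodes=3, hours=100)
for name, amount in costs.items():
    print(f"{name}: ${amount:.2f}")
```

Only the `emr` entry is counted in this section; the `ec2` entry is the same instance charge already included in the EC2 total, so it must not be added twice.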

Cost for Amazon SageMaker:

Amazon SageMaker is a fully managed service that enables developers and data scientists to quickly and easily build, train, and deploy machine learning models at any scale. It is designed to remove the barriers that typically slow down developers who want to use machine learning.

Amazon charges money for the following:

  • Cost for Building a model.
  • Cost for Training a model.
  • Cost for Hosting a model.

Cost for building a model:

Amazon SageMaker makes it easy to build a machine learning model and get it ready for training. In this building process, Amazon charges for the notebook instance, its storage, and data processing.

For the notebook instance, we used an ml.t2.medium instance. Amazon charges per hour according to the instance type used. In our example, the chosen instance costs about $0.05 per hour (rounded).

We used this notebook instance for 3 hours to build our model.

The cost was ml.t2.medium x 3 hours x about $0.05 per hour = about $0.14.

For the storage attached to this instance, Amazon charges according to the storage type and storage size. For SSD storage, Amazon charges $0.14 per GB per month.

In our use case we used SSD storage for 1 GB for 1 month, so the cost is

SSD x 1 GB x 1 month x $0.14 per GB-month = $0.14

For data processing during model building, Amazon charges $0.02 per GB.

The total cost for building the machine learning model is $0.4

Cost for Training a model:

Amazon charges according to the instance used for training and for the storage of the machine learning model.

Here we used ml.m4.xlarge machines to train the model. Amazon charges $0.28 per hour for this machine. The storage price is the same as for building a model.

For storage:

SSD x 1 GB x 1 month x $0.14 per GB-month = $0.14

For instances:

  ml.m4.xlarge x 3 instances x 1 hour x $0.28 per hour = $0.84

So, the cost incurred for training the model is $0.14 + $0.84 = $0.98
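The training figure is the sum of an instance charge and a storage charge. A minimal sketch of that calculation follows, using the rates quoted in this section; the function name `sagemaker_training_cost` is hypothetical.

```python
def sagemaker_training_cost(instances, hours, hourly_rate,
                            storage_gb, storage_rate, months=1):
    """Return (instance_cost, storage_cost, total) in USD for one training run."""
    instance_cost = instances * hours * hourly_rate
    storage_cost = storage_gb * storage_rate * months
    return instance_cost, storage_cost, instance_cost + storage_cost


# Figures from this article: 3 x ml.m4.xlarge for 1 hour, 1 GB of SSD storage.
instance_cost, storage_cost, total = sagemaker_training_cost(
    instances=3, hours=1, hourly_rate=0.28, storage_gb=1, storage_rate=0.14
)
print(f"instances ${instance_cost:.2f} + storage ${storage_cost:.2f} = ${total:.2f}")
```

Training is billed only while the job runs, which is why this part of the bill stays under a dollar even with three instances.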

Cost for Hosting a model:

Once your model is trained and tuned, Amazon SageMaker makes it easy to deploy in production, so you can start generating predictions on new data. Amazon SageMaker deploys your model on auto-scaling instances.

Amazon charges for the instance on which the machine learning model is deployed and for storing the model. In addition, Amazon charges for data processing.

In our case we chose an ml.t2.medium instance to deploy our machine learning model, with the same storage type as for building the model.

We hosted our model on the ml.t2.medium instance for 720 hours. Amazon charges for the instance type and for the number of hours it is used. This instance costs about $0.07 per hour; the figure below corresponds to $0.065 per hour before rounding.

For the machine instance:

ml.t2.medium x 1 instance x 720 hours x $0.065 per hour = $46.80

For storage:

SSD x 1 GB x 1 month x $0.14 per GB-month = $0.14

For data processing:

1 GB x $0.02 per GB = $0.02

The total cost we paid for hosting the model is $46.80 + $0.14 + $0.02 = $46.96
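Hosting is the largest item because the endpoint runs around the clock. The sketch below recomputes it with a hypothetical helper, `sagemaker_hosting_cost`. One assumption is worth stating loudly: the $46.80 instance charge works out to $0.065 per hour over 720 hours, slightly below the rounded $0.07 quoted in the text, so the sketch uses that back-calculated rate.

```python
HOURS_IN_30_DAY_MONTH = 24 * 30  # 720 hours, the hosting duration used in this article


def sagemaker_hosting_cost(hours, hourly_rate, storage_gb, storage_rate,
                           processed_gb, processing_rate):
    """Return the monthly hosting bill in USD: instance + storage + data processing."""
    instance_cost = hours * hourly_rate
    storage_cost = storage_gb * storage_rate
    processing_cost = processed_gb * processing_rate
    return instance_cost + storage_cost + processing_cost


hosting_total = sagemaker_hosting_cost(
    hours=HOURS_IN_30_DAY_MONTH,
    hourly_rate=0.065,     # assumption: back-calculated as 46.80 / 720
    storage_gb=1,
    storage_rate=0.14,     # SSD, USD per GB-month
    processed_gb=1,
    processing_rate=0.02,  # USD per GB processed
)
print(f"hosting: ${hosting_total:.2f}")
```

Since the instance charge scales linearly with hours, shutting the endpoint down outside working hours would reduce this item roughly in proportion.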

Finally, we spent $48.34 (46.96 + 0.4 + 0.98) on Amazon SageMaker.

The total amount we spent on AWS for building a machine learning model on 1 GB of data for one month in a Big Data framework is $90.56 (42.22 + 48.34), where $42.22 is the combined cost of S3, EC2, and EMR (0.02 + 33.2 + 9).
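To see where the money goes, the section totals can be summed and expressed as shares of the bill. This sketch uses the totals as stated in this article (with S3 rounded to two decimals); the dictionary keys and the `summarize` helper are hypothetical names.

```python
# Section totals in USD, as stated in this article.
COSTS = {
    "S3": 0.02,
    "EC2 (instances + EBS)": 33.20,
    "EMR": 9.00,
    "SageMaker build": 0.40,
    "SageMaker train": 0.98,
    "SageMaker host": 46.96,
}


def summarize(costs):
    """Return the grand total and one formatted line per service with its share."""
    total = sum(costs.values())
    lines = [
        f"{name:<24}${amount:>8.2f}  {amount / total:6.1%}"
        for name, amount in costs.items()
    ]
    lines.append(f"{'Total':<24}${total:>8.2f}")
    return total, lines


grand_total, report = summarize(COSTS)
print("\n".join(report))
```

The breakdown makes it clear that always-on compute (the hosting endpoint and the EC2 cluster) accounts for almost all of the bill, while storing 1 GB in S3 is negligible.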

The detailed costing is attached at the link below.
