A data engineer wants to use an Amazon Elastic MapReduce (EMR) cluster for an application. The data engineer needs to make sure the cluster complies with regulatory requirements. The auditor must be able to confirm at any point which servers are running and which network access controls are deployed.
Which action should the data engineer take to meet this requirement?
A. Provide the auditor with IAM accounts that have the SecurityAudit policy attached to their group.
B. Provide the auditor with SSH keys for access to the Amazon EMR cluster.
C. Provide the auditor with CloudFormation templates.
D. Provide the auditor with access to AWS Direct Connect to use their existing tools.
An Amazon EMR cluster using EMRFS has access to petabytes of data on Amazon S3, originating from multiple unique data sources. The customer needs to query common fields across some of the data sets to be able to perform interactive joins and then display results quickly.
Which technology is most appropriate to enable this capability?
A. Presto
B. MicroStrategy
C. Pig
D. RStudio
A company is building a new application in AWS. The architect needs to design a system to collect application log events. The design should be a repeatable pattern that minimizes data loss if an application instance fails, and keeps a durable copy of the log data for at least 30 days.
What is the simplest architecture that will allow the architect to analyze the logs?
A. Write them directly to a Kinesis Firehose. Configure Kinesis Firehose to load the events into an Amazon Redshift cluster for analysis.
B. Write them to a file on Amazon Simple Storage Service (S3). Write an AWS Lambda function that runs in response to the S3 event to load the events into Amazon Elasticsearch Service for analysis.
C. Write them to the local disk and configure the Amazon CloudWatch Logs agent to load the data into CloudWatch Logs and subsequently into Amazon Elasticsearch Service.
D. Write them to CloudWatch Logs and use an AWS Lambda function to load them into HDFS on an Amazon Elastic MapReduce (EMR) cluster for analysis.
A system needs to collect on-premises application spool files into a persistent storage layer in AWS. Each spool file is 2 KB. The application generates 1 million files per hour. Each source file is automatically deleted from the local server after an hour.
What is the most cost-efficient option to meet these requirements?
A. Write file contents to an Amazon DynamoDB table.
B. Copy files to Amazon S3 Standard Storage.
C. Write file contents to Amazon ElastiCache.
D. Copy files to Amazon S3 Infrequent Access Storage.
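The workload figures in this question can be turned into a data volume and a write rate with a few lines of arithmetic. The sketch below is only illustrative: the function names are hypothetical, decimal units (1 GB = 1,000,000 KB) are assumed, and no service pricing is modeled.

```python
# Workload figures taken from the question text.
FILE_SIZE_KB = 2
FILES_PER_HOUR = 1_000_000


def hourly_volume_gb(file_size_kb: float, files_per_hour: int) -> float:
    """Return the data volume per hour in decimal gigabytes (assumed unit)."""
    return file_size_kb * files_per_hour / 1_000_000


def writes_per_second(files_per_hour: int) -> float:
    """Return the average number of file writes per second."""
    return files_per_hour / 3600


if __name__ == "__main__":
    print(f"Volume per hour: {hourly_volume_gb(FILE_SIZE_KB, FILES_PER_HOUR)} GB")
    print(f"Writes per second: {writes_per_second(FILES_PER_HOUR):.1f}")
```

The point of the calculation is that the total volume is small while the number of individual write operations is large, so per-request charges and minimum object sizes matter more than raw storage cost when comparing the options.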
An administrator receives about 100 files per hour into Amazon S3 and will be loading the files into Amazon Redshift. Customers who analyze the data within Redshift gain significant value when they receive data as quickly as possible. The customers have agreed to a maximum loading interval of 5 minutes.
Which loading approach should the administrator use to meet this objective?
A. Load each file as it arrives because getting data into the cluster as quickly as possible is the priority.
B. Load the cluster as soon as the administrator has the same number of files as nodes in the cluster.
C. Load the cluster when the administrator has an even multiple of files relative to Cluster Slice Count, or 5 minutes, whichever comes first.
D. Load the cluster when the number of files is less than the Cluster Slice Count.
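The batching rule described in option C (a file count that is an even multiple of the slice count, or a 5-minute timeout, whichever comes first) can be sketched as a small predicate. This is a minimal sketch under stated assumptions: the function name and parameters are hypothetical, and the caller is assumed to track the pending file count and the time since the last load.

```python
def should_load(pending_files: int,
                slice_count: int,
                seconds_since_last_load: float,
                max_interval_seconds: float = 300.0) -> bool:
    """Decide whether to start a load now (hypothetical helper).

    Loads when the agreed maximum interval has elapsed, or when the number
    of pending files divides evenly across the cluster slices.
    """
    if pending_files == 0:
        # Nothing to load, regardless of elapsed time.
        return False
    if seconds_since_last_load >= max_interval_seconds:
        # The 5-minute limit from the question takes precedence.
        return True
    # Even multiple of the slice count: every slice gets the same file count.
    return pending_files % slice_count == 0
```

For example, with a slice count of 4, `should_load(8, 4, 10)` returns True, while `should_load(5, 4, 10)` returns False until the 300-second limit is reached.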
An administrator tries to use the Amazon Machine Learning service to classify social media posts that mention the administrator's company into posts that require a response and posts that do not. The training dataset of 10,000 posts contains the details of each post including the timestamp, author, and full text of the post. The administrator is missing the target labels that are required for training.
Which Amazon Machine Learning model is the most appropriate for the task?
A. Binary classification model, where the target class is the require-response post
B. Binary classification model, where the two classes are the require-response post and does-not-require-response
C. Multi-class prediction model, with two classes: require-response post and does-not-require-response
D. Regression model where the predicted value is the probability that the post requires a response
An organization is designing an application architecture. The application will have over 100 TB of data and will support transactions that arrive at rates from hundreds per second to tens of thousands per second, depending on the day of the week and time of day. All transaction data must be durably and reliably stored. Certain read operations must be performed with strong consistency.
Which solution meets these requirements?
A. Use Amazon DynamoDB as the data store and use strongly consistent reads when necessary.
B. Use an Amazon Relational Database Service (RDS) instance sized to meet the maximum anticipated transaction rate and with the High Availability option enabled.
C. Deploy a NoSQL data store on top of an Amazon Elastic MapReduce (EMR) cluster, and select the HDFS High Durability option.
D. Use Amazon Redshift with synchronous replication to Amazon Simple Storage Service (S3) and row-level locking for strong consistency.
When an EC2 instance that is backed by an S3-based AMI is terminated, what happens to the data on the root volume?
A. Data is unavailable until the instance is restarted
B. Data is automatically deleted
C. Data is automatically saved as an EBS snapshot
D. Data is automatically saved as an EBS volume
The Amazon EC2 web service can be accessed using the _____ web services messaging protocol. This interface is described by a Web Services Description Language (WSDL) document.
A. SOAP
B. DCOM
C. CORBA
D. XML-RPC
You have a load balancer configured for VPC, and all backend Amazon EC2 instances are in service. However, your web browser times out when connecting to the load balancer's DNS name.
Which options are probable causes of this behavior?
A. The load balancer was not configured to use a public subnet with an Internet gateway configured
B. The Amazon EC2 instances do not have a dynamically allocated private IP address
C. The security groups or network ACLs are not properly configured for web traffic
D. The load balancer is not configured in a private subnet with a NAT instance
E. The VPC does not have a VGW configured