AWS MLA-C01 English - AWS Practice Question Collection

AWS Certified Machine Learning Engineer – Associate validates technical ability in implementing ML workloads in production and operationalizing them. Boost your career profile and credibility, and position yourself for in-demand machine learning job roles.

■AWS MLA-C01(EN) All

No.1
A company is building a web-based AI application by using Amazon SageMaker. The application will provide the following capabilities and features: ML experimentation, training, a central model registry, model deployment, and model monitoring.
The application must ensure secure and isolated use of training data during the ML lifecycle. The training data is stored in Amazon S3.
The company needs to use the central model registry to manage different versions of models in the application.
Which action will meet this requirement with the LEAST operational overhead?

A. Create a separate Amazon Elastic Container Registry (Amazon ECR) repository for each model.

B. Use Amazon Elastic Container Registry (Amazon ECR) and unique tags for each model version.

C. Use the SageMaker Model Registry and model groups to catalog the models.

D. Use the SageMaker Model Registry and unique tags for each model version.

Answer: C

Explanation:
https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry-models.html
"Each model package in a Model Group corresponds to a trained model. The version of each model package is a numerical value that starts at 1 and is incremented with each new model package added to a Model Group. For example, if 5 model packages are added to a Model Group, the model package versions will be 1, 2, 3, 4, and 5."

No.2
A company is building a web-based AI application by using Amazon SageMaker. The application will provide the following capabilities and features: ML experimentation, training, a central model registry, model deployment, and model monitoring.
The application must ensure secure and isolated use of training data during the ML lifecycle. The training data is stored in Amazon S3.
The company is experimenting with consecutive training jobs.
How can the company MINIMIZE infrastructure startup times for these jobs?

A. Use Managed Spot Training.

B. Use SageMaker managed warm pools.

C. Use SageMaker Training Compiler.

D. Use the SageMaker distributed data parallelism (SMDDP) library.

Answer: B

Explanation:
https://docs.aws.amazon.com/sagemaker/latest/dg/train-warm-pools.html#train-warm-pools-how-it-works
SageMaker managed warm pools let you retain and reuse provisioned infrastructure after the completion of a training job to reduce latency for repetitive workloads, such as iterative experimentation or running many jobs consecutively.

No.3
A company is building a web-based AI application by using Amazon SageMaker. The application will provide the following capabilities and features: ML experimentation, training, a central model registry, model deployment, and model monitoring.
The application must ensure secure and isolated use of training data during the ML lifecycle. The training data is stored in Amazon S3.
The company must implement a manual approval-based workflow to ensure that only approved models can be deployed to production endpoints.
Which solution will meet this requirement?

A. Use SageMaker Experiments to facilitate the approval process during model registration.

B. Use SageMaker ML Lineage Tracking on the central model registry. Create tracking entities for the approval process.

C. Use SageMaker Model Monitor to evaluate the performance of the model and to manage the approval.

D. Use SageMaker Pipelines. When a model version is registered, use the AWS SDK to change the approval status to "Approved."

Answer: D

Explanation:
This one tricked me because option D is not clearly worded:
A. No. SageMaker Experiments lets you track and organize experiments, but it does not approve models.
B. No. SageMaker ML Lineage Tracking lets you track model lineage, but it does not let you approve a model.
C. No. SageMaker Model Monitor monitors data quality, model quality, bias, and feature attribution.
D. Yes. After you create a model version, you typically evaluate its performance and then update the approval status of the model version. You can update the approval status by using the SDK, the SageMaker Studio console, or a condition step in a SageMaker AI pipeline.
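
As a rough illustration of option D, the approval step can be a single SDK call. This is only a sketch: it assumes a boto3 SageMaker client (for example, one created with boto3.client("sagemaker")) is passed in, and the function name approve_model_version and its default note are hypothetical.

```python
def approve_model_version(sagemaker_client, model_package_arn, note="Manually approved"):
    """Mark one registered model version as approved.

    `sagemaker_client` is assumed to be a boto3 SageMaker client.
    """
    # UpdateModelPackage changes the approval status of a model version
    # (model package) that is registered in the Model Registry.
    return sagemaker_client.update_model_package(
        ModelPackageArn=model_package_arn,
        ModelApprovalStatus="Approved",
        ApprovalDescription=note,
    )
```

A person (or a script that a person runs) would call this after reviewing the model version, which is what makes the workflow a manual approval.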

No.4
A company is building a web-based AI application by using Amazon SageMaker. The application will provide the following capabilities and features: ML experimentation, training, a central model registry, model deployment, and model monitoring.
The application must ensure secure and isolated use of training data during the ML lifecycle. The training data is stored in Amazon S3.
The company needs to run an on-demand workflow to monitor bias drift for models that are deployed to real-time endpoints from the application.
Which action will meet this requirement?

A. Configure the application to invoke an AWS Lambda function that runs a SageMaker Clarify job.

B. Invoke an AWS Lambda function to pull the sagemaker-model-monitor-analyzer built-in SageMaker image.

C. Use AWS Glue Data Quality to monitor bias.

D. Use SageMaker notebooks to compare the bias.

Answer: A

Explanation:
A. Yes. Clarify computes bias metrics: https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-configure-processing-jobs.html
B. No. The built-in sagemaker-model-monitor-analyzer image provides a range of model monitoring capabilities (constraint suggestion, statistics generation, constraint validation against a baseline, and emitting Amazon CloudWatch metrics), but you need Clarify for bias.
C. No. Glue Data Quality doesn't analyze bias.
D. No. From a notebook you can run pretty much everything, including a Clarify job, but notebooks are for experiments and model development, not for enabling real-time application features.

No.5
A company stores historical data in .csv files in Amazon S3. Only some of the rows and columns in the .csv files are populated. The columns are not labeled. An ML engineer needs to prepare and store the data so that the company can use the data to train ML models.
Select and order the correct steps from the following list to perform this task. Each step should be selected one time or not at all. (Select and order three.)
• Create an Amazon SageMaker batch transform job for data cleaning and feature engineering.
• Store the resulting data back in Amazon S3.
• Use Amazon Athena to infer the schemas and available columns.
• Use AWS Glue crawlers to infer the schemas and available columns.
• Use AWS Glue DataBrew for data cleaning and feature engineering.

Step 1: [Select…]
Step 2: [Select…]
Step 3: [Select…]

Answer:
Step 1: Use AWS Glue crawlers to infer the schemas and available columns.
Step 2: Use AWS Glue DataBrew for data cleaning and feature engineering.
Step 3: Store the resulting data back in Amazon S3.

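Explanation:
Order of steps:
1. Use AWS Glue crawlers to infer the schemas and available columns. A crawler scans the unlabeled .csv files in S3 and records the inferred schema in the Data Catalog.
2. Use AWS Glue DataBrew for data cleaning and feature engineering, such as handling the unpopulated rows and columns.
3. Store the resulting data back in Amazon S3.
A SageMaker batch transform job runs inference rather than data cleaning, and Athena queries data against a schema that is already defined, so neither option fits.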

No.6
An ML engineer needs to use Amazon SageMaker Feature Store to create and manage features to train a model.
Select and order the steps from the following list to create and use the features in Feature Store. Each step should be selected one time. (Select and order three.)
• Access the store to build datasets for training.
• Create a feature group.
• Ingest the records.

Step 1: [Select…]
Step 2: [Select…]
Step 3: [Select…]

Answer:
Step 1: Create a feature group.
Step 2: Ingest the records.
Step 3: Access the store to build datasets for training.

Explanation:
To create and manage features using Amazon SageMaker Feature Store, follow these steps:
1) Create a feature group: Organize your features by defining a feature group.
2) Ingest the records: Load the data into the feature group.
3) Access the store to build datasets for training: Retrieve the data from the feature group to prepare for model training.

No.7
A company wants to host an ML model on Amazon SageMaker. An ML engineer is configuring a continuous integration and continuous delivery (CI/CD) pipeline in AWS CodePipeline to deploy the model. The pipeline must run automatically when new training data for the model is uploaded to an Amazon S3 bucket.
Select and order the pipeline's correct steps from the following list. Each step should be selected one time or not at all. (Select and order three.)
• An S3 event notification invokes the pipeline when new data is uploaded.
• An S3 Lifecycle rule invokes the pipeline when new data is uploaded.
• SageMaker retrains the model by using the data in the S3 bucket.
• The pipeline deploys the model to a SageMaker endpoint.
• The pipeline deploys the model to SageMaker Model Registry.

Step 1: [Select…]
Step 2: [Select…]
Step 3: [Select…]

Answer:
Step 1: An S3 event notification invokes the pipeline when new data is uploaded.
Step 2: SageMaker retrains the model by using the data in the S3 bucket.
Step 3: The pipeline deploys the model to a SageMaker endpoint.

Explanation:
The first two steps are obvious. For the third step, there are two choices:
1. The pipeline deploys the model to a SageMaker endpoint.
2. The pipeline deploys the model to SageMaker Model Registry.
Since the question says to deploy the model, choice 1 is the correct third step. If we only add the model to Model Registry, it sits in the catalog but is not deployed; it still has to be explicitly deployed to an endpoint.

No.8
An ML engineer is building a generative AI application on Amazon Bedrock by using large language models (LLMs).
Select the correct generative AI term from the following list for each description. Each term should be selected one time or not at all. (Select three.)
• Embedding
• Retrieval Augmented Generation (RAG)
• Temperature
• Token

Text representation of basic units of data processed by LLMs: [Select…]
High-dimensional vectors that contain the semantic meaning of text: [Select…]
Enrichment of information from additional data sources to improve a generated response: [Select…]

Answer:
Text representation of basic units of data processed by LLMs: Token
High-dimensional vectors that contain the semantic meaning of text: Embedding
Enrichment of information from additional data sources to improve a generated response: Retrieval Augmented Generation (RAG)

Explanation:
Token: Represents a unit of text that the model processes and generates.
Embedding: Converts text into vector representations to capture semantic meaning, enhancing the model's ability to understand and generate coherent content.
Retrieval Augmented Generation (RAG): Enriches the prompt with information retrieved from additional data sources to improve the generated response.
Temperature (not selected): Controls the randomness and creativity of the generated output, so it matches none of the three descriptions.
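
Temperature is the easiest of these terms to see numerically. The sketch below is not how Amazon Bedrock implements sampling; it is a generic softmax-with-temperature calculation, and the three scores are made-up values for three candidate tokens.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a temperature > 0."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.5]
sharp = softmax_with_temperature(logits, 0.5)  # low temperature: top token dominates
flat = softmax_with_temperature(logits, 2.0)   # high temperature: choices are closer
```

With the lower temperature, the top-scoring token receives a larger share of the probability, so the output is more predictable; the higher temperature spreads probability across the candidates.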

No.9
An ML engineer is working on an ML model to predict the prices of similarly sized homes. The model will base predictions on several features. The ML engineer will use the following feature engineering techniques to estimate the prices of the homes:
• Feature splitting
• Logarithmic transformation
• One-hot encoding
• Standardized distribution
Select the correct feature engineering techniques for the following list of features. Each feature engineering technique should be selected one time or not at all. (Select three.)

City (name): [Select…]
Type_year (type of home and year the home was built): [Select…]
Size of the building (square feet or square meters): [Select…]

Answer:
City (name): One-hot encoding
Type_year (type of home and year the home was built): Feature splitting
Size of the building (square feet or square meters): Logarithmic transformation

Explanation:
Size of the building (square feet or square meters) = Logarithmic transformation
Building size is a numerical feature that often shows a skewed distribution and can have a non-linear relationship with price. Logarithmic transformation is suitable because:
• It helps normalize skewed distributions.
• It can help linearize the relationship between size and price.
• It is particularly useful for features that follow exponential or multiplicative patterns.
• Real estate data often shows log-normal distributions.
City (name) = One-hot encoding: the city is a categorical value with no inherent order.
Type_year = Feature splitting: the column combines two attributes (home type and year built) that are more useful as separate features.
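
The three selected techniques can be sketched in a few lines of plain Python. The city list, the "condo_1998" value, and the underscore separator are assumptions made up for illustration; the question does not say how Type_year is actually formatted.

```python
import math

def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over known categories."""
    return [1 if value == c else 0 for c in categories]

def split_type_year(type_year):
    """Split a combined 'type_year' string into (home type, year built)."""
    home_type, year = type_year.rsplit("_", 1)
    return home_type, int(year)

def log_transform(size):
    """Compress a right-skewed positive value with log(1 + x)."""
    return math.log1p(size)

# Hypothetical record; values and formats are illustrative only.
cities = ["Austin", "Boston", "Denver"]
city_vec = one_hot("Boston", cities)
home_type, year_built = split_type_year("condo_1998")
log_size = log_transform(1500.0)
```

Each function maps one raw column to a model-friendly form: a vector for the city, two separate columns for Type_year, and a compressed numeric value for the size.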

No.10
An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.
The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.
Which AWS service or feature can aggregate the data from the various data sources?

A. Amazon EMR Spark jobs

B. Amazon Kinesis Data Streams

C. Amazon DynamoDB

D. AWS Lake Formation

Answer: D

Explanation:
Yet another poorly worded AWS certification question. Here is my reasoning: the question is about aggregating the data from S3 and the on-premises MySQL database, and I read "aggregate" as "put in the same place". Therefore:
A. No. An EMR Spark job can connect to both S3 and MySQL, but it is better suited to processing data and then storing it in S3.
B. No. Kinesis Data Streams is for delivering streaming data to specific destinations (S3, OpenSearch ...).
C. No. DynamoDB is a NoSQL database and is not a great fit here.
D. Yes. Lake Formation lets you "combine different types of structured and unstructured data into a centralized repository" (https://docs.aws.amazon.com/lake-formation/latest/dg/what-is-lake-formation.html), and "with Lake Formation, you can import your data using workflows". Because it is based on AWS Glue, it supports both S3 and MySQL.

No.11
An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.
The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.
After the data is aggregated, the ML engineer must implement a solution to automatically detect anomalies in the data and to visualize the result.
Which solution will meet these requirements?

A. Use Amazon Athena to automatically detect the anomalies and to visualize the result.

B. Use Amazon Redshift Spectrum to automatically detect the anomalies. Use Amazon QuickSight to visualize the result.

C. Use Amazon SageMaker Data Wrangler to automatically detect the anomalies and to visualize the result.

D. Use AWS Batch to automatically detect the anomalies. Use Amazon QuickSight to visualize the result.

Answer: C

Explanation:
https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-analyses.html
"Amazon SageMaker Data Wrangler includes built-in analyses that help you generate visualizations and data analyses in a few clicks. "
This question is tricky because it makes you think you need QuickSight for the "visualization" part.

No.12
An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.
The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.
The training dataset includes categorical data and numerical data. The ML engineer must prepare the training dataset to maximize the accuracy of the model.
Which action will meet this requirement with the LEAST operational overhead?

A. Use AWS Glue to transform the categorical data into numerical data.

B. Use AWS Glue to transform the numerical data into categorical data.

C. Use Amazon SageMaker Data Wrangler to transform the categorical data into numerical data.

D. Use Amazon SageMaker Data Wrangler to transform the numerical data into categorical data.

Answer: C

Explanation:
https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-transform.html

Data Wrangler can be used for encoding categorical data, i.e. the process of creating a numerical representation for categories. Categorical encoding encodes categorical data that is in string format into arrays of integers. Data Wrangler supports ordinal encoding and one-hot encoding, as well as similarity encoding (more advanced).
https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-transform.html#data-wrangler-transform-cat-encode

AWS Glue DataBrew also has data science recipe steps for one-hot encoding and categorical mapping.
https://docs.aws.amazon.com/databrew/latest/dg/recipe-actions.data-science.html

However, Data Wrangler is more user-friendly, with visual and natural language interfaces, which means less operational overhead.

No.13
An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.
The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.
Before the ML engineer trains the model, the ML engineer must resolve the issue of the imbalanced data.
Which solution will meet this requirement with the LEAST operational effort?

A. Use Amazon Athena to identify patterns that contribute to the imbalance. Adjust the dataset accordingly.

B. Use Amazon SageMaker Studio Classic built-in algorithms to process the imbalanced dataset.

C. Use AWS Glue DataBrew built-in features to oversample the minority class.

D. Use the Amazon SageMaker Data Wrangler balance data operation to oversample the minority class.

Answer: D

Explanation:
https://aws.amazon.com/blogs/machine-learning/balance-your-data-for-machine-learning-with-amazon-sagemaker-data-wrangler/

Both Glue DataBrew and Data Wrangler allow data preparation for ML with no-code/low-code tooling (that is, low operational effort). However, Data Wrangler provides a built-in transform for balancing a dataset (random oversampling, random undersampling, and SMOTE): https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-transform.html#data-wrangler-transform-balance-data
DataBrew doesn't provide a built-in recipe step for balancing a dataset. It provides a smaller set of data science recipe steps, limited to binarization, bucketization, categorical mapping, one-hot encoding, scaling, skewness, and tokenization: https://docs.aws.amazon.com/databrew/latest/dg/recipe-actions.data-science.html
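
To make "random oversampling" concrete, here is a minimal plain-Python sketch of the idea. It is not Data Wrangler's implementation, and the toy rows and the random_oversample helper are hypothetical.

```python
import random
from collections import Counter

def random_oversample(rows, label_of, seed=0):
    """Duplicate randomly chosen minority-class rows until classes are balanced."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(label_of(row), []).append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap up to the majority count.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced

# Hypothetical toy data: 8 legitimate transactions (0) and 2 fraudulent ones (1).
rows = [("txn", 0)] * 8 + [("txn", 1)] * 2
balanced = random_oversample(rows, label_of=lambda r: r[1])
counts = Counter(label for _, label in balanced)
```

After the call, both classes have the same number of rows. SMOTE differs in that it synthesizes new minority samples instead of duplicating existing ones.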

No.14
An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.
The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.
The ML engineer needs to use an Amazon SageMaker built-in algorithm to train the model.
Which algorithm should the ML engineer use to meet this requirement?

A. LightGBM

B. Linear learner

C. K-means clustering

D. Neural Topic Model (NTM)

Answer: A

Explanation:
A. LightGBM: Handles class imbalance; captures feature interdependencies; models complex patterns.
B. Linear Learner: Limited with interdependent features; struggles with complex patterns; suitable for linear relationships.
C. K-means Clustering: Unsupervised algorithm; not suitable for classification; can't handle class imbalance.
D. Neural Topic Model (NTM): Designed for topic modeling; unsuitable for fraud detection; doesn't address class imbalance.

No.15
A company has deployed an XGBoost prediction model in production to predict if a customer is likely to cancel a subscription. The company uses Amazon SageMaker Model Monitor to detect deviations in the F1 score.
During a baseline analysis of model quality, the company recorded a threshold for the F1 score. After several months of no change, the model's F1 score decreases significantly.
What could be the reason for the reduced F1 score?

A. Concept drift occurred in the underlying customer data that was used for predictions.

B. The model was not sufficiently complex to capture all the patterns in the original baseline data.

C. The original baseline data had a data quality issue of missing values.

D. Incorrect ground truth labels were provided to Model Monitor during the calculation of the baseline.

Answer: A

Explanation:
Concept Drift: Occurs when the statistical properties of the data used for predictions, or their relationship to the target, change over time, causing the model to underperform on current data.
Why Not the Other Options?

B. If the model complexity was insufficient, the issue would have been detected during the initial evaluation or baseline analysis, not after months of stable performance.
C. A data quality issue would have impacted the model's performance immediately after deployment, not months later.
D. Incorrect labels during baseline calculation could result in an inaccurate baseline F1 score, but it wouldn't explain a significant drop after stable performance over months.
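
For reference, the F1 check that a model quality monitor performs boils down to something like the sketch below. This is not Model Monitor code; the labels, predictions, and the 0.8 threshold in the usage note are made-up values.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def violates_baseline(current_f1, baseline_threshold):
    """True when the current F1 has dropped below the recorded threshold."""
    return current_f1 < baseline_threshold

# Hypothetical ground truth with predictions from two points in time.
y_true = [1, 1, 1, 0, 0, 0]
early_pred = [1, 1, 1, 0, 0, 0]  # shortly after deployment
late_pred = [1, 0, 0, 1, 0, 0]   # months later, after customer behavior shifted
```

With a hypothetical baseline threshold of 0.8, the early predictions pass and the later ones trigger a violation, even though the model itself never changed.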

No.16
A company has a team of data scientists who use Amazon SageMaker notebook instances to test ML models. When the data scientists need new permissions, the company attaches the permissions to each individual role that was created during the creation of the SageMaker notebook instance.
The company needs to centralize management of the team's permissions.
Which solution will meet this requirement?

A. Create a single IAM role that has the necessary permissions. Attach the role to each notebook instance that the team uses.

B. Create a single IAM group. Add the data scientists to the group. Associate the group with each notebook instance that the team uses.

C. Create a single IAM user. Attach the AdministratorAccess AWS managed IAM policy to the user. Configure each notebook instance to use the IAM user.

D. Create a single IAM group. Add the data scientists to the group. Create an IAM role. Attach the AdministratorAccess AWS managed IAM policy to the role. Associate the role with the group. Associate the group with each notebook instance that the team uses.

Answer: A

Explanation:
Yet another unclear question from AWS. I am basically picking A because all the other options are not applicable or are unclear.
A. Yes, this makes sense.
B. No, you cannot assign (associate) a group to a notebook instance.
C. No, for two reasons: the AdministratorAccess policy is overly broad (it violates the least privilege principle), and you cannot assign an IAM user to a notebook instance.
D. No, for several reasons: the AdministratorAccess policy is overly broad, it is not clear what associating a role with a group means (maybe the group has permission to assume the role), and you cannot assign a group to a notebook instance.

No.17
An ML engineer needs to use an ML model to predict the price of apartments in a specific location.
Which metric should the ML engineer use to evaluate the model's performance?

A. Accuracy

B. Area Under the ROC Curve (AUC)

C. F1 score

D. Mean absolute error (MAE)

Answer: D

Explanation:
This is a regression problem, so MAE is the right answer. It is the only regression metric listed; accuracy, AUC, and F1 score are classification metrics.
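
MAE is simply the average size of the prediction errors, in the same unit as the price. A minimal sketch, with made-up apartment prices:

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |actual - predicted| over all samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical actual and predicted prices.
actual = [200_000, 310_000, 150_000]
predicted = [210_000, 300_000, 155_000]
mae = mean_absolute_error(actual, predicted)
```

Here the individual errors are 10,000, 10,000, and 5,000, so the MAE is their average.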

No.18
An ML engineer has trained a neural network by using stochastic gradient descent (SGD). The neural network performs poorly on the test set. The values for training loss and validation loss remain high and show an oscillating pattern. The values decrease for a few epochs and then increase for a few epochs before repeating the same cycle.
What should the ML engineer do to improve the training process?

A. Introduce early stopping.

B. Increase the size of the test set.

C. Increase the learning rate.

D. Decrease the learning rate.

Answer: D

Explanation:
A. No, early stopping is for preventing overfitting.
B. No, increasing the size of the test set will not help with oscillating loss.
C. No, increasing the learning rate will make things worse.
D. Yes. Oscillating loss during training is a sign that the training is not converging, which can happen when the learning rate is too high. Reducing the learning rate will help here.
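
The effect of the learning rate can be reproduced on a toy problem. The sketch below uses plain gradient descent on f(w) = w**2 rather than SGD on a neural network, so it only illustrates the mechanism; the starting point and the two learning rates are arbitrary choices.

```python
def gradient_descent(learning_rate, start=5.0, steps=50):
    """Minimize f(w) = w**2 and return the list of visited w values."""
    w = start
    path = [w]
    for _ in range(steps):
        grad = 2 * w  # derivative of w**2
        w = w - learning_rate * grad
        path.append(w)
    return path

too_high = gradient_descent(learning_rate=1.0)  # w jumps between 5 and -5 forever
lower = gradient_descent(learning_rate=0.1)     # w shrinks steadily toward 0
```

With the high learning rate, every step overshoots the minimum by as much as it approached it, so the parameter keeps bouncing and never settles. With the lower rate, each step moves a fraction of the way and the run converges.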

No.19
An ML engineer needs to process thousands of existing CSV objects and new CSV objects that are uploaded. The CSV objects are stored in a central Amazon S3 bucket and have the same number of columns. One of the columns is a transaction date. The ML engineer must query the data based on the transaction date.
Which solution will meet these requirements with the LEAST operational overhead?

A. Use an Amazon Athena CREATE TABLE AS SELECT (CTAS) statement to create a table based on the transaction date from data in the central S3 bucket. Query the objects from the table.

B. Create a new S3 bucket for processed data. Set up S3 replication from the central S3 bucket to the new S3 bucket. Use S3 Object Lambda to query the objects based on transaction date.

C. Create a new S3 bucket for processed data. Use AWS Glue for Apache Spark to create a job to query the CSV objects based on transaction date. Configure the job to store the results in the new S3 bucket. Query the objects from the new S3 bucket.

D. Create a new S3 bucket for processed data. Use Amazon Data Firehose to transfer the data from the central S3 bucket to the new S3 bucket. Configure Firehose to run an AWS Lambda function to query the data based on transaction date.

Answer: A

Explanation:
Athena allows direct querying of data stored in Amazon S3 using SQL without requiring data movement or transformation. CTAS (CREATE TABLE AS SELECT): Creates a new table based on a filtered or transformed dataset, such as transaction dates, and stores the results in S3.
Why Not the Other Options?
B. S3 Object Lambda is designed for on-the-fly data transformation, not querying data efficiently. Adding replication increases complexity without addressing the querying requirement directly.
C. Glue is suited for complex ETL workflows, but it introduces significant operational overhead for a task that Athena can handle more easily.
D. Firehose is designed for streaming data, not processing large existing datasets.

No.20
A company has a large, unstructured dataset. The dataset includes many duplicate records across several key attributes.
Which solution on AWS will detect duplicates in the dataset with the LEAST code development?

A. Use Amazon Mechanical Turk jobs to detect duplicates.

B. Use Amazon QuickSight ML Insights to build a custom deduplication model.

C. Use Amazon SageMaker Data Wrangler to pre-process and detect duplicates.

D. Use the AWS Glue FindMatches transform to detect duplicates.

Answer: D

Explanation:
AWS Glue FindMatches is specifically designed to identify duplicate or matching records in datasets without requiring labeled training data. It uses machine learning to find fuzzy matches and allows customization to fine-tune the matching process, making it ideal for this scenario.

No.21
A company needs to run a batch data-processing job on Amazon EC2 instances. The job will run during the weekend and will take 90 minutes to finish running. The processing can handle interruptions. The company will run the job every weekend for the next 6 months.
Which EC2 instance purchasing option will meet these requirements MOST cost-effectively?

A. Spot Instances

B. Reserved Instances

C. On-Demand Instances

D. Dedicated Instances

Answer: A

Explanation:
Cost-effective + tolerates interruptions + short duration (90 minutes) = Spot Instances

No.22
An ML engineer has an Amazon Comprehend custom model in Account A in the us-east-1 Region. The ML engineer needs to copy the model to Account B in the same Region.
Which solution will meet this requirement with the LEAST development effort?

A. Use Amazon S3 to make a copy of the model. Transfer the copy to Account B.

B. Create a resource-based IAM policy. Use the Amazon Comprehend ImportModel API operation to copy the model to Account B.

C. Use AWS DataSync to replicate the model from Account A to Account B.

D. Create an AWS Site-to-Site VPN connection between Account A and Account B to transfer the model.

Answer: B

Explanation:
Amazon Comprehend provides the ImportModel API operation to copy custom models between AWS accounts. Steps:
1. In Account A, attach a resource-based policy to the custom model version that allows Account B to import it.
2. In Account B, call the ImportModel API operation with the ARN of the source model.
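
Illustration: a minimal Python sketch of the policy-then-import flow, assuming boto3-style Amazon Comprehend clients are passed in (one created with Account A credentials, one with Account B credentials). The account ID, ARNs, and function names are hypothetical placeholders, not values from the question.

```python
import json


def build_import_policy(model_arn: str, target_account_id: str) -> str:
    """Return a resource-based policy (as JSON) that lets another account import the model."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "comprehend:ImportModel",
                "Resource": model_arn,
                "Principal": {"AWS": f"arn:aws:iam::{target_account_id}:root"},
            }
        ],
    }
    return json.dumps(policy)


def share_model(comprehend_a, model_arn: str, target_account_id: str) -> None:
    # Run with Account A credentials: attach the policy to the model version.
    comprehend_a.put_resource_policy(
        ResourceArn=model_arn,
        ResourcePolicy=build_import_policy(model_arn, target_account_id),
    )


def import_shared_model(comprehend_b, model_arn: str, new_name: str) -> str:
    # Run with Account B credentials: copy the shared model into this account.
    response = comprehend_b.import_model(SourceModelArn=model_arn, ModelName=new_name)
    return response["ModelArn"]
```

In practice the two clients would come from boto3.client("comprehend", region_name="us-east-1") under each account's credentials.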

23 / 114

23.

No.23
An ML engineer is training a simple neural network model. The ML engineer tracks the performance of the model over time on a validation dataset. The model's performance improves substantially at first and then degrades after a specific number of epochs.
Which solutions will mitigate this problem? (Choose two.)

A. Enable early stopping on the model.

B. Increase dropout in the layers.

C. Increase the number of layers.

D. Increase the number of neurons.

E. Investigate and reduce the sources of model bias.

Answer: A, B

Explanation:
The issue is overfitting. Solutions:
A. Early stopping: stops training when validation performance starts to decline.
B. Increase dropout: reduces overfitting by randomly disabling neurons during training.
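
Illustration: a minimal, framework-free sketch of the early-stopping idea. The validation losses and the patience value are made-up numbers for demonstration; real frameworks expose this as a callback or hyperparameter.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch with the best validation loss.

    Training stops once `patience` consecutive epochs fail to improve on it.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation loss has stopped improving
    return best_epoch


# Loss improves until epoch 4, then degrades (the overfitting pattern in the question).
losses = [0.90, 0.60, 0.45, 0.40, 0.43, 0.48, 0.55]
print(early_stopping_epoch(losses))  # prints 4
```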

24 / 114

24.

No.24
A company has a Retrieval Augmented Generation (RAG) application that uses a vector database to store embeddings of documents. The company must migrate the application to AWS and must implement a solution that provides semantic search of text files. The company has already migrated the text repository to an Amazon S3 bucket.
Which solution will meet these requirements?

A. Use an AWS Batch job to process the files and generate embeddings. Use AWS Glue to store the embeddings. Use SQL queries to perform the semantic searches.

B. Use a custom Amazon SageMaker notebook to run a custom script to generate embeddings. Use SageMaker Feature Store to store the embeddings. Use SQL queries to perform the semantic searches.

C. Use the Amazon Kendra S3 connector to ingest the documents from the S3 bucket into Amazon Kendra. Query Amazon Kendra to perform the semantic searches.

D. Use an Amazon Textract asynchronous job to ingest the documents from the S3 bucket. Query Amazon Textract to perform the semantic searches.

Answer: C

Explanation:
https://docs.aws.amazon.com/kendra/latest/dg/data-source-s3.html

25 / 114

25.

★No.25
A company uses Amazon Athena to query a dataset in Amazon S3. The dataset has a target variable that the company wants to predict.
The company needs to use the dataset in a solution to determine if a model can predict the target variable.
Which solution will provide this information with the LEAST development effort?

A. Create a new model by using Amazon SageMaker Autopilot. Report the model's achieved performance.

B. Implement custom scripts to perform data pre-processing, multiple linear regression, and performance evaluation. Run the scripts on Amazon EC2 instances.

C. Configure Amazon Macie to analyze the dataset and to create a model. Report the model's achieved performance.

D. Select a model from Amazon Bedrock. Tune the model with the data. Report the model's achieved performance.

Answer: A

26 / 114

26.

No.26
A company wants to predict the success of advertising campaigns by considering the color scheme of each advertisement. An ML engineer is preparing data for a neural network model. The dataset includes color information as categorical data.
Which technique for feature engineering should the ML engineer use for the model?

A. Apply label encoding to the color categories. Automatically assign each color a unique integer.

B. Implement padding to ensure that all color feature vectors have the same length.

C. Perform dimensionality reduction on the color categories.

D. One-hot encode the color categories to transform the color scheme feature into a binary matrix.

Answer: D

Explanation:
A. Label encoding: implies an ordinal relationship that colors do not have.
B. Padding: used for sequence data of varying length.
C. Dimensionality reduction: used for high-dimensional data.
D. One-hot encoding: suited to nominal categorical data (correct).
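
Illustration: a small pure-Python sketch of one-hot encoding. The color values are invented sample data; libraries such as pandas or scikit-learn provide equivalent encoders.

```python
def one_hot_encode(values):
    """Return (sorted category list, binary matrix with one row per input value)."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    matrix = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one column is set per row
        matrix.append(row)
    return categories, matrix


colors = ["red", "blue", "green", "blue"]
categories, matrix = one_hot_encode(colors)
print(categories)  # ['blue', 'green', 'red']
print(matrix)      # [[0, 0, 1], [1, 0, 0], [0, 1, 0], [1, 0, 0]]
```

Each color becomes its own binary column, so the network sees no artificial ordering between colors.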

27 / 114

27.

No.27
A company uses a hybrid cloud environment. A model that is deployed on premises uses data in Amazon S3 to provide customers with a live conversational engine.
The model is using sensitive data. An ML engineer needs to implement a solution to identify and remove the sensitive data.
Which solution will meet these requirements with the LEAST operational overhead?

A. Deploy the model on Amazon SageMaker. Create a set of AWS Lambda functions to identify and remove the sensitive data.

B. Deploy the model on an Amazon Elastic Container Service (Amazon ECS) cluster that uses AWS Fargate. Create an AWS Batch job to identify and remove the sensitive data.

C. Use Amazon Macie to identify the sensitive data. Create a set of AWS Lambda functions to remove the sensitive data.

D. Use Amazon Comprehend to identify the sensitive data. Launch Amazon EC2 instances to remove the sensitive data.

Answer: C

Explanation:
Macie - Identify sensitive data

28 / 114

28.

No.28
An ML engineer needs to create data ingestion pipelines and ML model deployment pipelines on AWS. All the raw data is stored in Amazon S3 buckets.
Which solution will meet these requirements?

A. Use Amazon Data Firehose to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

B. Use AWS Glue to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

C. Use Amazon Redshift ML to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

D. Use Amazon Athena to create the data ingestion pipelines. Use an Amazon SageMaker notebook to create the model deployment pipelines.

Answer: B

Explanation:
Data ingestion: AWS Glue. Model deployment pipelines: SageMaker Studio Classic.
Ingesting and transforming data from Amazon S3 is the main use case for AWS Glue.

29 / 114

29.

No.29
A company that has hundreds of data scientists is using Amazon SageMaker to create ML models. The models are in model groups in the SageMaker Model Registry.
The data scientists are grouped into three categories: computer vision, natural language processing (NLP), and speech recognition. An ML engineer needs to implement a solution to organize the existing models into these groups to improve model discoverability at scale. The solution must not affect the integrity of the model artifacts and their existing groupings.
Which solution will meet these requirements?

A. Create a custom tag for each of the three categories. Add the tags to the model packages in the SageMaker Model Registry.

B. Create a model group for each category. Move the existing models into these category model groups.

C. Use SageMaker ML Lineage Tracking to automatically identify and tag which model groups should contain the models.

D. Create a Model Registry collection for each of the three categories. Move the existing model groups into the collections.

Answer: D

Explanation:
According to the documentation:
"Any operation you perform on your Collections does not affect the integrity of the individual Model Groups they contain—the underlying Model Group artifacts in Amazon S3 and Amazon ECR are not modified."
A could also seem valid, but the Collections documentation matches the requirement exactly:
https://docs.aws.amazon.com/sagemaker/latest/dg/modelcollections.html

30 / 114

30.

No.30
A company runs an Amazon SageMaker domain in a public subnet of a newly created VPC. The network is configured properly, and ML engineers can access the SageMaker domain.
Recently, the company discovered suspicious traffic to the domain from a specific IP address. The company needs to block traffic from the specific IP address.
Which update to the network configuration will meet this requirement?

A. Create a security group inbound rule to deny traffic from the specific IP address. Assign the security group to the domain.

B. Create a network ACL inbound rule to deny traffic from the specific IP address. Assign the rule to the default network ACL for the subnet where the domain is located.

C. Create a shadow variant for the domain. Configure SageMaker Inference Recommender to send traffic from the specific IP address to the shadow endpoint.

D. Create a VPC route table to deny inbound traffic from the specific IP address. Assign the route table to the domain.

Answer: B

Explanation:
Protection at the subnet level: network ACL. Network ACLs support deny rules, so a specific IP address can be blocked on inbound connections; security groups support allow rules only.
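
Illustration: a minimal sketch of the inbound deny rule, assuming a boto3 EC2 client is passed in. The ACL ID, the IP address (from a documentation range), and the rule number are hypothetical; the rule number only needs to be lower than the rule that allows the traffic, because network ACL rules are evaluated in ascending order.

```python
def build_deny_entry(network_acl_id: str, ip_address: str, rule_number: int = 50) -> dict:
    """Build the parameters for an inbound deny rule that targets a single IPv4 address."""
    return {
        "NetworkAclId": network_acl_id,
        "RuleNumber": rule_number,
        "Protocol": "-1",         # all protocols
        "RuleAction": "deny",
        "Egress": False,          # False means the rule applies to inbound traffic
        "CidrBlock": f"{ip_address}/32",
    }


def block_ip(ec2_client, network_acl_id: str, ip_address: str) -> None:
    # ec2_client is expected to behave like boto3.client("ec2").
    ec2_client.create_network_acl_entry(**build_deny_entry(network_acl_id, ip_address))


print(build_deny_entry("acl-0123456789abcdef0", "203.0.113.5")["CidrBlock"])  # 203.0.113.5/32
```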

31 / 114

31.

No.31
A company is gathering audio, video, and text data in various languages. The company needs to use a large language model (LLM) to summarize the gathered data that is in Spanish.
Which solution will meet these requirements in the LEAST amount of time?

A. Train and deploy a model in Amazon SageMaker to convert the data into English text. Train and deploy an LLM in SageMaker to summarize the text.

B. Use Amazon Transcribe and Amazon Translate to convert the data into English text. Use Amazon Bedrock with the Jurassic model to summarize the text.

C. Use Amazon Rekognition and Amazon Translate to convert the data into English text. Use Amazon Bedrock with the Anthropic Claude model to summarize the text.

D. Use Amazon Comprehend and Amazon Translate to convert the data into English text. Use Amazon Bedrock with the Stable Diffusion model to summarize the text.

Answer: B

Explanation:
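LEAST amount of time -> A is out, because training and deploying custom models takes the longest.
B fits: Amazon Transcribe converts the speech in the audio and video to text, Amazon Translate converts the Spanish text to English, and an LLM in Amazon Bedrock summarizes it.
C is out because Amazon Rekognition analyzes images and video; it does not transcribe speech.
D is out because Amazon Comprehend does not transcribe speech, and Stable Diffusion is for image generation.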

32 / 114

32.

No.32
A financial company receives a high volume of real-time market data streams from an external provider. The streams consist of thousands of JSON records every second.
The company needs to implement a scalable solution on AWS to identify anomalous data points.
Which solution will meet these requirements with the LEAST operational overhead?

A. Ingest real-time data into Amazon Kinesis data streams. Use the built-in RANDOM_CUT_FOREST function in Amazon Managed Service for Apache Flink to process the data streams and to detect data anomalies.

B. Ingest real-time data into Amazon Kinesis data streams. Deploy an Amazon SageMaker endpoint for real-time outlier detection. Create an AWS Lambda function to detect anomalies. Use the data streams to invoke the Lambda function.

C. Ingest real-time data into Apache Kafka on Amazon EC2 instances. Deploy an Amazon SageMaker endpoint for real-time outlier detection. Create an AWS Lambda function to detect anomalies. Use the data streams to invoke the Lambda function.

D. Send real-time data to an Amazon Simple Queue Service (Amazon SQS) FIFO queue. Create an AWS Lambda function to consume the queue messages. Program the Lambda function to start an AWS Glue extract, transform, and load (ETL) job for batch processing and anomaly detection.

Answer: A

Explanation:
Option A
High-volume real-time: Kinesis Data Streams
Scalable: Managed Apache Flink
Anomaly detection: RANDOM_CUT_FOREST
Low overhead: Fully managed services

33 / 114

33.

No.33
A company has a large collection of chat recordings from customer interactions after a product release. An ML engineer needs to create an ML model to analyze the chat data. The ML engineer needs to determine the success of the product by reviewing customer sentiments about the product.
Which action should the ML engineer take to complete the evaluation in the LEAST amount of time?

A. Use Amazon Rekognition to analyze sentiments of the chat conversations.

B. Train a Naive Bayes classifier to analyze sentiments of the chat conversations.

C. Use Amazon Comprehend to analyze sentiments of the chat conversations.

D. Use random forests to classify sentiments of the chat conversations.

Answer: C

Explanation:
https://docs.aws.amazon.com/comprehend/latest/dg/what-is.htm
Prebuilt sentiment analysis + fast setup + NLP -> Amazon Comprehend
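
Illustration: a minimal sketch that tallies sentiment labels across chat messages, assuming a boto3-style Amazon Comprehend client is passed in. The function name and the sample usage are hypothetical; detect_sentiment is the Comprehend operation that returns a Sentiment label for a piece of text.

```python
from collections import Counter


def tally_sentiments(comprehend_client, messages, language_code="en"):
    """Count the sentiment label that Amazon Comprehend returns for each message."""
    counts = Counter()
    for text in messages:
        response = comprehend_client.detect_sentiment(
            Text=text, LanguageCode=language_code
        )
        counts[response["Sentiment"]] += 1  # POSITIVE, NEGATIVE, NEUTRAL, or MIXED
    return counts
```

With real credentials this could be called as tally_sentiments(boto3.client("comprehend"), chat_messages), where chat_messages is a list of strings.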

34 / 114

34.

No.34
A company has a conversational AI assistant that sends requests through Amazon Bedrock to an Anthropic Claude large language model (LLM). Users report that when they ask similar questions multiple times, they sometimes receive different answers. An ML engineer needs to improve the responses to be more consistent and less random.
Which solution will meet these requirements?

A. Increase the temperature parameter and the top_k parameter.

B. Increase the temperature parameter. Decrease the top_k parameter.

C. Decrease the temperature parameter. Increase the top_k parameter.

D. Decrease the temperature parameter and the top_k parameter.

Answer: D

Explanation:
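Lower temperature: sharpens the probability distribution, so the most probable tokens are chosen more often.
Lower top_k: restricts sampling to fewer of the most likely tokens.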
https://docs.aws.amazon.com/bedrock/latest/userguide/inference-parameters.html
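
Illustration: a toy sketch of how temperature and top_k reshape a next-token distribution. The tokens and scores are invented, and this is a simplified model of sampling rather than the implementation of any Bedrock model.

```python
import math


def next_token_probs(logits, temperature=1.0, top_k=None):
    """Return token probabilities after top-k filtering and temperature scaling.

    logits: dict mapping token -> raw score. temperature must be greater than 0.
    """
    ranked = sorted(logits.items(), key=lambda item: item[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]  # keep only the k highest-scoring tokens
    scaled = [score / temperature for _, score in ranked]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(value - max_scaled) for value in scaled]
    total = sum(weights)
    return {token: weight / total for (token, _), weight in zip(ranked, weights)}


logits = {"yes": 2.0, "maybe": 1.0, "no": 0.5, "never": 0.1}
default = next_token_probs(logits, temperature=1.0)
focused = next_token_probs(logits, temperature=0.3, top_k=2)
print(round(default["yes"], 2), round(focused["yes"], 2))
```

With the lower temperature and smaller top_k, almost all of the probability mass moves to the top token, which is why repeated requests become more consistent.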

35 / 114

35.

No.35
A company is using ML to predict the presence of a specific weed in a farmer's field. The company is using the Amazon SageMaker linear learner built-in algorithm with a value of multiclass_classifier for the predictor_type hyperparameter.
What should the company do to MINIMIZE false positives?

A. Set the value of the weight decay hyperparameter to zero.

B. Increase the number of training epochs.

C. Increase the value of the target_precision hyperparameter.

D. Change the value of the predictor_type hyperparameter to regressor.

Answer: C

Explanation:
A. Weight decay = 0 → No regularization, doesn’t target false positives.
B. More epochs → Longer training, risks overfitting, no direct impact on false positives.
C. Higher precision → Prioritizes correct positives, reduces false positives.
D. Regressor → Predicts continuous values, unsuitable for classification.
https://docs.aws.amazon.com/sagemaker/latest/dg/ll_hyperparameters.html

36 / 114

36.

No.36
A company has implemented a data ingestion pipeline for sales transactions from its ecommerce website. The company uses Amazon Data Firehose to ingest data into Amazon OpenSearch Service. The buffer interval of the Firehose stream is set for 60 seconds. An OpenSearch linear model generates real-time sales forecasts based on the data and presents the data in an OpenSearch dashboard.
The company needs to optimize the data ingestion pipeline to support sub-second latency for the real-time dashboard.
Which change to the architecture will meet these requirements?

A. Use zero buffering in the Firehose stream. Tune the batch size that is used in the PutRecordBatch operation.

B. Replace the Firehose stream with an AWS DataSync task. Configure the task with enhanced fan-out consumers.

C. Increase the buffer interval of the Firehose stream from 60 seconds to 120 seconds.

D. Replace the Firehose stream with an Amazon Simple Queue Service (Amazon SQS) queue.

Answer: A

Explanation:
A. Zero buffering minimizes latency by delivering data immediately.
Tuning the PutRecordBatch batch size optimizes throughput and supports sub-second delivery for real-time dashboards.
Zero buffering is a fairly new feature (announced in December 2023):
https://aws.amazon.com/about-aws/whats-new/2023/12/amazon-kinesis-data-firehose-zero-buffering/
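
Illustration: a minimal sketch of sending records with a tunable batch size, assuming a boto3-style Firehose client is passed in. The stream name and default batch size are hypothetical; the cap of 500 reflects the per-call record limit of PutRecordBatch.

```python
MAX_RECORDS_PER_CALL = 500  # PutRecordBatch accepts at most 500 records per call


def chunk_records(records, batch_size):
    """Yield consecutive slices of `records` with at most `batch_size` items each."""
    if not 1 <= batch_size <= MAX_RECORDS_PER_CALL:
        raise ValueError("batch_size must be between 1 and 500")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def send_batches(firehose_client, stream_name, payloads, batch_size=100):
    """Send byte payloads in batches and return the number of failed records."""
    records = [{"Data": payload} for payload in payloads]
    failed = 0
    for batch in chunk_records(records, batch_size):
        response = firehose_client.put_record_batch(
            DeliveryStreamName=stream_name, Records=batch
        )
        failed += response["FailedPutCount"]
    return failed
```

Smaller batches reduce the time a record waits on the producer side, while larger batches reduce the number of API calls, so the batch size is the knob to tune against the latency target.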

37 / 114

37.

No.37
A company has trained an ML model in Amazon SageMaker. The company needs to host the model to provide inferences in a production environment.
The model must be highly available and must respond with minimum latency. The size of each request will be between 1 KB and 3 MB. The model will receive unpredictable bursts of requests during the day. The inferences must adapt proportionally to the changes in demand.
How should the company deploy the model into production to meet these requirements?

A. Create a SageMaker real-time inference endpoint. Configure auto scaling. Configure the endpoint to present the existing model.

B. Deploy the model on an Amazon Elastic Container Service (Amazon ECS) cluster. Use ECS scheduled scaling that is based on the CPU of the ECS cluster.

C. Install SageMaker Operator on an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. Deploy the model in Amazon EKS. Set horizontal pod auto scaling to scale replicas based on the memory metric.

D. Use Spot Instances with a Spot Fleet behind an Application Load Balancer (ALB) for inferences. Use the ALBRequestCountPerTarget metric as the metric for auto scaling.

Answer: A

Explanation:
SageMaker real-time inference endpoint: purpose-built for low-latency inference; with auto scaling configured, it adapts to bursts in demand.
https://aws.amazon.com/blogs/machine-learning/configuring-autoscaling-inference-endpoints-in-amazon-sagemaker/
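
Illustration: a minimal sketch of the two Application Auto Scaling requests behind endpoint auto scaling, assuming a boto3 "application-autoscaling" client is passed in. The endpoint name, variant name, capacities, and target value are hypothetical.

```python
def build_scaling_requests(endpoint_name, variant_name, min_instances,
                           max_instances, invocations_per_instance):
    """Return (scalable target, target-tracking policy) parameters for an endpoint variant."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    target = {**common, "MinCapacity": min_instances, "MaxCapacity": max_instances}
    policy = {
        **common,
        "PolicyName": f"{endpoint_name}-invocations-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Scale so that each instance handles about this many invocations per minute.
            "TargetValue": float(invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    }
    return target, policy


def apply_scaling(autoscaling_client, target, policy):
    # autoscaling_client is expected to behave like boto3.client("application-autoscaling").
    autoscaling_client.register_scalable_target(**target)
    autoscaling_client.put_scaling_policy(**policy)
```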

38 / 114

38.

No.38
An ML engineer needs to use an Amazon EMR cluster to process large volumes of data in batches. Any data loss is unacceptable.
Which instance purchasing option will meet these requirements MOST cost-effectively?

A. Run the primary node, core nodes, and task nodes on On-Demand Instances.

B. Run the primary node, core nodes, and task nodes on Spot Instances.

C. Run the primary node on an On-Demand Instance. Run the core nodes and task nodes on Spot Instances.

D. Run the primary node and core nodes on On-Demand Instances. Run the task nodes on Spot Instances.

Answer: D

Explanation:
https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-instances-guidelines.html#emr-plan-spot-instances
"The task nodes process data but do not hold persistent data in HDFS. If they terminate because the Spot price has risen above your maximum Spot price, no data is lost"

39 / 114

39.

No.39
A company wants to improve the sustainability of its ML operations.
Which actions will reduce the energy usage and computational resources that are associated with the company's training jobs? (Choose two.)

A. Use Amazon SageMaker Debugger to stop training jobs when non-converging conditions are detected.

B. Use Amazon SageMaker Ground Truth for data labeling.

C. Deploy models by using AWS Lambda functions.

D. Use AWS Trainium instances for training.

E. Use PyTorch or TensorFlow with the distributed training option.

Answer: A, D

Explanation:
Blog: https://aws.amazon.com/blogs/machine-learning/optimizing-mlops-for-sustainability/
Sustainability goals: Trn1 (AWS Trainium) instances are up to 25% more energy efficient than comparable accelerated computing EC2 instances;
https://aws.amazon.com/ai/machine-learning/trainium/

SageMaker debugger helps to optimize resource consumption by detecting under-utilization of system resources, identifying training problems, and using built-in rules to monitor and stop training jobs as soon as bugs are detected.

40 / 114

40.

No.40
A company is planning to create several ML prediction models. The training data is stored in Amazon S3. The entire dataset is more than 5 TB in size and consists of CSV, JSON, Apache Parquet, and simple text files.
The data must be processed in several consecutive steps. The steps include complex manipulations that can take hours to finish running. Some of the processing involves natural language processing (NLP) transformations. The entire process must be automated.
Which solution will meet these requirements?

A. Process data at each step by using Amazon SageMaker Data Wrangler. Automate the process by using Data Wrangler jobs.

B. Use Amazon SageMaker notebooks for each data processing step. Automate the process by using Amazon EventBridge.

C. Process data at each step by using AWS Lambda functions. Automate the process by using AWS Step Functions and Amazon EventBridge.

D. Use Amazon SageMaker Pipelines to create a pipeline of data processing steps. Automate the pipeline by using Amazon EventBridge.

Answer: D

Explanation:
Large datasets + multiple file formats + complex automation and orchestration of ML workflows + NLP transformations -> SageMaker Pipelines, with EventBridge as the trigger

41 / 114

41.

No.41
An ML engineer needs to use AWS CloudFormation to create an ML model that an Amazon SageMaker endpoint will host.
Which resource should the ML engineer declare in the CloudFormation template to meet this requirement?

A. AWS::SageMaker::Model

B. AWS::SageMaker::Endpoint

C. AWS::SageMaker::NotebookInstance

D. AWS::SageMaker::Pipeline

Answer: A

Explanation:
Type: AWS::SageMaker::Model
Properties:
  Containers:
    - ContainerDefinition
  EnableNetworkIsolation: Boolean
  ExecutionRoleArn: String
  InferenceExecutionConfig:
    InferenceExecutionConfig
  ModelName: String
  PrimaryContainer:
    ContainerDefinition
  Tags:
    - Tag
  VpcConfig:
    VpcConfig

42 / 114

42.

No.42
An advertising company uses AWS Lake Formation to manage a data lake. The data lake contains structured data and unstructured data. The company's ML engineers are assigned to specific advertisement campaigns.
The ML engineers must interact with the data through Amazon Athena and by browsing the data directly in an Amazon S3 bucket. The ML engineers must have access to only the resources that are specific to their assigned advertisement campaigns.
Which solution will meet these requirements in the MOST operationally efficient way?

A. Configure IAM policies on an AWS Glue Data Catalog to restrict access to Athena based on the ML engineers' campaigns.

B. Store users and campaign information in an Amazon DynamoDB table. Configure DynamoDB Streams to invoke an AWS Lambda function to update S3 bucket policies.

C. Use Lake Formation to authorize AWS Glue to access the S3 bucket. Configure Lake Formation tags to map ML engineers to their campaigns.

D. Configure S3 bucket policies to restrict access to the S3 bucket based on the ML engineers' campaigns.

Answer: C

Explanation:
AWS Lake Formation → Tag resources with campaigns → Map ML engineers to campaigns → Fine-grained access control → Operational efficiency

43 / 114

43.

No.43
An ML engineer needs to use data with Amazon SageMaker Canvas to train an ML model. The data is stored in Amazon S3 and is complex in structure. The ML engineer must use a file format that minimizes processing time for the data.
Which file format will meet these requirements?

A. CSV files compressed with Snappy

B. JSON objects in JSONL format

C. JSON files compressed with gzip

D. Apache Parquet files

Answer: D

Explanation:
Why Apache Parquet minimizes processing time: columnar format with fast I/O; efficient for complex data; built-in compression; compatible with SageMaker Canvas.

44 / 114

44.

No.44
An ML engineer is evaluating several ML models and must choose one model to use in production. The cost of false negative predictions by the models is much higher than the cost of false positive predictions.
Which metric finding should the ML engineer prioritize the MOST when choosing the model?

A. Low precision

B. High precision

C. Low recall

D. High recall

Answer: D

Explanation:
A. Low precision: Increases false positives; less relevant here.
B. High precision: Reduces false positives; not the priority.
C. Low recall: Increases false negatives; must be avoided.
D. High recall: Correct; minimizes false negatives.
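
Illustration: a small worked example of precision and recall computed from confusion-matrix counts. The two models and their counts are invented to show why recall is the metric to compare when false negatives are costly.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that the model catches."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts for two candidate models on the same 100 positive cases.
models = {
    "model_x": {"tp": 90, "fp": 40, "fn": 10},
    "model_y": {"tp": 70, "fp": 5, "fn": 30},
}

for name, c in models.items():
    print(name, round(precision(c["tp"], c["fp"]), 2), round(recall(c["tp"], c["fn"]), 2))

# When false negatives are the expensive error, pick the model with the highest recall.
best = max(models, key=lambda name: recall(models[name]["tp"], models[name]["fn"]))
print(best)  # model_x
```

Here model_y has the better precision, but model_x misses far fewer positives, so it is the better choice under the stated cost structure.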

45 / 114

45.

No.45
A company has trained and deployed an ML model by using Amazon SageMaker. The company needs to implement a solution to record and monitor all the API call events for the SageMaker endpoint. The solution also must provide a notification when the number of API call events breaches a threshold.
Which solution will meet these requirements?

A. Use SageMaker Debugger to track the inferences and to report metrics. Create a custom rule to provide a notification when the threshold is breached.

B. Use SageMaker Debugger to track the inferences and to report metrics. Use the tensor_variance built-in rule to provide a notification when the threshold is breached.

C. Log all the endpoint invocation API events by using AWS CloudTrail. Use an Amazon CloudWatch dashboard for monitoring. Set up a CloudWatch alarm to provide notification when the threshold is breached.

D. Add the Invocations metric to an Amazon CloudWatch dashboard for monitoring. Set up a CloudWatch alarm to provide notification when the threshold is breached.

Answer: C

Explanation:
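The company needs to implement a solution to record and monitor all the API call events for the SageMaker endpoint. The key word is RECORD: AWS CloudTrail records the API call events, whereas the CloudWatch Invocations metric in option D only counts them.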

46 / 114

46.

No.46
A company has AWS Glue data processing jobs that are orchestrated by an AWS Glue workflow. The AWS Glue jobs can run on a schedule or can be launched manually.
The company is developing pipelines in Amazon SageMaker Pipelines for ML model development. The pipelines will use the output of the AWS Glue jobs during the data processing phase of model development. An ML engineer needs to implement a solution that integrates the AWS Glue jobs with the pipelines.
Which solution will meet these requirements with the LEAST operational overhead?

A. Use AWS Step Functions for orchestration of the pipelines and the AWS Glue jobs.

B. Use processing steps in SageMaker Pipelines. Configure inputs that point to the Amazon Resource Names (ARNs) of the AWS Glue jobs.

C. Use Callback steps in SageMaker Pipelines to start the AWS Glue workflow and to stop the pipelines until the AWS Glue jobs finish running.

D. Use Amazon EventBridge to invoke the pipelines and the AWS Glue jobs in the desired order.

Answer: C

Explanation:
https://aws.amazon.com/blogs/machine-learning/extend-amazon-sagemaker-pipelines-to-include-custom-steps-using-callback-steps/
The example is exactly for the same use-case as the question.

47 / 114

47.

No.47
A company is using an Amazon Redshift database as its single data source. Some of the data is sensitive.
A data scientist needs to use some of the sensitive data from the database. An ML engineer must give the data scientist access to the data without transforming the source data and without storing anonymized data in the database.
Which solution will meet these requirements with the LEAST implementation effort?

A. Configure dynamic data masking policies to control how sensitive data is shared with the data scientist at query time.

B. Create a materialized view with masking logic on top of the database. Grant the necessary read permissions to the data scientist.

C. Unload the Amazon Redshift data to Amazon S3. Use Amazon Athena to create schema-on-read with masking logic. Share the view with the data scientist.

D. Unload the Amazon Redshift data to Amazon S3. Create an AWS Glue job to anonymize the data. Share the dataset with the data scientist.

Answer: A

Explanation:
Amazon Redshift database → Sensitive data → Dynamic Data Masking → Query-time masking for data scientist → No transformation or additional storage → Least effort

48 / 114

48.

No.48
An ML engineer is using a training job to fine-tune a deep learning model in Amazon SageMaker Studio. The ML engineer previously used the same pre-trained model with a similar dataset. The ML engineer expects vanishing gradient, underutilized GPU, and overfitting problems.
The ML engineer needs to implement a solution to detect these issues and to react in predefined ways when the issues occur. The solution also must provide comprehensive real-time metrics during the training.
Which solution will meet these requirements with the LEAST operational overhead?

A. Use TensorBoard to monitor the training job. Publish the findings to an Amazon Simple Notification Service (Amazon SNS) topic. Create an AWS Lambda function to consume the findings and to initiate the predefined actions.

B. Use Amazon CloudWatch default metrics to gain insights about the training job. Use the metrics to invoke an AWS Lambda function to initiate the predefined actions.

C. Expand the metrics in Amazon CloudWatch to include the gradients in each training step. Use the metrics to invoke an AWS Lambda function to initiate the predefined actions.

D. Use SageMaker Debugger built-in rules to monitor the training job. Configure the rules to initiate the predefined actions.

Answer: D

Explanation:
SageMaker Debugger → Built-in rules → Monitor training (vanishing gradients, GPU use, overfitting) → Predefined actions → Low overhead

49 / 114

49.

No.49
A credit card company has a fraud detection model in production on an Amazon SageMaker endpoint. The company develops a new version of the model. The company needs to assess the new model's performance by using live data and without affecting production end users.
Which solution will meet these requirements?

A. Set up SageMaker Debugger and create a custom rule.

B. Set up blue/green deployments with all-at-once traffic shifting.

C. Set up blue/green deployments with canary traffic shifting.

D. Set up shadow testing with a shadow variant of the new model.

Answer: D

Explanation:
Shadow testing is a technique used to evaluate a new model's performance by running it alongside the current production model, processing the same live data but without affecting production outcomes.
https://docs.aws.amazon.com/sagemaker/latest/dg/shadow-tests-create.html
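
Illustration: a minimal sketch of an endpoint configuration with a shadow variant, assuming the result is passed to the SageMaker CreateEndpointConfig operation (for example, boto3's create_endpoint_config). The model names, config name, and instance type are hypothetical placeholders.

```python
def build_shadow_endpoint_config(config_name, production_model, shadow_model,
                                 instance_type="ml.m5.xlarge"):
    """Build CreateEndpointConfig parameters with one production and one shadow variant."""
    def variant(name, model_name):
        return {
            "VariantName": name,
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 1.0,
        }

    return {
        "EndpointConfigName": config_name,
        # Responses from the production variant are returned to callers.
        "ProductionVariants": [variant("production", production_model)],
        # The shadow variant receives a copy of the traffic; its responses are not returned.
        "ShadowProductionVariants": [variant("shadow", shadow_model)],
    }


config = build_shadow_endpoint_config("fraud-shadow-test", "fraud-model-v1", "fraud-model-v2")
print(config["ShadowProductionVariants"][0]["ModelName"])  # fraud-model-v2
```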

50 / 114

50.

No.50
A company stores time-series data about user clicks in an Amazon S3 bucket. The raw data consists of millions of rows of user activity every day. ML engineers access the data to develop their ML models.
The ML engineers need to generate daily reports and analyze click trends over the past 3 days by using Amazon Athena. The company must retain the data for 30 days before archiving the data.
Which solution will provide the HIGHEST performance for data retrieval?

A. Keep all the time-series data without partitioning in the S3 bucket. Manually move data that is older than 30 days to separate S3 buckets.

B. Create AWS Lambda functions to copy the time-series data into separate S3 buckets. Apply S3 Lifecycle policies to archive data that is older than 30 days to S3 Glacier Flexible Retrieval.

C. Organize the time-series data into partitions by date prefix in the S3 bucket. Apply S3 Lifecycle policies to archive partitions that are older than 30 days to S3 Glacier Flexible Retrieval.

D. Put each day's time-series data into its own S3 bucket. Use S3 Lifecycle policies to archive S3 buckets that hold data that is older than 30 days to S3 Glacier Flexible Retrieval.

Answer: C

Explanation:
Time-series data → Partition by date in S3 → Optimized Athena queries → S3 lifecycle policies → Move partitions >30 days to S3 Glacier Flexible Retrieval
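
Illustration: a small sketch of a date-prefix layout and the prefixes a 3-day query would touch. The "clicks" base prefix and the dt=YYYY-MM-DD naming are assumptions chosen for the example, not part of the question.

```python
from datetime import date, timedelta


def recent_partition_prefixes(today: date, days: int = 3, base: str = "clicks"):
    """Return the S3 key prefixes for the most recent `days` daily partitions."""
    return [
        f"{base}/dt={(today - timedelta(days=offset)).isoformat()}/"
        for offset in range(days)
    ]


print(recent_partition_prefixes(date(2024, 3, 2)))
# ['clicks/dt=2024-03-02/', 'clicks/dt=2024-03-01/', 'clicks/dt=2024-02-29/']
```

Because each day lives under its own prefix, an Athena query filtered on the partition column scans only these three prefixes instead of the full 30 days of data.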

51 / 114

51.

No.51
A company has deployed an ML model that detects fraudulent credit card transactions in real time in a banking application. The model uses Amazon SageMaker Asynchronous Inference. Consumers are reporting delays in receiving the inference results.
An ML engineer needs to implement a solution to improve the inference performance. The solution also must provide a notification when a deviation in model quality occurs.
Which solution will meet these requirements?

A. Use SageMaker real-time inference for inference. Use SageMaker Model Monitor for notifications about model quality.

B. Use SageMaker batch transform for inference. Use SageMaker Model Monitor for notifications about model quality.

C. Use SageMaker Serverless Inference for inference. Use SageMaker Inference Recommender for notifications about model quality.

D. Keep using SageMaker Asynchronous Inference for inference. Use SageMaker Inference Recommender for notifications about model quality.

Answer: A

Explanation:
SageMaker real-time inference: faster predictions to solve the delay issues.
SageMaker Model Monitor: tracks model quality and sends alerts for deviations.

52 / 114

52.

No.52
An ML engineer needs to implement a solution to host a trained ML model. The rate of requests to the model will be inconsistent throughout the day.
The ML engineer needs a scalable solution that minimizes costs when the model is not in use. The solution also must maintain the model's capacity to respond to requests during times of peak usage.
Which solution will meet these requirements?

A. Create AWS Lambda functions that have fixed concurrency to host the model. Configure the Lambda functions to automatically scale based on the number of requests to the model.

B. Deploy the model on an Amazon Elastic Container Service (Amazon ECS) cluster that uses AWS Fargate. Set a static number of tasks to handle requests during times of peak usage.

C. Deploy the model to an Amazon SageMaker endpoint. Deploy multiple copies of the model to the endpoint. Create an Application Load Balancer to route traffic between the different copies of the model at the endpoint.

D. Deploy the model to an Amazon SageMaker endpoint. Create SageMaker endpoint auto scaling policies that are based on Amazon CloudWatch metrics to adjust the number of instances dynamically.

Answer: D

Explanation:
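A SageMaker endpoint hosts the model; auto scaling policies based on CloudWatch metrics (such as CPU utilization) adjust the number of instances. Example target-tracking configuration: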
{
    "TargetValue": 50.0,
    "CustomizedMetricSpecification": {
        "MetricName": "CPUUtilization",
        "Namespace": "/aws/sagemaker/Endpoints",
        "Dimensions": [
            {"Name": "EndpointName", "Value": "my-endpoint"},
            {"Name": "VariantName", "Value": "my-variant"}
        ],
        "Statistic": "Average",
        "Unit": "Percent"
    }
}

https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling.html
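To show where a target-tracking configuration like the one above fits, here is a minimal Python sketch that assembles the two Application Auto Scaling requests involved: one to register the endpoint variant as a scalable target and one to attach the policy. The endpoint name, variant name, policy name, and capacity limits are hypothetical, and no AWS call is made.

```python
def endpoint_scaling_requests(endpoint, variant, min_instances, max_instances,
                              target_cpu=50.0):
    """Return (register_scalable_target kwargs, put_scaling_policy kwargs)."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint}/variant/{variant}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    target = {**common, "MinCapacity": min_instances, "MaxCapacity": max_instances}
    policy = {
        **common,
        "PolicyName": "cpu-target-tracking",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_cpu,
            "CustomizedMetricSpecification": {
                "MetricName": "CPUUtilization",
                "Namespace": "/aws/sagemaker/Endpoints",
                "Dimensions": [
                    {"Name": "EndpointName", "Value": endpoint},
                    {"Name": "VariantName", "Value": variant},
                ],
                "Statistic": "Average",
                "Unit": "Percent",
            },
        },
    }
    return target, policy


target, policy = endpoint_scaling_requests("my-endpoint", "my-variant", 1, 4)
```

With boto3, these dictionaries could be passed to the `application-autoscaling` client as `register_scalable_target(**target)` and `put_scaling_policy(**policy)`.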

53 / 114

53.

No.53
A company uses Amazon SageMaker Studio to develop an ML model. The company has a single SageMaker Studio domain. An ML engineer needs to implement a solution that provides an automated alert when SageMaker compute costs reach a specific threshold.
Which solution will meet these requirements?

A. Add resource tagging by editing the SageMaker user profile in the SageMaker domain. Configure AWS Cost Explorer to send an alert when the threshold is reached.

B. Add resource tagging by editing the SageMaker user profile in the SageMaker domain. Configure AWS Budgets to send an alert when the threshold is reached.

C. Add resource tagging by editing each user's IAM profile. Configure AWS Cost Explorer to send an alert when the threshold is reached.

D. Add resource tagging by editing each user's IAM profile. Configure AWS Budgets to send an alert when the threshold is reached.

Answer: B

Explanation:
SageMaker user profile tagging: https://docs.aws.amazon.com/sagemaker/latest/dg/domain-user-profile-add.html
AWS Budgets: tracks costs against a threshold and sends an alert when the threshold is reached.

54 / 114

54.

No.54
A company uses Amazon SageMaker for its ML workloads. The company's ML engineer receives a 50 MB Apache Parquet data file to build a fraud detection model. The file includes several correlated columns that are not required.
What should the ML engineer do to drop the unnecessary columns in the file with the LEAST effort?

A. Download the file to a local workstation. Perform one-hot encoding by using a custom Python script.

B. Create an Apache Spark job that uses a custom processing script on Amazon EMR.

C. Create a SageMaker processing job by calling the SageMaker Python SDK.

D. Create a data flow in SageMaker Data Wrangler. Configure a transform step.

Answer: D

Explanation:
Parquet data file → SageMaker Data Wrangler → Explore data → Transform → Drop unnecessary columns → Clean and preprocess data → Export to S3 → Fraud detection model

55 / 114

55.

No.55
A company is creating an application that will recommend products for customers to purchase. The application will make API calls to Amazon Q Business. The company must ensure that responses from Amazon Q Business do not include the name of the company's main competitor.
Which solution will meet this requirement?

A. Configure the competitor's name as a blocked phrase in Amazon Q Business.

B. Configure an Amazon Q Business retriever to exclude the competitor’s name.

C. Configure an Amazon Kendra retriever for Amazon Q Business to build indexes that exclude the competitor's name.

D. Configure document attribute boosting in Amazon Q Business to deprioritize the competitor's name.

Answer: A

Explanation:
https://docs.aws.amazon.com/amazonq/latest/api-reference/API_BlockedPhrasesConfiguration.html

56 / 114

56.

No.56
An ML engineer needs to use Amazon SageMaker to fine-tune a large language model (LLM) for text summarization. The ML engineer must follow a low-code no-code (LCNC) approach.
Which solution will meet these requirements?

A. Use SageMaker Studio to fine-tune an LLM that is deployed on Amazon EC2 instances.

B. Use SageMaker Autopilot to fine-tune an LLM that is deployed by a custom API endpoint.

C. Use SageMaker Autopilot to fine-tune an LLM that is deployed on Amazon EC2 instances.

D. Use SageMaker Autopilot to fine-tune an LLM that is deployed by SageMaker JumpStart.

Answer: D

Explanation:
LCNC solution: SageMaker Autopilot → SageMaker JumpStart → Deploy pre-trained LLM → Fine-tune for text summarization

57 / 114

57.

No.57
A company has an ML model that needs to run one time each night to predict stock values. The model input is 3 MB of data that is collected during the current day. The model produces the predictions for the next day. The prediction process takes less than 1 minute to finish running.
How should the company deploy the model on Amazon SageMaker to meet these requirements?

A. Use a multi-model serverless endpoint. Enable caching.

B. Use an asynchronous inference endpoint. Set the InitialInstanceCount parameter to 0.

C. Use a real-time endpoint. Configure an auto scaling policy to scale the model to 0 when the model is not in use.

D. Use a serverless inference endpoint. Set the MaxConcurrency parameter to 1.

Answer: D

Explanation:
ServerlessConfig:
MemorySizeInMB: Set to 2048 MB (allowed values: 1024, 2048, 3072, 4096, 5120, or 6144 MB).
MaxConcurrency: Set to 1 (a single nightly invocation needs no more).
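The ServerlessConfig settings above can be sketched in Python as a production variant definition. This is only an illustration under stated assumptions: the model name and variant name are hypothetical, and the function only builds and validates the dictionary without calling AWS.

```python
ALLOWED_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def serverless_variant(model_name, memory_mb=2048, max_concurrency=1):
    """Build a serverless production variant for an endpoint configuration."""
    if memory_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(f"memory_mb must be one of {ALLOWED_MEMORY_MB}")
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    return {
        "VariantName": "AllTraffic",  # hypothetical variant name
        "ModelName": model_name,
        "ServerlessConfig": {
            "MemorySizeInMB": memory_mb,
            "MaxConcurrency": max_concurrency,
        },
    }


variant = serverless_variant("nightly-stock-model")
```

With boto3, the variant could be supplied in the `ProductionVariants` list of `create_endpoint_config`.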
Efficient and cost-effective for a once-per-night run.
Because the prediction process takes less than 1 minute to finish running, there is no reason to keep provisioned instances available all day; serverless inference fits this usage pattern.

58 / 114

58.

No.58
An ML engineer trained an ML model on Amazon SageMaker to detect automobile accidents from closed-circuit TV footage. The ML engineer used SageMaker Data Wrangler to create a training dataset of images of accidents and non-accidents.
The model performed well during training and validation. However, the model is underperforming in production because of variations in the quality of the images from various cameras.
Which solution will improve the model's accuracy in the LEAST amount of time?

A. Collect more images from all the cameras. Use Data Wrangler to prepare a new training dataset.

B. Recreate the training dataset by using the Data Wrangler corrupt image transform. Specify the impulse noise option.

C. Recreate the training dataset by using the Data Wrangler enhance image contrast transform. Specify the Gamma contrast option.

D. Recreate the training dataset by using the Data Wrangler resize image transform. Crop all images to the same size.

Answer: B

Explanation:
The key detail in the question: "The model performed well during training and validation. However, the model is underperforming in production because of variations in the quality of the images from various cameras."

https://aws.amazon.com/blogs/machine-learning/prepare-image-data-with-amazon-sagemaker-data-wrangler/
Corrupting an image or creating any kind of noise helps make a model more robust. The model can predict with more accuracy even if it receives a corrupted image because it was trained with corrupt and non-corrupt images.

59 / 114

59.

No.59
A company has an application that uses different APIs to generate embeddings for input text. The company needs to implement a solution to automatically rotate the API tokens every 3 months.
Which solution will meet this requirement?

A. Store the tokens in AWS Secrets Manager. Create an AWS Lambda function to perform the rotation.

B. Store the tokens in AWS Systems Manager Parameter Store. Create an AWS Lambda function to perform the rotation.

C. Store the tokens in AWS Key Management Service (AWS KMS). Use an AWS managed key to perform the rotation.

D. Store the tokens in AWS Key Management Service (AWS KMS). Use an AWS owned key to perform the rotation.

Answer: A

Explanation:
AWS Secrets Manager supports automatic rotation on a schedule by using a Lambda rotation function.
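A minimal Python sketch of the rotation setup follows. It builds the arguments for the Secrets Manager RotateSecret operation with a 90-day interval as an approximation of "every 3 months". The secret name and the Lambda function ARN are hypothetical, and the code does not call AWS.

```python
def rotation_request(secret_id, rotation_lambda_arn, every_days=90):
    """Build RotateSecret arguments for a fixed rotation interval in days."""
    if every_days < 1:
        raise ValueError("every_days must be a positive number of days")
    return {
        "SecretId": secret_id,
        "RotationLambdaARN": rotation_lambda_arn,
        "RotationRules": {"AutomaticallyAfterDays": every_days},
    }


request = rotation_request(
    "embedding-api-token",  # hypothetical secret name
    "arn:aws:lambda:us-east-1:111122223333:function:rotate-token",  # hypothetical
)
```

With boto3, the dictionary could be passed to the `secretsmanager` client as `rotate_secret(**request)`. The Lambda function itself must implement the token exchange with each external API.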

60 / 114

60.

No.60
An ML engineer receives datasets that contain missing values, duplicates, and extreme outliers. The ML engineer must consolidate these datasets into a single data frame and must prepare the data for ML.
Which solution will meet these requirements?

A. Use Amazon SageMaker Data Wrangler to import the datasets and to consolidate them into a single data frame. Use the cleansing and enrichment functionalities to prepare the data.

B. Use Amazon SageMaker Ground Truth to import the datasets and to consolidate them into a single data frame. Use the human-in-the-loop capability to prepare the data.

C. Manually import and merge the datasets. Consolidate the datasets into a single data frame. Use Amazon Q Developer to generate code snippets that will prepare the data.

D. Manually import and merge the datasets. Consolidate the datasets into a single data frame. Use Amazon SageMaker data labeling to prepare the data.

Answer: A

Explanation:
A: SageMaker Data Wrangler simplifies merging and cleaning datasets. (Correct answer)
B: Ground Truth is for labeling, not cleaning.
C: Manual merging is slow and inefficient.
D: Data Labeling adds labels but doesn’t clean data.

61 / 114

61.

No.61
A company has historical data that shows whether customers needed long-term support from company staff. The company needs to develop an ML model to predict whether new customers will require long-term support.
Which modeling approach should the company use to meet this requirement?

A. Anomaly detection

B. Linear regression

C. Logistic regression

D. Semantic segmentation

Answer: C

Explanation:
A. Anomaly detection: For detecting rare or unusual events, not binary classification.
B. Linear regression: For predicting continuous numbers, not Yes/No outcomes.
C. Logistic regression: Suited to Yes/No predictions (binary classification). (Correct answer)
D. Semantic segmentation: For pixel-level labeling of images, not customer prediction.
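To make the Yes/No behavior of logistic regression concrete, here is a small Python sketch of the prediction step. The feature values, weights, and bias are made-up numbers for illustration; a real model would learn the weights from the historical data.

```python
import math


def predict_proba(features, weights, bias):
    """Return the probability of the positive class (needs long-term support)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(features, weights, bias, threshold=0.5):
    """Convert the probability into a 1 (Yes) or 0 (No) prediction."""
    return int(predict_proba(features, weights, bias) >= threshold)


# Hypothetical customer with two numeric features.
label = predict_label([2.0, 1.0], weights=[1.5, -0.5], bias=-1.0)
```

Linear regression would return the raw score `z`, which is unbounded; the sigmoid maps it into a probability between 0 and 1 that can be thresholded.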

62 / 114

62.

No.62
An ML engineer has developed a binary classification model outside of Amazon SageMaker. The ML engineer needs to make the model accessible to a SageMaker Canvas user for additional tuning.
The model artifacts are stored in an Amazon S3 bucket. The ML engineer and the Canvas user are part of the same SageMaker domain.
Which combination of requirements must be met so that the ML engineer can share the model with the Canvas user? (Choose two.)

A. The ML engineer and the Canvas user must be in separate SageMaker domains.

B. The Canvas user must have permissions to access the S3 bucket where the model artifacts are stored.

C. The model must be registered in the SageMaker Model Registry.

D. The ML engineer must host the model on AWS Marketplace.

E. The ML engineer must deploy the model to a SageMaker endpoint.

Answer: B, C

Explanation:
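Because the model was built outside of Amazon SageMaker, the Canvas user needs permissions to access the model artifacts in the S3 bucket, and the model must be registered in the SageMaker Model Registry so that it can be shared with Canvas.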

63 / 114

63.

No.63
A company is building a deep learning model on Amazon SageMaker. The company uses a large amount of data as the training dataset. The company needs to optimize the model's hyperparameters to minimize the loss function on the validation dataset.
Which hyperparameter tuning strategy will accomplish this goal with the LEAST computation time?

A. Hyperband

B. Grid search

C. Bayesian optimization

D. Random search

Answer: A

Explanation:
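A. Hyperband: Stops poorly performing training jobs early and reallocates resources to promising configurations, so it needs the least computation time. (Correct answer)
B. Grid search: Exhaustively tries every combination, which is the most expensive approach.
C. Bayesian optimization: Uses results from earlier jobs to choose the next configurations, which limits parallelism.
D. Random search: Samples combinations at random and does not learn from earlier results.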
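The early-stopping idea behind Hyperband can be sketched with successive halving, the routine that Hyperband runs repeatedly with different budgets. This is a toy illustration, not the SageMaker implementation: the `toy_loss` function and the candidate learning rates are invented for the example.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Keep the best 1/eta of the configurations each round while
    multiplying the training budget by eta.

    evaluate(config, budget) must return a validation loss (lower is better).
    """
    survivors = list(configs)
    if not survivors:
        raise ValueError("configs must not be empty")
    budget = min_budget
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0]


def toy_loss(learning_rate, budget):
    """Made-up loss: best near 0.1, and improving with more budget."""
    return (learning_rate - 0.1) ** 2 + 1.0 / budget


candidates = [0.001, 0.01, 0.1, 0.3, 1.0, 0.05, 0.2, 0.5, 0.03]
best = successive_halving(candidates, toy_loss)
```

Only the surviving configurations receive the larger budgets, which is why far less compute is spent than when every configuration trains to completion.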

64 / 114

64.

No.64
A company is planning to use Amazon Redshift ML in its primary AWS account. The source data is in an Amazon S3 bucket in a secondary account.
An ML engineer needs to set up an ML pipeline in the primary account to access the S3 bucket in the secondary account. The solution must not require public IPv4 addresses.
Which solution will meet these requirements?

A. Provision a Redshift cluster and Amazon SageMaker Studio in a VPC with no public access enabled in the primary account. Create a VPC peering connection between the accounts. Update the VPC route tables to remove the route to 0.0.0.0/0.

B. Provision a Redshift cluster and Amazon SageMaker Studio in a VPC with no public access enabled in the primary account. Create an AWS Direct Connect connection and a transit gateway. Associate the VPCs from both accounts with the transit gateway. Update the VPC route tables to remove the route to 0.0.0.0/0.

C. Provision a Redshift cluster and Amazon SageMaker Studio in a VPC in the primary account. Create an AWS Site-to-Site VPN connection with two encrypted IPsec tunnels between the accounts. Set up interface VPC endpoints for Amazon S3.

D. Provision a Redshift cluster and Amazon SageMaker Studio in a VPC in the primary account. Create an S3 gateway endpoint. Update the S3 bucket policy to allow IAM principals from the primary account. Set up interface VPC endpoints for SageMaker and Amazon Redshift.

Answer: D

Explanation:
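The S3 gateway endpoint and the interface VPC endpoints for SageMaker and Amazon Redshift keep the traffic on the AWS network, so no public IPv4 addresses are required. The S3 bucket policy in the secondary account grants access to IAM principals from the primary account.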

65 / 114

65.

No.65
A company is using an AWS Lambda function to monitor the metrics from an ML model. An ML engineer needs to implement a solution to send an email message when the metrics breach a threshold.
Which solution will meet this requirement?

A. Log the metrics from the Lambda function to AWS CloudTrail. Configure a CloudTrail trail to send the email message.

B. Log the metrics from the Lambda function to Amazon CloudFront. Configure an Amazon CloudWatch alarm to send the email message.

C. Log the metrics from the Lambda function to Amazon CloudWatch. Configure a CloudWatch alarm to send the email message.

D. Log the metrics from the Lambda function to Amazon CloudWatch. Configure an Amazon CloudFront rule to send the email message.

Answer: C

Explanation:
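Simple event-driven architecture: the Lambda function logs the metrics to Amazon CloudWatch, and a CloudWatch alarm sends the email message (through an Amazon SNS topic) when a metric breaches the threshold.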
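As a sketch of the alarm side, the Python below assembles arguments for the CloudWatch PutMetricAlarm operation. The metric name, namespace, threshold, period, and SNS topic ARN are hypothetical; the email subscription would be configured on the SNS topic. No AWS call is made.

```python
def metric_alarm_request(metric_name, threshold, topic_arn,
                         namespace="MLModel/Monitoring"):
    """Build PutMetricAlarm arguments that notify an SNS topic on breach."""
    return {
        "AlarmName": f"{metric_name}-breach",
        "Namespace": namespace,          # hypothetical custom namespace
        "MetricName": metric_name,
        "Statistic": "Average",
        "Period": 300,                   # seconds per evaluation window
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],     # SNS topic that delivers the email
    }


alarm = metric_alarm_request(
    "PredictionErrorRate",
    0.05,
    "arn:aws:sns:us-east-1:111122223333:ml-alerts",  # hypothetical ARN
)
```

With boto3, the dictionary could be passed to the `cloudwatch` client as `put_metric_alarm(**alarm)`, while the Lambda function publishes the metric with `put_metric_data`.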

66 / 114

66.

No.66
A company has used Amazon SageMaker to deploy a predictive ML model in production. The company is using SageMaker Model Monitor on the model. After a model update, an ML engineer notices data quality issues in the Model Monitor checks.
What should the ML engineer do to mitigate the data quality issues that Model Monitor has identified?

A. Adjust the model's parameters and hyperparameters.

B. Initiate a manual Model Monitor job that uses the most recent production data.

C. Create a new baseline from the latest dataset. Update Model Monitor to use the new baseline for evaluations.

D. Include additional data in the existing training set for the model. Retrain and redeploy the model.

Answer: C

Explanation:
The problems started appearing after a model update, so C is the only valid option.
Model Monitor reports data quality issues --> Create new baseline --> Validate baseline --> Update Model Monitor with new baseline --> Reevaluate data quality --> Investigate and fix root cause (if issues persist) --> Monitor continuously

67 / 114

67.

No.67
A company has an ML model that generates text descriptions based on images that customers upload to the company's website. The images can be up to 50 MB in total size.
An ML engineer decides to store the images in an Amazon S3 bucket. The ML engineer must implement a processing solution that can scale to accommodate changes in demand.
Which solution will meet these requirements with the LEAST operational overhead?

A. Create an Amazon SageMaker batch transform job to process all the images in the S3 bucket.

B. Create an Amazon SageMaker Asynchronous Inference endpoint and a scaling policy. Run a script to make an inference request for each image.

C. Create an Amazon Elastic Kubernetes Service (Amazon EKS) cluster that uses Karpenter for auto scaling. Host the model on the EKS cluster. Run a script to make an inference request for each image.

D. Create an AWS Batch job that uses an Amazon Elastic Container Service (Amazon ECS) cluster. Specify a list of images to process for each AWS Batch job.

Answer: B

Explanation:
https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference-autoscale.html
To auto scale an asynchronous endpoint: register the model as a scalable target, then define and apply a scaling policy. The other options are more complex to implement and operate.

68 / 114

68.

No.68
An ML engineer needs to use AWS services to identify and extract meaningful unique keywords from documents.
Which solution will meet these requirements with the LEAST operational overhead?

A. Use the Natural Language Toolkit (NLTK) library on Amazon EC2 instances for text pre-processing. Use the Latent Dirichlet Allocation (LDA) algorithm to identify and extract relevant keywords.

B. Use Amazon SageMaker and the BlazingText algorithm. Apply custom pre-processing steps for stemming and removal of stop words. Calculate term frequency-inverse document frequency (TF-IDF) scores to identify and extract relevant keywords.

C. Store the documents in an Amazon S3 bucket. Create AWS Lambda functions to process the documents and to run Python scripts for stemming and removal of stop words. Use bigram and trigram techniques to identify and extract relevant keywords.

D. Use Amazon Comprehend custom entity recognition and key phrase extraction to identify and extract relevant keywords.

Answer: D

Explanation:
Key phrase extraction and custom entity recognition - Amazon Comprehend helps with least operational overhead.

69 / 114

69.

No.69
A company needs to give its ML engineers appropriate access to training data. The ML engineers must access training data from only their own business group. The ML engineers must not be allowed to access training data from other business groups.
The company uses a single AWS account and stores all the training data in Amazon S3 buckets. All ML model training occurs in Amazon SageMaker.
Which solution will provide the ML engineers with the appropriate access?

A. Enable S3 bucket versioning.

B. Configure S3 Object Lock settings for each user.

C. Add cross-origin resource sharing (CORS) policies to the S3 buckets.

D. Create IAM policies. Attach the policies to IAM users or IAM roles.

Answer: D

Explanation:
IAM policies define and control the required access and can be attached to IAM users or IAM roles.
IAM provides granular permissions, so each business group can be limited to its own training data.
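One way to express such granular access is a policy that limits a group to its own prefix in the training-data bucket. The Python sketch below builds that policy document; the bucket name and the per-group prefix layout are assumptions made for the example.

```python
import json


def group_data_policy(bucket, group_prefix):
    """Build an IAM policy that allows reads only under one group's prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{group_prefix}/*",
            },
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{group_prefix}/*"]}},
            },
        ],
    }


policy = group_data_policy("training-data", "group-a")  # hypothetical names
policy_json = json.dumps(policy, indent=2)
```

The JSON document could then be attached to the IAM role that the group's SageMaker jobs and users assume.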

70 / 114

70.

No.70
A company needs to host a custom ML model to perform forecast analysis. The forecast analysis will occur with predictable and sustained load during the same 2-hour period every day.
Multiple invocations during the analysis period will require quick responses. The company needs AWS to manage the underlying infrastructure and any auto scaling activities.
Which solution will meet these requirements?

A. Schedule an Amazon SageMaker batch transform job by using AWS Lambda.

B. Configure an Auto Scaling group of Amazon EC2 instances to use scheduled scaling.

C. Use Amazon SageMaker Serverless Inference with provisioned concurrency.

D. Run the model on an Amazon Elastic Kubernetes Service (Amazon EKS) cluster on Amazon EC2 with pod auto scaling.

Answer: C

Explanation:
The load is predictable and sustained during the same 2-hour period each day, and the invocations need quick responses. SageMaker Serverless Inference with provisioned concurrency meets both needs while AWS manages the infrastructure and scaling. https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html

71 / 114

71.

No.71
A company's ML engineer has deployed an ML model for sentiment analysis to an Amazon SageMaker endpoint. The ML engineer needs to explain to company stakeholders how the model makes predictions.
Which solution will provide an explanation for the model's predictions?

A. Use SageMaker Model Monitor on the deployed model.

B. Use SageMaker Clarify on the deployed model.

C. Show the distribution of inferences from A/B testing in Amazon CloudWatch.

D. Add a shadow endpoint. Analyze prediction differences on samples.

Answer: B

Explanation:
Sentiment analysis model → SageMaker Clarify → Analyze feature impact → Explain predictions to stakeholders

72 / 114

72.

No.72
An ML engineer is using Amazon SageMaker to train a deep learning model that requires distributed training. After some training attempts, the ML engineer observes that the instances are not performing as expected. The ML engineer identifies communication overhead between the training instances.
What should the ML engineer do to MINIMIZE the communication overhead between the instances?

A. Place the instances in the same VPC subnet. Store the data in a different AWS Region from where the instances are deployed.

B. Place the instances in the same VPC subnet but in different Availability Zones. Store the data in a different AWS Region from where the instances are deployed.

C. Place the instances in the same VPC subnet. Store the data in the same AWS Region and Availability Zone where the instances are deployed.

D. Place the instances in the same VPC subnet. Store the data in the same AWS Region but in a different Availability Zone from where the instances are deployed.

Answer: C

Explanation:
Distributed training model → Same VPC subnet → Same Region and Availability Zone for data and instances → Minimize communication overhead

73 / 114

73.

No.73
A company is running ML models on premises by using custom Python scripts and proprietary datasets. The company is using PyTorch. The model building requires unique domain knowledge. The company needs to move the models to AWS.
Which solution will meet these requirements with the LEAST effort?

A. Use SageMaker built-in algorithms to train the proprietary datasets.

B. Use SageMaker script mode and premade images for ML frameworks.

C. Build a container on AWS that includes custom packages and a choice of ML frameworks.

D. Purchase similar production models through AWS Marketplace.

Answer: B

Explanation:
https://aws.amazon.com/blogs/machine-learning/bring-your-own-model-with-amazon-sagemaker-script-mode/

"Script mode enables you to write custom training and inference code while still utilizing common ML framework containers "

74 / 114

74.

No.74
A company is using Amazon SageMaker and millions of files to train an ML model. Each file is several megabytes in size. The files are stored in an Amazon S3 bucket. The company needs to improve training performance.
Which solution will meet these requirements in the LEAST amount of time?

A. Transfer the data to a new S3 bucket that provides S3 Express One Zone storage. Adjust the training job to use the new S3 bucket.

B. Create an Amazon FSx for Lustre file system. Link the file system to the existing S3 bucket. Adjust the training job to read from the file system.

C. Create an Amazon Elastic File System (Amazon EFS) file system. Transfer the existing data to the file system. Adjust the training job to read from the file system.

D. Create an Amazon ElastiCache (Redis OSS) cluster. Link the Redis OSS cluster to the existing S3 bucket. Stream the data from the Redis OSS cluster directly to the training job.

Answer: B

Explanation:
https://aws.amazon.com/blogs/machine-learning/speed-up-training-on-amazon-sagemaker-using-amazon-efs-or-amazon-fsx-for-lustre-file-systems/
S3 data → FSx for Lustre → High throughput and low latency → Improved training performance

75 / 114

75.

No.75
A company wants to develop an ML model by using tabular data from its customers. The data contains meaningful ordered features with sensitive information that should not be discarded. An ML engineer must ensure that the sensitive data is masked before another team starts to build the model.
Which solution will meet these requirements?

A. Use Amazon Macie to categorize the sensitive data.

B. Prepare the data by using AWS Glue DataBrew.

C. Run an AWS Batch job to change the sensitive data to random values.

D. Run an Amazon EMR job to change the sensitive data to random values.

Answer: B

Explanation:
AWS Glue DataBrew (Option B) is the most efficient and user-friendly solution for masking sensitive information while retaining the structure and order of tabular data, making it ideal for preparing data for ML model development.
Amazon Macie discovers and classifies sensitive data but cannot mask it.

76 / 114

76.

No.76
An ML engineer needs to deploy ML models to get inferences from large datasets in an asynchronous manner. The ML engineer also needs to implement scheduled monitoring of the data quality of the models. The ML engineer must receive alerts when changes in data quality occur.
Which solution will meet these requirements?

A. Deploy the models by using scheduled AWS Glue jobs. Use Amazon CloudWatch alarms to monitor the data quality and to send alerts.

B. Deploy the models by using scheduled AWS Batch jobs. Use AWS CloudTrail to monitor the data quality and to send alerts.

C. Deploy the models by using Amazon Elastic Container Service (Amazon ECS) on AWS Fargate. Use Amazon EventBridge to monitor the data quality and to send alerts.

D. Deploy the models by using Amazon SageMaker batch transform. Use SageMaker Model Monitor to monitor the data quality and to send alerts.

Answer: D

Explanation:
https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor.html
Model Monitor tracks data quality, model quality, bias drift, and feature attribution drift for production models. Batch transform handles asynchronous inference on large datasets, and Model Monitor can run scheduled monitoring on batch transform jobs.

77 / 114

77.

No.77
An ML engineer normalized training data by using min-max normalization in AWS Glue DataBrew. The ML engineer must normalize the production inference data in the same way as the training data before passing the production inference data to the model for predictions.
Which solution will meet this requirement?

A. Apply statistics from a well-known dataset to normalize the production samples.

B. Keep the min-max normalization statistics from the training set. Use these values to normalize the production samples.

C. Calculate a new set of min-max normalization statistics from a batch of production samples. Use these values to normalize all the production samples.

D. Calculate a new set of min-max normalization statistics from each production sample. Use these values to normalize all the production samples.

Answer: B

Explanation:
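Models are sensitive to the distribution of their input data, so production data must be scaled with the same statistics as the training data. Option B keeps the min-max statistics from the training set and therefore stays consistent. Options C and D recompute the statistics from production data, which changes the scaling and hurts model performance. Option A introduces statistics that are unrelated to the training data.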
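The point about reusing training statistics can be shown in a few lines of Python. The numbers are invented for the example; the important part is that `lo` and `hi` are computed once from the training column and then reused for production values.

```python
def fit_min_max(column):
    """Compute the min and max once, from the training data only."""
    return min(column), max(column)


def apply_min_max(value, lo, hi):
    """Scale a value with previously stored training statistics."""
    if hi == lo:
        return 0.0  # constant training column: no spread to scale by
    return (value - lo) / (hi - lo)


train_column = [10.0, 20.0, 30.0, 50.0]
lo, hi = fit_min_max(train_column)  # stored alongside the model

production_column = [20.0, 60.0]
scaled = [apply_min_max(v, lo, hi) for v in production_column]
# scaled == [0.25, 1.25]
```

A production value outside the training range scales to a number outside [0, 1], as with 60.0 here. That is expected and still consistent with what the model saw during training; recomputing the statistics from production data would hide the shift.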

78 / 114

78.

No.78
A company is planning to use Amazon SageMaker to make classification ratings that are based on images. The company has 6 TB of training data that is stored on an Amazon FSx for NetApp ONTAP system virtual machine (SVM). The SVM is in the same VPC as SageMaker.
An ML engineer must make the training data accessible for ML models that are in the SageMaker environment.
Which solution will meet these requirements?

A. Mount the FSx for ONTAP file system as a volume to the SageMaker Instance.

B. Create an Amazon S3 bucket. Use Mountpoint for Amazon S3 to link the S3 bucket to the FSx for ONTAP file system.

C. Create a catalog connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

D. Create a direct connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

Answer: A

Explanation:
https://docs.netapp.com/us-en/netapp-solutions/ai/mlops_fsxn_sagemaker_integration_training.html#introduction
Option A works because the FSx for ONTAP file system can be mounted directly to the SageMaker instance, which gives fast access to the 6 TB of data in the same VPC without extra steps.

79 / 114

79.

No.79
A company regularly receives new training data from the vendor of an ML model. The vendor delivers cleaned and prepared data to the company's Amazon S3 bucket every 3-4 days.
The company has an Amazon SageMaker pipeline to retrain the model. An ML engineer needs to implement a solution to run the pipeline when new data is uploaded to the S3 bucket.
Which solution will meet these requirements with the LEAST operational effort?

A. Create an S3 Lifecycle rule to transfer the data to the SageMaker training instance and to initiate training.

B. Create an AWS Lambda function that scans the S3 bucket. Program the Lambda function to initiate the pipeline when new data is uploaded.

C. Create an Amazon EventBridge rule that has an event pattern that matches the S3 upload. Configure the pipeline as the target of the rule.

D. Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to orchestrate the pipeline when new data is uploaded.

Answer: C

Explanation:
Amazon EventBridge can automatically start the SageMaker pipeline when new data is uploaded to S3, which makes it a simple and efficient solution.
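A possible event pattern for such a rule is sketched below in Python. The bucket name and key prefix are hypothetical, and the sketch assumes that EventBridge notifications are enabled on the bucket so that S3 emits "Object Created" events.

```python
import json


def s3_upload_pattern(bucket, prefix):
    """Build an EventBridge event pattern for new objects under a prefix."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": [bucket]},
            "object": {"key": [{"prefix": prefix}]},
        },
    }


pattern = s3_upload_pattern("vendor-training-data", "incoming/")
pattern_json = json.dumps(pattern)
```

The JSON string could be used as the `EventPattern` of the rule, with the SageMaker pipeline added as the rule target.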

80 / 114

80.

No.80
An ML engineer is developing a fraud detection model by using the Amazon SageMaker XGBoost algorithm. The model classifies transactions as either fraudulent or legitimate.
During testing, the model excels at identifying fraud in the training dataset. However, the model is inefficient at identifying fraud in new and unseen transactions.
What should the ML engineer do to improve the fraud detection for new transactions?

A. Increase the learning rate.

B. Remove some irrelevant features from the training dataset.

C. Increase the value of the max_depth hyperparameter.

D. Decrease the value of the max_depth hyperparameter.

Answer: D

Explanation:
This is an overfitting scenario: the model works well on the training data but not on new data. Reducing the max_depth hyperparameter makes the model less complex, helping it generalize better to new data.

81 / 114

81.

No.81
A company has a binary classification model in production. An ML engineer needs to develop a new version of the model.
The new model version must maximize correct predictions of positive labels and negative labels. The ML engineer must use a metric to recalibrate the model to meet these requirements.
Which metric should the ML engineer use for the model recalibration?

A. Accuracy

B. Precision

C. Recall

D. Specificity

Answer: A

Explanation:
A. Accuracy: Correct choice; maximizes both true positives and true negatives. Formula: (TP + TN) / Total Predictions
B. Precision: Focuses only on true positives, not negatives. Formula: TP / (TP + FP)
C. Recall: Focuses on capturing all true positives, ignoring negatives. Formula: TP / (TP + FN)
D. Specificity: Focuses only on true negatives, ignoring positives. Formula: TN / (TN + FP)
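The four formulas above can be checked with a short Python sketch. The confusion-matrix counts are invented for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the four metrics from confusion-matrix counts.

    Assumes every denominator is non-zero.
    """
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts: 100 predictions in total.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

Only accuracy has both TP and TN in its numerator, which is why it matches the requirement to maximize correct predictions of positive and negative labels.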

82 / 114

82.

★No.82
A company is using Amazon SageMaker to create ML models. The company's data scientists need fine-grained control of the ML workflows that they orchestrate. The data scientists also need the ability to visualize SageMaker jobs and workflows as a directed acyclic graph (DAG). The data scientists must keep a running history of model discovery experiments and must establish model governance for auditing and compliance verifications.
Which solution will meet these requirements?

A. Use AWS CodePipeline and its integration with SageMaker Studio to manage the entire ML workflows. Use SageMaker ML Lineage Tracking for the running history of experiments and for auditing and compliance verifications.

B. Use AWS CodePipeline and its integration with SageMaker Experiments to manage the entire ML workflows. Use SageMaker Experiments for the running history of experiments and for auditing and compliance verifications.

C. Use SageMaker Pipelines and its integration with SageMaker Studio to manage the entire ML workflows. Use SageMaker ML Lineage Tracking for the running history of experiments and for auditing and compliance verifications.

D. Use SageMaker Pipelines and its integration with SageMaker Experiments to manage the entire ML workflows. Use SageMaker Experiments for the running history of experiments and for auditing and compliance verifications.

No.83
A company wants to reduce the cost of its containerized ML applications. The applications use ML models that run on Amazon EC2 instances, AWS Lambda functions, and an Amazon Elastic Container Service (Amazon ECS) cluster. The EC2 workloads and ECS workloads use Amazon Elastic Block Store (Amazon EBS) volumes to save predictions and artifacts.
An ML engineer must identify resources that are being used inefficiently. The ML engineer also must generate recommendations to reduce the cost of these resources.
Which solution will meet these requirements with the LEAST development effort?

A. Create code to evaluate each instance's memory and compute usage.

B. Add cost allocation tags to the resources. Activate the tags in AWS Billing and Cost Management.

C. Check AWS CloudTrail event history for the creation of the resources.

D. Run AWS Compute Optimizer.

Answer: D

Explanation:
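AWS Compute Optimizer analyzes the utilization of resources such as EC2 instances, EBS volumes, and Lambda functions, flags resources that are over-provisioned or under-provisioned, and generates rightsizing recommendations. No custom code is needed.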

No.84
A company needs to create a central catalog for all the company's ML models. The models are in AWS accounts where the company developed the models initially. The models are hosted in Amazon Elastic Container Registry (Amazon ECR) repositories.
Which solution will meet these requirements?

A. Configure ECR cross-account replication for each existing ECR repository. Ensure that each model is visible in each AWS account.

B. Create a new AWS account with a new ECR repository as the central catalog. Configure ECR cross-account replication between the initial ECR repositories and the central catalog.

C. Use the Amazon SageMaker Model Registry to create a model group for models hosted in Amazon ECR. Create a new AWS account. In the new account, use the SageMaker Model Registry as the central catalog. Attach a cross-account resource policy to each model group in the initial AWS accounts.

D. Use an AWS Glue Data Catalog to store the models. Run an AWS Glue crawler to migrate the models from the ECR repositories to the Data Catalog. Configure cross-account access to the Data Catalog.

Answer: C

Explanation:
The question asks for a "central catalog", which I take to include model metadata and lineage tracking as well as storage. Amazon ECR only stores container images, so ECR replication alone cannot be the solution.

No.85
A company has developed a new ML model. The company requires online model validation on 10% of the traffic before the company fully releases the model in production. The company uses an Amazon SageMaker endpoint behind an Application Load Balancer (ALB) to serve the model.
Which solution will set up the required online validation with the LEAST operational overhead?

A. Use production variants to add the new model to the existing SageMaker endpoint. Set the variant weight to 0.1 for the new model. Monitor the number of invocations by using Amazon CloudWatch.

B. Use production variants to add the new model to the existing SageMaker endpoint. Set the variant weight to 1 for the new model. Monitor the number of invocations by using Amazon CloudWatch.

C. Create a new SageMaker endpoint. Use production variants to add the new model to the new endpoint. Monitor the number of invocations by using Amazon CloudWatch.

D. Configure the ALB to route 10% of the traffic to the new model at the existing SageMaker endpoint. Monitor the number of invocations by using AWS CloudTrail.

Answer: A

Explanation:
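SageMaker splits traffic across production variants in proportion to their weights, so weights of 0.9 and 0.1 route about 10% of requests to the new model:
{
    'ProductionVariants': [
        {
            'VariantName': 'existing-model',
            'ModelName': 'existing-model',
            'InitialVariantWeight': 0.9
        },
        {
            'VariantName': 'new-model',
            'ModelName': 'new-model',
            'InitialVariantWeight': 0.1
        }
    ]
}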
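
The share of traffic that each variant receives is its weight divided by the sum of all variant weights. The sketch below works through that arithmetic; traffic_share is a hypothetical helper, not a SageMaker API, and the second call assumes the existing variant also has a weight of 1, which the question does not state.

```python
def traffic_share(variant_weights):
    """Map each variant name to its fraction of traffic: weight / sum of weights."""
    total = sum(variant_weights.values())
    return {name: weight / total for name, weight in variant_weights.items()}


# Weights from the configuration above: the new model gets about 10%.
shares = traffic_share({"existing-model": 0.9, "new-model": 0.1})

# Option B sets the new model's weight to 1. If the existing variant also
# has weight 1 (an assumption), the new model would receive half the traffic.
option_b = traffic_share({"existing-model": 1, "new-model": 1})

print(shares, option_b)
```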

No.86
A company needs to develop an ML model. The model must identify an item in an image and must provide the location of the item.
Which Amazon SageMaker algorithm will meet these requirements?

A. Image classification

B. XGBoost

C. Object detection

D. K-nearest neighbors (k-NN)

Answer: C

Explanation:
https://docs.aws.amazon.com/sagemaker/latest/dg/algo-object-detection-tech-notes.html

No.87
A company has an Amazon S3 bucket that contains 1 TB of files from different sources. The S3 bucket contains the following file types in the same S3 folder: CSV, JSON, XLSX, and Apache Parquet.
An ML engineer must implement a solution that uses AWS Glue DataBrew to process the data. The ML engineer also must store the final output in Amazon S3 so that AWS Glue can consume the output in the future.
Which solution will meet these requirements?

A. Use DataBrew to process the existing S3 folder. Store the output in Apache Parquet format.

B. Use DataBrew to process the existing S3 folder. Store the output in AWS Glue Parquet format.

C. Separate the data into a different folder for each file type. Use DataBrew to process each folder individually. Store the output in Apache Parquet format.

D. Separate the data into a different folder for each file type. Use DataBrew to process each folder individually. Store the output in AWS Glue Parquet format.

Answer: C

Explanation:
Problem Summary:

The data in S3 is mixed file formats: CSV, JSON, XLSX, and Parquet — all in one folder.
You need to use AWS Glue DataBrew to process the data.
The processed data must be stored in S3 for AWS Glue to consume later.

Key Considerations:

DataBrew Input Requirements:
DataBrew datasets must be in a consistent format (CSV, JSON, XLSX, or Parquet).
DataBrew cannot process mixed formats in a single dataset. You must split the data by format.

DataBrew Output Format:
Apache Parquet is preferred for:
Efficient storage
Better performance with AWS Glue and other analytics tools
Columnar storage benefits in querying and transformations

"AWS Glue Parquet format" does not exist — this is a distractor in the answer options.

No.88
A manufacturing company uses an ML model to determine whether products meet a standard for quality. The model produces an output of "Passed" or "Failed." Robots separate the products into the two categories by using the model to analyze photos on the assembly line.
Which metrics should the company use to evaluate the model's performance? (Choose two.)

A. Precision and recall

B. Root mean square error (RMSE) and mean absolute percentage error (MAPE)

C. Accuracy and F1 score

D. Bilingual Evaluation Understudy (BLEU) score

E. Perplexity

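Answer: A, C

Explanation:
The model is a binary classifier ("Passed" or "Failed"), so classification metrics apply.
A. Precision and recall: Correct. Both are computed from a classifier's confusion matrix.
C. Accuracy and F1 score: Correct. Both are also classification metrics.
B. RMSE and MAPE measure error for regression and forecasting models.
D and E. BLEU score and perplexity evaluate text generation and language models.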

No.89
An ML engineer needs to encrypt all data in transit when an ML training job runs. The ML engineer must ensure that encryption in transit is applied to processes that Amazon SageMaker uses during the training job.
Which solution will meet these requirements?

A. Encrypt communication between nodes for batch processing.

B. Encrypt communication between nodes in a training cluster.

C. Specify an AWS Key Management Service (AWS KMS) key during creation of the training job request.

D. Specify an AWS Key Management Service (AWS KMS) key during creation of the SageMaker domain.

Answer: B

Explanation:
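Option B is correct. Enabling inter-container traffic encryption on a SageMaker training job encrypts the communication between the nodes of a distributed training cluster. AWS KMS keys (options C and D) protect data at rest rather than data in transit, and option A concerns batch processing rather than training.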

No.90
An ML engineer needs to use metrics to assess the quality of a time-series forecasting model.
Which metrics apply to this model? (Choose two.)

A. Recall

B. LogLoss

C. Root mean square error (RMSE)

D. InferenceLatency

E. Average weighted quantile loss (wQL)

Answer: C, E

Explanation:
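RMSE and average weighted quantile loss (wQL) both measure forecast error, so they apply to a time-series forecasting model. Recall and LogLoss are classification metrics, and InferenceLatency measures serving performance rather than model quality.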
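
As a rough illustration of the two forecasting metrics in the answer, the sketch below computes RMSE for point forecasts and wQL for quantile forecasts. It assumes one common definition of wQL (twice the summed quantile loss divided by the sum of absolute actual values); the function names and the sample numbers are invented for the example.

```python
import math


def rmse(actuals, forecasts):
    """Root mean square error of point forecasts."""
    squared = [(a - f) ** 2 for a, f in zip(actuals, forecasts)]
    return math.sqrt(sum(squared) / len(squared))


def weighted_quantile_loss(actuals, quantile_forecasts, tau):
    """wQL at quantile tau: 2 * sum(quantile loss) / sum(|actual|)."""
    loss = sum(
        tau * max(a - q, 0.0) + (1 - tau) * max(q - a, 0.0)
        for a, q in zip(actuals, quantile_forecasts)
    )
    return 2 * loss / sum(abs(a) for a in actuals)


def average_wql(actuals, forecasts_by_quantile):
    """Mean of wQL over every forecast quantile."""
    losses = [
        weighted_quantile_loss(actuals, forecasts, tau)
        for tau, forecasts in forecasts_by_quantile.items()
    ]
    return sum(losses) / len(losses)


# Hypothetical demand values and forecasts at the 0.1, 0.5, and 0.9 quantiles.
actuals = [10, 20, 30]
by_quantile = {0.1: [8, 15, 25], 0.5: [12, 18, 33], 0.9: [14, 24, 36]}

point_error = rmse(actuals, by_quantile[0.5])
avg_wql = average_wql(actuals, by_quantile)
print(point_error, avg_wql)
```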

No.91
A company runs Amazon SageMaker ML models that use accelerated instances. The models require real-time responses. Each model has different scaling requirements. The company must not allow a cold start for the models.
Which solution will meet these requirements?

A. Create a SageMaker Serverless Inference endpoint for each model. Use provisioned concurrency for the endpoints.

B. Create a SageMaker Asynchronous Inference endpoint for each model. Create an auto scaling policy for each endpoint.

C. Create a SageMaker endpoint. Create an inference component for each model. In the inference component settings, specify the newly created endpoint. Create an auto scaling policy for each inference component. Set the parameter for the minimum number of copies to at least 1.

D. Create an Amazon S3 bucket. Store all the model artifacts in the S3 bucket. Create a SageMaker multi-model endpoint. Point the endpoint to the S3 bucket. Create an auto scaling policy for the endpoint. Set the parameter for the minimum number of copies to at least 1.

Answer: C

Explanation:
Requirements Recap:
Real-time inference: Needs low-latency predictions.
Accelerated instances: Likely GPU-backed, costly to scale inefficiently.
No cold starts: Endpoints must always be warm and responsive.
Each model has different scaling needs: Must support independent scaling of each model.

Why Option C is correct:
Inference components are a SageMaker feature that allows:
Hosting multiple models on a single endpoint.
Independent scaling of each model (component).
Avoiding cold starts via a minimum number of copies.
Setting the minimum number of copies to at least 1 keeps every model loaded and warm, which eliminates cold starts.
This solution meets all requirements efficiently.
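
A scaling floor for one inference component could be sketched as follows. The helper scalable_target_request is hypothetical; it only builds the arguments for the Application Auto Scaling RegisterScalableTarget operation and does not call AWS. The component name and the maximum of 4 copies are assumptions for illustration.

```python
def scalable_target_request(inference_component_name, min_copies=1, max_copies=4):
    """Build keyword arguments for Application Auto Scaling's RegisterScalableTarget."""
    if min_copies < 1:
        # A floor of zero would let the model scale in completely,
        # which reintroduces a cold start on the next request.
        raise ValueError("min_copies must be at least 1 to keep a copy loaded")
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"inference-component/{inference_component_name}",
        "ScalableDimension": "sagemaker:inference-component:DesiredCopyCount",
        "MinCapacity": min_copies,
        "MaxCapacity": max_copies,
    }


# Hypothetical component name.
request = scalable_target_request("model-a-component")
print(request)
```

With boto3, a dictionary like this would be passed as keyword arguments to the register_scalable_target method of the application-autoscaling client, once for each inference component, so every model gets its own scaling limits.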

No.92
A company uses Amazon SageMaker for its ML process. A compliance audit discovers that an Amazon S3 bucket for training data uses server-side encryption with S3 managed keys (SSE-S3).
The company requires customer managed keys. An ML engineer changes the S3 bucket to use server-side encryption with AWS KMS keys (SSE-KMS). The ML engineer makes no other configuration changes.
After the change to the encryption settings, SageMaker training jobs start to fail with AccessDenied errors.
What should the ML engineer do to resolve this problem?

A. Update the IAM policy that is attached to the execution role for the training jobs. Include the s3:ListBucket and s3:GetObject permissions.

B. Update the S3 bucket policy that is attached to the S3 bucket. Set the value of the aws:SecureTransport condition key to True.

C. Update the IAM policy that is attached to the execution role for the training jobs. Include the kms:Encrypt and kms:Decrypt permissions.

D. Update the IAM policy that is attached to the user that created the training jobs. Include the kms:CreateGrant permission.

Answer: C

Explanation:
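After the switch to SSE-KMS, reading the training data requires permission to use the KMS key in addition to the existing S3 permissions. Adding kms:Encrypt and kms:Decrypt to the policy on the execution role lets the training job decrypt its input and encrypt its output. The S3 permissions in option A were already in place, and the training job runs under the execution role, not under the user in option D.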
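
The policy statement that option C describes might look like the sketch below. The key ARN is a placeholder, and kms_statement is a hypothetical helper written for this example.

```python
import json


def kms_statement(key_arn):
    """IAM statement that lets a role use one KMS key to encrypt and decrypt."""
    return {
        "Effect": "Allow",
        "Action": ["kms:Encrypt", "kms:Decrypt"],
        "Resource": key_arn,
    }


# Placeholder ARN; replace with the customer managed key used by the bucket.
key_arn = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"
policy = {"Version": "2012-10-17", "Statement": [kms_statement(key_arn)]}
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

A policy like this would be attached to the execution role of the training job; the key policy of the KMS key must also permit that role to use the key.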

No.93
A company runs training jobs on Amazon SageMaker by using a compute optimized instance. Demand for training runs will remain constant for the next 55 weeks. The instance needs to run for 35 hours each week. The company needs to reduce its model training costs.
Which solution will meet these requirements?

A. Use a serverless endpoint with a provisioned concurrency of 35 hours for each week. Run the training on the endpoint.

B. Use SageMaker Edge Manager for the training. Specify the instance requirement in the edge device configuration. Run the training.

C. Use the heterogeneous cluster feature of SageMaker Training. Configure the instance_type, instance_count, and instance_groups arguments to run training jobs.

D. Opt in to a SageMaker Savings Plan with a 1-year term and an All Upfront payment. Run a SageMaker Training job on the instance.

Answer: D

Explanation:
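SageMaker Savings Plans offer discounted rates in exchange for a commitment to a consistent amount of usage over a 1-year or 3-year term. Because the training demand stays constant for more than a year, the 1-year plan lowers the cost. Serverless endpoints and SageMaker Edge Manager are not used to run training jobs, and heterogeneous clusters mix instance types within a job rather than discounting steady usage.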

★No.94
HOTSPOT

A company needs to train an ML model that will use historical transaction data to predict customer behavior.
Select the correct AWS service from the following list to perform each task on the data. Each service should be selected one time or not at all. (Select three.)

• Amazon Athena
• AWS Glue
• Amazon Kinesis Data Streams
• Amazon S3

Query the data for exploration and analysis: Select ...

Store the data: Select ...

Transform the data: Select ...

No.95
A company deployed an ML model that uses the XGBoost algorithm to predict product failures. The model is hosted on an Amazon SageMaker endpoint and is trained on normal operating data. An AWS Lambda function provides the predictions to the company's application.
An ML engineer must implement a solution that uses incoming live data to detect decreased model accuracy over time.
Which solution will meet these requirements?

A. Use Amazon CloudWatch to create a dashboard that monitors real-time inference data and model predictions. Use the dashboard to detect drift.

B. Modify the Lambda function to calculate model drift by using real-time inference data and model predictions. Program the Lambda function to send alerts.

C. Schedule a monitoring job in SageMaker Model Monitor. Use the job to detect drift by analyzing the live data against a baseline of the training data statistics and constraints.

D. Schedule a monitoring job in SageMaker Debugger. Use the job to detect drift by analyzing the live data against a baseline of the training data statistics and constraints.

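Answer: C

Explanation:
SageMaker Model Monitor runs scheduled monitoring jobs that compare data captured from the endpoint against baseline statistics and constraints computed from the training data, and it reports drift. SageMaker Debugger (option D) analyzes training jobs, not live endpoint traffic. Options A and B would require custom drift logic.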

No.96
A company has an ML model that uses historical transaction data to predict customer behavior. An ML engineer is optimizing the model in Amazon SageMaker to enhance the model's predictive accuracy. The ML engineer must examine the input data and the resulting predictions to identify trends that could skew the model's performance across different demographics.
Which solution will provide this level of analysis?

A. Use Amazon CloudWatch to monitor network metrics and CPU metrics for resource optimization during model training.

B. Create AWS Glue DataBrew recipes to correct the data based on statistics from the model output.

C. Use SageMaker Clarify to evaluate the model and training data for underlying patterns that might affect accuracy.

D. Create AWS Lambda functions to automate data pre-processing and to ensure consistent quality of input data for the model.

Answer: C

Explanation:
Option C - SageMaker Clarify is built for bias detection and explainability.
It can analyze training data and model predictions to identify potential biases.
It provides insights into how different demographic groups are affected by the model.

No.97
A company uses 10 Reserved Instances of accelerated instance types to serve the current version of an ML model. An ML engineer needs to deploy a new version of the model to an Amazon SageMaker real-time inference endpoint.
The solution must use the original 10 instances to serve both versions of the model. The solution also must include one additional Reserved Instance that is available to use in the deployment process. The transition between versions must occur with no downtime or service interruptions.
Which solution will meet these requirements?

A. Configure a blue/green deployment with all-at-once traffic shifting.

B. Configure a blue/green deployment with canary traffic shifting and a size of 10%.

C. Configure a shadow test with a traffic sampling percentage of 10%.

D. Configure a rolling deployment with a rolling batch size of 1.

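Answer: D

Explanation:
A rolling deployment with a batch size of 1 updates the endpoint one instance at a time, so it needs only one instance beyond the original 10, and the endpoint keeps serving traffic on both versions during the transition.
The blue/green options (A and B) provision a complete new fleet alongside the old one, which requires more than one additional instance; the canary size controls only how traffic is shifted. A shadow test (C) also runs the new version on separate instances and does not replace the current version.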

No.98
An IoT company uses Amazon SageMaker to train and test an XGBoost model for object detection. ML engineers need to monitor performance metrics when they train the model with variants in hyperparameters. The ML engineers also need to send Short Message Service (SMS) text messages after training is complete.
Which solution will meet these requirements?

A. Use Amazon CloudWatch to monitor performance metrics. Use Amazon Simple Queue Service (Amazon SQS) for message delivery.

B. Use Amazon CloudWatch to monitor performance metrics. Use Amazon Simple Notification Service (Amazon SNS) for message delivery.

C. Use AWS CloudTrail to monitor performance metrics. Use Amazon Simple Queue Service (Amazon SQS) for message delivery.

D. Use AWS CloudTrail to monitor performance metrics. Use Amazon Simple Notification Service (Amazon SNS) for message delivery.

Answer: B

Explanation:
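Amazon CloudWatch collects the training metrics that SageMaker emits, so it can monitor performance across hyperparameter variants. AWS CloudTrail records API activity and does not monitor performance metrics, which rules out options C and D.
Amazon SNS can deliver SMS text messages to subscribers. Amazon SQS queues messages for consumers to poll and does not send them, which rules out options A and C.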

No.99
A company is working on an ML project that will include Amazon SageMaker notebook instances. An ML engineer must ensure that the SageMaker notebook instances do not allow root access.
Which solution will prevent the deployment of notebook instances that allow root access?

A. Use IAM condition keys to stop deployments of SageMaker notebook instances that allow root access.

B. Use AWS Key Management Service (AWS KMS) keys to stop deployments of SageMaker notebook instances that allow root access.

C. Monitor resource creation by using Amazon EventBridge events. Create an AWS Lambda function that deletes all deployed SageMaker notebook instances that allow root access.

D. Monitor resource creation by using AWS CloudFormation events. Create an AWS Lambda function that deletes all deployed SageMaker notebook instances that allow root access.

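Answer: A

Explanation:
SageMaker provides the sagemaker:RootAccess IAM condition key. A policy that denies notebook instance creation when root access is enabled blocks the deployment before it happens. Options C and D only delete instances after they are deployed, and AWS KMS keys (option B) control encryption, not root access.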
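
A preventive policy of the kind that option A describes might look like the sketch below. It uses the sagemaker:RootAccess condition key; the Sid value and the choice to cover UpdateNotebookInstance as well are assumptions made for this example.

```python
import json

# Deny creating or updating a notebook instance whenever the request
# asks for root access to be enabled.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyRootAccessNotebooks",  # hypothetical statement ID
            "Effect": "Deny",
            "Action": [
                "sagemaker:CreateNotebookInstance",
                "sagemaker:UpdateNotebookInstance",
            ],
            "Resource": "*",
            "Condition": {"StringEquals": {"sagemaker:RootAccess": "Enabled"}},
        }
    ],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

Because the statement is an explicit Deny, it takes effect even if another policy allows the same actions.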

No.100
A company is using Amazon SageMaker to develop ML models. The company stores sensitive training data in an Amazon S3 bucket. The model training must have network isolation from the internet.
Which solution will meet this requirement?

A. Run the SageMaker training jobs in private subnets. Create a NAT gateway. Route traffic for training through the NAT gateway.

B. Run the SageMaker training jobs in private subnets. Create an S3 gateway VPC endpoint. Route traffic for training through the S3 gateway VPC endpoint.

C. Run the SageMaker training jobs in public subnets that have an attached security group. In the security group, use inbound rules to limit traffic from the internet. Encrypt SageMaker instance storage by using server-side encryption with AWS KMS keys (SSE-KMS).

D. Encrypt traffic to Amazon S3 by using a bucket policy that includes a value of True for the aws:SecureTransport condition key. Use default at-rest encryption for Amazon S3. Encrypt SageMaker instance storage by using server-side encryption with AWS KMS keys (SSE-KMS).

Answer: B

Explanation:
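Running the training jobs in private subnets and reaching Amazon S3 through an S3 gateway VPC endpoint keeps the traffic on the AWS network and off the public internet. A NAT gateway (option A) would give the jobs a route to the internet, and options C and D do not isolate the network.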

No.101
A company needs an AWS solution that will automatically create versions of ML models as the models are created.
Which solution will meet this requirement?

A. Amazon Elastic Container Registry (Amazon ECR)

B. Model packages from Amazon SageMaker Marketplace

C. Amazon SageMaker ML Lineage Tracking

D. Amazon SageMaker Model Registry

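Answer: D

Explanation:
In the SageMaker Model Registry, each model package that is registered in a model group automatically receives the next version number, starting at 1.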

No.102
A company needs to use Retrieval Augmented Generation (RAG) to supplement an open source large language model (LLM) that runs on Amazon Bedrock. The company's data for RAG is a set of documents in an Amazon S3 bucket. The documents consist of .csv files and .docx files.
Which solution will meet these requirements with the LEAST operational overhead?

A. Create a pipeline in Amazon SageMaker Pipelines to generate a new model. Call the new model from Amazon Bedrock to perform RAG queries.

B. Convert the data into vectors. Store the data in an Amazon Neptune database. Connect the database to Amazon Bedrock. Call the Amazon Bedrock API to perform RAG queries.

C. Fine-tune an existing LLM by using an AutoML job in Amazon SageMaker. Configure the S3 bucket as a data source for the AutoML job. Deploy the LLM to a SageMaker endpoint. Use the endpoint to perform RAG queries.

D. Create a knowledge base for Amazon Bedrock. Configure a data source that references the S3 bucket. Use the Amazon Bedrock API to perform RAG queries.

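Answer: D

Explanation:
D is the correct answer. A knowledge base for Amazon Bedrock ingests the documents from the S3 bucket, converts them to vectors, and retrieves them at query time, so there is no pipeline to build.
A: The .csv and .docx files have to be vectorized first. Besides, this option does not mention the data at all.
B and C are not applicable in this case.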

No.103
A company plans to deploy an ML model for production inference on an Amazon SageMaker endpoint. The average inference payload size will vary from 100 MB to 300 MB. Inference requests must be processed in 60 minutes or less.
Which SageMaker inference option will meet these requirements?

A. Serverless inference

B. Asynchronous inference

C. Real-time inference

D. Batch transform

Answer: B

Explanation:
Agree with B.
Real-time inference and serverless inference are limited to much smaller payloads and to processing times of about 60 seconds. Asynchronous inference queues requests, supports payloads up to 1 GB, and allows processing times of up to one hour.

The use case in this question involves inference payloads of 100 MB to 300 MB that must be processed in 60 minutes or less, so asynchronous inference is the best choice for handling large payloads without strict real-time requirements.

No.104
An ML engineer notices class imbalance in an image classification training job.
What should the ML engineer do to resolve this issue?

A. Reduce the size of the dataset.

B. Transform some of the images in the dataset.

C. Apply random oversampling on the dataset.

D. Apply random data splitting on the dataset.

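Answer: C

Explanation:
Random oversampling duplicates randomly chosen samples from the minority classes until the classes are balanced. Random data splitting (option D) does not change the class ratio.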
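
Random oversampling can be sketched in a few lines. The example below is a minimal illustration that balances a list of file names by label; the function name, the file names, and the fixed seed are assumptions, and a real pipeline would more likely use a library routine.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate random minority-class samples until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for label, count in counts.items():
        # Candidates to duplicate: the original samples of this class.
        pool = [s for s, l in zip(samples, labels) if l == label]
        extra = [rng.choice(pool) for _ in range(target - count)]
        out_samples.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_samples, out_labels


# Hypothetical imbalanced dataset: four "cat" images and one "dog" image.
images = ["cat_1.jpg", "cat_2.jpg", "cat_3.jpg", "cat_4.jpg", "dog_1.jpg"]
labels = ["cat", "cat", "cat", "cat", "dog"]
balanced_images, balanced_labels = random_oversample(images, labels)
print(Counter(balanced_labels))
```

Oversampling should be applied to the training split only, so that duplicated images do not leak into validation or test data.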

No.105
A company receives daily .csv files about customer interactions with its ML model. The company stores the files in Amazon S3 and uses the files to retrain the model. An ML engineer needs to implement a solution to mask credit card numbers in the files before the model is retrained.
Which solution will meet this requirement with the LEAST development effort?

A. Create a discovery job in Amazon Macie. Configure the job to find and mask sensitive data.

B. Create Apache Spark code to run on an AWS Glue job. Use the Sensitive Data Detection functionality in AWS Glue to find and mask sensitive data.

C. Create Apache Spark code to run on an AWS Glue job. Program the code to perform a regex operation to find and mask sensitive data.

D. Create Apache Spark code to run on an Amazon EC2 instance. Program the code to perform an operation to find and mask sensitive data.

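Answer: B

Explanation:
Amazon Macie discovers and reports sensitive data in Amazon S3, but it does not mask or modify the data, so option A cannot meet the requirement. The Sensitive Data Detection functionality in AWS Glue can both find and mask values such as credit card numbers, which takes less development effort than writing custom detection code (options C and D).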

No.106
A medical company is using AWS to build a tool to recommend treatments for patients. The company has obtained health records and self-reported textual information in English from patients. The company needs to use this information to gain insight about the patients.
Which solution will meet this requirement with the LEAST development effort?

A. Use Amazon SageMaker to build a recurrent neural network (RNN) to summarize the data.

B. Use Amazon Comprehend Medical to summarize the data.

C. Use Amazon Kendra to create a quick-search tool to query the data.

D. Use the Amazon SageMaker Sequence-to-Sequence (seq2seq) algorithm to create a text summary from the data.

Answer: B

Explanation:
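Amazon Comprehend Medical is a managed service that extracts medical information such as conditions, medications, and dosages from unstructured English clinical text, so no custom model has to be built or trained.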

No.107
A company needs to extract entities from a PDF document to build a classifier model.
Which solution will extract and store the entities in the LEAST amount of time?

A. Use Amazon Comprehend to extract the entities. Store the output in Amazon S3.

B. Use an open source AI optical character recognition (OCR) tool on Amazon SageMaker to extract the entities. Store the output in Amazon S3.

C. Use Amazon Textract to extract the entities. Use Amazon Comprehend to convert the entities to text. Store the output in Amazon S3.

D. Use Amazon Textract integrated with Amazon Augmented AI (Amazon A2I) to extract the entities. Store the output in Amazon S3.

Answer: C

Explanation:
Agree with C.
Amazon Comprehend alone is normally sufficient if the PDF contains only text. Because the question does not describe the contents of the PDF files, it is safer to use Amazon Textract to extract the text and then have Amazon Comprehend perform the entity extraction.

No.108
A company shares Amazon SageMaker Studio notebooks that are accessible through a VPN. The company must enforce access controls to prevent malicious actors from exploiting presigned URLs to access the notebooks.
Which solution will meet these requirements?

A. Set up Studio client IP validation by using the aws:sourceIp IAM policy condition.

B. Set up Studio client VPC validation by using the aws:sourceVpc IAM policy condition.

C. Set up Studio client role endpoint validation by using the aws:PrimaryTag IAM policy condition.

D. Set up Studio client user endpoint validation by using the aws:PrincipalTag IAM policy condition.

Answer: A

Explanation:
A is correct.
https://aws.amazon.com/blogs/machine-learning/secure-amazon-sagemaker-studio-presigned-urls-part-1-foundational-infrastructure/

Studio supports a few methods for enforcing access controls against presigned URL data exfiltration:
Client IP validation using the IAM policy condition aws:sourceIp
Client VPC validation using the IAM condition aws:sourceVpc
Client VPC endpoint validation using the IAM policy condition aws:sourceVpce

Context:
The company is using Amazon SageMaker Studio notebooks.
Access is allowed through a VPN, meaning users are coming from known, fixed IP ranges.
The concern is unauthorized access via presigned URLs, which could potentially be used outside the trusted network.

Why aws:sourceIp is the right choice:
The aws:sourceIp condition in IAM policies allows you to restrict access based on the client's IP address.
This is perfect for VPN-based setups where you know the IP range.
It ensures that only users accessing from allowed IPs (e.g., your VPN subnet) can access SageMaker Studio resources, even if they have a valid presigned URL.
This directly mitigates the risk of URL misuse from outside the VPN.

No.109
An ML engineer needs to merge and transform data from two sources to retrain an existing ML model. One data source consists of .csv files that are stored in an Amazon S3 bucket. Each .csv file consists of millions of records. The other data source is an Amazon Aurora DB cluster.
The result of the merge process must be written to a second S3 bucket. The ML engineer needs to perform this merge-and-transform task every week.
Which solution will meet these requirements with the LEAST operational overhead?

A. Create a transient Amazon EMR cluster every week. Use the cluster to run an Apache Spark job to merge and transform the data.

B. Create a weekly AWS Glue job that uses the Apache Spark engine. Use DynamicFrame native operations to merge and transform the data.

C. Create an AWS Lambda function that runs Apache Spark code every week to merge and transform the data. Configure the Lambda function to connect to the initial S3 bucket and the DB cluster.

D. Create an AWS Batch job that runs Apache Spark code on Amazon EC2 instances every week. Configure the Spark code to save the data from the EC2 instances to the second S3 bucket.

Answer: B

Explanation:
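AWS Glue is serverless, so there is no cluster to create or manage, and a Glue job can run on a weekly schedule. A Spark-based Glue job can read both the .csv files in Amazon S3 and the Aurora DB cluster, join them with DynamicFrame operations, and write the result to the second S3 bucket. The Amazon EMR and AWS Batch options add infrastructure to manage, and Lambda is not designed to run Apache Spark over files with millions of records.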

No.110
An ML engineer has deployed an Amazon SageMaker model to a serverless endpoint in production. The model is invoked by the InvokeEndpoint API operation.
The model's latency in production is higher than the baseline latency in the test environment. The ML engineer thinks that the increase in latency is because of model startup time.
What should the ML engineer do to confirm or deny this hypothesis?

A. Schedule a SageMaker Model Monitor job. Observe metrics about model quality.

B. Schedule a SageMaker Model Monitor job with Amazon CloudWatch metrics enabled.

C. Enable Amazon CloudWatch metrics. Observe the ModelSetupTime metric in the SageMaker namespace.

D. Enable Amazon CloudWatch metrics. Observe the ModelLoadingWaitTime metric in the SageMaker namespace.

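Answer: C

Explanation:
For serverless endpoints, the ModelSetupTime metric measures the time it takes to launch new compute resources, including downloading the model and starting the container. A high value confirms that model startup (a cold start) causes the added latency.
ModelLoadingWaitTime (option D) is a multi-model endpoint metric; it measures how long a request waits for the target model to be downloaded or loaded.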

No.111
An ML engineer needs to ensure that a dataset complies with regulations for personally identifiable information (PII). The ML engineer will use the data to train an ML model on Amazon SageMaker instances. SageMaker must not use any of the PII.
Which solution will meet these requirements in the MOST operationally efficient way?

A. Use the Amazon Comprehend DetectPiiEntities API call to redact the PII from the data. Store the data in an Amazon S3 bucket. Access the S3 bucket from the SageMaker instances for model training.

B. Use the Amazon Comprehend DetectPiiEntities API call to redact the PII from the data. Store the data in an Amazon Elastic File System (Amazon EFS) file system. Mount the EFS file system to the SageMaker instances for model training.

C. Use AWS Glue DataBrew to cleanse the dataset of PII. Store the data in an Amazon Elastic File System (Amazon EFS) file system. Mount the EFS file system to the SageMaker instances for model training.

D. Use Amazon Macie for automatic discovery of PII in the data. Remove the PII. Store the data in an Amazon S3 bucket. Mount the S3 bucket to the SageMaker instances for model training.

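Answer: A

Explanation:
The Amazon Comprehend DetectPiiEntities API call returns the type and location of each PII entity, which makes redaction straightforward. Amazon S3 is the standard data source for SageMaker training jobs, so no file system has to be provisioned and mounted as in options B and C. Amazon Macie (option D) only discovers PII; removing it would be a separate manual step.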
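
The redaction step in option A can be sketched without calling AWS. DetectPiiEntities returns a list of entities, each with a Type, a BeginOffset, and an EndOffset; the sample text and the entity list below are hand-written stand-ins for such a response, and redact_pii is a hypothetical helper.

```python
def redact_pii(text, entities, mask="[REDACTED]"):
    """Replace each detected PII span with a mask.

    Spans are processed from right to left so that earlier offsets
    stay valid after each replacement.
    """
    for entity in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        text = text[: entity["BeginOffset"]] + mask + text[entity["EndOffset"]:]
    return text


# Hand-written sample shaped like the "Entities" list of a response.
text = "Contact Jane Doe at jane@example.com."
entities = [
    {"Type": "NAME", "BeginOffset": 8, "EndOffset": 16},
    {"Type": "EMAIL", "BeginOffset": 20, "EndOffset": 36},
]

redacted = redact_pii(text, entities)
print(redacted)
```

The redacted text would then be written to the S3 bucket that the training job reads.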

No.112
A company must install a custom script on any newly created Amazon SageMaker notebook instances.
Which solution will meet this requirement with the LEAST operational overhead?

A. Create a lifecycle configuration script to install the custom script when a new SageMaker notebook is created. Attach the lifecycle configuration to every new SageMaker notebook as part of the creation steps.

B. Create a custom Amazon Elastic Container Registry (Amazon ECR) image that contains the custom script. Push the ECR image to a Docker registry. Attach the Docker image to a SageMaker Studio domain. Select the kernel to run as part of the SageMaker notebook.

C. Create a custom package index repository. Use AWS CodeArtifact to manage the installation of the custom script. Set up AWS PrivateLink endpoints to connect CodeArtifact to the SageMaker instance. Install the script.

D. Store the custom script in Amazon S3. Create an AWS Lambda function to install the custom script on new SageMaker notebooks. Configure Amazon EventBridge to invoke the Lambda function when a new SageMaker notebook is initialized.

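Answer: A

Explanation:
A lifecycle configuration is a shell script that SageMaker runs when a notebook instance is created or started, so it can install the custom script without any additional services.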

★No.113
A company is building a real-time data processing pipeline for an ecommerce application. The application generates a high volume of clickstream data that must be ingested, processed, and visualized in near real time. The company needs a solution that supports SQL for data processing and Jupyter notebooks for interactive analysis.
Which solution will meet these requirements?

A. Use Amazon Data Firehose to ingest the data. Create an AWS Lambda function to process the data. Store the processed data in Amazon S3. Use Amazon QuickSight to visualize the data.

B. Use Amazon Kinesis Data Streams to ingest the data. Use Amazon Data Firehose to transform the data. Use Amazon Athena to process the data. Use Amazon QuickSight to visualize the data.

C. Use Amazon Managed Streaming for Apache Kafka (Amazon MSK) to ingest the data. Use AWS Glue with PySpark to process the data. Store the processed data in Amazon S3. Use Amazon QuickSight to visualize the data.

D. Use Amazon Managed Streaming for Apache Kafka (Amazon MSK) to ingest the data. Use Amazon Managed Service for Apache Flink to process the data. Use the built-in Flink dashboard to visualize the data.

No.114
A medical company needs to store clinical data. The data includes personally identifiable information (PII) and protected health information (PHI).
An ML engineer needs to implement a solution to ensure that the PII and PHI are not used to train ML models.
Which solution will meet these requirements?

A. Store the clinical data in Amazon S3 buckets. Use AWS Glue DataBrew to mask the PII and PHI before the data is used for model training.

B. Upload the clinical data to an Amazon Redshift database. Use built-in SQL stored procedures to automatically classify and mask the PII and PHI before the data is used for model training.

C. Use Amazon Comprehend to detect and mask the PII before the data is used for model training. Use Amazon Comprehend Medical to detect and mask the PHI before the data is used for model training.

D. Create an AWS Lambda function to encrypt the PII and PHI. Program the Lambda function to save the encrypted data to an Amazon S3 bucket for model training.

Answer: C

Explanation:
correct

■AWS MLA-C01(EN) Q.1-100

/100

AWS MLA-C01(EN) Q.1-100

1 / 100

A. Create a separate Amazon Elastic Container Registry (Amazon ECR) repository for each model.

B. Use Amazon Elastic Container Registry (Amazon ECR) and unique tags for each model version.

C. Use the SageMaker Model Registry and model groups to catalog the models.

D. Use the SageMaker Model Registry and unique tags for each model version.

Answer: C

2 / 100

A. Use Managed Spot Training.

B. Use SageMaker managed warm pools.

C. Use SageMaker Training Compiler.

D. Use the SageMaker distributed data parallelism (SMDDP) library.

Answer: B

3 / 100

A. Use SageMaker Experiments to facilitate the approval process during model registration.

B. Use SageMaker ML Lineage Tracking on the central model registry. Create tracking entities for the approval process.

C. Use SageMaker Model Monitor to evaluate the performance of the model and to manage the approval.

D. Use SageMaker Pipelines. When a model version is registered, use the AWS SDK to change the approval status to "Approved."

Answer: D

4 / 100

A. Configure the application to invoke an AWS Lambda function that runs a SageMaker Clarify job.

B. Invoke an AWS Lambda function to pull the sagemaker-model-monitor-analyzer built-in SageMaker image.

C. Use AWS Glue Data Quality to monitor bias.

D. Use SageMaker notebooks to compare the bias.

Answer: A

5 / 100

Explanation:
Order of steps:
Use AWS Glue crawlers to infer schemas and available columns.
Use AWS Glue DataBrew for data cleaning and feature engineering.
Store the resulting data back in Amazon S3.

6 / 100

Step 1: [Select…] -------------------------- Select… Access the store to build datasets for training. Create a feature group. Ingest the records.

Step 2: [Select…] -------------------------- Select… Access the store to build datasets for training. Create a feature group. Ingest the records.

Step 3: [Select…] -------------------------- Select… Access the store to build datasets for training. Create a feature group. Ingest the records.

Answer:
Step 1: 「Create a feature group.」
Step 2: 「Ingest the records.」
Step 3: 「Access the store to build datasets for training.」

7 / 100

8 / 100

Text representation of basic units of data processed by LLMs [Select…] -------------------------------- Select… Embedding Retrieval Augmented Generation (RAG) Temperature Token

High-dimensional vectors that contain the semantic meaning of text [Select…] -------------------------------- Select… Embedding Retrieval Augmented Generation (RAG) Temperature Token

9 / 100

City (name) [Select…] -------------------------------- Select… Feature splitting Logarithmic transformation One-hot encoding Standardized distribution

Type_year (type of home and year the home was built) [Select…] -------------------------------- Feature splitting Logarithmic transformation One-hot encoding Standardized distribution

Size of the building (square feet or square meters) [Select…] -------------------------------- Feature splitting Logarithmic transformation One-hot encoding Standardized distribution

Answer:
City (name): One-hot encoding
Type_year (type of home and year the home was built): Feature splitting
Size of the building (square feet or square meters): Logarithmic transformation
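
The three transformations in this answer can be sketched in plain Python. This is only an illustration: the city list, the sample record, and the `type_year` format (`"condo_1998"`) are hypothetical, and `log1p` is one common choice of logarithmic transformation, not necessarily the one the question has in mind.

```python
import math

def one_hot(value, categories):
    """Return a 0/1 indicator list with a single 1 at the matching category."""
    return [1 if value == c else 0 for c in categories]

def split_type_year(type_year):
    """Split a combined 'type_year' string into (home type, year built)."""
    home_type, year = type_year.rsplit("_", 1)
    return home_type, int(year)

def log_transform(size):
    """Compress a right-skewed positive value with log(1 + x)."""
    return math.log1p(size)

# Hypothetical category list and record, for illustration only.
cities = ["Austin", "Boston", "Chicago"]
record = {"city": "Boston", "type_year": "condo_1998", "size": 1200.0}

features = {
    "city": one_hot(record["city"], cities),            # one-hot encoding
    "type_year": split_type_year(record["type_year"]),  # feature splitting
    "log_size": log_transform(record["size"]),          # logarithmic transformation
}
```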

10 / 100

10.

A. Amazon EMR Spark jobs

B. Amazon Kinesis Data Streams

C. Amazon DynamoDB

D. AWS Lake Formation

Answer: D

11 / 100

11.

A. Use Amazon Athena to automatically detect the anomalies and to visualize the result.

B. Use Amazon Redshift Spectrum to automatically detect the anomalies. Use Amazon QuickSight to visualize the result.

C. Use Amazon SageMaker Data Wrangler to automatically detect the anomalies and to visualize the result.

D. Use AWS Batch to automatically detect the anomalies. Use Amazon QuickSight to visualize the result.

Answer: C

12 / 100

12.

A. Use AWS Glue to transform the categorical data into numerical data.

B. Use AWS Glue to transform the numerical data into categorical data.

C. Use Amazon SageMaker Data Wrangler to transform the categorical data into numerical data.

D. Use Amazon SageMaker Data Wrangler to transform the numerical data into categorical data.

Answer: C

Explanation:
https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-transform.html

AWS Glue DataBrew also has data science recipe steps for one-hot encoding and categorical mapping.
https://docs.aws.amazon.com/databrew/latest/dg/recipe-actions.data-science.html

However, Data Wrangler is more user-friendly, with visual and natural language interfaces, which means less operational overhead.

13 / 100

13.

A. Use Amazon Athena to identify patterns that contribute to the imbalance. Adjust the dataset accordingly.

B. Use Amazon SageMaker Studio Classic built-in algorithms to process the imbalanced dataset.

C. Use AWS Glue DataBrew built-in features to oversample the minority class.

D. Use the Amazon SageMaker Data Wrangler balance data operation to oversample the minority class.

Answer: D

Explanation:
https://aws.amazon.com/blogs/machine-learning/balance-your-data-for-machine-learning-with-amazon-sagemaker-data-wrangler/

14 / 100

14.

A. LightGBM

B. Linear learner

C. K-means clustering

D. Neural Topic Model (NTM)

Answer: A

15 / 100

15.

A. Concept drift occurred in the underlying customer data that was used for predictions.

B. The model was not sufficiently complex to capture all the patterns in the original baseline data.

C. The original baseline data had a data quality issue of missing values.

D. Incorrect ground truth labels were provided to Model Monitor during the calculation of the baseline.

Answer: A

Explanation:
Concept Drift: Occurs when the statistical properties of the data used for predictions change over time, causing the model to underperform on current data.
Why Not the Other Options?

16 / 100

16.

A. Create a single IAM role that has the necessary permissions. Attach the role to each notebook instance that the team uses.

B. Create a single IAM group. Add the data scientists to the group. Associate the group with each notebook instance that the team uses.

C. Create a single IAM user. Attach the AdministratorAccess AWS managed IAM policy to the user. Configure each notebook instance to use the IAM user.

Answer: A

17 / 100

17.

No.17
An ML engineer needs to use an ML model to predict the price of apartments in a specific location.
Which metric should the ML engineer use to evaluate the model's performance?

A. Accuracy

B. Area Under the ROC Curve (AUC)

C. F1 score

D. Mean absolute error (MAE)

Answer: D

Explanation:
This is a regression problem, so MAE (option D) is the right answer. Accuracy, AUC, and F1 score are classification metrics.
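
As a rough illustration of the metric, MAE is the average of the absolute differences between actual and predicted values. The apartment prices below are made-up numbers, not data from the question.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical monthly apartment prices and model predictions.
actual = [1000.0, 1500.0, 2000.0]
predicted = [1100.0, 1400.0, 2300.0]

mae = mean_absolute_error(actual, predicted)  # absolute errors: 100, 100, 300
```

Because MAE is expressed in the same unit as the target (here, a price), it is easy to interpret for a regression model.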

18 / 100

18.

A. Introduce early stopping.

B. Increase the size of the test set.

C. Increase the learning rate.

D. Decrease the learning rate.

Answer: D

19 / 100

19.

A. Use an Amazon Athena CREATE TABLE AS SELECT (CTAS) statement to create a table based on the transaction date from data in the central S3 bucket. Query the objects from the table.

B. Create a new S3 bucket for processed data. Set up S3 replication from the central S3 bucket to the new S3 bucket. Use S3 Object Lambda to query the objects based on transaction date.

Answer: A

20 / 100

20.

A. Use Amazon Mechanical Turk jobs to detect duplicates.

B. Use Amazon QuickSight ML Insights to build a custom deduplication model.

C. Use Amazon SageMaker Data Wrangler to pre-process and detect duplicates.

D. Use the AWS Glue FindMatches transform to detect duplicates.

Answer: D

21 / 100

21.

A. Spot Instances

B. Reserved Instances

C. On-Demand Instances

D. Dedicated Instances

Answer: A

Explanation:
Cost-effective + interruption-tolerant + short duration (90 minutes) = Spot Instances

22 / 100

22.

A. Use Amazon S3 to make a copy of the model. Transfer the copy to Account B.

B. Create a resource-based IAM policy. Use the Amazon Comprehend ImportModel API operation to copy the model to Account B.

C. Use AWS DataSync to replicate the model from Account A to Account B.

D. Create an AWS Site-to-Site VPN connection between Account A and Account B to transfer the model.

Answer: B

23 / 100

23.

A. Enable early stopping on the model.

B. Increase dropout in the layers.

C. Increase the number of layers.

D. Increase the number of neurons.

E. Investigate and reduce the sources of model bias.

Answer: A, B

Explanation:
The issue is overfitting. Solutions:
A. Early stopping: stops training when validation performance starts to decline.
B. Increase dropout: reduces overfitting by randomly disabling neurons during training.
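
A minimal sketch of the early stopping idea, assuming a patience-based rule (stop after a fixed number of epochs without improvement in validation loss). The `patience` parameter and the loss values are illustrative assumptions; real frameworks expose their own early stopping options.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 0-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # Validation loss improved: remember it and reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Hypothetical validation losses: improvement stalls after the third epoch.
losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70]
stop = early_stopping_epoch(losses, patience=2)
```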

24 / 100

24.

A. Use an AWS Batch job to process the files and generate embeddings. Use AWS Glue to store the embeddings. Use SQL queries to perform the semantic searches.

B. Use a custom Amazon SageMaker notebook to run a custom script to generate embeddings. Use SageMaker Feature Store to store the embeddings. Use SQL queries to perform the semantic searches.

C. Use the Amazon Kendra S3 connector to ingest the documents from the S3 bucket into Amazon Kendra. Query Amazon Kendra to perform the semantic searches.

D. Use an Amazon Textract asynchronous job to ingest the documents from the S3 bucket. Query Amazon Textract to perform the semantic searches.

Answer: C

Explanation:
https://docs.aws.amazon.com/kendra/latest/dg/data-source-s3.html

25 / 100

25.

A. Create a new model by using Amazon SageMaker Autopilot. Report the model's achieved performance.

B. Implement custom scripts to perform data pre-processing, multiple linear regression, and performance evaluation. Run the scripts on Amazon EC2 instances.

C. Configure Amazon Macie to analyze the dataset and to create a model. Report the model's achieved performance.

D. Select a model from Amazon Bedrock. Tune the model with the data. Report the model's achieved performance.

Answer: A

26 / 100

26.

A. Apply label encoding to the color categories. Automatically assign each color a unique integer.

B. Implement padding to ensure that all color feature vectors have the same length.

C. Perform dimensionality reduction on the color categories.

D. One-hot encode the color categories to transform the color scheme feature into a binary matrix.

Answer: D

Explanation:
1. Label Encoding: Ordinal relationship
2. Padding: Sequence data
3. Dimensionality Reduction: High-dimensional data
4. One-Hot Encoding: Categorical data (Right)
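
The contrast between label encoding and one-hot encoding can be sketched as follows. The color values are hypothetical, and both helpers are plain-Python illustrations rather than any particular library's API.

```python
def label_encode(values):
    """Map each category to an integer; this implies an order that may not exist."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return [index[v] for v in values]

def one_hot_encode(values):
    """Map each category to a binary row with exactly one 1 (no implied order)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

# Hypothetical color feature; columns are ordered blue, green, red.
colors = ["red", "green", "blue", "green"]

labels = label_encode(colors)
matrix = one_hot_encode(colors)
```

With label encoding, a model could read "red" (2) as greater than "blue" (0), which is meaningless for colors. The binary matrix avoids that.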

27 / 100

27.

A. Deploy the model on Amazon SageMaker. Create a set of AWS Lambda functions to identify and remove the sensitive data.

B. Deploy the model on an Amazon Elastic Container Service (Amazon ECS) cluster that uses AWS Fargate. Create an AWS Batch job to identify and remove the sensitive data.

C. Use Amazon Macie to identify the sensitive data. Create a set of AWS Lambda functions to remove the sensitive data.

D. Use Amazon Comprehend to identify the sensitive data. Launch Amazon EC2 instances to remove the sensitive data.

Answer: C

Explanation:
Macie - Identify sensitive data

28 / 100

28.

No.28
An ML engineer needs to create data ingestion pipelines and ML model deployment pipelines on AWS. All the raw data is stored in Amazon S3 buckets.
Which solution will meet these requirements?

A. Use Amazon Data Firehose to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

B. Use AWS Glue to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

C. Use Amazon Redshift ML to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

D. Use Amazon Athena to create the data ingestion pipelines. Use an Amazon SageMaker notebook to create the model deployment pipelines.

Answer: B

Explanation:
Data ingestion: AWS Glue; model deployment pipeline: SageMaker Studio Classic.
This is the main use case for Glue.

29 / 100

29.

A. Create a custom tag for each of the three categories. Add the tags to the model packages in the SageMaker Model Registry.

B. Create a model group for each category. Move the existing models into these category model groups.

C. Use SageMaker ML Lineage Tracking to automatically identify and tag which model groups should contain the models.

D. Create a Model Registry collection for each of the three categories. Move the existing model groups into the collections.

Answer: D

30 / 100

30.

A. Create a security group inbound rule to deny traffic from the specific IP address. Assign the security group to the domain.

B. Create a network ACL inbound rule to deny traffic from the specific IP address. Assign the rule to the default network ACL for the subnet where the domain is located.

C. Create a shadow variant for the domain. Configure SageMaker Inference Recommender to send traffic from the specific IP address to the shadow endpoint.

D. Create a VPC route table to deny inbound traffic from the specific IP address. Assign the route table to the domain.

Answer: B

Explanation:
Protection at subnet level: Network ACL. Specific IP addresses can be denied at inbound connection level.

31 / 100

31.

A. Train and deploy a model in Amazon SageMaker to convert the data into English text. Train and deploy an LLM in SageMaker to summarize the text.

B. Use Amazon Transcribe and Amazon Translate to convert the data into English text. Use Amazon Bedrock with the Jurassic model to summarize the text.

C. Use Amazon Rekognition and Amazon Translate to convert the data into English text. Use Amazon Bedrock with the Anthropic Claude model to summarize the text.

D. Use Amazon Comprehend and Amazon Translate to convert the data into English text. Use Amazon Bedrock with the Stable Diffusion model to summarize the text.

Answer: B

Explanation:
LEAST amount of time -> A is out
C is out because Amazon Rekognition analyzes images and video and does not transcribe audio.
D is out because Stable Diffusion is an image generation model.

32 / 100

32.

Answer: A

Explanation:
Option A
High-volume real-time: Kinesis Data Streams
Scalable: Managed Apache Flink
Anomaly detection: RANDOM_CUT_FOREST
Low overhead: Fully managed services

33 / 100

33.

A. Use Amazon Rekognition to analyze sentiments of the chat conversations.

B. Train a Naive Bayes classifier to analyze sentiments of the chat conversations.

C. Use Amazon Comprehend to analyze sentiments of the chat conversations.

D. Use random forests to classify sentiments of the chat conversations.

Answer: C

Explanation:
https://docs.aws.amazon.com/comprehend/latest/dg/what-is.htm
Prebuilt sentiment analysis + fast setup + NLP → Amazon Comprehend

34 / 100

34.

A. Increase the temperature parameter and the top_k parameter.

B. Increase the temperature parameter. Decrease the top_k parameter.

C. Decrease the temperature parameter. Increase the top_k parameter.

D. Decrease the temperature parameter and the top_k parameter.

Answer: D

Explanation:
Lower temperature: the model favors high-probability tokens, so the output is more deterministic.
Lower top_k: sampling is restricted to fewer of the most likely tokens.
https://docs.aws.amazon.com/bedrock/latest/userguide/inference-parameters.html
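
To see why lowering both parameters makes output more predictable, here is a small sketch of top-k filtering followed by temperature-scaled softmax. The token logits are invented, and the function is a simplified illustration, not the implementation used by Amazon Bedrock or any specific model.

```python
import math

def next_token_distribution(logits, temperature=1.0, top_k=None):
    """Return {token: probability} after top-k filtering and temperature scaling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Keep only the top_k highest-scoring tokens (all tokens if top_k is None).
    ranked = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    # Dividing by a temperature below 1 widens the gaps between logits.
    scaled = [(tok, logit / temperature) for tok, logit in ranked]
    peak = max(s for _, s in scaled)  # subtract the max for numerical stability
    weights = [(tok, math.exp(s - peak)) for tok, s in scaled]
    total = sum(w for _, w in weights)
    return {tok: w / total for tok, w in weights}

# Hypothetical logits for four candidate next tokens.
logits = {"the": 2.0, "a": 1.0, "cat": 0.5, "xylophone": -1.0}

default = next_token_distribution(logits)
focused = next_token_distribution(logits, temperature=0.5, top_k=2)
```

In this sketch, `focused` drops the unlikely tokens entirely and gives the top token a larger share of the probability than `default` does.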

35 / 100

35.

A. Set the value of the weight decay hyperparameter to zero.

B. Increase the number of training epochs.

C. Increase the value of the target_precision hyperparameter.

D. Change the value of the predictor_type hyperparameter to regressor.

Answer: C

36 / 100

36.

A. Use zero buffering in the Firehose stream. Tune the batch size that is used in the PutRecordBatch operation.

B. Replace the Firehose stream with an AWS DataSync task. Configure the task with enhanced fan-out consumers.

C. Increase the buffer interval of the Firehose stream from 60 seconds to 120 seconds.

D. Replace the Firehose stream with an Amazon Simple Queue Service (Amazon SQS) queue.

Answer: A

37 / 100

37.

A. Create a SageMaker real-time inference endpoint. Configure auto scaling. Configure the endpoint to present the existing model.

B. Deploy the model on an Amazon Elastic Container Service (Amazon ECS) cluster. Use ECS scheduled scaling that is based on the CPU of the ECS cluster.

D. Use Spot Instances with a Spot Fleet behind an Application Load Balancer (ALB) for inferences. Use the ALBRequestCountPerTarget metric as the metric for auto scaling.

Answer: A

38 / 100

38.

A. Run the primary node, core nodes, and task nodes on On-Demand Instances.

B. Run the primary node, core nodes, and task nodes on Spot Instances.

C. Run the primary node on an On-Demand Instance. Run the core nodes and task nodes on Spot Instances.

D. Run the primary node and core nodes on On-Demand Instances. Run the task nodes on Spot Instances.

Answer: D

39 / 100

39.

A. Use Amazon SageMaker Debugger to stop training jobs when non-converging conditions are detected.

B. Use Amazon SageMaker Ground Truth for data labeling.

C. Deploy models by using AWS Lambda functions.

D. Use AWS Trainium instances for training.

E. Use PyTorch or TensorFlow with the distributed training option.

Answer: A, D

40 / 100

40.

A. Process data at each step by using Amazon SageMaker Data Wrangler. Automate the process by using Data Wrangler jobs.

B. Use Amazon SageMaker notebooks for each data processing step. Automate the process by using Amazon EventBridge.

C. Process data at each step by using AWS Lambda functions. Automate the process by using AWS Step Functions and Amazon EventBridge.

D. Use Amazon SageMaker Pipelines to create a pipeline of data processing steps. Automate the pipeline by using Amazon EventBridge.

Answer: D

Explanation:
Large datasets + multiple file formats + complex automation and orchestration of ML workflows + NLP transformation → SageMaker Pipelines, with Amazon EventBridge as the trigger

41 / 100

41.

A. AWS::SageMaker::Model

B. AWS::SageMaker::Endpoint

C. AWS::SageMaker::NotebookInstance

D. AWS::SageMaker::Pipeline

Answer: A

42 / 100

42.

A. Configure IAM policies on an AWS Glue Data Catalog to restrict access to Athena based on the ML engineers' campaigns.

B. Store users and campaign information in an Amazon DynamoDB table. Configure DynamoDB Streams to invoke an AWS Lambda function to update S3 bucket policies.

C. Use Lake Formation to authorize AWS Glue to access the S3 bucket. Configure Lake Formation tags to map ML engineers to their campaigns.

D. Configure S3 bucket policies to restrict access to the S3 bucket based on the ML engineers' campaigns.

Answer: C

Explanation:
AWS Lake Formation → Tag resources with campaigns → Map ML engineers to campaigns → Fine-grained access control → Operational efficiency

43 / 100

43.

A. CSV files compressed with Snappy

B. JSON objects in JSONL format

C. JSON files compressed with gzip

D. Apache Parquet files

Answer: D

Explanation:
Minimize processing time → Apache Parquet: columnar format with fast I/O, efficient for complex data, built-in compression, and compatible with SageMaker Canvas.

44 / 100

44.

A. Low precision

B. High precision

C. Low recall

D. High recall

Answer: D

45 / 100

45.

A. Use SageMaker Debugger to track the inferences and to report metrics. Create a custom rule to provide a notification when the threshold is breached.

B. Use SageMaker Debugger to track the inferences and to report metrics. Use the tensor_variance built-in rule to provide a notification when the threshold is breached.

D. Add the Invocations metric to an Amazon CloudWatch dashboard for monitoring. Set up a CloudWatch alarm to provide notification when the threshold is breached.

Answer: C

Explanation:
The company needs to implement a solution to record and monitor all the API call events for the SageMaker endpoint. It needs to RECORD all events.

46 / 100

46.

A. Use AWS Step Functions for orchestration of the pipelines and the AWS Glue jobs.

B. Use processing steps in SageMaker Pipelines. Configure inputs that point to the Amazon Resource Names (ARNs) of the AWS Glue jobs.

C. Use Callback steps in SageMaker Pipelines to start the AWS Glue workflow and to stop the pipelines until the AWS Glue jobs finish running.

D. Use Amazon EventBridge to invoke the pipelines and the AWS Glue jobs in the desired order.

Answer: C

47 / 100

47.

A. Configure dynamic data masking policies to control how sensitive data is shared with the data scientist at query time.

B. Create a materialized view with masking logic on top of the database. Grant the necessary read permissions to the data scientist.

C. Unload the Amazon Redshift data to Amazon S3. Use Amazon Athena to create schema-on-read with masking logic. Share the view with the data scientist.

D. Unload the Amazon Redshift data to Amazon S3. Create an AWS Glue job to anonymize the data. Share the dataset with the data scientist.

Answer: A

Explanation:
Amazon Redshift database → Sensitive data → Dynamic Data Masking → Query-time masking for data scientist → No transformation or additional storage → Least effort

48 / 100

48.

B. Use Amazon CloudWatch default metrics to gain insights about the training job. Use the metrics to invoke an AWS Lambda function to initiate the predefined actions.

C. Expand the metrics in Amazon CloudWatch to include the gradients in each training step. Use the metrics to invoke an AWS Lambda function to initiate the predefined actions.

D. Use SageMaker Debugger built-in rules to monitor the training job. Configure the rules to initiate the predefined actions.

Answer: D

Explanation:
SageMaker Debugger → Built-in rules → Monitor training (vanishing gradients, GPU use, overfitting) → Predefined actions → Low overhead

49 / 100

49.

A. Set up SageMaker Debugger and create a custom rule.

B. Set up blue/green deployments with all-at-once traffic shifting.

C. Set up blue/green deployments with canary traffic shifting.

D. Set up shadow testing with a shadow variant of the new model.

Answer: D

50 / 100

50.

A. Keep all the time-series data without partitioning in the S3 bucket. Manually move data that is older than 30 days to separate S3 buckets.

B. Create AWS Lambda functions to copy the time-series data into separate S3 buckets. Apply S3 Lifecycle policies to archive data that is older than 30 days to S3 Glacier Flexible Retrieval.

C. Organize the time-series data into partitions by date prefix in the S3 bucket. Apply S3 Lifecycle policies to archive partitions that are older than 30 days to S3 Glacier Flexible Retrieval.

D. Put each day's time-series data into its own S3 bucket. Use S3 Lifecycle policies to archive S3 buckets that hold data that is older than 30 days to S3 Glacier Flexible Retrieval.

Answer: C

Explanation:
Time-series data → Partition by date in S3 → Optimized Athena queries → S3 lifecycle policies → Move partitions >30 days to S3 Glacier Flexible Retrieval
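
A sketch of the layout described above, assuming a Hive-style `year=/month=/day=` prefix scheme and a hypothetical `timeseries/` root. The lifecycle rule is shown only as a plain dictionary in the shape that the S3 lifecycle configuration API expects; the rule ID is made up, and the `GLACIER` storage class corresponds to S3 Glacier Flexible Retrieval.

```python
from datetime import date

def partition_key(day, filename, root="timeseries"):
    """Build an S3 object key under a date-partitioned prefix."""
    return f"{root}/year={day:%Y}/month={day:%m}/day={day:%d}/{filename}"

# Lifecycle configuration: transition objects under the prefix after 30 days.
# Hypothetical rule ID and prefix, for illustration only.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-old-timeseries",
            "Filter": {"Prefix": "timeseries/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
        }
    ]
}

key = partition_key(date(2024, 3, 5), "readings.parquet")
```

Date-based prefixes let Athena prune partitions at query time, while the lifecycle rule handles archiving without any custom code.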

51 / 100

51.

A. Use SageMaker real-time inference for inference. Use SageMaker Model Monitor for notifications about model quality.

B. Use SageMaker batch transform for inference. Use SageMaker Model Monitor for notifications about model quality.

C. Use SageMaker Serverless Inference for inference. Use SageMaker Inference Recommender for notifications about model quality.

D. Keep using SageMaker Asynchronous Inference for inference. Use SageMaker Inference Recommender for notifications about model quality.

Answer: A

Explanation:
SageMaker real-time inference: faster predictions to solve the delay issues.
SageMaker Model Monitor: tracks model quality and sends alerts for deviations.

52 / 100

52.

A. Create AWS Lambda functions that have fixed concurrency to host the model. Configure the Lambda functions to automatically scale based on the number of requests to the model.

B. Deploy the model on an Amazon Elastic Container Service (Amazon ECS) cluster that uses AWS Fargate. Set a static number of tasks to handle requests during times of peak usage.

D. Deploy the model to an Amazon SageMaker endpoint. Create SageMaker endpoint auto scaling policies that are based on Amazon CloudWatch metrics to adjust the number of instances dynamically.

Answer: D

https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling.html

53 / 100

53.

A. Add resource tagging by editing the SageMaker user profile in the SageMaker domain. Configure AWS Cost Explorer to send an alert when the threshold is reached.

B. Add resource tagging by editing the SageMaker user profile in the SageMaker domain. Configure AWS Budgets to send an alert when the threshold is reached.

C. Add resource tagging by editing each user's IAM profile. Configure AWS Cost Explorer to send an alert when the threshold is reached.

D. Add resource tagging by editing each user's IAM profile. Configure AWS Budgets to send an alert when the threshold is reached.

Answer: B

Explanation:
SageMaker user profile tagging: https://docs.aws.amazon.com/sagemaker/latest/dg/domain-user-profile-add.html
AWS Budgets: for cost tracking and setting alert thresholds

54 / 100

54.

A. Download the file to a local workstation. Perform one-hot encoding by using a custom Python script.

B. Create an Apache Spark job that uses a custom processing script on Amazon EMR.

C. Create a SageMaker processing job by calling the SageMaker Python SDK.

D. Create a data flow in SageMaker Data Wrangler. Configure a transform step.

Answer: D

Explanation:
Parquet data file → SageMaker Data Wrangler → Explore data → Transform → Drop unnecessary columns → Clean and preprocess data → Export to S3 → Fraud detection model

55 / 100

55.

A. Configure the competitor's name as a blocked phrase in Amazon Q Business.

B. Configure an Amazon Q Business retriever to exclude the competitor’s name.

C. Configure an Amazon Kendra retriever for Amazon Q Business to build indexes that exclude the competitor's name.

D. Configure document attribute boosting in Amazon Q Business to deprioritize the competitor's name.

Answer: A

Explanation:
https://docs.aws.amazon.com/amazonq/latest/api-reference/API_BlockedPhrasesConfiguration.html

56 / 100

56.

A. Use SageMaker Studio to fine-tune an LLM that is deployed on Amazon EC2 instances.

B. Use SageMaker Autopilot to fine-tune an LLM that is deployed by a custom API endpoint.

C. Use SageMaker Autopilot to fine-tune an LLM that is deployed on Amazon EC2 instances.

D. Use SageMaker Autopilot to fine-tune an LLM that is deployed by SageMaker JumpStart.

Answer: D

Explanation:
LCNC solution: SageMaker Autopilot → SageMaker JumpStart → Deploy pre-trained LLM → Fine-tune for text summarization

57 / 100

57.

A. Use a multi-model serverless endpoint. Enable caching.

B. Use an asynchronous inference endpoint. Set the InitialInstanceCount parameter to 0.

C. Use a real-time endpoint. Configure an auto scaling policy to scale the model to 0 when the model is not in use.

D. Use a serverless inference endpoint. Set the MaxConcurrency parameter to 1.

Answer: D

58 / 100

58.

A. Collect more images from all the cameras. Use Data Wrangler to prepare a new training dataset.

B. Recreate the training dataset by using the Data Wrangler corrupt image transform. Specify the impulse noise option.

C. Recreate the training dataset by using the Data Wrangler enhance image contrast transform. Specify the Gamma contrast option.

D. Recreate the training dataset by using the Data Wrangler resize image transform. Crop all images to the same size.

Answer: B

59 / 100

59.

A. Store the tokens in AWS Secrets Manager. Create an AWS Lambda function to perform the rotation.

B. Store the tokens in AWS Systems Manager Parameter Store. Create an AWS Lambda function to perform the rotation.

C. Store the tokens in AWS Key Management Service (AWS KMS). Use an AWS managed key to perform the rotation.

D. Store the tokens in AWS Key Management Service (AWS KMS). Use an AWS owned key to perform the rotation.

Answer: A

Explanation:
AWS Secrets Manager supports automatic rotation, with a Lambda function performing the rotation.

60 / 100

60.

A. Use Amazon SageMaker Data Wrangler to import the datasets and to consolidate them into a single data frame. Use the cleansing and enrichment functionalities to prepare the data.

B. Use Amazon SageMaker Ground Truth to import the datasets and to consolidate them into a single data frame. Use the human-in-the-loop capability to prepare the data.

C. Manually import and merge the datasets. Consolidate the datasets into a single data frame. Use Amazon Q Developer to generate code snippets that will prepare the data.

D. Manually import and merge the datasets. Consolidate the datasets into a single data frame. Use Amazon SageMaker data labeling to prepare the data.

Answer: A

61 / 100

61.

A. Anomaly detection

B. Linear regression

C. Logistic regression

D. Semantic segmentation

Answer: C

62 / 100

62.

A. The ML engineer and the Canvas user must be in separate SageMaker domains.

B. The Canvas user must have permissions to access the S3 bucket where the model artifacts are stored.

C. The model must be registered in the SageMaker Model Registry.

D. The ML engineer must host the model on AWS Marketplace.

E. The ML engineer must deploy the model to a SageMaker endpoint.

Answer: B, C

Explanation:
For a model from outside of Amazon SageMaker, the Canvas user needs access to the S3 bucket that stores the model artifacts, and the model must be registered in the SageMaker Model Registry.

63 / 100

63.

A. Hyperband

B. Grid search

C. Bayesian optimization

D. Random search

Answer: A

Explanation:
A. Hyperband: Efficient & best --> Right answer
B. Grid Search: Exhaustive and tries all combos
C. Bayesian Optimization: Smart with best combination
D. Random Search: Random
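
Hyperband's efficiency comes from stopping poor configurations early. The sketch below shows a single round-based elimination loop (successive halving), which is the building block Hyperband repeats with different starting budgets. It is a simplified illustration: `toy_loss`, the candidate learning rates, and the `eta` value are assumptions, and this is not the SageMaker automatic model tuning implementation.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Keep the best 1/eta of configs each round while multiplying the budget by eta."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Lower score is better; evaluate every survivor at the current budget.
        scored = sorted(survivors, key=lambda c: evaluate(c, budget))
        keep = max(1, len(survivors) // eta)
        survivors = scored[:keep]
        budget *= eta
    return survivors[0]

def toy_loss(learning_rate, budget):
    """Hypothetical loss: best near a learning rate of 0.1, improving with more budget."""
    return abs(learning_rate - 0.1) + 1.0 / budget

best = successive_halving([0.001, 0.01, 0.1, 1.0], toy_loss)
```

Most of the total budget is spent on the few configurations that survive, which is why this approach is usually cheaper than grid or random search over the same candidates.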

64 / 100

64.

Answer: D

Explanation:
Interface VPC endpoints for SageMaker and a gateway endpoint for Amazon S3 are needed so that resources in the VPC can reach both services without public internet access.

65 / 100

65.

A. Log the metrics from the Lambda function to AWS CloudTrail. Configure a CloudTrail trail to send the email message.

B. Log the metrics from the Lambda function to Amazon CloudFront. Configure an Amazon CloudWatch alarm to send the email message.

C. Log the metrics from the Lambda function to Amazon CloudWatch. Configure a CloudWatch alarm to send the email message.

D. Log the metrics from the Lambda function to Amazon CloudWatch. Configure an Amazon CloudFront rule to send the email message.

Answer: C

Explanation:
Simple event-driven architecture.
CloudWatch alarm is the keyword; Needed to alert

66 / 100

66.

A. Adjust the model's parameters and hyperparameters.

B. Initiate a manual Model Monitor job that uses the most recent production data.

C. Create a new baseline from the latest dataset. Update Model Monitor to use the new baseline for evaluations.

D. Include additional data in the existing training set for the model. Retrain and redeploy the model.

Answer: C

67 / 100

67.

A. Create an Amazon SageMaker batch transform job to process all the images in the S3 bucket.

B. Create an Amazon SageMaker Asynchronous Inference endpoint and a scaling policy. Run a script to make an inference request for each image.

C. Create an Amazon Elastic Kubernetes Service (Amazon EKS) cluster that uses Karpenter for auto scaling. Host the model on the EKS cluster. Run a script to make an inference request for each image.

D. Create an AWS Batch job that uses an Amazon Elastic Container Service (Amazon ECS) cluster. Specify a list of images to process for each AWS Batch job.

Answer: B

68 / 100

68.

No.68
An ML engineer needs to use AWS services to identify and extract meaningful unique keywords from documents.
Which solution will meet these requirements with the LEAST operational overhead?

A. Use the Natural Language Toolkit (NLTK) library on Amazon EC2 instances for text pre-processing. Use the Latent Dirichlet Allocation (LDA) algorithm to identify and extract relevant keywords.

D. Use Amazon Comprehend custom entity recognition and key phrase extraction to identify and extract relevant keywords.

Answer: D

Explanation:
Amazon Comprehend provides key phrase extraction and custom entity recognition as managed features, so it meets the requirement with the least operational overhead.

69 / 100

69.

A. Enable S3 bucket versioning.

B. Configure S3 Object Lock settings for each user.

C. Add cross-origin resource sharing (CORS) policies to the S3 buckets.

D. Create IAM policies. Attach the policies to IAM users or IAM roles.

Answer: D

Explanation:
IAM policies define the access that is required and control it. They can be attached to a user or a role.
IAM provides 'granular' permissions.

70 / 100

70.

A. Schedule an Amazon SageMaker batch transform job by using AWS Lambda.

B. Configure an Auto Scaling group of Amazon EC2 instances to use scheduled scaling.

C. Use Amazon SageMaker Serverless Inference with provisioned concurrency.

D. Run the model on an Amazon Elastic Kubernetes Service (Amazon EKS) cluster on Amazon EC2 with pod auto scaling.

Answer: C

71 / 100

71.

A. Use SageMaker Model Monitor on the deployed model.

B. Use SageMaker Clarify on the deployed model.

C. Show the distribution of inferences from A/B testing in Amazon CloudWatch.

D. Add a shadow endpoint. Analyze prediction differences on samples.

Answer: B

Explanation:
Sentiment analysis model → SageMaker Clarify → Analyze feature impact → Explain predictions to stakeholders

72 / 100

72.

A. Place the instances in the same VPC subnet. Store the data in a different AWS Region from where the instances are deployed.

B. Place the instances in the same VPC subnet but in different Availability Zones. Store the data in a different AWS Region from where the instances are deployed.

C. Place the instances in the same VPC subnet. Store the data in the same AWS Region and Availability Zone where the instances are deployed.

D. Place the instances in the same VPC subnet. Store the data in the same AWS Region but in a different Availability Zone from where the instances are deployed.

Answer: C

Explanation:
Distributed training model → Same VPC subnet → Same Region and Availability Zone for data and instances → Minimize communication overhead

73 / 100

73.

A. Use SageMaker built-in algorithms to train the proprietary datasets.

B. Use SageMaker script mode and premade images for ML frameworks.

C. Build a container on AWS that includes custom packages and a choice of ML frameworks.

D. Purchase similar production models through AWS Marketplace.

Answer: B

Explanation:
https://aws.amazon.com/blogs/machine-learning/bring-your-own-model-with-amazon-sagemaker-script-mode/

"Script mode enables you to write custom training and inference code while still utilizing common ML framework containers "

74 / 100

74.

A. Transfer the data to a new S3 bucket that provides S3 Express One Zone storage. Adjust the training job to use the new S3 bucket.

B. Create an Amazon FSx for Lustre file system. Link the file system to the existing S3 bucket. Adjust the training job to read from the file system.

C. Create an Amazon Elastic File System (Amazon EFS) file system. Transfer the existing data to the file system. Adjust the training job to read from the file system.

D. Create an Amazon ElastiCache (Redis OSS) cluster. Link the Redis OSS cluster to the existing S3 bucket. Stream the data from the Redis OSS cluster directly to the training job.

Answer: B

75 / 100

75.

A. Use Amazon Macie to categorize the sensitive data.

B. Prepare the data by using AWS Glue DataBrew.

C. Run an AWS Batch job to change the sensitive data to random values.

D. Run an Amazon EMR job to change the sensitive data to random values.

Answer: B

76 / 100

76.

A. Deploy the models by using scheduled AWS Glue jobs. Use Amazon CloudWatch alarms to monitor the data quality and to send alerts.

B. Deploy the models by using scheduled AWS Batch jobs. Use AWS CloudTrail to monitor the data quality and to send alerts.

C. Deploy the models by using Amazon Elastic Container Service (Amazon ECS) on AWS Fargate. Use Amazon EventBridge to monitor the data quality and to send alerts.

D. Deploy the models by using Amazon SageMaker batch transform. Use SageMaker Model Monitor to monitor the data quality and to send alerts.

Answer: D

77 / 100

77.

A. Apply statistics from a well-known dataset to normalize the production samples.

B. Keep the min-max normalization statistics from the training set. Use these values to normalize the production samples.

C. Calculate a new set of min-max normalization statistics from a batch of production samples. Use these values to normalize all the production samples.

D. Calculate a new set of min-max normalization statistics from each production sample. Use these values to normalize all the production samples.

Answer: B
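
A small sketch of why option B works: the minimum and maximum are computed once from the training set and then reused unchanged for every production sample, so production inputs are scaled exactly as the model saw them during training. The numbers below are made up for illustration.

```python
def fit_min_max(training_values):
    """Compute the normalization statistics from the training set only."""
    return min(training_values), max(training_values)


def apply_min_max(value, stats):
    """Scale one value with the stored training statistics."""
    lo, hi = stats
    if hi == lo:
        return 0.0  # constant feature: avoid division by zero
    return (value - lo) / (hi - lo)


# Hypothetical feature values.
training = [10.0, 20.0, 30.0, 50.0]
stats = fit_min_max(training)  # stored alongside the model artifact

production = [20.0, 50.0, 60.0]
scaled = [apply_min_max(v, stats) for v in production]
```

A production value outside the training range (60.0 here) maps outside [0, 1]. That is expected, and it makes drift visible instead of hiding it behind freshly computed statistics.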

78 / 100

78.

A. Mount the FSx for ONTAP file system as a volume to the SageMaker Instance.

B. Create an Amazon S3 bucket. Use Mountpoint for Amazon S3 to link the S3 bucket to the FSx for ONTAP file system.

C. Create a catalog connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

D. Create a direct connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

Answer: A

79 / 100

79.

A. Create an S3 Lifecycle rule to transfer the data to the SageMaker training instance and to initiate training.

B. Create an AWS Lambda function that scans the S3 bucket. Program the Lambda function to initiate the pipeline when new data is uploaded.

C. Create an Amazon EventBridge rule that has an event pattern that matches the S3 upload. Configure the pipeline as the target of the rule.

D. Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to orchestrate the pipeline when new data is uploaded.

Answer: C

Explanation:
Amazon EventBridge can automatically trigger the SageMaker pipeline when new data is uploaded to S3, making it a simple and efficient solution.
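
For illustration, an EventBridge event pattern that matches S3 uploads could be built as follows. The bucket name and key prefix are hypothetical, and the sketch only constructs the pattern; it assumes EventBridge notifications are enabled on the bucket.

```python
import json


def s3_upload_pattern(bucket_name, prefix=None):
    """Event pattern that matches 'Object Created' events for one bucket."""
    detail = {"bucket": {"name": [bucket_name]}}
    if prefix is not None:
        # Optional: only match keys under a given prefix.
        detail["object"] = {"key": [{"prefix": prefix}]}
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": detail,
    }


# Hypothetical bucket and prefix.
pattern = s3_upload_pattern("example-training-data", prefix="incoming/")
pattern_json = json.dumps(pattern)

# With boto3, the rule would be created with
#   events.put_rule(Name="...", EventPattern=pattern_json)
# and the SageMaker pipeline ARN (plus an IAM role) added through
#   events.put_targets(...).
```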

80 / 100

80.

A. Increase the learning rate.

B. Remove some irrelevant features from the training dataset.

C. Increase the value of the max_depth hyperparameter.

D. Decrease the value of the max_depth hyperparameter.

Answer: D

81 / 100

81.

A. Accuracy

B. Precision

C. Recall

D. Specificity

Answer: A

82 / 100

82.

83 / 100

83.

A. Create code to evaluate each instance's memory and compute usage.

B. Add cost allocation tags to the resources. Activate the tags in AWS Billing and Cost Management.

C. Check AWS CloudTrail event history for the creation of the resources.

D. Run AWS Compute Optimizer.

Answer: D

Explanation:
AWS Compute Optimizer identifies over-provisioned resources, such as EC2 instances and EBS volumes, and recommends changes that reduce cost and improve performance.

84 / 100

84.

A. Configure ECR cross-account replication for each existing ECR repository. Ensure that each model is visible in each AWS account.

B. Create a new AWS account with a new ECR repository as the central catalog. Configure ECR cross-account replication between the initial ECR repositories and the central catalog.

D. Use an AWS Glue Data Catalog to store the models. Run an AWS Glue crawler to migrate the models from the ECR repositories to the Data Catalog. Configure cross-account access to the Data Catalog.

Answer: C

Explanation:
The question asks for a "central catalog", so I believe metadata and lineage tracking are also expected. ECR alone cannot be the solution.

85 / 100

85.

A. Use production variants to add the new model to the existing SageMaker endpoint. Set the variant weight to 0.1 for the new model. Monitor the number of invocations by using Amazon CloudWatch.

B. Use production variants to add the new model to the existing SageMaker endpoint. Set the variant weight to 1 for the new model. Monitor the number of invocations by using Amazon CloudWatch.

C. Create a new SageMaker endpoint. Use production variants to add the new model to the new endpoint. Monitor the number of invocations by using Amazon CloudWatch.

D. Configure the ALB to route 10% of the traffic to the new model at the existing SageMaker endpoint. Monitor the number of invocations by using AWS CloudTrail.

Answer: A
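
A quick sketch of how variant weights translate into traffic share: each variant receives its weight divided by the sum of all weights on the endpoint. The variant names, model names, instance type, and the 0.9 weight for the existing model are assumptions made for this example; the keys follow the `ProductionVariants` structure of an endpoint configuration.

```python
def traffic_share(variants):
    """Fraction of requests each variant receives: weight / sum of weights."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


# Hypothetical endpoint configuration with two production variants.
variants = [
    {
        "VariantName": "current",
        "ModelName": "model-v1",
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 0.9,  # assumed weight of the existing model
    },
    {
        "VariantName": "candidate",
        "ModelName": "model-v2",
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 0.1,  # new model receives about 10% of traffic
    },
]

shares = traffic_share(variants)
```

Setting the new variant's weight to 1 (option B) would not send 10% of traffic to it unless the existing variant's weight were 9.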

86 / 100

86.

No.86
A company needs to develop an ML model. The model must identify an item in an image and must provide the location of the item.
Which Amazon SageMaker algorithm will meet these requirements?

A. Image classification

B. XGBoost

C. Object detection

D. K-nearest neighbors (k-NN)

Answer: C

Explanation:
https://docs.aws.amazon.com/sagemaker/latest/dg/algo-object-detection-tech-notes.html

87 / 100

87.

A. Use DataBrew to process the existing S3 folder. Store the output in Apache Parquet format.

B. Use DataBrew to process the existing S3 folder. Store the output in AWS Glue Parquet format.

C. Separate the data into a different folder for each file type. Use DataBrew to process each folder individually. Store the output in Apache Parquet format.

D. Separate the data into a different folder for each file type. Use DataBrew to process each folder individually. Store the output in AWS Glue Parquet format.

Answer: C

Explanation:
Problem Summary:

Key Considerations:

DataBrew Output Format:
Apache Parquet is preferred for:
• Efficient storage
• Better performance with AWS Glue and other analytics tools
• Columnar storage benefits in querying and transformations

"AWS Glue Parquet format" does not exist — this is a distractor in the answer options.

88 / 100

88.

A. Precision and recall

B. Root mean square error (RMSE) and mean absolute percentage error (MAPE)

C. Accuracy and F1 score

D. Bilingual Evaluation Understudy (BLEU) score

E. Perplexity

Answer: A, C

Explanation:
A. Precision and recall
C. Accuracy and F1 score
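
For reference, all four metrics named in the answer can be derived from the counts of a binary confusion matrix. The counts below are invented for the example.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
```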

89 / 100

89.

A. Encrypt communication between nodes for batch processing.

B. Encrypt communication between nodes in a training cluster.

C. Specify an AWS Key Management Service (AWS KMS) key during creation of the training job request.

D. Specify an AWS Key Management Service (AWS KMS) key during creation of the SageMaker domain.

Answer: B

Explanation:
this is it

90 / 100

90.

No.90
An ML engineer needs to use metrics to assess the quality of a time-series forecasting model.
Which metrics apply to this model? (Choose two.)

A. Recall

B. LogLoss

C. Root mean square error (RMSE)

D. InferenceLatency

E. Average weighted quantile loss (wQL)

Answer: C, E

Explanation:
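This is correct. RMSE and average weighted quantile loss (wQL) measure forecast error. Recall and LogLoss are classification metrics, and InferenceLatency is an operational metric rather than a measure of model quality.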
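
A minimal sketch of the two forecasting metrics follows. The wQL formula used here, twice the quantile (pinball) loss divided by the sum of absolute actual values and then averaged over the chosen quantiles, is my understanding of the definition used by Amazon Forecast and should be checked against the official documentation. All numbers are invented.

```python
import math


def rmse(actual, predicted):
    """Root mean square error of point forecasts."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def weighted_quantile_loss(actual, quantile_forecast, tau):
    """wQL at quantile tau: 2 * pinball loss / sum of |actual|."""
    loss = sum(
        tau * max(a - q, 0.0) + (1 - tau) * max(q - a, 0.0)
        for a, q in zip(actual, quantile_forecast)
    )
    return 2 * loss / sum(abs(a) for a in actual)


def average_wql(actual, forecasts_by_quantile):
    """Mean of wQL over all supplied quantiles, e.g. {0.1: [...], 0.5: [...]}."""
    losses = [
        weighted_quantile_loss(actual, forecast, tau)
        for tau, forecast in forecasts_by_quantile.items()
    ]
    return sum(losses) / len(losses)


# Hypothetical demand values and forecasts.
actual = [10.0, 20.0, 30.0]
p50 = [12.0, 18.0, 33.0]

point_error = rmse(actual, p50)
wql_p50 = weighted_quantile_loss(actual, p50, 0.5)
avg = average_wql(actual, {0.5: p50})
```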

91 / 100

91.

A. Create a SageMaker Serverless Inference endpoint for each model. Use provisioned concurrency for the endpoints.

B. Create a SageMaker Asynchronous Inference endpoint for each model. Create an auto scaling policy for each endpoint.

Answer: C

92 / 100

92.

A. Update the IAM policy that is attached to the execution role for the training jobs. Include the s3:ListBucket and s3:GetObject permissions.

B. Update the S3 bucket policy that is attached to the S3 bucket. Set the value of the aws:SecureTransport condition key to True.

C. Update the IAM policy that is attached to the execution role for the training jobs. Include the kms:Encrypt and kms:Decrypt permissions.

D. Update the IAM policy that is attached to the user that created the training jobs. Include the kms:CreateGrant permission.

Answer: C

Explanation:
this is correct

93 / 100

93.

A. Use a serverless endpoint with a provisioned concurrency of 35 hours for each week. Run the training on the endpoint.

B. Use SageMaker Edge Manager for the training. Specify the instance requirement in the edge device configuration. Run the training.

C. Use the heterogeneous cluster feature of SageMaker Training. Configure the instance_type, instance_count, and instance_groups arguments to run training jobs.

D. Opt in to a SageMaker Savings Plan with a 1-year term and an All Upfront payment. Run a SageMaker Training job on the instance.

Answer: D

Explanation:
SageMaker Savings Plans offer a discount for long-term use of SageMaker instances.

94 / 100

94.

★No.94
HOTSPOT

• Amazon Athena
• AWS Glue
• Amazon Kinesis Data Streams
• Amazon S3

Query the data for exploration and analysis: Select ...
Store the data: Select ...
Transform the data: Select ...

95 / 100

95.

A. Use Amazon CloudWatch to create a dashboard that monitors real-time inference data and model predictions. Use the dashboard to detect drift.

B. Modify the Lambda function to calculate model drift by using real-time inference data and model predictions. Program the Lambda function to send alerts.

C. Schedule a monitoring job in SageMaker Model Monitor. Use the job to detect drift by analyzing the live data against a baseline of the training data statistics and constraints.

D. Schedule a monitoring job in SageMaker Debugger. Use the job to detect drift by analyzing the live data against a baseline of the training data statistics and constraints.

Answer: C

Explanation:
this is it

96 / 100

96.

A. Use Amazon CloudWatch to monitor network metrics and CPU metrics for resource optimization during model training.

B. Create AWS Glue DataBrew recipes to correct the data based on statistics from the model output.

C. Use SageMaker Clarify to evaluate the model and training data for underlying patterns that might affect accuracy.

D. Create AWS Lambda functions to automate data pre-processing and to ensure consistent quality of input data for the model.

Answer: C

97 / 100

97.

A. Configure a blue/green deployment with all-at-once traffic shifting.

B. Configure a blue/green deployment with canary traffic shifting and a size of 10%.

C. Configure a shadow test with a traffic sampling percentage of 10%.

D. Configure a rolling deployment with a rolling batch size of 1.

Answer: B

Explanation:
The answer should be B.
Option D does not provide a clear strategy for managing traffic during the transition.

98 / 100

98.

A. Use Amazon CloudWatch to monitor performance metrics. Use Amazon Simple Queue Service (Amazon SQS) for message delivery.

B. Use Amazon CloudWatch to monitor performance metrics. Use Amazon Simple Notification Service (Amazon SNS) for message delivery.

C. Use AWS CloudTrail to monitor performance metrics. Use Amazon Simple Queue Service (Amazon SQS) for message delivery.

D. Use AWS CloudTrail to monitor performance metrics. Use Amazon Simple Notification Service (Amazon SNS) for message delivery.

Answer: B

Explanation:
Options A and C are out (SQS). CloudTrail does not monitor performance metrics.
Use CloudWatch with SNS; SQS is for queuing messages, not for sending notifications.
The CloudTrail options do not apply here.

99 / 100

99.

A. Use IAM condition keys to stop deployments of SageMaker notebook instances that allow root access.

B. Use AWS Key Management Service (AWS KMS) keys to stop deployments of SageMaker notebook instances that allow root access.

C. Monitor resource creation by using Amazon EventBridge events. Create an AWS Lambda function that deletes all deployed SageMaker notebook instances that allow root access.

D. Monitor resource creation by using AWS CloudFormation events. Create an AWS Lambda function that deletes all deployed SageMaker notebook instances that allow root access.

Answer: A

Explanation:
this is it
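
As an illustration of option A, an IAM policy could deny notebook instance creation whenever root access is enabled. The sketch assumes the `sagemaker:RootAccess` condition key applies to the `CreateNotebookInstance` and `UpdateNotebookInstance` actions; verify this against the current SageMaker IAM reference before relying on it. The statement ID is hypothetical.

```python
import json


def deny_root_access_policy():
    """IAM policy document that blocks notebook instances with root access."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyNotebooksWithRootAccess",  # hypothetical name
                "Effect": "Deny",
                "Action": [
                    "sagemaker:CreateNotebookInstance",
                    "sagemaker:UpdateNotebookInstance",
                ],
                "Resource": "*",
                # Assumed condition key: matches requests that enable root access.
                "Condition": {
                    "StringEquals": {"sagemaker:RootAccess": "Enabled"}
                },
            }
        ],
    }


policy_json = json.dumps(deny_root_access_policy(), indent=2)
```

Because the request is rejected before the resource exists, nothing has to be detected and deleted afterward, which is what options C and D would require.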

100 / 100

100.

A. Run the SageMaker training jobs in private subnets. Create a NAT gateway. Route traffic for training through the NAT gateway.

B. Run the SageMaker training jobs in private subnets. Create an S3 gateway VPC endpoint. Route traffic for training through the S3 gateway VPC endpoint.

Answer: B

Explanation:
Use private subnets and S3 gateway VPC endpoint to bypass public Internet.
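
A sketch of the request parameters for creating the S3 gateway endpoint is shown below. The VPC ID, Region, and route table ID are hypothetical placeholders; the keys follow the EC2 `create_vpc_endpoint` API, and the code only builds the dictionary.

```python
def s3_gateway_endpoint_params(vpc_id, region, route_table_ids):
    """Keyword arguments for EC2 create_vpc_endpoint (gateway type, S3)."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        # Routes to S3 are added to the route tables of the private subnets,
        # so training traffic stays on the AWS network.
        "RouteTableIds": list(route_table_ids),
    }


# Hypothetical identifiers.
params = s3_gateway_endpoint_params(
    "vpc-0123456789abcdef0", "us-east-1", ["rtb-0123456789abcdef0"]
)

# With boto3: boto3.client("ec2").create_vpc_endpoint(**params)
```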

■AWS MLA-C01(EN) Q.101-114

/14

AWS MLA-C01(EN) Q.101-114

1 / 14

No.101
A company needs an AWS solution that will automatically create versions of ML models as the models are created.
Which solution will meet this requirement?

A. Amazon Elastic Container Registry (Amazon ECR)

B. Model packages from Amazon SageMaker Marketplace

C. Amazon SageMaker ML Lineage Tracking

D. Amazon SageMaker Model Registry

Answer: D

Explanation:
this is it

2 / 14

A. Create a pipeline in Amazon SageMaker Pipelines to generate a new model. Call the new model from Amazon Bedrock to perform RAG queries.

B. Convert the data into vectors. Store the data in an Amazon Neptune database. Connect the database to Amazon Bedrock. Call the Amazon Bedrock API to perform RAG queries.

D. Create a knowledge base for Amazon Bedrock. Configure a data source that references the S3 bucket. Use the Amazon Bedrock API to perform RAG queries.

Answer: D

Explanation:
D is the correct answer.
A: The CSV and DOCX files have to be vectorized first. Besides, this option does not mention anything about the data.
B and C are not applicable in this case.

3 / 14

A. Serverless inference

B. Asynchronous inference

C. Real-time inference

D. Batch transform

Answer: B

Explanation:
Agree with B.
In general, real-time inference limits synchronous request payloads to a few megabytes, while asynchronous inference accepts much larger payloads (up to 1 GB) and longer processing times.

4 / 14

No.104
An ML engineer notices class imbalance in an image classification training job.
What should the ML engineer do to resolve this issue?

A. Reduce the size of the dataset.

B. Transform some of the images in the dataset.

C. Apply random oversampling on the dataset.

D. Apply random data splitting on the dataset.

Answer: C

Explanation:
correct
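
Random oversampling can be sketched in a few lines: minority-class samples are duplicated at random until every class is as large as the largest one. The file names and labels below are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until all classes are the same size."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw extra samples with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical imbalanced dataset: six cat images, two dog images.
images = [f"cat{i}.png" for i in range(6)] + ["dog0.png", "dog1.png"]
labels = ["cat"] * 6 + ["dog"] * 2

new_images, new_labels = random_oversample(images, labels)
counts = Counter(new_labels)
```

Oversampling should be applied to the training split only, so that duplicated samples do not leak into validation or test data.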

5 / 14

A. Create a discovery job in Amazon Macie. Configure the job to find and mask sensitive data.

B. Create Apache Spark code to run on an AWS Glue job. Use the Sensitive Data Detection functionality in AWS Glue to find and mask sensitive data.

C. Create Apache Spark code to run on an AWS Glue job. Program the code to perform a regex operation to find and mask sensitive data.

D. Create Apache Spark code to run on an Amazon EC2 instance. Program the code to perform an operation to find and mask sensitive data.

Answer: A

Explanation:
correct

6 / 14

A. Use Amazon SageMaker to build a recurrent neural network (RNN) to summarize the data.

B. Use Amazon Comprehend Medical to summarize the data.

C. Use Amazon Kendra to create a quick-search tool to query the data.

D. Use the Amazon SageMaker Sequence-to-Sequence (seq2seq) algorithm to create a text summary from the data.

Answer: B

Explanation:
correct

7 / 14

No.107
A company needs to extract entities from a PDF document to build a classifier model.
Which solution will extract and store the entities in the LEAST amount of time?

A. Use Amazon Comprehend to extract the entities. Store the output in Amazon S3.

B. Use an open source AI optical character recognition (OCR) tool on Amazon SageMaker to extract the entities. Store the output in Amazon S3.

C. Use Amazon Textract to extract the entities. Use Amazon Comprehend to convert the entities to text. Store the output in Amazon S3.

D. Use Amazon Textract integrated with Amazon Augmented AI (Amazon A2I) to extract the entities. Store the output in Amazon S3.

Answer: C

8 / 14

A. Set up Studio client IP validation by using the aws:sourceIp IAM policy condition.

B. Set up Studio client VPC validation by using the aws:sourceVpc IAM policy condition.

C. Set up Studio client role endpoint validation by using the aws:PrimaryTag IAM policy condition.

D. Set up Studio client user endpoint validation by using the aws:PrincipalTag IAM policy condition.

Answer: A

Explanation:
A is correct.
https://aws.amazon.com/blogs/machine-learning/secure-amazon-sagemaker-studio-presigned-urls-part-1-foundational-infrastructure/

9 / 14

A. Create a transient Amazon EMR cluster every week. Use the cluster to run an Apache Spark job to merge and transform the data.

B. Create a weekly AWS Glue job that uses the Apache Spark engine. Use DynamicFrame native operations to merge and transform the data.

C. Create an AWS Lambda function that runs Apache Spark code every week to merge and transform the data. Configure the Lambda function to connect to the initial S3 bucket and the DB cluster.

D. Create an AWS Batch job that runs Apache Spark code on Amazon EC2 instances every week. Configure the Spark code to save the data from the EC2 instances to the second S3 bucket.

Answer: B

Explanation:
correct

10 / 14

10.

A. Schedule a SageMaker Model Monitor job. Observe metrics about model quality.

B. Schedule a SageMaker Model Monitor job with Amazon CloudWatch metrics enabled.

C. Enable Amazon CloudWatch metrics. Observe the ModelSetupTime metric in the SageMaker namespace.

D. Enable Amazon CloudWatch metrics. Observe the ModelLoadingWaitTime metric in the SageMaker namespace.

Answer: D

Explanation:
The ModelLoadingWaitTime metric measures how long an invocation request waits for the target model to be downloaded or loaded.

11 / 14

11.

A. Use the Amazon Comprehend DetectPiiEntities API call to redact the PII from the data. Store the data in an Amazon S3 bucket. Access the S3 bucket from the SageMaker instances for model training.

D. Use Amazon Macie for automatic discovery of PII in the data. Remove the PII. Store the data in an Amazon S3 bucket. Mount the S3 bucket to the SageMaker instances for model training.

Answer: A

Explanation:
correct
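
The DetectPiiEntities response lists each entity with a type and character offsets, which is enough to redact the text before it is written to S3. The sketch below uses a hand-written list shaped like the `Entities` field of that response; the sample sentence, offsets, and scores are invented, and no call to Amazon Comprehend is made.

```python
def redact_pii(text, entities):
    """Replace each detected span with its entity type in brackets."""
    redacted = text
    # Work from the end of the string so earlier offsets stay valid.
    for entity in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        start, end = entity["BeginOffset"], entity["EndOffset"]
        redacted = redacted[:start] + f"[{entity['Type']}]" + redacted[end:]
    return redacted


text = "Jane Doe lives at 12 Main St."

# Hand-written stand-in for response["Entities"] from
#   comprehend.detect_pii_entities(Text=text, LanguageCode="en")
entities = [
    {"Type": "NAME", "BeginOffset": 0, "EndOffset": 8, "Score": 0.99},
    {"Type": "ADDRESS", "BeginOffset": 18, "EndOffset": 28, "Score": 0.98},
]

clean = redact_pii(text, entities)
```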

12 / 14

12.

No.112
A company must install a custom script on any newly created Amazon SageMaker notebook instances.
Which solution will meet this requirement with the LEAST operational overhead?

Answer: A

Explanation:
correct

13 / 14

13.

A. Use Amazon Data Firehose to ingest the data. Create an AWS Lambda function to process the data. Store the processed data in Amazon S3. Use Amazon QuickSight to visualize the data.

B. Use Amazon Kinesis Data Streams to ingest the data. Use Amazon Data Firehose to transform the data. Use Amazon Athena to process the data. Use Amazon QuickSight to visualize the data.

14 / 14

14.

A. Store the clinical data in Amazon S3 buckets. Use AWS Glue DataBrew to mask the PII and PHI before the data is used for model training.

B. Upload the clinical data to an Amazon Redshift database. Use built-in SQL stored procedures to automatically classify and mask the PII and PHI before the data is used for model training.

C. Use Amazon Comprehend to detect and mask the PII before the data is used for model training. Use Amazon Comprehend Medical to detect and mask the PHI before the data is used for model training.

D. Create an AWS Lambda function to encrypt the PII and PHI. Program the Lambda function to save the encrypted data to an Amazon S3 bucket for model training.

Answer: C

Explanation:
correct