What is the formula for measuring skewness in a dataset?
A. MEAN - MEDIAN
B. MODE - MEDIAN
C. 3(MEAN - MEDIAN) / STANDARD DEVIATION
D. (MEAN - MODE) / STANDARD DEVIATION
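The standardized expressions in the options above correspond to Pearson's first (mode) and second (median) skewness coefficients. The following is a minimal sketch using only the Python standard library. The function names and the sample data are hypothetical, and it assumes the sample standard deviation (statistics.stdev) rather than the population one.

```python
import statistics


def pearson_mode_skewness(values):
    """Pearson's first coefficient: (mean - mode) / standard deviation."""
    mean = statistics.mean(values)
    mode = statistics.mode(values)
    return (mean - mode) / statistics.stdev(values)


def pearson_median_skewness(values):
    """Pearson's second coefficient: 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    return 3 * (mean - median) / statistics.stdev(values)


# Hypothetical data with a long right tail: the mean is pulled above the median.
sample = [1, 2, 2, 3, 10]
print(pearson_mode_skewness(sample))
print(pearson_median_skewness(sample))
```

Both coefficients come out positive for this sample, which matches the right-hand tail. For symmetric data such as [1, 2, 3, 4, 5] the median-based coefficient is zero.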
Which command manually triggers a single run of a scheduled task (either a standalone task or the root task in a DAG) independent of the schedule defined for the task?
A. RUN TASK
B. CALL TASK
C. EXECUTE TASK
D. RUN ROOT TASK
Which of the following is not a feature engineering technique used in machine learning?
A. Imputation
B. Binning
C. One hot encoding
D. Statistical
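For readers unfamiliar with the terms imputation, binning, and one-hot encoding, here is a small sketch in plain Python. The column values, the bin edges, and the helper name age_bin are all made up for illustration; real pipelines would normally use a library such as pandas or scikit-learn.

```python
import statistics

ages = [22, None, 35, 58, None, 41]        # hypothetical column with missing values
colors = ["red", "green", "red", "blue"]   # hypothetical categorical column

# Imputation: replace missing values with the mean of the observed ones.
observed = [a for a in ages if a is not None]
fill = statistics.mean(observed)
imputed = [fill if a is None else a for a in ages]


# Binning: map each numeric value to a coarse, labeled interval.
def age_bin(age):
    if age < 30:
        return "young"
    if age < 50:
        return "middle"
    return "senior"


binned = [age_bin(a) for a in imputed]

# One-hot encoding: one 0/1 indicator per category, in sorted category order.
categories = sorted(set(colors))
one_hot = [[int(c == cat) for cat in categories] for c in colors]

print(imputed)
print(binned)
print(categories, one_hot)
```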
Which tools help data scientists manage the ML lifecycle and model versioning? Choose 2.
A. MLflow
B. Pachyderm
C. Albert
D. CRUX
In which of the following ways can a data scientist query, process, and transform data using Snowpark Python? Choose 2.
A. Query and process data with a DataFrame object.
B. Write a user-defined tabular function (UDTF) that processes data and returns data in a set of rows with one or more columns.
C. Snowpark currently does not support writing UDTFs.
D. Transform data using the DataIKY tool with the Snowpark API.
Which of the following are known limitations of external functions? Choose all that apply.
A. Currently, external functions cannot be shared with data consumers via Secure Data Sharing.
B. Currently, external functions must be scalar functions. A scalar external function returns a single value for each input row.
C. External functions have more overhead than internal functions (both built-in functions and internal UDFs) and usually execute more slowly.
D. An external function accessed through an AWS API Gateway private endpoint can be accessed only from a Snowflake VPC (Virtual Private Cloud) on AWS and in the same AWS region.
Which statement about Python UDFs is incorrect?
A. Python UDFs can contain both new code and calls to existing packages.
B. For each row passed to a UDF, the UDF returns either a scalar (i.e. single) value or, if defined as a table function, a set of rows.
C. A UDF also gives you a way to encapsulate functionality so that you can call it repeatedly from multiple places in code.
D. A scalar function (UDF) returns a tabular value for each input row.
Consider a DataFrame df with 10 rows and index ['r1', 'r2', 'r3', 'row4', 'row5', 'row6', 'r7', 'r8', 'r9', 'row10']. What does the aggregate method in the code below do?
g = df.groupby(df.index.str.len())
g.aggregate({'A':len, 'B':np.sum})
A. Computes the sum of column A values
B. Computes the length of column A
C. Computes the length of column A and the sum of column B values for each group
D. Computes the length of column A and the sum of column B values
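The grouping in this question can be tried out directly. The sketch below builds a hypothetical df with the stated index and made-up integer columns A and B, then groups rows by the string length of their index label. It assumes pandas is installed, and it passes the string 'sum' where the question uses np.sum, since recent pandas versions prefer the string form.

```python
import pandas as pd

index = ['r1', 'r2', 'r3', 'row4', 'row5', 'row6', 'r7', 'r8', 'r9', 'row10']

# Hypothetical column values; the question does not specify them.
df = pd.DataFrame({'A': range(10), 'B': range(10, 20)}, index=index)

# Group keys are the label lengths: 2 for 'r1'..'r9', 4 for 'row4'..'row6', 5 for 'row10'.
g = df.groupby(df.index.str.len())

# A different function is applied to each column within every group.
result = g.aggregate({'A': len, 'B': 'sum'})
print(result)
```

The result has one row per distinct label length, so the dictionary passed to aggregate is evaluated separately inside each group rather than once over the whole frame.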
What can a Snowflake data scientist do in the Snowflake Marketplace as a consumer? Choose all that apply.
A. Discover and test third-party data sources.
B. Receive frictionless access to raw data products from vendors.
C. Combine new datasets with your existing data in Snowflake to derive new business insights.
D. Use the business intelligence (BI)/ML/deep learning tools of your choice.
Which of the following cross-validation methods is best suited to quick validation on very large datasets with hundreds of thousands of samples?
A. k-fold cross-validation
B. Leave-one-out cross-validation
C. Holdout method
D. All of the above
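The cost difference between these schemes comes down to how many times a model has to be trained. The sketch below is a rough illustration in plain Python. The helper names, the 20% test fraction, k = 10, and the 200,000-sample size are all assumptions chosen for the example.

```python
import random


def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Shuffle the indices once and cut them into a train part and a test part."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def model_fits_required(n_samples, k=10):
    """Number of separate training runs each validation scheme needs."""
    return {"holdout": 1, "k_fold": k, "leave_one_out": n_samples}


train_idx, test_idx = holdout_split(200_000)
print(len(train_idx), len(test_idx))
print(model_fits_required(200_000))
```

With 200,000 samples, a single holdout split trains one model, 10-fold trains ten, and leave-one-out would train 200,000.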