70-775 PDF Dumps

How to use our free Microsoft 70-775 PDF dumps

Our free 70-775 PDF dumps are based on the full 70-775 mock exams available on our website. The Microsoft 70-775 PDF consists of questions and answers with detailed explanations.
You can use the 70-775 PDF practice exam as study material to pass the 70-775 exam, and don't forget to also try our 70-775 testing engine web simulator.



Q1.Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are building a security tracking solution in Apache Kafka to parse security logs. The security logs record an
entry each time a user attempts to access an application. Each log entry contains the IP address used to make
the attempt and the country from which the attempt originated.
You need to receive notifications when an IP address from outside of the United States is used to access the
application.
Solution: Create two new consumers. Create a file import process to send messages. Start the producer.
Does this meet the goal?
 - A:   Yes
 - B:   No

 solution: B



Q2.Note: This question is part of a series of questions that present the same scenario. Each question in the series
contains a unique solution that might meet the stated goals. Some question sets might have more than one
correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will
not appear in the review screen.
You are building a security tracking solution in Apache Kafka to parse security logs. The security logs record an
entry each time a user attempts to access an application. Each log entry contains the IP address used to make
the attempt and the country from which the attempt originated.
You need to receive notifications when an IP address from outside of the United States is used to access the
application.
Solution: Create new topics. Create a file import process to send messages. Start the consumer and run the
producer.
Does this meet the goal?
 - A:   Yes
 - B:   No

 solution: A
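
Below is a minimal sketch of the consumer side of this solution, using the kafka-python package. The topic name, broker address, and JSON field names (ip, country) are illustrative assumptions, since the question does not specify them.

    # Consume security log entries and alert on non-US access attempts.
    import json

    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "security-logs",                     # assumed topic for the log feed
        bootstrap_servers="localhost:9092",  # assumed broker address
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )

    for record in consumer:
        entry = record.value
        if entry.get("country") != "US":     # flag attempts from outside the US
            print(f"ALERT: {entry.get('ip')} accessed from {entry.get('country')}")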



Q3.Note: This question is part of a series of questions that present the same scenario. Each question in the series
contains a unique solution that might meet the stated goals. Some question sets might have more than one
correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will
not appear in the review screen.
You are building a security tracking solution in Apache Kafka to parse security logs. The security logs record an
entry each time a user attempts to access an application. Each log entry contains the IP address used to make
the attempt and the country from which the attempt originated.
You need to receive notifications when an IP address from outside of the United States is used to access the
application.
Solution: Create a consumer and a broker. Create a file import process to send messages. Run the producer.
Does this meet the goal?
 - A:   Yes
 - B:   No

 solution: B



Q4.You have an Azure HDInsight cluster.
You need to build a solution to ingest real-time streaming data into a nonrelational distributed database.
What should you use to build the solution?
 - A:   Apache Hive and Apache Kafka
 - B:   Spark and Phoenix
 - C:   Apache Storm and Apache HBase
 - D:   Apache Pig and Apache HCatalog

 solution: C

Explanation:
References:
http://storm.apache.org/
http://hbase.apache.org/
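
For illustration, here is a rough sketch of the Storm-to-HBase pattern using the Python streamparse and happybase packages; the Thrift endpoint, table name, column family, and tuple layout are all assumptions made for the example.

    # A Storm bolt that persists each streamed event as an HBase row.
    import happybase

    from streamparse import Bolt

    class HBaseWriterBolt(Bolt):
        def initialize(self, storm_conf, context):
            # assumed HBase Thrift endpoint and table name
            self.connection = happybase.Connection("hbase-thrift-host")
            self.table = self.connection.table("events")

        def process(self, tup):
            row_key, payload = tup.values
            # one HBase row per event, stored under an assumed column family
            self.table.put(row_key.encode("utf-8"),
                           {b"cf:payload": payload.encode("utf-8")})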


Q5.You have an Apache Hive table that contains one billion rows.
You plan to use queries that will filter the data by using the WHERE clause. The values of the columns will be
known only while the data loads into a Hive table.
You need to decrease the query runtime.
What should you configure?
 - A:   static partitioning
 - B:   bucket sampling
 - C:   parallel execution
 - D:   dynamic partitioning

 solution: D

Explanation:
References: https://www.qubole.com/blog/5-tips-for-efficient-hive-queries/
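
As a sketch of what dynamic partitioning looks like in practice, the following PySpark snippet runs a dynamic-partition insert through Spark SQL with Hive support; the table and column names are illustrative assumptions.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # let Hive derive partition values from the data as it loads
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    # partition values come from the SELECT output rather than the statement,
    # so later queries with WHERE country = ... can prune whole partitions
    spark.sql("""
        INSERT OVERWRITE TABLE access_logs PARTITION (country)
        SELECT ip, event_time, country FROM staging_access_logs
    """)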


Q6.You plan to copy data from Azure Blob storage to an Azure SQL database by using Azure Data Factory.
Which file formats can you use?
 - A:   binary, JSON, Apache Parquet, and ORC
 - B:   OXPS, binary, text and JSON
 - C:   XML, Apache Avro, text, and ORC
 - D:   text, JSON, Apache Avro, and Apache Parquet

 solution: D

Explanation:
References: https://docs.microsoft.com/en-us/azure/data-factory/supported-file-formats-and-compression-codecs


Q7.You have an Apache Spark cluster in Azure HDInsight.
You plan to join a large table and a lookup table.
You need to minimize data transfers during the join operation.
What should you do?
 - A:   Use the reduceByKey function.
 - B:   Use a Broadcast variable.
 - C:   Repartition the data.
 - D:   Use the DISK_ONLY storage level.
 - E:   Store the lookup table to a disk.
 - F:   Store the lookup table to Azure Blob storage.

 solution: B

Explanation:
References: https://www.dezyre.com/article/top-50-spark-interview-questions-and-answers-for-2017/208
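
A short sketch of a broadcast join in PySpark; the DataFrame and column names are illustrative assumptions. Broadcasting ships the small lookup table to every executor once, so the large table never has to be shuffled for the join.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.getOrCreate()

    large_df = spark.table("sales")            # assumed large fact table
    lookup_df = spark.table("country_lookup")  # assumed small lookup table

    # the broadcast hint keeps the join map-side, minimizing data transfer
    joined = large_df.join(broadcast(lookup_df), on="country_code")
    joined.show()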


Q8.You have an Apache Spark cluster in Azure HDInsight.
You execute the following command.

[PIC-1]

What is the result of running the command?
 - A:   the Hive ORC library is imported to Spark and external tables in ORC format are created
 - B:   the Spark library is imported and the data is loaded to an Apache Hive table
 - C:   the Hive ORC library is imported to Spark and the ORC-formatted data stored in Apache Hive tables becomes accessible
 - D:   the Spark library is imported and Scala functions are executed

 solution: C
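
The exhibit is not reproduced here. As a rough illustration of the behavior the correct answer describes, the PySpark sketch below reads ORC-formatted data and queries an ORC-backed Hive table; the path and table name are assumptions.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    orc_df = spark.read.orc("/example/data/orc")           # assumed ORC path
    hive_df = spark.sql("SELECT * FROM orc_backed_table")  # assumed Hive table
    hive_df.show()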



Q9.You use YARN to manage the resources for a Spark Thrift Server running on a Linux-based Apache Spark
cluster in Azure HDInsight.
You discover that the cluster does not fully utilize the resources. You want to increase resource allocation.
You need to increase the number of executors and the allocation of memory to the Spark Thrift Server driver.
Which two parameters should you modify? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
 - A:   spark.dynamicAllocation.maxExecutors
 - B:   spark.cores.max
 - C:   spark.executor.memory
 - D:   spark_thrift_cmd_opts
 - E:   spark.executor.instances

 solution: A, C

Explanation:
References: https://stackoverflow.com/questions/37871194/how-to-tune-spark-executor-number-cores-and-executor-memory
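
A sketch of raising the two settings with illustrative values; on an HDInsight cluster these would normally be changed in the Spark Thrift Server configuration (for example, through Ambari) rather than in application code.

    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    conf = (
        SparkConf()
        .set("spark.dynamicAllocation.maxExecutors", "8")  # allow more executors
        .set("spark.executor.memory", "4g")                # more memory per executor
    )

    spark = SparkSession.builder.config(conf=conf).getOrCreate()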


Q10.Note: This question is part of a series of questions that use the same scenario. For your convenience, the
scenario is repeated in each question. Each question presents a different goal and answer choices, but the text
of the scenario is exactly the same in each question in this series.
You are planning a big data infrastructure by using an Apache Spark cluster in Azure HDInsight. The cluster
has 24 processor cores and 512 GB of memory.
The architecture of the infrastructure is shown in the exhibit. (Click the Exhibit button.)

[PIC-2]

The architecture will be used by the following users:
 - Support analysts who run applications that will use REST to submit Spark jobs.
 - Business analysts who use JDBC and ODBC client applications from a real-time view. The business analysts run monitoring queries to access aggregate results for 15 minutes. The results will be referenced by subsequent queries.
 - Data analysts who publish notebooks drawn from batch layer, serving layer, and speed layer queries. All of the notebooks must support native interpreters for data sources that are batch processed. The serving layer queries are written in Apache Hive and must support multiple sessions. Unique GUIDs are used across the data sources, which allow the data analysts to use Spark SQL.
The data sources in the batch layer share a common storage container. The following data sources are used:
 - Hive for sales data
 - Apache HBase for operations data
 - HBase for logistics data by using a single region server
The business analysts report that they experience performance issues when they run the monitoring queries.
You troubleshoot the performance issues and discover that the intermediate tables generated when the analysts run the queries put pressure on Java Virtual Machine (JVM) garbage collection for each job.
Which configuration settings should you modify to alleviate the performance issues?
 - A:   spark.sql.inMemoryColumnarStorage.batchSize
 - B:   spark.sql.broadcastTimeout
 - C:   spark.sql.files.openCostInBytes
 - D:   spark.sql.shuffle.partitions

 solution: D
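
As a sketch, the setting can be tuned per session; the value shown is illustrative (Spark's default is 200). More, smaller shuffle partitions reduce the size of the intermediate data each task holds, easing JVM garbage-collection pressure.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # increase shuffle parallelism so each task materializes less data
    spark.conf.set("spark.sql.shuffle.partitions", "400")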