70-475 PDF Dumps

How to use our free Microsoft 70-475 PDF dumps

Our free 70-475 PDF dumps are based on the full 70-475 mock exams available on our website. The Microsoft 70-475 PDF consists of questions and answers with detailed explanations.
You can use the 70-475 PDF practice exam as study material to pass the 70-475 exam, and don't forget to also try our 70-475 testing engine web simulator.

										
											
Microsoft 70-475
Designing and Implementing Big Data Analytics Solutions
Microsoft 70-475 Dumps Available Here at:
https://www.certification-questions.com/microsoft-exam/70-475-dumps.html
By enrolling now, you will get access to 23 questions in a unique set of 70-475 dumps.
Question 1
Overview:
Relecloud is a social media company that processes hundreds of millions of social media posts per day and sells advertisements to several hundred companies.
Relecloud has a Microsoft SQL Server database named DB1 that stores information about the advertisers. DB1 is hosted on a Microsoft Azure virtual machine.
Relecloud has two main offices. The offices are located in San Francisco and New York City.
The offices connect to each other by using a site-to-site VPN. Each office connects directly to the Internet. Relecloud modifies the pricing of its advertisements based on trending topics. Topics are considered to be trending if they generate many mentions in a specific country during a 15-minute time frame. The highest trending topics generate the highest advertising revenue.
Relecloud wants to deliver reports to the advertisers by using Microsoft Power BI. The reports will provide real-time data on trending topics, current advertising rates, and advertising costs for a given month. Relecloud will analyze the trending topics data, and then store the data in a new data warehouse for ad-hoc analysis. The data warehouse is expected to grow at a rate of 1 GB per hour or 8.7 terabytes (TB) per year. The data will be retained for five years for the purpose of long-term trending.
Requirements:
Management at Relecloud must be able to view which topics are trending to adjust advertising rates in near real-time.
Relecloud plans to implement a new streaming analytics platform that will report on trending topics. Relecloud plans to implement a data warehouse named DB2.
Relecloud identifies the following technical requirements:
- Social media data must be analyzed to identify trending topics in real-time.
- The use of Infrastructure as a Service (IaaS) platforms must be minimized, whenever possible.
- The real-time solution used to analyze the social media data must support scaling up and down without service interruption.
Relecloud identifies the following technical requirements for the advertisers:
- The advertisers must be able to see only their own data in the Power BI reports.
- The advertisers must authenticate to Power BI by using Azure Active Directory (Azure AD) credentials.
- The advertisers must be able to leverage existing Transact-SQL language knowledge when developing the real-time streaming solution.
- Members of the internal advertising sales team at Relecloud must be able to see only the sales data of the advertisers to which they are assigned.
- The internal Relecloud advertising sales team must be prevented from inserting, updating, and deleting rows for the advertisers to which they are not assigned.
- The internal Relecloud advertising sales team must be able to use a text file to update the list of advertisers, and then to upload the file to Azure Blob storage.
Relecloud identifies the following requirements for DB1:
- Data generated by the streaming analytics platform must be stored in DB1.
- The user names of the advertisers must be mapped to CustomerID in a table named Table2.
- The advertisers in DB1 must be stored in a table named Table1 and must be refreshed nightly.
- The user names of the employees at Relecloud must be mapped to EmployeeID in a table named Table3.
Relecloud identifies the following requirements for DB2:
- DB2 must have minimal storage costs.
- DB2 must run load processes in parallel.
- DB2 must support massive parallel processing.
- DB2 must be able to store more than 40 TB of data.
- DB2 must support scaling up and down, as required.
- Data from DB1 must be archived in DB2 for long-term storage.
- All of the reports that are executed from DB2 must use aggregation.
- Users must be able to pause DB2 when the data warehouse is not in use.
- Users must be able to view previous versions of the data in DB2 by using aggregates.
Relecloud identifies the following requirements for extract, transformation, and load (ETL):
- Data movement between DB1 and DB2 must occur each hour.
- An email alert must be generated when a failure of any type occurs during ETL processing.
Sample code and data:
You execute the following code for a table named rls_table1.
-
You use the following code to create Table1.
create table table1
(customerid int,
salespersonid int
...
)
Go
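The code executed for rls_table1 is not reproduced above, but the requirements that advertisers and salespeople see only their own rows describe the standard SQL Server row-level security pattern. A minimal sketch of such a setup in Transact-SQL follows; the function name, policy name, and column names are assumptions for illustration only and are not taken from the case study.

-- Illustrative sketch only: object and column names are assumed.
-- The predicate returns a row only when the current database user is mapped,
-- through Table3 (EmployeeID to user name), to the salesperson on the row.
create function dbo.fn_salespredicate(@salespersonid int)
returns table
with schemabinding
as
return
    select 1 as fn_salespredicate_result
    from dbo.table3
    where employeeid = @salespersonid
      and username = USER_NAME();
Go

-- Filter reads and block writes on Table1 for rows outside the salesperson's assignments.
create security policy dbo.salespolicy
    add filter predicate dbo.fn_salespredicate(salespersonid) on dbo.table1,
    add block predicate dbo.fn_salespredicate(salespersonid) on dbo.table1
    with (state = on);
Go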
The following is a sample of the streaming data.
-
Which technology should you recommend to meet the technical requirement for analyzing the social media data?
Options:
A. Azure Stream Analytics
B. Azure Data Lake Analytics
C. Azure Machine Learning
D. Azure HDInsight Storm clusters
Answer: A
Explanation:
Azure Stream Analytics is a fully managed event-processing engine that lets you set up real-time analytic computations on streaming data.
Scalability
Stream Analytics can handle up to 1 GB of incoming data per second. Integration with Azure Event Hubs and Azure IoT Hub allows jobs to ingest millions of events per second coming from connected devices, clickstreams, and log files, to name a few. Using the partition feature of event hubs, you can partition computations into logical steps, each with the ability to be further partitioned to increase scalability.
From scenario: Relecloud identifies the following technical requirements:
- Social media data must be analyzed to identify trending topics in real-time.
- The use of Infrastructure as a Service (IaaS) platforms must be minimized, whenever possible.
- The real-time solution used to analyze the social media data must support scaling up and down without service interruption.
Reference: https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-introduction
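As an illustration of how the advertisers' existing Transact-SQL knowledge carries over, a Stream Analytics query that counts mentions per topic and country over 15-minute tumbling windows (the scenario's definition of trending) might look like the following sketch. The input alias, output alias, and field names are assumptions, not part of the case study.

-- Illustrative sketch only: input/output aliases and field names are assumed.
-- Counts mentions per topic and country in 15-minute tumbling windows.
SELECT
    Topic,
    CountryCode,
    COUNT(*) AS Mentions,
    System.Timestamp AS WindowEnd
INTO
    [trending-topics-output]
FROM
    [social-media-input] TIMESTAMP BY CreatedAt
GROUP BY
    Topic,
    CountryCode,
    TumblingWindow(minute, 15)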
Question 2
Note: The question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
Your company has multiple databases that contain millions of sales transactions.
You plan to implement a data mining solution to identify purchasing fraud.
You need to design a solution that mines 10 terabytes (TB) of sales data. The solution must meet the following requirements:
- Run the analysis to identify fraud once per week.
- Continue to receive new sales transactions while the analysis runs.
- Be able to stop computing services when the analysis is NOT running.
Solution: You create a Cloudera Hadoop cluster on Microsoft Azure virtual machines.
Does this meet the goal?
Options:
A. Yes
B. No
Answer: A
Explanation:
Processing large amounts of unstructured data requires serious computing power and also maintenance effort. As load on computing power typically fluctuates due to time and seasonal influences and/or processes running on certain times, a cloud solution like Microsoft Azure is a good option to be able to scale up easily and pay only for what is actually used.
Reference: http://blog.cloudera.com/blog/2016/02/how-to-install-cloudera-enterprise-on-microsoft-azure-part-1/
Question 3
Note: The question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
Your company has multiple databases that contain millions of sales transactions.
You plan to implement a data mining solution to identify purchasing fraud.
You need to design a solution that mines 10 terabytes (TB) of sales data. The solution must meet the following requirements:
- Run the analysis to identify fraud once per week.
- Continue to receive new sales transactions while the analysis runs.
- Be able to stop computing services when the analysis is NOT running.
Solution: You create a Microsoft Azure HDInsight cluster.
Does this meet the goal?
Options:
A. Yes
B. No
Answer: B
Explanation:
HDInsight cluster billing starts once a cluster is created and stops when the cluster is deleted. Billing is pro-rated per minute, so you should always delete your cluster when it is no longer in use.
Reference: https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-provision-linux-clusters
Question 4
You are designing a solution that will use Apache HBase on Microsoft Azure HDInsight.
You need to design the row keys for the database to ensure that client traffic is directed over all of the nodes in the cluster.
What are two possible techniques that you can use? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
Options:
A. padding
B. trimming
C. hashing
D. salting
Answer: C, D
Explanation:
There are two strategies that you can use to avoid hotspotting:
* Hashing keys
To spread write and insert activity across the cluster, you can randomize sequentially generated keys by hashing the keys, inverting the byte order. Note that these strategies come with trade-offs. Hashing keys, for example, makes table scans for key subranges inefficient, since the subrange is spread across the cluster.
* Salting keys
Instead of hashing the key, you can salt the key by prepending a few bytes of the hash of the key to the actual key.
Note. Salted Apache HBase tables with pre-split is a proven effective HBase solution to provide uniform workload distribution across RegionServers and prevent hot spots during bulk writes. In this design, a row key is made with a logical key plus salt at the beginning. One way of generating salt is by calculating n (number of regions) modulo on the hash code of the logical row key (date, etc).
Reference: https://blog.cloudera.com/blog/2015/06/how-to-scan-salted-apache-hbase-tables-with-region-specific-key-ranges-in-mapreduce/
http://maprdocs.mapr.com/51/MapR-DB/designing_row_keys_for_mapr_db_binary_tables.html
Question 5
A company named Fabrikam, Inc. has a Microsoft Azure web app. Billions of users visit the app daily.
The web app logs all user activity by using text files in Azure Blob storage. Each day, approximately 200 GB of text files are created.
Fabrikam uses the log files from an Apache Hadoop cluster on Azure HDInsight.
You need to recommend a solution to optimize the storage of the log files for later Hive use.
What is the best property to recommend adding to the Hive table definition to achieve the goal? More than one answer choice may achieve the goal. Select the BEST answer.
Options:
A. STORED AS RCFILE
B. STORED AS GZIP
C. STORED AS ORC
D. STORED AS TEXTFILE
Answer: C
Explanation:
The Optimized Row Columnar (ORC) file format provides a highly efficient way to store Hive data. It was designed to overcome limitations of the other Hive file formats. Using ORC files improves performance when Hive is reading, writing, and processing data.
Compared with RCFile format, for example, ORC file format has many advantages such as:
- a single file as the output of each task, which reduces the NameNode's load
- Hive type support including datetime, decimal, and the complex types (struct, list, map, and union)
- light-weight indexes stored within the file
- skip row groups that don't pass predicate filtering
- seek to a given row
- block-mode compression based on data type
- run-length encoding for integer columns
- dictionary encoding for string columns
- concurrent reads of the same file using separate RecordReaders
- ability to split files without scanning for markers
- bound the amount of memory needed for reading or writing
- metadata stored using Protocol Buffers, which allows addition and removal of fields
Reference: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-ORCFileFormat
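For illustration, a Hive table definition that applies the recommended property might look like the following sketch; the table name, columns, and compression codec are assumptions rather than part of the question.

-- Illustrative sketch only: table name, columns, and compression codec are assumed.
CREATE TABLE user_activity_log (
    user_id    STRING,
    activity   STRING,
    event_time TIMESTAMP
)
STORED AS ORC
TBLPROPERTIES ("orc.compress" = "SNAPPY");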
Question 6
You are designing a solution based on the lambda architecture.
You need to recommend which technology to use for the serving layer. What should you recommend?
Options:
A. Apache Storm
B. Kafka
C. Microsoft Azure DocumentDB
D. Apache Hadoop
Answer: C
Explanation:
The Serving Layer is a bit more complicated in that it needs to be able to answer a single query request against two or more databases, processing platforms, and data storage devices. Apache Druid is an example of a cluster-based tool that can marry the Batch and Speed layers into a single answerable request.
Reference: https://en.wikipedia.org/wiki/Lambda_architecture
Question 7
Your company has thousands of Internet-connected sensors.
You need to recommend a computing solution to perform a real-time analysis of the data generated by the sensors.
Which computing solution should you include in the recommendation?
Options:
A. Microsoft Azure Stream Analytics
B. Microsoft Azure Notification Hubs
C. Microsoft Azure Cognitive Services
D. Microsoft Azure HDInsight HBase cluster
Answer: D
Explanation:
HDInsight HBase is offered as a managed cluster that is integrated into the Azure environment. The clusters are configured to store data directly in Azure Storage or Azure Data Lake Store, which provides low latency and increased elasticity in performance and cost choices. This enables customers to build interactive websites that work with large datasets, to build services that store sensor and telemetry data from millions of end points, and to analyze this data with Hadoop jobs. HBase and Hadoop are good starting points for a big data project in Azure; in particular, they can enable real-time applications to work with large datasets.
Reference: https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hbase-overview https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-introduction
Question 8
Note: The question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
Your company has multiple databases that contain millions of sales transactions.
You plan to implement a data mining solution to identify purchasing fraud.
You need to design a solution that mines 10 terabytes (TB) of sales data. The solution must meet the following requirements:
- Run the analysis to identify fraud once per week.
- Continue to receive new sales transactions while the analysis runs.
- Be able to stop computing services when the analysis is NOT running.
Solution: You create a Microsoft Azure Data Lake job.
Does this meet the goal?
Options:
A. Yes
B. No
Answer: B
Question 9
You have a Microsoft Azure subscription that contains an Azure Data Factory pipeline. You have an RSS feed that is published on a public website.
You need to configure the RSS feed as a data source for the pipeline.
Which type of linked service should you use?
Options:
A. web
B. OData
C. Azure Search
D. Azure Data Lake Store
Answer: A
Explanation:
Reference: https://docs.microsoft.com/en-us/azure/data-factory/data-factory-web-table-connector
Question 10
You have an Apache Storm cluster.
The cluster will ingest data from a Microsoft Azure event hub.
The event hub has the characteristics described in the following table.
-
You are designing the Storm application topology.
You need to ingest data from all of the partitions. The solution must maximize the throughput of the data ingestion.
Which setting should you use?
Options:
A. Partition Count
B. Message Retention
C. Partition Key
D. Shared access policies
Answer: A
Explanation:
Reference: https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-storm-develop-java-event-hub-topology
Would you like to see more? Don't miss our 70-475 PDF file at:
https://www.certification-questions.com/microsoft-pdf/70-475-pdf.html