Talend Data Mapper Advanced – Spark

Talend Data Mapper (TDM) offers multiple specialized components that allow you to process hierarchical files at the speed of Spark.

This course covers how to create Big Data batch or streaming Jobs and how to invoke TDM maps from these Jobs. At the end of the course, you will know how to transform hierarchical files and streams of hierarchical records.

Duration: 1 day (7 hours)
Target audience: Java developers and software architects
Prerequisites: Completion of Data Integration Basics and TDM Essentials

Course objectives

After completing this course, you will be able to:

  • Invoke TDM maps in Big Data batch and streaming Jobs
  • Understand the basics of Spark, Spark Streaming, and Kafka
  • Use TDM components dedicated to Big Data Jobs
  • Enable multiple outputs on TDM Big Data components

Course agenda

Spark in context

  • Concepts

Connection to the Hadoop cluster

  • Opening a training project
  • Monitoring the Hadoop cluster
  • Creating cluster metadata

TDM on Spark in context

  • Concepts

Converting files

  • Converting file formats

Transforming files

  • Transforming files – single output
  • Transforming files – multiple outputs

Processing files

  • Processing hierarchical data in files

Processing streams

  • Understanding the basics of Kafka
  • Processing streams of records