Big Data analysis with Hadoop and Spark

Domenico Pontari

DATE Wednesday, 20th of March 2019

LOCATION Details will follow. Stay tuned!

The workshop aims to bring participants to understand the architecture and modules of the Hadoop framework, use the HDFS distributed file system, write and launch MapReduce jobs (Java), bulk-load data onto HDFS, and use Spark for data processing (Scala).

Combo special offer! Save up to 25% on the workshop plus a full-price Regular conference ticket (€300)!
Only 20 tickets available.


Click here to learn how to obtain these discounts.

LANGUAGE
Italian

LEVEL
Medium

DURATION
The workshop is full-day (8 hours), from 09:00 to 18:00, with a one-hour lunch break.

CHECK-IN 08:30 - 09:00

PRICES

€114 for the first 20 tickets;
€190 until the 19th of March.


Domenico is an IT entrepreneur and data scientist who loves to share his passion for solving complex problems that require creative solutions and technical expertise.
He is the founder and CEO of WiNK (http://wink.by), a software company specialized in innovative projects and ready-to-market prototypes. He is a board member of BHL (Bio Health Lab), an innovative startup and spin-off of the Campus Biomedico di Roma, which aims to develop IT solutions in the biomedical field. He is the community manager of the Apache Spark meetups in Rome.

ABSTRACT

The workshop aims to bring participants to understand the architecture and modules of the Hadoop framework, use the HDFS distributed file system, write and launch MapReduce jobs (Java), bulk-load data onto HDFS, and use Spark for data processing (Scala). Every topic is covered from a theoretical point of view and accompanied by demonstration exercises that will be developed and explained during the course.

TABLE OF CONTENTS

– Architecture of a Big Data system
– Hadoop introduction
– HDFS
– Functional programming
– Programming for distributed environments
– Map-Reduce algorithm
– Spark RDDs
– Spark actions and transformations
– Examples and exercises
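To give a flavour of the map-reduce topic above, here is a minimal sketch of the map-reduce idea (map each line into (word, 1) pairs, then reduce by summing per word) written in plain Java streams. This is an illustrative example only, not the Hadoop MapReduce API covered in the workshop; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class WordCountSketch {

    // Map phase: split each line into words ("map" each line to (word, 1) pairs).
    // Shuffle + reduce phase: group identical words and sum their counts.
    static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))  // map
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));    // reduce
    }

    public static void main(String[] args) {
        Map<String, Long> counts = count(List.of("big data big ideas", "data flows"));
        System.out.println(counts);  // e.g. {ideas=1, big=2, data=2, flows=1}
    }
}
```

In Hadoop the same two phases run distributed across the cluster, with HDFS storing the input and the framework handling the shuffle between mappers and reducers.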

TRAINING OBJECTIVES

– understand the architecture and modules of the Hadoop framework;
– use the HDFS distributed file system;
– get an introduction to functional programming;
– analyse the map-reduce algorithm and its computational complexity;
– understand Spark RDDs;
– analyse big data with Spark.

WHO IS THE WORKSHOP DEDICATED TO?

Software developers who want to explore how to manage big data using Hadoop and Spark.

PREREQUISITES

Basic knowledge of Java and Scala. Good knowledge of at least one high-level programming language.

HARDWARE AND SOFTWARE REQUIREMENTS

– Laptop
– Scala
– Apache Spark
– Cloudera CDH 5.16 virtual machine

WARNING
Seats are limited.
The workshop will be held only if the minimum number of attendees is reached.


