Month: June 2016

Data Preparation, to the Moon and Beyond

According to a University of Southern California study, less than a decade ago the total digital information held on storage devices worldwide reached 300 exabytes. To picture that, imagine that it would require over 400 billion CD-ROMs. If you were to build a stack with them, it would reach farther than the distance from […]

The Evolution of ETL and Continuous Integration

In the beginning of ETL… When I started my IT career over 15 years ago, I was nothing more than a “Fresh-out” with a college degree and an interest in computers and programming. At that time, I knew the theories behind the Software Development Life Cycle (SDLC) and had put them to some practice in […]

How to Aggregate Clickstream Data with Apache Spark

As part of a POC of the Big Data capabilities of Talend v6.1, I was asked by one of our long-time customers, a major e-commerce company, to present a solution for aggregating huge files of clickstream data on Hadoop. The input data was a giant clickstream file (larger than 100 GB, or even terabytes in size) from a website. Our […]