November 2016 - Present
• Data Lake design and implementation based on the Lambda Architecture: built the batch and streaming layers using HDFS and Cassandra as the data store layer and the Spark framework as the processing layer.
• Used Cloudera Manager for provisioning, managing, and monitoring the Hadoop cluster.
• Data pipelines: designed and developed complex data pipelines and maintained data quality to support a rapidly growing business.
• Created data pipelines and ETL workflows: ingested data from several sources (Nutch crawler, RDBMS, log files, and REST services), enriched it with metadata via web service calls, transformed it to the appropriate format, and cleaned bad records in a chain of data services using Apache Kafka and Apache NiFi.
• Installation, configuration, and performance tuning of big data tools: the Hadoop stack (HDFS, YARN, Hive, Spark), Apache Kafka, Apache NiFi, Apache Cassandra, Apache Solr, and Apache ZooKeeper.
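The "clean bad records and transform" step described in the ETL bullet above could be sketched as follows. This is a minimal illustrative sketch, not the project's actual code; the field names (`url`, `title`) and validity rules are hypothetical assumptions.

```python
# Illustrative sketch of a clean-and-transform ETL step.
# Field names and validity rules are hypothetical assumptions.

def is_valid(record):
    """Treat a record as 'bad' if required fields are missing or empty."""
    return bool(record.get("url")) and bool(record.get("title"))

def transform(record):
    """Normalize a raw record into the target format."""
    return {
        "url": record["url"].strip(),
        "title": record["title"].strip().lower(),
    }

def clean_and_transform(records):
    """Drop bad records, then transform the remainder."""
    return [transform(r) for r in records if is_valid(r)]

raw = [
    {"url": " http://example.com ", "title": " Home "},
    {"url": "", "title": "missing url"},  # bad record: dropped
]
print(clean_and_transform(raw))
# → [{'url': 'http://example.com', 'title': 'home'}]
```

In the pipeline described above, a stage like this would sit between the Kafka/NiFi ingestion step and the data store layer.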