Monthly Archives: January 2018


Spark SQL 

January 31st, 2018 | SparkSQL, Technologies

Apache Spark is a cluster computing framework designed for fast computation. It is one of the most successful projects in the Apache Software Foundation. Spark SQL is a new module in Spark that integrates relational processing with [...]

Apache Hive 

January 31st, 2018 | Apache Hive, Technologies

The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage and queried using SQL syntax. Built on top of Apache Hadoop™, Hive provides the following features: Tools to enable easy [...]

Apache Mahout 

January 31st, 2018 | Apache Mahout, Technologies

Apache™ Mahout is a library of scalable machine-learning algorithms, implemented on top of Apache Hadoop® and using the MapReduce paradigm. Machine learning is a discipline of artificial intelligence focused on enabling machines to learn [...]

Apache Flink

January 31st, 2018 | Apache Flink, Technologies

Introduction to Apache Flink®: a high-level overview of Apache Flink and stream processing. Topics: Continuous Processing for Unbounded Datasets; Features: Why Flink?; Flink, the streaming model, and bounded datasets; The “What”: Flink from [...]

Apache ZooKeeper

January 31st, 2018 | Apache ZooKeeper, Technologies

Apache ZooKeeper is a distributed, open-source coordination service for distributed applications. It exposes a simple set of primitives that distributed applications can build upon to implement higher-level services for synchronization, configuration [...]

Apache Kafka

January 30th, 2018 | Apache Kafka, Technologies

We think of a streaming platform as having three key capabilities: It lets you publish and subscribe to streams of records. In this respect it is similar to a message queue or enterprise messaging [...]

Apache Flume

January 30th, 2018 | Apache Flume, Technologies

Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store. The use of Apache Flume [...]

Apache Spark 

January 30th, 2018 | Apache Spark, Technologies

Apache Spark is a fast and general engine for large-scale data processing. Speed: Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. Apache Spark has an advanced DAG [...]

Apache Pig

January 30th, 2018 | Apache Pig, Technologies

Apache Pig is a high-level language platform developed to execute queries on huge datasets that are stored in HDFS using Apache Hadoop. It is similar to the SQL query language but applied to a larger [...]
