Puneet Varma (Editor)

Sqoop

Development status: Active
Operating system: Cross-platform
Written in: Java
Developer(s): Apache Software Foundation
Stable release: 1.4.6 / May 11, 2015
Repository: git-wip-us.apache.org/repos/asf/sqoop.git

Sqoop is a command-line application for transferring data between relational databases and Hadoop. It supports incremental loads of a single table or a free-form SQL query, as well as saved jobs that can be run repeatedly to import updates made to a database since the last import. Imports can also populate tables in Hive or HBase, and exports move data from Hadoop into a relational database. The name Sqoop is a contraction of "SQL" and "Hadoop". Sqoop became a top-level Apache project in March 2012.
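As a rough illustration of the incremental-import and saved-job workflow described above, the following Python sketch only assembles the command lines; it does not run Sqoop. The options used (`--connect`, `--table`, `--target-dir`, `--incremental`, `--check-column`, `--last-value`, and `sqoop job --create`) are standard Sqoop 1 options, while the JDBC URL, table name, column name, job name, and target directory are hypothetical placeholders, as are the two helper function names.

```python
import shlex


def build_incremental_import(jdbc_url, table, check_column, last_value, target_dir):
    """Return the argument list for an append-mode incremental Sqoop import.

    Only rows whose check_column value is greater than last_value are imported.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--incremental", "append",
        "--check-column", check_column,
        "--last-value", str(last_value),
    ]


def as_saved_job(job_name, import_argv):
    """Wrap an import command as a `sqoop job --create` definition.

    import_argv[0] is "sqoop"; the tool name and its options follow the bare "--".
    """
    return ["sqoop", "job", "--create", job_name, "--"] + import_argv[1:]


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    argv = build_incremental_import(
        "jdbc:mysql://db.example.com/sales",
        "orders",
        "id",
        0,
        "/data/sales/orders",
    )
    print(shlex.join(argv))
    print(shlex.join(as_saved_job("orders_sync", argv)))
```

A job saved this way would then be re-run with `sqoop job --exec orders_sync`; Sqoop records the last imported value for a saved incremental job, which is what lets repeated runs pick up only the new rows.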

Informatica Big Data Management has provided a Sqoop-based connector since version 10.1. It supports both Sqoop import and export, which are often used in data integration use cases on Hadoop.

Pentaho has provided open-source Sqoop-based connector steps, Sqoop Import and Sqoop Export, in its ETL suite Pentaho Data Integration since version 4.5 of the software. Microsoft uses a Sqoop-based connector to help transfer data from Microsoft SQL Server databases to Hadoop. Couchbase, Inc. also provides a Couchbase Server-Hadoop connector built on Sqoop.

In 2015, Ralph Kimball described Sqoop as follows under the heading "The Future of ETL":

Several big changes must take place in the ETL environment. First, the data feeds from original sources must support huge bandwidths, at least gigabytes per second. Learn about Sqoop loading data into Hadoop. If these words mean nothing to you, you have some reading to do! Start with Wikipedia.
