Configuring a Cloudera CDH cluster on Ubuntu Bionic

Cloudera CDH Overview

CDH (Cloudera’s Distribution including Apache Hadoop) is an open source platform distribution including Apache Hadoop, Apache Spark, Apache Impala, Apache Kudu, Apache HBase, and many more. The software is maintained by the company Cloudera and is available both in a free community edition and in an enterprise edition that proposes advanced features.

CDH features all the leading components to store, process, discover, model, and serve unlimited data and is built entirely on open standards.

This tutorial is designed for Bare Metal Cloud Servers, but Cloudera can also be installed on Scaleway Dedibox servers or Virtual Instances. For more information about cluster sizing, you can read the Hardware requirements article.

Requirements

  • You have an account and are logged into the Scaleway Elements console
  • You have at least two Bare Metal Cloud Servers running on Ubuntu Bionic Beaver to form a cluster
  • For optimal performances it is recommended to use GP-BM1-M Bare Metal Cloud Servers for smaller projets or HM-BM1-M Bare Metal Cloud Servers for advanced projets that require large amounts of RAM.

Creating an SSH key compatible with Cloudera

Cloudera requires an SSH key in the OpenSSL format.

1 . Generate a RSA key for the communication from the Cloudera Manager with the nodes:

openssl genrsa -out key.pem 4096

2 . Set the read and write permissions to the user only:

chmod a-rw key.pem
chmod u+rw key.pem

3 . Extract the public key:

ssh-keygen -y -f key.pem > public.pub

4 . Display the public key and upload it to your Scaleway management console:

cat public.pub

Installing Cloudera Manager

Important: By downloading and installing CDH, you agree to the Cloudera Standard License Terms and Conditions.

1 . Log into the Master machine via SSH:

ssh root@bare-metal-server-ip

2 . Download the archive.key for Ubuntu Bionic Beaver and add it to apt, to be able to use the Cloudera repository.

wget https://archive.cloudera.com/cm6/6.3.1/ubuntu1804/apt/archive.key
apt-key add archive.key

3 . Add the Cloudera repository information to the apt packet manager:

add-apt-repository "deb [arch=amd64] http://archive.cloudera.com/cm6/6.3.1/ubuntu1804/apt bionic-cm6.3.1 contrib"

Important: If the command above fails, install the package software-properties-common with the following command: apt install software-properties-common

4 . Update the apt repository data:

apt update

5 . Install the Oracle JDK:

apt install oracle-j2sdk1.8

Important: By installing the Oracle JDK, you have to agree to the Oracle Binary Code License Agreement.
If you don’t want to agree to the Oracle Binary Code License Agreement, you can install OpenJDK, an open-source implementation of the Java Platform, manually on all hosts in your cluster:
apt install openjdk-8-jdk

6 . Install Cloudera Manager via the apt packet manager:

apt install cloudera-manager-daemons cloudera-manager-agent cloudera-manager-server

Installing and Configuring a Database for Cloudera Manager

Execute the following steps on the Master machine.

1 . Install MariaDB, an open-source fork of the MySQL database engine:

apt install mariadb-server libmysql-java

2 . Run the secure installation wizzard to secure the database root account and to remove unused contents:

mysql_secure_installation

3 . Connect to the MySQL shell as root:

mysql -u root -p

4 . Create the MySQL databases required for Cloudera. Replace <password> with a secure password of your choice:

# Cloudera Manager Server
CREATE DATABASE scm DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON scm.* TO 'scm'@'%' IDENTIFIED BY '<password>';

# Activity Monitor
CREATE DATABASE amon DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON amon.* TO 'amon'@'%' IDENTIFIED BY '<password>';

# Reports Manager
CREATE DATABASE rman DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON rman.* TO 'rman'@'%' IDENTIFIED BY '<password>';

# Hue
CREATE DATABASE hue DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON hue.* TO 'hue'@'%' IDENTIFIED BY '<password>';

# Hive Metastore Server
CREATE DATABASE metastore DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON metastore.* TO 'hive'@'%' IDENTIFIED BY '<password>';

# Sentry Server
CREATE DATABASE sentry DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON sentry.* TO 'sentry'@'%' IDENTIFIED BY '<password>';

# Cloudera Navigator Audit Server
CREATE DATABASE nav DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON nav.* TO 'nav'@'%' IDENTIFIED BY '<password>';

# Cloudera Navigator Metadata Server
CREATE DATABASE navms DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON navms.* TO 'navms'@'%' IDENTIFIED BY '<password>';

# Oozie
CREATE DATABASE oozie DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON oozie.* TO 'oozie'@'%' IDENTIFIED BY '<password>';

5 . Verify that all databases are available:

SHOW DATABASES;

The output looks like the following example:

+--------------------+
| Database           |
+--------------------+
| amon               |
| hue                |
| information_schema |
| metastore          |
| mysql              |
| nav                |
| navms              |
| oozie              |
| performance_schema |
| rman               |
| scm                |
| sentry             |
+--------------------+
12 rows in set (0.00 sec)

6 . Quit the MySQL shell once all tables are configured:

quit;

7 . Open the MySQL configuration file /etc/mysql/mariadb.conf.d/50-server.cnf in a text editor:

nano /etc/mysql/mariadb.conf.d/50-server.cnf

8 . Comment-out the line bind-address by putting a # in front of it to enable remote connections to the MariaDB server:

Important: Once uncommented the database server will be reachable from any machine connected to the Internet. It is recommended to limit the access to the IPs of the cluster by setting up a firewall on the machine.

# Instead of skip-networking the default is now to listen only on
# localhost which is more compatible and is not less secure.
#bind-address           = 127.0.0.1

9 . Save the file, exit the text editor and restart MariaDB:

service mysql restart

Setting up the Cloudera Manager Database

After installing and configuring the database server on the master machine, continue by setting up the Cloudera Manager Database. The application comes with a script that can automatically:

  • Create the Cloudera Manager Server database configuration file.
  • Configure a database for Cloudera Manager Server to use.
  • Create and configure a user account for Cloudera Manager Server.

1 . Run the script to initalize the database. The command requires that both, a database and a user called scm are created in the previous step.

/opt/cloudera/cm/schema/scm_prepare_database.sh mysql scm scm

2 . When prompted enter the password for the user scm:

Enter SCM password:

3 . An output like the following should display on the screen:

JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
Verifying that we can write to /etc/cloudera-scm-server
Creating SCM configuration file in /etc/cloudera-scm-server
Executing:  /usr/lib/jvm/java-1.8.0-openjdk-amd64/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/java/postgresql-connector-java.jar:/opt/cloudera/cm/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.
2019-07-25 16:47:58,307 [main] INFO  com.cloudera.enterprise.dbutil.DbCommandExecutor  - Successfully connected to database.
All done, your SCM database is configured correctly!

4 . Start the Cloudera SCM Server:

systemctl start cloudera-scm-server

Important: Starting the Cloudera SCM Server may take some minutes. You can follow the startup process with the following command:
tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log
The SCM server is ready when the following line appears:
INFO WebServerImpl:com.cloudera.server.cmf.WebServerImpl: Started Jetty server.

Installing CDH and Other Software

Continue the installation of CDH and other required software on the master host.

1 . Open a web browser and go to http://<cloudera_server_host>:7180, where <cloudera_server_host> stands for the IP address of the Cloudera server.

2 . The login screen displays:

Use these credentials to login:

  • Username: admin
  • Password: admin

3 . The Welcome Screen displays, showing some basic information about Cloudera Manager. Click Continue to go to the next step of the installation progress.

4 . The End User License Terms and Conditions displays. Read the document carefully. Once read, check the box Yes, I accept the End User License Terms and Conditions. and click on Continue to proceed with the installation.

5 . Select the license type:

  • Cloudera Express: This version does not require a license, but provides a limited set of features.
  • Cloudera Enterprise Trial: This version does not require a license, but expires after 60 days and cannot be renewed.
  • Cloudera Enterprise: This version requires a license key.

This tutorial is about the Express version of Cloudera. You can upgrade the license to another version at any time if required.

Click on Cloudera Express, then on Continue to go to the next step of the installation process and add a first cluster.

Adding a cluster in Cloudera Manager

The following steps have to be run on the master machine:

1 . The Add Cluster welcome screen displays, explaining the steps necessary to configure a new cluster in Cloudera Manager. Click on Continue to go to the next step.

2 . Enter the Cluster Name, then click on Continue.

Adding Servers in a Cluster

1 . Specify the Hostnames of the machines forming the cluster. Set the SSH Port (Default: Port 22) and click on Search. The machines will be displayed in a list below the form. Click on Continue to move forward to the next step.

2 . Select the Repository to be used for the installation of Cloudera Manager Agent, CDH and other software. Select the Public Cloudera Repository and choose the installation via Parcels. Parcels are a binary distribution format containing the program files, along with additional metadata used by Cloudera Manager. Click on Continue to validate the form and to go to the next step.

3 . In case the Oracle Java SE Development Kit (JDK 8) will be installed, read the license terms and tick the box to agree. Leave the box unchecked to continue using a manually installed OpenJDK on the cluster nodes. Click on Continue to move forward to the next step.

4 . Configure the Login credentials. Root access to all cluster hosts is required to install the Cloudera packages. Configure the root login by uploading the private key file, entering the passphrase (if any) and the SSH port (Default value: 22).

5 . The automatic configuration of the machines in the cluster is launched. Once done, click on Continue to proceed to the next step.

6 . In this step CDH is being downloaded and deployed on all machines in the cluster:

Click Continue once the step has been done.

Configuring a Cluster in Cloudera Manager

Continue to run the following steps on the master machine:

1 . Choose one of the pre-defined service configurations, or select Custom to configure the services towards your particular needs. Click on Essentials, then on Continue to setup a basic cluster.

2 . Cloudera Manager assigns the roles automatically to the different machines in the cluster according to their performance. You may modify the distribution of the roles, but if assignments are made incorrectly it may have an impact on the performance of the whole cluster. Once you have validated the distribution of the roles, click on Continue to move forward to the next step.

3 . Configure the databases required for the different services. The database server is already pre-filled. Enter the passwords for the different databases as set previously.
Test the connection to the database by clicking on Test connection:

Once the connection test is finished, click on Continue to move to the next step.

4 . Review the changes, the default values should be fine except if you want to run a specific configuration. Click on Continue to execute the First Run command.

5 . The First Run command runs a set of services for the first time to check that everything is working. This may take some time. Once the tests have been completed, click on Continue to go to the next step.

6 . Once the First Run has completed as message confirming the installation of the cluster appears. Click Finish to leave the installation wizard:

7 . The Cloudera Manager Dashboard appears, providing an overview about the clusters status:

You now have a working Cloudera CDH cluster. You may refer to the official documentation for more information or continue with the tutorial Getting Started with Hadoop to learn how to use Cloudera CDH with Apache Hadoop.

Discover New Bare Metal Cloud servers

Deploy 100% dedicated servers that are billed by the hour and available in minutes.