Configuring a Cloudera CDH cluster on Ubuntu Bionic
CDH (Cloudera’s Distribution including Apache Hadoop) is an open source platform distribution including Apache Hadoop, Apache Spark, Apache Impala, Apache Kudu, Apache HBase, and many more. The software is maintained by Cloudera and is available both as a free community edition and as an enterprise edition that offers advanced features.
CDH features all the leading components to store, process, discover, model, and serve unlimited data and is built entirely on open standards.
This tutorial is designed for Bare Metal Cloud Servers, but Cloudera can also be installed on Scaleway Dedibox servers or Virtual Instances. For more information about cluster sizing, you can read the Hardware requirements article.
- You have an account and are logged into the Scaleway Console
- You have created at least two Bare Metal servers running Ubuntu Bionic Beaver to form a cluster. For optimal performance, it is recommended to use GP-BM1-M Bare Metal Cloud Servers for smaller projects, or HM-BM1-M Bare Metal Cloud Servers for advanced projects that require large amounts of RAM.
Cloudera requires an SSH key in the OpenSSL format.
Generate an RSA key for the communication between Cloudera Manager and the nodes:
openssl genrsa -out key.pem 4096
Restrict the read and write permissions to the owner of the file only:
chmod a-rw key.pem
chmod u+rw key.pem
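The two chmod calls above leave the key readable and writable by its owner only. A minimal sketch of the same restriction with a single chmod 600, followed by a check of the resulting mode. It assumes GNU coreutils (stat -c), as shipped with Ubuntu; the function name restrict_key is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: restrict a private key to its owner and verify the resulting mode.
# For a file without execute bits, "chmod 600" gives the same result as
# "chmod a-rw" followed by "chmod u+rw": read/write for the owner only.
# The function name restrict_key is hypothetical.
restrict_key() {
  local key_file="$1"
  local mode
  chmod 600 "$key_file"
  # GNU coreutils: %a prints the permission bits in octal, e.g. 600.
  mode="$(stat -c '%a' "$key_file")"
  if [ "$mode" != "600" ]; then
    echo "unexpected mode $mode on $key_file" >&2
    return 1
  fi
  echo "$key_file: mode $mode"
}
```

For example, restrict_key key.pem prints key.pem: mode 600 when the permissions are correct.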
Extract the public key:
ssh-keygen -y -f key.pem > public.pub
Display the public key and upload it to your Scaleway management console:
cat public.pub
By downloading and installing CDH, you agree to the Cloudera Standard License Terms and Conditions.
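Log into the master machine via SSH.
Download the Cloudera repository key archive.key for Ubuntu Bionic Beaver and add it to apt, to be able to use the Cloudera repository:
wget https://archive.cloudera.com/cm6/6.3.1/ubuntu1804/apt/archive.key
apt-key add archive.key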
Add the Cloudera repository information to the apt sources:
add-apt-repository "deb [arch=amd64] http://archive.cloudera.com/cm6/6.3.1/ubuntu1804/apt bionic-cm6.3.1 contrib"
Important:
If the command above fails, install the package software-properties-common with the following command:
apt install software-properties-common
Update the apt repository data:
apt update
Install the Oracle JDK:
apt install oracle-j2sdk1.8
Important:
By installing the Oracle JDK, you agree to the Oracle Binary Code License Agreement. If you do not want to agree to the Oracle Binary Code License Agreement, you can instead install OpenJDK, an open-source implementation of the Java Platform, manually on all hosts in your cluster:
apt install openjdk-8-jdk
Install Cloudera Manager via the apt package manager:
apt install cloudera-manager-daemons cloudera-manager-agent cloudera-manager-server
Execute the following steps on the master machine.
Install the MariaDB database server and the MySQL JDBC connector:
apt install mariadb-server libmysql-java
Run the secure installation wizard to secure the database root account and to remove unused content:
mysql_secure_installation
Connect to the MySQL shell as root:
mysql -u root -p
Create the MySQL databases required for Cloudera. Replace <password> with a secure password of your choice:
# Cloudera Manager Server
CREATE DATABASE scm DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON scm.* TO 'scm'@'%' IDENTIFIED BY '<password>';
# Activity Monitor
CREATE DATABASE amon DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON amon.* TO 'amon'@'%' IDENTIFIED BY '<password>';
# Reports Manager
CREATE DATABASE rman DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON rman.* TO 'rman'@'%' IDENTIFIED BY '<password>';
# Hue
CREATE DATABASE hue DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON hue.* TO 'hue'@'%' IDENTIFIED BY '<password>';
# Hive Metastore Server
CREATE DATABASE metastore DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON metastore.* TO 'hive'@'%' IDENTIFIED BY '<password>';
# Sentry Server
CREATE DATABASE sentry DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON sentry.* TO 'sentry'@'%' IDENTIFIED BY '<password>';
# Cloudera Navigator Audit Server
CREATE DATABASE nav DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON nav.* TO 'nav'@'%' IDENTIFIED BY '<password>';
# Cloudera Navigator Metadata Server
CREATE DATABASE navms DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON navms.* TO 'navms'@'%' IDENTIFIED BY '<password>';
# Oozie Server
CREATE DATABASE oozie DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON oozie.* TO 'oozie'@'%' IDENTIFIED BY '<password>';
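Typing nine pairs of statements by hand is error-prone. A minimal sketch that generates the same CREATE DATABASE and GRANT statements from a list of database:user pairs taken from the list above. The function name cloudera_db_sql is hypothetical, and the sketch assumes the password contains no single quote, which would break the generated SQL.

```shell
#!/usr/bin/env bash
# Sketch: print the CREATE DATABASE / GRANT statements for every
# database:user pair used by Cloudera (pairs copied from the list above).
# The function name cloudera_db_sql is hypothetical.
# Assumption: the password contains no single quote.
cloudera_db_sql() {
  local password="$1"
  local pair db user
  for pair in scm:scm amon:amon rman:rman hue:hue metastore:hive \
              sentry:sentry nav:nav navms:navms oozie:oozie; do
    db="${pair%%:*}"    # part before the colon: database name
    user="${pair##*:}"  # part after the colon: user name
    printf "CREATE DATABASE %s DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;\n" "$db"
    # %% prints a literal % (the host wildcard in 'user'@'%').
    printf "GRANT ALL ON %s.* TO '%s'@'%%' IDENTIFIED BY '%s';\n" "$db" "$user" "$password"
  done
}
```

From the system shell (not the MySQL shell), the output can be piped into MariaDB, for example cloudera_db_sql '<password>' | mysql -u root -p. Note that the password then ends up in your shell history.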
Verify that all databases are available:
SHOW DATABASES;
The output looks like the following example:
+--------------------+
| Database           |
+--------------------+
| amon               |
| hue                |
| information_schema |
| metastore          |
| mysql              |
| nav                |
| navms              |
| oozie              |
| performance_schema |
| rman               |
| scm                |
| sentry             |
+--------------------+
12 rows in set (0.00 sec)
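Instead of comparing the table by eye, a sketch that checks the databases from the system shell. It reads one database name per line on standard input, so it can be fed by mysql -N -e 'SHOW DATABASES' (-N suppresses the column header). The function name check_cloudera_dbs is hypothetical; the required names are the ones created above.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if every database Cloudera needs is present.
# Input: one database name per line on stdin.
# The function name check_cloudera_dbs is hypothetical.
check_cloudera_dbs() {
  local available db missing=0
  available="$(cat)"
  for db in scm amon rman hue metastore sentry nav navms oozie; do
    # -x: whole-line match, -F: fixed string (no regex interpretation)
    if ! grep -qxF "$db" <<<"$available"; then
      echo "missing database: $db" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: mysql -u root -p -N -e 'SHOW DATABASES' | check_cloudera_dbs && echo "all databases present"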
Quit the MySQL shell once all databases are configured:
exit
Open the MariaDB configuration file /etc/mysql/mariadb.conf.d/50-server.cnf in a text editor, for example nano:
nano /etc/mysql/mariadb.conf.d/50-server.cnf
Comment out the bind-address line by putting a # in front of it to enable remote connections to the MariaDB server:
Important:
Once this line is commented out, the database server is reachable from any machine connected to the Internet. It is recommended to restrict access to the IP addresses of the cluster nodes by setting up a firewall on the machine.
# Instead of skip-networking the default is now to listen only on
# localhost which is more compatible and is not less secure.
#bind-address = 127.0.0.1
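Alternatively, the same change can be made without opening an editor. A minimal sketch using GNU sed, assuming the bind-address line starts at the beginning of the line (optionally indented), as in the default file; a .bak backup is kept next to the file. The function name comment_out_bind_address is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: prefix the bind-address line with "#" in the MariaDB config.
# Lines that are already commented out do not match, so running it twice
# is harmless. The function name comment_out_bind_address is hypothetical.
comment_out_bind_address() {
  local cnf="${1:-/etc/mysql/mariadb.conf.d/50-server.cnf}"
  # -i.bak: edit in place, keep the original as "$cnf.bak"
  # \1 keeps any leading indentation, \2 is the "bind-address" keyword
  sed -i.bak -E 's/^([[:space:]]*)(bind-address)/\1#\2/' "$cnf"
}
```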
Save the file, exit the text editor and restart MariaDB:
service mysql restart
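To confirm that MariaDB accepts remote connections after the restart, a sketch that inspects the listening sockets reported by ss -ltn (part of iproute2, installed by default on Ubuntu) and succeeds if port 3306, the MariaDB default, is bound to an address other than loopback. The function name mariadb_listens_remotely is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read "ss -ltn" output on stdin and succeed if some listener on
# port 3306 is bound to a non-loopback address. Column 4 of ss -ltn is the
# local address:port; the header line never matches the pattern.
# The function name mariadb_listens_remotely is hypothetical.
mariadb_listens_remotely() {
  awk '$4 ~ /:3306$/ && $4 !~ /^(127\.0\.0\.1|\[::1\]|::1):/ { found = 1 }
       END { exit (found ? 0 : 1) }'
}
```

For example: ss -ltn | mariadb_listens_remotely && echo "MariaDB accepts remote connections"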
After installing and configuring the database server on the master machine, continue by setting up the Cloudera Manager database. Cloudera Manager comes with a script that can automatically:
- Create the Cloudera Manager Server database configuration file.
- Configure a database for Cloudera Manager Server to use.
- Create and configure a user account for Cloudera Manager Server.
Run the script to initialize the database. The command requires that both a database and a user named scm were created in the previous step.
/opt/cloudera/cm/schema/scm_prepare_database.sh mysql scm scm
When prompted, enter the password you set for the scm user:
Enter SCM password:
Output similar to the following is displayed:
Verifying that we can write to /etc/cloudera-scm-server
Creating SCM configuration file in /etc/cloudera-scm-server
Executing: /usr/lib/jvm/java-1.8.0-openjdk-amd64/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/java/postgresql-connector-java.jar:/opt/cloudera/cm/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.
2019-07-25 16:47:58,307 [main] INFO com.cloudera.enterprise.dbutil.DbCommandExecutor - Successfully connected to database.
All done, your SCM database is configured correctly!
Start the Cloudera SCM Server:
systemctl start cloudera-scm-server
Important:
Starting the Cloudera SCM Server may take several minutes. You can follow the startup process with the following command:
tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log
The SCM server is ready when the following line appears:
INFO WebServerImpl:com.cloudera.server.cmf.WebServerImpl: Started Jetty server.
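When scripting the installation, following the log with tail -f is not practical. A minimal sketch that polls the log until the "Started Jetty server" line appears or a timeout expires. The function name wait_for_scm and the default timeout of 600 seconds are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: wait until the SCM server log reports that Jetty has started.
# Arguments: log file (default: the path used above), timeout in seconds.
# Caveat: a "Started Jetty server" line from an earlier start in the same
# log file also satisfies the check.
# The function name wait_for_scm and the 600 s default are assumptions.
wait_for_scm() {
  local log="${1:-/var/log/cloudera-scm-server/cloudera-scm-server.log}"
  local timeout="${2:-600}"
  local waited=0
  until grep -q 'Started Jetty server' "$log" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "SCM server not ready after ${timeout}s" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "SCM server is ready"
}
```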
Continue the installation of CDH and other required software on the master host.
Open a web browser and go to http://<cloudera_server_host>:7180, where <cloudera_server_host> stands for the IP address of the Cloudera server.
The login screen is displayed.
Log in with the default credentials:
- Username: admin
- Password: admin
The welcome screen is displayed, showing some basic information about Cloudera Manager. Click Continue to go to the next step of the installation process.
The End User License Terms and Conditions are displayed. Read the document carefully, then check the box Yes, I accept the End User License Terms and Conditions and click Continue to proceed with the installation.
Select the license type:
- Cloudera Express: This version does not require a license, but provides a limited set of features.
- Cloudera Enterprise Trial: This version does not require a license, but expires after 60 days and cannot be renewed.
- Cloudera Enterprise: This version requires a license key.
This tutorial covers the Express version of Cloudera. You can upgrade the license to another version at any time if required.
Click Cloudera Express, then Continue to go to the next step of the installation process and add a first cluster.
The following steps have to be run on the master machine:
The Add Cluster welcome screen displays, explaining the steps necessary to configure a new cluster in Cloudera Manager. Click Continue to go to the next step.
Enter the Cluster Name, then click Continue.
Specify the Hostnames of the machines forming the cluster. Set the SSH Port (Default: Port 22) and click Search. The machines will be displayed in a list below the form. Click Continue to move forward to the next step.
Select the Repository to be used for the installation of Cloudera Manager Agent, CDH and other software. Select the Public Cloudera Repository and choose the installation via Parcels. Parcels are a binary distribution format containing the program files, along with additional metadata used by Cloudera Manager. Click Continue to validate the form and to go to the next step.
If you want Cloudera Manager to install the Oracle Java SE Development Kit (JDK 8), read the license terms and tick the box to agree. Leave the box unchecked to keep using the manually installed OpenJDK on the cluster nodes. Click Continue to move forward to the next step.
Configure the login credentials. Root access to all cluster hosts is required to install the Cloudera packages. Configure the root login by uploading the private key file, entering the passphrase (if any) and the SSH port (default value: 22).
The automatic configuration of the machines in the cluster is launched. Once done, click Continue to proceed to the next step.
In this step, CDH is downloaded and deployed on all machines in the cluster.
Click Continue once the step is complete.
Continue to run the following steps on the master machine:
Choose one of the predefined service configurations, or select Custom to configure the services to your particular needs. Click Essentials, then Continue to set up a basic cluster.
Cloudera Manager automatically assigns the roles to the different machines in the cluster according to their performance. You may modify the distribution of the roles, but incorrect assignments can degrade the performance of the whole cluster. Once you have validated the distribution of the roles, click Continue to move forward to the next step.
Configure the databases required for the different services. The database server is already pre-filled. Enter the passwords for the different databases as set previously. Test the connection to the database by clicking on Test connection:
Once the connection test is finished, click Continue to move to the next step.
Review the changes. The default values should be suitable unless you want to run a specific configuration. Click Continue to execute the First Run command.
The First Run command runs a set of services for the first time to check that everything is working. This may take some time. Once the tests have been completed, click Continue to go to the next step.
Once the First Run has completed, a message confirming the installation of the cluster appears. Click Finish to leave the installation wizard.
The Cloudera Manager Dashboard appears, providing an overview of the cluster's status.
You now have a working Cloudera CDH cluster. You may refer to the official documentation for more information or continue with the tutorial Getting Started with Hadoop to learn how to use Cloudera CDH with Apache Hadoop.