
How to use Apache NiFi to migrate Object Storage data

Reviewed on 10 May 2021 • Published on 26 September 2019
  • Object-Storage
  • Apache
  • NiFi
  • data-migration
  • Terraform

Apache NiFi - Overview

Apache NiFi is a system to process and distribute data between different machines. It can transfer data between various source and destination systems, managing each transfer along the way. The tool provides a web interface to facilitate the design, management, and control of data transfers.

We are going to use Terraform to provision the machines and then configure Apache NiFi to do the data workflows we want.


Configuring Terraform

You should install Terraform and export several variables that are necessary to run Terraform.

  1. Download the terraform binary from the official page. Once downloaded, copy it into your binary path.

  2. Set the following variables to be able to start an instance from Terraform.

    export SCW_DEFAULT_ORGANIZATION_ID=<ORG_ID>
    export SCW_ACCESS_KEY=<ACCESS_KEY>
    export SCW_SECRET_KEY=<SECRET_KEY>
    export SCALEWAY_REGION=fr-par

    Replace <ORG_ID>, <ACCESS_KEY> and <SECRET_KEY> with the credentials of your API key.

Running Terraform

  1. Open a text editor and edit the following plan to your needs. A plan is a file with a *.tf extension. For example:

    provider "scaleway" {
      zone = "fr-par-1"
    }

    resource "scaleway_instance_ip" "nifi" {
      server_id = scaleway_instance_server.nifi.id
    }

    resource "scaleway_instance_security_group" "nifi" {
      name = "sg-nifi"

      inbound_default_policy  = "drop"
      outbound_default_policy = "accept"

      inbound_rule {
        action   = "accept"
        port     = "22"
        ip_range = "<MY_PUBLIC_IP>/32"
      }

      inbound_rule {
        action   = "accept"
        port     = "8080"
        ip_range = "<MY_PUBLIC_IP>/32"
      }
    }

    resource "scaleway_instance_server" "nifi" {
      name  = "nifi"
      type  = "DEV1-L"
      image = "ubuntu-bionic"
      tags  = [ "nifi" ]

      security_group_id = scaleway_instance_security_group.nifi.id

      connection {
        host = self.public_ip
        user = "root"
      }

      provisioner "remote-exec" {
        inline = [
          "apt-get update",
          "DEBIAN_FRONTEND=noninteractive apt-get install -yq python openjdk-8-jre-headless",
          "mkdir -p /opt/",
          "wget -O /opt/nifi.tar.gz <APACHE_MIRROR>",
          "tar xzf /opt/nifi.tar.gz -C /opt/",
          "/opt/nifi-1.9.2/bin/nifi.sh start",
        ]
      }
    }

    Replace <MY_PUBLIC_IP> with your public IP address, and <APACHE_MIRROR> with the URL of a NiFi archive on one of the Apache download mirrors.

  2. Then run the following commands:

    terraform init
    terraform apply

    It will generate a large output. Check whether any errors are displayed. If everything went well, proceed to the next step.

Configuring Apache NiFi

  1. Open the interface from your browser. First retrieve the public IP of your instance, either from the web console or by running the command terraform show | grep address. Then point your browser at the following URL: http://<INSTANCE_PUBLIC_IP>:8080/nifi/ (Important: NiFi may need some time to start. If the page does not load, wait a few minutes before you retry). You will be greeted by NiFi’s interface.

  2. Once the interface displays, proceed to the next step: creating data flows using processors.
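Since NiFi can take a few minutes to start, you can poll the UI instead of refreshing the page by hand. Below is a minimal stdlib-only Python sketch; the function names and the example IP are illustrative, not part of the tutorial.

```python
# Sketch: wait for the NiFi web UI to come up before opening it.
# Uses only the Python standard library; the IP below is an example.
import time
import urllib.error
import urllib.request


def nifi_url(public_ip: str, port: int = 8080) -> str:
    """Build the NiFi UI URL from the instance's public IP."""
    return f"http://{public_ip}:{port}/nifi/"


def wait_for_nifi(url: str, retries: int = 30, delay: float = 10.0) -> bool:
    """Poll the URL until it answers with HTTP 200, or give up."""
    for _ in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # NiFi not up yet; retry after a short delay
        time.sleep(delay)
    return False


url = nifi_url("203.0.113.10")  # replace with your instance's public IP
print(url)  # → http://203.0.113.10:8080/nifi/
```

Call wait_for_nifi(url) once the Terraform run finishes; it returns True as soon as the interface answers.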

Synchronizing one bucket to another

To synchronize one bucket to another, we have to create a workflow.

  1. Start by dragging the Processor icon onto the workspace; a configuration window will open.

  2. We first want to list the content of our bucket, so start by creating a ListS3 processor. Double-click the processor (or open its settings via the gear icon in the Operate window) and navigate to the Properties tab. We will start by creating an AWS Credentials Provider service, which lets you configure your credentials once and reuse them in every processor. Click Create new service...

  3. It will prompt you with a new window in which you can define the name.

  4. Click the small arrow at the right of the line to open the configuration window. This will send you to the NiFi Flow Configuration window. There, click the gear icon at the right and enter your S3 <ACCESS_KEY> and S3 <SECRET_KEY> in the corresponding credential fields.


    Replace <ACCESS_KEY> and <SECRET_KEY> with the credentials of your API key.

  5. Once the credentials are entered, return to the ListS3 processor settings, where you have to configure three properties:

    • The endpoint URL (has to be set in each Processor)
    • The source or destination bucket name (has to be set in each Processor)
    • The listing method (only in S3 List Processor)
  6. Once these three options are set, create a FetchS3Object, configure it the same way, then create a PutS3Object and set the destination bucket name.

  7. Then connect all Processors using arrows and create loops for failures.

It should look the following way:

Then enable each Processor. If there are no errors, the flow will start synchronizing the source bucket to the destination bucket.
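Conceptually, the three processors form a simple pipeline: list the keys in the source bucket, fetch each object, and put it into the destination. The sketch below expresses that pipeline in plain Python with in-memory dicts standing in for buckets (no SDK involved); the function names are illustrative, not NiFi or S3 API calls.

```python
# Conceptual sketch of the ListS3 -> FetchS3Object -> PutS3Object flow,
# with dicts standing in for real buckets.
def list_objects(bucket):
    """List the keys in a bucket (the ListS3 step)."""
    return list(bucket.keys())


def fetch_object(bucket, key):
    """Retrieve one object's data (the FetchS3Object step)."""
    return bucket[key]


def put_object(bucket, key, data):
    """Write one object to the destination (the PutS3Object step)."""
    bucket[key] = data


def sync(source, destination):
    """Copy every object listed in source into destination."""
    for key in list_objects(source):
        data = fetch_object(source, key)
        put_object(destination, key, data)


source = {"a.txt": b"hello", "b.txt": b"world"}
dest = {}
sync(source, dest)
print(sorted(dest))  # → ['a.txt', 'b.txt']
```

The failure loops you draw in NiFi correspond to retrying a failed fetch or put instead of letting the whole flow stop.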

Combining several data sources

You can combine different fetch sources to import or migrate your data to Scaleway Elements Object Storage.

Here is an example of a multi fetch with one upload:
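As a rough sketch of what such a flow does, the plain-Python example below (hypothetical function names, dicts in place of real buckets) merges several fetch sources into a single upload target, mirroring multiple ListS3/FetchS3Object pairs feeding one PutS3Object processor.

```python
# Conceptual sketch: several fetch sources feeding one upload target.
def merge_sources(sources, destination):
    """Copy objects from every source bucket into one destination bucket.

    Later sources win on key collisions, as repeated puts of the
    same key would.
    """
    for bucket in sources:           # one ListS3 + FetchS3Object per source
        for key, data in bucket.items():
            destination[key] = data  # single PutS3Object target


logs = {"2019/app.log": b"log data"}
backups = {"db/dump.sql": b"dump data"}
dest = {}
merge_sources([logs, backups], dest)
print(sorted(dest))  # → ['2019/app.log', 'db/dump.sql']
```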