HDFS Egress Connector

Version: 17.07

Supported Since: 17.07

What is the HDFS Egress Connector?

The HDFS Egress Connector can be used to manipulate files in a Hadoop Distributed File System (HDFS). This is an operation-based egress connector; hence, it must be used together with a corresponding connector operation.

In order to use the HDFS Egress Connector, you must first select the HDFS Connector dependency from the connector list when you are creating an empty Ultra project. If you have already created a project, you can add the HDFS Connector dependency via the Component Registry: from the Tools menu, select Ultra Studio → Component Registry, and from the Connectors list, select the HDFS Connector dependency.
hdfs egress outport

Out Ports

On Exception

The message will be emitted from this out port if the connector fails to perform the respective HDFS connector operation.

Response Processor

The original message will be emitted from this port if the connector operation is successful. You can use this port to connect any processing element or connector in order to continue the flow.

Side Ports

Connector Operation

This port is used to connect operational elements specified below to the Egress Connector.

Parameters

* marked fields are mandatory

Host Name *

Basic

Host name of the endpoint on which HDFS is installed. Usually, this can be found in the /etc/hadoop/core-site.xml configuration file.

Port *

Basic

Port of the endpoint on which HDFS is installed. Usually, this can be found in the /etc/hadoop/core-site.xml configuration file.
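For reference, the host name and port typically appear together in the fs.defaultFS property of core-site.xml. A typical entry looks like the following (the host name and port shown here are example values only):

```xml
<configuration>
  <property>
    <!-- NameNode URI: supplies the host name and port used by the connector -->
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:9000</value>
  </property>
</configuration>
```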

Connector Operations

HDFS Directory Creator

HDFS Directory Creator can be used to create a directory at the specified path in HDFS. If a directory already exists at the specified path (and Ignore Existing Directory is set to false), or if the directory creation fails, a message will be emitted from the Exceptional Out port of the HDFS Egress Connector.

dir creator
Parameters

* marked fields are mandatory

Absolute Directory Path *

Basic

The absolute path of the directory which needs to be created. The value can be a constant or a placeholder expression which contains @{message.headers.<name>}, @{message.properties.<name>}, @{message.id}, @{mc.id}, @{mc.properties.<name>}, @{variable.<name>}, @{current.timestamp.<timestamp_format>}.
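As an illustration of how such a placeholder expression resolves, the sketch below substitutes values into an @{...} template. This is a minimal, hypothetical sketch only; the actual resolution is performed by the Ultra engine, and the resolver function and context values here are invented for illustration.

```python
import re

# Hypothetical snapshot of values available to the expression resolver.
context = {
    "message.id": "a1b2c3",
    "message.headers.region": "us-east",
}

def resolve(template: str, ctx: dict) -> str:
    """Replace each @{key} placeholder with its value from ctx."""
    return re.sub(r"@\{([^}]+)\}", lambda m: str(ctx[m.group(1)]), template)

path = resolve("/data/@{message.headers.region}/dir-@{message.id}", context)
print(path)  # /data/us-east/dir-a1b2c3
```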

Ignore Existing Directory

Basic

Specify whether to ignore an existing directory. If the value is true and a directory with the specified name already exists, the existing directory is ignored. If the value is false and the directory already exists, an exception will be thrown. Default value is false.

HDFS Entry Renamer

This connector operation can be used to rename a file/directory located at the specified path. If the rename fails, a message will be emitted from the Exceptional Out port of the HDFS Egress Connector.

entry renamer
Parameters

* marked fields are mandatory

Old Entry Path *

Basic

The absolute path of the existing File or Directory which needs to be renamed. The value can be a constant or a placeholder expression which contains @{message.headers.<name>}, @{message.properties.<name>}, @{message.id}, @{mc.id}, @{mc.properties.<name>}, @{variable.<name>}, @{current.timestamp.<timestamp_format>}.

New Entry Path *

Basic

The absolute path of the new entry. The value can be a constant or a placeholder expression which contains @{message.headers.<name>}, @{message.properties.<name>}, @{message.id}, @{mc.id}, @{mc.properties.<name>}, @{variable.<name>}, @{current.timestamp.<timestamp_format>}.

HDFS Entry Meta-Data

The HDFS Entry Meta-Data retriever can be used to extract the meta-data of the file/directory located at the specified path in HDFS and set the retrieved values as scope variables. If obtaining the meta-data fails, a message will be emitted from the Exceptional Out port of the HDFS Egress Connector. The list below contains the names of the scope variables and the data each contains.

ustudio.hdfs.group

Group of the file. The string can be empty if the filesystem has no notion of a group for a file, or if it could not be determined (rare).

ustudio.hdfs.access_time

The access time of the file, in milliseconds since January 1, 1970 UTC.

ustudio.hdfs.block_size

The block size of the file, in bytes.

ustudio.hdfs.length

The length of the file, in bytes.

ustudio.hdfs.modification_time

The modification time of the file, in milliseconds since January 1, 1970 UTC.

ustudio.hdfs.owner

Owner of the file. The string can be empty if the filesystem has no notion of an owner for a file, or if it could not be determined (rare).

ustudio.hdfs.is_directory

True if the entry is a directory.

ustudio.hdfs.is_file

True if the entry is a file.

ustudio.hdfs.last_access_time

The last access time of the file, in milliseconds since January 1, 1970 UTC.

ustudio.hdfs.is_encrypted

True if the underlying file is encrypted.

ustudio.hdfs.replication_count

The replication factor of the file.
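To illustrate how a downstream processing element might consume these scope variables, the sketch below summarizes an entry from a snapshot of the variables. The variable names are those listed above; the values and the describe helper are hypothetical, for illustration only.

```python
# Hypothetical scope-variable snapshot set by HDFS Entry Meta-Data.
scope = {
    "ustudio.hdfs.is_file": True,
    "ustudio.hdfs.is_directory": False,
    "ustudio.hdfs.length": 134217728,       # bytes
    "ustudio.hdfs.replication_count": 3,
}

def describe(scope: dict) -> str:
    """Build a one-line summary from the meta-data scope variables."""
    kind = "file" if scope["ustudio.hdfs.is_file"] else "directory"
    size_mb = scope["ustudio.hdfs.length"] / (1024 * 1024)
    return f"{kind}, {size_mb:.0f} MiB, replication x{scope['ustudio.hdfs.replication_count']}"

print(describe(scope))  # file, 128 MiB, replication x3
```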

get meta data
Parameters

* marked fields are mandatory

Absolute File/ Directory Path *

Basic

The absolute path of the file/directory whose meta-data needs to be obtained. The value can be a constant or a placeholder expression which contains @{message.headers.<name>}, @{message.properties.<name>}, @{message.id}, @{mc.id}, @{mc.properties.<name>}, @{variable.<name>}, @{current.timestamp.<timestamp_format>}.

HDFS Entry Deleter

HDFS Entry Deleter deletes a file/directory located at the specified path in HDFS. If the file/directory deletion fails, a message will be emitted from the Exceptional Out port of the HDFS Egress Connector.

entry delete
Parameters

* marked fields are mandatory

Absolute File/ Directory Path *

Basic

The absolute path of the file/directory which needs to be deleted. The value can be a constant or a placeholder expression which contains @{message.headers.<name>}, @{message.properties.<name>}, @{message.id}, @{mc.id}, @{mc.properties.<name>}, @{variable.<name>}, @{current.timestamp.<timestamp_format>}.

Delete Recursively

Basic

If the path refers to a directory, setting this property to true will recursively delete all the content within the directory. If the value is false and the directory is not empty, the directory deletion will fail. Default value is false.

Copy File L2HDFS

This connector operation can be used to copy local files into HDFS. If a file already exists at the specified path in HDFS, that file will be replaced with the new content. If the file copy fails, a message will be emitted from the Exceptional Out port of the HDFS Egress Connector.

copy local to hdfs
Parameters

* marked fields are mandatory

Absolute Local File Path *

Basic

The absolute path of the file in the local file system. The value can be a constant or a placeholder expression which contains @{message.headers.<name>}, @{message.properties.<name>}, @{message.id}, @{mc.id}, @{mc.properties.<name>}, @{variable.<name>}, @{current.timestamp.<timestamp_format>}.

Absolute Remote(HDFS) File Path *

Basic

The absolute path of the file in the HDFS file system. The value can be a constant or a placeholder expression which contains @{message.headers.<name>}, @{message.properties.<name>}, @{message.id}, @{mc.id}, @{mc.properties.<name>}, @{variable.<name>}, @{current.timestamp.<timestamp_format>}.

Delete Source File

Basic

Indicates whether to delete the source file. If the value is true, the source file will be deleted. Default value is false.

Copy File HDFS2L

This connector operation can be used to copy an HDFS file into the local file system. If a file does not exist at the specified path, or if the file copy fails, a message will be emitted from the Exceptional Out port of the HDFS Egress Connector.

copy remote to local
Parameters

* marked fields are mandatory

Absolute Local File Path *

Basic

The absolute path of the file in the local file system. The value can be a constant or a placeholder expression which contains @{message.headers.<name>}, @{message.properties.<name>}, @{message.id}, @{mc.id}, @{mc.properties.<name>}, @{variable.<name>}, @{current.timestamp.<timestamp_format>}.

Absolute Remote(HDFS) File Path *

Basic

The absolute path of the file in the HDFS file system. The value can be a constant or a placeholder expression which contains @{message.headers.<name>}, @{message.properties.<name>}, @{message.id}, @{mc.id}, @{mc.properties.<name>}, @{variable.<name>}, @{current.timestamp.<timestamp_format>}.

Delete Source File

Basic

Indicates whether to delete the source file. If the value is true, the source file will be deleted. Default value is false.

HDFS Payload Saver

The Payload Saver writes the current payload of the message into the specified HDFS file. If a file with the specified name already exists, its content will be replaced. If writing the payload to the file fails, a message will be emitted from the Exceptional Out port of the HDFS Egress Connector.

payload saver
Parameters

* marked fields are mandatory

File Path *

Basic

The absolute path of the file (in HDFS), where the payload of the message will be saved. The value can be a constant or a placeholder expression which contains @{message.headers.<name>}, @{message.properties.<name>}, @{message.id}, @{mc.id}, @{mc.properties.<name>}, @{variable.<name>}, @{current.timestamp.<timestamp_format>}.

Buffer Size

Basic

The buffer size (in bytes) to be used when copying the payload. Default value is 2048 bytes.
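The buffer size determines how many bytes are read and written per I/O operation. A minimal sketch of such a buffered copy loop is shown below, using plain in-memory streams; the connector itself writes through the HDFS client, so this is an illustration of the chunking behavior only.

```python
import io

def copy_payload(src, dst, buffer_size: int = 2048) -> int:
    """Copy src to dst in buffer_size chunks; return total bytes copied."""
    total = 0
    while True:
        chunk = src.read(buffer_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total

payload = io.BytesIO(b"x" * 5000)   # example payload
out = io.BytesIO()
print(copy_payload(payload, out))   # 5000 (copied in 2048-byte chunks)
```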

Sample Use Case

Use case description

In this sample use case, let’s see how we can save the current payload of the message to a file in HDFS.

Implementation

To implement the above use case, let’s create an integration flow named hdfs-payload-saver and add the required components from the component registry. For this use case, the HDFS Connector and the HTTP NIO Connector are required.

First, a NIO HTTP Ingress Connector should be added and configured to accept HTTP requests from an external HTTP client. Depending on the requirement, the HTTP port and service path can be configured in the HTTP Ingress Connector.

Then an HDFS Egress Connector should be added and configured with the HDFS host name and port.

hdfs egress connector config

After that, attach an HDFS Payload Saver connector operation to the above HDFS Egress Connector to save the payload to a file in HDFS. This connector operation should be configured as follows.

  • File Path - /data/payload-@{message.id}.xml

payload saver config

With all the above elements, the complete flow will look as shown below.

hdfs complete flow

Now, if you run the flow and send a request to the configured HTTP endpoint with the required parameters, you will see that a new file has been created in HDFS, as shown below.