Guide to Setting Up Hadoop on Windows and Integrating the HDFS Connector with MuleSoft
Published: 2020/12/11 - Updated: 2021/01/18
Introduction
Hadoop is a framework written in Java that is used for storing and processing big-data applications. Hadoop has two basic components: HDFS, the distributed storage layer, and MapReduce (running on YARN), the distributed processing layer.
1.1 Installing Hadoop on Windows
Step 1: Download and install Java 8 on your system.
Step 2: Download the Hadoop 3.1.0 archive (zip) and extract it to C:\Users\DELL.
Step 3: Right-click This PC, click Properties, then Advanced system settings, then the Environment Variables button.
Step 4: Create a new user variable named JAVA_HOME and set it to the Java installation directory (the JDK folder that contains the bin directory).
Step 5: Create a new user variable named HADOOP_HOME and set it to the extracted Hadoop directory path.
Step 6: Edit the Path system variable so that it contains %JAVA_HOME%\bin as well as %HADOOP_HOME%\bin.
Step 7: Go to the extracted hadoop-3.1.0 directory, then to the etc\hadoop folder, and edit the core-site.xml file in any editor. Initially it contains only an empty <configuration> </configuration> tag; put the required configuration inside it.
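The original post shows this configuration in a screenshot, which is not reproduced here. For a typical single-node setup, core-site.xml points the default filesystem at the local NameNode (port 9000 is the conventional choice; adjust if yours differs):

```xml
<configuration>
  <!-- Default filesystem URI: clients and daemons connect to the NameNode here -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```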
Step 8: Go to the main Hadoop directory and create a data folder inside it. Inside the data folder, create two subfolders named namenode and datanode. Then go back to the etc\hadoop directory and edit hdfs-site.xml as follows:
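The screenshot with the exact file is not included here; a typical single-node hdfs-site.xml sets the replication factor to 1 and points the NameNode and DataNode at the folders created above (the paths below assume the extraction location from Step 2; Hadoop accepts forward slashes on Windows):

```xml
<configuration>
  <!-- Single machine, so keep one copy of each block -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <!-- Storage directories created in Step 8 -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>C:/Users/DELL/hadoop-3.1.0/data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>C:/Users/DELL/hadoop-3.1.0/data/datanode</value>
  </property>
</configuration>
```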
Step 9: Edit mapred-site.xml file as follows:
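Again the original screenshot is omitted; the standard single-node setting tells MapReduce to run on YARN:

```xml
<configuration>
  <!-- Run MapReduce jobs on the YARN resource manager -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```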
Step 10: Edit yarn-site.xml file as follows:
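For yarn-site.xml, the usual single-node configuration enables the shuffle auxiliary service that MapReduce needs:

```xml
<configuration>
  <!-- Auxiliary service that serves map outputs to reducers -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
```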
Step 11: Edit the hadoop-env.cmd file and set the JAVA_HOME variable inside it to the main Java installation directory. If the path contains spaces, put the path inside double quotes.
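For example (the JDK path here is illustrative; substitute your own installation directory):

```bat
@rem Inside hadoop-env.cmd: point Hadoop at the JDK root.
@rem A space-free path avoids quoting issues entirely.
set JAVA_HOME=C:\Java\jdk1.8.0_271
```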
Step 12: Download the Windows-compatible Hadoop binaries (winutils) from https://github.com/s911415/apache-hadoop-3.1.0-winutils.
Step 13: Extracting this download gives you a bin directory. Either paste all of its files into the bin folder of the Hadoop directory, replacing the existing files, or rename the original bin folder and paste in the downloaded bin folder whole.
Step 14: Check the Java version with the command java -version.
Step 15: Verify that the Hadoop installation succeeded with the command hadoop version.
Step 16: Open a command prompt and run hdfs namenode -format to format the namenode. Do this only once, when Hadoop is first installed; running it again will delete all HDFS data.
Step 17: Give read/write permission on the namenode and datanode folders using the chmod command of the winutils utility installed in the bin folder of the Hadoop directory (for example, winutils.exe chmod 777 data\namenode). Without the appropriate permissions, the Hadoop daemons will throw a permission-denied exception when they run.
Step 18: Start HDFS and YARN by running start-all.cmd from Hadoop's sbin directory, or run the two separate commands start-dfs.cmd and start-yarn.cmd. The start-dfs command opens two new windows, one showing the namenode starting and one the datanode; start-yarn opens another two, one for the resource manager and one for the node manager.
Step 19: You can view the resource manager's current jobs, finished jobs, etc. at http://localhost:8088/cluster
Step 20: You can view HDFS information at http://localhost:9870/
3.0 Integrating the HDFS Connector with MuleSoft
The HDFS connector in Mule 4 connects Hadoop and Mule applications, letting you integrate HDFS operations into Mule flows. To use the connector, import it from Anypoint Exchange. The operations covered below are Make Directories, Copy From Local File, Get Metadata, Delete File, and Delete Directory.
Sample flow:
Make directories connector configuration:
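The flow and its connector configuration appear as screenshots in the original post. A rough XML sketch of such a flow follows; the HDFS operation and attribute names here are assumptions based on the Mule 4 HDFS connector's DSL conventions and may differ from the exact names in your connector version:

```xml
<!-- Illustrative only: hdfs:* element and attribute names are assumed, not verified -->
<hdfs:config name="HDFS_Config">
  <hdfs:connection nameNodeUri="hdfs://localhost:9000"/>
</hdfs:config>

<flow name="make-directories-flow">
  <!-- HTTP trigger so the flow can be exercised from Postman -->
  <http:listener config-ref="HTTP_Listener_config" path="/mkdir"/>
  <!-- Create the target directory on HDFS -->
  <hdfs:make-directories config-ref="HDFS_Config" path="/abc/folder1"/>
</flow>
```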
You can test it using Postman and verify it on the dashboard.
Dashboard:
Sample flow:
Copy from local connector configuration:
Using the above connector, we can copy the abc.txt file from the desktop to the /abc/folder1 directory on HDFS. You can verify it using Postman and the user interface.
Postman output:
Sample flow:
Get metadata connector configuration:
You will get the metadata of the file abc.txt and can verify it using Postman; the output is a JSON object.
Sample flow:
Delete a file connector configuration:
Delete a file Postman output:
You can verify this operation on the dashboard: the abc.txt file will no longer be listed.
Sample flow:
Delete directory connector configuration:
Delete directory Postman output:
You can verify whether the directory was deleted using the dashboard: folder1 is gone from the abc folder.