
Hive: Case 1: Table Creation and Data Query and Loading Using Hive

Latest reply: Aug 29, 2018 09:10:12

1.1 Case 1: Table Creation and Data Query and Loading Using Hive

1.1.1 Scenarios

Applicable Versions

FusionInsight HD V100R002C70 and FusionInsight HD V100R002C80

Scenario Description

A user develops a Hive data analysis application for managing employee information described in Table 1-1 and Table 1-2.

Table 1-1 Employee information

+-----+-------+-----------------+----------+--------------------------+--------------------+-------------+
| No. | Name  | Salary Currency | Salary   | Tax Category             | Work Place         | Hiring Date |
+-----+-------+-----------------+----------+--------------------------+--------------------+-------------+
| 1   | Wang  | R               | 8000.01  | Personal income tax&0.05 | China: Shenzhen    | 2014        |
| 3   | Tom   | D               | 12000.02 | Personal income tax&0.09 | America: NewYork   | 2014        |
| 4   | Jack  | D               | 24000.03 | Personal income tax&0.09 | America: Manhattan | 2014        |
| 6   | Linda | D               | 36000.04 | Personal income tax&0.09 | America: NewYork   | 2014        |
| 8   | Zhang | R               | 9000.05  | Personal income tax&0.05 | China: Shanghai    | 2014        |
+-----+-------+-----------------+----------+--------------------------+--------------------+-------------+

Table 1-2 Employee contact information

+-----+---------------------+-----------------+
| No. | Mobile Phone Number | E-mail Address  |
+-----+---------------------+-----------------+
| 1   | 135 XXXX XXXX       | xxxx@xx.com     |
| 3   | 159 XXXX XXXX       | xxxxx@xx.com.cn |
| 4   | 186 XXXX XXXX       | xxxx@xx.org     |
| 6   | 189 XXXX XXXX       | xxxx@xxx.cn     |
| 8   | 134 XXXX XXXX       | xxxx@xxxx.cn    |
+-----+---------------------+-----------------+

1.1.2 Development Idea

                               Step 1      Prepare data.

1.         Create three tables: the employee information table employees_info, the employee contact table employees_contact, and the extended employee information table employees_info_extended.

           Fields in the employees_info table include the employee ID, name, salary currency, salary, tax category, work place, and hiring date. R indicates RMB, and D indicates USD.

           Fields in the employees_contact table include the employee ID, mobile phone number, and e-mail address.

           Fields in the employees_info_extended table include the employee ID, name, mobile phone number, e-mail address, salary currency, salary, tax category, and work place. The table is partitioned by the hiring date.

For table creation codes, see Creating a Table.

2.         Load employee information to employees_info.

For data loading codes, see Loading Data.

Table 1-1 describes employee information.

3.         Load employee contact information to employees_contact.

Table 1-2 describes employee contact information.

                               Step 2      Analyze data.

For data analysis codes, see Querying Data.

- Query contact information of employees whose salaries are paid in USD.

- Query the IDs and names of employees who were hired in 2014, and load query results to the partition with the hiring time of 2014 in employees_info_extended.

- Collect statistics for the number of records in the employees_info table.

- Query information about employees whose email addresses end with "cn".

                               Step 3      Submit a data analysis task to collect statistics for the number of records in the employees_info table.

For details about the implementation, see section 1.1.5 Example Program Guide.

----End

1.1.3 Example Code Description

Creating a Table

- Function

This topic describes how to use Hive Query Language (HQL) to create internal and external tables. You can create a table in three modes:

           Define the table structure, and use the keyword EXTERNAL to differentiate between internal and external tables.

- If all data is to be processed by Hive, create an internal table. When an internal table is deleted, the metadata and data in the table are deleted.

- If data is to be processed by multiple tools (such as Pig), create an external table. When an external table is deleted, only the metadata is deleted.

           Create a table based on an existing table. Use CREATE TABLE ... LIKE to fully copy the original table structure, including the storage format.

           Create a table based on query results using CREATE AS SELECT.

In this mode, you can specify which fields are to be copied when copying the original table structure. The storage format is not copied.

note

To perform the following operations on a cluster with the security service enabled, you must have the creation permission for databases. To create a table using the CREATE AS SELECT statement, you must have the selection permission for tables.

- Example Codes

-- Create an external table employees_info. 
CREATE EXTERNAL TABLE IF NOT EXISTS employees_info 
( 
id INT, 
name STRING, 
usd_flag STRING, 
salary DOUBLE, 
deductions MAP<STRING, DOUBLE>, 
address STRING, 
entrytime STRING 
) 
-- Specify the field delimiters. 
-- ROW FORMAT DELIMITED FIELDS TERMINATED BY sets the delimiter between columns to a comma (,). MAP KEYS TERMINATED BY sets the delimiter between map keys to an ampersand (&). 
ROW FORMAT delimited fields terminated by ',' MAP KEYS TERMINATED BY '&' 
-- Set the storage format to TEXTFILE. 
STORED AS TEXTFILE; 
 
-- Use CREATE TABLE ... LIKE to create a table. 
CREATE TABLE employees_like LIKE employees_info; 
 
-- Use CREATE TABLE ... AS SELECT to create a table from query results. 
CREATE TABLE employees_as_select AS SELECT id, name FROM employees_info; 
 
-- Use DESCRIBE to query the structures of the employees_info, employees_like, and employees_as_select tables. 
DESCRIBE employees_info; 
DESCRIBE employees_like; 
DESCRIBE employees_as_select;

- Extensions

           Create a partition.

A table may have one partition or multiple partitions. Each partition is saved as an independent folder in the table directory. Partitions help minimize the query scope, accelerate data query, and allow users to manage data based on certain criteria.

A partition is defined using the PARTITIONED BY clause during table creation.

 CREATE EXTERNAL TABLE IF NOT EXISTS employees_info_extended 
 ( 
 id INT, 
 name STRING, 
 usd_flag STRING,   
 salary DOUBLE, 
 deductions MAP<STRING, DOUBLE>, 
 address STRING 
 ) 
 -- Use PARTITIONED BY to specify the column name and data type of the partition. 
 PARTITIONED BY (entrytime STRING)  
 STORED AS TEXTFILE;    
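Each partition created with PARTITIONED BY is stored as a subdirectory named column=value under the table directory. A minimal Python sketch of this mapping; the warehouse path below is an assumed default, not taken from a cluster:

```python
# Illustrative sketch: how Hive maps a partition value to a storage
# subdirectory under the table's directory. The warehouse path is an
# assumption for illustration only.
def partition_path(warehouse, table, column, value):
    """Build the HDFS directory in which a partition's data files live."""
    return f"{warehouse}/{table}/{column}={value}"

path = partition_path("/user/hive/warehouse", "employees_info_extended",
                      "entrytime", "2014")
print(path)  # /user/hive/warehouse/employees_info_extended/entrytime=2014
```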

           Update the table structure.

After a table is created, you can use ALTER TABLE to add or delete fields to or from the table, modify table attributes, and add partitions.

 -- Add the tel_phone and email fields to the employees_info_extended table. 
 ALTER TABLE employees_info_extended ADD COLUMNS (tel_phone STRING, email STRING);    

           Configure Hive data encryption when creating a table.

Set the table format to RCFile (recommended) or SequenceFile, and the encryption algorithm to ARC4Codec. SequenceFile is a unique Hadoop file format, and RCFile is a Hive file format with optimized column storage. For data query from tables with a large data size, RCFile provides higher performance than SequenceFile.

 set hive.exec.compress.output=true; 
 set hive.exec.compress.intermediate=true;   
 set hive.intermediate.compression.codec=org.apache.hadoop.io.encryption.arc4.ARC4Codec; 
 create table seq_Codec (key string, value string) stored as RCFile;

Loading Data

- Function

This section describes how to use Hive Query Language (HQL) to load data to the existing employees_info table. You can load data from a local file system or from the HDFS of the FusionInsight HD cluster. The keyword LOCAL differentiates between local and non-local data sources.

note

To perform the following operations on a cluster with the security service enabled, you must have the update permission for the database, and the owner and read/write permissions for the files to be loaded.

If LOCAL is used in data loading statements, data is loaded from a local directory. In addition to the update permission for the table, you must have the read permission for the data path, and the data must be accessible to user omm on the active HiveServer.

If OVERWRITE is used in data loading statements, the existing data in the table is overwritten by the new data. If OVERWRITE is not specified, the data is appended to the table.

- Example Codes

--  Load employee_info.txt from /opt/hive_examples_data/ in the local file system to the employees_info table.  
LOAD DATA LOCAL INPATH '/opt/hive_examples_data/employee_info.txt' OVERWRITE INTO TABLE employees_info; 
 
-- Load /user/hive_examples_data/employee_info.txt from the HDFS to the employees_info table.  
LOAD DATA INPATH '/user/hive_examples_data/employee_info.txt' OVERWRITE INTO TABLE employees_info;

note

Loading data copies the data file into the directory of the specified table in the HDFS.
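Based on the delimiters declared for employees_info (commas between fields, an ampersand between map keys and values), each line of employee_info.txt is expected to encode one row of Table 1-1. The following Python sketch illustrates this assumed layout; the sample line is constructed from Table 1-1, not copied from a real data file:

```python
# Sketch of the assumed employee_info.txt line layout, derived from the
# employees_info table definition: fields separated by commas, and the
# deductions map written as key&value.
row = "1,Wang,R,8000.01,Personal income tax&0.05,China: Shenzhen,2014"

fields = row.split(",")
key, value = fields[4].split("&")  # MAP<STRING, DOUBLE> entry
record = {
    "id": int(fields[0]),
    "name": fields[1],
    "usd_flag": fields[2],
    "salary": float(fields[3]),
    "deductions": {key: float(value)},
    "address": fields[5],
    "entrytime": fields[6],
}
print(record["deductions"])  # {'Personal income tax': 0.05}
```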

- Extensions

None

Querying Data

- Function

This section describes how to use Hive Query Language (HQL) to query and analyze data. You can query and analyze data using the following methods:

           Use common features for SELECT query, such as JOIN.

           Load data to a specified partition.

           Use Hive-provided functions.

           Query and analyze data using user-defined functions (UDFs). For details about how to create and define UDFs, see section "Case 2: UDFs."

note

To perform the following operations on a cluster with the security service enabled, you must have related permissions for tables.

- Example Codes

-- Query the contact information of employees whose salaries are paid in USD.  
SELECT  
a.name,  
b.tel_phone,  
b.email  
FROM employees_info a JOIN employees_contact b  ON(a.id = b.id) WHERE usd_flag='D'; 
 
-- Query the IDs and names of employees who were hired in 2014, and load query results to the partition with the hiring time of 2014 in the employees_info_extended table.  
INSERT OVERWRITE TABLE employees_info_extended PARTITION (entrytime = '2014')  
SELECT  
a.id,  
a.name,  
a.usd_flag,  
a.salary,  
a.deductions,  
a.address, 
b.tel_phone, 
b.email  
FROM employees_info a JOIN employees_contact b ON (a.id = b.id) WHERE a.entrytime = '2014'; 
 
-- Use the existing Hive function COUNT() to count the number of records in the employees_info table. 
SELECT COUNT(*) FROM employees_info; 
 
--  Query information about employees whose email addresses end with "cn". 
SELECT a.name, b.tel_phone FROM  employees_info a JOIN employees_contact b ON (a.id = b.id) WHERE b.email like '%cn';
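As a sanity check, the logic of the queries above can be mirrored locally against the five rows of Table 1-1 and Table 1-2. The following Python sketch only illustrates the expected results; it does not run Hive:

```python
# Local illustration of the HQL query logic, using the rows from
# Table 1-1 and Table 1-2. This mirrors the queries; it does not
# connect to a Hive cluster.
employees_info = [
    (1, "Wang", "R", 8000.01, "2014"),
    (3, "Tom", "D", 12000.02, "2014"),
    (4, "Jack", "D", 24000.03, "2014"),
    (6, "Linda", "D", 36000.04, "2014"),
    (8, "Zhang", "R", 9000.05, "2014"),
]
employees_contact = {
    1: ("135 XXXX XXXX", "xxxx@xx.com"),
    3: ("159 XXXX XXXX", "xxxxx@xx.com.cn"),
    4: ("186 XXXX XXXX", "xxxx@xx.org"),
    6: ("189 XXXX XXXX", "xxxx@xxx.cn"),
    8: ("134 XXXX XXXX", "xxxx@xxxx.cn"),
}

# WHERE usd_flag = 'D': employees whose salaries are paid in USD
usd_names = [e[1] for e in employees_info if e[2] == "D"]
print(usd_names)            # ['Tom', 'Jack', 'Linda']

# SELECT COUNT(*) FROM employees_info
print(len(employees_info))  # 5

# WHERE b.email LIKE '%cn': e-mail addresses ending with "cn"
cn_names = [e[1] for e in employees_info
            if employees_contact[e[0]][1].endswith("cn")]
print(cn_names)             # ['Tom', 'Linda', 'Zhang']
```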

- Extensions

           Configure intermediate Hive data encryption.

Set the table format to RCFile (recommended) or SequenceFile, and the encryption algorithm to ARC4Codec. SequenceFile is a unique Hadoop file format, and RCFile is a Hive file format with optimized column storage. For data query from tables with a large data size, RCFile provides higher performance than SequenceFile.

 set hive.exec.compress.output=true; 
 set hive.exec.compress.intermediate=true; 
 set hive.intermediate.compression.codec=org.apache.hadoop.io.encryption.arc4.ARC4Codec;

           For details about UDFs, see section Case 2: UDFs.

1.1.4 Obtaining Example Codes

By FusionInsight Client

Decompress the FusionInsight client installation package and obtain the example projects jdbc-examples, jdbc-examples-multizk, python-examples, and HCatalog-examples, which are saved in the Hive sub-folder of the FusionInsight_Services_ClientConfig folder.

By Maven Project

Log in to Huawei DevCloud (https://codehub-cn-south-1.devcloud.huaweicloud.com/codehub/7076065/home) to download the code under components/hive to the local PC.

1.1.5 Example Program Guide

Function

This section describes how to use an example program to complete an analysis task. An example program can submit a task by using the following methods:

- Submitting a data analysis task by using JDBC interfaces

- Submitting a data analysis task by using Python

- Submitting a data analysis task by using ODBC interfaces

Example Codes

- Submit a data analysis task using the Hive Java database connectivity (JDBC) interface, that is, JDBCExample.java.

a.         Read the property file of the HiveServer client. The hiveclient.properties file is saved in the conf directory of the JDBC example program provided by Hive.

Properties clientInfo = null; 
String userdir = System.getProperty("user.dir") + File.separator 
        + "conf" + File.separator; 
InputStream fileInputStream = null; 
try { 
    clientInfo = new Properties(); 
    // hiveclient.properties is the client configuration file. If the multiple instances feature is used, 
    // replace the file with the hiveclient.properties file on the instance client. 
    // hiveclient.properties is located in the config directory of the decompressed instance client installation package. 
    String hiveclientProp = userdir + "hiveclient.properties"; 
    File propertiesFile = new File(hiveclientProp); 
    fileInputStream = new FileInputStream(propertiesFile); 
    clientInfo.load(fileInputStream); 
} catch (Exception e) { 
    throw new IOException(e); 
} finally { 
    if (fileInputStream != null) { 
        fileInputStream.close(); 
        fileInputStream = null; 
    } 
}

b.         Obtain the IP address and port list of ZooKeeper, the cluster authentication mode, the SASL configuration of HiveServers, the HiveServer node names in ZooKeeper, the discovery mode from the client to the server, and the principal of the server process for user authentication. All these configurations can be read from the hiveclient.properties file.

    // The format of zkQuorum is "xxx.xxx.xxx.xxx:24002,xxx.xxx.xxx.xxx:24002,xxx.xxx.xxx.xxx:24002". 
    // xxx.xxx.xxx.xxx is the IP address of the node where ZooKeeper resides. The default port is 24002. 
 
    zkQuorum = clientInfo.getProperty("zk.quorum"); 
    auth = clientInfo.getProperty("auth"); 
    sasl_qop = clientInfo.getProperty("sasl.qop"); 
    zooKeeperNamespace = clientInfo.getProperty("zooKeeperNamespace"); 
    serviceDiscoveryMode = clientInfo.getProperty("serviceDiscoveryMode"); 
    principal = clientInfo.getProperty("principal");  
    

c.         In security mode, the Kerberos user and keytab file path are required for login authentication. For details about how to obtain USER_NAME, USER_KEYTAB_FILE, and KRB5_FILE, see 1.1.6.1 Running JDBC and Viewing Results.

    // Set the userName of new user. 
    USER_NAME = "xxx"; 
    // Set the keytab and krb5 files location of client. 
    String userdir = System.getProperty("user.dir") + File.separator  
            + "conf" + File.separator; 
    USER_KEYTAB_FILE = userdir + "user.keytab"; 
    KRB5_FILE = userdir + "krb5.conf";

d.         Define HQL. HQL must be a single statement and cannot contain semicolons (;).

     // Define HQL. HQL cannot contain ";"  
     String[] sqls = {"CREATE TABLE IF NOT EXISTS employees_info(id INT,name STRING)",  
            "SELECT COUNT(*) FROM employees_info", "DROP TABLE employees_info"};    

e.         Build JDBC URL.

note

You can also implement pre-authentication without providing the account and keytab file path. For details, see JDBC code example 2 in Development Specifications > Hive > Examples. If IBM JDK is used to run Hive applications, the pre-authentication in JDBC code example 2 must be implemented.

The following is an example of the JDBC URL composed of code snippets:

jdbc:hive2://xxx.xxx.xxx.xxx:24002,xxx.xxx.xxx.xxx:24002,xxx.xxx.xxx.xxx:24002/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;sasl.qop=auth-conf;auth=KERBEROS;principal=hive/hadoop.hadoop.com@HADOOP.COM;

     // Concat JDBC URL  
     StringBuilder sBuilder = new StringBuilder(  
         "jdbc:hive2://").append(zkQuorum).append("/");  
       
     if ("KERBEROS".equalsIgnoreCase(auth)) {  
      sBuilder.append(";serviceDiscoveryMode=")  
              .append(serviceDiscoveryMode) 
              .append(";zooKeeperNamespace=") 
              .append(zooKeeperNamespace) 
              .append(";sasl.qop=") 
              .append(sasl_qop) 
              .append(";auth=") 
              .append(auth) 
              .append(";principal=") 
              .append(principal) 
                          .append(";user.principal=") 
              .append(USER_NAME) 
              .append(";user.keytab=") 
              .append(USER_KEYTAB_FILE) 
              .append(";"); 
                
     } else { 
       // Normal mode  
       sBuilder.append(";serviceDiscoveryMode=") 
               .append(serviceDiscoveryMode)  
               .append(";zooKeeperNamespace=")  
               .append(zooKeeperNamespace) 
               .append(";auth=none;");  
     }  
     String url = sBuilder.toString(); 
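The URL concatenation above can be mirrored in a few lines of Python to check that the assembled string matches the documented format. All parameter values below are placeholders taken from the sample URL:

```python
# Sketch mirroring the Java StringBuilder logic above (KERBEROS branch),
# to verify that the assembled JDBC URL matches the documented format.
# All values are placeholders from the sample URL, not real cluster data.
zk_quorum = "xxx.xxx.xxx.xxx:24002,xxx.xxx.xxx.xxx:24002,xxx.xxx.xxx.xxx:24002"
params = {
    "serviceDiscoveryMode": "zooKeeper",
    "zooKeeperNamespace": "hiveserver2",
    "sasl.qop": "auth-conf",
    "auth": "KERBEROS",
    "principal": "hive/hadoop.hadoop.com@HADOOP.COM",
}
url = ("jdbc:hive2://" + zk_quorum + "/;"
       + ";".join(f"{k}={v}" for k, v in params.items()) + ";")
print(url)
```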

f.          Load the Hive JDBC driver.

  
 // Load the Hive JDBC driver.  
 Class.forName(HIVE_DRIVER);

g.         Obtain the JDBC connection, confirm the HQL type (DDL/DML), call ports to run the HQL statement, return the queried column name and results to the console, and close the JDBC connection.

  
 Connection connection = null; 
 try { 
     // Obtain the JDBC connection. 
     // If the normal mode is used, set the second parameter to a correct username; otherwise, the anonymous user is used for login. 
     connection = DriverManager.getConnection(url, "", ""); 
 
     // Create a table. 
     // To import data to the table after it is created, use the LOAD statement, for example, to import data from the HDFS: 
     // load data inpath '/tmp/employees.txt' overwrite into table employees_info; 
     execDDL(connection, sqls[0]); 
     System.out.println("Create table success!"); 
 
     // Query. 
     execDML(connection, sqls[1]); 
 
     // Delete the table. 
     execDDL(connection, sqls[2]); 
     System.out.println("Delete table success!"); 
 } finally { 
     // Close the JDBC connection. 
     if (null != connection) { 
         connection.close(); 
     } 
 }
 
 public static void execDDL(Connection connection, String sql)  
 throws SQLException {   
     PreparedStatement statement = null;  
     try {  
       statement = connection.prepareStatement(sql);  
       statement.execute();  
     }  
     finally {  
       if (null != statement) {  
         statement.close();  
       }  
     }  
 }  
 
 
 public static void execDML(Connection connection, String sql) throws SQLException {  
     PreparedStatement statement = null;  
     ResultSet resultSet = null;  
     ResultSetMetaData resultMetaData = null;  
       
     try {  
       // Run the HQL statement.  
       statement = connection.prepareStatement(sql);  
       resultSet = statement.executeQuery();  
         
       // Return the queried column name to the console.  
       resultMetaData = resultSet.getMetaData();  
       int columnCount = resultMetaData.getColumnCount();  
       for (int i = 1; i <= columnCount; i++) {  
         System.out.print(resultMetaData.getColumnLabel(i) + '\t');  
       }  
       System.out.println();  
         
       // Return the results to the console.  
       while (resultSet.next()) {  
         for (int i = 1; i <= columnCount; i++) {  
           System.out.print(resultSet.getString(i) + '\t');  
         }  
         System.out.println();  
       }  
     }  
     finally {  
       if (null != resultSet) {  
         resultSet.close();  
       }  
         
       if (null != statement) {  
         statement.close();  
       }  
     }  
 }    

- Submit a data analysis task using the Python interface, that is, python-examples/pyCLI_sec.py. The example program connects to a cluster whose authentication mode is the security mode. Before running the example program, run the kinit command to authenticate a Kerberos user with the required rights.

a.         Import the HAConnection class.

  
 from pyhs2.haconnection import HAConnection    

b.         Declare the HiveServer IP address list. In this example, hosts indicates the HiveServer nodes, and xxx.xxx.xxx.xxx indicates the service IP address.

  
 hosts = ["xxx.xxx.xxx.xxx", "xxx.xxx.xxx.xxx"]  
 

note

- If the HiveServer instance is migrated, the original example program becomes invalid. You need to update the HiveServer IP address used in the example program after the migration.

- If multiple Hive instances are used, update the IP address based on the address of the instance that is actually connected.

c.         Configure the Kerberos host name and service name. In this example, the Kerberos host name is hadoop and the service name is hive.

  
 conf = {"krb_host":"hadoop.hadoop.com", "krb_service":"hive"}    

note

If multiple Hive instances are used, change the service name based on the cluster that is actually connected. For example, if the Hive1 instance is connected, change the service name to hive1.

d.         Create a connection, run the HQL statement, and return the queried column names and results to the console.

  
 try:  
       with HAConnection(hosts = hosts,  
                          port = 21066,  
                          authMechanism = "KERBEROS",  
                          configuration = conf) as haConn:  
           with haConn.getConnection() as conn:  
               with conn.cursor() as cur:  
                   # show databases  
                   print cur.getdatabases()  
                     
                   # execute query  
                   cur.execute("show tables")  
                     
                   # return column info from query  
                   print cur.getschema()  
                      
                   # fetch table results  
                   for i in cur.fetch():  
                       print i  
                         
 except Exception, e:  
       print e  
     

note

If multiple Hive instances are used, you need to modify hosts according to the description in b and change the port number based on the actual port number in use. The default ports of Hive to Hive4 are 21066 to 21070, respectively.

- Submit a data analysis task using the Hive ODBC interface. For details, see hiveodbcexample.c in the example program.

The value of DSN in the code must be the same as the configured data source name, for example, hiveodbcs.

SQLCHAR* ConnStrIn = (SQLCHAR *)"DSN=hiveodbcs";

1.1.6 Commissioning Applications

1.1.6.1 Running JDBC and Viewing Results

Running JDBC in CLI Mode

                               Step 1      Right-click the Eclipse jdbc-examples project, and choose Export from the shortcut menu, as shown in Figure 1-1.

note

The preceding project name is for reference only. Use the actual project name in an actual scenario.

Figure 1-1 Choosing Export from the Eclipse shortcut menu

 

                               Step 2      In the displayed export panel, select JAR file, and click Next, as shown in Figure 1-2.

Figure 1-2 Exporting JAR files

 

                               Step 3      Select the src directory, select the full path for storing exported files, and click Finish, as shown in Figure 1-3.

Figure 1-3 Selecting the full path for storing exported files

 

                               Step 4      Create a directory in Windows or Linux as the running directory, for example, D:\jdbc_example (Windows) or /opt/jdbc_example (Linux), and create the conf and lib subdirectories in it.

Copy all the JAR files in the lib directory of the jdbc-examples project, together with jdbc-examples.jar exported in Step 3, to the lib subdirectory.

Copy all files from the conf directory of the jdbc-examples project to the conf subdirectory.

                               Step 5      In Windows, run the following command:

cd /d d:\jdbc_example

java -cp .;D:\jdbc_example\conf;D:\jdbc_example\lib\* com.huawei.bigdata.hive.example.JDBCExample

In Linux, run the following command:

chmod +x /opt/jdbc_example -R

cd /opt/jdbc_example

java -cp .:/opt/jdbc_example/conf:/opt/jdbc_example/lib/* com.huawei.bigdata.hive.example.JDBCExample

                               Step 6      In the CLI, view the HQL query results in the example codes.

The following information is displayed if the example project is successful in Windows:

Create table success! 
_c0 

Delete table success!

The following information is displayed if the example project is successful in Linux:

Create table success! 
_c0 

Delete table success!

----End

Running JDBC in Eclipse Mode

                               Step 1      Right-click the JDBCExample class in the Eclipse jdbc-examples project, and choose Run As > Java Application from the shortcut menu.

                               Step 2      In the Eclipse output window, view the HQL query results in the example codes.

Create table success! 
_c0 

Delete table success!

----End

1.1.6.2 Running HCatalog and Viewing Results

Running HCatalog Example Projects

                               Step 1      Right-click the Eclipse HCatalog-examples project, and choose Export from the shortcut menu.

                               Step 2      In the displayed export panel, select JAR file, and click Next.

                               Step 3      Select the src directory. In JAR file, select the full path of the files to be exported, set the name of the JAR package to be exported to hcat.jar, and click Finish.

                               Step 4      Upload the hcat.jar package generated in the previous step to a specified path on Linux, for example, /opt/hive_client (marked as $HCAT_CLIENT), and ensure that the Hive and YARN clients have been installed.

                               Step 5      Run the following command to configure environment parameters (client installation path /opt/client is used as an example):

export HADOOP_HOME=/opt/client/HDFS/hadoop  
 export HIVE_HOME=/opt/client/Hive/Beeline  
 export HCAT_HOME=$HIVE_HOME/../HCatalog  
 export LIB_JARS=$HCAT_HOME/lib/hive-hcatalog-core-1.3.0.jar,$HCAT_HOME/lib/hive-metastore-1.3.0.jar,$HIVE_HOME/lib/hive-exec-1.3.0.jar,$HCAT_HOME/lib/libfb303-0.9.3.jar,$HCAT_HOME/lib/slf4j-api-1.7.5.jar,$HCAT_HOME/lib/antlr-2.7.7.jar,$HCAT_HOME/lib/jdo-api-3.0.1.jar,$HCAT_HOME/lib/antlr-runtime-3.4.jar,$HCAT_HOME/lib/datanucleus-api-jdo-3.2.6.jar,$HCAT_HOME/lib/datanucleus-core-3.2.10.jar,$HCAT_HOME/lib/datanucleus-rdbms-3.2.9.jar  
 export HADOOP_CLASSPATH=$HCAT_HOME/lib/hive-hcatalog-core-1.3.0.jar:$HCAT_HOME/lib/hive-metastore-1.3.0.jar:$HIVE_HOME/lib/hive-exec-1.3.0.jar:$HCAT_HOME/lib/libfb303-0.9.3.jar:$HADOOP_HOME/etc/hadoop:$HCAT_HOME/conf:$HCAT_HOME/lib/slf4j-api-1.7.5.jar:$HCAT_HOME/lib/antlr-2.7.7.jar:$HCAT_HOME/lib/jdo-api-3.0.1.jar:$HCAT_HOME/lib/antlr-runtime-3.4.jar:$HCAT_HOME/lib/datanucleus-api-jdo-3.2.6.jar:$HCAT_HOME/lib/datanucleus-core-3.2.10.jar:$HCAT_HOME/lib/datanucleus-rdbms-3.2.9.jar

note

If the multi-instance function is enabled for Hive, adjust the export HIVE_HOME=/opt/client/Hive/Beeline setting accordingly. For example, if the Hive1 instance is used, ensure that the Hive1 client has been installed, and change the value of export HIVE_HOME to /opt/client/Hive1/Beeline.

                               Step 6      Prepare for the running:

1.         Use the Hive client to create source table t1 in beeline: create table t1(col1 int);

Insert the following data into t1:

  
    +----------+--+  
    | t1.col1  |  
    +----------+--+  
    | 1        |  
    | 1        |  
    | 1        |  
    | 2        |  
    | 2        |  
    | 3        |  
    +----------+--+  

2.         Create destination table t2: create table t2(col1 int,col2 int);

                               Step 7      Use the YARN client to submit the task:

yarn --config $HADOOP_HOME/etc/hadoop jar $HCAT_CLIENT/hcat.jar com.huawei.bigdata.hcatalog.example.HCatalogExample -libjars $LIB_JARS t1 t2

                               Step 8      View the running result. The data in t2 is as follows:

0: jdbc:hive2://192.168.1.18:24002,192.168.1.> select * from t2;  
 +----------+----------+--+  
 | t2.col1  | t2.col2  |  
 +----------+----------+--+  
 | 1        | 3        |  
 | 2        | 2        |  
 | 3        | 1        |  
 +----------+----------+--+    
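The t1-to-t2 mapping shown above suggests that HCatalogExample counts the occurrences of each value in t1.col1 and writes (value, count) pairs to t2. A minimal Python sketch of that mapping; it illustrates the result, not the HCatalog MapReduce code:

```python
# Sketch of the t1 -> t2 mapping implied by the sample output: count
# the occurrences of each value in t1.col1 and emit (value, count)
# rows. This only mirrors the result of the example job.
from collections import Counter

t1_col1 = [1, 1, 1, 2, 2, 3]  # rows inserted into t1 in Step 6
t2 = sorted(Counter(t1_col1).items())
print(t2)  # [(1, 3), (2, 2), (3, 1)]
```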

----End

1.1.6.3 Running Python and Viewing Results

Running Python in CLI Mode

                               Step 1      Grant the execution permission for the scripts in the python-examples folder. Run the following command in the CLI:

chmod +x python-examples -R

                               Step 2      For a security cluster connection, enter the service plane IP addresses of the nodes where HiveServer is installed in the hosts array of python-examples/pyCLI_sec.py.

note

When multiple instances are enabled, the hosts array in python-examples/pyCLI_sec.py must be modified, and the port must be set according to the instance in use. The port (hive.server2.thrift.port) is used by Hive to provide the Thrift service. For example, if the Hive1 instance is used, the port must be set to 21067. (The default ports of Hive to Hive4 are 21066 to 21070, respectively.)

                               Step 3      Run the kinit command to obtain the cache for Kerberos authentication.

Use the developer account created in the Preparing the Developer Account section to run the client program:

kinit -kt <keytab storage path> <username>

cd python-examples

python pyCLI_sec.py

                               Step 4      In the CLI, view the HQL query results in the example codes. For example:

[['default', '']]  
 [{'comment': 'from deserializer', 'columnName': 'tab_name', 'type': 'STRING_TYPE'}]  
 ['xx']    

----End

1.1.6.4 Running ODBC and Viewing Results

1.1.6.4.1 Windows Environment

Running ODBC Example Projects

                               Step 1      Open the HiveODBCExample folder under the downloaded ODBC driver package.

                               Step 2      Double-click HiveODBCExample.sln and open the HiveODBCExample project in Visual Studio 2012.

                               Step 3      On the navigation bar, select the HiveODBCExample project and press F5 to compile and run the example code.

                               Step 4      After the example code is successfully compiled and executed, all database names in the connected Hive are printed.

note

Because the program runs quickly, you can add breakpoints to the main function of hiveodbcexample.c and view the operation result.

----End

1.1.6.4.2 Linux Environment

Using the ISQL Tool of unixODBC for Testing

No authentication is implemented in the Hive ODBC driver. You must manually run the kinit command to log in.

                               Step 1      Log in as the prepared developer user, for example, developuser:

kinit developuser

                               Step 2      Download the unixODBC tool from the official website and install it to a directory, such as /opt/odbc/. Run the isql command:

export unixODBC_PATH=/opt/odbc/unixODBC/ 
export LD_LIBRARY_PATH=${unixODBC_PATH}/lib/:$LD_LIBRARY_PATH 
export LD_PRELOAD=/usr/lib64/libodbchive.so 
cd ${unixODBC_PATH}/bin 
./isql -v Hive

${unixODBC_PATH} indicates the installation directory of unixODBC.

                               Step 3      If the following information is displayed, the login is successful:

+---------------------------------------+ 
| Connected!                            | 
|                                       | 
| sql-statement                         | 
| help [tablename]                      | 
| quit                                  | 
|                                       | 
+---------------------------------------+ 
SQL>

                               Step 4      Use the tool for Hive operations, for example, show tables. The result is as follows:

SQL> show tables; 
+---------------+ 
| tab_name      | 
+---------------+ 
| encode_test   | 
| encode_test2  | 
| hbase_t       | 
| sales         | 
| things        | 
+---------------+

note

The tinyint and boolean data types cannot be displayed properly because the ISQL tool does not fully support them.

----End

 


This post was last edited by chz at 2018-08-29 09:56.
