1.1 Case 1: Table Creation and Data Query and Loading Using Hive
1.1.1 Scenarios
Applicable Versions
FusionInsight HD V100R002C70 and FusionInsight HD V100R002C80
Scenario Description
A user develops a Hive data analysis application for managing employee information described in Table 1-1 and Table 1-2.
Table 1-1 Employee information

| No. | Name  | Salary Currency | Salary   | Tax Category             | Work Place         | Hiring Date |
|-----|-------|-----------------|----------|--------------------------|--------------------|-------------|
| 1   | Wang  | R               | 8000.01  | Personal income tax&0.05 | China: Shenzhen    | 2014        |
| 3   | Tom   | D               | 12000.02 | Personal income tax&0.09 | America: NewYork   | 2014        |
| 4   | Jack  | D               | 24000.03 | Personal income tax&0.09 | America: Manhattan | 2014        |
| 6   | Linda | D               | 36000.04 | Personal income tax&0.09 | America: NewYork   | 2014        |
| 8   | Zhang | R               | 9000.05  | Personal income tax&0.05 | China: Shanghai    | 2014        |
Table 1-2 Employee contact information

| No. | Mobile Phone Number | E-mail Address  |
|-----|---------------------|-----------------|
| 1   | 135 XXXX XXXX       | xxxx@xx.com     |
| 3   | 159 XXXX XXXX       | xxxxx@xx.com.cn |
| 4   | 186 XXXX XXXX       | xxxx@xx.org     |
| 6   | 189 XXXX XXXX       | xxxx@xxx.cn     |
| 8   | 134 XXXX XXXX       | xxxx@xxxx.cn    |
1.1.2 Development Idea
Step 1 Prepare data.
1. Create three tables: employee information table employees_info, contact table employees_contact, and extension employee information table employees_info_extended.
− Fields in the employees_info table include the employee ID, name, salary currency, salary, tax category, work place, and hiring date. R indicates RMB, and D indicates USD.
− Fields in the employees_contact table include the employee ID, mobile phone number, and e-mail address.
− Fields in the employees_info_extended table include the employee ID, name, mobile phone number, e-mail address, salary currency, salary, tax category, and work place. The table is partitioned by hiring date.
For table creation codes, see Creating a Table.
2. Load employee information to employees_info.
For data loading codes, see Loading Data.
Table 1-1 describes employee information.
3. Load employee contact information to employees_contact.
Table 1-2 describes employee contact information.
Step 2 Analyze data.
For data analysis codes, see Querying Data.
l Query contact information of employees whose salaries are paid in USD.
l Query the IDs and names of employees who were hired in 2014, and load query results to the partition with the hiring time of 2014 in employees_info_extended.
l Collect statistics for the number of records in the employees_info table.
l Query information about employees whose email addresses end with "cn".
Step 3 Submit a data analysis task to collect statistics for the number of records in the employees_info table.
For details about the implementation, see section 1.1.5 Example Program Guide.
----End
1.1.3 Example Code Description
Creating a Table
l Function
This topic describes how to use Hive Query Language (HQL) to create internal and external tables. You can create a table in three modes:
− Define the table structure, and use the keyword EXTERNAL to differentiate between internal and external tables.
l If all data is to be processed by Hive, create an internal table. When an internal table is deleted, the metadata and data in the table are deleted.
l If data is to be processed by multiple tools (such as Pig), create an external table. When an external table is deleted, only the metadata is deleted.
− Create a table based on existing tables. Use CREATE LIKE to fully copy the original table structure, including the storage format.
− Create a table based on query results using CREATE AS SELECT.
In this mode, you can specify which fields are to be copied when copying the original table structure. The storage format is not copied.
Note: To perform the following operations on a cluster with the security service enabled, you must have the creation permission for databases. To create a table using the CREATE AS SELECT statement, you must also have the selection permission for tables.
l Example Codes
-- Create the external table employees_info.
CREATE EXTERNAL TABLE IF NOT EXISTS employees_info
(
id INT,
name STRING,
usd_flag STRING,
salary DOUBLE,
deductions MAP<STRING, DOUBLE>,
address STRING,
entrytime STRING
)
-- Specify the field delimiters.
-- Use FIELDS TERMINATED BY to set the delimiter between columns to a comma (,).
-- Use MAP KEYS TERMINATED BY to set the delimiter between map keys to an ampersand (&).
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' MAP KEYS TERMINATED BY '&'
-- Set the storage format to TEXTFILE.
STORED AS TEXTFILE;
-- Use CREATE TABLE ... LIKE to create a table.
CREATE TABLE employees_like LIKE employees_info;
-- Use DESCRIBE to query the structures of the employees_info and employees_like tables.
DESCRIBE employees_info;
DESCRIBE employees_like;
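The third creation mode described above, CREATE AS SELECT, is not shown in the snippet. The following is a minimal sketch; the table name employees_as_select and the selected columns are illustrative assumptions, not part of the original example:

```sql
-- Create a table from query results. Only the selected columns are copied;
-- the storage format of the source table is not.
CREATE TABLE employees_as_select AS SELECT id, name FROM employees_info;
DESCRIBE employees_as_select;
```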
l Extensions
− Create a partition.
A table may have one partition or multiple partitions. Each partition is saved as an independent folder in the table directory. Partitions help minimize the query scope, accelerate data query, and allow users to manage data based on certain criteria.
A partition is defined using the PARTITIONED BY clause during table creation.
CREATE EXTERNAL TABLE IF NOT EXISTS employees_info_extended
(
id INT,
name STRING,
usd_flag STRING,
salary DOUBLE,
deductions MAP<STRING, DOUBLE>,
address STRING
)
-- Use PARTITIONED BY to specify the column name and data type of the partition.
PARTITIONED BY (entrytime STRING)
STORED AS TEXTFILE;
− Update the table structure.
After a table is created, you can use ALTER TABLE to add or delete fields to or from the table, modify table attributes, and add partitions.
-- Add the tel_phone and email fields to the employees_info_extended table.
ALTER TABLE employees_info_extended ADD COLUMNS (tel_phone STRING, email STRING);
− Configure Hive data encryption when creating a table.
Set the table format to RCFile (recommended) or SequenceFile, and the encryption algorithm to ARC4Codec. SequenceFile is a Hadoop-specific file format, and RCFile is a Hive file format with optimized columnar storage. For queries on tables with a large data size, RCFile provides better performance than SequenceFile.
set hive.exec.compress.output=true;
set hive.exec.compress.intermediate=true;
set hive.intermediate.compression.codec=org.apache.hadoop.io.encryption.arc4.ARC4Codec;
create table seq_Codec (key string, value string) stored as RCFile;
Loading Data
l Function
This section describes how to use Hive Query Language (HQL) to load data to the existing employees_info table. You can learn how to load data from a local file system and FusionInsight HD cluster. LOCAL is used to differentiate between local and non-local data sources.
Note: To perform the following operations on a cluster with the security service enabled, you must have the update permission for the database, as well as the owner permission and read/write permission for the files to be loaded.
If LOCAL is used in data loading statements, data is loaded from a local directory. In addition to the update permission for tables, you must have the read permission for the data path. It is also required that the data can be accessed by user omm on the active HiveServer.
If OVERWRITE is used in data loading statements, the existing data in a table will be overwritten by new data. If OVERWRITE does not exist, data will be added to the table.
l Example Codes
-- Load employee_info.txt from /opt/hive_examples_data/ in the local file system to the employees_info table.
LOAD DATA LOCAL INPATH '/opt/hive_examples_data/employee_info.txt' OVERWRITE INTO TABLE employees_info;
-- Load /user/hive_examples_data/employee_info.txt from HDFS to the employees_info table.
LOAD DATA INPATH '/user/hive_examples_data/employee_info.txt' OVERWRITE INTO TABLE employees_info;
Note: Loading data copies the data file to the directory of the specified table in HDFS.
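Step 1.3 of the development idea also loads contact information into employees_contact, but no statement for it appears in this guide. A hedged sketch follows, assuming a local text file with the same comma and ampersand delimiters; the file name and path are hypothetical:

```sql
-- Hypothetical: load local contact data into the employees_contact table.
LOAD DATA LOCAL INPATH '/opt/hive_examples_data/employee_contact.txt' OVERWRITE INTO TABLE employees_contact;
```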
l Extensions
None
Querying Data
l Function
This section describes how to use Hive Query Language (HQL) to query and analyze data. You can query and analyze data using the following methods:
− Use common features for SELECT query, such as JOIN.
− Load data to a specified partition.
− Use Hive-provided functions.
− Query and analyze data using user-defined functions (UDFs). For details about how to create and define UDFs, see section Case 2: UDFs.
Note: To perform the following operations on a cluster with the security service enabled, you must have the related permissions for tables.
l Example Codes
-- Query the contact information of employees whose salaries are paid in USD.
SELECT
a.name,
b.tel_phone,
b.email
FROM employees_info a JOIN employees_contact b ON (a.id = b.id)
WHERE usd_flag = 'D';
-- Query the IDs and names of employees who were hired in 2014, and load the query results to the partition with the hiring time of 2014 in the employees_info_extended table.
INSERT OVERWRITE TABLE employees_info_extended PARTITION (entrytime = '2014')
SELECT
a.id,
a.name,
a.usd_flag,
a.salary,
a.deductions,
a.address,
b.tel_phone,
b.email
FROM employees_info a JOIN employees_contact b ON (a.id = b.id)
WHERE a.entrytime = '2014';
-- Use the built-in Hive function COUNT() to count the number of records in the employees_info table.
SELECT COUNT(*) FROM employees_info;
-- Query information about employees whose email addresses end with "cn".
SELECT a.name, b.tel_phone FROM employees_info a JOIN employees_contact b ON (a.id = b.id) WHERE b.email LIKE '%cn';
l Extensions
− Configure intermediate Hive data encryption.
Set the table format to RCFile (recommended) or SequenceFile, and the encryption algorithm to ARC4Codec. SequenceFile is a Hadoop-specific file format, and RCFile is a Hive file format with optimized columnar storage. For queries on tables with a large data size, RCFile provides better performance than SequenceFile.
set hive.exec.compress.output=true;
set hive.exec.compress.intermediate=true;
set hive.intermediate.compression.codec=org.apache.hadoop.io.encryption.arc4.ARC4Codec;
− For details about UDFs, see section Case 2: UDFs.
1.1.4 Obtaining Example Codes
By FusionInsight Client
Decompress the FusionInsight client installation package and obtain the example projects jdbc-examples, jdbc-examples-multizk, python-examples, and HCatalog-examples, which are saved in the Hive sub-folder of the FusionInsight_Services_ClientConfig folder.
By Maven Project
Log in to Huawei DevCloud (https://codehub-cn-south-1.devcloud.huaweicloud.com/codehub/7076065/home) and download the code under components/hive to the local PC.
1.1.5 Example Program Guide
Function
This section describes how to use an example program to complete an analysis task. An example program can submit a task by using the following methods:
l Submitting a data analysis task by using JDBC interfaces
l Submitting a data analysis task by using Python
l Submitting a data analysis task by using ODBC interfaces
Example Codes
l Submit a data analysis task using the Hive Java database connectivity (JDBC) interface, that is, JDBCExample.java.
a. Read the property file of the HiveServer client. The hiveclient.properties file is saved in the conf directory of the JDBC example program provided by Hive.
Properties clientInfo = null;
String userdir = System.getProperty("user.dir") + File.separator + "conf" + File.separator;
InputStream fileInputStream = null;
try {
    clientInfo = new Properties();
    // hiveclient.properties is the client configuration file. If the multiple instances feature is used, the file must be replaced with the hiveclient.properties file on the instance client.
    // hiveclient.properties is located in the config directory of the directory where the instance client installation package is decompressed.
    String hiveclientProp = userdir + "hiveclient.properties";
    File propertiesFile = new File(hiveclientProp);
    fileInputStream = new FileInputStream(propertiesFile);
    clientInfo.load(fileInputStream);
} catch (Exception e) {
    throw new IOException(e);
} finally {
    if (fileInputStream != null) {
        fileInputStream.close();
        fileInputStream = null;
    }
}
b. Obtain the IP address and port list of ZooKeeper, the cluster authentication mode, the SASL configuration of HiveServers, node names of HiveServers in ZooKeeper, the discovery mode from the client to the server, and the principal server process for user authentication. You can read all these configurations from the hiveclient.properties file.
// The format of zkQuorum is "xxx.xxx.xxx.xxx:24002,xxx.xxx.xxx.xxx:24002,xxx.xxx.xxx.xxx:24002".
// xxx.xxx.xxx.xxx is the IP address of the node where ZooKeeper resides. The default port is 24002.
zkQuorum = clientInfo.getProperty("zk.quorum");
auth = clientInfo.getProperty("auth");
sasl_qop = clientInfo.getProperty("sasl.qop");
zooKeeperNamespace = clientInfo.getProperty("zooKeeperNamespace");
serviceDiscoveryMode = clientInfo.getProperty("serviceDiscoveryMode");
principal = clientInfo.getProperty("principal");
c. In security mode, the Kerberos user and keytab file path are required for login authentication. For details about how to obtain USER_NAME, USER_KEYTAB_FILE, and KRB5_FILE, see 1.1.6.1 Running JDBC and Viewing Results.
// Set the userName of the new user.
USER_NAME = "xxx";
// Set the locations of the keytab and krb5 files of the client.
String userdir = System.getProperty("user.dir") + File.separator + "conf" + File.separator;
USER_KEYTAB_FILE = userdir + "user.keytab";
KRB5_FILE = userdir + "krb5.conf";
d. Define HQL. HQL must be a single statement and cannot contain semicolons (;).
// Define HQL. HQL cannot contain ";".
String[] sqls = {"CREATE TABLE IF NOT EXISTS employees_info(id INT,name STRING)",
    "SELECT COUNT(*) FROM employees_info",
    "DROP TABLE employees_info"};
e. Build JDBC URL.
Note: You can also implement pre-authentication without providing the account and keytab file path. For details, see JDBC code example 2 in Development Specifications > Hive > Examples. If the IBM JDK is used to run Hive applications, the pre-authentication in JDBC code example 2 must be implemented.
The following is an example of the JDBC URL composed of code snippets:
jdbc:hive2://xxx.xxx.xxx.xxx:24002,xxx.xxx.xxx.xxx:24002,xxx.xxx.xxx.xxx:24002/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;sasl.qop=auth-conf;auth=KERBEROS;principal=hive/hadoop.hadoop.com@HADOOP.COM;
// Concatenate the JDBC URL.
StringBuilder sBuilder = new StringBuilder("jdbc:hive2://").append(zkQuorum).append("/");
if ("KERBEROS".equalsIgnoreCase(auth)) {
    sBuilder.append(";serviceDiscoveryMode=")
            .append(serviceDiscoveryMode)
            .append(";zooKeeperNamespace=")
            .append(zooKeeperNamespace)
            .append(";sasl.qop=")
            .append(sasl_qop)
            .append(";auth=")
            .append(auth)
            .append(";principal=")
            .append(principal)
            .append(";user.principal=")
            .append(USER_NAME)
            .append(";user.keytab=")
            .append(USER_KEYTAB_FILE)
            .append(";");
} else {
    // Normal mode
    sBuilder.append(";serviceDiscoveryMode=")
            .append(serviceDiscoveryMode)
            .append(";zooKeeperNamespace=")
            .append(zooKeeperNamespace)
            .append(";auth=none;");
}
String url = sBuilder.toString();
f. Load the Hive JDBC driver.
// Load the Hive JDBC driver.
Class.forName(HIVE_DRIVER);
g. Obtain the JDBC connection, confirm the HQL type (DDL or DML), call the corresponding interface to run the HQL statement, output the queried column names and results to the console, and close the JDBC connection.
Connection connection = null;
try {
    // Obtain the JDBC connection.
    // If the normal mode is used, the second parameter must be set to a correct username; otherwise, the anonymous user is used for login.
    connection = DriverManager.getConnection(url, "", "");
    // Create a table.
    // To import data to a table after the table is created, use the LOAD statement, for example, to import data from HDFS:
    // load data inpath '/tmp/employees.txt' overwrite into table employees_info;
    execDDL(connection, sqls[0]);
    System.out.println("Create table success!");
    // Query
    execDML(connection, sqls[1]);
    // Delete the table.
    execDDL(connection, sqls[2]);
    System.out.println("Delete table success!");
} finally {
    // Close the JDBC connection.
    if (null != connection) {
        connection.close();
    }
}

public static void execDDL(Connection connection, String sql) throws SQLException {
    PreparedStatement statement = null;
    try {
        statement = connection.prepareStatement(sql);
        statement.execute();
    } finally {
        if (null != statement) {
            statement.close();
        }
    }
}

public static void execDML(Connection connection, String sql) throws SQLException {
    PreparedStatement statement = null;
    ResultSet resultSet = null;
    ResultSetMetaData resultMetaData = null;
    try {
        // Run the HQL statement.
        statement = connection.prepareStatement(sql);
        resultSet = statement.executeQuery();
        // Output the queried column names to the console.
        resultMetaData = resultSet.getMetaData();
        int columnCount = resultMetaData.getColumnCount();
        for (int i = 1; i <= columnCount; i++) {
            System.out.print(resultMetaData.getColumnLabel(i) + '\t');
        }
        System.out.println();
        // Output the results to the console.
        while (resultSet.next()) {
            for (int i = 1; i <= columnCount; i++) {
                System.out.print(resultSet.getString(i) + '\t');
            }
            System.out.println();
        }
    } finally {
        if (null != resultSet) {
            resultSet.close();
        }
        if (null != statement) {
            statement.close();
        }
    }
}
l Submit a data analysis task using the Python interface, that is, python-examples/pyCLI_sec.py. The cluster that the example program connects to uses the security mode. Before running the example program, run the kinit command to authenticate as a Kerberos user with the required rights.
a. Import the HAConnection class.
from pyhs2.haconnection import HAConnection
b. Declare the HiveServer IP address list. In this example, hosts indicates the HiveServer nodes, and xxx.xxx.xxx.xxx indicates the service plane IP address.
hosts = ["xxx.xxx.xxx.xxx", "xxx.xxx.xxx.xxx"]
Note:
l If the HiveServer instance is migrated, the original sample program becomes invalid. Update the HiveServer IP address used in the sample program after the migration.
l If multiple Hive instances are used, update the IP address based on the address of the instance that is actually connected.
c. Configure the Kerberos host name and service name. In this example, the Kerberos host name is hadoop and the service name is hive.
conf = {"krb_host": "hadoop.hadoop.com", "krb_service": "hive"}
Note: If multiple Hive instances are used, change the service name based on the cluster that is actually connected. For example, if the Hive1 instance is connected, change the service name to hive1.
d. Create a connection, run the HQL statement, and output the queried column names and results to the console.
try:
    with HAConnection(hosts = hosts,
                      port = 21066,
                      authMechanism = "KERBEROS",
                      configuration = conf) as haConn:
        with haConn.getConnection() as conn:
            with conn.cursor() as cur:
                # Show databases
                print cur.getdatabases()
                # Execute a query
                cur.execute("show tables")
                # Return column info from the query
                print cur.getschema()
                # Fetch table results
                for i in cur.fetch():
                    print i
except Exception, e:
    print e
Note: If multiple Hive instances are used, modify hosts according to the description in b and change the port to the actual port number. The default ports of Hive to Hive4 are 21066 to 21070, respectively.
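As a quick sanity check of the port note above, the following sketch maps an instance name to its default Thrift port. It assumes the stated defaults (Hive → 21066 through Hive4 → 21070); always confirm hive.server2.thrift.port on the actual cluster:

```python
def default_thrift_port(instance):
    """Return the default HiveServer2 Thrift port for a Hive instance name.

    Assumes the documented defaults: Hive -> 21066, Hive1 -> 21067, ...,
    Hive4 -> 21070. Verify hive.server2.thrift.port in a real deployment.
    """
    suffix = instance[len("Hive"):]  # "" for Hive, "1".."4" for Hive1..Hive4
    return 21066 + (int(suffix) if suffix else 0)

print(default_thrift_port("Hive1"))  # 21067, matching the note above
```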
l Submit a data analysis task using Hive ODBC interfaces. For details, see hiveodbcexample.c in the example program.
The value of DSN in the code must be the same as the configured data source name, for example, hiveodbcs.
SQLCHAR* ConnStrIn = (SQLCHAR *)"DSN=hiveodbcs";
1.1.6 Commissioning Applications
1.1.6.1 Running JDBC and Viewing Results
Running JDBC in CLI Mode
Step 1 Right-click the Eclipse jdbc-examples project, and choose Export from the shortcut menu, as shown in Figure 1-1.
Note: The preceding project name is for reference only. Use the actual project name in an actual scenario.
Figure 1-1 Choosing Export from the Eclipse shortcut menu
Step 2 In the displayed export panel, select JAR file, and click Next, as shown in Figure 1-2.
Figure 1-2 Exporting JAR files
Step 3 Select the src directory, select the full path for storing exported files, and click Finish, as shown in Figure 1-3.
Figure 1-3 Selecting the full path for storing exported files
Step 4 Create a directory on Windows or Linux as the running directory, such as D:\jdbc_example (Windows) or /opt/jdbc_example (Linux), and create the conf and lib sub-directories in it.
Copy jdbc-examples.jar exported in Step 3, together with all JAR files from the lib directory of the jdbc-examples project, to the lib subdirectory.
Copy all files from the conf directory of jdbc-examples to the conf subdirectory.
Step 5 In Windows, run the following command:
cd /d d:\jdbc_example
java -cp .;D:\jdbc_example\conf;D:\jdbc_example\lib\* com.huawei.bigdata.hive.example.JDBCExample
In Linux, run the following command:
chmod +x /opt/jdbc_example -R
cd /opt/jdbc_example
java -cp .:/opt/jdbc_example/conf:/opt/jdbc_example/lib/* com.huawei.bigdata.hive.example.JDBCExample
Step 6 In the CLI, view the HQL query results in the example codes.
The following information is displayed if the example project is successful in Windows:
Create table success!
_c0
0
Delete table success!
The following information is displayed if the example project is successful in Linux:
Create table success!
_c0
0
Delete table success!
----End
Running JDBC in Eclipse Mode
Step 1 Right-click the JDBCExample class in the Eclipse jdbc-examples project, and choose Run As > Java Application from the shortcut menu.
Step 2 In the Eclipse output window, view the HQL query results in the example codes.
Create table success!
_c0
0
Delete table success!
----End
1.1.6.2 Running HCatalog and Viewing Results
Running HCatalog Example Projects
Step 1 Right-click the Eclipse HCatalog-examples project, and choose Export from the shortcut menu, as shown in the following figure.
Step 2 In the displayed export panel, select JAR file, and click Next, as shown in the following figure.
Step 3 Select the src directory. In JAR file, select the full path of the files to be exported, set the name of the JAR package to be exported to hcat.jar, and click Finish, as shown in the following figure.
Step 4 Upload the hcat.jar package generated in the previous step to a specified path on Linux, for example, /opt/hive_client, denoted as $HCAT_CLIENT, and ensure that the Hive and YARN clients have been installed.
Step 5 Run the following command to configure environment parameters (client installation path /opt/client is used as an example):
export HADOOP_HOME=/opt/client/HDFS/hadoop
export HIVE_HOME=/opt/client/Hive/Beeline
export HCAT_HOME=$HIVE_HOME/../HCatalog
export LIB_JARS=$HCAT_HOME/lib/hive-hcatalog-core-1.3.0.jar,$HCAT_HOME/lib/hive-metastore-1.3.0.jar,$HIVE_HOME/lib/hive-exec-1.3.0.jar,$HCAT_HOME/lib/libfb303-0.9.3.jar,$HCAT_HOME/lib/slf4j-api-1.7.5.jar,$HCAT_HOME/lib/antlr-2.7.7.jar,$HCAT_HOME/lib/jdo-api-3.0.1.jar,$HCAT_HOME/lib/antlr-runtime-3.4.jar,$HCAT_HOME/lib/datanucleus-api-jdo-3.2.6.jar,$HCAT_HOME/lib/datanucleus-core-3.2.10.jar,$HCAT_HOME/lib/datanucleus-rdbms-3.2.9.jar
export HADOOP_CLASSPATH=$HCAT_HOME/lib/hive-hcatalog-core-1.3.0.jar:$HCAT_HOME/lib/hive-metastore-1.3.0.jar:$HIVE_HOME/lib/hive-exec-1.3.0.jar:$HCAT_HOME/lib/libfb303-0.9.3.jar:$HADOOP_HOME/etc/hadoop:$HCAT_HOME/conf:$HCAT_HOME/lib/slf4j-api-1.7.5.jar:$HCAT_HOME/lib/antlr-2.7.7.jar:$HCAT_HOME/lib/jdo-api-3.0.1.jar:$HCAT_HOME/lib/antlr-runtime-3.4.jar:$HCAT_HOME/lib/datanucleus-api-jdo-3.2.6.jar:$HCAT_HOME/lib/datanucleus-core-3.2.10.jar:$HCAT_HOME/lib/datanucleus-rdbms-3.2.9.jar
Note: If the multi-instance function is enabled for Hive, adjust the export HIVE_HOME setting accordingly. For example, to use the Hive1 instance, ensure that the Hive1 client has been installed, and change export HIVE_HOME to /opt/client/Hive1/Beeline.
Step 6 Prepare for the running:
1. Use the Hive client to create source table t1 in beeline: create table t1(col1 int);
Insert the following data into t1:
+----------+--+
| t1.col1  |
+----------+--+
| 1        |
| 1        |
| 1        |
| 2        |
| 2        |
| 3        |
+----------+--+
2. Create destination table t2: create table t2(col1 int,col2 int);
Step 7 Use the Yarn client to submit tasks:
yarn --config $HADOOP_HOME/etc/hadoop jar $HCAT_CLIENT/hcat.jar com.huawei.bigdata.hcatalog.example.HCatalogExample -libjars $LIB_JARS t1 t2
Step 8 View the running result. The data in t2 is as follows:
0: jdbc:hive2://192.168.1.18:24002,192.168.1.> select * from t2;
+----------+----------+--+
| t2.col1  | t2.col2  |
+----------+----------+--+
| 1        | 3        |
| 2        | 2        |
| 3        | 1        |
+----------+----------+--+
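The result above indicates that the example job groups t1.col1 and counts occurrences. The actual HCatalogExample is a MapReduce program; the following is only a minimal Python model of that aggregation logic, useful for checking the expected contents of t2:

```python
from collections import Counter

def aggregate(col1_values):
    """Group values of t1.col1 and count occurrences, mirroring the
    (t2.col1, t2.col2) pairs shown in the result table above."""
    return sorted(Counter(col1_values).items())

# The data inserted into t1 in Step 6.
print(aggregate([1, 1, 1, 2, 2, 3]))  # [(1, 3), (2, 2), (3, 1)]
```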
----End
1.1.6.3 Running Python and Viewing Results
Running Python in CLI Mode
Step 1 Grant the execution permission for the scripts in the python-examples folder. Run the following command in the CLI:
chmod +x python-examples -R
Step 2 For security cluster connections, enter the service plane IP address of the node where HiveServer is installed in the hosts array of python-examples/pyCLI_sec.py.
Note: When multiple instances are enabled, the hosts array in python-examples/pyCLI_sec.py must be modified, and the port must be set according to the instance that is used. The port (hive.server2.thrift.port) is the port on which Hive provides the Thrift service. For example, if the Hive1 instance is used, set port to 21067. (The default ports of Hive to Hive4 are 21066 to 21070, respectively.)
Step 3 Run the kinit command to obtain the cache for Kerberos authentication.
Use the developer account created in the Preparing the Developer Account section to run the client program:
kinit -kt <keytab storage path> <username>
cd python-examples
python pyCLI_sec.py
Step 4 In the CLI, view the HQL query results in the example codes. For example:
[['default', '']]
[{'comment': 'from deserializer', 'columnName': 'tab_name', 'type':
'STRING_TYPE'}]
['xx']
----End
1.1.6.4 Running ODBC and Viewing Results
1.1.6.4.1 Windows Environment
Running ODBC Example Projects
Step 1 Open the HiveODBCExample folder under the downloaded ODBC driver package.
Step 2 Double-click HiveODBCExample.sln and open the HiveODBCExample project in Visual Studio 2012.
Step 3 On the navigation bar, select the HiveODBCExample project and press F5 to compile and run the example code.
Step 4 After the example code is successfully compiled and executed, all database names in the connected Hive are printed.
Note: Because the program runs quickly, you can add breakpoints in the main function of hiveodbcexample.c to view the operation result.
----End
1.1.6.4.2 Linux Environment
Using the ISQL Tool of unixODBC for Testing
The Hive ODBC driver does not perform authentication. You must manually run the kinit command to log in.
Step 1 Log in as the prepared developer account, for example, developuser:
kinit developuser
Step 2 Download the unixODBC tool from the official website and install it to a directory, such as /opt/odbc/. Run the isql command:
export unixODBC_PATH=/opt/odbc/unixODBC/
export LD_LIBRARY_PATH=${unixODBC_PATH}/lib/:$LD_LIBRARY_PATH
export LD_PRELOAD=/usr/lib64/libodbchive.so
cd ${unixODBC_PATH}/bin
./isql -v Hive
${unixODBC_PATH} indicates the installation directory of unixODBC.
Step 3 If the following information is displayed, the login is successful:
+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL>
Step 4 Use the tool for Hive operations, for example, show tables. The result is as follows:
SQL> show tables;
+---------------+
| tab_name      |
+---------------+
| encode_test   |
| encode_test2  |
| hbase_t       |
| sales         |
| things        |
+---------------+
Note: The tinyint and boolean data types may not be displayed properly because the isql tool does not fully support them.
----End
This post was last edited by chz at 2018-08-29 09:56.
