Presto 0.100 Documentation

4.2. Hive Connector

The Hive connector allows querying data stored in a Hive data warehouse. Hive is a combination of three components:

  • Data files in varying formats, typically stored in the Hadoop Distributed File System (HDFS) or in Amazon S3.
  • Metadata about how the data files are mapped to schemas and tables, stored in a database and accessed via the Hive metastore service.
  • A query language called HiveQL, executed on a distributed computing framework such as MapReduce.

Presto only uses the first two components: the data and the metadata. It does not use HiveQL or any part of Hive’s execution environment.

Configuration

Presto includes Hive connectors for the following versions of Hadoop:

  • hive-hadoop1: Apache Hadoop 1.x
  • hive-hadoop2: Apache Hadoop 2.x
  • hive-cdh4: Cloudera CDH 4
  • hive-cdh5: Cloudera CDH 5

Create etc/catalog/hive.properties with the following contents to mount the hive-cdh4 connector as the hive catalog, replacing hive-cdh4 with the proper connector for your version of Hadoop and example.net:9083 with the host and port of your Hive metastore Thrift service:

connector.name=hive-cdh4
hive.metastore.uri=thrift://example.net:9083

Multiple Hive Clusters

You can have as many catalogs as you need, so if you have additional Hive clusters, simply add another properties file to etc/catalog with a different name (making sure it ends in .properties). For example, if you name the property file sales.properties, Presto will create a catalog named sales using the configured connector. If you are connecting to more than one Hive metastore, you can create any number of properties files configuring multiple instances of the Hive connector, as in the example below.
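For instance, a second catalog for a hypothetical sales cluster could be defined in etc/catalog/sales.properties; the metastore host sales-metastore.example.net below is only a placeholder for your own environment:

connector.name=hive-cdh4
hive.metastore.uri=thrift://sales-metastore.example.net:9083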

HDFS Configuration

If your HDFS cluster is federated, or if you access HDFS with a non-standard client configuration, add the following property to reference your HDFS config files:

hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml

In most cases, Presto configures the HDFS client automatically and no additional configuration files are needed. Only specify extra configuration files if they are truly required for your setup, and reduce them to the minimum set of required properties, as unnecessary properties may cause problems.
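For reference, a catalog file that needs the extra HDFS configuration might look like the sketch below, which simply combines the properties already shown above; the connector name, metastore address, and file paths are example values to replace with your own:

connector.name=hive-cdh4
hive.metastore.uri=thrift://example.net:9083
hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml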

Configuration Properties

 Property Name               | Description                                                      | Example
-----------------------------+------------------------------------------------------------------+-------------------------
 hive.metastore.uri          | The URI of the Hive metastore to connect to using the Thrift    | thrift://192.0.2.3:9083
                             | protocol. This property is required.                             |
 hive.config.resources       | An optional comma-separated list of HDFS configuration files.   | /etc/hdfs-site.xml
                             | These files must exist on the machines running Presto. Only     |
                             | specify this if absolutely necessary to access HDFS.            |
 hive.storage-format         | The default file format used when creating new tables.          | RCBINARY
 hive.force-local-scheduling | Force splits to be scheduled on the same node as the Hadoop     | true
                             | DataNode process serving the split data. This is useful for     |
                             | installations where Presto is collocated with every DataNode.   |
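As an illustration, the optional properties from the table can be set alongside the required ones in a single catalog file; the values shown are just one possible configuration, not a recommendation:

connector.name=hive-cdh4
hive.metastore.uri=thrift://192.0.2.3:9083
hive.storage-format=RCBINARY
hive.force-local-scheduling=true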

Querying Hive Tables

The following table is an example Hive table from the Hive Tutorial. It can be created in Hive (not in Presto) using the following Hive CREATE TABLE command:

hive> CREATE TABLE page_view (
    >   viewTime INT,
    >   userid BIGINT,
    >   page_url STRING,
    >   referrer_url STRING,
    >   ip STRING COMMENT 'IP Address of the User')
    > COMMENT 'This is the page view table'
    > PARTITIONED BY (dt STRING, country STRING)
    > STORED AS SEQUENCEFILE;
OK
Time taken: 3.644 seconds

Assuming that this table was created in the web schema in Hive, this table can be described in Presto:

DESCRIBE hive.web.page_view;
    Column    |  Type   | Null | Partition Key |        Comment
--------------+---------+------+---------------+------------------------
 viewtime     | bigint  | true | false         |
 userid       | bigint  | true | false         |
 page_url     | varchar | true | false         |
 referrer_url | varchar | true | false         |
 ip           | varchar | true | false         | IP Address of the User
 dt           | varchar | true | true          |
 country      | varchar | true | true          |
(7 rows)

This table can then be queried in Presto:

SELECT * FROM hive.web.page_view;
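Because dt and country are partition keys, filtering on them restricts the partitions that are read; the date and country values below are purely illustrative:

SELECT * FROM hive.web.page_view
WHERE dt = '2015-03-01'
  AND country = 'US';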