Integrating Atlas 2.1.0 with CDH 6.3.2
Download the Atlas source package from the official site: http://atlas.apache.org/2.1.0/index.html#/Downloads
I. Building Atlas from Source
1. Modify the pom file
To build against CDH 6.3.2, add the following to the repositories section of the top-level pom.xml:
<repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
    <releases>
        <enabled>true</enabled>
    </releases>
    <snapshots>
        <enabled>false</enabled>
    </snapshots>
</repository>
Then update the version properties to the CDH equivalents. Pay attention to the hyphen in the version strings: if you copy them straight from the CDH docs, the '-' can come through as a dash-like character that breaks the build, so make sure it is a plain ASCII hyphen.
<lucene-solr.version>7.4.0-cdh6.3.2</lucene-solr.version>
<hadoop.version>3.0.0-cdh6.3.2</hadoop.version>
<hbase.version>2.1.0-cdh6.3.2</hbase.version>
<solr.version>7.4.0-cdh6.3.2</solr.version>
<hive.version>2.1.1-cdh6.3.2</hive.version>
<kafka.version>2.2.1-cdh6.3.2</kafka.version>
<kafka.scala.binary.version>2.11</kafka.scala.binary.version>
<calcite.version>1.16.0</calcite.version>
<zookeeper.version>3.4.5-cdh6.3.2</zookeeper.version>
<falcon.version>0.8</falcon.version>
<sqoop.version>1.4.7-cdh6.3.2</sqoop.version>
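If you prefer to script these property changes, here is a minimal sed sketch; it assumes you run it from the unpacked source root and that each property appears exactly once in the top-level pom.xml (falcon/calcite keep their stock values):
cd apache-atlas-sources-2.1.0   # assumed source root directory name
for kv in "lucene-solr.version=7.4.0-cdh6.3.2" "hadoop.version=3.0.0-cdh6.3.2" \
          "hbase.version=2.1.0-cdh6.3.2" "solr.version=7.4.0-cdh6.3.2" \
          "hive.version=2.1.1-cdh6.3.2" "kafka.version=2.2.1-cdh6.3.2" \
          "zookeeper.version=3.4.5-cdh6.3.2" "sqoop.version=1.4.7-cdh6.3.2"; do
  key="${kv%%=*}"; val="${kv#*=}"
  # rewrite <key>old</key> to <key>new</key> in place
  sed -i "s|<${key}>.*</${key}>|<${key}>${val}</${key}>|" pom.xml
done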
2. Patch the Atlas source for Hive 2.1.1 compatibility
Atlas 2.1.0 targets Hive 3.1 by default; without these changes the Hive hook fails at runtime.
Module to modify: atlas-release-2.1.0-rc3/addons/hive-bridge
① src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java, line 577
String catalogName = hiveDB.getCatalogName() != null ? hiveDB.getCatalogName().toLowerCase() : null;
change to
String catalogName = null;
② src/main/java/org/apache/atlas/hive/hook/AtlasHiveHookContext.java, line 81
this.metastoreHandler = (listenerEvent != null) ? metastoreEvent.getIHMSHandler() : null;
change to
this.metastoreHandler = null;
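Before building, you can sanity-check that both patches landed (a quick grep sketch from the source root; line numbers can drift between release tags):
grep -n "String catalogName = null;" addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java
grep -n "this.metastoreHandler = null;" addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/AtlasHiveHookContext.java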
3. Build
The JDK used for the build must match the JDK version in production; otherwise you will hit errors at runtime.
mvn clean -DskipTests package -Pdist
When the build finishes, the artifacts are under distro/target (here /home/software/atlas/distro/target); several tarballs are produced.
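To list them (the exact set depends on the build profiles; -Pdist typically yields the bin and server packages plus per-component hook tarballs):
ls -lh distro/target/*.tar.gz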
II. Installing Atlas
1. Unpack
Extract apache-atlas-2.1.0-bin.tar.gz into the installation directory. Do not use the server package that the official docs mention; it does not contain the hook files.
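For example (extraction target assumed; the later steps in this guide use /home/software/atlas as the install root):
tar -zxvf apache-atlas-2.1.0-bin.tar.gz -C /home/software/
mv /home/software/apache-atlas-2.1.0 /home/software/atlas   # extracted directory name may vary by build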
2. Edit the configuration file atlas-env.sh
export HBASE_CONF_DIR=/etc/hbase/conf
export ATLAS_SERVER_HEAP="-Xms15360m -Xmx15360m -XX:MaxNewSize=5120m -XX:MetaspaceSize=100M -XX:MaxMetaspaceSize=512m"
export ATLAS_SERVER_OPTS="-server -XX:SoftRefLRUPolicyMSPerMB=0 -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+PrintTenuringDistribution -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=dumps/atlas_server.hprof -Xloggc:logs/gc-worker.log -verbose:gc -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1m -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCTimeStamps"
export MANAGE_LOCAL_HBASE=false
export MANAGE_LOCAL_SOLR=false
export MANAGE_EMBEDDED_CASSANDRA=false
export MANAGE_LOCAL_ELASTICSEARCH=false
3. Edit the configuration file atlas-application.properties
Pay close attention here: the HBase, Kafka, Solr, ZooKeeper, and related settings must be updated to match your cluster.
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
######### Graph Database Configs #########
# Graph Database
#Configures the graph database to use. Defaults to JanusGraph
#atlas.graphdb.backend=org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase
# Graph Storage
# Set atlas.graph.storage.backend to the correct value for your desired storage
# backend. Possible values:
#
# hbase
# cassandra
# embeddedcassandra - Should only be set by building Atlas with -Pdist,embedded-cassandra-solr
# berkeleyje
#
# See the configuration documentation for more information about configuring the various storage backends.
#
atlas.graph.storage.backend=hbase
atlas.graph.storage.hbase.table=apache_atlas_janus
#Hbase
#For standalone mode , specify localhost
#for distributed mode, specify zookeeper quorum here
atlas.graph.storage.hostname=hadoop-101:2181,hadoop-102:2181,hadoop-103:2181
atlas.graph.storage.hbase.regions-per-server=1
atlas.graph.storage.lock.wait-time=10000
#In order to use Cassandra as a backend, comment out the hbase specific properties above, and uncomment the
#the following properties
#atlas.graph.storage.clustername=
#atlas.graph.storage.port=
# Gremlin Query Optimizer
#
# Enables rewriting gremlin queries to maximize performance. This flag is provided as
# a possible way to work around any defects that are found in the optimizer until they
# are resolved.
#atlas.query.gremlinOptimizerEnabled=true
# Delete handler
#
# This allows the default behavior of doing "soft" deletes to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1 - all deletes are "soft" deletes
# org.apache.atlas.repository.store.graph.v1.HardDeleteHandlerV1 - all deletes are "hard" deletes
#
#atlas.DeleteHandlerV1.impl=org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1
# Entity audit repository
#
# This allows the default behavior of logging entity changes to hbase to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.audit.HBaseBasedAuditRepository - log entity changes to hbase
# org.apache.atlas.repository.audit.CassandraBasedAuditRepository - log entity changes to cassandra
# org.apache.atlas.repository.audit.NoopEntityAuditRepository - disable the audit repository
#
#atlas.EntityAuditRepository.impl=org.apache.atlas.repository.audit.HBaseBasedAuditRepository
# if Cassandra is used as a backend for audit from the above property, uncomment and set the following
# properties appropriately. If using the embedded cassandra profile, these properties can remain
# commented out.
# atlas.EntityAuditRepository.keyspace=atlas_audit
# atlas.EntityAuditRepository.replicationFactor=1
# Graph Search Index
atlas.graph.index.search.backend=solr
#Solr
#Solr cloud mode properties
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=hadoop-101:2181/solr,hadoop-102:2181/solr,hadoop-103:2181/solr
atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
atlas.graph.index.search.solr.zookeeper-session-timeout=60000
atlas.graph.index.search.solr.wait-searcher=true
#Solr http mode properties
#atlas.graph.index.search.solr.mode=http
#atlas.graph.index.search.solr.http-urls=http://localhost:8983/solr
# ElasticSearch support (Tech Preview)
# Comment out above solr configuration, and uncomment the following two lines. Additionally, make sure the
# hostname field is set to a comma delimited set of elasticsearch master nodes, or an ELB that fronts the masters.
#
# Elasticsearch does not provide authentication out of the box, but does provide an option with the X-Pack product
# https://www.elastic.co/products/x-pack/security
#
# Alternatively, the JanusGraph documentation provides some tips on how to secure Elasticsearch without additional
# plugins: https://docs.janusgraph.org/latest/elasticsearch.html
#atlas.graph.index.search.hostname=localhost
#atlas.graph.index.search.elasticsearch.client-only=false
# Solr-specific configuration property
atlas.graph.index.search.max-result-set-size=150
######### Import Configs #########
#atlas.import.temp.directory=/temp/import
######### Notification Configs #########
atlas.notification.embedded=false
atlas.kafka.data=${sys:atlas.home}/data/kafka
atlas.kafka.zookeeper.connect=hadoop-101:2181,hadoop-102:2181,hadoop-103:2181
atlas.kafka.bootstrap.servers=hadoop-101:9092,hadoop-102:9092,hadoop-103:9092
atlas.kafka.zookeeper.session.timeout.ms=60000
atlas.kafka.zookeeper.connection.timeout.ms=60000
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.hook.group.id=atlas
atlas.kafka.enable.auto.commit=false
atlas.kafka.auto.offset.reset=earliest
atlas.kafka.session.timeout.ms=30000
atlas.kafka.offsets.topic.replication.factor=1
atlas.kafka.poll.timeout.ms=1000
atlas.notification.create.topics=true
atlas.notification.replicas=1
atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES
atlas.notification.log.failed.messages=true
atlas.notification.consumer.retry.interval=500
atlas.notification.hook.retry.interval=1000
# Enable for Kerberized Kafka clusters
#atlas.notification.kafka.service.principal=kafka/_HOST@EXAMPLE.COM
#atlas.notification.kafka.keytab.location=/etc/security/keytabs/kafka.service.keytab
## Server port configuration
atlas.server.http.port=21000
#atlas.server.https.port=21443
######### Security Properties #########
# SSL config
atlas.enableTLS=false
#truststore.file=/path/to/truststore.jks
#cert.stores.credential.provider.path=jceks://file/path/to/credentialstore.jceks
#following only required for 2-way SSL
#keystore.file=/path/to/keystore.jks
# Authentication config
atlas.authentication.method.kerberos=false
atlas.authentication.method.file=true
#### ldap.type= LDAP or AD
atlas.authentication.method.ldap.type=none
#### user credentials file
atlas.authentication.method.file.filename=${sys:atlas.home}/conf/users-credentials.properties
### groups from UGI
#atlas.authentication.method.ldap.ugi-groups=true
######## LDAP properties #########
#atlas.authentication.method.ldap.url=ldap://<ldap server url>:389
#atlas.authentication.method.ldap.userDNpattern=uid={0},ou=People,dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchBase=dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchFilter=(member=uid={0},ou=Users,dc=example,dc=com)
#atlas.authentication.method.ldap.groupRoleAttribute=cn
#atlas.authentication.method.ldap.base.dn=dc=example,dc=com
#atlas.authentication.method.ldap.bind.dn=cn=Manager,dc=example,dc=com
#atlas.authentication.method.ldap.bind.password=<password>
#atlas.authentication.method.ldap.referral=ignore
#atlas.authentication.method.ldap.user.searchfilter=(uid={0})
#atlas.authentication.method.ldap.default.role=<default role>
######### Active directory properties #######
#atlas.authentication.method.ldap.ad.domain=example.com
#atlas.authentication.method.ldap.ad.url=ldap://<AD server url>:389
#atlas.authentication.method.ldap.ad.base.dn=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.bind.dn=CN=team,CN=Users,DC=example,DC=com
#atlas.authentication.method.ldap.ad.bind.password=<password>
#atlas.authentication.method.ldap.ad.referral=ignore
#atlas.authentication.method.ldap.ad.user.searchfilter=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.default.role=<default role>
######### JAAS Configuration ########
#atlas.jaas.KafkaClient.loginModuleName = com.sun.security.auth.module.Krb5LoginModule
#atlas.jaas.KafkaClient.loginModuleControlFlag = required
#atlas.jaas.KafkaClient.option.useKeyTab = true
#atlas.jaas.KafkaClient.option.storeKey = true
#atlas.jaas.KafkaClient.option.serviceName = kafka
#atlas.jaas.KafkaClient.option.keyTab = /etc/security/keytabs/atlas.service.keytab
#atlas.jaas.KafkaClient.option.principal = atlas/_HOST@EXAMPLE.COM
######### Server Properties #########
atlas.rest.address=http://localhost:21000
# If enabled and set to true, this will run setup steps when the server starts
atlas.server.run.setup.on.start=false
######### Entity Audit Configs #########
atlas.audit.hbase.tablename=apache_atlas_entity_audit
atlas.audit.zookeeper.session.timeout.ms=1000
atlas.audit.hbase.zookeeper.quorum=hadoop-101:2181,hadoop-102:2181,hadoop-103:2181
######### High Availability Configuration ########
atlas.server.ha.enabled=false
#### Enabled the configs below as per need if HA is enabled #####
#atlas.server.ids=id1
#atlas.server.address.id1=localhost:21000
#atlas.server.ha.zookeeper.connect=localhost:2181
#atlas.server.ha.zookeeper.retry.sleeptime.ms=1000
#atlas.server.ha.zookeeper.num.retries=3
#atlas.server.ha.zookeeper.session.timeout.ms=20000
## if ACLs need to be set on the created nodes, uncomment these lines and set the values ##
#atlas.server.ha.zookeeper.acl=<scheme>:<id>
#atlas.server.ha.zookeeper.auth=<scheme>:<authinfo>
######### Atlas Authorization #########
atlas.authorizer.impl=simple
atlas.authorizer.simple.authz.policy.file=atlas-simple-authz-policy.json
######### Type Cache Implementation ########
# A type cache class which implements
# org.apache.atlas.typesystem.types.cache.TypeCache.
# The default implementation is org.apache.atlas.typesystem.types.cache.DefaultTypeCache which is a local in-memory type cache.
#atlas.TypeCache.impl=
######### Performance Configs #########
#atlas.graph.storage.lock.retries=10
#atlas.graph.storage.cache.db-cache-time=120000
######### CSRF Configs #########
atlas.rest-csrf.enabled=true
atlas.rest-csrf.browser-useragents-regex=^Mozilla.*,^Opera.*,^Chrome.*
atlas.rest-csrf.methods-to-ignore=GET,OPTIONS,HEAD,TRACE
atlas.rest-csrf.custom-header=X-XSRF-HEADER
############ KNOX Configs ################
#atlas.sso.knox.browser.useragent=Mozilla,Chrome,Opera
#atlas.sso.knox.enabled=true
#atlas.sso.knox.providerurl=https://<knox gateway ip>:8443/gateway/knoxsso/api/v1/websso
#atlas.sso.knox.publicKey=
############ Atlas Metric/Stats configs ################
# Format: atlas.metric.query.<key>.<name>
atlas.metric.query.cache.ttlInSecs=900
#atlas.metric.query.general.typeCount=
#atlas.metric.query.general.typeUnusedCount=
#atlas.metric.query.general.entityCount=
#atlas.metric.query.general.tagCount=
#atlas.metric.query.general.entityDeleted=
#
#atlas.metric.query.entity.typeEntities=
#atlas.metric.query.entity.entityTagged=
#
#atlas.metric.query.tags.entityTags=
######### Compiled Query Cache Configuration #########
# The size of the compiled query cache. Older queries will be evicted from the cache
# when we reach the capacity.
#atlas.CompiledQueryCache.capacity=1000
# Allows notifications when items are evicted from the compiled query
# cache because it has become full. A warning will be issued when
# the specified number of evictions have occurred. If the eviction
# warning threshold <= 0, no eviction warnings will be issued.
#atlas.CompiledQueryCache.evictionWarningThrottle=0
######### Full Text Search Configuration #########
#Set to false to disable full text search.
#atlas.search.fulltext.enable=true
######### Gremlin Search Configuration #########
#Set to false to disable gremlin search.
atlas.search.gremlin.enable=false
########## Add http headers ###########
#atlas.headers.Access-Control-Allow-Origin=*
#atlas.headers.Access-Control-Allow-Methods=GET,OPTIONS,HEAD,PUT,POST
#atlas.headers.<headerName>=<headerValue>
######### UI Configuration ########
atlas.ui.default.version=v1
######### Hive Hook Configs #######
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary
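Because this file is long, it helps to re-check just the cluster-specific keys after editing (a grep sketch; extend the pattern with any other keys you changed):
grep -E "^atlas\.(graph\.storage\.hostname|graph\.index\.search\.solr\.zookeeper-url|kafka\.(zookeeper\.connect|bootstrap\.servers)|rest\.address|audit\.hbase\.zookeeper\.quorum)" /home/software/atlas/conf/atlas-application.properties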
4. Edit the atlas-log4j.xml file
Uncomment the following block:
<appender name="perf_appender" class="org.apache.log4j.DailyRollingFileAppender">
    <param name="file" value="${atlas.log.dir}/atlas_perf.log" />
    <param name="datePattern" value="'.'yyyy-MM-dd" />
    <param name="append" value="true" />
    <layout class="org.apache.log4j.PatternLayout">
        <param name="ConversionPattern" value="%d|%t|%m%n" />
    </layout>
</appender>
<logger name="org.apache.atlas.perf" additivity="false">
    <level value="debug" />
    <appender-ref ref="perf_appender" />
</logger>
5. Integrate with CDH HBase
Link the HBase cluster configuration into /home/software/atlas/conf/hbase:
ln -s /etc/hbase/conf/ /home/software/atlas/conf/hbase
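Then confirm the link resolves and hbase-site.xml is visible to Atlas:
ls -l /home/software/atlas/conf/hbase/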
6. Integrate with CDH Solr
① Copy apache-atlas-2.1.0/conf/solr into the Solr installation directory and rename it atlas-solr, as sketched below.
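For example, on a parcel install (parcel path taken from the commands below; adjust to your parcel version):
cp -r /home/software/atlas/conf/solr /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/atlas-solr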
② Create the collections
The solr account normally has no login shell, so enable one first:
vi /etc/passwd
change the solr user's shell from /sbin/nologin to /bin/bash
su - solr
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/bin/solr create -c vertex_index -d /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/atlas-solr -shards 3 -replicationFactor 2
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/bin/solr create -c edge_index -d /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/atlas-solr -shards 3 -replicationFactor 2
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/bin/solr create -c fulltext_index -d /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/atlas-solr -shards 3 -replicationFactor 2
③ Verify the collections were created
Log in to the Solr web console at http://xxxx:8983 and confirm the three collections exist.
7. Integrate with CDH Kafka
① Create the Kafka topics
Atlas needs the ATLAS_HOOK and ATLAS_ENTITIES notification topics. A sketch using the CDH kafka-topics wrapper and the ZooKeeper quorum from above (tune --partitions and --replication-factor to your cluster):
kafka-topics --create --zookeeper hadoop-101:2181,hadoop-102:2181,hadoop-103:2181 --topic ATLAS_HOOK --partitions 3 --replication-factor 2
kafka-topics --create --zookeeper hadoop-101:2181,hadoop-102:2181,hadoop-103:2181 --topic ATLAS_ENTITIES --partitions 3 --replication-factor 2
② List the topics to confirm they exist:
kafka-topics --list --zookeeper hadoop-101:2181,hadoop-102:2181,hadoop-103:2181
8. Start Atlas
cd /home/software/atlas
./bin/atlas_start.py
Log in to the Atlas web console at http://xxxxxx:21000 to verify the service is up.
The default username and password are admin/admin.
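You can also verify from the shell; note the first start can take a few minutes while the HBase tables and Solr indexes are initialized:
curl -u admin:admin http://localhost:21000/api/atlas/admin/version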
III. Atlas Integration Configuration
1. Integrating Atlas with Hive
1. Modify the configuration
Update the relevant Hive configuration:
In the CM web console, open the Hive service's Configuration page.
① Search for hive-site.xml
Modify [Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml]:
Name: hive.exec.post.hooks
Value: org.apache.atlas.hive.hook.HiveHook
Modify [Hive Client Advanced Configuration Snippet (Safety Valve) for hive-site.xml]:
Name: hive.exec.post.hooks
Value: org.apache.atlas.hive.hook.HiveHook
② Search for Hive Auxiliary JARs Directory and set the auxiliary directory:
Modify [Hive Auxiliary JARs Directory]
Value: /home/fusion_data/hive_auxlib/
Copy the Atlas hook jars into the Hive auxiliary directory on every Hive node (the hook directory contains subfolders, hence -r):
cp -r /home/software/atlas/hook/hive/* /home/fusion_data/hive_auxlib/
scp -r /home/software/atlas/hook/hive/* root@hadoop-101:/home/fusion_data/hive_auxlib/
scp -r /home/software/atlas/hook/hive/* root@hadoop-103:/home/fusion_data/hive_auxlib/
Copy the Atlas configuration file atlas-application.properties into the Hive configuration directory on every node:
cp /home/software/atlas/conf/atlas-application.properties /etc/hive/conf
scp /home/software/atlas/conf/atlas-application.properties root@hadoop-101:/etc/hive/conf
scp /home/software/atlas/conf/atlas-application.properties root@hadoop-103:/etc/hive/conf
Finally, save the changes in CM, restart the Hive service, and redeploy the client configuration.
2. Import Hive metadata into Atlas
The default credentials are admin/admin.
cd /home/software/atlas
./bin/import-hive.sh
Enter username for atlas :- admin
Enter password for atlas :-
Hive Meta Data import was successful!!
2. Integrating Atlas with Sqoop
1. Modify the configuration
Enable the Atlas hook by adding the following to sqoop-site.xml:
<property>
    <name>sqoop.job.data.publish.class</name>
    <value>org.apache.atlas.sqoop.hook.SqoopHook</value>
</property>
Copy <atlas package>/conf/atlas-application.properties to <sqoop package>/conf/.
Symlink <atlas package>/hook/sqoop/*.jar into the Sqoop lib directory, or copy the jars there outright; a sketch follows.
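A minimal sketch of those two steps on a parcel-based CDH install (the Sqoop conf and lib paths are assumptions; substitute your actual directories):
cp /home/software/atlas/conf/atlas-application.properties /etc/sqoop/conf/
ln -s /home/software/atlas/hook/sqoop/*.jar /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/sqoop/lib/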