一. Flume安装
- Flume的安装非常简单,只需要解压即可,当然,前提是已有hadoop环境
- 上传安装包到数据源所在节点上
- 解压
tar -zxvf apache-flume-1.6.0-bin.tar.gz -C /usr/local
- 修改conf下的
flume-en.sh
,在里面配置JAVA_HOME
二. 配置
- 根据数据采集的需求配置采集方案,描述在配置文件中(文件名可任意自定义)
- 指定采集方案配置文件,在相应的节点上启动flume agent
- 采集本机日志到HDFS中的配置,
vi flumeproject
```shellName the components on this agent
a1.sources = r1 a1.sinks = k1 a1.channels = c1
Describe/configure the source
a1.sources.r1.type = exec a1.sources.r1.command = tail -F /home/flume.exec.log
Describe the sink
a1.sinks.k1.type=hdfs a1.sinks.k1.hdfs.path=hdfs://mycluster/log/%Y%m%d a1.sinks.k1.hdfs.rollCount=0 a1.sinks.k1.hdfs.rollInterval=0 a1.sinks.k1.hdfs.rollSize=10240 a1.sinks.k1.hdfs.idleTimeout=5 a1.sinks.k1.hdfs.fileType=DataStream a1.sinks.k1.hdfs.useLocalTimeStamp=true
Use a channel which buffers events in memory
a1.channels.c1.type = memory
通道中最大的可以存储的event数量
a1.channels.c1.capacity = 1000
每次最大可以从source中拿到或者送到sink中的event数量
a1.channels.c1.transactionCapacity = 100
Bind the source and sink to the channel
a1.sources.r1.channels = c1 a1.sinks.k1.channel = c1
4. 启动
```shell
flume-ng agent -n a1 -f flumeproject -Dflume.root.logger=INFO,console
-n a1 指定我们这个agent的名字
-f flumeproject指定所描述的采集方案