Writing Data

You may be eager to start throwing data at your TSD right away, but to really take advantage of OpenTSDB's power and flexibility, pause for a moment and think about your naming schema. Once you have settled on one, you can insert data via telnet or the HTTP API, or use an existing tool with OpenTSDB support such as 'tcollector'.

Naming Schema

Many metrics administrators are used to supplying a single name for their time series. For example, a systems administrator coming from RRD-style tools might name a time series webserver01.sys.cpu.0.user. The name tells us that the series records the amount of time spent in user space on cpu 0 of webserver01. This works well if, later on, you only want to retrieve the user time for that particular cpu core on that particular web server.

But what if the web server has 64 cores and you want the average time across all of them? Some systems allow you to specify a wild card such as webserver01.sys.cpu.*.user that reads all 64 files and aggregates the results. Alternatively, you could record a new time series named webserver01.sys.cpu.user.all to represent the aggregate, but then you must write '64 + 1' different time series. And what if you have a thousand web servers and want the average cpu time across all of them? You could craft a wild card query such as *.sys.cpu.*.user and the system would open all 64,000 files, aggregate the results and return the data; or you could set up a process to pre-aggregate the data and write it to webservers.sys.cpu.user.all.

OpenTSDB handles things a bit differently by introducing the idea of 'tags'. Each time series still has a 'metric' name, but it is much more generic, something that can be shared by many unique time series. The uniqueness instead comes from a combination of tag key/value pairs that allows for flexible queries with very fast aggregations.

Note

Every time series in OpenTSDB must have at least one tag.

Take the previous example of webserver01.sys.cpu.0.user. In OpenTSDB, this becomes sys.cpu.user host=webserver01,cpu=0. Now if we want the data for an individual core, we can craft a query like sum:sys.cpu.user{host=webserver01,cpu=42}. If we want all of the cores, we simply drop the cpu tag and ask for sum:sys.cpu.user{host=webserver01}, which gives us the aggregate across all 64 cores. If we want the results for all 1,000 servers, we simply request sum:sys.cpu.user. The underlying data schema stores all of the sys.cpu.user time series next to one another so that aggregating the individual values is very fast and efficient. OpenTSDB was designed to make these aggregate queries as fast as possible, since most users start at a high level and then drill down for detailed information.
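For instance, a few of the RRD-style names from above map onto the tagged schema like this (the hosts and core numbers are just illustrative):

webserver01.sys.cpu.0.user   ->  sys.cpu.user host=webserver01 cpu=0
webserver01.sys.cpu.63.user  ->  sys.cpu.user host=webserver01 cpu=63
webserver02.sys.cpu.0.user   ->  sys.cpu.user host=webserver02 cpu=0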

Aggregations

While the tagging system is very flexible, some problems can arise if you don't understand how OpenTSDB's queries work, so give your schema some thought first. Take the query example above: sum:sys.cpu.user{host=webserver01}. We recorded 64 unique time series for webserver01, one per CPU core. When we issue that query, all of the time series with the metric sys.cpu.user and the tag host=webserver01 are retrieved, summed, and returned as one series of numbers. Let's say the resulting value at timestamp 1356998400 is 50. Now suppose we are migrating to OpenTSDB from another system that had a process pre-aggregating all 64 cores so that the overall value could be fetched quickly, and that process simply writes a new time series sys.cpu.user host=webserver01. If we run the same query, we get a value of 100 at 1356998400. What happened? OpenTSDB aggregated all 64 individual time series plus the pre-aggregated series into that 100. In storage, the data would look like this:

sys.cpu.user host=webserver01        1356998400  50
sys.cpu.user host=webserver01,cpu=0  1356998400  1
sys.cpu.user host=webserver01,cpu=1  1356998400  0
sys.cpu.user host=webserver01,cpu=2  1356998400  2
sys.cpu.user host=webserver01,cpu=3  1356998400  0
...
sys.cpu.user host=webserver01,cpu=63 1356998400  1

If no tags are given in a query, OpenTSDB automatically aggregates all time series for the metric. If one or more tags are defined, the aggregate will include all time series that match those tags, regardless of any other tags. With the query sum:sys.cpu.user{host=webserver01}, we would include sys.cpu.user host=webserver01,cpu=0 as well as sys.cpu.user host=webserver01,cpu=0,manufacturer=Intel, sys.cpu.user host=webserver01,foo=bar and sys.cpu.user host=webserver01,cpu=0,datacenter=lax,department=ops. The moral of this example: be careful with your naming schema.

Time Series Cardinality

A critical aspect of any naming schema is to consider the cardinality of your time series. Cardinality is defined as the number of unique items in a set. In OpenTSDB's case, this means the number of items associated with a metric, i.e. all of the possible tag name and tag value combinations, as well as the number of unique metric names, tag names and tag values. Cardinality is important for two reasons.

Limited Unique IDs (UIDs)

There is a limited number of unique IDs (UIDs) to assign for each metric, tag name and tag value. By default there are just over 16 million possible IDs per type. If, for example, you ran a very popular web service and tried to track the IP address of clients as a tag, e.g. web.app.hits clientip=38.26.34.10, you could quickly run into the UID assignment limit, as there are over 4 billion possible IPv4 addresses. Additionally, this approach would create a very sparse time series: the address 38.26.34.10 may only hit your app sporadically, or perhaps never again.

The UID limit is usually not an issue, however. A tag value is assigned a UID that is completely disassociated from its tag name. If you use numeric identifiers as tag values, the number is assigned a UID once and can be reused with many tag names. For example, if we assign a UID to the number 2, we could store time series with the tag pairs cpu=2, interface=2, hdd=2 and fan=2 while consuming only one tag value UID (for 2) and four tag name UIDs (cpu, interface, hdd and fan).

If you think the UID limit may affect you, first think about the queries you intend to run. Looking at the web.app.hits example above, you probably only care about the total number of hits to your service and rarely need to drill down to a specific IP address. In that case, you may want to store the IP address as an annotation. You would still benefit from the low cardinality, but if needed you could search the results for a specific IP using external scripts. (Note: annotation queries may be supported in a future version of OpenTSDB.)

If you truly require more than 16 million values, you can increase the number of bytes OpenTSDB uses to encode UIDs from 3 bytes up to a maximum of 8 bytes. This change requires modifying the value in the source code, recompiling, deploying the customized build to every TSD instance that will access the data, and maintaining that customization across all future patches and releases.

Warning

It is possible that your situation requires this value to be increased. If you choose to modify this value, you must start with fresh data and a new UID table. Any data written with a TSD expecting 3-byte UID encoding will be incompatible with this change, so ensure that all of your TSDs are running the same modified code and that any data you have stored in OpenTSDB prior to making this change has been exported to a location where it can be manipulated by external tools. See the TSDB.java file for the values to change.

Query Speed

Cardinality also affects query speed a great deal, so consider the queries you will be performing frequently and optimize your naming schema for those. OpenTSDB creates a new row per time series per hour. If we have one host with a single core that emits one time series sys.cpu.user host=webserver01,cpu=0 with data written every second for 1 day, that results in 24 rows of data or 86,400 data points. However, if the host has 8 possible CPU cores, we now have 192 rows and 691,200 data points. This looks fine because we can easily get a sum or average of CPU usage across all cores by issuing a query like start=1d-ago&m=avg:sys.cpu.user{host=webserver01}. The query will iterate over all 192 rows and aggregate the data into a single time series.

However, what if we have 20,000 hosts, each with 8 cores? Now we will have 3.8 million rows and 1.728 billion data points per day due to the high cardinality of host values. Queries for the average core usage on host webserver01 will be slower as they must pick out 192 rows out of 3.8 million. (However, with OpenTSDB 2.2 you can use the explicit tags feature to specify cpu=* and the fuzzy filter will kick in to help skip those unnecessary rows quicker.)

The benefits of this schema are that you have very deep granularity in your data, e.g., storing usage metrics on a per-core basis. You can also easily craft a query to get the average usage across all cores and all hosts: start=1d-ago&m=avg:sys.cpu.user. However, queries against that particular metric will take longer as there are more rows to sift through. This is common amongst all databases and is not OpenTSDB's problem alone.

Here are some common means of dealing with cardinality:

Pre-Aggregate - In the example above with sys.cpu.user, you generally care about the average usage on the host, not the usage per core. While the data collector may send a separate value per core with the tagging schema above, the collector could also send one extra data point such as sys.cpu.user.avg host=webserver01. Now you have a completely separate time series that would only have 24 rows per day and, with 20K hosts, only 480K rows to sift through. Queries will be much more responsive for the per-host average and you still have per-core data to drill down to separately (see the sketch after these two approaches).

Shift to Metric - What if you really only care about the metrics for a particular host and don't need to aggregate across hosts? In that case you can shift the hostname into the metric name. Our previous example becomes sys.cpu.user.websvr01 cpu=0. Queries against this schema are very fast as there would only be 192 rows per day for the metric. However, to aggregate across hosts you would have to execute multiple queries and aggregate outside of OpenTSDB. (Future work will include this capability.)
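As a rough sketch of the pre-aggregate approach mentioned above (illustrative only: the metric name sys.cpu.user.avg comes from the example, and how you sample per-core usage is left to your collector), a collector might build telnet-style put lines like this:

import time

def cpu_points(hostname, core_usage):
    """Build telnet-style put lines for per-core usage plus one
    pre-aggregated per-host average. core_usage is a list of per-core
    user CPU percentages sampled by your collector."""
    now = int(time.time())
    lines = [
        f"put sys.cpu.user {now} {usage} host={hostname} cpu={core}"
        for core, usage in enumerate(core_usage)
    ]
    # One extra data point so per-host queries only touch a single series.
    avg = sum(core_usage) / len(core_usage)
    lines.append(f"put sys.cpu.user.avg {now} {avg} host={hostname}")
    return lines

# Example: 8 cores on webserver01
for line in cpu_points("webserver01", [1.0, 0.5, 2.0, 0.0, 1.5, 0.25, 3.0, 0.75]):
    print(line)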

Naming Conclusion

When you design your naming schema, keep these suggestions in mind:

  • Be consistent with your naming to reduce duplication. Always use the same case for metrics, tag names and values.
  • Use the same number and type of tags for each metric. E.g. don't store my.metric host=foo and my.metric datacenter=lga.
  • Think about the most common queries you'll be executing and optimize your schema for those queries
  • Think about how you may want to drill down when querying
  • Don't use too many tags, keep it to a fairly small number, usually up to 4 or 5 tags (By default, OpenTSDB supports a maximum of 8 tags).

Data Specification

Every time series data point requires the following data:

  • metric - A generic name for the time series such as sys.cpu.user, stock.quote or env.probe.temp.
  • timestamp - A Unix/POSIX Epoch timestamp in seconds or milliseconds defined as the number of seconds that have elapsed since January 1st, 1970 at 00:00:00 UTC time. Only positive timestamps are supported at this time.
  • value - A numeric value to store at the given timestamp for the time series. This may be an integer or a floating point value.
  • tag(s) - A key/value pair consisting of a tagk (the key) and a tagv (the value). Each data point must have at least one tag.

Timestamps

Data can be written to OpenTSDB with second or millisecond resolution. Timestamps must be integers and no longer than 13 digits (see the first [NOTE] below). Millisecond timestamps must be of the form 1364410924250, where the final three digits represent the milliseconds. Applications that generate timestamps with more than 13 digits (i.e., greater than millisecond resolution) must round them down to a maximum of 13 digits before submitting, or an error will be generated.

Timestamps with second resolution are stored on 2 bytes while millisecond resolution are stored on 4. Thus if you do not need millisecond resolution or all of your data points are on 1 second boundaries, we recommend that you submit timestamps with 10 digits for second resolution so that you can save on storage space. It's also a good idea to avoid mixing second and millisecond timestamps for a given time series. Doing so will slow down queries as iteration across mixed timestamps takes longer than if you only record one type or the other. OpenTSDB will store whatever you give it.
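For example, in Python the two resolutions look like this (purely illustrative):

import time

ts_seconds = int(time.time())        # e.g. 1364410924    (10 digits, second resolution)
ts_millis = int(time.time() * 1000)  # e.g. 1364410924250 (13 digits, millisecond resolution)

# Sources that produce finer-than-millisecond timestamps (e.g. microseconds)
# must be rounded down to at most 13 digits before submitting.
ts_micros = 1364410924250999
ts_millis_rounded = ts_micros // 1000  # 1364410924250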

Note

When writing to the telnet interface, timestamps may optionally be written in the form 1364410924.250, where three digits representing the milliseconds are placed after a period. Timestamps sent to the /api/put endpoint over HTTP must be integers and may not have periods. Data with millisecond resolution can only be extracted via the /api/query endpoint or CLI command at this time. See query/index for details.

Note

Providing millisecond resolution does not necessarily mean that OpenTSDB supports write speeds of 1 data point per millisecond over many time series. While a single TSD may be able to handle a few thousand writes per second, that would only cover a few time series if you're trying to store a point every millisecond. Instead OpenTSDB aims to provide greater measurement accuracy and you should generally avoid recording data at such a speed, particularly for long running time series.

Metrics and Tags

The following rules apply to metric and tag values:

  • Strings are case sensitive, i.e. "Sys.Cpu.User" will be stored separately from "sys.cpu.user"
  • Spaces are not allowed
  • Only the following characters are allowed: a to z, A to Z, 0 to 9, -, _, ., / or Unicode letters (as per the specification)

Metric and tags are not limited in length, though you should try to keep the values fairly short.

Integer Values

If the value from a put command is parsed without a decimal point (.), it will be treated as a signed integer. Integers are stored, unsigned, with variable length encoding so that a data point may take as little as 1 byte of space or up to 8 bytes. This means a data point can have a minimum value of -9,223,372,036,854,775,808 and a maximum value of 9,223,372,036,854,775,807 (inclusive). Integers cannot have commas or any character other than digits and the dash (for negative values). For example, in order to store the maximum value, it must be provided in the form 9223372036854775807.

Floating Point Values

If the value from a put command is parsed with a decimal point (.), it will be treated as a floating point value. Currently all floating point values are stored on 4 bytes, single-precision, with support for 8-byte double-precision in 2.4 and later. Floats are stored in IEEE 754 floating-point "single format" with positive and negative value support. Infinity and Not-a-Number values are not supported and will throw an error if supplied to a TSD. See Wikipedia and the Java Documentation for details.

Note

Because OpenTSDB only supports floating point values, it is not suitable for storing measurements that require exact values like currency. This is why, when storing a value like 15.2, the database may return 15.199999809265137.
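You can reproduce this single-precision effect with a quick round-trip through a 4-byte float, for example in Python:

import struct

# Pack 15.2 into an IEEE 754 single-precision float and read it back.
value = struct.unpack('<f', struct.pack('<f', 15.2))[0]
print(value)  # 15.199999809265137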

Ordering

Unlike other solutions, OpenTSDB allows for writing data for a given time series in any order you want. This enables significant flexibility in writing data to a TSD, allowing for populating current data from your systems, then importing historical data at a later time.

Duplicate Data Points

Writing data points in OpenTSDB is generally idempotent within an hour of the original write. This means you can write the value 42 at timestamp 1356998400 and then write 42 again for the same time and nothing bad will happen. However, if you have compactions enabled to reduce storage consumption and write the same data point after the row of data has been compacted, an exception may be returned when you query over that row. If you attempt to write two different values with the same timestamp, a duplicate data point exception may be thrown during query time. This is due to a difference in encoding integers on 1, 2, 4 or 8 bytes and floating point numbers. If the first value was an integer and the second a floating point, the duplicate error will always be thrown. However, if both values were floats or they were both integers that could be encoded on the same length, then the original value may be overwritten if a compaction has not occurred on the row.

In most situations, a duplicate data point is an indication that something went wrong with the data source, such as a process restarting unexpectedly or a bug in a script. OpenTSDB will fail "safe" by throwing an exception when you query over a row with one or more duplicates so you can track down the issue.

With OpenTSDB 2.1 you can enable last-write-wins by setting the tsd.storage.fix_duplicates configuration value to true. With this flag enabled, the most recent value recorded will be returned at query time instead of throwing an exception. A warning will also be written to the log file noting that a duplicate was found. If compaction is also enabled, then the original compacted value will be overwritten with the latest value.
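In the opentsdb.conf properties file that looks something like:

tsd.storage.fix_duplicates = true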

Input Methods

There are currently three main methods to get data into OpenTSDB: Telnet API, HTTP API and batch import from a file. Alternatively you can use a tool that provides OpenTSDB support, or if you're extremely adventurous, use the Java library.

Warning

Don't try to write directly to the underlying storage system, e.g. HBase. Just don't. It'll get messy quickly.

Note

If tsd.mode is set to ro instead of rw, the TSD will not accept data points through RPC calls. Telnet-style calls will throw an exception and calls to the HTTP endpoint will return a 404 error. However, it is still possible to write via the Java API when the mode is set to read only.
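For reference, a write-capable TSD would carry something like the following in opentsdb.conf:

tsd.mode = rw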

Telnet

The easiest way to get started with OpenTSDB is to open up a terminal or telnet client, connect to your TSD, issue a put command and hit 'enter'. If you are writing a program, simply open a socket, print the string command with a new line and send the packet. The telnet command format is:

put <metric> <timestamp> <value> <tagk1=tagv1[ tagk2=tagv2 ...tagkN=tagvN]>

For example:

put sys.cpu.user 1356998400 42.5 host=webserver01 cpu=0

Each put can only send a single data point. Don't forget the newline character, e.g. \n, at the end of your command.
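If you are scripting the write rather than typing it into telnet, a minimal sketch in Python looks like this (the TSD address and the data point are placeholders):

import socket

# Connect to the TSD's telnet-style port and send one put command per line.
with socket.create_connection(("localhost", 4242)) as sock:
    sock.sendall(b"put sys.cpu.user 1356998400 42.5 host=webserver01 cpu=0\n")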

Note

The Telnet method of writing is discouraged as it doesn't provide a way of determining which data points failed to write due to formatting or storage errors. Instead use the HTTP API.

HTTP API

As of version 2.0, data can be sent over HTTP in formats supported by 'Serializer' plugins. Multiple, unrelated data points can be sent in a single HTTP POST request to save bandwidth. See ../api_http/put for details.
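As a sketch using only Python's standard library (the TSD address is a placeholder; see the /api/put documentation for the full payload and response options), a batch of data points can be POSTed as JSON:

import json
import urllib.request

points = [
    {"metric": "sys.cpu.user", "timestamp": 1356998400, "value": 42.5,
     "tags": {"host": "webserver01", "cpu": "0"}},
    {"metric": "sys.cpu.user", "timestamp": 1356998400, "value": 18.0,
     "tags": {"host": "webserver01", "cpu": "1"}},
]

req = urllib.request.Request(
    "http://localhost:4242/api/put",
    data=json.dumps(points).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    # By default the endpoint answers 204 No Content once the points are queued.
    print(resp.status)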

Batch Import

If you are importing data from another system or you need to backfill historical data, you can use the import CLI utility. See cli/import for details.
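An import file holds one data point per line in the same format as the telnet put command, minus the leading put, and the file may be gzip compressed. For example (the file name is arbitrary):

sys.cpu.user 1356998400 42.5 host=webserver01 cpu=0
sys.cpu.user 1356998400 18.1 host=webserver01 cpu=1

You would then load it with something like ./tsdb import sys-cpu-2012.gz.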

Write Performance

OpenTSDB can scale to writing millions of data points per second on commodity servers with regular spinning hard drives. However, users who fire up a VM with HBase in stand-alone mode and try to slam millions of data points at a brand new TSD are disappointed when they can only write data in the hundreds of points per second. Here's what you need to do to scale for brand new installs or testing and for expanding existing systems.

UID Assignment

The first sticking point folks run into is UID assignment. Every string for a metric, tag key and tag value must be assigned a UID before the data point can be stored. For example, the metric sys.cpu.user may be assigned a UID of 000001 the first time it is encountered by a TSD. This assignment takes a fair amount of time as it must fetch an available UID, write a UID-to-name mapping and a name-to-UID mapping, then use the UID to write the data point's row key. The UID will be stored in the TSD's cache so that the next time the same metric comes through, it can find the UID very quickly.

Therefore, we recommend that you pre-assign UIDs to as many metrics, tag keys and tag values as you can. If you have designed a naming schema as recommended above, you'll know most of the values to assign. You can use the CLI tools cli/mkmetric, cli/uid or the HTTP API ../api_http/uid/index to perform pre-assignments. Any time you are about to send a bunch of new metrics or tags to a running OpenTSDB cluster, try to pre-assign them first or the TSDs will bog down a bit when they get the new data.
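For example, pre-assigning a few of the metrics and tags from the schema above might look like this (the names are whatever your own schema calls for):

./tsdb mkmetric sys.cpu.user sys.cpu.sys sys.cpu.nice
./tsdb uid assign tagk host cpu
./tsdb uid assign tagv webserver01 webserver02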

Note

If you restart a TSD, it will have to look up the UID for every metric and tag, so performance will be a little slow until the cache is filled.

Random Metric UID Assignment

With 2.2 you can randomly assign UIDs to metrics for better region server write distribution. Because metric UIDs are located at the start of the row key, if a new set of busy metrics is created, all writes for those metrics will land on the same server until the region splits. With random UID generation enabled, the new metrics will be distributed across the key space and will likely wind up in different regions on different servers.

Random metric UID generation can be enabled or disabled at any time by modifying the tsd.core.uid.random_metrics flag, and the data is backwards compatible all the way back to OpenTSDB 1.0. However, it is recommended that you pre-split your TSDB data table according to the full metric UID space. E.g. if you use the default UID size in OpenTSDB, UIDs are 3 bytes wide, thus you can have 16,777,215 values. If you already have data in your TSDB table and choose to enable random UIDs, you may want to create new regions.
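The corresponding opentsdb.conf entry is a single flag, for example:

tsd.core.uid.random_metrics = true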

When generating random IDs, TSDB will try up to 10 times to assign a UID without a collision. Thus, as the number of assigned metrics increases, so too will the number of collisions and the likelihood that a data point may be dropped due to retries. If you enable random IDs and keep adding more metrics, you may want to increase the number of bytes used for metric UIDs. Note that the UID change is not backwards compatible, so you would have to create a new table and migrate your old data.

Salting

In 2.2 salting is supported to greatly increase write distribution across region servers. When enabled, a configured number of bytes are prepended to each row key. Each metric and combination of tags is then hashed into one "bucket", the ID of which is written to the salt bytes. Distribution is improved particularly for high-cardinality metrics (those with a large number of tag combinations) as the time series are split across the configured bucket count, thus routed to different regions and different servers. For example, without salting, a metric with 1 million series will be written to a single region on a single server. With salting enabled and a bucket size of 20, the series will be split across 20 regions (and 20 servers if the cluster has that many hosts) where each region has 50,000 series.

Warning

Because salting modifies the storage format, you cannot enable or disable salting at whim. If you have existing data, you must start a new data table and migrate data from the old table into the new one. Salted data cannot be read from previous versions of OpenTSDB.

To enable salting you must modify the config file parameter tsd.storage.salt.width and optionally tsd.storage.salt.buckets. We recommend setting the salt width to 1 and determining the number of buckets based on a factor of the number of region servers in your cluster. Note that at query time, the TSD will fire tsd.storage.salt.buckets scanners to fetch data. The proper number of salt buckets must be determined through experimentation, as at some point query performance may suffer from having too many scanners open and collating the results. In the future the salt width and buckets may be configurable, but we didn't want folks changing settings by accident and losing data.
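For example, with 20 region servers a starting point in opentsdb.conf might be:

tsd.storage.salt.width = 1
tsd.storage.salt.buckets = 20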

Appends

Also in 2.2, writing to HBase columns via appends is now supported. This can improve both read and write performance in that TSDs will no longer maintain a queue of rows to compact at the end of each hour, thus preventing a massive read and re-write operation in HBase. However due to the way appends operate in HBase, an increase in CPU utilization, store file size and HDFS traffic will occur on the region servers. Make sure to monitor your HBase servers closely.

At read time, only one column is returned per row, similar to post-TSD-compaction rows. Note, however, that if tsd.storage.repair_appends is enabled, then when a column has duplicates or out-of-order data, it will be re-written to HBase. Also, columns with many duplicates or ordering issues may slow queries as they must be resolved before answering the caller.

Appends can be enabled and disabled at any time. However versions of OpenTSDB prior to 2.2 will skip over appended values.

Pre-Split HBase Regions

For brand new installs you will see much better performance if you pre-split the regions in HBase, regardless of whether you're testing on a stand-alone server or running a full cluster. HBase regions handle a defined range of row keys and are essentially a single file. When you create the tsdb table and start writing data for the first time, all of those data points are sent to this one file on one server. As a region fills up, HBase will automatically split it into different files and move it to other servers in the cluster, but when this happens, the TSDs cannot write to the region and must buffer the data points. Therefore, if you can pre-allocate a number of regions before you start writing, the TSDs can send data to multiple files or servers and you'll be taking advantage of the linear scalability immediately.

The simplest way to pre-split your tsdb table regions is to estimate the number of unique metric names you'll be recording. If you have designed a naming schema, you should have a pretty good idea. Let's say that we will track 4,000 metrics in our system. That's not to say 4,000 time series, as we're not counting the tags yet, just the metric names such as "sys.cpu.user". Data points are written in row keys where the metric's UID comprises the first bytes, 3 bytes by default. The first metric will be assigned a UID of 000001 as a hex encoded value. The 4,000th metric will have a UID of 000FA0 in hex. You can use these as the start and end keys in the script from the HBase Book to split your table into any number of regions. 256 regions may be a good place to start, depending on how many time series share each metric.

TODO - include scripts for pre-splitting.
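In the meantime, here is a rough, purely illustrative Python sketch of how split points for that key range could be computed; the output (hex-encoded 3-byte metric UID prefixes) could be fed to the HBase Book split script mentioned above:

# Illustrative only: print split keys evenly spaced between the first
# and the ~4,000th expected metric UID.
first_uid = 0x000001
last_uid = 0x000FA0
num_regions = 256

step = (last_uid - first_uid) / num_regions
for i in range(1, num_regions):
    print(format(first_uid + int(i * step), "06x"))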

The simple split method above assumes that you have roughly an equal number of time series per metric (i.e. a fairly consistent cardinality). E.g. the metric with a UID of 000001 may have 200 time series and 000FA0 has about 150. If you have a wide range of time series per metric, e.g. 000001 has 10,000 time series while 000FA0 only has 2, you may need to develop a more complex splitting algorithm.

But don't worry too much about splitting. As stated above, HBase will automatically split regions for you so over time, the data will be distributed fairly evenly.

Distributed HBase

HBase will run in stand-alone mode where it will use the local file system for storing files. It will still use multiple regions and perform as well as the underlying disk or raid array will let it. You'll definitely want a RAID array under HBase so that if a drive fails, you can replace it without losing data. This kind of setup is fine for testing or very small installations and you should be able to get into the low thousands of data points per second.

However if you want serious throughput and scalability you have to setup a Hadoop and HBase cluster with multiple servers. In a distributed setup HDFS manages region files, automatically distributing copies to different servers for fault tolerance. HBase assigns regions to different servers and OpenTSDB's client will send data points to the specific server where they will be stored. You're now spreading operations amongst multiple servers, increasing performance and storage. If you need even more throughput or storage, just add nodes or disks.

There are a number of ways to setup a Hadoop/HBase cluster and a ton of various tuning tweaks to make, so Google around and ask user groups for advice. Some general recommendations include:

  • Dedicate a pair of high memory, low disk space servers for the Name Node. Set them up for high availability using something like Heartbeat and Pacemaker.
  • Setup Zookeeper on at least 3 servers for fault tolerance. They must have a lot of RAM and a fairly fast disk for log writing. On small clusters, these can run on the Name node servers.
  • JBOD for the HDFS data nodes
  • HBase region servers can be collocated with the HDFS data nodes
  • At least 1 gbps links between servers, 10 gbps preferable.
  • Keep the cluster in a single data center

Multiple TSDs

A single TSD can handle thousands of writes per second. But if you have many sources it's best to scale by running multiple TSDs and using a load balancer (such as Varnish or DNS round robin) to distribute the writes. Many users colocate TSDs on their HBase region servers when the cluster is dedicated to OpenTSDB.

Persistent Connections

Enable keep-alives in the TSDs and make sure that any applications you are using to send time series data keep their connections open instead of opening and closing for every write. See configuration for details.

Disable Meta Data and Real Time Publishing

OpenTSDB 2.0 introduced meta data for tracking the kinds of data in the system. When tracking is enabled, a counter is incremented for every data point written and new UIDs or time series will generate meta data. The data may be pushed to a search engine or passed through tree generation code. These processes require greater memory in the TSD and may affect throughput. Tracking is disabled by default so test it out before enabling the feature.

2.0 also introduced a real-time publishing plugin where incoming data points can be emitted to another destination immediately after they're queued for storage. This is disabled by default so test any plugins you are interested in before deploying in production.
