
Is temperature measured in Fahrenheit or Celsius? What are reasonable values for CPU temperature in each unit? Do we store temperature with decimals or as an integer in the schema?.Is CPU a percentage? Out of 100% or are we representing multi-core CPUs that can present as 200%, 400%, or 800%?.With just two fields, we have a few choices to make if we want the generated data to feel more realistic. Take our simple example of CPU and temperature data from above. Even if the data you're trying to mimic are just numeric values, they likely have valid ranges and maybe a predictable frequency. It's probably not a bunch of decimal or integer values. In most cases, however, you often know what the data you're trying to explore looks like.

Using the basic techniques we've already discussed allows you to create a lot of data quickly. This feels like a good time to make sure we're on the same page. With a little more thought and custom PostgreSQL functions, we can start to bring our sample data "to life." What is realistic data? We can use this to our advantage to begin making the data look more realistic. Every row has a different value because the function is called separately for each row of generated data. That's what happened with the random() function in the CPU data example. Likewise, adding a function as a column value will be called one time for each row of the final set. If we add static text (like 'Hello, Timescale!'), that text is repeated for every row. It's all random numbers, with lots of decimals and minimal variation.Īs we saw in the query above (generating fake CPU data), any columns of data that we add to the SELECT query are added to each row of the resulting set. Even if we can generate 50 million rows of data with a few lines of SQL, the data we generate isn't very realistic.
#Postgresql mock data generator how to
+-+-+Īnd finally, we talked about how to calculate the total number of rows your query would generate based on the time range, the interval between timestamps, and the number of "things" for which you are creating fake data. there is an implicit CROSS JOIN between the two generate_series() sets This example from the first post joined a timestamp set, a numeric set, and the random() function to create fake CPU data for four fake devices over time. We then discussed how the data quickly becomes more complex as we join the various sets together (along with some value returning functions) to create a multiple of both sets together. SELECT * from generate_series('','', INTERVAL '1 hour')
#Postgresql mock data generator series
generate a series of timestamps, incrementing by 1 hour create a series of values, 1 through 5, incrementing by 1

The generated data is essentially an in-memory table that can quickly create large sets of sample data. In the first post, we demonstrated how generate_series() (a Set Returning Function) could quickly create a data set based on a range of numeric values or dates. In part 3 of this blog series adds one final tool to the mix - combining the data formatting techniques below with additional equations and relational data to shape your sample time-series output into something that more closely resembles real-life applications.īy the end of this series, you'll be ready to test almost any feature that TimescaleDB offers and create quick datasets for your testing and demos! A brief review of generate_series() This second post will demonstrate a few ways to create more realistic-looking data beyond a column or two of random decimal values. Without more effort, using functions like random() to generate values doesn't provide much control over precisely what numbers are produced, so the data still feels more fake than we might want. The data that we were able to generate was very basic and not very realistic. However, there was one problem with the data we could produce at the end of the first post.

We ended the first post by showing you how to quickly calculate the number of rows a query will produce and modify the parameters for generate_series() to fine-tune the size and shape of the data. In part 1 of the series, we reviewed how generate_series() works, including the ability to join multiple series into a larger table of time-series data - through a feature known as a CROSS (or Cartesian) JOIN. In this three-part series on generating sample time-series data, we demonstrate how to use the built-in PostgreSQL function, generate_series(), to more easily create large sets of data to help test various workloads, database features, or just to create fun samples.
