Fall Research Expo 2020

Generating Time Series Data Using Probability Specifications

In the 21st century, generating data has become an efficient alternative to traditional data collection methods. In this poster, we present the motivation and methods behind a new data generation tool that takes in a probability model and generates a customized stream of realistic data. The tool generates time series data, that is, a sequence of data points indexed by time. Various other data generation tools are also briefly discussed.

Three cases of time series systems are considered. In the static case, the events at a given time t are independent of events at all other time steps, so time is effectively irrelevant for data generation. In the time-invariant case, the distribution of variables at time t is the same for all time steps; in other words, the data are constrained by the stationarity assumption. However, generation at time t may still depend on data from previous time steps, even though the probabilities do not change. Finally, the time-variant case allows probabilities at time t to be defined in terms of probabilities at earlier time steps, and thus probabilities may vary with time. An example of the time-invariant case is presented in the poster. More detailed examples of solving and generating for all three case types can be found through the QR code in the “Methods” section.
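The three case types can be illustrated with a minimal Python sketch for a single binary variable. This is not the poster's Mathematica tool: the function names, the two-state chain used for the time-invariant case, and the geometric decay rule used for the time-variant case are all assumptions chosen only to make the distinctions concrete.

```python
import random


def generate_static(p_one, steps, rng):
    """Static case: every step is an independent draw with the same probability."""
    return [1 if rng.random() < p_one else 0 for _ in range(steps)]


def stationary_probability(p_stay_one, p_switch_to_one):
    """Stationary P(X=1) of a two-state chain with
    P(1 | previous 1) = p_stay_one and P(1 | previous 0) = p_switch_to_one."""
    return p_switch_to_one / (1.0 - p_stay_one + p_switch_to_one)


def generate_time_invariant(p_stay_one, p_switch_to_one, steps, rng):
    """Time-invariant case: each value depends on the previous value, but the
    marginal P(X_t = 1) is the same at every step because the first value is
    drawn from the stationary distribution (an illustrative assumption)."""
    series = []
    for t in range(steps):
        if t == 0:
            p_one = stationary_probability(p_stay_one, p_switch_to_one)
        else:
            p_one = p_stay_one if series[-1] == 1 else p_switch_to_one
        series.append(1 if rng.random() < p_one else 0)
    return series


def marginals_time_variant(p_initial, decay, steps):
    """Time-variant case: P(X_t = 1) is defined from the previous step's
    probability, here by the hypothetical rule P_t = decay * P_(t-1)."""
    probabilities = [p_initial]
    for _ in range(steps - 1):
        probabilities.append(decay * probabilities[-1])
    return probabilities


def generate_time_variant(p_initial, decay, steps, rng):
    """Draw one value per step from the time-varying marginals."""
    return [1 if rng.random() < p else 0
            for p in marginals_time_variant(p_initial, decay, steps)]
```

Passing a seeded generator such as `random.Random(0)` as `rng` makes a run reproducible. In the time-invariant sketch the conditional probabilities never change, yet consecutive values are correlated, which is the distinction from the static case.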

By rewriting probability and independence specifications in terms of elementary event probabilities, we demonstrate a capacity to fully define a categorical probability model. This is done using a Mathematica program that takes in an input file of specifications, solves those specifications as a system of equations, and generates data. In this way, time series data can realistically be generated according to the constraints of a fully defined system.
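As a rough illustration of the idea (in Python rather than Mathematica, and not the authors' program), the sketch below takes two hypothetical binary variables A and B, writes the specifications P(A=1), P(B=1), and "A is independent of B" as equations over the four elementary event probabilities, solves them exactly, and samples from the result. All names and the choice of specifications are assumptions. Independence is a nonlinear constraint in general; it becomes linear here only because both marginals are given, so P(A=1, B=1) is a known product.

```python
import random
from fractions import Fraction

# Elementary events for two binary variables (A, B), in a fixed order.
EVENTS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def solve_linear_system(rows, rhs):
    """Solve a square linear system exactly using Gauss-Jordan elimination."""
    n = len(rows)
    aug = [[Fraction(v) for v in row] + [Fraction(b)]
           for row, b in zip(rows, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("specifications do not fully define the model")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def solve_model(p_a, p_b):
    """Return the probability of each elementary event given P(A=1), P(B=1),
    and independence of A and B (hypothetical example specifications)."""
    p_a, p_b = Fraction(p_a), Fraction(p_b)
    rows = [
        [1, 1, 1, 1],  # all elementary probabilities sum to 1
        [0, 0, 1, 1],  # P(A=1) = p(1,0) + p(1,1)
        [0, 1, 0, 1],  # P(B=1) = p(0,1) + p(1,1)
        [0, 0, 0, 1],  # independence: p(1,1) = P(A=1) * P(B=1)
    ]
    rhs = [1, p_a, p_b, p_a * p_b]
    return dict(zip(EVENTS, solve_linear_system(rows, rhs)))


def generate(model, steps, rng):
    """Draw one elementary event per time step (static case)."""
    events = list(model)
    weights = [float(p) for p in model.values()]
    return rng.choices(events, weights=weights, k=steps)
```

For example, `solve_model("0.6", "0.5")` yields a full joint distribution over the four events, and `generate(model, 100, random.Random(1))` draws a stream from it. Exact fractions are used so that the solved probabilities sum to exactly 1; if the specifications leave the system under-determined, the solver raises an error instead of guessing.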

PRESENTED BY
PURM - Penn Undergraduate Research Mentoring Program
College of Arts & Sciences 2022
Advised By
Ivan Ruchkin
Oleg Sokolsky
Insup Lee
Kaustubh Sridhar

Comments

Thank you, Jason, for the impressive presentation! It seems that the different cases each come with some challenges, especially the time-invariant and time-variant cases. Hoping things go well with developing the project in the future!

Jason,

Thanks for the presentation! I found the idea of generating data for analysis fascinating. I was wondering whether there are papers or projects that have already used such generated data for studies rather than collected data?

Junyoung

Thank you for a very interesting presentation! I have never really thought of the drawbacks of collecting real-world data, and the idea of generating data sounds like a great solution. I am curious to see how and in which fields these models will be implemented.