Project

pest

Wrappers to facilitate different classes of probability estimators

Pest, a framework for Probability Estimation

A concise API focused on painless investigation of data sets

Pest provides a framework for interacting with different probability estimation models. Pest abstracts common statistical operations, including:

  • Marginal, Joint and Conditional point probability
  • Interval and Cumulative probability
  • Entropy, Cross Entropy, and Mutual Information
  • Mean, Median, Mode, etc
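
To make the point-probability operations concrete, here is a minimal plain-Ruby sketch that estimates marginal, joint, and conditional probabilities by counting matching observations. It does not use Pest; the `probability` and `conditional_probability` helpers and the sample observations are hypothetical and exist only for illustration.

```ruby
# Hypothetical observations: each hash is one observation of two variables.
observations = [
  { foo: 1, bar: 2 },
  { foo: 1, bar: 2 },
  { foo: 1, bar: 3 },
  { foo: 0, bar: 2 }
]

# Fraction of rows whose values match every name/value pair in `event`.
def probability(rows, event)
  return 0.0 if rows.empty?

  matches = rows.count { |row| event.all? { |name, value| row[name] == value } }
  matches.to_f / rows.size
end

# P(event | givens): keep only rows matching `givens`, then count `event` in them.
def conditional_probability(rows, event, givens)
  subset = rows.select { |row| givens.all? { |name, value| row[name] == value } }
  probability(subset, event)
end

marginal    = probability(observations, { foo: 1 })          # 3 of 4 rows
joint       = probability(observations, { foo: 1, bar: 2 })  # 2 of 4 rows
conditional = conditional_probability(observations, { foo: 1 }, { bar: 2 }) # 2 of the 3 rows with bar=2

puts marginal, joint, conditional
```

This is the frequency-counting view of estimation; other models (for example a Gaussian fit) would replace the counting step with a fitted density.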

Scalability if you need it

Pest tries to be agnostic about the underlying data structures, so changing libraries (NArray -> Hadoop) is as simple as using a different data source. Pest is designed to create estimators from subsets of larger data sources, and it constructs those estimators transparently to support dynamic querying.

Code structure designed to be extended

Implementing custom estimation models is easy, and Pest implements some common ones for you.

Install

Add it to your Gemfile and bundle

gem "pest"

bundle install 

API

# Creating Datasets
test = Pest::DataSet::Hash.from_hash hash             # Creates a Hash dataset of observations from a hash
train = Pest::DataSet::NArray.from_hash hash          # Creates a NArray dataset

# DataSet Variables
test.variables                                        # hash of Variable instances detected in observation set
test.v                                                # alias of 'variables'
test.v[:foo]                                          # a specific variable
test.v[:foo] = another_variable                       # explicit declaration

# Creating Estimators
e = Pest::Estimator::Frequency.new(data)              # Frequentist estimator - values treated as unordered set
e = Pest::Estimator::Multinomial.new(data)            # Multinomial estimator
e = Pest::Estimator::Gaussian.new(data)               # Gaussian mean/variance ML estimator

# Descriptive Statistical Properties
#e.mode(:foo)                                          # Mode
#e.mean(:foo)                                          # Mean (discrete & continuous only)
#e.median(:foo)                                        # Median (discrete & continuous only)
# quantile?
# variance?
# deviation?

# Estimating Entropy (Set & Discrete only)
e.entropy(:foo)                                       # Entropy of 'foo'
e.h(:foo, :bar)                                       # Joint entropy of 'foo' AND 'bar'
e.h(:foo).given(:bar)                                 # Cross entropy of 'foo' : 'bar'
e.mutual_information(:foo, :bar)                      # Mutual information of 'foo' and 'bar'
e.i(:foo, :bar)                                       # Alias

# Estimating Point Probability
e.probability(e.variables[:foo] => 1)                 # Estimate the probability that foo=1
e.p(:foo => 1)                                        # Same as above, tries to find a variable named 'foo'
e.p(:foo => 1, :bar => 2)                             # Estimate the probability that foo=1 AND bar=2
e.p(:foo => 1).given(:bar => 2)                       # Estimate the probability that foo=1 given bar=2
e.p(:foo => 1, :bar => 2).given(:baz => 3, :qux => 4) # Moar

# Batch Point Probability Estimation
e.batch_probability(:foo).in(test)                    # Estimate the probability of each value in test
e.batch_p(:foo, :bar).in(test)                        # Joint probability
e.batch_p(:foo).given(:bar).in(test)                  # Conditional probability
e.batch_p(:foo, :bar).given(:baz, :qux).in(test)      # Moar

# Estimating Cumulative & Interval Probability
#e.probability(:foo).greater_than(:bar).in(test)
#e.p(:foo).greater_than(:bar).less_than(:baz).in(test)
#e.p(:foo).gt(:bar).lt(:baz).given(:qux).in(test)
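
For reference, the entropy and mutual-information quantities listed above can be computed directly from value counts. The following is a plain-Ruby sketch, not Pest's implementation; the helper names are assumptions made for illustration, and entropies are measured in bits.

```ruby
# Count how often each distinct value occurs.
def value_counts(values)
  values.each_with_object(Hash.new(0)) { |value, tally| tally[value] += 1 }
end

# Shannon entropy (bits) of the empirical distribution of `values`.
def entropy(values)
  total = values.size.to_f
  value_counts(values).values.sum(0.0) do |count|
    prob = count / total
    -prob * Math.log2(prob)
  end
end

# Joint entropy H(X, Y): entropy of the paired observations.
def joint_entropy(xs, ys)
  entropy(xs.zip(ys))
end

# Mutual information I(X; Y) = H(X) + H(Y) - H(X, Y).
def mutual_information(xs, ys)
  entropy(xs) + entropy(ys) - joint_entropy(xs, ys)
end

foo = [0, 0, 1, 1]
bar = [0, 1, 0, 1] # every (foo, bar) pair occurs once, so the two are independent here

puts entropy(foo)
puts joint_entropy(foo, bar)
puts mutual_information(foo, bar)
```

These helpers treat values as an unordered set of symbols, which matches the "Set & Discrete only" restriction noted for the entropy methods.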

TODO

The builders should validate the variables they're given and raise errors if they're not part of the estimator's data.
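
One way this validation could look is sketched below. `SketchProbabilityBuilder` is a hypothetical stand-in, not one of Pest's classes; it assumes the builder receives the list of variable names known to the estimator and raises `ArgumentError` when asked about anything else.

```ruby
# Hypothetical chainable builder that rejects variables unknown to the estimator.
class SketchProbabilityBuilder
  attr_reader :event, :givens

  # known_variables: names present in the estimator's data (assumed input).
  # event: hash of variable name => value being queried.
  def initialize(known_variables, event)
    @known_variables = known_variables
    @event = validate!(event)
    @givens = {}
  end

  # Returns self so calls can be chained, as in p(...).given(...).
  def given(conditions)
    @givens = validate!(conditions)
    self
  end

  private

  # Raise if any queried variable is not part of the estimator's data.
  def validate!(assignments)
    unknown = assignments.keys - @known_variables
    unless unknown.empty?
      raise ArgumentError, "unknown variable(s): #{unknown.join(', ')}"
    end

    assignments
  end
end

builder = SketchProbabilityBuilder.new([:foo, :bar], { foo: 1 }).given({ bar: 2 })
puts builder.event.inspect
puts builder.givens.inspect

begin
  SketchProbabilityBuilder.new([:foo, :bar], { baz: 3 })
rescue ArgumentError => e
  puts e.message
end
```

Validating at construction time means a misspelled variable fails immediately rather than silently producing a zero probability later.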