Data

Data formats

PSOAP relies upon chunks of data. When working with real data, there are a few things to keep in mind.

First, certain pixels may need to be masked, for example because of cosmic ray hits. As a result, real data chunks will usually have an unequal number of unmasked pixels per epoch. This is expected.

All chunks are therefore generated with their full complement of pixels, and the masks are applied to the data only when the inference routines run.
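A minimal NumPy sketch of what this means in practice. The array sizes and masked positions are made up for illustration; the only assumption about psoap is the one stated above, that a boolean mask marks the good pixels of a full-size chunk.

```python
import numpy as np

# Hypothetical chunk: 3 epochs x 5 pixels, stored at full size
fl = np.ones((3, 5))
mask = np.ones((3, 5), dtype=bool)
mask[0, 2] = False       # e.g. a cosmic-ray hit in epoch 0
mask[2, [0, 4]] = False  # two bad pixels in epoch 2

# Each epoch keeps a different number of good pixels
good_per_epoch = mask.sum(axis=1)
print(good_per_epoch)  # [4 5 3]

# Applying the mask with boolean indexing flattens the 2D array to 1D
fl_good = fl[mask]
print(fl_good.shape)  # (12,)
```

Because the epochs no longer have equal lengths, the masked result cannot stay a rectangular 2D array, which is why masked data products are 1D.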

All data and chunks are stored in HDF5 format.

Data module

class psoap.data.Chunk(wl, fl, sigma, date, mask=None)

Hold a chunk of data. Each chunk has shape (n_epochs, n_pix) and has components wl, fl, sigma, date, and mask, all with the same shape.

apply_mask()

Apply the mask to all of the attributes, after which they are 1D arrays containing only the unmasked pixels.

date = None

date vector

date1D = None

date vector of length n_epochs

fl = None

flux vector

lwl = None

natural log of the wavelength vector

classmethod open(order, wl0, wl1, limit=100, prefix='')

Load a spectrum from a directory link pointing to HDF5 output.

Parameters: fname – HDF5 file containing files on disk.

sigma = None

measurement uncertainty vector

wl = None

wavelength vector
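To make the attribute list concrete, here is a hypothetical stand-in class. It is not psoap's implementation; it only mirrors the documented attributes. Two behaviors are assumptions: that date1D is taken from the first pixel of each epoch (every pixel in a row is assumed to share its epoch's date), and that apply_mask overwrites each attribute with its masked, flattened version.

```python
import numpy as np


class ChunkSketch:
    """Hypothetical stand-in for psoap.data.Chunk, for illustration only."""

    def __init__(self, wl, fl, sigma, date, mask=None):
        # All arrays have shape (n_epochs, n_pix)
        self.wl = np.asarray(wl, dtype=float)
        self.fl = np.asarray(fl, dtype=float)
        self.sigma = np.asarray(sigma, dtype=float)
        self.date = np.asarray(date, dtype=float)
        # Default mask keeps every pixel
        if mask is None:
            self.mask = np.ones(self.wl.shape, dtype=bool)
        else:
            self.mask = np.asarray(mask, dtype=bool)
        self.lwl = np.log(self.wl)  # natural log of the wavelengths
        # Assumption: every pixel in a row shares that epoch's date
        self.date1D = self.date[:, 0]

    def apply_mask(self):
        # Boolean indexing keeps only the good pixels and flattens to 1D
        for name in ("wl", "lwl", "fl", "sigma", "date"):
            setattr(self, name, getattr(self, name)[self.mask])


n_epochs, n_pix = 2, 4
wl = np.tile(np.linspace(5150.0, 5160.0, n_pix), (n_epochs, 1))
date = np.repeat([[2457000.5], [2457010.5]], n_pix, axis=1)
mask = np.ones((n_epochs, n_pix), dtype=bool)
mask[1, 3] = False  # one bad pixel in the second epoch

chunk = ChunkSketch(wl, np.ones_like(wl), 0.01 * np.ones_like(wl), date, mask)
chunk.apply_mask()
print(chunk.wl.shape)  # (7,)
```

Note that date1D is computed before masking, so it keeps its length of n_epochs while the other attributes become 1D.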

class psoap.data.Spectrum(fname)

Data structure for the raw spectra, stored in an HDF5 file.

This is the main data structure used to interact with your dataset. The key step is getting your spectra into HDF5 format first.

Parameters: fname (string) – location of the HDF5 file.
Returns: the instantiated Spectrum object.
Return type: Spectrum

sort_by_SN(order=22)

Sort the dataset in order of decreasing signal-to-noise ratio (SNR). This makes it easy to limit the analysis to the highest-SNR epochs if you wish to speed things up.

Parameters: order (int) – the order in which to calculate the signal-to-noise ratio. By default, the TRES Mg b order (order 22) is chosen, which generally works well for TRES data. If you are using data from a different instrument, you will likely need to adjust this value.
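The idea behind the sort can be sketched in a few lines. This is a hypothetical illustration, not psoap's code: it assumes fl and sigma are (n_epochs, n_pix) arrays for the chosen order and defines the per-epoch SNR as the median of fl / sigma, which may differ from the estimator sort_by_SN actually uses.

```python
import numpy as np


def sort_by_sn_sketch(fl, sigma):
    """Return epoch indices in order of decreasing SNR, plus the SNR values.

    Hypothetical: per-epoch SNR is the median of fl / sigma over one order.
    """
    sn = np.median(fl / sigma, axis=1)
    # argsort is ascending, so reverse it to get decreasing SNR
    return np.argsort(sn)[::-1], sn


fl = np.ones((3, 100))
sigma = np.array([0.05, 0.01, 0.02])[:, np.newaxis] * np.ones((3, 100))
order, sn = sort_by_sn_sketch(fl, sigma)
print(order)  # [1 2 0]

# Keep only the two highest-SNR epochs to speed up an analysis
best_two = order[:2]
```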

psoap.data.lredshift(lwl, v)

Redshift a vector of wavelengths that are already in log-lambda (natural log). A positive velocity corresponds to a lengthening (increase) of the wavelengths in the array.

Parameters:
  • lwl (np.array, arbitrary shape) – the input ln(wavelengths).
  • v (float) – the velocity by which to redshift the wavelengths
Returns:

A redshifted version of the wavelength vector

Return type:

np.array
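Working in ln(wavelength) turns a Doppler shift into a constant additive offset, which is why ln-wavelength vectors are convenient. The sketch below assumes velocities in km/s and uses the relativistic Doppler factor; psoap's exact formula and units are not stated here, so treat both as assumptions. To first order the offset is simply v/c.

```python
import numpy as np

C_KMS = 2.99792458e5  # speed of light in km/s (assumed velocity unit)


def lredshift_sketch(lwl, v):
    """Shift ln-wavelengths by velocity v using the relativistic Doppler factor.

    ln(lambda') = ln(lambda) + 0.5 * ln((c + v) / (c - v)), a constant offset.
    """
    return lwl + 0.5 * np.log((C_KMS + v) / (C_KMS - v))


lwl = np.log(np.array([5150.0, 5175.0, 5200.0]))
shifted = lredshift_sketch(lwl, 30.0)
# Every ln-wavelength moves by the same amount, roughly v / c = 1e-4
print(shifted - lwl)
```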

psoap.data.redshift(wl, v)

Redshift a vector of wavelengths. A positive velocity corresponds to a lengthening (increase) of the wavelengths in the array.

Parameters:
  • wl (np.array, arbitrary shape) – the input wavelengths
  • v (float) – the velocity by which to redshift the wavelengths
Returns:

A redshifted version of the wavelength vector

Return type:

np.array
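In linear wavelength the same shift is multiplicative. This sketch again assumes km/s and the relativistic Doppler factor rather than reproducing psoap's source; it also shows that shifting by -v undoes a shift by +v.

```python
import numpy as np

C_KMS = 2.99792458e5  # speed of light in km/s (assumed velocity unit)


def redshift_sketch(wl, v):
    """Relativistic Doppler shift; a positive v lengthens the wavelengths."""
    return wl * np.sqrt((C_KMS + v) / (C_KMS - v))


wl = np.array([5150.0, 5175.0, 5200.0])
red = redshift_sketch(wl, 30.0)
back = redshift_sketch(red, -30.0)  # shifting by -v undoes the shift
```

Taking the natural log of both sides gives the additive form used for ln-wavelengths.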

psoap.data.replicate_wls(lwls, velocities, mask)

Using the set of velocities calculated from an orbit, copy and blue-shift the input ln(wavelengths), so that they correspond to the rest-frame wavelengths of the individual components. This routine is primarily for producing replicated ln-wavelength vectors ready to feed to the GP routines.

Parameters:
  • lwls (1D np.array with length (n_epochs * n_good_pixels)) – this data product is the 1D representation of the natural log of the (masked) input wavelength vectors. The masking process naturally makes it 1D.
  • velocities (2D np.array with shape (n_components, n_epochs)) – a set of velocities determined from an orbital model.
  • mask – the boolean mask used to select the good data points. It is necessary for assigning each velocity to the right epoch.
Returns:

A 2D array of shape (n_components, n_epochs * n_good_pixels) containing the wavelength vectors blue-shifted according to the velocities. For each component, the array is flattened into a 1D vector.

Return type:

np.array
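The replication step can be sketched with NumPy broadcasting. This is an illustrative version, not psoap's code: it assumes velocities in km/s and the relativistic Doppler factor, and it relies on the fact that boolean indexing visits elements in row-major order, so masking a broadcast (n_epochs, n_pix) velocity array lines each velocity up with the matching entries of lwls.

```python
import numpy as np

C_KMS = 2.99792458e5  # speed of light in km/s (assumed velocity unit)


def replicate_wls_sketch(lwls, velocities, mask):
    """Blue-shift masked ln-wavelengths into each component's rest frame."""
    n_components = velocities.shape[0]
    out = np.empty((n_components, lwls.size))
    for i in range(n_components):
        # Spread each epoch's velocity over that epoch's pixels, keep good ones
        v_pix = np.broadcast_to(velocities[i][:, np.newaxis], mask.shape)[mask]
        # Blue-shift: remove a redshift of v from the observed ln-wavelengths
        out[i] = lwls - 0.5 * np.log((C_KMS + v_pix) / (C_KMS - v_pix))
    return out


wl = np.array([[5150.0, 5151.0, 5152.0, 5153.0],
               [5150.0, 5151.0, 5152.0, 5153.0]])
mask = np.array([[True, True, False, True],
                 [True, True, True, True]])
lwls = np.log(wl)[mask]  # 7 good pixels, flattened
velocities = np.array([[0.0, 0.0],      # component A, at rest in both epochs
                       [20.0, -20.0]])  # component B, one velocity per epoch
rest = replicate_wls_sketch(lwls, velocities, mask)
print(rest.shape)  # (2, 7)
```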

Utils module

psoap.utils.convert_dict(model, fix_params, **kwargs)

Used to turn a dictionary of parameter values (from config.yaml) directly into a parameter type. Generally used for synthesis and plotting command line scripts.

psoap.utils.convert_vector(p, model, fix_params, **kwargs)

Unroll a vector of parameter values into a parameter type, using knowledge about which model we are fitting, the parameters we are fixing, and the default values of those parameters.

Parameters:
  • p (np.array) – 1D input array of only a subset of parameter values.
  • model (str) – “SB1”, “SB2”, etc.
  • fix_params (list of str) – names of parameters that will be fixed
  • **kwargs – input for {param_name: default_value} pairs
Returns:

a 2-tuple of the full vectors for the orbital parameters, and the GP parameters, augmented with the previously missing values.

Return type:

(np.array, np.array)
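The unrolling logic can be illustrated with a hypothetical helper. psoap derives the ordered parameter names from the model string; here they are passed in explicitly as orbit_names and gp_names, and the names P, e, K, amp, and l are made up for the example. Fixed parameters take their values from the keyword arguments, and free ones are consumed from p in order.

```python
import numpy as np


def convert_vector_sketch(p, orbit_names, gp_names, fix_params, **kwargs):
    """Hypothetical sketch: rebuild full orbital and GP vectors from free values in p."""
    free = iter(np.atleast_1d(np.asarray(p, dtype=float)))

    def fill(names):
        # Fixed parameters come from kwargs; free ones are taken from p in order
        return np.array(
            [kwargs[n] if n in fix_params else next(free) for n in names],
            dtype=float,
        )

    orbit = fill(orbit_names)
    gp = fill(gp_names)
    if next(free, None) is not None:
        raise ValueError("p has more entries than there are free parameters")
    return orbit, gp


# Hypothetical names: period P, eccentricity e, semi-amplitude K,
# GP amplitude amp, and GP length scale l
orbit, gp = convert_vector_sketch(
    [11.0, 0.3, 1.0],
    orbit_names=["P", "e", "K"],
    gp_names=["amp", "l"],
    fix_params=["K", "l"],
    K=5.0,
    l=20.0,
)
```

Keeping the free values in a flat vector is what samplers and optimizers expect, while the model code can always work with complete parameter vectors.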

psoap.utils.gelman_rubin(samplelist)

Given a list of flatchains from separate runs (with burn-in already cut and trimmed, if desired), compute the Gelman-Rubin statistics described in Bayesian Data Analysis, 3rd edition, p. 284. To compute them for fewer parameters, slice the parameter columns of each chain before passing in the list.
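A sketch of the potential scale reduction factor, computed per parameter from the between-chain variance B and the mean within-chain variance W. It assumes each flatchain is an array of shape (n_samples, n_params) and that all chains have the same length; it also skips the step, described in BDA3, of splitting each chain in half, so it is a simplified illustration rather than psoap's routine.

```python
import numpy as np


def gelman_rubin_sketch(samplelist):
    """R-hat for each parameter from m equal-length chains (no chain splitting)."""
    chains = np.stack([np.asarray(s, dtype=float) for s in samplelist])  # (m, n, n_params)
    m, n = chains.shape[:2]
    chain_means = chains.mean(axis=1)                # (m, n_params)
    B = n * chain_means.var(axis=0, ddof=1)          # between-chain variance
    W = chains.var(axis=1, ddof=1).mean(axis=0)      # mean within-chain variance
    var_plus = (n - 1) / n * W + B / n               # pooled posterior variance estimate
    return np.sqrt(var_plus / W)


rng = np.random.default_rng(42)
# Four chains sampling the same distribution: R-hat should be close to 1
mixed = [rng.normal(size=(1000, 2)) for _ in range(4)]
# Four chains stuck around different means: R-hat should be well above 1
stuck = [rng.normal(loc=mu, size=(1000, 2)) for mu in (0.0, 5.0, 10.0, 15.0)]
r_mixed = gelman_rubin_sketch(mixed)
r_stuck = gelman_rubin_sketch(stuck)
```

Values near 1 suggest the runs agree; values well above 1 indicate that the chains have not converged to the same distribution.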

psoap.utils.get_labels(model, fix_params)

Collect the labels for each model, so that we can plot.