Data Type and Preprocessing

Data Types

Python list, numpy.ndarray, and torch.tensor (PyTorch tensors) are the base data types. The Experiment class can handle data saved in any of these types.

We use several customized types to specify the dimensionality and the base type of the data.

Customized Type    Base Type
Array              1D numpy.ndarray
Matrix             2D numpy.ndarray
ArrayLike1d        1D numpy.ndarray, list, or torch.tensor
MatrixLike2d       2D numpy.ndarray, list, or torch.tensor

Usually, users provide data as numpy.ndarray or list. NEXTorch converts them to torch.tensor and then passes them between BoTorch functions. Once training is done, NEXTorch converts the data back to numpy.ndarray for output and visualization.

Type Conversion

np_to_tensor

Converts numpy objects to tensor objects. Returns a copy.

tensor_to_np

Converts tensor objects to numpy array objects. Returns a copy with no gradient information. Parameter: X (MatrixLike2d), the tensor objects.

expand_list

Expand 1d list to 2d
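
As an illustration of what "expanding" a 1D list to 2D can mean, here is a minimal sketch in plain Python. The name expand_list_sketch is hypothetical, and the behavior is an assumption (wrap a flat list in an outer list, leave nested lists unchanged). It is not taken from the NEXTorch source.

```python
import copy

def expand_list_sketch(values):
    """Hypothetical helper: wrap a flat list in an outer list.

    A list that is already nested is returned as a deep copy, unchanged.
    """
    expanded = copy.deepcopy(values)  # do not alias the caller's list
    if len(expanded) > 0 and not isinstance(expanded[0], (list, tuple)):
        expanded = [expanded]
    return expanded

flat = expand_list_sketch([0, 1])
nested = expand_list_sketch([[0, 1], [2, 5]])
print(flat)    # [[0, 1]]
print(nested)  # [[0, 1], [2, 5]]
```

Under this assumption, a single range such as [0, 1] and a list of ranges can be handled by the same downstream code.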

(Inverse) Normalization

Convert arrays or matrices from a real scale to a unit scale [0, 1] in each dimension, and vice versa. This step is often needed for X before model training.

Note

X or X_unit suggests that the values are in unit scales.

X_real suggests that the values are in real scales or units.

unitscale_xv

Takes in an x array in a real scale and converts it to a unit scale

unitscale_X

Takes in a matrix in a real scale and converts it into a unit scale

inverse_unitscale_xv

Takes in an x array in a unit scale and converts it to a real scale

inverse_unitscale_X

Takes in a matrix in a unit scale and converts it into a real scale
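
The unit-scale conversion described in this section can be sketched with NumPy as min-max scaling per dimension. The function names to_unit_scale and to_real_scale and the X_ranges layout (one [lower, upper] pair per dimension) are assumptions made for this example. They are not the NEXTorch signatures.

```python
import numpy as np

def to_unit_scale(X_real, X_ranges):
    """Map each column of X_real from [lower, upper] to [0, 1]."""
    X_real = np.asarray(X_real, dtype=float)
    ranges = np.asarray(X_ranges, dtype=float)  # shape (n_dim, 2)
    lower, upper = ranges[:, 0], ranges[:, 1]
    return (X_real - lower) / (upper - lower)

def to_real_scale(X_unit, X_ranges):
    """Inverse mapping: [0, 1] back to [lower, upper] per column."""
    X_unit = np.asarray(X_unit, dtype=float)
    ranges = np.asarray(X_ranges, dtype=float)
    lower, upper = ranges[:, 0], ranges[:, 1]
    return X_unit * (upper - lower) + lower

# Two dimensions with illustrative ranges
X_ranges = [[300.0, 500.0], [1.0, 10.0]]
X_real = np.array([[300.0, 1.0], [400.0, 5.5], [500.0, 10.0]])

X_unit = to_unit_scale(X_real, X_ranges)
print(X_unit)                           # rows: [0, 0], [0.5, 0.5], [1, 1]
print(to_real_scale(X_unit, X_ranges))  # recovers X_real
```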

(Inverse) Standardization

Convert arrays or matrices from a real scale to a standardized scale with zero mean and unit variance in each dimension, and vice versa. This step is often needed for Y before model training.

Note

Y suggests that the values are in standardized scales.

Y_real suggests that the values are in real scales or units.

standardize_X

Takes in an array/matrix X and returns the standardized data with zero mean and a unit variance

inverse_standardize_X

Takes in an array/matrix/tensor X and returns the data in the real scale
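
Standardization and its inverse can be sketched in NumPy as follows. The function names are hypothetical, and the use of the sample standard deviation (ddof=1) is an assumption of this example. The library may make a different choice.

```python
import numpy as np

def standardize(Y_real):
    """Return standardized data plus the mean and std used, per column."""
    Y_real = np.asarray(Y_real, dtype=float)
    mean = Y_real.mean(axis=0)
    std = Y_real.std(axis=0, ddof=1)  # assumption: sample standard deviation
    return (Y_real - mean) / std, mean, std

def inverse_standardize(Y, mean, std):
    """Map standardized data back to the real scale."""
    return np.asarray(Y, dtype=float) * std + mean

Y_real = np.array([[10.0], [20.0], [30.0]])
Y, Y_mean, Y_std = standardize(Y_real)
print(Y.ravel())                                      # [-1.  0.  1.]
print(inverse_standardize(Y, Y_mean, Y_std).ravel())  # [10. 20. 30.]
```

Keeping the mean and standard deviation from the training data is what makes the inverse step possible for later predictions.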

Test Points Generation

Generate X points on a mesh grid for visualization or testing purposes.
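
A rough sketch of 2D mesh generation in a unit scale, using numpy.meshgrid. The helper name mesh_points_2d and its return layout are assumptions for illustration. Only mesh_size comes from the descriptions in this section.

```python
import numpy as np

def mesh_points_2d(mesh_size=5):
    """Build a mesh_size x mesh_size grid on [0, 1] x [0, 1].

    Returns the flattened test points (one row per grid node) and the
    two 2D coordinate arrays used for plotting.
    """
    axis = np.linspace(0.0, 1.0, mesh_size)
    X1, X2 = np.meshgrid(axis, axis)
    X_test = np.column_stack([X1.ravel(), X2.ravel()])
    return X_test, X1, X2

X_test, X1, X2 = mesh_points_2d(mesh_size=5)
print(X_test.shape)  # (25, 2)

# Values computed row by row on X_test reshape back onto the mesh.
# The sum below is a stand-in for model predictions.
y_flat = X_test.sum(axis=1)
Y_mesh = y_flat.reshape(X1.shape)
print(Y_mesh.shape)  # (5, 5)
```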

create_X_mesh_2d

Create 2D mesh for testing

transform_Y_mesh_2d

Takes in one column of the X tensor, predicts the Y values, converts them to real units, and returns a 2D numpy array of size mesh_size*mesh_size

transform_X_2d

Transform X1 and X2 in unit scale to real scales for plotting

prepare_full_X_unit

Create a full X_test matrix with varying x in the dimensions defined by x_indices and fixed values in the remaining dimensions

prepare_full_X_real

Create a full X_test matrix with varying x in the dimensions defined by x_indices and fixed values in the remaining dimensions

get_baseline_unit

Get the baseline values from X_ranges in a unit scale

fill_full_X_test

Choose certain dimensions defined by x_indices, fill them with mesh test points and keep the rest as fixed values.

create_full_X_test_2d

Choose two dimensions, create 2D mesh and keep the rest as fixed values.

create_full_X_test_1d

Choose one dimension, create a 1D mesh, and keep the rest as fixed values.
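
The idea of varying selected dimensions while holding the others fixed can be sketched as below. The helper fill_full_X and the baseline of 0.5 in every dimension are hypothetical choices for this example. The actual baseline returned by get_baseline_unit may differ.

```python
import numpy as np

def fill_full_X(X_mesh_points, x_indices, baseline):
    """Place mesh points in the columns x_indices; fix the rest at baseline."""
    X_mesh_points = np.asarray(X_mesh_points, dtype=float)
    n_points = X_mesh_points.shape[0]
    # Start with every row equal to the baseline, then overwrite
    # the chosen columns with the mesh points.
    X_full = np.tile(np.asarray(baseline, dtype=float), (n_points, 1))
    X_full[:, x_indices] = X_mesh_points
    return X_full

# Vary only dimension 1 of a 3D input along a 1D mesh of 3 points
x_line = np.linspace(0.0, 1.0, 3).reshape(-1, 1)
X_full = fill_full_X(x_line, x_indices=[1], baseline=[0.5, 0.5, 0.5])
print(X_full)
# [[0.5 0.  0.5]
#  [0.5 0.5 0.5]
#  [0.5 1.  0.5]]
```

The same helper covers the 2D case by passing two column indices and a two-column matrix of mesh points.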

Encoding/Decoding

For ordinal and categorical variables, their real values need to be converted into unit-scale encodings in the continuous space. We can do it with real_to_encode_X. These encodings are used to train BO functions.

To convert the unit-scale encodings back to the original variable values, we can do it in two steps: using unit_to_encode_X and encode_to_real_X.

A ParameterSpace object can also be the input to these functions.
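
The steps above can be sketched for a single ordinal variable as follows. The evenly spaced unit-scale encodings (numpy.linspace) and the helper nearest_index are assumptions made for illustration. They are not confirmed details of the NEXTorch implementation.

```python
import numpy as np

def nearest_index(x, candidates):
    """Index of the closest candidate for each entry of x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    candidates = np.asarray(candidates, dtype=float)
    return np.abs(x[:, None] - candidates[None, :]).argmin(axis=1)

# Allowed real values of an ordinal variable and their assumed encodings
values = np.array([1.0, 2.0, 5.0, 10.0])
encodings = np.linspace(0.0, 1.0, len(values))  # 0, 1/3, 2/3, 1

# Real scale -> unit-scale encoding: snap to the nearest allowed value
x_real_relaxed = [1.2, 4.0, 9.0]
x_encoded = encodings[nearest_index(x_real_relaxed, values)]

# Unit scale -> encoding -> real scale (the two-step decoding)
x_unit_relaxed = [0.1, 0.4, 0.9]
x_snapped = encodings[nearest_index(x_unit_relaxed, encodings)]
x_real = values[nearest_index(x_snapped, encodings)]
print(x_real)  # [ 1.  2. 10.]
```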

encode_xv

Convert original data to encoded data

decode_xv

Decode the data to ordinal or categorical values

real_to_encode_X

Takes in a matrix in a real scale from the relaxed continuous space, rounds (encodes) to the available values, and converts it into a unit scale

unit_to_encode_X

Takes in a matrix in a unit scale from the relaxed continuous space, rounds (encodes) to the available values

encode_to_real_X

Takes in a matrix in a unit scale from the encoding space, converts it into a real scale

real_to_encode_ParameterSpace

Takes in a matrix in a real scale from the relaxed continuous space, rounds (encodes) to the available values, and converts it into a unit scale.

unit_to_encode_ParameterSpace

Takes in a matrix in a unit scale from the relaxed continuous space and rounds (encodes) to the available values, using a ParameterSpace object

encode_to_real_ParameterSpace

Takes in a matrix in a unit scale from the encoding space and converts it into a real scale, using a ParameterSpace object
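
For a space that mixes continuous and discrete parameters, the conversion has to be applied column by column. The sketch below uses a plain list of dictionaries as a stand-in for a parameter space. This structure, the function name unit_to_real_mixed, and the evenly spaced encodings are all hypothetical and do not reflect the ParameterSpace API.

```python
import numpy as np

# Hypothetical space description: a "range" entry marks a continuous
# parameter, a "values" entry marks an ordinal one.
space = [
    {"range": (300.0, 500.0)},
    {"values": [1.0, 2.0, 3.0]},
]

def unit_to_real_mixed(X_unit, space):
    """Convert a relaxed unit-scale matrix to real values, per column."""
    X_unit = np.asarray(X_unit, dtype=float)
    X_real = np.empty_like(X_unit)
    for j, param in enumerate(space):
        col = X_unit[:, j]
        if "values" in param:
            # Discrete: snap to the nearest encoding, then look up the value
            values = np.asarray(param["values"], dtype=float)
            encodings = np.linspace(0.0, 1.0, len(values))
            idx = np.abs(col[:, None] - encodings[None, :]).argmin(axis=1)
            X_real[:, j] = values[idx]
        else:
            # Continuous: plain inverse unit scaling
            lower, upper = param["range"]
            X_real[:, j] = col * (upper - lower) + lower
    return X_real

X_unit_relaxed = [[0.0, 0.1], [0.5, 0.45], [1.0, 0.8]]
X_real_mixed = unit_to_real_mixed(X_unit_relaxed, space)
print(X_real_mixed)
# [[300.   1.]
#  [400.   2.]
#  [500.   3.]]
```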