rockfish.actions

import rockfish.actions as ra

Source and Sink Actions

rockfish.actions.DatasetLoad

Load a Dataset as the output table.

Attributes:

Name Type Description
Config type[LoadConfig]

Alias for LoadConfig.

rockfish.actions.DatasetSave

Save table as a Dataset.
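
A minimal sketch, assuming the saved dataset's name is passed as a config field
import rockfish.actions as ra

# hedged sketch: the name keyword is assumed from SaveConfig
save = ra.DatasetSave(name="synthetic")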

Attributes:

Name Type Description
Config type[SaveConfig]

Alias for SaveConfig.

rockfish.actions.ModelLoad

Produce a model table.

Attributes:

Name Type Description
Config type[Config]

Alias for Config.

Dataset Property Extraction Actions

rockfish.actions.TabPropertyExtractor

Compute and add dataset and field properties to the tabular dataset.

Run default property detection for tabular datasets
import rockfish.actions as ra
detect_tab_props = ra.TabPropertyExtractor()
Run property detection with PII detection
import rockfish.actions as ra
detect_tab_props = ra.TabPropertyExtractor(detect_pii=True)
Run property detection with default association rule detection
import rockfish.actions as ra
detect_tab_props = ra.TabPropertyExtractor(detect_association_rules=True)
Run property detection with custom association threshold
import rockfish.actions as ra
detect_tab_props = ra.TabPropertyExtractor(
    detect_association_rules=True,
    association_threshold=0.99
)

rockfish.actions.properties.TabPropertyExtractorConfig

Config class for the TabPropertyExtractor action.

Attributes:

Name Type Description
detect_pii bool

Flag to run PII detection or not (default = False).

detect_association_rules bool

Flag to run association rule detection or not (default = False). Running this will add AssociationRules to the dataset properties.

association_threshold float

Fields will be associated with each other if their association score is greater than the association threshold (default = 0.95). Should be a number in the range [0.0, 1.0].

rockfish.actions.TimePropertyExtractor

Compute and add dataset and field properties to the timeseries dataset.

Run default property detection for timeseries datasets
import rockfish.actions as ra
detect_time_props = ra.TimePropertyExtractor(timestamp="ts")
Run property detection with a known timeseries data model
import rockfish.actions as ra
detect_time_props = ra.TimePropertyExtractor(
    timestamp="ts",
    session_fields=["user_id"],
    metadata_fields=["age", "gender"]
)
Run property detection with metadata field detection
import rockfish.actions as ra
detect_time_props = ra.TimePropertyExtractor(
    timestamp="ts",
    session_fields=["user_id"],
    detect_metadata_fields=True
)
Run property detection with PII detection
import rockfish.actions as ra
detect_time_props = ra.TimePropertyExtractor(timestamp="ts", detect_pii=True)
Run property detection with default association rule detection
import rockfish.actions as ra
detect_time_props = ra.TimePropertyExtractor(
    timestamp="ts",
    detect_association_rules=True
)
Run property detection with custom association threshold
import rockfish.actions as ra
detect_time_props = ra.TimePropertyExtractor(
    timestamp="ts",
    detect_association_rules=True,
    association_threshold=0.99
)

rockfish.actions.properties.TimePropertyExtractorConfig

Config class for the TimePropertyExtractor action.

Attributes:

Name Type Description
timestamp str

Name of the timestamp field in a timeseries dataset.

metadata_fields Optional[list[str]]

List of field names to be treated as metadata fields in the timeseries dataset (default = None). Should be None if detect_metadata_fields is True. Can be an empty list if detect_metadata_fields is False.

detect_metadata_fields bool

Flag to run metadata field detection or not (default = False).

session_fields list[str]

List of fields to be treated as session fields in the timeseries dataset (default = []). Cannot be an empty list if detect_metadata_fields is True.

detect_pii bool

Flag to run PII detection or not (default = False).

detect_association_rules bool

Flag to run association rule detection or not (default = False). Running this will add AssociationRules to the dataset properties.

association_threshold float

Fields will be associated with each other if their association score is greater than the association threshold (default = 0.95). Should be a number in the range [0.0, 1.0].

Data Processing Actions

rockfish.actions.Apply

Apply a function and append the results to the table as a new field.

Attributes:

Name Type Description
Config type[ApplyConfig]

Alias for ApplyConfig.

rockfish.actions.Transform

Transform a field replacing the values with the result of the function.

Attributes:

Name Type Description
Config type[TransformConfig]

Alias for TransformConfig.

rockfish.actions.AppendUUID

Return table with new field of UUID values.

Append field 'a' with UUID values
import rockfish.actions as ra
append_uuid = ra.AppendUUID(
    append_field="a",
    seed=1234
)
Append field 'b' with UUID values, per session
import rockfish.actions as ra
append_uuid = ra.AppendUUID(
    group_fields=["session_key"],
    append_field="b",
    seed=1234
)
Append field 'c' with UUID values, per other group_fields
import rockfish.actions as ra
append_uuid = ra.AppendUUID(
    group_fields=["d", "e"],
    append_field="c",
    seed=1234
)

Attributes:

Name Type Description
Config

Alias for AppendUUIDConfig.

rockfish.actions.append.AppendUUIDConfig

Config class for the AppendUUID action.

Attributes:

Name Type Description
group_fields Optional[list[str]]

List of fields to group over. Each group will be assigned a new value in the append_field. If an empty list is specified, each row will be assigned a new value. If unspecified, group_fields will be taken from the dataset's TableMetadata.

append_field str

The name of the new field to append.

seed Optional[int]

The seed for the random number generator.

rockfish.actions.AppendDomain

Return table with a new field of values drawn from the given domain. All values in the domain should be of the same type. A single-value domain can be used to add a constant-valued field.

Append field 'a' with values from given domain
import rockfish.actions as ra
append_domain = ra.AppendDomain(
    append_field="a",
    domain=["one", "two", "three"],
    seed=1234
)
Append field 'a' with a constant value
import rockfish.actions as ra
append_domain = ra.AppendDomain(
    append_field="a",
    domain=[10],
    seed=1234
)
Append field 'b' with values from given domain, per session
import rockfish.actions as ra
append_domain = ra.AppendDomain(
    group_fields=["session_key"],
    append_field="b",
    domain=["one", "two", "three"],
    seed=1234
)
Append field 'c' with values from given domain, per other group_fields
import rockfish.actions as ra
append_domain = ra.AppendDomain(
    group_fields=["d", "e"],
    append_field="c",
    domain=["one", "two", "three"],
    seed=1234
)

Attributes:

Name Type Description
Config

Alias for AppendDomainConfig.

rockfish.actions.append.AppendDomainConfig

Config class for the AppendDomain action.

Attributes:

Name Type Description
group_fields Optional[list[str]]

List of fields to group over. Each group will be assigned a value in the append_field. If an empty list is specified, each row will be assigned a value. If unspecified, group_fields will be taken from the dataset's TableMetadata.

append_field str

The name of the new field to append.

domain Union[list[str], list[int], list[float]]

List of values that the new field can have. All values should have the same data type. The list should be of size <= 100.

seed Optional[int]

The seed for the random number generator.

rockfish.actions.AppendNormal

Return table with new field of values from the given normal distribution.

Append field 'a' with values from normal(mean=0.0, scale=1.0)
import rockfish.actions as ra
append_normal = ra.AppendNormal(
    append_field="a",
    mean=0.0,
    scale=1.0,
    seed=1234
)
Append field 'a' with values from normal(mean=0.0, scale=1.0), precision = 3 digits
import rockfish.actions as ra
append_normal = ra.AppendNormal(
    append_field="a",
    mean=0.0,
    scale=1.0,
    append_field_ndigits=3,
    seed=1234
)
Append field 'b' with values from normal(mean=0.0, scale=1.0), per session
import rockfish.actions as ra
append_normal = ra.AppendNormal(
    group_fields=["session_key"],
    append_field="b",
    mean=0.0,
    scale=1.0,
    seed=1234
)
Append field 'c' with values from normal(mean=0.0, scale=1.0), per other group_fields
import rockfish.actions as ra
append_normal = ra.AppendNormal(
    group_fields=["d", "e"],
    append_field="c",
    mean=0.0,
    scale=1.0,
    seed=1234
)

Attributes:

Name Type Description
Config

Alias for AppendNormalConfig.

rockfish.actions.append.AppendNormalConfig

Config class for the AppendNormal action.

Attributes:

Name Type Description
group_fields Optional[list[str]]

List of fields to group over. Each group will be assigned a value in the append_field. If an empty list is specified, each row will be assigned a value. If unspecified, group_fields will be taken from the dataset's TableMetadata.

append_field str

The name of the new field to append.

mean float

Mean of the normal distribution from which new field values are sampled.

scale float

Standard deviation of the normal distribution from which new field values are sampled.

append_field_ndigits int

Precision of append field (default = 2).

seed Optional[int]

The seed for the random number generator.

rockfish.actions.Flatten

Flatten a table by expanding json objects / pyarrow structs in a column into multiple columns. e.g.

col1 col2 col3
a {"b": 1} c

turns into

col1 col2.b col3
a 1 c

This action recursively flattens the table until no more JSON nesting is present. It does not handle lists or JSON arrays, and will raise an error if any are present in the table.
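
A minimal sketch, assuming the separator is passed as a constructor keyword like other actions on this page
import rockfish.actions as ra

# hedged sketch: "." matches the col2.b example above; separator is documented in FlattenConfig below
flatten = ra.Flatten(separator=".")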

rockfish.actions.flatten.FlattenConfig dataclass

Configuration class for the Flatten action.

Attributes:

Name Type Description
separator str

String used to concatenate field names when a struct is expanded into multiple columns.

rockfish.actions.Unflatten

Unflatten a table by condensing multiple columns into json objects / pyarrow structs. e.g.

col1 col2.b col3
a 1 c

turns into

col1 col2 col3
a {"b": 1} c
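
A minimal sketch, assuming the same constructor-keyword style; the separator should match the one used by Flatten
import rockfish.actions as ra

# hedged sketch: split column names like col2.b back into structs
unflatten = ra.Unflatten(separator=".")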

rockfish.actions.flatten.UnflattenConfig dataclass

Configuration class for the Unflatten action.

Attributes:

Name Type Description
separator str

String that field names are split by when constructing structs.

rockfish.actions.Sample

Return table with sampled rows according to the provided sample_type.

Sample using default sampling method
import rockfish.actions as ra
sample = ra.Sample(sample_size=100, sample_type=None)
Sample using random sampling with replacement
import rockfish.actions as ra
sample = ra.Sample(frac=0.23, sample_type="random", replace=True, seed=3)

Attributes:

Name Type Description
Config

Alias for SampleConfig.

rockfish.actions.sample.SampleConfig dataclass

Config class for the Sample action.

Attributes:

Name Type Description
sample_size Optional[int]

the number of rows to sample

frac Optional[float]

the fraction of rows to sample

sample_type Optional[SampleType]

the type of sampling to use, if None, uses first_n

seed Optional[int]

the seed for the random number generator

replace Optional[bool]

sample with replacement; if true, allows the same row to be sampled multiple times

session_key Optional[str]

the field name that defines the session for timeseries datasets

chunk bool

produce chunks of data

chunk_row_limit int

number of rows in each chunk

rockfish.actions.SampleLabel

Sample rows/sessions that match a label.

Sample from a label field
import rockfish.actions as ra
sample = ra.SampleLabel(
    field="my_label",
    dist={
        "value1": ra.SampleLabel.Count(2),
        "value2": ra.SampleLabel.Count(4),
        "": ra.SampleLabel.Count(6),
    },
    replace=True,
)

Attributes:

Name Type Description
Config

Alias for SampleLabelConfig.

rockfish.actions.sample_label.SampleLabelConfig

Config class for the SampleLabel action.

Attributes:

Name Type Description
field str

field containing the sampling label

dist SampleDist

distribution for each label; the empty string matches all unspecified values

replace bool

sample with replacement; if true, allows the same row to be sampled multiple times

session_key Optional[str]

the field name that defines the session for timeseries datasets

seed Optional[int]

the seed for the random number generator

chunk bool

produce chunks of data

chunk_row_limit int

number of rows in each chunk

rockfish.actions.AlterTimestamp

Alter a timestamp field in the table.

The method to generate new timestamps depends on the interarrival_type option.

fixed

The fixed type generates new timestamps with fixed/regular interarrivals spread over the time range at a per session level.

random

The random type generates new timestamps with random interarrivals at a per session level.

squeeze

The squeeze type takes the original interarrivals and shifts them to the start or end of the time range, depending on the value of flow_start_type. If the interarrivals are larger than the range, they are linearly scaled to fit.

chop

The chop type takes the original interarrivals and shifts them to the start or end of the time range, depending on the value of flow_start_type. If the interarrivals are larger than the range, they are trimmed.

original

The original type takes the original interarrivals and shifts them to the start or end of the time range, depending on the value of flow_start_type. They are not scaled or trimmed.

AlterTimestamp Action Example
from datetime import datetime

import rockfish.actions as ra

alter_timestamp = ra.AlterTimestamp(
    field="ts",
    start_time=datetime(2024, 11, 11, 0, 0, 0),
    end_time=datetime(2024, 11, 11, 23, 59, 59),
    interarrival_type="random",
)

Attributes:

Name Type Description
Config

Alias for AlterTimestampConfig.

rockfish.actions.timestamps.AlterTimestampConfig

Configuration class for the AlterTimestamp action.

Attributes:

Name Type Description
field str

Field name containing the timestamp to alter.

start_time datetime

Start time for the desired output range.

end_time datetime

End time for the desired output range.

flow_start_type Literal['starting', 'ending', 'random']

Method for placing the flow within the range, if the interarrival_type supports it.

interarrival_type Literal['fixed', 'random', 'squeeze', 'chop', 'original']

Method to use for generating new timestamps.

seed Optional[int]

Fixed seed for the random number generator.

rockfish.actions.PostAmplify

rockfish.actions.SQL

Return table after applying the provided SQL query.

Run query on one table
import rockfish.actions as ra
sql = ra.SQL(
    query="select col_1 from foo_table;",
    table_name="foo_table"
)
Join two tables on a common column
import rockfish.actions as ra
query = "select t1.col_1, t2.col_1 from t1 inner join t2 on t1.id = t2.id;"
t2_id = "<ID_OF_REMOTE_DATASET>"  # using rockfish.RemoteDataset.id
sql = ra.SQL(
    query=query,
    table_name="t1",
    dataset_name_to_id={"t2": t2_id}
)

Note: If your table(s) contain columns that have uppercase names, wrap the column names in backticks or quotation marks. For example, if your table has a column called 'Color', the SQL query should be passed as:

  1. "select `Color` from my_table", OR
  2. 'select "Color" from my_table'

Attributes:

Name Type Description
Config

Alias for Config.

rockfish.actions.sql.Config dataclass

Config class for the SQL action.

Attributes:

Name Type Description
query str

The SQL query to run on the table.

table_name str

Name by which the table is referred to in the SQL query; the default is 'my_table'.

dataset_name_to_id dict[str, str]

Dict that maps additional remote dataset names to their dataset IDs; these datasets are retrieved before the query is applied.

Encoding Actions

rockfish.actions.JoinFields

Merge fields using a separator and append the merged field to the table. The original fields are dropped from the table.

Join fields 'a', 'b' and 'c'
import rockfish.actions as ra
join = ra.JoinFields(fields=["a", "b", "c"])
Join fields 'a' and 'b' with a custom separator
import rockfish.actions as ra
join = ra.JoinFields(fields=["a", "b"], separator="++")
Join fields 'a' and 'b' with a custom name for the new field
import rockfish.actions as ra
join = ra.JoinFields(fields=["a", "b"], append_field="a_and_b")

rockfish.actions.join_split.JoinConfig

Configuration class for the JoinFields action.

Attributes:

Name Type Description
fields list[str]

List of field names in the table that need to be merged.

append_field Optional[str]

Name of merged field that will be appended to the table.

separator str

String that field values in the merged field will be separated by.

rockfish.actions.SplitField

Split a field using a separator and append the split fields to the table. The original field is dropped from the table.

Split previously joined fields 'a', 'b' and 'c'
import rockfish.actions as ra
split = ra.SplitField(field="a;b;c")
Split multiple previously joined fields 'a;b' and 'c;d'
import rockfish.actions as ra

# suppose the join actions were added as follows:
builder.add(join_ab, parents=[dataset])
builder.add(join_cd, parents=[join_ab])

# the corresponding split actions should be added
# in the reverse order:
split_ab = ra.SplitField(field="a;b")
split_cd = ra.SplitField(field="c;d")

builder.add(split_cd, parents=[model])
builder.add(split_ab, parents=[split_cd])

rockfish.actions.join_split.SplitConfig

Configuration class for the SplitField action.

Attributes:

Name Type Description
field Optional[str]

Field name in the table that needs to be split.

append_fields Optional[list[str]]

List of split field names that will be appended to the table.

separator Optional[str]

String that field values in the split field will be separated by.

rockfish.actions.LabelEncode

Return table after label encoding has been applied on the given field.

Label encode field 'a'
import rockfish.actions as ra
label_encode = ra.LabelEncode(field="a")

rockfish.actions.encode.LabelEncodeConfig

Config class for the LabelEncode action.

Attributes:

Name Type Description
field str

field to be encoded (should be categorical)

rockfish.actions.LabelDecode

Return table after label decoding has been applied on the given field. Assumes a LabelEncode action was applied on the field before training.

Label decode previously encoded field 'a'
import rockfish.actions as ra
label_decode = ra.LabelDecode(field="a")
Label decode previously encoded fields 'a', 'b'
import rockfish.actions as ra

# suppose the encoding actions were added as follows:
builder.add(label_encode_a, parents=[dataset])
builder.add(label_encode_b, parents=[label_encode_a])

# the corresponding decoding actions should be added
# in the reverse order:
label_decode_a = ra.LabelDecode(field="a")
label_decode_b = ra.LabelDecode(field="b")

builder.add(label_decode_b, parents=[model])
builder.add(label_decode_a, parents=[label_decode_b])

rockfish.actions.encode.LabelDecodeConfig

Config class for the LabelDecode action.

Attributes:

Name Type Description
field Optional[str]

field to be decoded.

artifact_id Optional[str]

Artifact ID that contains the label encoder mappings.

rockfish.actions.LogEncode

Return table after log encoding has been applied on the given field.

Log encode field 'a'
import rockfish.actions as ra
log_encode = ra.LogEncode(field="a")

rockfish.actions.encode.LogEncodeConfig

Config class for the LogEncode action.

Attributes:

Name Type Description
field str

field to be encoded (should be continuous)

rockfish.actions.LogDecode

Return table after log decoding has been applied on the given field. Assumes a LogEncode action was applied on the field before training.

Log decode previously encoded field 'a'
import rockfish.actions as ra
log_decode = ra.LogDecode(field="a")
Log decode previously encoded field 'a', specify precision for decoded field
import rockfish.actions as ra
log_decode = ra.LogDecode(field="a", field_ndigits=2)
Log decode previously encoded fields 'a', 'b'
import rockfish.actions as ra

# suppose the encoding actions were added as follows:
builder.add(log_encode_a, parents=[dataset])
builder.add(log_encode_b, parents=[log_encode_a])

# the corresponding decoding actions should be added
# in the reverse order:
log_decode_a = ra.LogDecode(field="a")
log_decode_b = ra.LogDecode(field="b")

builder.add(log_decode_b, parents=[model])
builder.add(log_decode_a, parents=[log_decode_b])

rockfish.actions.encode.LogDecodeConfig

Config class for the LogDecode action.

Attributes:

Name Type Description
field Optional[str]

field to be decoded (should be continuous)

field_ndigits Optional[int]

precision of decoded field, applicable for float fields only (default = 3)

field_type Optional[FieldType]

field type to cast the decoded field back to

rockfish.actions.SubtractTimestamp

This calculates deltas for a list of timestamps relative to a primary timestamp. This is useful for calculating the time difference between two timestamps, if using the TimeGAN model.

Example:

timestamp1 timestamp2 timestamp3
2021-01-01 2021-01-02 2021-01-03
SubtractTimestamp Action Workflow Example
import rockfish.actions as ra
subtract = ra.SubtractTimestamp(base_timestamp="timestamp1",
                 fields=["timestamp2", "timestamp3"],
                 timestamp_format="%Y-%m-%d")

After running the workflow:

timestamp1 timestamp2 timestamp3
2021-01-01 1 day 2 days

Another example, if not all timestamps are correlated:

timestamp1 timestamp2 timestamp3
2021-01-01 2021-01-02 2011-10-03
SubtractTimestamp Action Workflow Example [uncorrelated timestamp3]
import rockfish.actions as ra
subtract = ra.SubtractTimestamp(base_timestamp="timestamp1",
                 fields=["timestamp2"],
                 timestamp_format="%Y-%m-%d")

After running the workflow:

timestamp1 timestamp2 timestamp3
2021-01-01 1 day 2011-10-03

Another example, if you do not want to replace the fields:

timestamp1 timestamp2 timestamp3
2021-01-01 2021-01-02 2021-01-03
SubtractTimestamp Action Workflow Example [append_fields]
import rockfish.actions as ra
subtract = ra.SubtractTimestamp(base_timestamp="timestamp1",
                 fields=["timestamp2", "timestamp3"],
                 append_fields=["timestamp2_delta", "timestamp3_delta"],
                 timestamp_format="%Y-%m-%d")

After running the workflow:

timestamp1 timestamp2 timestamp3 timestamp2_delta timestamp3_delta
2021-01-01 2021-01-02 2021-01-03 1 day 2 days

rockfish.actions.timestamps.SubtractTimestampConfig dataclass

Configuration class for the SubtractTimestamp action

Attributes:

Name Type Description
base_timestamp str

the timestamp to which the other timestamps are compared

fields list[str]

the list of timestamps to calculate the deltas for

append_fields Optional[list[str]]

the list of columns to append the durations to. If None, the durations replace the original fields.

timestamp_format Optional[str]

the format of the timestamps IF they are strings.

rockfish.actions.AddDuration

This calculates timestamps from deltas relative to a primary timestamp. After synthesis, this is useful for converting the deltas back to timestamps.

Example:

timestamp1 timestamp2 timestamp3
2021-01-01 1 day 2 days
AddDuration Action Workflow Example
import rockfish.actions as ra
add = ra.AddDuration(base_timestamp="timestamp1",
                     fields=["timestamp2", "timestamp3"],
                     timestamp_format="%Y-%m-%d")

After running the workflow:

timestamp1 timestamp2 timestamp3
2021-01-01 2021-01-02 2021-01-03

Another example, if not all timestamps are correlated (uncorrelated fields are ignored):

timestamp1 timestamp2 timestamp3
2021-01-01 1 day 2011-10-03
AddDuration Action Workflow Example [uncorrelated timestamp3]
import rockfish.actions as ra
add = ra.AddDuration(base_timestamp="timestamp1",
                     fields=["timestamp2"],
                     timestamp_format="%d-%m-%Y")

After running the workflow:

timestamp1 timestamp2 timestamp3
01-01-2021 02-01-2021 03-10-2011

rockfish.actions.timestamps.AddDurationConfig dataclass

Configuration class for the AddDuration action

Attributes:

Name Type Description
base_timestamp str

the timestamp to which the other timestamps are compared

fields list[str]

the list of columns that are timestamp deltas, or duration[s] dtype

timestamp_format str

the format of the timestamps. This parameter is required. If the primary timestamp is a string, it is converted to this format; all relative timestamps are also converted to this format after the deltas are applied.

Train and Generate Actions

rockfish.actions.TrainTimeGAN

Train a Rockfish DoppelGANger-based model.

train = ra.TrainTimeGAN(ra.TrainTimeGAN.Config())

Attributes:

Name Type Description
Config type[Config]

Alias for Config

DGConfig type[DGConfig]

Alias for DGConfig

DatasetConfig type[DatasetConfig]

Alias for DatasetConfig

TimestampConfig type[TimestampConfig]

Alias for TimestampConfig

FieldConfig type[FieldConfig]

Alias for FieldConfig

EmbeddingConfig type[EmbeddingConfig]

Alias for EmbeddingConfig

PrivacyConfig type[PrivacyConfig]

Alias for PrivacyConfig
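
A hedged sketch of adding the train action to a workflow; assumes a WorkflowBuilder named builder and a dataset action named dataset, as in the encoding examples on this page
import rockfish.actions as ra

train = ra.TrainTimeGAN(ra.TrainTimeGAN.Config())

# wiring follows the builder.add(..., parents=[...]) pattern used elsewhere on this page
builder.add(train, parents=[dataset])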

rockfish.actions.GenerateTimeGAN

Generate synthetic data using the Rockfish DoppelGANger model.

generate = ra.GenerateTimeGAN(ra.GenerateTimeGAN.Config())

Attributes:

Name Type Description
Config type[Config]

Alias for Config

DGConfig type[DGConfig]

Alias for DGConfig

DatasetConfig type[DatasetConfig]

Alias for DatasetConfig

TimestampConfig type[TimestampConfig]

Alias for TimestampConfig

FieldConfig type[FieldConfig]

Alias for FieldConfig

EmbeddingConfig type[EmbeddingConfig]

Alias for EmbeddingConfig

PrivacyConfig type[PrivacyConfig]

Alias for PrivacyConfig
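
A hedged sketch of wiring generation and save actions; assumes a WorkflowBuilder named builder and a model-producing parent named model, as in the decoding examples on this page
import rockfish.actions as ra

generate = ra.GenerateTimeGAN(ra.GenerateTimeGAN.Config())
save = ra.DatasetSave(name="synthetic")

builder.add(generate, parents=[model])
builder.add(save, parents=[generate])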

rockfish.actions.TrainTabGAN

Train a model using a tabular GAN.
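
A minimal sketch, mirroring the TrainTimeGAN example above
import rockfish.actions as ra
train = ra.TrainTabGAN(ra.TrainTabGAN.Config())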

Attributes:

Name Type Description
Config type[TrainTabGANConfig]

Alias for TrainTabGANConfig

TrainConfig type[TrainConfig]

Alias for TrainConfig

DatasetConfig type[DatasetConfig]

Alias for DatasetConfig

TimestampConfig type[TimestampConfig]

Alias for TimestampConfig

FieldConfig type[FieldConfig]

Alias for FieldConfig

rockfish.actions.GenerateTabGAN

Generate synthetic data using a tabular GAN model.
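
A minimal sketch, mirroring the GenerateTimeGAN example above
import rockfish.actions as ra
generate = ra.GenerateTabGAN(ra.GenerateTabGAN.Config())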

Attributes:

Name Type Description
Config type[GenerateTabGANConfig]

Alias for GenerateTabGANConfig.

GenerateConfig type[GenerateConfig]

Alias for GenerateConfig

rockfish.actions.TrainTabTransformer

Train a Tab Transformer model.

rockfish.actions.GenerateTabTransformer

Generate synthetic data using the Tab Transformer model.

Attributes:

Name Type Description
Config TypeAlias

rockfish.actions.TrainTimeTransformer

Train a Time Transformer model.

Attributes:

Name Type Description
Config TypeAlias
TrainConfig TypeAlias

Alias for TrainTimeConfig.

ParentConfig TypeAlias

Alias for ParentConfig.

ChildConfig TypeAlias

Alias for ChildConfig.

GPT2Config TypeAlias

Alias for GPT2Config.

DatasetConfig TypeAlias

Alias for DatasetConfig.

TimestampConfig TypeAlias

Alias for TimestampConfig.

FieldConfig TypeAlias

Alias for FieldConfig.

rockfish.actions.GenerateTimeTransformer

Generate synthetic data using the Time Transformer model.

Attributes:

Name Type Description
Config TypeAlias

rockfish.actions.SessionTarget

SessionTarget can be used to trigger generation cycles until a desired target number of sessions is reached.
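
A minimal sketch, assuming the desired session count is passed as a target config field
import rockfish.actions as ra

# hedged sketch: the target keyword is an assumption
session_target = ra.SessionTarget(target=1000)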

Attributes:

Name Type Description
Config type[Config]

Alias for Config.

Evaluation

rockfish.actions.EvaluateLinkability

Evaluate the linkability privacy score of the input data.

Example:

Consider the example dataset:

Age Gender Zip Code Medical Condition Label
25 F 10000 Condition X ori
41 M 10732 Condition Y ori
30 M 20000 Condition Y syn
... ... ... ... ...

The configuration for the action includes the auxiliary columns to use for the attack, the label column name, and the number of neighbors to use for the attack.

import rockfish.actions as ra

config = {
    "aux_cols_a": ["Age", "Gender"],
    "aux_cols_b": ["Zip Code", "Medical Condition"],
    "label": "Label",
    "n_neighbors": 1,
}
evaluate_linkability = ra.EvaluateLinkability(config)

The output of the action is a table with a single linkability score between 0 and 1, where higher values indicate better protection against linkability attacks.

rockfish.actions.privacy.LinkabilityConfig

Configuration for the EvaluateLinkability action.

Attributes:

Name Type Description
n_attacks int

The number of attacks to run.

n_trials int

The number of trials to run.

label str

The label column name.

rng Optional[int]

The random seed.

aux_cols_a list[str]

The auxiliary columns to use for the first set.

aux_cols_b list[str]

The auxiliary columns to use for the second set.

n_neighbors int

The number of neighbors to use for the k-nearest neighbors attack.

rockfish.actions.EvaluateInference

Evaluate the inference privacy score of the input data.

Example:

Consider the example dataset:

Age Gender Zip Code Medical Condition Label
25 F 10000 Condition X ori
41 M 10732 Condition Y ori
30 M 20000 Condition Y syn
... ... ... ... ...

The configuration for the action includes the auxiliary columns to use for the attack, the secret column name, and the label column name.

import rockfish.actions as ra

config = {
    "aux_cols": ["Age", "Gender", "Zip Code"],
    "secret": "Medical Condition",
    "label": "Label",
}
evaluate_inference = ra.EvaluateInference(config)

The output of the action is a table with a single inference score between 0 and 1, where higher values indicate better protection against inference attacks.

rockfish.actions.privacy.InferenceConfig

Configuration for the EvaluateInference action.

Attributes:

Name Type Description
n_attacks int

The number of attacks to run.

n_trials int

The number of trials to run.

label str

The label column name.

rng Optional[int]

The random seed.

aux_cols list[str]

The auxiliary columns to use as features for the inference attack.

secret str

The secret column name to attack.

rockfish.actions.EvaluateLogisticRegression

Evaluate the classification performance using Logistic Regression.

Example:

Consider the fall detection dataset with labels for train and test sets.

Sex Body Temperature Heart Rate Respiratory Rate SBP DBP split
M 97 80 15 140 90 train
F 96 78 14 145 95 train
M 98 81 13 143 93 test
... ... ... ... ... ... ...

The configuration for the action includes the numerical features, the binary-valued target, and the positive label.

import rockfish.actions as ra

config = {
    "features": [
        "Body Temperature",
        "Heart Rate",
        "Respiratory Rate",
        "SBP",
        "DBP",
    ],
    "target": "Sex",
    "pos_label": "F",
}
evaluate_logistic_regression = ra.EvaluateLogisticRegression(config)

The output of the action is a table with a single AUC value.

rockfish.actions.txtr.LogisticRegressionConfig

Configuration for the EvaluateLogisticRegression action.

See details on some of the arguments in sklearn.linear_model.LogisticRegression v1.6.1.

Attributes:

Name Type Description
features list[str]

Numerical features to use in the model.

target str

The classification target. Must have two unique values.

pos_label Optional[str]

The positive label. If None and the target value set is {0, 1} or {-1, 1}, then the positive label is 1.

table_split_col_name str

The name of the column that contains the split label (train/test).

penalty Optional[Literal['l1', 'l2', 'elasticnet']]

Specify the norm of the penalty.

dual bool

Dual (constrained) or primal (regularized) formulation.

tol float

Tolerance for stopping criteria.

C float

Inverse of regularization strength; must be a positive float.

fit_intercept bool

Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.

intercept_scaling float

Useful only when the solver 'liblinear' is used and fit_intercept is set to True.

class_weight ClassWeight

Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.

random_state Optional[int]

Used when solver == 'sag', 'saga' or 'liblinear' to shuffle the data.

solver str

Algorithm to use in the optimization problem.

max_iter int

Maximum number of iterations taken for the solvers to converge.

rockfish.actions.EvaluateRandomForest

Evaluate the classification performance using Random Forest.

See the example in EvaluateLogisticRegression for usage.
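
A minimal sketch, mirroring the EvaluateLogisticRegression example; the feature and target names below are placeholders taken from that example
import rockfish.actions as ra

config = {
    "features": ["Body Temperature", "Heart Rate", "Respiratory Rate", "SBP", "DBP"],
    "target": "Sex",
    "pos_label": "F",
}
evaluate_random_forest = ra.EvaluateRandomForest(config)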

rockfish.actions.txtr.RandomForestConfig

Configuration for the EvaluateRandomForest action.

See details on some of the arguments in sklearn.ensemble.RandomForestClassifier v1.6.1.

Attributes:

Name Type Description
features list[str]

Numerical features to use in the model.

target str

The classification target. Must have two unique values.

pos_label Optional[str]

The positive label. If None and the target value set is {0, 1} or {-1, 1}, then the positive label is 1.

table_split_col_name str

The name of the column that contains the split label (train/test).

n_estimators int

The number of trees in the forest.

criterion Literal['gini', 'entropy', 'log_loss']

The function to measure the quality of a split.

max_depth Optional[int]

The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

min_samples_split int

The minimum number of samples required to split an internal node.

min_samples_leaf int

The minimum number of samples required to be at a leaf node.

min_weight_fraction_leaf float

The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node.

max_features Union[str, int, float, None]

The number of features to consider when looking for the best split.

max_leaf_nodes Optional[int]

Grow trees with max_leaf_nodes in best-first fashion.

min_impurity_decrease float

A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

bootstrap bool

Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.

oob_score bool

Whether to use out-of-bag samples to estimate the generalization score.

n_jobs Optional[int]

The number of jobs to run in parallel.

random_state Optional[int]

Controls both the randomness of the bootstrapping of the samples used when building trees (if bootstrap=True) and the sampling of the features to consider when looking for the best split at each node (if max_features < n_features).

class_weight ClassWeight

Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.

ccp_alpha float

Complexity parameter used for Minimal Cost-Complexity Pruning.

max_samples Optional[float]

If bootstrap is True, the number of samples to draw from X to train each base estimator.

rockfish.actions.txtr.ClassWeight = Union[dict[str, float], str, None] module-attribute

rockfish.actions.EvaluateForecast

Evaluate the forecasting performance using Prophet.

Example 1:

Consider the following time series dataset:

ds y split
2007-12-10 9.590761 train
2007-12-11 8.519590 train
2007-12-12 8.183677 train
... ... ...
2016-01-16 7.817223 train
2016-01-17 9.273878 test
2016-01-18 10.333775 test
2016-01-19 9.125871 test
2016-01-20 8.891374 test

The configuration for the action includes the datestamp, the target, and the split column.

import rockfish.actions as ra

config = {"datestamp": "ds", "target": "y", "table_split_col_name": "split"}
evaluate_forecast = ra.EvaluateForecast(config)

The output of the action is a table with the forecasted values.

ds y
2016-01-17 9.496974
2016-01-18 9.777253
2016-01-19 9.577357
2016-01-20 9.425384

Example 2:

The input table can contain multiple sessions. The forecast is done on each session separately. Sessions are defined by one or multiple group-by columns as the session key.

ds y split group1 group2
2020-01-01 0.030472 train a x
2020-01-02 0.677833 train a x
2020-01-03 1.049973 train a x
... ... ... ... ...
2020-04-05 3.583768 test b y
2020-04-06 3.054671 test b y
2020-04-07 3.180977 test b y

The configuration for the action includes the datestamp, the target, the split column, and the session key.

import rockfish.actions as ra

config = {
    "datestamp": "ds",
    "target": "y",
    "table_split_col_name": "split",
    "session_key": ["group1", "group2"],
}
evaluate_forecast = ra.EvaluateForecast(config)

If no session key is provided, the table metadata is used to extract the session field or the group fields. If that fails, the entire table is treated as a single session.

Note: the datestamp column needs to be a date type.

rockfish.actions.txtr.EvaluateForecastConfig

Configuration for the EvaluateForecast action.

Attributes:

Name Type Description
datestamp str

The datestamp to use for the forecast.

target str

The target column to forecast.

table_split_col_name str

The name of the column that contains the split label (train/test).

session_key Optional[Union[str, list[str]]]

The name of the column(s) that contain the session field or group fields. If None, then the session key is extracted from the table metadata. If session or group fields are found, forecasting is done on each session separately. Otherwise, it's done on the entire dataset.

growth Literal['linear', 'logistic', 'flat']

String 'linear', 'logistic', or 'flat' to specify a linear, logistic, or flat trend.

yearly_seasonality Seasonality

Fit yearly seasonality. Can be 'auto', True, False, or an integer number of Fourier terms to generate.

weekly_seasonality Seasonality

Fit weekly seasonality. Can be 'auto', True, False, or an integer number of Fourier terms to generate.

daily_seasonality Seasonality

Fit daily seasonality. Can be 'auto', True, False, or an integer number of Fourier terms to generate.

seasonality_mode Literal['additive', 'multiplicative']

'additive' (default) or 'multiplicative'.

seasonality_prior_scale float

Parameter modulating the strength of the seasonality model. Larger values allow the model to fit larger seasonal fluctuations, smaller values dampen the seasonality.

holidays_prior_scale float

Parameter modulating the strength of the holiday components model.

mcmc_samples int

If greater than 0, will do full Bayesian inference with the specified number of MCMC samples. If 0, will do MAP estimation.

holidays_mode Literal['auto', 'additive', 'multiplicative']

'additive', 'multiplicative', or 'auto'. Defaults to seasonality_mode.