rockfish.actions

import rockfish.actions as ra

Source and Sink Actions

rockfish.actions.DatasetLoad

Load a Dataset as the output table.

Attributes:

Name Type Description
Config type[LoadConfig]

Alias for LoadConfig.

rockfish.actions.DatasetSave

Save table as a Dataset.
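
A minimal sketch, assuming the saved dataset's name is passed as a config field
import rockfish.actions as ra

# hedged sketch: the name keyword is assumed from SaveConfig
save = ra.DatasetSave(name="synthetic")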

Attributes:

Name Type Description
Config type[SaveConfig]

Alias for SaveConfig.

rockfish.actions.ModelLoad

Produce a model table.

Attributes:

Name Type Description
Config type[Config]

Alias for Config.

Dataset Property Extraction Actions

rockfish.actions.TabPropertyExtractor

Compute and add dataset and field properties to the tabular dataset.

Run default property detection for tabular datasets
import rockfish.actions as ra
detect_tab_props = ra.TabPropertyExtractor()
Run property detection with PII detection
import rockfish.actions as ra
detect_tab_props = ra.TabPropertyExtractor(detect_pii=True)
Run property detection with default association rule detection
import rockfish.actions as ra
detect_tab_props = ra.TabPropertyExtractor(detect_association_rules=True)
Run property detection with custom association threshold
import rockfish.actions as ra
detect_tab_props = ra.TabPropertyExtractor(
    detect_association_rules=True,
    association_threshold=0.99
)

rockfish.actions.properties.TabPropertyExtractorConfig

Config class for the TabPropertyExtractor action.

Attributes:

Name Type Description
detect_pii bool

Flag to run PII detection or not (default = False).

detect_association_rules bool

Flag to run association rule detection or not (default = False). Running this will add AssociationRules to the dataset properties.

association_threshold float

Fields will be associated with each other if their association score is greater than the association threshold (default = 0.95). Should be a number in the range [0.0, 1.0].

rockfish.actions.TimePropertyExtractor

Compute and add dataset and field properties to the timeseries dataset.

Run default property detection for timeseries datasets
import rockfish.actions as ra
detect_time_props = ra.TimePropertyExtractor(timestamp="ts")
Run property detection with a known timeseries data model
import rockfish.actions as ra
detect_time_props = ra.TimePropertyExtractor(
    timestamp="ts",
    session_fields=["user_id"],
    metadata_fields=["age", "gender"]
)
Run property detection with metadata field detection
import rockfish.actions as ra
detect_time_props = ra.TimePropertyExtractor(
    timestamp="ts",
    session_fields=["user_id"],
    detect_metadata_fields=True
)
Run property detection with PII detection
import rockfish.actions as ra
detect_time_props = ra.TimePropertyExtractor(timestamp="ts", detect_pii=True)
Run property detection with default association rule detection
import rockfish.actions as ra
detect_time_props = ra.TimePropertyExtractor(
    timestamp="ts",
    detect_association_rules=True
)
Run property detection with custom association threshold
import rockfish.actions as ra
detect_time_props = ra.TimePropertyExtractor(
    timestamp="ts",
    detect_association_rules=True,
    association_threshold=0.99
)

rockfish.actions.properties.TimePropertyExtractorConfig

Config class for the TimePropertyExtractor action.

Attributes:

Name Type Description
timestamp str

Name of the timestamp field in a timeseries dataset.

metadata_fields Optional[list[str]]

List of field names to be treated as metadata fields in the timeseries dataset (default = None). Should be None if detect_metadata_fields is True. Can be an empty list if detect_metadata_fields is False.

detect_metadata_fields bool

Flag to run metadata field detection or not (default = False).

session_fields list[str]

List of fields to be treated as session fields in the timeseries dataset (default = []). Cannot be an empty list if detect_metadata_fields is True.

detect_pii bool

Flag to run PII detection or not (default = False).

detect_association_rules bool

Flag to run association rule detection or not (default = False). Running this will add AssociationRules to the dataset properties.

association_threshold float

Fields will be associated with each other if their association score is greater than the association threshold (default = 0.95). Should be a number in the range [0.0, 1.0].

Data Processing Actions

rockfish.actions.Apply

Apply a function and append the results to the table as a new field.

Attributes:

Name Type Description
Config type[ApplyConfig]

Alias for ApplyConfig.

rockfish.actions.Transform

Transform a field replacing the values with the result of the function.

Attributes:

Name Type Description
Config type[TransformConfig]

Alias for TransformConfig.

rockfish.actions.AppendUUID

Return table with new field of UUID values.

Append field 'a' with UUID values
import rockfish.actions as ra
append_uuid = ra.AppendUUID(
    append_field="a",
    seed=1234
)
Append field 'b' with UUID values, per session
import rockfish.actions as ra
append_uuid = ra.AppendUUID(
    group_fields=["session_key"],
    append_field="b",
    seed=1234
)
Append field 'c' with UUID values, per other group_fields
import rockfish.actions as ra
append_uuid = ra.AppendUUID(
    group_fields=["d", "e"],
    append_field="c",
    seed=1234
)

Attributes:

Name Type Description
Config

Alias for AppendUUIDConfig.

rockfish.actions.append.AppendUUIDConfig

Config class for the AppendUUID action.

Attributes:

Name Type Description
group_fields Optional[list[str]]

List of fields to group over. Each group will be assigned a new value in the append_field. If an empty list is specified, each row will be assigned a new value. If unspecified, group_fields will be taken from the dataset's TableMetadata.

append_field str

The name of the new field to append.

seed Optional[int]

The seed for the random number generator.

rockfish.actions.AppendDomain

Return table with a new field of values drawn from the given domain. All values in the domain should be of the same type. A single-value domain can be used to add a constant-valued field.

Append field 'a' with values from given domain
import rockfish.actions as ra
append_domain = ra.AppendDomain(
    append_field="a",
    domain=["one", "two", "three"],
    seed=1234
)
Append field 'a' with a constant value
import rockfish.actions as ra
append_domain = ra.AppendDomain(
    append_field="a",
    domain=[10],
    seed=1234
)
Append field 'b' with values from given domain, per session
import rockfish.actions as ra
append_domain = ra.AppendDomain(
    group_fields=["session_key"],
    append_field="b",
    domain=["one", "two", "three"],
    seed=1234
)
Append field 'c' with values from given domain, per other group_fields
import rockfish.actions as ra
append_domain = ra.AppendDomain(
    group_fields=["d", "e"],
    append_field="c",
    domain=["one", "two", "three"],
    seed=1234
)

Attributes:

Name Type Description
Config

Alias for AppendDomainConfig.

rockfish.actions.append.AppendDomainConfig

Config class for the AppendDomain action.

Attributes:

Name Type Description
group_fields Optional[list[str]]

List of fields to group over. Each group will be assigned a value in the append_field. If an empty list is specified, each row will be assigned a value. If unspecified, group_fields will be taken from the dataset's TableMetadata.

append_field str

The name of the new field to append.

domain Union[list[str], list[int], list[float]]

List of values that the new field can have. All values should have the same data type. The list should be of size <= 100.

seed Optional[int]

The seed for the random number generator.

rockfish.actions.AppendNormal

Return table with new field of values from the given normal distribution.

Append field 'a' with values from normal(mean=0.0, scale=1.0)
import rockfish.actions as ra
append_normal = ra.AppendNormal(
    append_field="a",
    mean=0.0,
    scale=1.0,
    seed=1234
)
Append field 'a' with values from normal(mean=0.0, scale=1.0), precision = 3 digits
import rockfish.actions as ra
append_normal = ra.AppendNormal(
    append_field="a",
    mean=0.0,
    scale=1.0,
    append_field_ndigits=3,
    seed=1234
)
Append field 'b' with values from normal(mean=0.0, scale=1.0), per session
import rockfish.actions as ra
append_normal = ra.AppendNormal(
    group_fields=["session_key"],
    append_field="b",
    mean=0.0,
    scale=1.0,
    seed=1234
)
Append field 'c' with values from normal(mean=0.0, scale=1.0), per other group_fields
import rockfish.actions as ra
append_normal = ra.AppendNormal(
    group_fields=["d", "e"],
    append_field="c",
    mean=0.0,
    scale=1.0,
    seed=1234
)

Attributes:

Name Type Description
Config

Alias for AppendNormalConfig.

rockfish.actions.append.AppendNormalConfig

Config class for the AppendNormal action.

Attributes:

Name Type Description
group_fields Optional[list[str]]

List of fields to group over. Each group will be assigned a value in the append_field. If an empty list is specified, each row will be assigned a value. If unspecified, group_fields will be taken from the dataset's TableMetadata.

append_field str

The name of the new field to append.

mean float

Mean of the normal distribution from which new field values are sampled.

scale float

Standard deviation of the normal distribution from which new field values are sampled.

append_field_ndigits int

Precision of append field (default = 2).

seed Optional[int]

The seed for the random number generator.

rockfish.actions.Flatten

Flatten a table by expanding json objects / pyarrow structs in a column into multiple columns. e.g.

col1 col2 col3
a {"b": 1} c

turns into

col1 col2.b col3
a 1 c

This action recursively flattens the table until no more JSON nesting is present. It does not handle lists or JSON arrays, and will raise an error if any are present in the table.
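
A minimal sketch, assuming the separator is passed as a constructor keyword like other actions on this page
import rockfish.actions as ra

# hedged sketch: "." matches the col2.b example above; separator is documented in FlattenConfig below
flatten = ra.Flatten(separator=".")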

rockfish.actions.flatten.FlattenConfig dataclass

Configuration class for the Flatten action.

Attributes:

Name Type Description
separator str

String used to concatenate field names when a struct is expanded into multiple columns.

rockfish.actions.Unflatten

Unflatten a table by condensing multiple columns into json objects / pyarrow structs. e.g.

col1 col2.b col3
a 1 c

turns into

col1 col2 col3
a {"b": 1} c
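
A minimal sketch, assuming the same constructor-keyword style; the separator should match the one used by Flatten
import rockfish.actions as ra

# hedged sketch: split column names like col2.b back into structs
unflatten = ra.Unflatten(separator=".")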

rockfish.actions.flatten.UnflattenConfig dataclass

Configuration class for the Unflatten action.

Attributes:

Name Type Description
separator str

String that field names are split by when constructing structs.

rockfish.actions.Sample

Return table with sampled rows according to the provided sample_type.

Sample using default sampling method
import rockfish.actions as ra
sample = ra.Sample(sample_size=100, sample_type=None)
Sample using random sampling with replacement
import rockfish.actions as ra
sample = ra.Sample(frac=0.23, sample_type="random", replace=True, seed=3)

Attributes:

Name Type Description
Config

Alias for SampleConfig.

rockfish.actions.sample.SampleConfig dataclass

Config class for the Sample action.

Attributes:

Name Type Description
sample_size Optional[int]

the number of rows to sample

frac Optional[float]

the fraction of rows to sample

sample_type Optional[SampleType]

the type of sampling to use, if None, uses first_n

seed Optional[int]

the seed for the random number generator

replace Optional[bool]

sample with replacement; if true, allows the same row to be sampled multiple times

session_key Optional[str]

the field name that defines the session for timeseries datasets

chunk bool

produce chunks of data

chunk_row_limit int

number of rows in each chunk

rockfish.actions.SampleLabel

Sample rows/sessions that match a label.

Sample from a label field
import rockfish.actions as ra
sample = ra.SampleLabel(
    field="my_label",
    dist={
        "value1": ra.SampleLabel.Count(2),
        "value2": ra.SampleLabel.Count(4),
        "": ra.SampleLabel.Count(6),
    },
    replace=True,
)

Attributes:

Name Type Description
Config

Alias for SampleLabelConfig.

rockfish.actions.sample_label.SampleLabelConfig

Config class for the SampleLabel action.

Attributes:

Name Type Description
field str

field containing the sampling label

dist SampleDist

distribution for each label; the empty string matches all unspecified values

replace bool

sample with replacement; if true, allows the same row to be sampled multiple times

session_key Optional[str]

the field name that defines the session for timeseries datasets

seed Optional[int]

the seed for the random number generator

chunk bool

produce chunks of data

chunk_row_limit int

number of rows in each chunk

rockfish.actions.AlterTimestamp

Alter a timestamp field in the table.

The method to generate new timestamps depends on the interarrival_type option.

fixed

The fixed type generates new timestamps with fixed/regular interarrivals spread over the time range at a per session level.

random

The random type generates new timestamps with random interarrivals at a per session level.

squeeze

The squeeze type takes the original interarrivals and shifts them to the start or end of the time range, depending on the value of flow_start_type. If the interarrivals are larger than the range, they are linearly scaled to fit.

chop

The chop type takes the original interarrivals and shifts them to the start or end of the time range, depending on the value of flow_start_type. If the interarrivals are larger than the range, they are trimmed.

original

The original type takes the original interarrivals and shifts them to the start or end of the time range, depending on the value of flow_start_type. They are not scaled or trimmed.

AlterTimestamp Action Example
from datetime import datetime

import rockfish.actions as ra

alter_timestamp = ra.AlterTimestamp(
    field="ts",
    start_time=datetime(2024, 11, 11, 0, 0, 0),
    end_time=datetime(2024, 11, 11, 23, 59, 59),
    interarrival_type="random",
)

Attributes:

Name Type Description
Config

Alias for AlterTimestampConfig.

rockfish.actions.timestamps.AlterTimestampConfig

Configuration class for the AlterTimestamp action.

Attributes:

Name Type Description
field str

Field name containing the timestamp to alter.

start_time datetime

Start time for the desired output range.

end_time datetime

End time for the desired output range.

flow_start_type Literal['starting', 'ending', 'random']

Method for placing the flow within the range, if the interarrival_type supports it.

interarrival_type Literal['fixed', 'random', 'squeeze', 'chop', 'original']

Method to use for generating new timestamps.

seed Optional[int]

Fixed seed for the random number generator.

rockfish.actions.PostAmplify

rockfish.actions.SQL

Return table after applying the provided SQL query.

Run query on one table
import rockfish.actions as ra
sql = ra.SQL(
    query="select col_1 from foo_table;",
    table_name="foo_table"
)
Join two tables on a common column
import rockfish.actions as ra
query = "select t1.col_1, t2.col_1 from t1 inner join t2 on t1.id = t2.id;"
t2_id = "<ID_OF_REMOTE_DATASET>"  # using rockfish.RemoteDataset.id
sql = ra.SQL(
    query=query,
    table_name="t1",
    dataset_name_to_id={"t2": t2_id}
)

Note: If your table(s) contain columns that have uppercase names, wrap the column names in backticks or quotation marks. For example, if your table has a column called 'Color', the SQL query should be passed as:

  1. "select `Color` from my_table", OR
  2. 'select "Color" from my_table'

Attributes:

Name Type Description
Config

Alias for Config.

rockfish.actions.sql.Config dataclass

Config class for the SQL action.

Attributes:

Name Type Description
query str

The SQL query to run on the table.

table_name str

Name by which the table is referred to in the SQL query; the default is 'my_table'.

dataset_name_to_id dict[str, str]

Dict that maps additional remote dataset names to their dataset IDs; these datasets are retrieved before the query is applied.

Encoding Actions

rockfish.actions.JoinFields

Merge fields using a separator and append the merged field to the table. The original fields are dropped from the table.

Join fields 'a', 'b' and 'c'
import rockfish.actions as ra
join = ra.JoinFields(fields=["a", "b", "c"])
Join fields 'a' and 'b' with a custom separator
import rockfish.actions as ra
join = ra.JoinFields(fields=["a", "b"], separator="++")
Join fields 'a' and 'b' with a custom name for the new field
import rockfish.actions as ra
join = ra.JoinFields(fields=["a", "b"], append_field="a_and_b")

rockfish.actions.join_split.JoinConfig

Configuration class for the JoinFields action.

Attributes:

Name Type Description
fields list[str]

List of field names in the table that need to be merged.

append_field Optional[str]

Name of merged field that will be appended to the table.

separator str

String that field values in the merged field will be separated by.

rockfish.actions.SplitField

Split a field using a separator and append the split fields to the table. The original field is dropped from the table.

Split previously joined fields 'a', 'b' and 'c'
import rockfish.actions as ra
split = ra.SplitField(field="a;b;c")
Split multiple previously joined fields 'a;b' and 'c;d'
import rockfish.actions as ra

# suppose the join actions were added as follows:
builder.add(join_ab, parents=[dataset])
builder.add(join_cd, parents=[join_ab])

# the corresponding split actions should be added
# in the reverse order:
split_ab = ra.SplitField(field="a;b")
split_cd = ra.SplitField(field="c;d")

builder.add(split_cd, parents=[model])
builder.add(split_ab, parents=[split_cd])

rockfish.actions.join_split.SplitConfig

Configuration class for the SplitField action.

Attributes:

Name Type Description
field Optional[str]

Field name in the table that needs to be split.

append_fields Optional[list[str]]

List of split field names that will be appended to the table.

separator Optional[str]

String that field values in the split field will be separated by.

rockfish.actions.LabelEncode

Return table after label encoding has been applied on the given field.

Label encode field 'a'
import rockfish.actions as ra
label_encode = ra.LabelEncode(field="a")

rockfish.actions.encode.LabelEncodeConfig

Config class for the LabelEncode action.

Attributes:

Name Type Description
field str

field to be encoded (should be categorical)

rockfish.actions.LabelDecode

Return table after label decoding has been applied on the given field. Assumes a LabelEncode action was applied on the field before training.

Label decode previously encoded field 'a'
import rockfish.actions as ra
label_decode = ra.LabelDecode(field="a")
Label decode previously encoded fields 'a', 'b'
import rockfish.actions as ra

# suppose the encoding actions were added as follows:
builder.add(label_encode_a, parents=[dataset])
builder.add(label_encode_b, parents=[label_encode_a])

# the corresponding decoding actions should be added
# in the reverse order:
label_decode_a = ra.LabelDecode(field="a")
label_decode_b = ra.LabelDecode(field="b")

builder.add(label_decode_b, parents=[model])
builder.add(label_decode_a, parents=[label_decode_b])

rockfish.actions.encode.LabelDecodeConfig

Config class for the LabelDecode action.

Attributes:

Name Type Description
field Optional[str]

field to be decoded.

artifact_id Optional[str]

Artifact ID that contains the label encoder mappings.

rockfish.actions.LogEncode

Return table after log encoding has been applied on the given field.

Log encode field 'a'
import rockfish.actions as ra
log_encode = ra.LogEncode(field="a")

rockfish.actions.encode.LogEncodeConfig

Config class for the LogEncode action.

Attributes:

Name Type Description
field str

field to be encoded (should be continuous)

rockfish.actions.LogDecode

Return table after log decoding has been applied on the given field. Assumes a LogEncode action was applied on the field before training.

Log decode previously encoded field 'a'
import rockfish.actions as ra
log_decode = ra.LogDecode(field="a")
Log decode previously encoded field 'a', specify precision for decoded field
import rockfish.actions as ra
log_decode = ra.LogDecode(field="a", field_ndigits=2)
Log decode previously encoded fields 'a', 'b'
import rockfish.actions as ra

# suppose the encoding actions were added as follows:
builder.add(log_encode_a, parents=[dataset])
builder.add(log_encode_b, parents=[log_encode_a])

# the corresponding decoding actions should be added
# in the reverse order:
log_decode_a = ra.LogDecode(field="a")
log_decode_b = ra.LogDecode(field="b")

builder.add(log_decode_b, parents=[model])
builder.add(log_decode_a, parents=[log_decode_b])

rockfish.actions.encode.LogDecodeConfig

Config class for the LogDecode action.

Attributes:

Name Type Description
field Optional[str]

field to be decoded (should be continuous)

field_ndigits Optional[int]

precision of decoded field, applicable for float fields only (default = 3)

field_type Optional[FieldType]

field type to cast the decoded field back to

rockfish.actions.SubtractTimestamp

This calculates deltas for a list of timestamps relative to a primary timestamp. This is useful for calculating the time difference between two timestamps, if using the TimeGAN model.

Example:

timestamp1 timestamp2 timestamp3
2021-01-01 2021-01-02 2021-01-03
SubtractTimestamp Action Workflow Example
import rockfish.actions as ra
subtract = ra.SubtractTimestamp(base_timestamp="timestamp1",
                 fields=["timestamp2", "timestamp3"],
                 timestamp_format="%Y-%m-%d")

After running the workflow:

timestamp1 timestamp2 timestamp3
2021-01-01 1 day 2 days

Another example, if not all timestamps are correlated:

timestamp1 timestamp2 timestamp3
2021-01-01 2021-01-02 2011-10-03
SubtractTimestamp Action Workflow Example [uncorrelated timestamp3]
import rockfish.actions as ra
subtract = ra.SubtractTimestamp(base_timestamp="timestamp1",
                 fields=["timestamp2"],
                 timestamp_format="%Y-%m-%d")

After running the workflow:

timestamp1 timestamp2 timestamp3
2021-01-01 1 day 2011-10-03

Another example, if you do not want to replace the fields:

timestamp1 timestamp2 timestamp3
2021-01-01 2021-01-02 2021-01-03
SubtractTimestamp Action Workflow Example [append_fields]
import rockfish.actions as ra
subtract = ra.SubtractTimestamp(base_timestamp="timestamp1",
                 fields=["timestamp2", "timestamp3"],
                 append_fields=["timestamp2_delta", "timestamp3_delta"],
                 timestamp_format="%Y-%m-%d")

After running the workflow:

timestamp1 timestamp2 timestamp3 timestamp2_delta timestamp3_delta
2021-01-01 2021-01-02 2021-01-03 1 day 2 days

rockfish.actions.timestamps.SubtractTimestampConfig dataclass

Configuration class for the SubtractTimestamp action

Attributes:

Name Type Description
base_timestamp str

the timestamp to which the other timestamps are compared

fields list[str]

the list of timestamps to calculate the deltas for

append_fields Optional[list[str]]

the list of columns to append the durations to. If None, the durations replace the original fields.

timestamp_format Optional[str]

the format of the timestamps IF they are strings.

rockfish.actions.AddDuration

This calculates timestamps from deltas relative to a primary timestamp. After synthesis, this is useful for converting the deltas back to timestamps.

Example:

timestamp1 timestamp2 timestamp3
2021-01-01 1 day 2 days
AddDuration Action Workflow Example
import rockfish.actions as ra
add = ra.AddDuration(base_timestamp="timestamp1",
                     fields=["timestamp2", "timestamp3"],
                     timestamp_format="%Y-%m-%d")

After running the workflow:

timestamp1 timestamp2 timestamp3
2021-01-01 2021-01-02 2021-01-03

Another example, if not all timestamps are correlated (uncorrelated fields are ignored):

timestamp1 timestamp2 timestamp3
2021-01-01 1 day 2011-10-03
AddDuration Action Workflow Example [uncorrelated timestamp3]
import rockfish.actions as ra
add = ra.AddDuration(base_timestamp="timestamp1",
                     fields=["timestamp2"],
                     timestamp_format="%d-%m-%Y")

After running the workflow:

timestamp1 timestamp2 timestamp3
01-01-2021 02-01-2021 03-10-2011

rockfish.actions.timestamps.AddDurationConfig dataclass

Configuration class for the AddDuration action

Attributes:

Name Type Description
base_timestamp str

the timestamp to which the other timestamps are compared

fields list[str]

the list of columns that are timestamp deltas, or duration[s] dtype

timestamp_format str

the format of the timestamps. This parameter is required. If the primary timestamp is a string, it is converted to this format; all relative timestamps are also converted to this format after the deltas are applied.

Train and Generate Actions

rockfish.actions.TrainTimeGAN

Train a Rockfish DoppelGANger-based model.

train = ra.TrainTimeGAN(ra.TrainTimeGAN.Config())

Attributes:

Name Type Description
Config type[Config]

Alias for Config

DGConfig type[DGConfig]

Alias for DGConfig

DatasetConfig type[DatasetConfig]

Alias for DatasetConfig

TimestampConfig type[TimestampConfig]

Alias for TimestampConfig

FieldConfig type[FieldConfig]

Alias for FieldConfig

EmbeddingConfig type[EmbeddingConfig]

Alias for EmbeddingConfig

PrivacyConfig type[PrivacyConfig]

Alias for PrivacyConfig
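
A hedged sketch of adding the train action to a workflow; assumes a WorkflowBuilder named builder and a dataset action named dataset, as in the encoding examples on this page
import rockfish.actions as ra

train = ra.TrainTimeGAN(ra.TrainTimeGAN.Config())

# wiring follows the builder.add(..., parents=[...]) pattern used elsewhere on this page
builder.add(train, parents=[dataset])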

rockfish.actions.GenerateTimeGAN

Generate synthetic data using the Rockfish DoppelGANger model.

generate = ra.GenerateTimeGAN(ra.GenerateTimeGAN.Config())

Attributes:

Name Type Description
Config type[Config]

Alias for Config

DGConfig type[DGConfig]

Alias for DGConfig

DatasetConfig type[DatasetConfig]

Alias for DatasetConfig

TimestampConfig type[TimestampConfig]

Alias for TimestampConfig

FieldConfig type[FieldConfig]

Alias for FieldConfig

EmbeddingConfig type[EmbeddingConfig]

Alias for EmbeddingConfig

PrivacyConfig type[PrivacyConfig]

Alias for PrivacyConfig
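
A hedged sketch of wiring generation and save actions; assumes a WorkflowBuilder named builder and a model-producing parent named model, as in the decoding examples on this page
import rockfish.actions as ra

generate = ra.GenerateTimeGAN(ra.GenerateTimeGAN.Config())
save = ra.DatasetSave(name="synthetic")

builder.add(generate, parents=[model])
builder.add(save, parents=[generate])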

rockfish.actions.TrainTabGAN

Train a model using a tabular GAN.
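
A minimal sketch, mirroring the TrainTimeGAN example above
import rockfish.actions as ra
train = ra.TrainTabGAN(ra.TrainTabGAN.Config())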

Attributes:

Name Type Description
Config type[TrainTabGANConfig]

Alias for TrainTabGANConfig

TrainConfig type[TrainConfig]

Alias for TrainConfig

DatasetConfig type[DatasetConfig]

Alias for DatasetConfig

TimestampConfig type[TimestampConfig]

Alias for TimestampConfig

FieldConfig type[FieldConfig]

Alias for FieldConfig

rockfish.actions.GenerateTabGAN

Generate synthetic data using a tabular GAN model.
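
A minimal sketch, mirroring the GenerateTimeGAN example above
import rockfish.actions as ra
generate = ra.GenerateTabGAN(ra.GenerateTabGAN.Config())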

Attributes:

Name Type Description
Config type[GenerateTabGANConfig]

Alias for GenerateTabGANConfig.

GenerateConfig type[GenerateConfig]

Alias for GenerateConfig

rockfish.actions.TrainTabTransformer

Train a Tab Transformer model.

rockfish.actions.GenerateTabTransformer

Generate synthetic data using the Tab Transformer model.

Attributes:

Name Type Description
Config TypeAlias

rockfish.actions.TrainTimeTransformer

Train a Time Transformer model.

Attributes:

Name Type Description
Config TypeAlias
TrainConfig TypeAlias

Alias for TrainTimeConfig.

ParentConfig TypeAlias

Alias for ParentConfig.

ChildConfig TypeAlias

Alias for ChildConfig.

GPT2Config TypeAlias

Alias for GPT2Config.

DatasetConfig TypeAlias

Alias for DatasetConfig.

TimestampConfig TypeAlias

Alias for TimestampConfig.

FieldConfig TypeAlias

Alias for FieldConfig.

rockfish.actions.GenerateTimeTransformer

Generate synthetic data using the Time Transformer model.

Attributes:

Name Type Description
Config TypeAlias

rockfish.actions.SessionTarget

SessionTarget can be used to trigger generation cycles until a desired target number of sessions is reached.
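
A minimal sketch, assuming the desired session count is passed as a target config field
import rockfish.actions as ra

# hedged sketch: the target keyword is an assumption
session_target = ra.SessionTarget(target=1000)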

Attributes:

Name Type Description
Config type[Config]

Alias for Config.

Evaluation

rockfish.actions.EvaluateLinkability

Evaluate the linkability privacy score of the input data.

Example:

Consider the example dataset:

Age Gender Zip Code Medical Condition Label
25 F 10000 Condition X ori
41 M 10732 Condition Y ori
30 M 20000 Condition Y syn
... ... ... ... ...

The configuration for the action includes the auxiliary columns to use for the attack, the label column name, and the number of neighbors to use for the attack.

import rockfish.actions as ra

config = {
    "aux_cols_a": ["Age", "Gender"],
    "aux_cols_b": ["Zip Code", "Medical Condition"],
    "label": "Label",
    "n_neighbors": 1,
}
evaluate_linkability = ra.EvaluateLinkability(config)

The output of the action is a table with a single linkability score between 0 and 1, where higher values indicate better protection against linkability attacks.

rockfish.actions.privacy.LinkabilityConfig

Configuration for the EvaluateLinkability action.

Attributes:

Name Type Description
n_attacks int

The number of attacks to run.

n_trials int

The number of trials to run.

label str

The label column name.

rng Optional[int]

The random seed.

aux_cols_a list[str]

The auxiliary columns to use for the first set.

aux_cols_b list[str]

The auxiliary columns to use for the second set.

n_neighbors int

The number of neighbors to use for the k-nearest neighbors attack.

rockfish.actions.EvaluateInference

Evaluate the inference privacy score of the input data.

Example:

Consider the example dataset:

Age Gender Zip Code Medical Condition Label
25 F 10000 Condition X ori
41 M 10732 Condition Y ori
30 M 20000 Condition Y syn
... ... ... ... ...

The configuration for the action includes the auxiliary columns to use for the attack, the secret column name, and the label column name.

import rockfish.actions as ra

config = {
    "aux_cols": ["Age", "Gender", "Zip Code"],
    "secret": "Medical Condition",
    "label": "Label",
}
evaluate_inference = ra.EvaluateInference(config)

The output of the action is a table with a single inference score between 0 and 1, where higher values indicate better protection against inference attacks.

rockfish.actions.privacy.InferenceConfig

Configuration for the EvaluateInference action.

Attributes:

Name Type Description
n_attacks int

The number of attacks to run.

n_trials int

The number of trials to run.

label str

The label column name.

rng Optional[int]

The random seed.

aux_cols list[str]

The auxiliary columns to use as features for the inference attack.

secret str

The secret column name to attack.

rockfish.actions.EvaluateLogisticRegression

Evaluate the classification performance using Logistic Regression.

Example:

Consider the fall detection dataset with labels for train and test sets.

Sex Body Temperature Heart Rate Respiratory Rate SBP DBP split
M 97 80 15 140 90 train
F 96 78 14 145 95 train
M 98 81 13 143 93 test
... ... ... ... ... ... ...

The configuration for the action includes the numerical features, the binary-valued target, and the positive label.

import rockfish.actions as ra

config = {
    "features": [
        "Body Temperature",
        "Heart Rate",
        "Respiratory Rate",
        "SBP",
        "DBP",
    ],
    "target": "Sex",
    "pos_label": "F",
}
evaluate_logistic_regression = ra.EvaluateLogisticRegression(config)

The output of the action is a table with a single AUC value.

rockfish.actions.txtr.LogisticRegressionConfig

Configuration for the EvaluateLogisticRegression action.

See details on some of the arguments in sklearn.linear_model.LogisticRegression v1.6.1.

Attributes:

Name Type Description
features list[str]

Numerical features to use in the model.

target str

The classification target. Must have two unique values.

pos_label Optional[str]

The positive label. If None and the target value set is {0, 1} or {-1, 1}, then the positive label is 1.

table_split_col_name str

The name of the column that contains the split label (train/test).

penalty Optional[Literal['l1', 'l2', 'elasticnet']]

Specify the norm of the penalty.

dual bool

Dual (constrained) or primal (regularized) formulation.

tol float

Tolerance for stopping criteria.

C float

Inverse of regularization strength; must be a positive float.

fit_intercept bool

Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.

intercept_scaling float

Useful only when the solver 'liblinear' is used and fit_intercept is set to True.

class_weight ClassWeight

Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.

random_state Optional[int]

Used when solver == 'sag', 'saga' or 'liblinear' to shuffle the data.

solver str

Algorithm to use in the optimization problem.

max_iter int

Maximum number of iterations taken for the solvers to converge.

rockfish.actions.EvaluateRandomForest

Evaluate the classification performance using Random Forest.

See the example in EvaluateLogisticRegression for usage.
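
A minimal sketch, mirroring the EvaluateLogisticRegression example; the feature and target names below are placeholders taken from that example
import rockfish.actions as ra

config = {
    "features": ["Body Temperature", "Heart Rate", "Respiratory Rate", "SBP", "DBP"],
    "target": "Sex",
    "pos_label": "F",
}
evaluate_random_forest = ra.EvaluateRandomForest(config)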

rockfish.actions.txtr.RandomForestConfig

Configuration for the EvaluateRandomForest action.

See details on some of the arguments in sklearn.ensemble.RandomForestClassifier v1.6.1.

Attributes:

Name Type Description
features list[str]

Numerical features to use in the model.

target str

The classification target. Must have two unique values.

pos_label Optional[str]

The positive label. If None and the target value set is {0, 1} or {-1, 1}, then the positive label is 1.

table_split_col_name str

The name of the column that contains the split label (train/test).

n_estimators int

The number of trees in the forest.

criterion Literal['gini', 'entropy', 'log_loss']

The function to measure the quality of a split.

max_depth Optional[int]

The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

min_samples_split int

The minimum number of samples required to split an internal node.

min_samples_leaf int

The minimum number of samples required to be at a leaf node.

min_weight_fraction_leaf float

The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node.

max_features Union[str, int, float, None]

The number of features to consider when looking for the best split.

max_leaf_nodes Optional[int]

Grow trees with max_leaf_nodes in best-first fashion.

min_impurity_decrease float

A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

bootstrap bool

Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.

oob_score bool

Whether to use out-of-bag samples to estimate the generalization score.

n_jobs Optional[int]

The number of jobs to run in parallel.

random_state Optional[int]

Controls both the randomness of the bootstrapping of the samples used when building trees (if bootstrap=True) and the sampling of the features to consider when looking for the best split at each node (if max_features < n_features).

class_weight ClassWeight

Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.

ccp_alpha float

Complexity parameter used for Minimal Cost-Complexity Pruning.

max_samples Optional[float]

If bootstrap is True, the number of samples to draw from X to train each base estimator.

rockfish.actions.txtr.ClassWeight = Union[dict[str, float], str, None] module-attribute

rockfish.actions.EvaluateForecast

Evaluate the forecasting performance using Prophet.

Example 1:

Consider the following time series dataset:

ds y split
2007-12-10 9.590761 train
2007-12-11 8.519590 train
2007-12-12 8.183677 train
... ... ...
2016-01-16 7.817223 train
2016-01-17 9.273878 test
2016-01-18 10.333775 test
2016-01-19 9.125871 test
2016-01-20 8.891374 test

The configuration for the action includes the datestamp, the target, and the split column.

import rockfish.actions as ra

config = {"datestamp": "ds", "target": "y", "table_split_col_name": "split"}
evaluate_forecast = ra.EvaluateForecast(config)

The output of the action is a table with the forecasted values.

ds y
2016-01-17 9.496974
2016-01-18 9.777253
2016-01-19 9.577357
2016-01-20 9.425384

Example 2:

The input table can contain multiple sessions. The forecast is done on each session separately. Sessions are defined by one or multiple group-by columns as the session key.

ds y split group1 group2
2020-01-01 0.030472 train a x
2020-01-02 0.677833 train a x
2020-01-03 1.049973 train a x
... ... ... ... ...
2020-04-05 3.583768 test b y
2020-04-06 3.054671 test b y
2020-04-07 3.180977 test b y

The configuration for the action includes the datestamp, the target, the split column, and the session key.

import rockfish.actions as ra

config = {
    "datestamp": "ds",
    "target": "y",
    "table_split_col_name": "split",
    "session_key": ["group1", "group2"],
}
evaluate_forecast = ra.EvaluateForecast(config)

If no session key is provided, the table metadata is used to extract the session field or the group fields. If that fails, the entire table is treated as a single session.

Note: the datestamp column needs to be a date type.

rockfish.actions.txtr.EvaluateForecastConfig

Configuration for the EvaluateForecast action.

Attributes:

Name Type Description
datestamp str

The datestamp to use for the forecast.

target str

The target column to forecast.

table_split_col_name str

The name of the column that contains the split label (train/test).

session_key Optional[Union[str, list[str]]]

The name of the column(s) that contain the session field or group fields. If None, then the session key is extracted from the table metadata. If session or group fields are found, forecasting is done on each session separately. Otherwise, it's done on the entire dataset.

growth Literal['linear', 'logistic', 'flat']

String 'linear', 'logistic', or 'flat' to specify a linear, logistic, or flat trend.

yearly_seasonality Seasonality

Fit yearly seasonality. Can be 'auto', True, False, or an integer number of Fourier terms to generate.

weekly_seasonality Seasonality

Fit weekly seasonality. Can be 'auto', True, False, or an integer number of Fourier terms to generate.

daily_seasonality Seasonality

Fit daily seasonality. Can be 'auto', True, False, or an integer number of Fourier terms to generate.

seasonality_mode Literal['additive', 'multiplicative']

'additive' (default) or 'multiplicative'.

seasonality_prior_scale float

Parameter modulating the strength of the seasonality model. Larger values allow the model to fit larger seasonal fluctuations, smaller values dampen the seasonality.

holidays_prior_scale float

Parameter modulating the strength of the holiday components model.

mcmc_samples int

If greater than 0, will do full Bayesian inference with the specified number of MCMC samples. If 0, will do MAP estimation.

holidays_mode Literal['auto', 'additive', 'multiplicative']

'additive', 'multiplicative', or 'auto'. Defaults to seasonality_mode.