RIME Adversarial

Documentation for the RIME Adversarial code, covering Attacks (the algorithms used to find adversarial points) and Epsilon Balls (the methods used to constrain those attacks).

Attacks

Attack algorithms for tabular models.

class rime.tabular.attacks.TabularRandomizedAttack(black_box: BlackBoxModel, target_score: float, max_queries: int, columns: List[Column], custom_perturb_pct: Optional[Dict[str, float]] = None, default_perturb_pct: float = 1.0, max_unsuccessful_iters: Optional[int] = None, epsilon_ball: Optional[EpsilonBall] = None, **kwargs: Any)

Attack algorithm that randomly perturbs a subset of features each iteration.

__init__(black_box: BlackBoxModel, target_score: float, max_queries: int, columns: List[Column], custom_perturb_pct: Optional[Dict[str, float]] = None, default_perturb_pct: float = 1.0, max_unsuccessful_iters: Optional[int] = None, epsilon_ball: Optional[EpsilonBall] = None, **kwargs: Any)

Initialize the algorithm.

Parameters:

black_box – Model to attack.
target_score – Upper bound on the score (loss) of examples considered to be successfully adversarial.
max_queries – Maximum number of queries that may be made to the model.
columns – List of column objects from the profiler.
custom_perturb_pct – Mapping of feature name to probability of perturbing that feature. Can be used to focus more or less on certain features. Defaults to None.
default_perturb_pct – Perturbation probability used for each feature not listed in the custom_perturb_pct mapping. Defaults to 1.
max_unsuccessful_iters – If an integer is passed, stop attack after this number of unsuccessful iterations. Defaults to None.
epsilon_ball – Epsilon ball to use to restrict changes. Will sample from inside this ball. Defaults to None.
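
The interaction between custom_perturb_pct and default_perturb_pct can be illustrated with a small standalone sketch. It does not use RIME: the helper choose_features_to_perturb is hypothetical, and it assumes that each feature is selected independently with its configured probability, which is one plausible reading of the parameter descriptions above.

```python
import random
from typing import Dict, List, Optional


def choose_features_to_perturb(
    feature_names: List[str],
    custom_perturb_pct: Optional[Dict[str, float]] = None,
    default_perturb_pct: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Hypothetical helper: pick the features to perturb in one iteration."""
    rng = rng or random.Random()
    custom = custom_perturb_pct or {}
    chosen = []
    for name in feature_names:
        # Features missing from the custom mapping use the default probability.
        prob = custom.get(name, default_perturb_pct)
        if rng.random() < prob:
            chosen.append(name)
    return chosen


features = ["age", "income", "zip_code"]
picked = choose_features_to_perturb(
    features,
    custom_perturb_pct={"zip_code": 0.0},  # never perturb zip_code
    default_perturb_pct=1.0,               # always perturb everything else
    rng=random.Random(0),
)
```

With a probability of 0.0 a feature is never selected, and with 1.0 it is always selected, so intermediate values shift the attack's attention toward or away from particular features.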

class rime.tabular.attacks.TabularCombinationAttack(black_box: BlackBoxModel, target_score: float, max_queries: float, columns: List[Column], subset_sizes: Union[int, List[int]] = 1, search_count: int = 10, max_unsuccessful_iters: Optional[int] = None, column_names_to_ignore: Optional[Set[str]] = None, epsilon_ball: Optional[ColumnRangeEpsilonBall] = None, **kwargs: Any)

Attack which greedily searches over combinations of features.

__init__(black_box: BlackBoxModel, target_score: float, max_queries: float, columns: List[Column], subset_sizes: Union[int, List[int]] = 1, search_count: int = 10, max_unsuccessful_iters: Optional[int] = None, column_names_to_ignore: Optional[Set[str]] = None, epsilon_ball: Optional[ColumnRangeEpsilonBall] = None, **kwargs: Any)

Initialize the algorithm.

Parameters:

black_box – Model to attack.
target_score – Upper bound on the score (loss) of examples considered to be successfully adversarial.
max_queries – Maximum number of queries that may be made to the model.
columns – List of column objects from the profiler.
subset_sizes – List of feature subset sizes to consider. Can either be specified as an integer n, in which case subset sizes of 1 through n are considered, or a specific list of subset sizes. Defaults to 1.
search_count – The number of perturbed values to consider for each column on each iteration. Defaults to 10.
max_unsuccessful_iters – The maximum number of iterations to proceed without improvement. Defaults to None.
column_names_to_ignore – Names of columns to ignore, i.e., columns the attack will not attempt to perturb. Defaults to None.
epsilon_ball – Epsilon ball to use to restrict changes. Will make perturbations to the extrema of the ranges. Defaults to None.
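
The two accepted forms of subset_sizes, together with column_names_to_ignore, can be sketched as follows. This is an illustration only and does not call RIME: expand_subset_sizes and candidate_subsets are hypothetical helpers, and the enumeration order of the real attack is not documented here.

```python
from itertools import combinations
from typing import Iterator, List, Optional, Set, Tuple, Union


def expand_subset_sizes(subset_sizes: Union[int, List[int]]) -> List[int]:
    """Hypothetical helper: an integer n means sizes 1 through n."""
    if isinstance(subset_sizes, int):
        return list(range(1, subset_sizes + 1))
    return list(subset_sizes)


def candidate_subsets(
    column_names: List[str],
    subset_sizes: Union[int, List[int]] = 1,
    column_names_to_ignore: Optional[Set[str]] = None,
) -> Iterator[Tuple[str, ...]]:
    """Hypothetical helper: yield the feature subsets an attack could try."""
    ignored = column_names_to_ignore or set()
    usable = [name for name in column_names if name not in ignored]
    for size in expand_subset_sizes(subset_sizes):
        # combinations() yields every subset of exactly this size.
        yield from combinations(usable, size)


subsets = list(
    candidate_subsets(
        ["a", "b", "c", "d"],
        subset_sizes=2,                # considers sizes 1 and 2
        column_names_to_ignore={"d"},  # "d" never appears in a subset
    )
)
```

The number of subsets grows quickly with the subset size, which is why the query budget (max_queries) matters for this attack.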

class rime.tabular.attacks.TabularExhaustiveGreedyAttack(black_box: BlackBoxModel, target_score: float, max_queries: int, columns: List[Column], perturbation_threshold: float = 0.1, **kwargs: Any)

Greedy attack algorithm that exhaustively attacks each feature.

__init__(black_box: BlackBoxModel, target_score: float, max_queries: int, columns: List[Column], perturbation_threshold: float = 0.1, **kwargs: Any)

Initialize the algorithm.

Parameters:

black_box – Model to attack.
target_score – Upper bound on the score (loss) of examples considered to be successfully adversarial.
max_queries – Maximum number of queries that may be made to the model.
columns – List of column objects from the profiler.
perturbation_threshold – When exhaustively perturbing a feature, the attack continues as long as the change in score exceeds this threshold. Defaults to 0.1.
**kwargs – Same as arguments to TabularGreedyAttack.
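
One way to picture the role of perturbation_threshold is the loop below. It is a sketch under stated assumptions, not the RIME implementation: exhaust_feature is a hypothetical helper, a lower score is assumed to be more adversarial, and candidate values are simply tried in the order given.

```python
from typing import Callable, Dict, Iterable, Tuple

Row = Dict[str, float]


def exhaust_feature(
    score_fn: Callable[[Row], float],
    row: Row,
    feature: str,
    candidates: Iterable[float],
    perturbation_threshold: float = 0.1,
) -> Tuple[Row, float]:
    """Hypothetical helper: keep perturbing one feature while it pays off."""
    best_row = dict(row)
    best_score = score_fn(best_row)
    for value in candidates:
        trial = dict(best_row)
        trial[feature] = value
        trial_score = score_fn(trial)
        # Stop as soon as the score change no longer exceeds the threshold.
        if best_score - trial_score <= perturbation_threshold:
            break
        best_row, best_score = trial, trial_score
    return best_row, best_score


# Toy scoring function: a lookup table keyed by the value of feature "x".
scores = {0: 1.0, 1: 0.7, 2: 0.5, 3: 0.45, 4: 0.1}
final_row, final_score = exhaust_feature(
    lambda r: scores[r["x"]], {"x": 0}, "x", candidates=[1, 2, 3, 4]
)
```

In this toy run the loop accepts x = 1 and x = 2, then stops at x = 3 because the score improves by only about 0.05, which is below the 0.1 threshold.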

class rime.tabular.attacks.TabularGreedyAttack(black_box: BlackBoxModel, target_score: float, max_queries: int, columns: List[Column], skip_categoricals: bool = False, num_features_per_round: int = 3, custom_perturb_pct: Optional[Dict[str, float]] = None, default_perturb_pct: float = 1.0, early_stop_threshold: float = inf, **kwargs: Any)

Greedy attack algorithm that samples perturbations for each column.

__init__(black_box: BlackBoxModel, target_score: float, max_queries: int, columns: List[Column], skip_categoricals: bool = False, num_features_per_round: int = 3, custom_perturb_pct: Optional[Dict[str, float]] = None, default_perturb_pct: float = 1.0, early_stop_threshold: float = inf, **kwargs: Any)

Initialize the algorithm.

Parameters:

black_box – Model to attack.
target_score – Upper bound on the score (loss) of examples considered to be successfully adversarial.
max_queries – Maximum number of queries that may be made to the model.
columns – List of column objects from the profiler.
skip_categoricals – Whether to skip categorical columns or not. Defaults to False.
num_features_per_round – Number of features to perturb per iteration. Defaults to 3.
custom_perturb_pct – Mapping of feature name to probability of perturbing that feature. Can be used to focus more or less on certain features. Defaults to None.
default_perturb_pct – Perturbation probability used for each feature not listed in the custom_perturb_pct mapping. Defaults to 1.
early_stop_threshold – Stop adding perturbations in the current round if the initial perturbation is greater than this value. Defaults to np.inf.

class rime.tabular.attacks.TabularNoiseRemoval(base_attack: TabularIterativeAttack, repeat: int = 1, target_score: Optional[float] = None)

Attack algorithm that first runs a base attack, then removes unneeded noise.

Is best paired with attacks that quickly (but inefficiently) cross the decision boundary, like TabularRandomizedAttack.

__init__(base_attack: TabularIterativeAttack, repeat: int = 1, target_score: Optional[float] = None)

Initialize with base attack and information for removing noise.

Parameters:

base_attack – Base attack algorithm to run first.
repeat – How many times to attempt to remove noise for each column. Defaults to 1.
target_score – Upper bound on the score (loss) of examples considered to be successfully adversarial. Defaults to None.
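
The idea of removing unneeded noise can be sketched without RIME as follows. The helper remove_noise is hypothetical: it assumes rows are plain dictionaries, that an example counts as adversarial when its score is at or below target_score, and that noise is removed by reverting one column at a time to its original value.

```python
from typing import Callable, Dict

Row = Dict[str, float]


def remove_noise(
    original: Row,
    adversarial: Row,
    score_fn: Callable[[Row], float],
    target_score: float,
    repeat: int = 1,
) -> Row:
    """Hypothetical helper: undo perturbations that are not needed."""
    current = dict(adversarial)
    for _ in range(repeat):
        for name in original:
            if current[name] == original[name]:
                continue  # this column was never perturbed
            trial = dict(current)
            trial[name] = original[name]
            # Keep the revert only if the example stays adversarial.
            if score_fn(trial) <= target_score:
                current = trial
    return current


original_row = {"age": 30, "income": 50, "tenure": 5}
noisy_row = {"age": 45, "income": 5, "tenure": 9}
# Toy model: only a low income pushes the score below the target.
cleaned = remove_noise(
    original_row,
    noisy_row,
    score_fn=lambda r: 0.2 if r["income"] < 10 else 0.9,
    target_score=0.5,
)
```

Only the income perturbation survives in this toy case, which is the kind of result to expect after a base attack that changes many features at once.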

Epsilon Balls

Classes defining epsilon balls.

class rime.tabular.attacks.epsilon_ball.LInfQuantileEpsilonBall(epsilon: float, columns: List[Column], col_indices: Optional[List[int]] = None)

An Epsilon Ball with each feature bounded by a quantile range.

__init__(epsilon: float, columns: List[Column], col_indices: Optional[List[int]] = None)

Initialize the epsilon ball.

Parameters:

epsilon – A percentage specifying the (one-sided) quantile range within which each feature may be perturbed. Should not exceed 0.5 (50%), because it refers to deviations both above and below the 50th percentile.
columns – List of columns associated with the features of the data points handled by the epsilon ball.
col_indices – Optional list of indices to which the columns correspond.
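
The sketch below shows one possible reading of the epsilon parameter, in which the allowed values for a feature lie between the (50 - 100·epsilon)th and (50 + 100·epsilon)th percentiles of the column. This interpretation is an assumption based on the description above, not a statement of how RIME computes the bounds, and quantile_bounds is a hypothetical helper.

```python
import statistics
from typing import List, Tuple


def quantile_bounds(values: List[float], epsilon: float) -> Tuple[float, float]:
    """Hypothetical helper: bounds at the (0.5 +/- epsilon) quantiles."""
    if not 0.0 <= epsilon <= 0.5:
        raise ValueError("epsilon must be between 0 and 0.5")
    # 99 cut points: cuts[p - 1] is the p-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    # Rounded to whole percentiles to keep the sketch simple.
    lo_pct = round((0.5 - epsilon) * 100)
    hi_pct = round((0.5 + epsilon) * 100)
    lower = min(values) if lo_pct <= 0 else cuts[lo_pct - 1]
    upper = max(values) if hi_pct >= 100 else cuts[hi_pct - 1]
    return lower, upper


column_values = [float(v) for v in range(101)]  # 0.0, 1.0, ..., 100.0
bounds = quantile_bounds(column_values, epsilon=0.1)
full_bounds = quantile_bounds(column_values, epsilon=0.5)
```

Under this reading, epsilon = 0.5 spans the full observed range of the column, which is consistent with 0.5 being the stated maximum.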

class rime.tabular.attacks.epsilon_ball.LInfRangeEpsilonBall(epsilon: float, columns: List[Column], col_indices: Optional[List[int]] = None)

EpsilonBall implementation that clips based on max-min range.

__init__(epsilon: float, columns: List[Column], col_indices: Optional[List[int]] = None)

Initialize the epsilon ball.

Parameters:

epsilon – A percentage specifying the (one-sided) size of the range within which each feature may be perturbed, calculated as a percentage of the corresponding column’s value range. Should not exceed 1 (100%).
columns – List of columns associated with the features of the data points handled by the epsilon ball.
col_indices – Optional list of indices to which the columns correspond.
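
Clipping to a range-based L-infinity ball can be sketched as follows. This is a standalone illustration rather than the RIME implementation: clip_to_range_ball is a hypothetical helper, rows are assumed to be plain dictionaries, and the column minima and maxima are passed in directly instead of being read from Column objects.

```python
from typing import Dict, Tuple

Row = Dict[str, float]


def clip_to_range_ball(
    original: Row,
    perturbed: Row,
    column_ranges: Dict[str, Tuple[float, float]],
    epsilon: float,
) -> Row:
    """Hypothetical helper: clip each feature to original +/- epsilon * range."""
    clipped = {}
    for name, x0 in original.items():
        col_min, col_max = column_ranges[name]
        radius = epsilon * (col_max - col_min)  # one-sided allowed change
        clipped[name] = min(max(perturbed[name], x0 - radius), x0 + radius)
    return clipped


clipped_row = clip_to_range_ball(
    original={"age": 30.0, "income": 50000.0},
    perturbed={"age": 50.0, "income": 45000.0},
    column_ranges={"age": (20.0, 60.0), "income": (0.0, 200000.0)},
    epsilon=0.25,
)
```

Here age may move by at most 0.25 × 40 = 10, so the proposed value of 50 is clipped to 40, while the income change of 5000 is within its radius of 50000 and is left as is.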

Attack Runner

Run tabular attacks.

rime.tabular.attacks.runner.run_attack_loop(attack: TabularIterativeAttack, run_container: TabularRunContainer, sample_size: int, use_tqdm: bool = True, special_logger: Optional[Logger] = None) → Tuple[List[TabularAttackState], list]

Run an attack over a sample of the data.

Parameters:

attack – Attack to run.
run_container – Container of data/model to be attacked.
sample_size – Number of data points to sample to run attacks over.
use_tqdm – Whether to use tqdm to display the progress of the loop. Defaults to True.
special_logger – If specified, the logger to use to log info messages. Defaults to None.

Returns:

A tuple containing the list of attack results and the list of indices of the data points that were attacked.
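
The shape of the return value can be illustrated with a simplified stand-in. The function run_loop_sketch below is hypothetical and does not reproduce run_attack_loop: it assumes sampling without replacement, takes a plain callable in place of a TabularIterativeAttack, and takes a plain sequence in place of a TabularRunContainer.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

RowT = TypeVar("RowT")
ResultT = TypeVar("ResultT")


def run_loop_sketch(
    attack_fn: Callable[[RowT], ResultT],
    rows: Sequence[RowT],
    sample_size: int,
    seed: int = 0,
) -> Tuple[List[ResultT], List[int]]:
    """Hypothetical stand-in: attack a random sample and report the indices."""
    rng = random.Random(seed)
    n = min(sample_size, len(rows))
    # Sample distinct row indices, then attack each sampled row in order.
    indices = sorted(rng.sample(range(len(rows)), n))
    results = [attack_fn(rows[i]) for i in indices]
    return results, indices


data = [1, 2, 3, 4, 5]
results, indices = run_loop_sketch(lambda row: row * 2, data, sample_size=3)
```

Returning the indices alongside the results makes it possible to match each attack result back to the data point it was computed from.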