Interpretability

ACTL3143 & ACTL5111 Deep Learning for Actuaries

Author

Patrick Laub

Acknowledgments

Thanks to Eric Dong for making the original version of these slides.

Show the package imports
import random
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import numpy.random as rnd
import pandas as pd

import keras
from keras.metrics import SparseTopKCategoricalAccuracy
from keras.models import Sequential
from keras.layers import Dense, Input
from keras.callbacks import EarlyStopping
 
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split 

Interpretability

At a high level, interpretability refers to understanding how a model works, which is very important for decision making. Traditional statistical methods like linear regression and generalized linear models are inherently interpretable because we can see and understand how the variables, collectively and individually, affect the model's predictions. In contrast, deep learning models do not readily provide such insights: they are composed of many layers of interconnected nodes that learn different representations of the data, so it is not clear how the inputs contribute to the outputs. This makes neural networks less interpretable, which is undesirable in situations that demand explanations. As a result, there is active research into making less interpretable models more interpretable, so that we can place more trust in them.

Interpretability

Interpretability Definition

Interpretability refers to the ease with which one can understand and comprehend the model’s algorithm and predictions.

Interpretability of black-box models can be crucial to ascertaining trust.

Interpretability is about transparency: understanding exactly why and how the model generates its predictions, which requires observing the inner mechanics of the algorithm and interpreting the model's parameters and the features used to produce a given output. Explainability is about explaining the behaviour of the model in human terms.

Husky vs. Wolf

A well-known anecdote in the explainability literature.

Aspects of Interpretability

Inherent Interpretability

The model is interpretable by design.

Models with inherent interpretability generally have a simple architecture in which the relationships between inputs and outputs are straightforward. This makes it easy to understand the model's inner workings and its predictions, which in turn makes decision making more convenient. Examples of inherently interpretable models include linear regression, generalized linear models and decision trees.

Post-hoc Interpretability

The model is not interpretable by design, but we can use other methods to explain the model.

Post-hoc interpretability refers to applying various techniques to understand how a model makes its predictions after it has been trained. It is useful for understanding predictions from complex (less interpretable) models such as neural networks, random forests and gradient-boosted trees.


Global Interpretability

The ability to understand how the model works.

Local Interpretability

The ability to interpret/understand each prediction.

Global interpretability focuses on understanding the model's decision-making process as a whole, taking into account the entire dataset. These techniques look for general patterns in how the input data drives the output. Examples include global feature importance and permutation importance methods.

Local interpretability focuses on understanding the model's decision-making for a specific input observation. These techniques look at how the different input features contributed to that particular prediction.

Inherent Interpretability


Trees are interpretable!

Train prices

Trees are interpretable?

Full train pricing

Linear models & LocalGLMNet

A GLM has the form

\hat{y} = g^{-1}\bigl( \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p \bigr)

where \beta_0, \dots, \beta_p are the model parameters.

Global & local interpretations are easy to obtain.

The above GLM representation provides a clear interpretation of how a marginal change in an input variable contributes to a change in the mean of the response. This makes GLMs inherently interpretable.
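For example, with a log link g = \log, increasing x_j by one unit multiplies the predicted mean by e^{\beta_j}. Below is a quick, hedged illustration (not from the original slides) of reading off this global interpretation, using scikit-learn's PoissonRegressor on simulated data.

import numpy as np
from sklearn.linear_model import PoissonRegressor

# Simulate a Poisson response with a log link and known coefficients.
rng = np.random.default_rng(42)
X = rng.normal(size=(1_000, 2))
true_betas = np.array([0.5, -0.3])
y = rng.poisson(np.exp(0.1 + X @ true_betas))

# Fit an unpenalised Poisson GLM and read the interpretation off the coefficients.
glm = PoissonRegressor(alpha=0.0).fit(X, y)
print(glm.intercept_, glm.coef_)   # roughly 0.1 and [0.5, -0.3]
print(np.exp(glm.coef_))           # multiplicative effect on the mean per unit of x_j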


LocalGLMNet extends this to a neural network.

\hat{y_i} = g^{-1}\bigl( \beta_0(\boldsymbol{x}_i) + \beta_1(\boldsymbol{x}_i) x_{i1} + \dots + \beta_p(\boldsymbol{x}_i) x_{ip} \bigr)

A GLM with local parameters \beta_0(\boldsymbol{x}_i), \dots, \beta_p(\boldsymbol{x}_i) for each observation \boldsymbol{x}_i. The local parameters are the output of a neural network.

Here, the \beta_j(\boldsymbol{x})’s are the neurons of the output layer. First, we define a feed-forward neural network with an input layer, several hidden layers, and an output layer whose number of neurons equals the number of inputs. We then add a skip connection from the input layer directly to that output layer and merge the two by elementwise multiplication followed by a sum (a dot product). In this way the network returns the GLM coefficients fitted for each individual observation, and we train the whole model against the response variable.
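A minimal sketch of this architecture in Keras is given below; the hidden-layer sizes, tanh activations and the log link are assumptions for illustration rather than a prescription from the LocalGLMNet paper.

import keras
from keras import layers

def build_localglmnet(p, hidden=(64, 32)):
    inputs = keras.Input(shape=(p,))
    z = inputs
    for units in hidden:
        z = layers.Dense(units, activation="tanh")(z)

    # Output layer with p neurons: the local coefficients beta_1(x), ..., beta_p(x).
    betas = layers.Dense(p, name="betas")(z)

    # Skip connection: elementwise multiply the coefficients with the raw inputs
    # and sum, giving <beta(x), x>; the Dense(1) adds the intercept beta_0
    # (plus a learnable scale).
    dot = layers.Dot(axes=1)([betas, inputs])
    eta = layers.Dense(1)(dot)
    output = layers.Activation("exponential")(eta)  # inverse log link

    return keras.Model(inputs, output)

After training, a sub-model such as keras.Model(model.input, model.get_layer("betas").output) returns each observation's local coefficients, which can then be plotted for interpretation.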

Post-hoc Interpretability

Permutation importance

  • Inputs: fitted model m, tabular dataset D.

  • Compute the reference score s of the model m on data D (for instance the accuracy for a classifier or the R^2 for a regressor).

  • For each feature j (column of D):

    • For each repetition k in {1, \dots, K}:

      • Randomly shuffle column j of dataset D to generate a corrupted version of the data named \tilde{D}_{k,j}.
      • Compute the score s_{k,j} of model m on corrupted data \tilde{D}_{k,j}.
    • Compute importance i_j for feature f_j defined as:

      i_j = s - \frac{1}{K} \sum_{k=1}^{K} s_{k,j}

Originally proposed by Breiman (2001), Random forests, Machine learning (45), pp. 5-32.

Extended by Fisher et al. (2019), All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously, Journal of Machine Learning Research (20.177), pp. 1-81.

Permutation importance

def permutation_test(model, X, y, num_reps=1, seed=42):
    """
    Run the permutation test for variable importance.
    Returns matrix of shape (X.shape[1], len(model.evaluate(X, y))).
    """
    rnd.seed(seed)
    scores = []

    for j in range(X.shape[1]):
        # Keep a copy of column j so it can be restored after shuffling.
        original_column = np.copy(X[:, j])
        col_scores = []

        for r in range(num_reps):
            # Shuffle column j in place and evaluate the model on the corrupted data.
            rnd.shuffle(X[:, j])
            col_scores.append(model.evaluate(X, y, verbose=0))

        # Average over repetitions, then restore the original column.
        scores.append(np.mean(col_scores, axis=0))
        X[:, j] = original_column

    return np.array(scores)

LIME

Local Interpretable Model-agnostic Explanations employs an interpretable surrogate model to explain locally how the black-box model makes predictions for individual instances.

E.g. a black-box model predicts Bob’s premium as the highest among all policyholders. LIME uses an interpretable model (a linear regression) to explain how Bob’s features influence the black-box model’s prediction.

Globally vs. Locally Faithful

Globally Faithful

The interpretable model’s explanations accurately reflect the behaviour of the black-box model across the entire input space.

Locally Faithful

The interpretable model’s explanations accurately reflect the behaviour of the black-box model for a specific instance.

LIME aims to construct an interpretable model that mimics the black-box model’s behaviour in a locally faithful manner.

LIME Algorithm

Suppose we want to explain the instance \boldsymbol{x}_{\text{Bob}}=(1, 2, 0.5).

  1. Generate perturbed examples of \boldsymbol{x}_{\text{Bob}} and use the trained gamma MDN f to make predictions: \begin{align*} \boldsymbol{x}^{'(1)}_{\text{Bob}} &= (1.1, 1.9, 0.6), \quad f\big(\boldsymbol{x}^{'(1)}_{\text{Bob}}\big)=34000 \\ \boldsymbol{x}^{'(2)}_{\text{Bob}} &= (0.8, 2.1, 0.4), \quad f\big(\boldsymbol{x}^{'(2)}_{\text{Bob}}\big)=31000 \\ &\vdots \quad \quad \quad \quad\quad \quad\quad \quad\quad \quad \quad \vdots \end{align*} We can then construct a dataset of N_{\text{Examples}} perturbed examples: \mathcal{D}_{\text{LIME}} = \big(\big\{\boldsymbol{x}^{'(i)}_{\text{Bob}},f\big(\boldsymbol{x}^{'(i)}_{\text{Bob}}\big)\big\}\big)_{i=0}^{N_{\text{Examples}}}.

LIME Algorithm

  2. Fit an interpretable model g, i.e., a linear regression using \mathcal{D}_{\text{LIME}} and the following loss function: \mathcal{L}_{\text{LIME}}(f,g,\pi_{\boldsymbol{x}_{\text{Bob}}})=\sum_{i=1}^{N_{\text{Examples}}}\pi_{\boldsymbol{x}_{\text{Bob}}}\big(\boldsymbol{x}^{'(i)}_{\text{Bob}}\big)\cdot \bigg(f\big(\boldsymbol{x}^{'(i)}_{\text{Bob}}\big)-g\big(\boldsymbol{x}^{'(i)}_{\text{Bob}}\big)\bigg)^2, where \pi_{\boldsymbol{x}_{\text{Bob}}}\big(\boldsymbol{x}^{'(i)}_{\text{Bob}}\big) represents the proximity of the perturbed example \boldsymbol{x}^{'(i)}_{\text{Bob}} to the instance being explained \boldsymbol{x}_{\text{Bob}} (a weight that decreases with their distance).
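Below is a compact sketch of these two steps, assuming f is the fitted black-box prediction function and x_bob is a (hypothetical) NumPy array holding the instance to explain; the lime package implements a more careful version of the same idea.

import numpy as np

def lime_explain(f, x_bob, num_samples=1_000, scale=0.1, kernel_width=0.75, seed=42):
    rng = np.random.default_rng(seed)
    p = len(x_bob)

    # Step 1: perturb the instance and get the black-box predictions.
    X_pert = x_bob + scale * rng.standard_normal((num_samples, p))
    y_pert = f(X_pert)

    # Proximity weights pi(x'): closer perturbations count for more.
    dists = np.linalg.norm(X_pert - x_bob, axis=1)
    weights = np.exp(-(dists / kernel_width) ** 2)

    # Step 2: weighted least squares for the linear surrogate g.
    X_design = np.column_stack([np.ones(num_samples), X_pert])
    XtW = X_design.T * weights
    beta = np.linalg.solve(XtW @ X_design, XtW @ y_pert)
    return beta  # intercept and local slopes: the explanation for x_bob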

“Explaining” to Bob

The bold red cross is the instance being explained. LIME samples instances (grey nodes), gets predictions using f (gamma MDN) and weighs them by the proximity to the instance being explained (represented here by size). The dashed line g is the learned local explanation.

SHAP Values

The SHapley Additive exPlanations (SHAP) value helps to quantify the contribution of each feature to the prediction for a specific instance.

The SHAP value for the jth feature is defined as \begin{align*} \text{SHAP}^{(j)}(\boldsymbol{x}) &= \sum_{U\subset \{1, ..., p\} \backslash \{j\}} \frac{1}{p} \binom{p-1}{|U|}^{-1} \big(\mathbb{E}[Y| \boldsymbol{x}^{(U\cup \{j\})}] - \mathbb{E}[Y|\boldsymbol{x}^{(U)}]\big), \end{align*} where p is the number of features. A positive SHAP value indicates that the variable increases the prediction value.
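Since the sum runs over every subset U of the other features, the exact value is only computable for small p. Below is a hedged, brute-force sketch of the formula, assuming a user-supplied value_fn(S) that estimates \mathbb{E}[Y|\boldsymbol{x}^{(S)}] for a set of feature indices S; in practice, packages such as shap approximate these conditional expectations for you.

from itertools import combinations
from math import comb

def shap_value(value_fn, j, p):
    """Exact SHAP value of feature j via the formula above (small p only)."""
    others = [k for k in range(p) if k != j]
    total = 0.0
    for size in range(len(others) + 1):
        for U in combinations(others, size):
            weight = 1 / (p * comb(p - 1, size))
            # Marginal contribution of feature j when added to the coalition U.
            total += weight * (value_fn(set(U) | {j}) - value_fn(set(U)))
    return total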

Grad-CAM

Original image

Grad-CAM

See, e.g., Keras tutorial.
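Grad-CAM highlights the regions of an image that most influence a CNN's prediction for a class: the last convolutional layer's feature maps are weighted by the pooled gradients of the class score and combined into a heatmap. A condensed sketch along the lines of that Keras tutorial, assuming a TensorFlow backend and that last_conv_layer_name names the model's final convolutional layer:

import keras
import tensorflow as tf

def make_gradcam_heatmap(img_array, model, last_conv_layer_name, pred_index=None):
    # Model mapping the input image to the last conv layer's activations
    # and to the final class predictions.
    grad_model = keras.Model(
        model.inputs, [model.get_layer(last_conv_layer_name).output, model.output]
    )

    with tf.GradientTape() as tape:
        conv_output, preds = grad_model(img_array)
        if pred_index is None:
            pred_index = tf.argmax(preds[0])
        class_channel = preds[:, pred_index]

    # Gradient of the class score w.r.t. the conv feature maps, pooled over
    # the spatial dimensions to get one weight per channel.
    grads = tape.gradient(class_channel, conv_output)
    pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))

    # Weighted sum of the feature maps, then ReLU and normalise to [0, 1].
    heatmap = tf.squeeze(conv_output[0] @ pooled_grads[..., tf.newaxis])
    heatmap = tf.maximum(heatmap, 0) / tf.math.reduce_max(heatmap)
    return heatmap.numpy()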

Criticism

Multiple conflicting explanations.

Illustrative Example

First attempt at NLP task

Code
df_raw = pd.read_parquet("../Natural-Language-Processing/NHTSA_NMVCCS_extract.parquet.gzip")

df_raw["NUM_VEHICLES"] = df_raw["NUMTOTV"].map(lambda x: str(x) if x <= 2 else "3+")

weather_cols = [f"WEATHER{i}" for i in range(1, 9)]
features = df_raw[["SUMMARY_EN"] + weather_cols]

target_labels = df_raw["NUM_VEHICLES"]
target = LabelEncoder().fit_transform(target_labels)

X_main, X_test, y_main, y_test = train_test_split(features, target, test_size=0.2, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X_main, y_main, test_size=0.25, random_state=1)
df_raw["SUMMARY_EN"]
0       V1, a 2000 Pontiac Montana minivan, made a lef...
1       The crash occurred in the eastbound lane of a ...
2       This crash occurred just after the noon time h...
                              ...                        
6946    The crash occurred in the eastbound lanes of a...
6947    This single-vehicle crash occurred in a rural ...
6948    This two vehicle daytime collision occurred mi...
Name: SUMMARY_EN, Length: 6949, dtype: object
df_raw["NUM_VEHICLES"].value_counts()\
  .sort_index()
NUM_VEHICLES
1     1822
2     4151
3+     976
Name: count, dtype: int64

A trained neural network performing really well on predictions does not necessarily mean it has learned something sensible. Interrogating the model can help us understand its inner workings and ensure there are no underlying problems.

Bag of words for the top 1,000 words

Code
def vectorise_dataset(X, vect, txt_col="SUMMARY_EN", dataframe=False):
    X_vects = vect.transform(X[txt_col]).toarray()
    X_other = X.drop(txt_col, axis=1)

    if not dataframe:
        return np.concatenate([X_vects, X_other], axis=1)                           
    else:
        # Add column names and indices to the combined dataframe.
        vocab = list(vect.get_feature_names_out())
        X_vects_df = pd.DataFrame(X_vects, columns=vocab, index=X.index)
        return pd.concat([X_vects_df, X_other], axis=1) 
vect = CountVectorizer(max_features=1_000, stop_words="english")
vect.fit(X_train["SUMMARY_EN"])

X_train_bow = vectorise_dataset(X_train, vect)
X_val_bow = vectorise_dataset(X_val, vect)
X_test_bow = vectorise_dataset(X_test, vect)

vectorise_dataset(X_train, vect, dataframe=True).head()
10 105 113 12 15 150 16 17 18 180 ... yield zone WEATHER1 WEATHER2 WEATHER3 WEATHER4 WEATHER5 WEATHER6 WEATHER7 WEATHER8
2532 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
6209 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2561 1 0 1 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
6664 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4214 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

5 rows × 1008 columns

Trained a basic neural network on that

Code
def build_model(num_features, num_cats):
    random.seed(42)
    
    model = Sequential([
        Input((num_features,)),
        Dense(100, activation="relu"),
        Dense(num_cats, activation="softmax")
    ])
    
    topk = SparseTopKCategoricalAccuracy(k=2, name="topk")
    model.compile("adam", "sparse_categorical_crossentropy",
        metrics=["accuracy", topk])
    
    return model
num_features = X_train_bow.shape[1]
num_cats = df_raw["NUM_VEHICLES"].nunique()
model = build_model(num_features, num_cats)
es = EarlyStopping(patience=1, restore_best_weights=True, monitor="val_accuracy")
model.fit(X_train_bow, y_train, epochs=10,
    callbacks=[es], validation_data=(X_val_bow, y_val), verbose=0)
model.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                     Output Shape                  Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense (Dense)                   │ (None, 100)            │       100,900 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 3)              │           303 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 303,611 (1.16 MB)
 Trainable params: 101,203 (395.32 KB)
 Non-trainable params: 0 (0.00 B)
 Optimizer params: 202,408 (790.66 KB)
model.evaluate(X_train_bow, y_train, verbose=0)
[0.004478050395846367, 1.0, 1.0]
model.evaluate(X_val_bow, y_val, verbose=0)
[8.163677215576172, 0.9784172773361206, 0.9985611438751221]

Permutation importance algorithm

Taken directly from scikit-learn documentation:

  • Inputs: fitted predictive model m, tabular dataset (training or validation) D.

  • Compute the reference score s of the model m on data D (for instance the accuracy for a classifier or the R^2 for a regressor).

  • For each feature j (column of D):

    • For each repetition k in {1, \dots, K}:

      • Randomly shuffle column j of dataset D to generate a corrupted version of the data named \tilde{D}_{k,j}.
      • Compute the score s_{k,j} of model m on corrupted data \tilde{D}_{k,j}.
    • Compute importance i_j for feature f_j defined as:

      i_j = s - \frac{1}{K} \sum_{k=1}^{K} s_{k,j}

Find important inputs

def permutation_test(model, X, y, num_reps=1, seed=42):
    """
    Run the permutation test for variable importance.
    Returns matrix of shape (X.shape[1], len(model.evaluate(X, y))).
    """
    rnd.seed(seed)
    scores = []

    for j in range(X.shape[1]):
        # Keep a copy of column j so it can be restored after shuffling.
        original_column = np.copy(X[:, j])
        col_scores = []

        for r in range(num_reps):
            # Shuffle column j in place and evaluate the model on the corrupted data.
            rnd.shuffle(X[:, j])
            col_scores.append(model.evaluate(X, y, verbose=0))

        # Average over repetitions, then restore the original column.
        scores.append(np.mean(col_scores, axis=0))
        X[:, j] = original_column

    return np.array(scores)

Run the permutation test

1all_perm_scores = permutation_test(model, X_val_bow, y_val)
all_perm_scores
1
The permutation_test function shuffles each feature column of the validation set in turn, and compares the model's performance on the corrupted data to its performance on the original data.
array([[ 8.16,  0.98,  1.  ],
       [ 8.16,  0.98,  1.  ],
       [ 8.16,  0.98,  1.  ],
       ...,
       [ 8.48,  0.98,  1.  ],
       [ 8.47,  0.98,  1.  ],
       [14.59,  0.98,  1.  ]])

Plot the permutated accuracies

1perm_scores = all_perm_scores[:,1]
plt.plot(perm_scores)
plt.xlabel("Input index")
plt.ylabel("Accuracy when shuffled");
1
The [:, 1] slice extracts the accuracy (the second entry of each model.evaluate result) and stores it as a vector.

At a high level, the method says that if we corrupt the information contained in a feature by shuffling that feature's column, we can see how much information the variable contributes. If a variable does not contribute to the prediction accuracy, shuffling it will not cause a notable drop in accuracy; if a variable is highly important, shuffling it will cause a larger drop. The size of this drop is an indication of variable importance. The plot above shows how the model's accuracy changes as each input is shuffled, and certain variables clearly produce larger drops in accuracy.

Find the most significant inputs

1vocab = vect.get_feature_names_out()
2input_cols = list(vocab) + weather_cols

3best_input_inds = np.argsort(perm_scores)[:100]
4best_inputs = [input_cols[idx] for idx in best_input_inds]

5print(best_inputs)
1
Extracts the names of the features in a vectorizer object
2
Combines the list of names in the vectorizer object with the weather columns
3
Sorts perm_scores in ascending order and selects the 100 features whose shuffling caused the largest drops in accuracy
4
Finds the names of those input features by mapping the indices back to column names
5
Prints the output
['v3', 'v2', 'vehicle', 'involved', 'event', 'harmful', 'stated', 'motor', 'WEATHER8', 'left', 'v1', 'just', 'turn', 'traveling', 'approximately', 'medication', 'hurry', 'encroaching', 'chevrolet', 'parked', 'continued', 'saw', 'road', 'rest', 'distraction', 'ahead', 'wet', 'hit', 'WEATHER5', 'WEATHER6', 'WEATHER4', 'old', 'v4', '48', 'rested', 'direction', 'occurred', 'kph', 'clear', 'right', 'miles', 'uphill', 'WEATHER3', 'denied', '80', 'attempt', 'thinking', 'assumption', 'damage', 'runs', 'time', 'consisted', 'following', 'towed', 'WEATHER1', 'alcohol', 'mph', 'pickup', 'lane', 'conversing', 'make', 'started', 'maneuver', 'stopped', 'store', 'car', 'local', 'dry', 'median', 'south', 'driver', 'higher', 'pre', 'northbound', 'impacting', 'congested', 'health', 'southwest', 'gmc', 'observed', 'parking', 'partially', 'heart', 'shoulders', 'shoulder', 'southeast', 'came', 'heard', 'gap', 'southbound', 'conditions', 'contacted', 'change', 'compact', '2002', 'cell', 'causing', 'oldsmobile', 'half', 'hard']

How about a simple decision tree?

We can try building a simpler model using only the most important features. Here, we choose a decision tree classifier.

1from sklearn import tree

2clf = tree.DecisionTreeClassifier(random_state=0, max_leaf_nodes=3)
3clf.fit(X_train_bow[:, best_input_inds], y_train);
1
Imports the tree module from sklearn
2
Specifies a decision tree classifier; max_leaf_nodes=3 ensures the fitted tree has at most 3 leaf nodes
3
Fits the decision tree on the selected dataset. Here we only select the best_input_inds columns from the train set
print(clf.score(X_train_bow[:, best_input_inds], y_train))
print(clf.score(X_val_bow[:, best_input_inds], y_val))
0.9186855360997841
0.9316546762589928

The decision tree ends up giving pretty good results.

Decision tree

tree.plot_tree(clf, feature_names=best_inputs, filled=True);

print(np.where(clf.feature_importances_ > 0)[0])
[best_inputs[ind] for ind in np.where(clf.feature_importances_ > 0)[0]]
[0 1]
['v3', 'v2']

Illustrative Example (Fixed)

This is why we replace “v1”, “v2”, “v3”

Code
# Go through every summary and find the words "V1", "V2" and "V3".
# For each summary, replace "V1" with a random number like "V1623", and "V2" with a different random number like "V1234".
rnd.seed(123)

df = df_raw.copy()
for i, summary in enumerate(df["SUMMARY_EN"]):
    word_numbers = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"]
    num_cars = 10
    new_car_nums = [f"V{rnd.randint(100, 10000)}" for _ in range(num_cars)]
    num_spaces = 4

    for car in range(1, num_cars+1):
        new_num = new_car_nums[car-1]
        summary = summary.replace(f"V-{car}", new_num)
        summary = summary.replace(f"Vehicle {word_numbers[car-1]}", new_num).replace(f"vehicle {word_numbers[car-1]}", new_num)
        summary = summary.replace(f"Vehicle #{word_numbers[car-1]}", new_num).replace(f"vehicle #{word_numbers[car-1]}", new_num)
        summary = summary.replace(f"Vehicle {car}", new_num).replace(f"vehicle {car}", new_num)
        summary = summary.replace(f"Vehicle #{car}", new_num).replace(f"vehicle #{car}", new_num)
        summary = summary.replace(f"Vehicle # {car}", new_num).replace(f"vehicle # {car}", new_num)

        for j in range(num_spaces+1):
            summary = summary.replace(f"V{' '*j}{car}", new_num).replace(f"V{' '*j}#{car}", new_num).replace(f"V{' '*j}# {car}", new_num)
            summary = summary.replace(f"v{' '*j}{car}", new_num).replace(f"v{' '*j}#{car}", new_num).replace(f"v{' '*j}# {car}", new_num)
         
    df.loc[i, "SUMMARY_EN"] = summary

There was a slide in the NLP deck titled “Just ignore this for now…”. It went through each summary and replaced the words “V1”, “V2”, “V3” with random numbers, to check whether the model was overfitting to these words.

Code
features = df[["SUMMARY_EN"] + weather_cols]
X_main, X_test, y_main, y_test = train_test_split(features, target, test_size=0.2, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X_main, y_main, test_size=0.25, random_state=1)

vect = CountVectorizer(max_features=1_000, stop_words="english")
vect.fit(X_train["SUMMARY_EN"])

X_train_bow = vectorise_dataset(X_train, vect)
X_val_bow = vectorise_dataset(X_val, vect)
X_test_bow = vectorise_dataset(X_test, vect)

model = build_model(num_features, num_cats)

es = EarlyStopping(patience=1, restore_best_weights=True,
    monitor="val_accuracy", verbose=2)
model.fit(X_train_bow, y_train, epochs=10,
    callbacks=[es], validation_data=(X_val_bow, y_val), verbose=0);

Retraining on the fixed dataset gives us a more realistic (lower) accuracy.

model.evaluate(X_train_bow, y_train, verbose=0)
[0.07491350173950195, 0.986567497253418, 0.9997601509094238]
model.evaluate(X_val_bow, y_val, verbose=0)
[2.865464925765991, 0.9446043372154236, 0.9956834316253662]

Permutation importance accuracy plot

perm_scores = permutation_test(model, X_val_bow, y_val)[:,1]
plt.plot(perm_scores)
plt.xlabel("Input index"); plt.ylabel("Accuracy when shuffled");

Find the most significant inputs

vocab = vect.get_feature_names_out()
input_cols = list(vocab) + weather_cols

best_input_inds = np.argsort(perm_scores)[:100]
best_inputs = [input_cols[idx] for idx in best_input_inds]

print(best_inputs)
['harmful', 'involved', 'event', 'year', 'struck', 'contacted', 'old', 'impact', 'coded', 'motor', 'rear', 'towed', 'stop', 'driven', 'turning', 'single', 'forward', 'chevrolet', 'lanes', 'crash', 'parked', 'continued', 'WEATHER4', 'WEATHER1', 'travel', 'divided', 'brakes', 'include', 'came', 'stopped', 'final', 'factor', 'clear', '2004', 'speed', '2002', 'reason', 'passing', 'pickup', 'crossing', 'driving', 'ahead', 'did', 'control', '10', 'highway', 'WEATHER5', 'weather', 'traveling', 'surveillance', 'stated', 'spin', 'heart', 'road', 'pole', 'occupants', 'afternoon', 'moderate', 'mirror', 'including', 'pushed', '55mph', 'intersection', 'WEATHER8', '1993', 'cross', 'curve', 'slight', 'dark', 'day', 'sight', 'distance', 'seat', 'saw', 'said', 'drivers', 'earlier', 'early', 'dodge', 'corrected', 'state', 'contributed', '16', 'trying', 'trip', 'buick', 'trailer', 'traffic', 'tractor', 'cherokee', '42', 'taken', '41', 'cloudy', 'collision', 'congested', 'contact', 'eastbound', 'roadways', 'roadway']

How about a simple decision tree?

clf = tree.DecisionTreeClassifier(random_state=0, max_leaf_nodes=3)
clf.fit(X_train_bow[:, best_input_inds], y_train);
print(clf.score(X_train_bow[:, best_input_inds], y_train))
print(clf.score(X_val_bow[:, best_input_inds], y_val))
0.9249220436555529
0.9381294964028777

Decision tree

tree.plot_tree(clf, feature_names=best_inputs, filled=True);

Unlike the first tree, which simply checked for the word “v3” and predicted “3+” (not very meaningful, since having “v3” in the summary is a direct indication of the number of vehicles), this tree has to rely on other words.

print(np.where(clf.feature_importances_ > 0)[0])
[best_inputs[ind] for ind in np.where(clf.feature_importances_ > 0)[0]]
[ 0 36]
['harmful', 'reason']

Package Versions

from watermark import watermark
print(watermark(python=True, packages="keras,matplotlib,numpy,pandas,seaborn,scipy,torch,tensorflow,tf_keras"))
Python implementation: CPython
Python version       : 3.11.9
IPython version      : 8.24.0

keras     : 3.3.3
matplotlib: 3.9.0
numpy     : 1.26.4
pandas    : 2.2.2
seaborn   : 0.13.2
scipy     : 1.11.0
torch     : 2.3.1
tensorflow: 2.16.1
tf_keras  : 2.16.0

Glossary

  • global interpretability
  • Grad-CAM
  • inherent interpretability
  • LIME
  • local interpretability
  • permutation importance
  • post-hoc interpretability
  • SHAP values