Supported Models#
DPGExplainer supports various tree-based ensemble models from scikit-learn. This page documents all supported models, their features, and usage examples.
Overview#
DPGExplainer works with any scikit-learn tree-based ensemble that exposes an estimators_ attribute after fitting. The framework automatically detects the model type and handles tree structure differences transparently.
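To see what the estimators_ requirement looks like in practice, the sketch below fits each supported classifier on Iris and reports the container type scikit-learn uses for its trees. It relies only on scikit-learn and is not DPGExplainer's detection logic.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)

X, y = load_iris(return_X_y=True)

models = [
    RandomForestClassifier(n_estimators=5, random_state=0),
    GradientBoostingClassifier(n_estimators=5, random_state=0),
    ExtraTreesClassifier(n_estimators=5, random_state=0),
    AdaBoostClassifier(n_estimators=5, random_state=0),
    BaggingClassifier(n_estimators=5, random_state=0),
]

# Map each model name to the type name of its fitted `estimators_` container.
container_types = {}
for model in models:
    model.fit(X, y)
    container_types[type(model).__name__] = type(model.estimators_).__name__

for name, container in container_types.items():
    print(f"{name}: estimators_ is a {container}")
```

GradientBoostingClassifier is the only one of the five that reports an ndarray; the others report a list.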
Classification Models#
RandomForestClassifier#
Status: ✅ Fully supported
The classic ensemble method. Works as expected with DPGExplainer.
from sklearn.ensemble import RandomForestClassifier
from dpg import DPGExplainer
rf = RandomForestClassifier(n_estimators=10, max_depth=5)
rf.fit(X, y)
explainer = DPGExplainer(rf, feature_names, target_names)
explanation = explainer.explain_global(X)
GradientBoostingClassifier#
Status: ✅ Fully supported (NEW!)
Gradient Boosting is now fully supported with automatic tree structure normalization. No special configuration needed.
from sklearn.ensemble import GradientBoostingClassifier
from dpg import DPGExplainer
gb = GradientBoostingClassifier(n_estimators=100, max_depth=5, learning_rate=0.1)
gb.fit(X, y)
# Works exactly like RandomForest - automatic normalization happens internally
explainer = DPGExplainer(gb, feature_names, target_names)
explanation = explainer.explain_global(X)
Technical Note: GradientBoosting stores its trees in a 2D NumPy array of shape (n_estimators, trees per stage), whereas RandomForest uses a 1D list. DPGExplainer normalizes this difference automatically, so no extra steps are required.
ExtraTreesClassifier#
Status: ✅ Fully supported
Extremely randomized trees work seamlessly with DPGExplainer.
from sklearn.ensemble import ExtraTreesClassifier
from dpg import DPGExplainer
et = ExtraTreesClassifier(n_estimators=10, max_depth=5)
et.fit(X, y)
explainer = DPGExplainer(et, feature_names, target_names)
explanation = explainer.explain_global(X)
AdaBoostClassifier#
Status: ✅ Fully supported
Adaptive Boosting is fully supported.
from sklearn.ensemble import AdaBoostClassifier
from dpg import DPGExplainer
ada = AdaBoostClassifier(n_estimators=10)
ada.fit(X, y)
explainer = DPGExplainer(ada, feature_names, target_names)
explanation = explainer.explain_global(X)
BaggingClassifier#
Status: ✅ Fully supported
Bootstrap Aggregating works with DPGExplainer.
from sklearn.ensemble import BaggingClassifier
from dpg import DPGExplainer
bag = BaggingClassifier(n_estimators=10)
bag.fit(X, y)
explainer = DPGExplainer(bag, feature_names, target_names)
explanation = explainer.explain_global(X)
Regression Models#
RandomForestRegressor#
Status: ✅ Fully supported
from sklearn.ensemble import RandomForestRegressor
from dpg import DPGExplainer
rf = RandomForestRegressor(n_estimators=10, max_depth=5)
rf.fit(X, y)
explainer = DPGExplainer(rf, feature_names, target_names=["prediction"])
explanation = explainer.explain_global(X)
GradientBoostingRegressor#
Status: ✅ Fully supported (NEW!)
Gradient Boosting regression is now fully supported with the same automatic normalization as the classifier version.
from sklearn.ensemble import GradientBoostingRegressor
from dpg import DPGExplainer
gb = GradientBoostingRegressor(n_estimators=100, max_depth=5, learning_rate=0.1)
gb.fit(X, y)
explainer = DPGExplainer(gb, feature_names, target_names=["prediction"])
explanation = explainer.explain_global(X)
ExtraTreesRegressor#
Status: ✅ Fully supported
from sklearn.ensemble import ExtraTreesRegressor
from dpg import DPGExplainer
et = ExtraTreesRegressor(n_estimators=10, max_depth=5)
et.fit(X, y)
explainer = DPGExplainer(et, feature_names, target_names=["prediction"])
explanation = explainer.explain_global(X)
AdaBoostRegressor#
Status: ✅ Fully supported
from sklearn.ensemble import AdaBoostRegressor
from dpg import DPGExplainer
ada = AdaBoostRegressor(n_estimators=10)
ada.fit(X, y)
explainer = DPGExplainer(ada, feature_names, target_names=["prediction"])
explanation = explainer.explain_global(X)
Unsupported Models#
The following models are NOT supported:
| Model | Reason |
|---|---|
| DecisionTreeClassifier / DecisionTreeRegressor | Single tree, not an ensemble |
| LogisticRegression | Linear model, not tree-based |
| SVC / SVR | Support vector machines, not tree-based |
| KNeighborsClassifier / KNeighborsRegressor | Instance-based, not tree-based |
| Neural Networks (MLPClassifier, etc.) | Non-tree-based |
| Linear/Ridge/Lasso Regression | Linear models, not tree-based |
If you try to use an unsupported model, you’ll get a clear error message:
DPGError: Model must be a tree-based ensemble
Model Comparison#
| Model | Type | Status | Trees | Parameters |
|---|---|---|---|---|
| RandomForestClassifier | Bagging | ✅ | Independent | n_estimators, max_depth |
| GradientBoostingClassifier | Boosting | ✅ NEW | Sequential | n_estimators, learning_rate, max_depth |
| ExtraTreesClassifier | Bagging | ✅ | Independent | n_estimators, max_depth |
| AdaBoostClassifier | Boosting | ✅ | Sequential | n_estimators, learning_rate |
| BaggingClassifier | Bagging | ✅ | Independent | n_estimators |
GradientBoosting Implementation Details#
What Changed#
Previously, using GradientBoostingClassifier or GradientBoostingRegressor would fail with:
AttributeError: 'numpy.ndarray' object has no attribute 'tree_'
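This happened because GradientBoosting stores its trees in a 2D NumPy array of shape (n_estimators, trees per stage), while other models use a 1D list. Iterating over the 2D array yields rows (arrays) rather than individual trees, whereas DPGExplainer expects a consistent 1D structure.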
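The failure can be reproduced with scikit-learn alone. The sketch below is an illustration rather than DPGExplainer code: it fits a 3-class model, prints the shape of estimators_ (scikit-learn documents it as one row per boosting stage and one column per tree fitted in that stage), and shows that naive iteration hands back an array row instead of a tree.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_iris(return_X_y=True)  # 3 classes
gb = GradientBoostingClassifier(n_estimators=10, random_state=0).fit(X, y)

# One row per boosting stage, one column per tree fitted in that stage.
shape = gb.estimators_.shape
print(shape)

# Iterating a 2D array yields rows (1D arrays), not individual trees.
first_row = next(iter(gb.estimators_))
try:
    first_row.tree_
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
print(error_message)
```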
How It Works Now#
DPGExplainer includes an automatic normalizer (SklearnEnsembleNormalizer) that:
Detects GradientBoosting models automatically during initialization
Flattens the 2D estimators array to a 1D list
Processes the model normally (transparent to the user)
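The three steps above can be sketched as follows. This is a minimal illustration using only scikit-learn and NumPy; flatten_estimators is a hypothetical helper written for this page, not the actual SklearnEnsembleNormalizer, whose internals may differ.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier


def flatten_estimators(model):
    """Return the fitted trees of `model` as a flat 1D list (hypothetical helper)."""
    estimators = model.estimators_
    if isinstance(estimators, np.ndarray):
        # GradientBoosting: 2D object array -> flat list of trees.
        return estimators.ravel().tolist()
    # Other ensembles already store a 1D list; copy it for a uniform return type.
    return list(estimators)


X, y = load_iris(return_X_y=True)
gb = GradientBoostingClassifier(n_estimators=10, random_state=0).fit(X, y)
rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

gb_trees = flatten_estimators(gb)
rf_trees = flatten_estimators(rf)
print(len(gb_trees), len(rf_trees))
```

On the 3-class Iris data, 10 boosting stages produce 30 trees, while the random forest keeps its 10. Every element of both lists exposes tree_, so downstream code can iterate over them the same way.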
Performance Impact#
Normalization overhead: < 1ms (one-time, during initialization)
Extraction overhead: None (same iteration logic as before)
Memory impact: Negligible (list is same size as 2D array)
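To check these figures on your own machine, a rough timing sketch like the one below may help. It uses a placeholder object array in place of a fitted model and times a plain NumPy flatten, which is assumed here to approximate the normalization step; actual numbers will vary by hardware and by DPGExplainer's real implementation.

```python
import time

import numpy as np

# Stand-in for a fitted GradientBoosting `estimators_` array:
# 100 stages x 3 trees per stage, holding placeholder objects instead of trees.
estimators = np.empty((100, 3), dtype=object)
for i in range(100):
    for j in range(3):
        estimators[i, j] = object()

start = time.perf_counter()
flat = estimators.ravel().tolist()
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"Flattened {len(flat)} objects in {elapsed_ms:.3f} ms")

# The list holds references to the same objects, not copies.
same_object = flat[0] is estimators[0, 0]
```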
Testing#
All supported models are thoroughly tested:
# Run model-specific tests
pytest tests/test_sklearn_models.py -v
# Test results
# - GradientBoostingClassifier: binary, multiclass ✓
# - GradientBoostingRegressor: regression ✓
# - Backward compatibility: all other models ✓
# - Total: 203 tests passing
Examples#
Example 1: Compare Multiple Models#
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from dpg import DPGExplainer
iris = load_iris()
X, y = iris.data, iris.target
models = {
    'RandomForest': RandomForestClassifier(n_estimators=10),
    'GradientBoosting': GradientBoostingClassifier(n_estimators=10),
}
for name, model in models.items():
    model.fit(X, y)
    explainer = DPGExplainer(model, iris.feature_names, iris.target_names)
    explanation = explainer.explain_global(X)
    print(f"{name}: {len(explanation.nodes)} nodes, {len(explanation.graph.edges())} edges")
Example 2: Hyperparameter Exploration with GradientBoosting#
from sklearn.ensemble import GradientBoostingClassifier
from dpg import DPGExplainer
gb = GradientBoostingClassifier(
    n_estimators=100,    # Number of boosting stages
    learning_rate=0.1,   # Learning rate (smaller → more conservative)
    max_depth=3,         # Depth of each tree
    subsample=0.8,       # Fraction of samples for fitting
    random_state=42
)
gb.fit(X, y)
explainer = DPGExplainer(gb, feature_names, target_names)
explanation = explainer.explain_global(X)
# Inspect node metrics
print(explanation.node_metrics.head(10))
Example 3: Local Explanations with GradientBoosting#
from sklearn.ensemble import GradientBoostingClassifier
from dpg import DPGExplainer
gb = GradientBoostingClassifier(n_estimators=10)
gb.fit(X, y)
explainer = DPGExplainer(gb, feature_names, target_names)
explainer.fit(X) # Fit the DPG
# Explain a single sample
sample = X[0]
local = explainer.explain_local(sample, sample_id=0)
print(f"Prediction: {local.majority_vote}")
print(f"Class votes: {local.class_votes}")
print(f"Confidence: {local.sample_confidence}")
Tips and Best Practices#
Model Size: DPGExplainer works best with 5-100 trees. Very large ensembles may produce complex graphs.
Tree Depth: Shallow trees (max_depth=3-5) tend to produce more interpretable DPGs.
GradientBoosting Learning Rate: A higher learning rate gives each tree a larger contribution, so fewer trees are usually needed to reach the same fit; a lower rate typically needs more trees. Experiment with values like 0.01, 0.1, 0.5.
Data Size: DPG extraction scales linearly with the number of samples and trees. For very large datasets, consider sampling.
Configuration: Use perc_var and decimal_threshold to control the DPG complexity:
explainer = DPGExplainer(
    model=gb,
    feature_names=feature_names,
    target_names=target_names,
    dpg_config={
        "dpg": {
            "default": {
                "perc_var": 1e-9,        # Filter rare paths
                "decimal_threshold": 2,  # Round thresholds to 2 decimals
                "n_jobs": -1,            # Use all CPU cores
            }
        }
    }
)
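Before building a DPG, it can help to check how large the fitted ensemble actually is against the model size and tree depth tips above. The sketch below uses only scikit-learn attributes (get_depth() and tree_.node_count); the model settings are illustrative choices, not recommendations from DPGExplainer.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=10, max_depth=5, random_state=0).fit(X, y)

# Number of trees, depth of the deepest tree, and total node count across trees.
n_trees = len(rf.estimators_)
max_depth = max(tree.get_depth() for tree in rf.estimators_)
total_nodes = sum(tree.tree_.node_count for tree in rf.estimators_)

print(f"{n_trees} trees, deepest tree has depth {max_depth}, {total_nodes} nodes in total")
```

If the tree count or depth is far beyond the ranges suggested above, retraining with smaller n_estimators or max_depth is likely to give a more readable graph.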
Troubleshooting#
Model Not Recognized#
DPGError: Model must be a tree-based ensemble
Solution: Check that your model has an estimators_ attribute. Use print(type(model.estimators_)) to verify.
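One common cause is passing a model that has not been fitted yet, since scikit-learn only creates estimators_ during fit. The sketch below wraps the check in a small diagnostic; describe_estimators is a hypothetical helper for illustration and is not part of DPGExplainer.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression


def describe_estimators(model):
    """Summarize a model's `estimators_` attribute (hypothetical helper)."""
    name = type(model).__name__
    if not hasattr(model, "estimators_"):
        return f"{name}: no estimators_ attribute (unfitted or not an ensemble)"
    return f"{name}: estimators_ is a {type(model.estimators_).__name__}"


X, y = load_iris(return_X_y=True)

unfitted = describe_estimators(RandomForestClassifier(n_estimators=5))
fitted = describe_estimators(RandomForestClassifier(n_estimators=5).fit(X, y))
linear = describe_estimators(LogisticRegression(max_iter=1000).fit(X, y))

print(unfitted)
print(fitted)
print(linear)
```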
Out of Memory with Large Models#
Solution:
Reduce n_estimators
Sample your training data: explainer.explain_global(X[:1000])
Increase perc_var to filter more paths
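Slicing the first 1000 rows is the simplest option, but it can be unrepresentative when rows are ordered (for example, sorted by class). A random subsample is a common alternative; the sketch below uses NumPy only, with placeholder data and an arbitrary max_rows cap chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Placeholder data standing in for a large training matrix.
X = rng.normal(size=(50_000, 4))

max_rows = 1_000
if len(X) > max_rows:
    # Draw row indices without replacement so no sample is repeated.
    idx = rng.choice(len(X), size=max_rows, replace=False)
    X_sample = X[idx]
else:
    idx = np.arange(len(X))
    X_sample = X

print(X_sample.shape)
```

X_sample can then be passed to explain_global in place of the sliced array.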
Unexpected Graph Structure#
Solution:
Check perc_var - if too high, many paths are filtered
Verify decimal_threshold doesn’t oversimplify thresholds
Try different graph_construction modes: "execution_trace" vs "aggregated_transitions"
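To get a feel for how threshold rounding can change a graph, the sketch below counts distinct (feature, threshold) split pairs in a forest before and after rounding. It reads scikit-learn's tree_ arrays directly and only approximates the idea behind decimal_threshold; it does not reproduce how DPGExplainer builds its nodes.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=10, max_depth=5, random_state=0).fit(X, y)


def distinct_splits(forest, decimals=None):
    """Count distinct (feature, threshold) pairs across all trees."""
    splits = set()
    for est in forest.estimators_:
        tree = est.tree_
        for feature, threshold in zip(tree.feature, tree.threshold):
            if feature < 0:  # a negative feature index marks a leaf node
                continue
            value = float(threshold)
            if decimals is not None:
                value = round(value, decimals)
            splits.add((int(feature), value))
    return len(splits)


raw = distinct_splits(rf)
rounded = distinct_splits(rf, decimals=2)
coarse = distinct_splits(rf, decimals=0)
print(raw, rounded, coarse)
```

A large drop between the raw and rounded counts suggests that rounding is merging splits that the model treats as different, which is one way thresholds can end up oversimplified.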
Future Support#
We’re actively working on support for:
XGBoost
LightGBM
CatBoost
Submit feature requests on GitHub Issues.