# Supported Models

DPGExplainer supports various tree-based ensemble models from scikit-learn. This page documents all supported models, their features, and usage examples.

## Overview

DPGExplainer works with scikit-learn tree-based ensembles that expose an `estimators_` attribute after fitting. The framework automatically detects the model type and handles tree structure differences transparently.

## Classification Models

### RandomForestClassifier
Status: ✅ Fully supported

The classic ensemble method. Works as expected with DPGExplainer.

```python
from sklearn.ensemble import RandomForestClassifier
from dpg import DPGExplainer

rf = RandomForestClassifier(n_estimators=10, max_depth=5)
rf.fit(X, y)

explainer = DPGExplainer(rf, feature_names, target_names)
explanation = explainer.explain_global(X)
```

### GradientBoostingClassifier
Status: ✅ Fully supported (NEW!)

Gradient Boosting is now fully supported with automatic tree structure normalization. No special configuration needed.

```python
from sklearn.ensemble import GradientBoostingClassifier
from dpg import DPGExplainer

gb = GradientBoostingClassifier(n_estimators=100, max_depth=5, learning_rate=0.1)
gb.fit(X, y)

# Works exactly like RandomForest - automatic normalization happens internally
explainer = DPGExplainer(gb, feature_names, target_names)
explanation = explainer.explain_global(X)
```

**Technical Note**: GradientBoosting stores its trees in a 2D NumPy array, while RandomForest uses a 1D list. DPGExplainer normalizes this difference automatically, so no extra configuration is needed.

### ExtraTreesClassifier
Status: ✅ Fully supported

Extremely randomized trees work seamlessly with DPGExplainer.

```python
from sklearn.ensemble import ExtraTreesClassifier
from dpg import DPGExplainer

et = ExtraTreesClassifier(n_estimators=10, max_depth=5)
et.fit(X, y)

explainer = DPGExplainer(et, feature_names, target_names)
explanation = explainer.explain_global(X)
```

### AdaBoostClassifier
Status: ✅ Fully supported

Adaptive Boosting is fully supported. The example below uses scikit-learn's default base estimator, a depth-1 decision tree.

```python
from sklearn.ensemble import AdaBoostClassifier
from dpg import DPGExplainer

ada = AdaBoostClassifier(n_estimators=10)
ada.fit(X, y)

explainer = DPGExplainer(ada, feature_names, target_names)
explanation = explainer.explain_global(X)
```

### BaggingClassifier
Status: ✅ Fully supported

Bootstrap aggregating works with DPGExplainer. The example below uses scikit-learn's default base estimator, a decision tree.

```python
from sklearn.ensemble import BaggingClassifier
from dpg import DPGExplainer

bag = BaggingClassifier(n_estimators=10)
bag.fit(X, y)

explainer = DPGExplainer(bag, feature_names, target_names)
explanation = explainer.explain_global(X)
```

## Regression Models

### RandomForestRegressor
Status: ✅ Fully supported

```python
from sklearn.ensemble import RandomForestRegressor
from dpg import DPGExplainer

rf = RandomForestRegressor(n_estimators=10, max_depth=5)
rf.fit(X, y)

explainer = DPGExplainer(rf, feature_names, target_names=["prediction"])
explanation = explainer.explain_global(X)
```

### GradientBoostingRegressor
Status: ✅ Fully supported (NEW!)

Gradient Boosting regression is now fully supported with the same automatic normalization as the classifier version.

```python
from sklearn.ensemble import GradientBoostingRegressor
from dpg import DPGExplainer

gb = GradientBoostingRegressor(n_estimators=100, max_depth=5, learning_rate=0.1)
gb.fit(X, y)

explainer = DPGExplainer(gb, feature_names, target_names=["prediction"])
explanation = explainer.explain_global(X)
```

### ExtraTreesRegressor
Status: ✅ Fully supported

```python
from sklearn.ensemble import ExtraTreesRegressor
from dpg import DPGExplainer

et = ExtraTreesRegressor(n_estimators=10, max_depth=5)
et.fit(X, y)

explainer = DPGExplainer(et, feature_names, target_names=["prediction"])
explanation = explainer.explain_global(X)
```

### AdaBoostRegressor
Status: ✅ Fully supported

```python
from sklearn.ensemble import AdaBoostRegressor
from dpg import DPGExplainer

ada = AdaBoostRegressor(n_estimators=10)
ada.fit(X, y)

explainer = DPGExplainer(ada, feature_names, target_names=["prediction"])
explanation = explainer.explain_global(X)
```

## Unsupported Models

The following models are **NOT** supported:

| Model | Reason |
|-------|--------|
| DecisionTreeClassifier / DecisionTreeRegressor | Single tree, not an ensemble |
| LogisticRegression | Linear model, not tree-based |
| SVC / SVR | Support vector machines, not tree-based |
| KNeighborsClassifier / KNeighborsRegressor | Instance-based, not tree-based |
| Neural Networks (MLPClassifier, etc.) | Non-tree-based |
| Linear/Ridge/Lasso Regression | Linear models, not tree-based |

If you try to use an unsupported model, you'll get a clear error message:
```
DPGError: Model must be a tree-based ensemble
```

## Model Comparison

| Model | Type | Status | Tree construction | Key parameters |
|-------|------|--------|-------------------|----------------|
| RandomForestClassifier | Bagging | ✅ | Independent | n_estimators, max_depth |
| GradientBoostingClassifier | Boosting | ✅ NEW | Sequential | n_estimators, learning_rate, max_depth |
| ExtraTreesClassifier | Bagging | ✅ | Independent | n_estimators, max_depth |
| AdaBoostClassifier | Boosting | ✅ | Sequential | n_estimators, learning_rate |
| BaggingClassifier | Bagging | ✅ | Independent | n_estimators |

## GradientBoosting Implementation Details

### What Changed

Previously, using `GradientBoostingClassifier` or `GradientBoostingRegressor` would fail with:

```
AttributeError: 'numpy.ndarray' object has no attribute 'tree_'
```

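This happened because GradientBoosting stores its trees in a 2D array of shape `(n_estimators, n_trees_per_iteration)`, with one column for regression and binary classification and one column per class for multiclass problems, while other models use a 1D list. DPGExplainer expects a consistent 1D structure.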
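
The shape difference can be seen with scikit-learn alone. The sketch below does not use DPGExplainer; it only inspects `estimators_` on two fitted models and shows why code that expects individual trees runs into the error above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

X, y = load_iris(return_X_y=True)

rf = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y)
gb = GradientBoostingClassifier(n_estimators=5, random_state=0).fit(X, y)

# RandomForest: a plain Python list of fitted trees.
print(type(rf.estimators_).__name__, len(rf.estimators_))

# GradientBoosting: a 2D object array with one row per boosting stage.
print(type(gb.estimators_).__name__, gb.estimators_.shape)

# Indexing the 2D array once yields a row (an array), not a tree.
error_message = None
first_row = gb.estimators_[0]
try:
    first_row.tree_
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```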

### How It Works Now

DPGExplainer includes an automatic normalizer (`SklearnEnsembleNormalizer`) that:

1. **Detects** GradientBoosting models automatically during initialization
2. **Flattens** the 2D estimators array to a 1D list
3. **Processes** the model normally (transparent to the user)
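
The flattening step can be sketched with NumPy. This is an illustration under assumptions, not the code of `SklearnEnsembleNormalizer`: the helper name `flatten_estimators` is hypothetical, and the real normalizer may detect and handle models differently.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier


def flatten_estimators(model):
    """Return the fitted base estimators of `model` as a flat 1D list.

    Hypothetical helper for illustration; not part of the dpg package.
    """
    estimators = model.estimators_
    if isinstance(estimators, np.ndarray):
        # ravel() flattens the 2D object array in row-major order.
        return list(estimators.ravel())
    # Models such as RandomForest already store a 1D list.
    return list(estimators)


X, y = load_iris(return_X_y=True)
gb = GradientBoostingClassifier(n_estimators=5, random_state=0).fit(X, y)
rf = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y)

gb_trees = flatten_estimators(gb)
rf_trees = flatten_estimators(rf)

# Every element now exposes the low-level tree structure via `tree_`.
print(len(gb_trees), all(hasattr(tree, "tree_") for tree in gb_trees))
print(len(rf_trees), all(hasattr(tree, "tree_") for tree in rf_trees))
```

Because `ravel()` walks the array row by row, trees from the same boosting stage stay adjacent in the flattened list.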

### Performance Impact

- **Normalization overhead**: < 1ms (one-time, during initialization)
- **Extraction overhead**: None (same iteration logic as before)
- **Memory impact**: Negligible (the flattened list has the same number of elements as the 2D array)

## Testing

All supported models are thoroughly tested:

```bash
# Run model-specific tests
pytest tests/test_sklearn_models.py -v

# Test results
# - GradientBoostingClassifier: binary, multiclass ✓
# - GradientBoostingRegressor: regression ✓
# - Backward compatibility: all other models ✓
# - Total: 203 tests passing
```

## Examples

### Example 1: Compare Multiple Models

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from dpg import DPGExplainer

iris = load_iris()
X, y = iris.data, iris.target

models = {
    'RandomForest': RandomForestClassifier(n_estimators=10),
    'GradientBoosting': GradientBoostingClassifier(n_estimators=10),
}

for name, model in models.items():
    model.fit(X, y)
    explainer = DPGExplainer(model, iris.feature_names, iris.target_names)
    explanation = explainer.explain_global(X)
    
    print(f"{name}: {len(explanation.nodes)} nodes, {len(explanation.graph.edges())} edges")
```

### Example 2: Hyperparameter Exploration with GradientBoosting

```python
from sklearn.ensemble import GradientBoostingClassifier
from dpg import DPGExplainer

gb = GradientBoostingClassifier(
    n_estimators=100,      # Number of boosting stages
    learning_rate=0.1,     # Shrinks each tree's contribution (smaller is more conservative)
    max_depth=3,           # Maximum depth of each tree
    subsample=0.8,         # Fraction of samples used to fit each tree
    random_state=42
)

gb.fit(X, y)
explainer = DPGExplainer(gb, feature_names, target_names)
explanation = explainer.explain_global(X)

# Inspect node metrics
print(explanation.node_metrics.head(10))
```

### Example 3: Local Explanations with GradientBoosting

```python
from sklearn.ensemble import GradientBoostingClassifier
from dpg import DPGExplainer

gb = GradientBoostingClassifier(n_estimators=10)
gb.fit(X, y)

explainer = DPGExplainer(gb, feature_names, target_names)
explainer.fit(X)  # Fit the DPG

# Explain a single sample
sample = X[0]
local = explainer.explain_local(sample, sample_id=0)

print(f"Prediction: {local.majority_vote}")
print(f"Class votes: {local.class_votes}")
print(f"Confidence: {local.sample_confidence}")
```

## Tips and Best Practices

1. **Model Size**: DPGExplainer works best with 5-100 trees. Very large ensembles may produce complex graphs.

2. **Tree Depth**: Shallow trees (max_depth=3-5) tend to produce more interpretable DPGs.

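3. **GradientBoosting Learning Rate**: A higher learning rate increases each tree's contribution, so fewer trees are usually needed to reach the same fit, while a lower rate generally needs more trees. Experiment with values like 0.01, 0.1, 0.5, and adjust `n_estimators` alongside.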

4. **Data Size**: DPG extraction scales linearly with the number of samples and the number of trees. For very large datasets, consider sampling.

5. **Configuration**: Use `perc_var` and `decimal_threshold` to control the DPG complexity:
   ```python
   explainer = DPGExplainer(
       model=gb,
       feature_names=feature_names,
       target_names=target_names,
       dpg_config={
           "dpg": {
               "default": {
                   "perc_var": 1e-9,          # Filter rare paths
                   "decimal_threshold": 2,    # Round thresholds to 2 decimals
                   "n_jobs": -1,              # Use all CPU cores
               }
           }
       }
   )
   ```
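
The learning-rate trade-off from tip 3 can be explored before building a DPG. The sketch below uses only scikit-learn and is one possible way to do it, not a procedure prescribed by DPGExplainer: it fits the same model with three learning rates and uses `staged_predict` to see how test accuracy develops as trees are added. The split, the stage count, and the variable names are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

n_stages = 50
accuracy_by_rate = {}
for learning_rate in (0.01, 0.1, 0.5):
    gb = GradientBoostingClassifier(
        n_estimators=n_stages,
        learning_rate=learning_rate,
        max_depth=3,
        random_state=0,
    ).fit(X_train, y_train)
    # staged_predict yields the ensemble's predictions after each boosting stage.
    accuracy_by_rate[learning_rate] = [
        accuracy_score(y_test, y_pred) for y_pred in gb.staged_predict(X_test)
    ]

for learning_rate, accuracies in accuracy_by_rate.items():
    best = max(accuracies)
    first_stage = accuracies.index(best) + 1  # report stages as 1-based
    print(f"learning_rate={learning_rate}: best accuracy {best:.3f}, first reached at stage {first_stage}")
```

If a higher learning rate reaches its best accuracy in fewer stages, a smaller `n_estimators` may be enough, which in turn keeps the resulting graph smaller.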

## Troubleshooting

### Model Not Recognized

```
DPGError: Model must be a tree-based ensemble
```

**Solution**: Check that your model is fitted and has an `estimators_` attribute; scikit-learn sets this attribute only during `fit()`. Use `print(type(model.estimators_))` to verify.
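
A small diagnostic can make this check less error-prone, since accessing `estimators_` on an unfitted or non-ensemble model raises an `AttributeError`. The helper `describe_estimators` below is hypothetical and uses only scikit-learn; it does not reproduce DPGExplainer's own validation.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression


def describe_estimators(model):
    """Summarize `model.estimators_` for debugging (hypothetical helper)."""
    if not hasattr(model, "estimators_"):
        return "no estimators_ attribute (unfitted, or not an ensemble)"
    container = type(model.estimators_).__name__
    return f"estimators_ is a {container} with {len(model.estimators_)} entries"


X, y = load_iris(return_X_y=True)
unfitted = RandomForestClassifier(n_estimators=5)
fitted = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y)
linear = LogisticRegression(max_iter=1000).fit(X, y)

for label, model in [
    ("unfitted forest", unfitted),
    ("fitted forest", fitted),
    ("logistic regression", linear),
]:
    print(f"{label}: {describe_estimators(model)}")
```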

### Out of Memory with Large Models

**Solution**: 
- Reduce `n_estimators`
- Explain a subset of your data, e.g. `explainer.explain_global(X[:1000])` (shuffle first if the rows are ordered)
- Increase `perc_var` to filter more paths
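
Slicing with `X[:1000]` takes the first rows, which can be unrepresentative when the data is ordered, for example by class. A minimal sketch of random subsampling with NumPy follows; the helper name `subsample_rows` and its `seed` parameter are hypothetical and not part of the dpg package.

```python
import numpy as np
from sklearn.datasets import load_iris


def subsample_rows(X, n_rows, seed=0):
    """Return up to `n_rows` randomly chosen rows of X (hypothetical helper)."""
    X = np.asarray(X)
    if n_rows >= X.shape[0]:
        return X
    rng = np.random.default_rng(seed)
    indices = rng.choice(X.shape[0], size=n_rows, replace=False)
    # Sort the indices so the sample keeps the original row order.
    return X[np.sort(indices)]


X, _ = load_iris(return_X_y=True)
X_small = subsample_rows(X, n_rows=50)
print(X_small.shape)  # (50, 4)
```

The result can be passed wherever the examples on this page pass `X`, for instance `explainer.explain_global(X_small)`.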

### Unexpected Graph Structure

**Solution**: 
- Check `perc_var`: if it is too high, many paths are filtered out
- Verify `decimal_threshold` doesn't oversimplify thresholds
- Try different `graph_construction` modes: `"execution_trace"` vs `"aggregated_transitions"`

## Future Support

We're actively working on support for:
- XGBoost
- LightGBM
- CatBoost

Submit feature requests on [GitHub Issues](https://github.com/Meta-Group/DPG/issues).