# Quickstart ## Installation ```bash pip install dpg ``` DPG requires Python 3.10+. If you want graph rendering, install the system [Graphviz](https://graphviz.org/download/) package as well so the `dot` executable is available on your `PATH`. For local development installs and longer setup notes, see [docs/README.md](README.md). ## Minimal example The simplest way to use DPG is through `DPGExplainer`: ```python from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier from sklearn.datasets import load_iris from dpg import DPGExplainer # 1. Train any tree-based ensemble (RandomForest, GradientBoosting, AdaBoost, etc.) X, y = load_iris(return_X_y=True, as_frame=True) # model = RandomForestClassifier(n_estimators=5, random_state=42).fit(X, y) model = GradientBoostingClassifier(n_estimators=5, random_state=42).fit(X, y) # 2. Create the explainer (automatic model adaptation happens here) explainer = DPGExplainer( model, feature_names=X.columns.tolist(), target_names=["setosa", "versicolor", "virginica"], ) # 3. Fit (extract the graph from training paths) explainer.fit(X.values) explanation = explainer.explain_global() # 4. Inspect metrics print(explanation.node_metrics.head()) print(explanation.edge_metrics.head()) # 5. Visualise explainer.plot(explanation, save_dir="results/") ``` ## What `DPGExplainer` returns `explainer.explain_global()` returns a {class}`dpg.DPGExplanation` dataclass with: | Attribute | Type | Description | |---|---|---| | `graph` | `nx.DiGraph` | NetworkX directed graph | | `dot` | `graphviz.Digraph` | Graphviz rendering object | | `node_metrics` | `pd.DataFrame` | Per-node betweenness, LRC, degree, … | | `edge_metrics` | `pd.DataFrame` | Per-edge weight, source/target labels | | `class_boundaries` | `dict` | Per-class feature constraint ranges | | `communities` | `dict` | Optional community assignments | ## Configuration DPG can be configured via a YAML file or a dict: ```python explainer = DPGExplainer( model, feature_names=X.columns.tolist(), dpg_config={ "dpg": { "default": { "perc_var": 0.001, # minimum path frequency (0-1) "decimal_threshold": 2, # rounding for thresholds "n_jobs": -1, # -1 = all CPU cores } } }, ) ``` You can also configure how the graph is constructed: ```python explainer = DPGExplainer( model, feature_names=X.columns.tolist(), dpg_config={ "dpg": { "default": { "perc_var": 1e-9, "decimal_threshold": 6, "n_jobs": -1, }, "graph_construction": { "mode": "execution_trace", # or "aggregated_transitions" }, } }, ) ``` - `aggregated_transitions`: default global DPG behavior. - `execution_trace`: trace-first construction, useful for local path inspection. ## Supported Models DPGExplainer works with a wide range of scikit-learn tree-based ensemble models: **Classification:** - ✅ `RandomForestClassifier` - ✅ `GradientBoostingClassifier` (NEW!) - ✅ `ExtraTreesClassifier` - ✅ `AdaBoostClassifier` - ✅ `BaggingClassifier` **Regression:** - ✅ `RandomForestRegressor` - ✅ `GradientBoostingRegressor` (NEW!) - ✅ `ExtraTreesRegressor` - ✅ `AdaBoostRegressor` All models work automatically without any special configuration. DPGExplainer detects the model type and handles differences internally. For a complete list of models and detailed configuration options, see [Supported Models](supported_models.md). ## Local explanations After fitting the explainer, you can inspect one sample at a time: ```python local = explainer.explain_local(sample=X.iloc[0].values, sample_id=0) print(local.majority_vote) print(local.class_votes) print(local.sample_confidence) local_df = explainer.local_path_dataframe(local) print(local_df.head()) ``` Path labels remain in DPG format such as `Class 0`, while `local.class_votes` and `local.majority_vote` use normalized class names such as `0`. To render the local paths on top of the fitted DPG: ```python explainer.plot_local_on_dpg( "iris_local_sample0", local_explanation=local, true_class_label=str(y.iloc[0]), save_dir="results/", theme="dpg", palette="olive", show=False, ) ``` See [examples/local_explanation_iris.py](../examples/local_explanation_iris.py) for a minimal runnable script. ## Faithfulness evaluation DPG can evaluate local explanations against the fitted black-box model: ```python details = explainer.evaluate_faithfulness( X_test, y_true=y_test, return_details=True, ) print(details["faithfulness_score"]) print(details["output_fidelity"]) print(details["mean_trace_coverage_score"]) print(details["mean_recombination_rate"]) ``` This API reports: - `output_fidelity`: agreement between the local explanation and the model - structural metrics such as trace coverage and recombination - semantic metrics such as evidence margin - a composite `faithfulness_score` Notes: - the composite score is a heuristic summary, not a calibrated probability - `output_fidelity` measures agreement with the black-box model - `local_accuracy` is only available when `y_true` is supplied - structural faithfulness here is about recovering executed decision traces ## Visualisation options For a complete gallery of available graph and chart outputs, see [Visualization](visualization.md). The example outputs below use the themed DPG palette. ```python from dpg import plot_dpg, plot_lrc_vs_rf_importance, plot_top_lrc_predicate_splits # Basic DPG plot plot_dpg( "iris_dpg", explanation.dot, explanation.node_metrics, explanation.edge_metrics, save_dir="results/", attribute="Local reaching centrality", # color by LRC theme="dpg", palette="olive", layout_template="vertical", label_mode="wrapped", readability="presentation", fig_size=(14, 14), title="Iris Decision Predicate Graph by Local Reaching Centrality", ) # Compare DPG importance vs Random Forest importance plot_lrc_vs_rf_importance( explanation, model, X, dataset_name="Iris", theme="dpg", palette="olive", ) # Visualise top predicate split lines in feature space plot_top_lrc_predicate_splits( explanation, X, y, dataset_name="Iris", theme="dpg", palette="olive", ) ``` The theming API also supports `theme="legacy"` and `palette="default"` if you want a more neutral look. Example outputs: ```{figure} _static/quickstart/iris_dpg.png :alt: Example Decision Predicate Graph visualization for the Iris dataset :width: 100% Decision Predicate Graph colored by Local Reaching Centrality. ``` ```{figure} _static/quickstart/lrc_vs_rf_importance.png :alt: Comparison plot between DPG Local Reaching Centrality and Random Forest feature importance :width: 100% DPG-based feature importance compared with Random Forest feature importance. ``` ```{figure} _static/quickstart/top_lrc_predicate_splits.png :alt: Scatter plots highlighting the most important predicate split lines discovered by DPG :width: 100% Top predicate split lines in feature space ranked by Local Reaching Centrality. ``` ## scikit-learn compatible pipeline DPG also ships a scikit-learn `Transformer` wrapper: ```python from dpg.sklearn_dpg import DPGTransformer from sklearn.pipeline import Pipeline pipe = Pipeline([ ("dpg", DPGTransformer(model, feature_names=X.columns.tolist())), ]) ```