Skip to content

removals: engines from daal4py#3026

Open
Alexandr-Solovev wants to merge 11 commits intouxlfoundation:mainfrom
Alexandr-Solovev:dev/asolovev_removals_engines
Open

removals: engines from daal4py#3026
Alexandr-Solovev wants to merge 11 commits intouxlfoundation:mainfrom
Alexandr-Solovev:dev/asolovev_removals_engines

Conversation

@Alexandr-Solovev
Copy link
Copy Markdown
Contributor

Description


Checklist:

Completeness and readability

  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with updates and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended testing suite if new functionality was introduced in this PR.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least a summary table with measured data, if performance change is expected.
  • I have provided justification why performance and/or quality metrics have changed or why changes are not expected.
  • I have extended the benchmarking suite and provided a corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

@david-cortes-intel
Copy link
Copy Markdown
Contributor

Please add them to the list of exceptions to parse from the generator like in previous PRs.

@david-cortes-intel
Copy link
Copy Markdown
Contributor

You might need to keep the enums:

Error compiling Cython file:
------------------------------------------------------------
...
        std_string& fptype,
                                                                                         std_string& method,
                                                                                         size_t nFactors,
                                                                                         size_t fullNUsers,
                                                                                         size_t seed,
                                                                                         engines_EnginePtr* engine,
                                                                                         ^
------------------------------------------------------------
build/daal4py_cy.pyx:14647:89: 'engines_EnginePtr' is not a type identifier

Error compiling Cython file:
------------------------------------------------------------
...
                  fptype = "double",
                  method = "defaultDense",
                  size_t nFactors = -1,
                  size_t fullNUsers = -1,
                  size_t seed = -1,
                  engines_engine engine = None,
                  ^
------------------------------------------------------------
build/daal4py_cy.pyx:14671:18: 'engines_engine' is not a type identifier

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Flag Coverage Δ
azure 79.63% <ø> (ø)
github ?

Flags with carried forward coverage won't be shown. Click here to find out more.
see 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

JiwaniZakir

This comment was marked as spam.

JiwaniZakir

This comment was marked as spam.

@Alexandr-Solovev Alexandr-Solovev marked this pull request as ready for review April 7, 2026 13:26
Copilot AI review requested due to automatic review settings April 7, 2026 13:26
@Alexandr-Solovev
Copy link
Copy Markdown
Contributor Author

@david-cortes-intel can you please take a look one more time. Looks like I fixed the building issue, but not sure about testing issues

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the daal4py wrapper generator, scikit-learn integration code, examples, and docs to accommodate oneDAL >= 2026, where RNG “engines” are no longer part of the public API.

Changes:

  • Update wrapper generation to avoid wrapping engine namespaces/types and to drop engine / EnginePtr parameters for oneDAL >= 2026.
  • Update sklearn-facing algorithms (GBTs, RandomForest, KMeans init) and daal4py examples to only pass engine=... for oneDAL < 2026.
  • Remove RNG engine API documentation from doc/sources/daal4py.rst.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
generator/wrappers.py Skips wrapping specific engine namespaces for oneDAL >= 2026.
generator/gen_daal4py.py Drops engine params and removes engine-related interfaces; skips EnginePtr types for oneDAL >= 2026.
examples/daal4py/decision_forest_regression_hist.py Conditionally passes engine only for oneDAL < 2026.
examples/daal4py/decision_forest_regression_default_dense.py Conditionally passes engine only for oneDAL < 2026.
examples/daal4py/decision_forest_classification_hist.py Conditionally passes engine only for oneDAL < 2026.
examples/daal4py/decision_forest_classification_default_dense.py Conditionally passes engine only for oneDAL < 2026.
doc/sources/daal4py.rst Removes RNG engine API section from docs.
daal4py/sklearn/ensemble/GBTDAAL.py Conditionally passes engine only for oneDAL < 2026 in GBT training.
daal4py/sklearn/ensemble/_forest.py Conditionally passes engine only for oneDAL < 2026 in RF training.
daal4py/sklearn/cluster/k_means.py Conditionally passes engine only for oneDAL < 2026 in kmeans init paths.

Comment on lines +95 to +99
if not _engines_removed:
_seed = random_state.randint(np.iinfo("i").max)
daal_engine = daal4py.engines_mt19937(
fptype=X_fptype, method="defaultDense", seed=_seed
)
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When oneDAL >= 2026, _engines_removed is true and this branch skips creating/setting a seeded RNG engine. That means random_state no longer controls the k-means++ (and similarly the random) initialization, so results may become non-reproducible even with a fixed random_state, which can violate scikit-learn estimator expectations. Consider implementing a random_state-driven initialization in Python (e.g., sklearn-style k-means++/random init) or falling back to scikit-learn for initialization when engines are unavailable; at minimum emit a warning that random_state cannot be honored for oneDAL >= 2026.

Copilot uses AI. Check for mistakes.
}
if not daal_check_version((2026, "P", 0)):
daal_engine = daal4py.engines_mt19937(seed=seed_, fptype=getFPType(X))
parameters["engine"] = daal_engine
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For oneDAL >= 2026 this code stops passing a seeded engine into decision forest training, so random_state/seed_ no longer influences the underlying RNG. That can make RandomForestClassifier(random_state=...) non-deterministic and may cause scikit-learn compatibility checks to fail. If oneDAL provides an alternative seed parameter in 2026+, use it here; otherwise consider falling back to the scikit-learn implementation or warning that determinism is not guaranteed.

Suggested change
parameters["engine"] = daal_engine
parameters["engine"] = daal_engine
elif seed_ is not None:
warnings.warn(
"oneDAL 2026+ does not accept the seeded engine parameter used by "
"this code path, so `random_state` may not fully determine the "
"underlying RNG and results may be non-deterministic.",
RuntimeWarning,
)

Copilot uses AI. Check for mistakes.
Comment on lines +1224 to +1226
if not daal_check_version((2026, "P", 0)):
daal_engine = daal4py.engines_mt19937(seed=seed_, fptype=getFPType(X))
parameters["engine"] = daal_engine
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For oneDAL >= 2026 this code stops passing a seeded engine into decision forest training, so random_state/seed_ no longer influences the underlying RNG. That can make RandomForestRegressor(random_state=...) non-deterministic and may cause scikit-learn compatibility checks to fail. If oneDAL provides an alternative seed parameter in 2026+, use it here; otherwise consider falling back to the scikit-learn implementation or warning that determinism is not guaranteed.

Copilot uses AI. Check for mistakes.
Comment on lines +196 to +198
if not daal_check_version((2026, "P", 0)):
train_params["engine"] = d4p.engines_mcg59(seed=seed_)
train_algo = d4p.gbt_classification_training(**train_params)
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For oneDAL >= 2026, this skips setting a seeded engine, so random_state/seed_ no longer controls training randomness. That can make GBTDAALClassifier(random_state=...) non-deterministic and may break scikit-learn estimator checks. If oneDAL 2026+ has a replacement seed parameter, pass seed_ via that; otherwise consider falling back to scikit-learn (or warning) when determinism is required.

Copilot uses AI. Check for mistakes.
Comment on lines +304 to +306
if not daal_check_version((2026, "P", 0)):
train_params["engine"] = d4p.engines_mcg59(seed=seed_)
train_algo = d4p.gbt_regression_training(**train_params)
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For oneDAL >= 2026, this skips setting a seeded engine, so random_state/seed_ no longer controls training randomness. That can make GBTDAALRegressor(random_state=...) non-deterministic and may break scikit-learn estimator checks. If oneDAL 2026+ has a replacement seed parameter, pass seed_ via that; otherwise consider falling back to scikit-learn (or warning) when determinism is required.

Copilot uses AI. Check for mistakes.
@david-cortes-intel
Copy link
Copy Markdown
Contributor

@Alexandr-Solovev The CI error:

n_estimators = 8000

    @pytest.mark.parametrize("n_estimators", N_ESTIMATORS_IRIS)
    def test_classifier_big_estimators_iris(n_estimators):
>       _compare_with_sklearn_classifier_iris(
            n_estimators=n_estimators, description=f"Classifier n_estimators={n_estimators}: "
        )

../venv/lib/python3.13/site-packages/daal4py/sklearn/ensemble/tests/test_decision_forest.py:156: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../venv/lib/python3.13/site-packages/daal4py/sklearn/ensemble/tests/test_decision_forest.py:58: in _compare_with_sklearn_classifier_iris
    daal4py_model.fit(x_train, y_train, sample_weight=sample_weight)
../venv/lib/python3.13/site-packages/daal4py/sklearn/_n_jobs_support.py:113: in n_jobs_wrapper
    return method(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
../venv/lib/python3.13/site-packages/daal4py/sklearn/ensemble/_forest.py:468: in fit
    self._daal_fit_classifier(X, y, sample_weight=sample_weight)
../venv/lib/python3.13/site-packages/daal4py/sklearn/ensemble/_forest.py:777: in _daal_fit_classifier
    dfc_trainingResult = dfc_algorithm.compute(X, y, sample_weight)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
build/daal4py_cy.pyx:8898: in daal4py._daal4py.decision_forest_classification_training.compute
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   RuntimeError: Engine not supported

build/daal4py_cy.pyx:8877: RuntimeError

It is calling this function:
https://uxlfoundation.github.io/scikit-learn-intelex/2025.11/daal4py.html#daal4py.decision_forest_classification_training.compute

The object is being created with default argument for engine (i.e. engine=None), and that is erroring out.

I believe the arguments it takes and their defaults are generated from the oneDAL headers.

Were the relevant changes for this already merged to oneDAL?

@Alexandr-Solovev
Copy link
Copy Markdown
Contributor Author

@Alexandr-Solovev The CI error:

n_estimators = 8000

    @pytest.mark.parametrize("n_estimators", N_ESTIMATORS_IRIS)
    def test_classifier_big_estimators_iris(n_estimators):
>       _compare_with_sklearn_classifier_iris(
            n_estimators=n_estimators, description=f"Classifier n_estimators={n_estimators}: "
        )

../venv/lib/python3.13/site-packages/daal4py/sklearn/ensemble/tests/test_decision_forest.py:156: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../venv/lib/python3.13/site-packages/daal4py/sklearn/ensemble/tests/test_decision_forest.py:58: in _compare_with_sklearn_classifier_iris
    daal4py_model.fit(x_train, y_train, sample_weight=sample_weight)
../venv/lib/python3.13/site-packages/daal4py/sklearn/_n_jobs_support.py:113: in n_jobs_wrapper
    return method(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
../venv/lib/python3.13/site-packages/daal4py/sklearn/ensemble/_forest.py:468: in fit
    self._daal_fit_classifier(X, y, sample_weight=sample_weight)
../venv/lib/python3.13/site-packages/daal4py/sklearn/ensemble/_forest.py:777: in _daal_fit_classifier
    dfc_trainingResult = dfc_algorithm.compute(X, y, sample_weight)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
build/daal4py_cy.pyx:8898: in daal4py._daal4py.decision_forest_classification_training.compute
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   RuntimeError: Engine not supported

build/daal4py_cy.pyx:8877: RuntimeError

It is calling this function: https://uxlfoundation.github.io/scikit-learn-intelex/2025.11/daal4py.html#daal4py.decision_forest_classification_training.compute

The object is being created with default argument for engine (i.e. engine=None), and that is erroring out.

I believe the arguments it takes and their defaults are generated from the oneDAL headers.

Were the relevant changes for this already merged to oneDAL?

Not yet, the changes are here uxlfoundation/oneDAL#3586

@david-cortes-intel
Copy link
Copy Markdown
Contributor

david-cortes-intel commented Apr 7, 2026

@Alexandr-Solovev I get the same error when running it locally if I try the branch from your fork of oneDAL.

@david-cortes-intel
Copy link
Copy Markdown
Contributor

Note that the init signature doesn't explicitly include engine:

Init signature: daal4py.decision_forest_classification_training(self, /, *args, **kwargs)

And the parameters do not pass it either:

{'bootstrap': True, 'featuresPerNode': 2, 'fptype': 'double', 'impurityThreshold': 0.0, 'maxBins': 256, 'maxLeafNodes': 0, 'maxTreeDepth': 0, 'memorySavingMode': False, 'method': 'hist', 'minBinSize': 1, 'minImpurityDecreaseInSplitNode': 0.0, 'minWeightFractionInLeafNode': 0.0, 'nClasses': 3, 'nTrees': 8000, 'observationsPerTreeFraction': 1.0, 'resultsToCompute': '', 'varImportance': 'MDI', 'minObservationsInSplitNode': 2, 'minObservationsInLeafNode': 1, 'binningStrategy': 'quantiles'}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants