
[enhancement] Implement MaxAbsScaler Estimator **AI implemented** #3020

Open
icfaust wants to merge 24 commits into uxlfoundation:main from icfaust:maxabs_test

Conversation

@icfaust
Contributor

@icfaust icfaust commented Mar 12, 2026

Description

Note: here is my implementation plan from Antigravity (this PR took at most an hour):

Goal Description

The objective is to create a new sklearnex estimator, MaxAbsScaler, that duplicates the API functionality of sklearn.preprocessing.MaxAbsScaler. This estimator will be accelerated using Intel oneDAL's IncrementalBasicStatistics from the onedal backend directly. Following the design patterns of DummyRegressor, we will implement an API-compatible layer in sklearnex that uses the onedal backend natively. The new estimator will be integrated into the preview submodule.

Proposed Changes

sklearnex Layer (sklearnex/preview/preprocessing/)

We will implement the frontend estimator that conforms to scikit-learn's API and utilizes oneDAL's IncrementalBasicStatistics. We will place this under _data.py to match scikit-learn's internal file structures.

[NEW] sklearnex/preview/preprocessing/__init__.py

Exports MaxAbsScaler.

[NEW] sklearnex/preview/preprocessing/_data.py

  • Inherits from oneDALEstimator and sklearn.preprocessing.MaxAbsScaler.
  • fit / partial_fit: Overridden methods with the @control_n_jobs decorator. The transform method will NOT be overridden; we will rely purely on the standard sklearn implementation of transform.
  • _onedal_cpu_supported / _onedal_gpu_supported: Define the condition chains for falling back to sklearn (e.g., for sparse input: sklearn's MaxAbsScaler supports csr_matrix natively, but IncrementalBasicStatistics may only support dense arrays; the chains also verify that inputs use the supported numerical types, float32 and float64).
  • _onedal_fit / _onedal_partial_fit: Internal routines executing the dispatch logic. Here we will use onedal.basic_statistics.IncrementalBasicStatistics(result_options=["min", "max"]) to compute the min_ and max_ for the batch/data.
  • _onedal_finalize_fit: We will compute max_abs_ = np.maximum(np.abs(min_), np.abs(max_)) and scale_ accordingly using numpy/xp functionality.
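The finalize step described above amounts to a few lines of numpy. Here is a minimal sketch under the plan's assumptions: min_ and max_ are the per-feature results from IncrementalBasicStatistics(result_options=["min", "max"]), and the zero handling mirrors sklearn's behavior of mapping a zero scale to 1.0 so transform never divides by zero (the function name is illustrative, not the actual method):

```python
import numpy as np

def finalize_max_abs(min_, max_):
    """Sketch of the _onedal_finalize_fit math described above.

    min_ / max_ are per-feature minima and maxima, as produced by
    IncrementalBasicStatistics(result_options=["min", "max"]).
    """
    max_abs_ = np.maximum(np.abs(min_), np.abs(max_))
    # Features that are identically zero get scale 1.0 so that
    # transform does not divide by zero (matching sklearn's behavior).
    scale_ = np.where(max_abs_ == 0, 1.0, max_abs_)
    return max_abs_, scale_
```

For example, finalize_max_abs(np.array([-3., 0., 2.]), np.array([1., 0., 5.])) yields max_abs_ of [3, 0, 5] and scale_ of [3, 1, 5].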

SPMD Interface (sklearnex/spmd/preprocessing/)

We will also provide a distributed implementation via SPMD functionality relying on onedal.spmd.basic_statistics.

[NEW] sklearnex/spmd/preprocessing/__init__.py

Exports MaxAbsScaler for the SPMD interface.

[NEW] sklearnex/spmd/preprocessing/_data.py

  • Inherits from sklearnex.preview.preprocessing.MaxAbsScaler (our base preview class).
  • The _onedal_incremental_basic_statistics static method will be overridden to point to onedal.spmd.basic_statistics.IncrementalBasicStatistics.
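The override pattern above can be sketched with stand-in classes; everything here is illustrative (the real classes live in onedal.basic_statistics, onedal.spmd.basic_statistics, and the sklearnex preview module):

```python
# Stand-ins for the batch and SPMD backend classes.
class IncrementalBasicStatistics:
    backend = "batch"

class SPMDIncrementalBasicStatistics(IncrementalBasicStatistics):
    backend = "spmd"

class MaxAbsScaler:
    # The preview estimator resolves its backend through this hook...
    _onedal_incremental_basic_statistics = staticmethod(IncrementalBasicStatistics)

class SPMDMaxAbsScaler(MaxAbsScaler):
    # ...so the SPMD subclass only has to repoint it; all fit logic
    # is inherited unchanged from the preview base class.
    _onedal_incremental_basic_statistics = staticmethod(SPMDIncrementalBasicStatistics)
```

Keeping the backend behind a single static-method hook is what makes the SPMD subclass a two-line change.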

Dispatcher Updates (sklearnex/)

[MODIFY] sklearnex/dispatcher.py

  • Add sklearn.preprocessing.MaxAbsScaler to the preview_mapping so patch_sklearn() correctly diverts execution to sklearnex.preview.preprocessing.MaxAbsScaler when preview mode is on. Ensure the proper preprocessing_module (import sklearn.preprocessing as preprocessing_module) is passed in the patch map tuple.
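Conceptually, the patch map entry pairs the target module and attribute with the replacement class. The sketch below is a deliberately simplified, hypothetical shape (the real structure in sklearnex/dispatcher.py is more involved, and SimpleNamespace stands in for sklearn.preprocessing):

```python
from types import SimpleNamespace

# Stand-ins for sklearn.preprocessing and the preview estimator.
preprocessing_module = SimpleNamespace(MaxAbsScaler=type("SkMaxAbsScaler", (), {}))

class PreviewMaxAbsScaler:
    """Stand-in for sklearnex.preview.preprocessing.MaxAbsScaler."""

preview_mapping = {
    # name -> (module to patch, attribute name, replacement class)
    "maxabsscaler": (preprocessing_module, "MaxAbsScaler", PreviewMaxAbsScaler),
}

def patch_preview(mapping):
    # patch_sklearn() with preview mode on conceptually does this
    # for each entry: rebind the sklearn attribute to the preview class.
    for module, attr, replacement in mapping.values():
        setattr(module, attr, replacement)
```

After patch_preview(preview_mapping), importing MaxAbsScaler from the patched module resolves to the preview class.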

Verification Plan

Automated Tests

  • We will add sklearnex/preview/preprocessing/tests/test_data.py containing a comprehensive test suite.

  • Tests will include:

    1. Dense data validation (fit, partial_fit, transform, inverse_transform) comparing max_abs_, scale_, and transformed output against standard sklearn.preprocessing.MaxAbsScaler.
    2. Fallback checking to ensure Sparse arrays trigger a fallback to sklearn successfully.
    3. Batch processing tests to ensure partial_fit handles continuous batches accurately.
    4. Follow the precedent set out in the repository for other estimators.
    5. Include tests for Array API dispatch execution and device-specific behavior (e.g. GPU, via SYCL queues and DPCTL/DPNP dataframes) to align with standard sklearnex estimator validation.
  • We will also add SPMD testing in sklearnex/spmd/preprocessing/tests/test_data_spmd.py:

    1. Add tests marked with @pytest.mark.mpi checking that sklearnex.spmd implementation is entirely equivalent to local batch execution via the non-SPMD sklearnex preview module.
    2. Utilize helper functions like _get_local_tensor and _convert_to_dataframe to simulate SPMD environments where data is split among ranks.
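The batch-equivalence invariant that the partial_fit tests rely on can be sketched without any backend: running minima and maxima accumulated over batches must match the full-data result (plain numpy, no sklearnex or oneDAL involved):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))

# Full-data reference, as a single fit() would compute it.
full_min, full_max = X.min(axis=0), X.max(axis=0)

# Incremental pass, as successive partial_fit() calls would compute it.
run_min = np.full(X.shape[1], np.inf)
run_max = np.full(X.shape[1], -np.inf)
for batch in np.array_split(X, 7):
    run_min = np.minimum(run_min, batch.min(axis=0))
    run_max = np.maximum(run_max, batch.max(axis=0))

assert np.allclose(run_min, full_min) and np.allclose(run_max, full_max)
```

The same property should hold however the rows are partitioned, which is also what makes the min/max statistics safe to distribute across SPMD ranks.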

Linting

  • The repo has black, isort, clang-format, numpydoc-validation, and codespell format/lint hooks in its .pre-commit-config.yaml.
  • We must make sure to run those checks on the new files once the code is implemented.

Checklist:

Completeness and readability

  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with updates and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended testing suite if new functionality was introduced in this PR.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least a summary table with measured data, if performance change is expected.
  • I have provided justification why performance and/or quality metrics have changed or why changes are not expected.
  • I have extended the benchmarking suite and provided a corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

icfaust (Contributor Author) commented on this snippet:

    # limitations under the License.
    # ==============================================================================

    import numpy as np

unused import here.

icfaust (Contributor Author) commented on this snippet:

    (X,) = data
    patching_status.and_conditions(
        [
            (not is_sparse(X), "Sparse input is not supported"),

I assume benchmarking will be necessary here to find where the standard sklearn implementation is faster at computing the min and max, and then to add a condition so that our implementation is used only where it actually accelerates.

@codecov

codecov bot commented Mar 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Flag Coverage Δ
azure 77.06% <100.00%> (-2.46%) ⬇️
github ?

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
...l/basic_statistics/incremental_basic_statistics.py 100.00% <100.00%> (ø)
sklearnex/dispatcher.py 86.04% <100.00%> (-5.08%) ⬇️

... and 32 files with indirect coverage changes


@icfaust icfaust marked this pull request as ready for review March 15, 2026 23:26
@icfaust icfaust requested a review from Vika-F as a code owner March 15, 2026 23:26
@yuejiaointel
Contributor

/intelci: run

@yuejiaointel
Contributor

/intelci: run

@icfaust icfaust changed the title [enhancement] Implement MaxAbsScalar Estimator **AI implemented** [enhancement] Implement MaxAbsScaler Estimator **AI implemented** Mar 18, 2026
@icfaust
Contributor Author

icfaust commented Mar 19, 2026

/intelci: run
