[enhancement] Implement MaxAbsScaler Estimator **AI implemented** #3020
Open
icfaust wants to merge 24 commits into uxlfoundation:main from
Conversation
icfaust
commented
Mar 12, 2026
```python
# limitations under the License.
# ==============================================================================

import numpy as np
```
Contributor
Author
unused import here.
icfaust
commented
Mar 12, 2026
```python
(X,) = data
patching_status.and_conditions(
    [
        (not is_sparse(X), "Sparse input is not supported"),
```
Contributor
Author
I assume benchmarking will be necessary here to find where the standard sklearn implementation is faster at computing the min and max, and then to add a condition so that ours is only used where it actually accelerates.
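A hypothetical harness for that crossover search might look like the sketch below: time a candidate min/max implementation across input sizes so a size-based condition can be derived for the patching logic. Only the NumPy reference path (what stock sklearn effectively computes for `MaxAbsScaler`) is shown; the oneDAL path would be passed in as another `impl` callable where available. All names here are illustrative, not part of this PR.

```python
import timeit

import numpy as np


def numpy_minmax(X):
    # The per-feature reduction stock sklearn's MaxAbsScaler relies on.
    return np.max(np.abs(X), axis=0)


def time_impl(impl, n_rows, n_cols=16, repeat=5):
    # Best-of-N wall time for one fit-sized reduction on random dense data.
    X = np.random.default_rng(0).standard_normal((n_rows, n_cols))
    return min(timeit.repeat(lambda: impl(X), number=1, repeat=repeat))


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print(f"{n:>7} rows: {time_impl(numpy_minmax, n):.6f}s")
```

Comparing these timings against the accelerated backend at matching sizes would give the row-count threshold to encode in `and_conditions`.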
Codecov Report ✅ All modified and coverable lines are covered by tests.
Flags with carried forward coverage won't be shown.
... and 32 files with indirect coverage changes
Contributor

/intelci: run

Contributor

/intelci: run

Contributor
Author

/intelci: run
Description

Note: here is my implementation plan from Antigravity (this PR took an hour max):

Goal Description

The objective is to create a new sklearnex estimator, `MaxAbsScaler`, that duplicates the API functionality of `sklearn.preprocessing.MaxAbsScaler`. This estimator will be accelerated using Intel oneDAL's `IncrementalBasicStatistics` from the onedal backend directly. Following the design patterns of `DummyRegressor`, we will implement an API-compatible layer in sklearnex that uses the onedal backend natively. The new estimator will be integrated into the preview submodule.

Proposed Changes
sklearnex Layer (`sklearnex/preview/preprocessing/`)

We will implement the frontend estimator that conforms to scikit-learn's API and utilizes oneDAL's `IncrementalBasicStatistics`. We will place this under `_data.py` to match scikit-learn's internal file structure.

- [NEW] `sklearnex/preview/preprocessing/__init__.py`: exports `MaxAbsScaler`.
- [NEW] `sklearnex/preview/preprocessing/_data.py`: `MaxAbsScaler` inherits from `oneDALEstimator` and `sklearn.preprocessing.MaxAbsScaler`.
  - `fit`/`partial_fit`: overridden methods with the `@control_n_jobs` decorator. The `transform` method will NOT be overridden; we will rely purely on the standard sklearn implementation of `transform`.
  - `_onedal_cpu_supported`/`_onedal_gpu_supported`: define condition chains for falling back to sklearn (e.g., if input is sparse, since `MaxAbsScaler` supports `csr_matrix` natively in sklearn but `IncrementalBasicStatistics` might only support dense arrays; also checking for the supported numerical types `float32` and `float64`).
  - `_onedal_fit`/`_onedal_partial_fit`: internal routines executing the dispatch logic. Here we will use `onedal.basic_statistics.IncrementalBasicStatistics(result_options=["min", "max"])` to compute the `min_` and `max_` for the batch/data.
  - `_onedal_finalize_fit`: we will compute `max_abs_ = np.maximum(np.abs(min_), np.abs(max_))` and `scale_` accordingly using numpy/xp functionality.
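The finalize step can be sketched in plain NumPy (the values are made up; the zero-to-one scale mapping mirrors what sklearn's `_handle_zeros_in_scale` does, so constant-zero features don't divide by zero in `transform`):

```python
import numpy as np

# Per-feature min/max as IncrementalBasicStatistics would report them
# (illustrative values, not real oneDAL output).
min_ = np.array([-3.0, 0.5, 0.0])
max_ = np.array([2.0, 4.0, 0.0])

# The largest absolute value per feature is the max of |min| and |max|.
max_abs_ = np.maximum(np.abs(min_), np.abs(max_))

# sklearn maps a zero scale to 1.0 so transform() never divides by zero.
scale_ = np.where(max_abs_ == 0.0, 1.0, max_abs_)

print(max_abs_)  # [3. 4. 0.]
print(scale_)    # [3. 4. 1.]
```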
SPMD Interface (`sklearnex/spmd/preprocessing/`)

We will also provide a distributed implementation via SPMD functionality relying on `onedal.spmd.basic_statistics`.

- [NEW] `sklearnex/spmd/preprocessing/__init__.py`: exports `MaxAbsScaler` for the SPMD interface.
- [NEW] `sklearnex/spmd/preprocessing/_data.py`: `MaxAbsScaler` inherits from `sklearnex.preview.preprocessing.MaxAbsScaler` (our base preview class). The `_onedal_incremental_basic_statistics` static method will be overridden to point to `onedal.spmd.basic_statistics.IncrementalBasicStatistics`.
Dispatcher Updates (`sklearnex/`)

- [MODIFY] `sklearnex/dispatcher.py`: adds `sklearn.preprocessing.MaxAbsScaler` to the `preview_mappings` so `patch_sklearn()` correctly diverts execution to `sklearnex.preview.preprocessing.MaxAbsScaler` when preview mode is on. Ensure the proper `preprocessing_module` (`import sklearn.preprocessing as preprocessing_module`) is passed in the patch map tuple.
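From the user's side, reaching the preview estimator once this mapping lands might look like the following sketch, assuming the `SKLEARNEX_PREVIEW` environment variable continues to gate preview estimators; the import is guarded in case sklearnex is not installed:

```python
import os

# Preview estimators are gated behind this flag (assumption based on
# sklearnex's existing preview mechanism).
os.environ["SKLEARNEX_PREVIEW"] = "1"

try:
    from sklearnex import patch_sklearn

    # After patching, sklearn.preprocessing.MaxAbsScaler should resolve
    # to the preview implementation added by this PR.
    patch_sklearn()
except ImportError:
    print("sklearnex not installed; stock sklearn would be used")
```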
Verification Plan

Automated Tests

We will add `sklearnex/preview/preprocessing/tests/test_data.py` containing a comprehensive test suite. Tests will include:

- Full API checks (`fit`, `partial_fit`, `transform`, `inverse_transform`) comparing `max_abs_`, `scale_`, and transformed output against standard `sklearn.preprocessing.MaxAbsScaler`.
- Sparse arrays trigger a fallback to `sklearn` successfully.
- `partial_fit` handles continuous batches accurately.
- Standard `sklearnex` estimator validation.

We will also add SPMD testing in `sklearnex/spmd/preprocessing/tests/test_data_spmd.py`:

- Tests marked `@pytest.mark.mpi` checking that the `sklearnex.spmd` implementation is entirely equivalent to local batch execution via the non-SPMD `sklearnex` preview module.
- Use of `_get_local_tensor` and `_convert_to_dataframe` to simulate SPMD environments where data is split among ranks.
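The core equivalence property behind both the `partial_fit` and SPMD tests can be illustrated without MPI or sklearnex: aggregating per-batch (or per-rank) min/max must reproduce the full-pass `max_abs_`. A plain-NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4))

# Reference: max absolute value per feature over the whole dataset.
full = np.max(np.abs(X), axis=0)

# Incremental path: fold batch-local min/max into running accumulators,
# as IncrementalBasicStatistics would across partial_fit calls or ranks.
run_min = np.full(4, np.inf)
run_max = np.full(4, -np.inf)
for batch in np.array_split(X, 7):  # uneven batch sizes on purpose
    run_min = np.minimum(run_min, batch.min(axis=0))
    run_max = np.maximum(run_max, batch.max(axis=0))

incremental = np.maximum(np.abs(run_min), np.abs(run_max))
assert np.allclose(full, incremental)
print("batched min/max matches the full pass")
```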
Linting

Formatting checks via `.pre-commit-config.yaml`.

Checklist:
Completeness and readability
Testing
Performance