Advancing Feature Distribution Diversity in
Vertical Federated Learning Benchmarks (ICLR 2024)

A collection of datasets, tools, and examples for federated learning

🤔 The estimated scope of VFL datasets

vfl_dataset_scope

The scope of party imbalance (α) and correlation (β) in existing real VFL datasets, termed the real scope, is limited. VertiBench extends beyond the confines of existing uniform and real scopes, shedding light on VFL scenarios previously unexplored.

🔭 Visualization of our splitting method

cifar10_imp0.1

Figure 18: Visualization of CIFAR10 split by different α

cifar10_corr0.0

Figure 20: Visualization of MNIST split by different α

🤗 Use the tools

Installation

VertiBench requires Python 3.9 or later. To install it, run the following command:

pip install vertibench

Load dataset

from sklearn.datasets import make_classification

# Generate a large dataset
X, y = make_classification(n_samples=10000, n_features=10)

Split dataset by importance (α)

from vertibench.Splitter import ImportanceSplitter

imp_splitter = ImportanceSplitter(num_parties=4, weights=[1, 1, 1, 3])
Xs = imp_splitter.split(X)
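
To build intuition for what the `weights` argument controls, the sketch below mimics an importance-style split with plain NumPy. It draws per-party ratios from a Dirichlet distribution parameterized by the weights, then assigns each feature to a party with those probabilities. This is an illustration of the general idea under that assumption, not VertiBench's actual implementation; the function name `toy_importance_split`, the demo data, and the seeds are hypothetical.

```python
import numpy as np

def toy_importance_split(X, weights, seed=0):
    """Illustrative only: randomly assign each column of X to one party."""
    rng = np.random.default_rng(seed)
    # Draw the expected share of features per party from Dirichlet(weights).
    ratios = rng.dirichlet(weights)
    # Assign every feature (column) to a party according to those ratios.
    owners = rng.choice(len(weights), size=X.shape[1], p=ratios)
    # Collect each party's columns; a party may end up with zero features.
    return [X[:, owners == k] for k in range(len(weights))]

# Hypothetical demo data: 100 samples, 10 features.
X_demo = np.random.default_rng(1).normal(size=(100, 10))
parts = toy_importance_split(X_demo, weights=[1, 1, 1, 3])
print([p.shape for p in parts])
```

A larger weight makes it more likely, though not certain, that a party receives a larger share of the features. Every party keeps all rows, as expected in a vertical split.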

Split dataset by correlation (β)

from vertibench.Splitter import CorrelationSplitter

corr_splitter = CorrelationSplitter(num_parties=4)
Xs = corr_splitter.fit_split(X)
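
One rough way to sanity-check how strongly two parties' features are related is to average the absolute Pearson correlation between every cross-party feature pair. The sketch below does this with NumPy on synthetic data. It is a simplified proxy chosen for illustration and should not be read as the correlation metric VertiBench uses internally; the helper `mean_cross_party_corr` and the demo arrays are hypothetical.

```python
import numpy as np

def mean_cross_party_corr(Xa, Xb):
    """Mean absolute Pearson correlation between columns of Xa and columns of Xb."""
    na = Xa.shape[1]
    # With rowvar=False, np.corrcoef treats each column as a variable.
    corr = np.corrcoef(np.hstack([Xa, Xb]), rowvar=False)
    # The top-right block holds correlations between Xa's and Xb's features.
    return float(np.abs(corr[:na, na:]).mean())

rng = np.random.default_rng(0)
party_a = rng.normal(size=(500, 3))
# Party B holds noisy copies of party A's features (strongly related).
party_b_related = party_a + 0.1 * rng.normal(size=(500, 3))
# Party B holds unrelated features.
party_b_unrelated = rng.normal(size=(500, 3))

high = mean_cross_party_corr(party_a, party_b_related)
low = mean_cross_party_corr(party_a, party_b_unrelated)
print(f"related: {high:.2f}, unrelated: {low:.2f}")
```

The same helper could be applied to any two blocks of a split, assuming each block is a 2-D array with one row per sample.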

😎 See the datasets

🎉 Cite the paper

@inproceedings{wu2024vertibench,
  author    = {Zhaomin Wu and Junyi Hou and Bingsheng He},
  title     = {VertiBench: Advancing Feature Distribution Diversity in Vertical Federated Learning Benchmarks},
  booktitle = {The Twelfth International Conference on Learning Representations, {ICLR} 2024, Vienna, Austria, May 7-11, 2024},
  publisher = {OpenReview.net},
  year      = {2024},
  url       = {https://openreview.net/forum?id=glwwbaeKm2},
  timestamp = {Tue, 23 Jul 2024 16:00:00 +0200}
}