Advancing Feature Distribution Diversity in Vertical Federated Learning Benchmarks (ICLR 2024)
A collection of datasets, tools, and examples for federated learning
🤔 The estimated scope of VFL datasets
The range of party imbalance (α) and inter-party correlation (β) covered by existing real VFL datasets, termed the real scope, is limited. VertiBench extends beyond both the uniform scope and the real scope, covering VFL scenarios that existing benchmarks leave unexplored.
🔠 Visualization of our splitting method
Figure 18: Visualization of CIFAR10 split by different α
Figure 20: Visualization of MNIST split by different α
🤗 Use the tools
Installation
To install VertiBench, run the following command (requires Python >= 3.9):
pip install vertibench
Load dataset
from sklearn.datasets import make_classification

# Generate a large synthetic dataset
X, y = make_classification(n_samples=10000, n_features=10)
Split dataset by importance (α)
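from vertibench.Splitter import ImportanceSplitter

# Split the features among 4 parties; weights are per-party importance weights (the 4th party is weighted highest)
imp_splitter = ImportanceSplitter(num_parties=4, weights=[1, 1, 1, 3])
Xs = imp_splitter.split(X)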
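ImportanceSplitter controls α, the party imbalance. As a rough illustration of the idea only, here is a minimal NumPy sketch, assuming that each party's expected share of features is drawn from a Dirichlet distribution parameterized by the weights, and that each feature is then assigned to a party with those probabilities. The function `dirichlet_feature_split` and its one-feature-per-party guarantee are hypothetical; this is not VertiBench's implementation.

```python
import numpy as np


def dirichlet_feature_split(X, alphas, seed=None):
    """Hypothetical importance-style vertical split (illustration only)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    n_features, num_parties = X.shape[1], len(alphas)
    if n_features < num_parties:
        raise ValueError("need at least one feature per party")

    # Assumption: each party's expected share of features ~ Dirichlet(alphas).
    shares = rng.dirichlet(alphas)

    # Shuffle the feature indices and give every party one feature first,
    # so no party ends up empty (a choice made for this sketch).
    order = rng.permutation(n_features)
    owner = np.empty(n_features, dtype=int)
    owner[order[:num_parties]] = np.arange(num_parties)

    # Assign the remaining features to parties according to the sampled shares.
    owner[order[num_parties:]] = rng.choice(
        num_parties, size=n_features - num_parties, p=shares
    )

    # Vertical split: every party keeps all rows but only its own columns.
    return [X[:, owner == i] for i in range(num_parties)]


# Usage: 4 parties, the 4th with the largest weight (mirrors the call above)
X = np.random.default_rng(0).normal(size=(1000, 10))
Xs = dirichlet_feature_split(X, alphas=[1, 1, 1, 3], seed=0)
print([x.shape for x in Xs])
```

Under this assumption, smaller weights make the sampled shares more skewed, so one party tends to hold most of the features, while large, equal weights push the split toward uniform.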
Split dataset by correlation (β)
from vertibench.Splitter import CorrelationSplitter

# Split the features among 4 parties based on feature correlation
corr_splitter = CorrelationSplitter(num_parties=4)
Xs = corr_splitter.fit_split(X)
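CorrelationSplitter controls β, the correlation between features held by different parties. VertiBench defines its own correlation metric for this; the snippet below instead uses a simpler stand-in, the mean absolute Pearson correlation between features owned by different parties, just to show what low versus high inter-party correlation looks like. The function `mean_cross_party_correlation` and the toy data are hypothetical.

```python
import numpy as np


def mean_cross_party_correlation(Xs):
    """Hypothetical proxy: mean |Pearson r| between features of different parties."""
    sizes = [x.shape[1] for x in Xs]
    # Absolute correlation matrix over all features (columns) of all parties.
    corr = np.abs(np.corrcoef(np.hstack(Xs), rowvar=False))
    # Party label of every column in the stacked matrix.
    labels = np.repeat(np.arange(len(Xs)), sizes)
    # Keep only pairs of features that belong to different parties.
    cross = labels[:, None] != labels[None, :]
    return float(corr[cross].mean())


# Toy data: two independent signals, each with a noisy near-copy.
rng = np.random.default_rng(0)
base = rng.normal(size=(1000, 2))
noisy = base + rng.normal(scale=0.1, size=(1000, 2))
X = np.column_stack([base[:, 0], noisy[:, 0], base[:, 1], noisy[:, 1]])

# Correlated features kept together vs. spread across the two parties.
together = [X[:, [0, 1]], X[:, [2, 3]]]
apart = [X[:, [0, 2]], X[:, [1, 3]]]
print(mean_cross_party_correlation(together))
print(mean_cross_party_correlation(apart))
```

Keeping each correlated pair within one party gives a value near zero, while splitting the pairs across parties gives roughly 0.5, since half of the cross-party feature pairs are near-copies. The proxy assumes at least two parties.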
😎 See the datasets
🎉 Cite the paper
@inproceedings{wu2024vertibench,
author = {Zhaomin Wu and Junyi Hou and Bingsheng He},
title = {VertiBench: Advancing Feature Distribution Diversity in Vertical Federated Learning Benchmarks},
booktitle = {The Twelfth International Conference on Learning Representations, {ICLR} 2024, Vienna, Austria, May 7-11, 2024},
publisher = {OpenReview.net},
year = {2024},
url = {https://openreview.net/forum?id=glwwbaeKm2},
timestamp = {Tue, 23 Jul 2024 16:00:00 +0200}
}