Anchor Data Augmentation as a Generalized Variant of C-Mixup

Written by anchoring | Published 2024/11/14

TL;DR: ADA generalizes C-Mixup by mixing multiple samples based on their cluster membership, preserving nonlinear relationships in augmented regression data. The approach allows augmentations to stay within or extend beyond the convex hull of original samples, improving data diversity while maintaining model accuracy.

Authors:

(1) Nora Schneider, Computer Science Department, ETH Zurich, Zurich, Switzerland (nschneide@student.ethz.ch);

(2) Shirin Goshtasbpour, Computer Science Department, ETH Zurich, Zurich, Switzerland and Swiss Data Science Center, Zurich, Switzerland (shirin.goshtasbpour@inf.ethz.ch);

(3) Fernando Perez-Cruz, Computer Science Department, ETH Zurich, Zurich, Switzerland and Swiss Data Science Center, Zurich, Switzerland (fernando.perezcruz@sdsc.ethz.ch).

Table of Links

Abstract and 1 Introduction

2 Background

2.1 Data Augmentation

2.2 Anchor Regression

3 Anchor Data Augmentation

3.1 Comparison to C-Mixup and 3.2 Preserving nonlinear data structure

3.3 Algorithm

4 Experiments and 4.1 Linear synthetic data

4.2 Housing nonlinear regression

4.3 In-distribution Generalization

4.4 Out-of-distribution Robustness

5 Conclusion, Broader Impact, and References

A Additional information for Anchor Data Augmentation

B Experiments

3.1 Comparison to C-Mixup

3.2 Preserving nonlinear data structure

The AR modifications in Equations 5 and 6 do not preserve the nonlinear relation between the target and predictors.
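This excerpt does not reproduce Equations 5 and 6, so the following is only a minimal sketch under a stated assumption: that the AR modification adds a scaled projection onto the anchor's column space to both predictors and targets, i.e. values + (sqrt(gamma) - 1) * P_A values. The helper name `ar_modify`, the two-cluster one-hot anchor, and the toy relations y = x^2 and y = 2x are all hypothetical choices made for illustration. Under that assumption, a linear relation survives the modification while a quadratic one does not.

```python
import numpy as np


def ar_modify(values, anchor, gamma):
    """Assumed AR-style modification: values + (sqrt(gamma) - 1) * P_A @ values.

    P_A is the orthogonal projection onto the column space of `anchor`.
    This form is an assumption; the excerpt does not show Equations 5 and 6.
    """
    proj = anchor @ np.linalg.pinv(anchor)  # A A^+ projects onto col(A)
    return values + (np.sqrt(gamma) - 1.0) * proj @ values


rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=(40, 1))
y_nonlinear = x ** 2   # toy nonlinear relation
y_linear = 2.0 * x     # toy linear relation

# Hypothetical anchor: one-hot membership in two clusters split at x = 0.
labels = (x[:, 0] > 0).astype(int)
anchor = np.eye(2)[labels]

gamma = 4.0
x_aug = ar_modify(x, anchor, gamma)

# Largest deviation of the modified targets from the original relation
# evaluated at the modified predictors.
gap_nonlinear = np.max(np.abs(ar_modify(y_nonlinear, anchor, gamma) - x_aug ** 2))
gap_linear = np.max(np.abs(ar_modify(y_linear, anchor, gamma) - 2.0 * x_aug))

print(f"linear gap:    {gap_linear:.2e}")
print(f"nonlinear gap: {gap_nonlinear:.2e}")
```

With a one-hot anchor, the projection replaces each sample by its cluster mean, so the modification shifts x and y by scaled cluster means. That shift commutes with a linear map but not with squaring, which is why only the linear gap stays at numerical zero.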

This paper is available on arXiv under the CC0 1.0 DEED license.


Published by HackerNoon on 2024/11/14