Improving classification performance on imbalanced class distributions.
1. Tasks Overview
- Image Classification
- Masked Language Modeling
- Machine Translation
2. Image classification
2.1. Datasets
Run either of the commands below to download and prepare the datasets:
- Run data/imgcls/get-data.sh
  - This includes the data augmentations provided by the dataset creators.
- Run data/imgcls/get-data-uniq.sh
  - This excludes the augmentations provided by the dataset creators. Useful if you apply dynamic augmentations (we do, as part of the PyTorch data loader).
cd data/imgcls
./get-data.sh # output dirs: hirise and msl
./get-data-uniq.sh # output dirs: hirise-uniq and msl-uniq
These scripts download the following datasets:
- Mars Orbital Image (HiRISE): zenodo.org/record/4002935
- MSL Curiosity Rover: zenodo.org/record/4033453
2.2. Model
See the ../imblearn/imgcls/ directory.
3. Masked Language Modeling (MLM)
MLM is based on 🤗 Transformers.
3.1. Datasets
Automatically downloaded.
3.2. Model
See the imblearn/mlm/ directory.
4. (Neural) Machine Translation
4.1. Datasets
TODO:
- Hindi-English
4.2. Models
5. Configuration file: conf.yml
Here is the basic schema of conf.yml:
model:
  name: <name>
  args:
    key1: <value>
    key2: <value>
optimizer:
  name: <name>
  args:
    key1: <value>
schedule:
  name: <name>
  args:
    key1: <value>
loss:
  name: <name>
  args:
    key1: <value>
train:
  data: <path/data/train>
  batch_size: 2 # number of images
  max_step: 300 # maximum number of steps
  max_epoch: 100 # maximum number of epochs
  checkpoint: 100 # validate and save a checkpoint every this many steps
  keep_in_mem: true # keep datasets in memory
validation:
  data: <path/data/val>
  batch_size: 10
  patience: 10
  by: macro_f1
tests:
  #test: <path/data/test1> # don't use tests until the end
  val: <path/data/val>
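Since conf.yml is plain YAML, it can be loaded and sanity-checked with PyYAML before launching an experiment. This is only a convenience sketch (the training entry point presumably parses the file itself):

import yaml  # pip install pyyaml

# Load and inspect a config; a quick way to verify the schema above.
with open("conf.yml") as fh:
    conf = yaml.safe_load(fh)

print(conf["model"]["name"], conf["train"]["batch_size"])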
5.1. Models
5.1.1. Image Classifier
model:
  name: image_classifier
  args:
    n_classes: 19
    intermediate: 40
    dropout: 0.2
    parent: resnext50_32x4d # torchvision.models.<this>
    pretrained: true # initialize pretrained parent model
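As a rough illustration of how these args might fit together (the actual model class lives in ../imblearn/imgcls/ and may be arranged differently), here is a hedged sketch: the parent torchvision model is loaded, optionally with pretrained weights, and its classification head is replaced by an intermediate layer with dropout followed by an n_classes output layer. The function name and head layout below are assumptions, not the repository's code.

import torch.nn as nn
import torchvision

def build_image_classifier(n_classes=19, intermediate=40, dropout=0.2,
                           parent="resnext50_32x4d", pretrained=True):
    # older torchvision uses pretrained=; newer releases use weights= instead
    backbone = getattr(torchvision.models, parent)(pretrained=pretrained)
    in_features = backbone.fc.in_features
    # replace the parent's final layer with an intermediate layer + classifier head
    backbone.fc = nn.Sequential(
        nn.Linear(in_features, intermediate),
        nn.ReLU(),
        nn.Dropout(dropout),
        nn.Linear(intermediate, n_classes),
    )
    return backbone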
5.1.2. Masked Language Model
Work in progress.
5.2. Optimizers
The following optimizers are supported:
- adam = torch.optim.Adam
- sgd = torch.optim.SGD
- adagrad = torch.optim.Adagrad
- adam_w = torch.optim.AdamW
- adadelta = torch.optim.Adadelta
- sparse_adam = torch.optim.SparseAdam
Example for adam:

optimizer:
  name: adam
  args:
    lr: 0.0005
    betas: [0.9, 0.999]
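The name selects the optimizer class from the table above, and args are forwarded to its constructor; for the adam example this is roughly equivalent to the following (the Linear model is just a stand-in):

import torch

model = torch.nn.Linear(10, 2)  # stand-in for the configured model
# name: adam -> torch.optim.Adam; args become keyword arguments
optimizer = torch.optim.Adam(model.parameters(), lr=0.0005, betas=(0.9, 0.999))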
5.3. Schedule
Schedule is an optional component. Comment out or delete the schedule: block in conf.yml to disable it.
The following learning schedules are supported:
5.3.1. inverse_sqrt: Inverse Square Root
schedule:
  name: inverse_sqrt
  args:
    peak_lr: 0.0005
    warmup: 100
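A common inverse-square-root formulation, shown only as a sketch (the repository's exact formula may differ): warm up linearly to peak_lr over warmup steps, then decay the learning rate proportionally to 1/sqrt(step).

import math

def inverse_sqrt_lr(step: int, peak_lr: float = 0.0005, warmup: int = 100) -> float:
    # linear warmup to peak_lr, then 1/sqrt(step) decay; the two branches meet at step == warmup
    step = max(step, 1)
    return peak_lr * min(step / warmup, math.sqrt(warmup / step))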
5.3.2. noam: Noam
Based on arxiv.org/abs/1706.03762
schedule:
  name: noam
  args:
    scaler: 2
    model_dim: 100
    warmup: 2000
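The Noam schedule from the cited paper is lr = scaler * model_dim^-0.5 * min(step^-0.5, step * warmup^-1.5); a sketch of that formula with the args above (the repository's implementation may differ in details such as the scaler):

def noam_lr(step: int, scaler: float = 2, model_dim: int = 100, warmup: int = 2000) -> float:
    # lr grows roughly linearly during warmup, then decays with 1/sqrt(step)
    step = max(step, 1)
    return scaler * model_dim ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)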
5.4. Loss
5.4.1. Cross Entropy
loss:
  name: cross_entropy
  args:
    weight_by: inverse_frequency
5.4.2. Weighted Cross Entropy
loss.args.weight can be set to a list. The list should have one weight per class and match the order in <experiment-dir/classes.csv>.
Heuristics based on frequencies in the training corpus can be used to obtain the weights. Let \$c\$ be a class, \$f_c\$ its frequency in the training corpus, and \$w_c\$ the weight to be inferred using a heuristic.
The heuristic is set via loss.args.weight_by in conf.yml. The valid values for loss.args.weight_by are:
- inverse_frequency: \$w_c \propto 1/f_c\$
- inverse_log: \$w_c \propto 1/\log(f_c)\$
- inverse_sqrt: \$w_c \propto 1/\sqrt{f_c}\$
- information_content: let \$\pi_c = \frac{f_c}{\sum_i f_i}\$ be the probability of class \$c\$ in the training corpus (i.e. its prior); then \$w_c = -\log_2(\pi_c)\$
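A hedged sketch of these heuristics operating on raw class frequencies; since the weights are only defined up to proportionality, the repository may scale or normalize them differently.

import math

def class_weights(freqs, weight_by="inverse_frequency"):
    """freqs: list of raw class frequencies f_c, ordered as in classes.csv."""
    total = sum(freqs)
    if weight_by == "inverse_frequency":
        return [1.0 / f for f in freqs]
    if weight_by == "inverse_log":
        return [1.0 / math.log(f) for f in freqs]  # assumes f_c > 1
    if weight_by == "inverse_sqrt":
        return [1.0 / math.sqrt(f) for f in freqs]
    if weight_by == "information_content":
        return [-math.log2(f / total) for f in freqs]  # w_c = -log2(pi_c)
    raise ValueError(f"unknown heuristic: {weight_by}")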
Based on arxiv.org/abs/1901.05555
Instead of using raw frequencies from the training corpus, we can also use effective frequencies (i.e. the effective number of samples).
Example:
loss:
  name: cross_entropy
  args:
    weight_by: inverse_frequency (1)
    # to use effective number of samples
    eff_frequency: true (2)
    eff_beta: 0.99 (3)
1 | Any other supported heuristic can also be used. |
2 | Set to true to enable effective frequencies. |
3 | \$\beta \in [0, 1)\$ is required when eff_frequency=true. |
The effective number of samples is a kind of smoothing function over frequencies. If \$\beta=0\$, all classes attain the same effective frequency of 1 (which results in unweighted cross entropy); and as \$\beta \rightarrow 1\$, the effective frequencies approach the raw frequencies (thus, no smoothing is in effect).
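The effective number of samples in the cited paper is \$E_n = (1 - \beta^n) / (1 - \beta)\$ for a class with raw frequency \$n\$; a small sketch (the repository's eff_frequency handling may wrap this differently):

def effective_frequency(n: int, beta: float = 0.99) -> float:
    # beta = 0 gives 1 for every class (unweighted); beta -> 1 approaches the raw frequency n
    return (1.0 - beta ** n) / (1.0 - beta)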
5.4.3. Focal Loss
Based on arxiv.org/abs/1708.02002
Implements loss = \$-\sum_c y_c (1-p_c)^\gamma \log(p_c)\$ where \$y_c\$ is the ground-truth indicator for class \$c\$, \$p_c\$ is the model's output probability, and \$\gamma\$ is a hyperparameter.
loss:
  name: focal_loss
  args:
    gamma: 2
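A minimal sketch of the focal loss computation for integer class targets (the repository's focal_loss may differ in reduction, weighting, or numerical details):

import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, target: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
    """logits: (batch, n_classes) raw scores; target: (batch,) class indices."""
    log_probs = F.log_softmax(logits, dim=-1)
    log_p_true = log_probs.gather(1, target.unsqueeze(1)).squeeze(1)  # log(p_c) for the true class
    p_true = log_p_true.exp()
    return (-(1.0 - p_true) ** gamma * log_p_true).mean()  # -(1 - p_c)^gamma * log(p_c)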
5.4.4. Label Smoothing
Extends cross_entropy
Based on arxiv.org/abs/1512.00567
loss:
  name: smooth_cross_entropy
  args: (1)
    #weight_by: inverse_frequency
    #eff_frequency: true
    #eff_beta: 0.99
    smooth_epsilon: 0.05
1 | Label smoothing works on top of cross_entropy, so all the args of cross_entropy, such as weight_by, are valid here. |
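One common label-smoothing formulation, shown only as a sketch (the repository's smooth_cross_entropy may distribute the mass differently): put \$1 - \epsilon\$ on the true class and spread \$\epsilon\$ uniformly over the remaining classes.

import torch

def smooth_targets(target: torch.Tensor, n_classes: int, epsilon: float = 0.05) -> torch.Tensor:
    """target: (batch,) class indices -> (batch, n_classes) smoothed target distribution."""
    smooth = torch.full((target.size(0), n_classes), epsilon / (n_classes - 1))
    smooth.scatter_(1, target.unsqueeze(1), 1.0 - epsilon)
    return smooth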
5.4.5. Balanced Label Smoothing
Extends cross_entropy
This is experimental.
loss:
  name: smooth_cross_entropy
  args:
    #weight_by: inverse_frequency
    smooth_epsilon: 0.05
    smooth_weight_by: inverse_frequency
5.5. Macro Cross Entropy
Extends smooth_cross_entropy.
- This does not accept weight_by; instead it does a macro average, which is an unweighted average across classes.
- Since this extends smooth_cross_entropy, the label smoothing params (smooth_epsilon) and also smooth_weight_by are supported (optional).
loss:
  name: macro_cross_entropy
  args:
    smooth_epsilon: 0.1
    smooth_weight_by: inverse_frequency
5.6. Trainer
This example is for the image classifier model and is subject to change.
train:
  data: <path/data/train> (1)
  batch_size: 20 # number of images
  max_step: 200_000 # maximum number of steps (2)
  min_step: 10_000 # minimum number of steps; ignore early stop until then
  max_epoch: 100 # maximum number of epochs (2)
  checkpoint: 1000 # validate and save a checkpoint every this many steps
  keep_in_mem: true # keep datasets in memory (3)
1 | The directory specified by path should be compatible with torchvision.datasets.ImageFolder |
2 | max_step or max_epoch whichever comes earlier |
3 | Default is true and uses CPU memory. To use GPU memory, set keep_in_mem: cuda. To disable, set it to false or null. |
5.7. Validation
validation:
  data: <path/data/val> (1)
  batch_size: 10 # this can be larger than train.batch_size
  patience: 10 # patience for early stop
  by: macro_f1 # metric to use for early stop
1 | The directory specified by path should be compatible with torchvision.datasets.ImageFolder |