Characterizing Structural Regularities of Labeled Data in Overparameterized Models

Ziheng Jiang*, Chiyuan Zhang*, Kunal Talwar, Michael C. Mozer (* equal contribution)

Abstract: Human learners appreciate that observations usually form hierarchies of regularities and sub-regularities. For example, English verbs have irregular cases that must be memorized (e.g., go ↦ went) and regular cases that generalize well (e.g., kiss ↦ kissed, miss ↦ missed). Likewise, deep neural networks have the capacity to memorize rare or irregular forms but nonetheless generalize across instances that share common patterns or structures. We analyze how individual instances are treated by a model via a consistency score. The score is the expected accuracy of a particular architecture on a held-out instance, given a training set of a given size sampled from the data distribution. We obtain empirical estimates of this score for individual instances in multiple datasets, and we show that the score identifies out-of-distribution and mislabeled examples at one end of the continuum and regular examples at the other end. We explore two categories of proxies for the consistency score: proxies based on pairwise distances and proxies based on training statistics. We conclude with two applications of C-scores, understanding the dynamics of representation learning and filtering out outliers, and we discuss other potential applications such as curriculum learning and active data collection.
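To make the estimation procedure concrete, here is a minimal sketch of a holdout estimator for a single training-set size: train many models on random subsets of the data, then score each example by its mean accuracy over the runs that held it out. This is an illustration only, not the released training code; the helper train_fn (which trains on a subset and returns a prediction function) and all parameter values are hypothetical stand-ins.

import numpy as np

def empirical_cscores(train_fn, X, Y, subset_ratio=0.7, n_runs=100, seed=0):
    # train_fn(X_sub, Y_sub) -> predict is a hypothetical user-supplied trainer
    rng = np.random.default_rng(seed)
    n = len(X)
    hits = np.zeros(n)    # correct held-out predictions per example
    counts = np.zeros(n)  # number of times each example was held out
    for _ in range(n_runs):
        in_subset = rng.random(n) < subset_ratio  # random training subset
        predict = train_fn(X[in_subset], Y[in_subset])
        held_out = ~in_subset
        hits[held_out] += predict(X[held_out]) == Y[held_out]
        counts[held_out] += 1
    return hits / np.maximum(counts, 1)  # per-example C-score estimate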

Pre-computed C-scores: We provide pre-computed C-scores for download below. The files are in NumPy's data format, exported via numpy.savez. For CIFAR-10 and CIFAR-100, the exported file contains two arrays, labels and scores. Both arrays are stored in the order of the training examples as defined by the original datasets. The data loading tools provided by some deep learning libraries might not follow the original example order, so we provide the labels array for an easy sanity check of the ordering. For ImageNet, please refer to the ImageNet section below.
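For example, the CIFAR-10 export can be inspected with a few lines of NumPy (the path below is a placeholder for wherever you saved the file):

import numpy as np

cscore_fn = '/path/to/cifar10-cscores-orig-order.npz'  # placeholder path
arrays = np.load(cscore_fn)
labels, scores = arrays['labels'], arrays['scores']
# one entry per CIFAR-10 training example, in the original dataset order
assert labels.shape == scores.shape == (50000,)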

For TFDS users: because TFDS saves the example id when preparing the dataset (at least for CIFAR), it is possible to remap the exported C-scores to the TFDS ordering with the following code snippet:

import numpy as np
import tensorflow_datasets as tfds

# load the full cifar10 dataset into memory to get the example ids
data_name = 'cifar10:3.0.2'
raw_data, info = tfds.load(name=data_name, batch_size=-1, with_info=True,
                           shuffle_files=False)
raw_data = tfds.as_numpy(raw_data)
trainset_np, testset_np = raw_data['train'], raw_data['test']

# load C-scores in the original data order
cscore_fn = '/path/to/cifar10-cscores-orig-order.npz'
cscore_arrays = np.load(cscore_fn)

# map a TFDS example id (e.g. b'train_123') to its original index
def _id_to_idx(str_id):
    return int(str_id.split(b'_')[1])

vec_id_to_idx = np.vectorize(_id_to_idx)
trainset_orig_idx = vec_id_to_idx(trainset_np['id'])

# sanity check with labels to make sure the data order is correct
assert np.all(trainset_np['label'] == cscore_arrays['labels'][trainset_orig_idx])

# now these are the C-scores in TFDS order
ordered_cscores = cscore_arrays['scores'][trainset_orig_idx]

MNIST

We show the top-ranking (top row) and bottom-ranking (bottom row) examples from MNIST, ranked by C-scores computed with multi-layer perceptrons. Use the dropdown menu to select the class to show.


CIFAR-10

We show the top-ranking (top row) and bottom-ranking (bottom row) examples from CIFAR-10, ranked by C-scores computed with Inception models. Use the dropdown menu to select the class to show. The pre-computed C-scores can be downloaded from here.
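To reproduce this kind of per-class ranking from the downloaded file, a short sketch suffices (the path and class index are placeholders):

import numpy as np

arrays = np.load('/path/to/cifar10-cscores-orig-order.npz')  # placeholder path
labels, scores = arrays['labels'], arrays['scores']

cls = 0  # placeholder class index (0-9 for CIFAR-10)
cls_idx = np.where(labels == cls)[0]
ranked = cls_idx[np.argsort(scores[cls_idx])]  # ascending C-score
bottom10, top10 = ranked[:10], ranked[-10:]    # least and most regular examples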


CIFAR-100

We show the top-ranking (top row) and bottom-ranking (bottom row) examples from CIFAR-100, ranked by C-scores computed with Inception models. Use the dropdown menu to select the class to show. The pre-computed C-scores can be downloaded from here.


ImageNet

We show examples from ImageNet, ranked by C-scores computed with ResNet50 models. For each class, the top two rows show the top-ranking examples, and the bottom two rows show the bottom-ranking examples. In the middle, a histogram of the C-scores of all the training examples in the class is shown, in both log scale and linear scale.

Because ImageNet contains 1000 classes, we select a subset to visualize. The first subset contains a few representative classes, as indicated in the figure here. yellow lady's slipper is a typical regular class: most of its instances are highly regular, and even the bottom-ranking examples show some color consistency. oscilloscope, green snake, Norwich terrier, and weasel, ordered by the average C-score in each class, represent most of the classes in the ImageNet dataset: they contain both highly regular top-ranking examples and highly irregular bottom-ranking examples. Finally, projectile is a typical irregular class, whose instances are extremely diverse. The second subset contains 100 randomly sampled classes.

The pre-computed C-scores can be downloaded from here. Since there is no well-defined example ordering for ImageNet, the scores are exported in an arbitrary order, but we include the filename of each example to identify the example-score mapping. More specifically, the exported file for ImageNet contains three arrays: labels, scores, and filenames. Again, we include labels for easy sanity checking.
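Given that export, a natural way to consume the scores is a filename-keyed lookup; here is a hedged sketch (the path is a placeholder, and allow_pickle is only needed if the filenames were saved as an object array):

import numpy as np

arrays = np.load('/path/to/imagenet-cscores.npz', allow_pickle=True)  # placeholder path
labels, scores, fnames = arrays['labels'], arrays['scores'], arrays['filenames']

# the export order is arbitrary, so key the scores by filename instead
score_by_file = dict(zip(fnames, scores))
label_by_file = dict(zip(fnames, labels))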

A few representative classes

100 random classes