Cyan's Blog

D2L-26: Environment and Distribution Shift

Last updated Feb 19, 2022

# Environment and Distribution Shift

2022-02-19

Tags: #DeepLearning #DistributionShift #CovariateShift

# Types of Distribution Shift

# Covariate Shift

# Label Shift

# Concept Shift

# Correcting Distribution Shift

# Empirical Risk

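For training data $\left\{\left(\mathbf{x}_{1}, y_{1}\right), \ldots,\left(\mathbf{x}_{n}, y_{n}\right)\right\}$, training a model amounts to minimizing the average loss over the samples:

$$\operatorname{minimize} \frac{1}{n} \sum_{i=1}^{n} l\left(f\left(\mathbf{x}_{i}\right), y_{i}\right)$$

In statistics, this average is called the empirical risk. It is computed on the training data only, and it approximates the risk (also called the true risk), which is the expectation of the loss $l(f(\mathbf{x}), y)$ over the true data distribution $p(\mathbf{x}, y)$:

$$E_{p(\mathbf{x}, y)}[l(f(\mathbf{x}), y)]=\iint l(f(\mathbf{x}), y) p(\mathbf{x}, y) d \mathbf{x} d y$$

In practice we do not know the true distribution $p(\mathbf{x}, y)$, so the risk cannot be computed directly. Instead we minimize the empirical risk, in the hope that doing so approximately minimizes the risk.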
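The relationship between the empirical risk and the risk can be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes a toy distribution ($x$ uniform on $[0, 1]$, $y = 2x$ plus Gaussian noise with standard deviation 0.1) and a squared loss, and the names `squared_loss`, `empirical_risk`, `sample`, and `f` are hypothetical helpers introduced only for illustration. For the model $f(x) = 2x$ under this assumed distribution, the risk equals the noise variance, 0.01.

```python
import random


def squared_loss(prediction, target):
    """Squared error l(f(x), y) for a single example."""
    return (prediction - target) ** 2


def empirical_risk(f, data, loss=squared_loss):
    """Average loss of model f over a finite sample of (x, y) pairs."""
    return sum(loss(f(x), y) for x, y in data) / len(data)


def sample(n, rng):
    """Draw n pairs from the assumed toy distribution:
    x ~ Uniform(0, 1), y = 2x + Gaussian noise with std 0.1."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = 2.0 * x + rng.gauss(0.0, 0.1)
        data.append((x, y))
    return data


def f(x):
    """Model under evaluation; it matches the noiseless part of the toy data."""
    return 2.0 * x


rng = random.Random(0)  # fixed seed so the run is reproducible
true_risk = 0.1 ** 2    # E[(y - 2x)^2] = noise variance under the toy distribution

risk_small = empirical_risk(f, sample(10, rng))
risk_large = empirical_risk(f, sample(100_000, rng))

print("true risk:              ", true_risk)
print("empirical risk, n=10:   ", risk_small)
print("empirical risk, n=100000:", risk_large)
```

With only 10 samples the empirical risk can differ noticeably from 0.01, while with 100,000 samples it should land very close to it. This is the sense in which the empirical risk stands in for the risk, and it only holds when the sample is drawn from the same distribution the model will face at test time.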

# Covariate Shift Correction

# How do we estimate $\beta_{i}$?

# Label Shift Correction

# How do we estimate $p(y_{i})$?

# Concept Shift Correction


  1. Say, for example, that we trained a model to predict who will repay vs. default on a loan, finding that an applicant’s choice of footwear was associated with the risk of default (Oxfords indicate repayment, sneakers indicate default). We might be inclined to thereafter grant loans to all applicants wearing Oxfords and to deny all applicants wearing sneakers. In this case, our ill-considered leap from pattern recognition to decision-making and our failure to critically consider the environment might have disastrous consequences. For starters, as soon as we began making decisions based on footwear, customers would catch on and change their behavior. Before long, all applicants would be wearing Oxfords, without any coinciding improvement in credit-worthiness. Similar issues abound in many applications of machine learning: by introducing our model-based decisions to the environment, we might break the model. 4.9. Environment and Distribution Shift — Dive into Deep Learning 0.17.5 documentation ↩︎

  2. Plague (鼠疫) - Wikipedia, the free encyclopedia (Chinese edition) ↩︎

  3. machine learning - Difference between distribution shift and data shift, concept drift and model drift - Cross Validated ↩︎

  4. 4.9. Environment and Distribution Shift — Dive into Deep Learning 0.17.5 documentation ↩︎

  5. Confusion Matrix ↩︎