Points to Double the Odds (PDO) is a widely used metric to track scorecard performance over time and to measure the deterioration of a scorecard’s ranking ability. It is calculated on validation data and compared to the fixed PDO obtained from benchmark data. An increase in PDO indicates deterioration in the model’s ranking performance, in which case the scorecard needs to be recalibrated or even redeveloped.
However, such use of PDO is dangerous, and an acceptable PDO value may be misleading. It is not unusual to observe a stable validation PDO alongside poor performance as indicated by other metrics such as AUC and KS. Let us go through the calculation of PDO to see why. Assume there is an existing score that ranks customers’ risk: the higher the score, the lower the risk. We first need to fit a logistic regression of the following form,
\[log\left ( \frac{p}{1-p} \right )=intercept + slope \times score\]
where \( p \) stands for the probability of being ‘good’. We compute PDO by solving the following two equations simultaneously,
\[log\left ( \frac{p}{1-p} \right )=intercept + slope \times score\]
\[log\left ( \frac{2p}{1-p} \right )=intercept + slope \times \left ( score+PDO \right )\]
which state that increasing the score by PDO points doubles the odds. Subtracting the first equation from the second gives \(log2 = slope \times PDO\). As a result, \(PDO = \frac{log2}{slope}\).
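The calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are simulated (the score range, the sample size and the true PDO of 20 are all made up), and `fit_logistic` is a hypothetical helper that fits the two-parameter logistic regression by plain Newton-Raphson so that the example needs nothing beyond the standard library. In practice any logistic regression routine that reports the slope and its standard error would do.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(scores, is_good, iterations=25):
    """Fit log(p / (1 - p)) = intercept + slope * score by Newton-Raphson.

    Returns (intercept, slope, standard error of the slope).
    """
    n = len(scores)
    mean = sum(scores) / n
    centred = [s - mean for s in scores]  # centring leaves the slope unchanged
    a = b = 0.0  # a is the intercept on the centred scale
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(centred, is_good):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g0 += y - p          # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w             # Fisher information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    # The inverse information matrix gives the asymptotic variance of the
    # slope (computed just before the final, negligible update).
    se_slope = math.sqrt(h00 / det)
    return a - b * mean, b, se_slope


# Simulated data (an assumption for illustration): a scorecard whose true PDO is 20.
random.seed(0)
true_slope = math.log(2) / 20
scores = [random.uniform(550, 750) for _ in range(5000)]
is_good = [1 if random.random() < sigmoid(1.5 + true_slope * (s - 650)) else 0
           for s in scores]

intercept_hat, slope_hat, se_slope = fit_logistic(scores, is_good)
pdo_hat = math.log(2) / slope_hat
print(f"slope = {slope_hat:.4f}, SE = {se_slope:.4f}, PDO = {pdo_hat:.1f}")
```

The fitted PDO should land close to the simulated value of 20, but not exactly on it, and the fit also returns a standard error for the slope.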
However, something is missing here. What if we put some hats on the equation?
\[\widehat{PDO}= \frac{log2}{\widehat{slope}}\]
What do those hats imply? They mean that PDO, like the slope, is a statistic calculated from a sample and therefore carries uncertainty. Fortunately, the uncertainty can be measured by an interval estimate. Since only an increase in PDO is of concern, a percentile estimate in the right tail, the 95th percentile for instance, denoted \(\widehat{PDO_{95}}\), can be used as another metric to measure the model’s deterioration. In addition, \(\widehat{PDO_{95}}= \frac{log2}{\widehat{slope_{5}}} \), because for a positive slope a larger PDO corresponds to a smaller slope, and \(\widehat{slope_{5}}\), the 5th percentile of the slope estimate, is easy to calculate using the normal approximation.
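As a minimal sketch of the normal approximation, the 5th percentile of the slope can be taken as the slope estimate minus about 1.645 standard errors, and the 95th percentile of PDO follows from it. The function name `pdo_upper_percentile` and the input values are hypothetical; the standard error is assumed to come from whatever routine fitted the logistic regression.

```python
import math
from statistics import NormalDist


def pdo_upper_percentile(slope_hat, se_slope, level=0.95):
    """Upper percentile of PDO from the lower percentile of the slope.

    Uses the normal approximation slope ~ N(slope_hat, se_slope**2) and
    assumes a positive slope (higher score means lower risk).
    """
    z = NormalDist().inv_cdf(level)          # about 1.645 for level = 0.95
    slope_lower = slope_hat - z * se_slope   # (1 - level) percentile of the slope
    if slope_lower <= 0:
        raise ValueError("lower slope percentile is not positive; "
                         "the PDO percentile is undefined")
    return math.log(2) / slope_lower


# Hypothetical inputs: a slope whose point estimate of PDO is 20.5.
slope_hat = math.log(2) / 20.5
se_slope = 0.0037
pdo_hat = math.log(2) / slope_hat
pdo_95 = pdo_upper_percentile(slope_hat, se_slope)
print(f"PDO = {pdo_hat:.1f}, 95th percentile of PDO = {pdo_95:.1f}")
```

With these made-up inputs the point estimate is 20.5 while the 95th percentile is roughly 25, which shows how a modest standard error on the slope widens the right tail of PDO.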
A typical case is that a modeler observes a validation PDO of 20.5 compared to 20 in the benchmark, which seems acceptable, but the 95th percentile of the validation PDO is 25, which indicates that the scorecard’s stability may be questionable. A one-sided hypothesis test can be formulated to check whether the PDO estimate is significantly less than a pre-specified threshold.
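One way such a test could be set up, offered as a sketch rather than a prescribed procedure, is to translate the PDO threshold into a slope threshold and apply a one-sided z-test to the slope. For a positive slope, PDO below the threshold is equivalent to the slope exceeding log 2 divided by the threshold. The function name, the thresholds of 22 and 30, and the standard error below are all hypothetical.

```python
import math
from statistics import NormalDist


def pdo_below_threshold_pvalue(slope_hat, se_slope, pdo_threshold):
    """One-sided p-value for H0: PDO >= threshold against H1: PDO < threshold.

    The test is carried out on the slope with a normal approximation:
    for a positive slope, PDO < threshold iff slope > log(2) / threshold.
    """
    slope_0 = math.log(2) / pdo_threshold
    z = (slope_hat - slope_0) / se_slope
    return 1.0 - NormalDist().cdf(z)


# Hypothetical validation result: PDO point estimate of 20.5.
slope_hat = math.log(2) / 20.5
se_slope = 0.0037

p_tight = pdo_below_threshold_pvalue(slope_hat, se_slope, pdo_threshold=22)
p_loose = pdo_below_threshold_pvalue(slope_hat, se_slope, pdo_threshold=30)
print(f"threshold 22: p = {p_tight:.3f}")
print(f"threshold 30: p = {p_loose:.4f}")
```

With these inputs the test does not reject at the 5% level for a threshold of 22, even though the point estimate of 20.5 is below it, but it does reject for a threshold of 30.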
By the way, people may ask: do we also need to consider the interval estimates of KS and AUC, since both of them are estimates with uncertainty? The answer is usually no. The interval estimates of the Kolmogorov-Smirnov statistic and the Mann–Whitney U statistic do not depend on the empirical score distributions in the good and bad groups, although their point estimates do. In other words, scorecard deterioration affects only the point estimates of KS and AUC (a change in the two samples’ bad rates does affect the two interval estimates, but the magnitude is small in real-world cases), and thus their estimated values are directly comparable without concern about uncertainty. However, scorecard deterioration may affect the mean estimate of PDO, the variance estimate, or both. Checking one but ignoring the other would miss critical information.
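For reference, the point estimates of KS and AUC mentioned here can be computed directly from the scores of the good and bad groups. The sketch below uses plain pair counting for AUC (the Mann–Whitney U statistic divided by the number of good-bad pairs) and the maximum gap between the two empirical distribution functions for KS. The function names and the tiny score lists are made up for illustration.

```python
def auc_mann_whitney(good, bad):
    """Share of (good, bad) pairs where the good score is higher; ties count half."""
    wins = 0.0
    for g in good:
        for b in bad:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    return wins / (len(good) * len(bad))


def ks_statistic(good, bad):
    """Largest gap between the empirical CDFs of the bad and good scores."""
    largest = 0.0
    for cutoff in sorted(set(good) | set(bad)):
        cdf_good = sum(s <= cutoff for s in good) / len(good)
        cdf_bad = sum(s <= cutoff for s in bad) / len(bad)
        largest = max(largest, abs(cdf_bad - cdf_good))
    return largest


# Made-up scores: higher score means lower risk.
good_scores = [620, 650, 700, 710, 740]
bad_scores = [580, 600, 640, 660]

auc = auc_mann_whitney(good_scores, bad_scores)
ks = ks_statistic(good_scores, bad_scores)
print(f"AUC = {auc:.2f}, KS = {ks:.2f}")  # AUC = 0.85, KS = 0.60
```

Both functions are quadratic in the sample size and are meant only to make the definitions concrete, not to be used on a full portfolio.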