WoE & IV

"Weight of evidence" (WoE) and information value (IV) are among the most commonly used metrics in the development of credit risk rating models.

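For a binary target, the definitions applied in the calculation on this page can be written as below. The symbols mirror the column names used in the code: ng_i and nb_i are the numbers of good and bad observations in group i. This is a compact restatement of what the code computes, not an additional method.

```latex
\begin{aligned}
\text{dist.g}_i &= \frac{ng_i}{\sum_j ng_j}, \qquad
\text{dist.b}_i = \frac{nb_i}{\sum_j nb_j},\\
\mathrm{WoE}_i &= \ln\!\left(\frac{\text{dist.g}_i}{\text{dist.b}_i}\right),\\
\mathrm{IV} &= \sum_i \left(\text{dist.g}_i - \text{dist.b}_i\right)\,\mathrm{WoE}_i .
\end{aligned}
```

With this sign convention, a positive WoE means the group holds a larger share of goods than of bads, and every term of the IV sum is non-negative.
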
Task: Import the attached woe_iv.csv file, then add a new variable maturity.g to the imported data frame db, defined by grouping the values of the variable maturity into 5 groups of approximately equal size (by number of observations). Then:

  1. calculate the WoE and IV of the new variable maturity.g with respect to the binary dependent variable bo;

  2. calculate the WoE and IV of the new variable maturity.g with respect to the continuous dependent variable co.
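
The grouping step can be illustrated with a minimal base R sketch. It assumes simulated maturities (the vector x is hypothetical, not the data from woe_iv.csv) and uses quantile-based breaks. The solution on this page uses cut2() from Hmisc instead, which chooses its own cut points, so the boundaries may differ.

```r
# Hypothetical maturities, simulated only for illustration
set.seed(1)
x <- sample(4:72, 1000, replace = TRUE)

# Break points at the 0%, 20%, ..., 100% quantiles;
# unique() guards against duplicated breaks caused by ties
brks <- unique(quantile(x, probs = seq(0, 1, by = 0.2)))

# Left-closed intervals; include.lowest = TRUE closes the last one on the right
x.g <- cut(x, breaks = brks, include.lowest = TRUE, right = FALSE)

# Group sizes are only approximately equal because of tied values
table(x.g)
```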

woe_iv.csv
> #run the following commands if the packages are not already installed
> #install.packages("Hmisc")
> #install.packages("dtplyr")
> #install.packages("dplyr")
> library(Hmisc)
> library(dtplyr)
> library(dplyr)
> 
> #import the woe_iv.csv file
> db <- read.csv("woe_iv.csv", header = TRUE)
> str(db)
'data.frame':   10000 obs. of  3 variables:
 $ bo      : int  0 0 0 0 0 0 0 0 0 0 ...
 $ co      : num  0.1361 0.0941 0.0847 0.0122 0.0122 ...
 $ maturity: int  18 9 12 12 12 10 8 6 18 24 ...
> #bo - good (0) / bad (1) indicator
> table(db$bo)

   0    1 
9500  500 
> #create loan maturity groups (5 groups of roughly equal size)
> db$maturity.g <- cut2(db$maturity, g = 5)
> #create a lazy data.table object (dtplyr)
> db <- lazy_dt(db)
> db
Source: local data table [10,000 x 4]
Call:   `_DT1`

     bo     co maturity maturity.g
  <int>  <dbl>    <int> <fct>     
1     0 0.136        18 [14,22)   
2     0 0.0941        9 [ 4,11)   
3     0 0.0847       12 [11,14)   
4     0 0.0122       12 [11,14)   
5     0 0.0122       12 [11,14)   
6     0 0.0122       10 [ 4,11)   
# ... with 9,994 more rows

# Use as.data.table()/as.data.frame()/as_tibble() to access results
> #count observations, goods and bads per maturity group, then compute WoE and IV
> bo.s <- db %>% 
+   group_by(maturity.g) %>%
+   summarise(no = n(),
+             ng = sum(bo %in% 0),
+             nb = sum(bo)) %>%
+   mutate(dr = nb / no) %>%
+   ungroup() %>%
+   mutate(so = sum(no),
+          sg = sum(ng),
+          sb = sum(nb),
+          dist.g = ng / sg,
+          dist.b = nb / sb,
+          woe = log(dist.g / dist.b),
+          iv.c = (dist.g - dist.b) * woe,
+          iv.s = sum(iv.c))
> as.data.frame(bo.s)
  maturity.g   no   ng  nb         dr    so   sg  sb    dist.g dist.b
1    [ 4,11) 2009 1961  48 0.02389248 10000 9500 500 0.2064211  0.096
2    [11,14) 2024 1942  82 0.04051383 10000 9500 500 0.2044211  0.164
3    [14,22) 2174 2058 116 0.05335787 10000 9500 500 0.2166316  0.232
4    [22,26) 1856 1772  84 0.04525862 10000 9500 500 0.1865263  0.168
5    [26,72] 1937 1767 170 0.08776458 10000 9500 500 0.1860000  0.340
          woe        iv.c      iv.s
1  0.76556984 0.084535027 0.1893244
2  0.22031542 0.008905381 0.1893244
3 -0.06853925 0.001053340 0.1893244
4  0.10460835 0.001938007 0.1893244
5 -0.60319894 0.092892637 0.1893244
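
As a cross-check, the WoE and IV figures above can be recomputed from the per-group counts alone. The sketch below uses only base R; woe_iv() is a hypothetical helper written for this illustration, not a function from any package, and the counts are copied from the ng and nb columns of the output.

```r
# woe_iv() is a hypothetical helper: WoE and IV from per-group good/bad totals
woe_iv <- function(ng, nb) {
  dist.g <- ng / sum(ng)            # share of all goods in each group
  dist.b <- nb / sum(nb)            # share of all bads in each group
  woe <- log(dist.g / dist.b)       # weight of evidence per group
  iv.c <- (dist.g - dist.b) * woe   # each group's contribution to IV
  list(woe = woe, iv.c = iv.c, iv = sum(iv.c))
}

# Counts copied from the ng and nb columns of the output above
ng <- c(1961, 1942, 2058, 1772, 1767)
nb <- c(48, 82, 116, 84, 170)

res <- woe_iv(ng, nb)
round(res$woe, 4)   # compare with the woe column
res$iv              # compare with the iv.s column
```

A group with zero goods or zero bads would yield an infinite WoE, so such groups are usually merged with a neighbour before this calculation. For the second part of the task, one possible convention (an assumption here, since the source output covers only the binary target) is to pass the per-group sum of co as nb and the group size minus that sum as ng, which is meaningful when co lies in [0, 1].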
