R in Banking
WoE transformations in regression models

Last updated 4 years ago

Task: Import the attached woe_iv.csv file, then add a new variable maturity.g to the imported data frame db, defined by grouping the values of the variable maturity into 3 groups using the boundaries 4, 11, 14 and 72. Then:

  1. estimate a logistic regression model (dependent variable bo, independent variable maturity.g) using: a) a WoE transformation of the independent variable (so-called WoE coding); b) a transformation of the independent variable into binary variables (so-called dummy coding);

  2. estimate a linear regression model (dependent variable co, independent variable maturity.g) using: a) a WoE transformation of the independent variable (so-called WoE coding); b) a transformation of the independent variable into binary variables (so-called dummy coding).

> #run the following commands if the packages are not already installed
> #install.packages("Hmisc")
> #install.packages("dtplyr")
> #install.packages("dplyr")
> library(Hmisc)
> library(dtplyr)
> library(dplyr)
> 
> #import the woe_iv.csv file
> db <- read.csv("woe_iv.csv", header = TRUE)
> str(db)
'data.frame':   10000 obs. of  3 variables:
 $ bo      : int  0 0 0 0 0 0 0 0 0 0 ...
 $ co      : num  0.1361 0.0941 0.0847 0.0122 0.0122 ...
 $ maturity: int  18 9 12 12 12 10 8 6 18 24 ...
> #bo - good (0) / bad (1) indicator
> table(db$bo)

   0    1 
9500  500 
> #create loan maturity groups
> db$maturity.g <- cut2(db$maturity, cuts = c(4, 11, 14))
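The task lists four boundaries (4, 11, 14 and 72), while cuts only receives 4, 11 and 14. This works because cut2() appears to take the outer boundaries from the observed range of maturity, as the [14,72] level in the output suggests. A rough base-R sketch of the same grouping follows. The toy vector x is made up for illustration, and base cut() prints its labels without the space padding that cut2() uses:

```r
# Toy maturities (hypothetical values, not taken from woe_iv.csv)
x <- c(4, 9, 12, 18, 24, 72)

# Left-closed intervals; include.lowest = TRUE closes the last one on the right
g <- cut(x, breaks = c(4, 11, 14, 72), right = FALSE, include.lowest = TRUE)

levels(g)
table(g)
```

Unlike cut2(), this version needs the upper boundary 72 spelled out, so values above it would become NA.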
> #create a lazy data.table object (dtplyr)
> dt <- lazy_dt(db)
> dt
Source: local data table [10,000 x 4]
Call:   `_DT1`

     bo     co maturity maturity.g
  <int>  <dbl>    <int> <fct>     
1     0 0.136        18 [14,72]   
2     0 0.0941        9 [ 4,11)   
3     0 0.0847       12 [11,14)   
4     0 0.0122       12 [11,14)   
5     0 0.0122       12 [11,14)   
6     0 0.0122       10 [ 4,11)   
# ... with 9,994 more rows

# Use as.data.table()/as.data.frame()/as_tibble() to access results
> #WoE calculation for the binary target bo
> #no = observations, ng = goods (bo = 0), nb = bads (bo = 1), dr = bad rate
> bo.s <- dt %>% 
+   group_by(maturity.g) %>%
+   summarise(no = n(),
+             ng = sum(bo %in% 0),
+             nb = sum(bo)) %>%
+   mutate(dr = nb / no) %>%
+   ungroup() %>%
+   mutate(dist.g = ng / sum(ng),
+          dist.b = nb / sum(nb),
+          woe = log(dist.g / dist.b))
> bo.s <- as.data.frame(bo.s)
> bo.s
  maturity.g   no   ng  nb         dr    dist.g dist.b        woe
1    [ 4,11) 2009 1961  48 0.02389248 0.2064211  0.096  0.7655698
2    [11,14) 2024 1942  82 0.04051383 0.2044211  0.164  0.2203154
3    [14,72] 5967 5597 370 0.06200771 0.5891579  0.740 -0.2279560
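In formula form, the woe column is, for each group g, the log ratio of the group's share of goods to its share of bads. The symbols below are shorthand introduced here for the ng and nb columns, not notation from the source:

```latex
\mathrm{WoE}_g = \ln\!\left(\frac{n^{G}_g \big/ \sum_{k} n^{G}_k}{n^{B}_g \big/ \sum_{k} n^{B}_k}\right)
```

For the first group this is ln(0.2064211 / 0.096), which matches the printed 0.7655698. A positive value marks a group that holds relatively more goods than bads.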
> #map each level of maturity.g to its WoE value
> woe.nv <- bo.s$woe
> names(woe.nv) <- bo.s$maturity.g
> db$maturity.woe.b <- woe.nv[db$maturity.g]
> #check
> table(db$maturity.woe.b, db$maturity.g)
                    
                     [ 4,11) [11,14) [14,72]
  -0.227955965913351       0       0    5967
  0.220315422420577        0    2024       0
  0.765569836122015     2009       0       0
> #logistic regression - WoE transformation
> lr.woe.b <- glm(bo ~ maturity.woe.b, family = binomial(link = logit), data = db)
> summary(lr.woe.b)

Call:
glm(formula = bo ~ maturity.woe.b, family = binomial(link = logit), 
    data = db)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.3578  -0.3578  -0.3578  -0.2876   2.7328  

Coefficients:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -2.94444    0.04668 -63.081  < 2e-16 ***
maturity.woe.b -1.00000    0.14450  -6.921  4.5e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 3970.3  on 9999  degrees of freedom
Residual deviance: 3913.9  on 9998  degrees of freedom
AIC: 3917.9

Number of Fisher Scoring iterations: 6
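The estimates above are not a coincidence: with a single WoE-coded predictor, the slope is -1 and the intercept is the log of the overall bad/good odds, here log(500/9500) = -2.94444. The reason is that the log odds of bad in group g equal the overall log odds minus the WoE of g, so this pair of coefficients reproduces every group's observed bad rate. A minimal sketch on made-up counts (the data frame toy and its group labels are hypothetical):

```r
# Made-up grouped data: 3 groups with known good/bad counts (illustrative only)
toy <- data.frame(
  g  = factor(rep(c("a", "b", "c"), times = c(200, 300, 500))),
  bo = c(rep(c(1, 0), times = c(10, 190)),
         rep(c(1, 0), times = c(30, 270)),
         rep(c(1, 0), times = c(100, 400)))
)

# Goods and bads per group, then WoE = log(share of goods / share of bads)
ng  <- sapply(split(toy$bo, toy$g), function(x) sum(x == 0))
nb  <- sapply(split(toy$bo, toy$g), function(x) sum(x == 1))
woe <- log((ng / sum(ng)) / (nb / sum(nb)))
toy$g.woe <- unname(woe[as.character(toy$g)])

fit.woe   <- glm(bo ~ g.woe, family = binomial(link = logit), data = toy)
fit.dummy <- glm(bo ~ g, family = binomial(link = logit), data = toy)

coef(fit.woe)                   # compare with the two values on the next line
c(log(sum(nb) / sum(ng)), -1)   # log overall odds of bad, and -1
c(deviance(fit.woe), deviance(fit.dummy))
```

The same pattern shows in the transcript: lr.woe.b and lr.dummy.b both report a residual deviance of 3913.9, and their AIC values differ by 2 because the dummy model spends one more parameter.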

> #logistic regression - dummy transformation
> lr.dummy.b <- glm(bo ~ maturity.g, family = binomial(link = logit), data = db)
> summary(lr.dummy.b)

Call:
glm(formula = bo ~ maturity.g, family = binomial(link = logit), 
    data = db)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.3578  -0.3578  -0.3578  -0.2876   2.7328  

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)        -3.7100     0.1461 -25.395  < 2e-16 ***
maturity.g[11,14)   0.5453     0.1845   2.955  0.00313 ** 
maturity.g[14,72]   0.9935     0.1556   6.383 1.73e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 3970.3  on 9999  degrees of freedom
Residual deviance: 3913.9  on 9997  degrees of freedom
AIC: 3919.9

Number of Fisher Scoring iterations: 6
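
The dummy-coded coefficients can be read off the WoE table directly. The intercept is the log odds of bad in the reference group [ 4,11), and each remaining coefficient is the WoE of the reference group minus the WoE of that group. A small check, using only the ng and nb counts printed in bo.s (the variable names intercept and slopes are ad hoc):

```r
# Goods and bads per maturity group, copied from the bo.s output
ng <- c(1961, 1942, 5597)
nb <- c(48, 82, 370)

woe <- log((ng / sum(ng)) / (nb / sum(nb)))

intercept <- log(nb[1] / ng[1])   # log odds of bad in the reference group
slopes    <- woe[1] - woe[-1]     # reference WoE minus each other group's WoE

round(c(intercept, slopes), 4)
```

The three numbers should agree, up to rounding, with the Estimate column of summary(lr.dummy.b).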
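> #WoE calculation for the continuous target co
> #no = observations, sy = sum of co, po / py = group share of observations / of co
> co.s <- db %>% 
+   group_by(maturity.g) %>%
+   summarise(no = n(),
+             sy = sum(co)) %>%
+   ungroup() %>%
+   mutate(po = no / sum(no),
+          py = sy / sum(sy),
+          woe = log(py / po))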
> co.s <- as.data.frame(co.s)
> co.s
  maturity.g   no        sy     po        py         woe
1    [ 4,11) 2009  78.45591 0.2009 0.1664434 -0.18815216
2    [11,14) 2024  93.44255 0.2024 0.1982373 -0.02078092
3    [14,72] 5967 299.46856 0.5967 0.6353193  0.06271322
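For the continuous target co, the woe column compares each group's share of the total of co with its share of observations. Written out, with symbols introduced here as shorthand for the sy and no columns:

```latex
\mathrm{WoE}_g = \ln\!\left(\frac{s_g \big/ \sum_k s_k}{n_g \big/ \sum_k n_k}\right) = \ln\!\left(\frac{\bar{y}_g}{\bar{y}}\right),
\qquad \bar{y}_g = \frac{s_g}{n_g}, \quad \bar{y} = \frac{\sum_k s_k}{\sum_k n_k}
```

A positive WoE therefore marks a group whose average co lies above the overall average. Note that the sign convention differs from the binary case, where a positive WoE marks a group with relatively more goods.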
> #map each level of maturity.g to its WoE value
> woe.nv <- co.s$woe
> names(woe.nv) <- co.s$maturity.g
> db$maturity.woe.c <- woe.nv[db$maturity.g]
> #check
> table(db$maturity.woe.c, db$maturity.g)
                     
                      [ 4,11) [11,14) [14,72]
  -0.188152156905125     2009       0       0
  -0.0207809184405925       0    2024       0
  0.0627132152120221        0       0    5967
> #linear regression - WoE transformation
> lr.woe.c <- lm(co ~ maturity.woe.c, data = db)
> summary(lr.woe.c)

Call:
lm(formula = co ~ maturity.woe.c, data = db)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.04823 -0.03439 -0.01531  0.02287  0.26299 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    0.0473407  0.0004228  111.96   <2e-16 ***
maturity.woe.c 0.0444954  0.0043277   10.28   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.04224 on 9998 degrees of freedom
Multiple R-squared:  0.01046,   Adjusted R-squared:  0.01036 
F-statistic: 105.7 on 1 and 9998 DF,  p-value: < 2.2e-16

> #linear regression - dummy transformation
> lr.dummy.c <- lm(co ~ maturity.g, data = db)
> summary(lr.dummy.c)

Call:
lm(formula = co ~ maturity.g, data = db)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.04828 -0.03419 -0.01539  0.02281  0.26293 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)       0.0390522  0.0009424  41.440  < 2e-16 ***
maturity.g[11,14) 0.0071150  0.0013303   5.349 9.06e-08 ***
maturity.g[14,72] 0.0111352  0.0010895  10.220  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.04224 on 9997 degrees of freedom
Multiple R-squared:  0.01047,   Adjusted R-squared:  0.01027 
F-statistic: 52.89 on 2 and 9997 DF,  p-value: < 2.2e-16
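
In the dummy-coded linear model, the intercept is the mean of co in the reference group [ 4,11), and the other two coefficients are differences between group means. A short check, using only the no and sy columns printed in co.s (the variable names are ad hoc):

```r
# Group sizes and sums of co, copied from the co.s output
no <- c(2009, 2024, 5967)
sy <- c(78.45591, 93.44255, 299.46856)

group.mean <- sy / no

intercept <- group.mean[1]                    # mean of co in the reference group
slopes    <- group.mean[-1] - group.mean[1]   # differences from the reference mean

round(c(intercept, slopes), 7)
```

These values should agree, up to rounding, with the Estimate column of summary(lr.dummy.c). Unlike the logistic case, the WoE-coded linear model does not reproduce the group means exactly, because the group means are not an exact linear function of their WoE values. That is why its R-squared (0.01046) is marginally below that of the dummy model (0.01047).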
Attachment: woe_iv.csv (174KB)