Data aggregation
Task: Generate a synthetic data frame db, then compute the arithmetic mean and standard deviation of the variables x and y, and the maximum date of the variable z, each by the levels of the variable w.
> # define the data frame
> set.seed(2021)
> db <- data.frame(x = rnorm(1000, 10, 2),
+ y = runif(1000, 200, 800),
+ z = as.Date("2021-03-31") - 1:1000,
+ w = sample(letters[1:5], 1000, replace = TRUE))
> str(db)
'data.frame': 1000 obs. of 4 variables:
$ x: num 9.76 11.1 10.7 10.72 11.8 ...
$ y: num 675 574 524 282 206 ...
$ z: Date, format: "2021-03-30" "2021-03-29" ...
$ w: chr "d" "d" "e" "d" ...
> # compute the arithmetic mean of columns x and y for each level of w
> aggregate(x = db[, c("x", "y")],
+ by = list("Group" = db$w),
+ FUN = "mean")
Group x y
1 a 10.039274 491.7409
2 b 10.035221 479.6193
3 c 10.182541 495.6854
4 d 9.823664 513.8679
5 e 10.042712 489.2177
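The same group means can also be obtained through the formula interface of aggregate(). The sketch below assumes the db data frame defined above and rebuilds it so that the block runs on its own; the names res_by and res_formula are illustrative and not part of the original example.

```r
# Rebuild the example data frame (same definition as above)
set.seed(2021)
db <- data.frame(x = rnorm(1000, 10, 2),
                 y = runif(1000, 200, 800),
                 z = as.Date("2021-03-31") - 1:1000,
                 w = sample(letters[1:5], 1000, replace = TRUE))

# Group means via the by = argument, as in the example above
res_by <- aggregate(x = db[, c("x", "y")],
                    by = list(Group = db$w),
                    FUN = mean)

# Group means via the formula interface: cbind() lists the response columns,
# the right-hand side names the grouping variable
res_formula <- aggregate(cbind(x, y) ~ w, data = db, FUN = mean)

# Compare the two results column by column
all.equal(res_by$x, res_formula$x)
all.equal(res_by$y, res_formula$y)
```

With the formula interface the grouping column keeps its original name (w) instead of the label Group supplied through by.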
> # compute the arithmetic mean and standard deviation of columns x and y for each level of w
> aggregate(x = db[, c("x", "y")],
+ by = list("Group" = db$w),
+ FUN = function(x) {c("avg" = mean(x), "stdev" = sd(x))})
Group x.avg x.stdev y.avg y.stdev
1 a 10.039274 1.864388 491.7409 174.8500
2 b 10.035221 2.070886 479.6193 179.2915
3 c 10.182541 2.235355 495.6854 174.2087
4 d 9.823664 2.027051 513.8679 172.6692
5 e 10.042712 1.946465 489.2177 176.1243
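Although the printed result appears to have five columns, aggregate() stores the output of a vector-valued FUN as matrix columns, so the data frame actually has only three columns (Group, x, y). A minimal sketch of how this can be checked and flattened is given below; it rebuilds db so that it runs on its own, and the names res and flat are illustrative.

```r
# Rebuild the example data frame (same definition as above)
set.seed(2021)
db <- data.frame(x = rnorm(1000, 10, 2),
                 y = runif(1000, 200, 800),
                 z = as.Date("2021-03-31") - 1:1000,
                 w = sample(letters[1:5], 1000, replace = TRUE))

# Mean and standard deviation per group, as in the example above
res <- aggregate(x = db[, c("x", "y")],
                 by = list(Group = db$w),
                 FUN = function(v) c(avg = mean(v), stdev = sd(v)))

ncol(res)         # number of top-level columns
is.matrix(res$x)  # x holds a two-column matrix (avg, stdev)

# Flatten the matrix columns into ordinary columns
flat <- do.call(data.frame, res)
names(flat)
```

After flattening, columns such as x.avg or y.stdev can be accessed directly, for example flat$x.avg.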
> # compute the maximum date of column z for each level of w
> aggregate(x = db[, "z"],
+ by = list("Group" = db$w),
+ FUN = "max")
Group x
1 a 2021-03-23
2 b 2021-03-16
3 c 2021-03-25
4 d 2021-03-30
5 e 2021-03-28
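In the output above the result column is labelled x rather than z, because db[, "z"] passes a bare vector and its name is lost. A possible variant, sketched below under the assumption that the same db is used, passes a one-column data frame (db["z"]) so that the column name is kept; the name res_z is illustrative.

```r
# Rebuild the example data frame (same definition as above)
set.seed(2021)
db <- data.frame(x = rnorm(1000, 10, 2),
                 y = runif(1000, 200, 800),
                 z = as.Date("2021-03-31") - 1:1000,
                 w = sample(letters[1:5], 1000, replace = TRUE))

# db["z"] is a one-column data frame, so the result column is named z
res_z <- aggregate(x = db["z"],
                   by = list(Group = db$w),
                   FUN = max)

names(res_z)
class(res_z$z)  # the Date class is preserved
```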