Sunday, July 13, 2014

make group (aggregate) data set in R

Suppose you have a data frame called sort2013,
and you want to aggregate it by Head_age, i.e. compute a summary value of every other column for each distinct Head_age.

> sort2013
  Head_age P2_age P3_age P4_age P5_age P6_age P7_age P8_age
1       84     75      0      0      0      0      0      0
2       84     75      0      0      0      0      0      0
3       84     75      0      0      0      0      0      0
4       84     75      0      0      0      0      0      0
5       84     75      0      0      0      0      0      0
6       85     76      0      0      0      0      0      0
Etc...

Group the data frame with aggregate()
> group2013 = aggregate(. ~ Head_age, sort2013, mean)
#group all other columns (.) by (~) Head_age in the sort2013 data frame, take the mean of each column within each group, and store the result in the group2013 data frame

The result would be

> group2013
  Head_age   P2_age   P3_age   P4_age P5_age P6_age P7_age P8_age
1       16  0.00000 0.000000 0.000000      0      0      0      0
2       17  5.00000 0.000000 0.000000      0      0      0      0
3       18 13.44444 0.000000 0.000000      0      0      0      0
4       19 22.38095 7.000000 0.000000      0      0      0      0
5       20 17.45455 0.000000 0.000000      0      0      0      0
6       21 42.60000 6.666667 8.466667      0      0      0      0
Etc...


The result is automatically sorted by Head_age, as you can see above.
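
To see the grouping at a small scale, here is a minimal sketch on a toy data frame. The data frame `toy` and its values are made up for illustration; they are not taken from sort2013.

```r
# Toy data (hypothetical values, NOT the sort2013 data from this post)
toy <- data.frame(
  Head_age = c(30, 30, 25, 25, 25),
  P2_age   = c(28, 32, 20, 22, 24),
  P3_age   = c(0, 4, 0, 0, 3)
)

# Mean of every other column (.) within each Head_age group
grouped <- aggregate(. ~ Head_age, data = toy, FUN = mean)
print(grouped)
```

The groups come back ordered by Head_age (25 before 30) even though 30 appears first in the input. One thing to keep in mind: the formula interface of aggregate() drops rows that contain NA by default (na.action = na.omit), so data with missing values may need extra care.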

Thursday, July 10, 2014

sort data in each row in R data set

Suppose we have a matrix A like the one below,
and we want to sort each row from its maximum value down to its minimum value.

> A
     [,1] [,2] [,3] [,4]
[1,]    3    7    9    5
[2,]    7    9   11    3
[3,]    5    3    7    8

We want to sort A row by row, so that the result looks like this
     [,1] [,2] [,3] [,4]
[1,]    9    7    5    3
[2,]   11    9    7    3
Etc....

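We can sort each row by doing this
> Asorted = t(apply(A,1,sort))
#sort each row of A and store the result in Asorted

apply(A,1,sort) means: apply to the data set A, along margin 1 (rows), the function sort. apply() returns each sorted row as a column, so t() transposes the result back to the original layout.
*1: rows, 2: columns, and c(1,2): both rows and columns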
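
A small sketch of what the margin argument does, using the matrix A from this post (the variable names `by_row` and `by_col` are just for illustration):

```r
# The example matrix A from this post, entered row by row
A <- matrix(c(3, 7,  9, 5,
              7, 9, 11, 3,
              5, 3,  7, 8), nrow = 3, byrow = TRUE)

by_row <- apply(A, 1, sort)  # sort each ROW; each result comes back as a COLUMN
by_col <- apply(A, 2, sort)  # sort each COLUMN; layout stays the same

print(dim(by_row))     # 4 x 3, i.e. transposed relative to A
print(dim(t(by_row)))  # 3 x 4, back to the shape of A
print(dim(by_col))     # 3 x 4
```

This is why the row-wise version needs the extra t() while a column-wise sort would not.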

The result, however, is sorted from the minimum value to the maximum value in each row, which is the reverse of what we want

     [,1] [,2] [,3] [,4]
[1,]    3    5    7    9
[2,]    3    7    9   11
Etc....

All that is left is to reverse the column order with this command
> Asorted = Asorted[ ,ncol(Asorted):1]
#reverse the order of the column indices and overwrite Asorted with the result

There you go, the result looks like this
> Asorted

     [,1] [,2] [,3] [,4]
[1,]    9    7    5    3
[2,]   11    9    7    3
Etc....
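
As an alternative sketch (not the method used above), the reversing step can be folded into the sort itself, because apply() passes extra arguments on to the function and sort() accepts decreasing = TRUE. The names `two_step` and `one_step` are only for illustration.

```r
# The example matrix A from this post, entered row by row
A <- matrix(c(3, 7,  9, 5,
              7, 9, 11, 3,
              5, 3,  7, 8), nrow = 3, byrow = TRUE)

# Two steps, as in this post: sort ascending, then reverse the columns
two_step <- t(apply(A, 1, sort))
two_step <- two_step[, ncol(two_step):1]

# One step: ask sort() for descending order directly
one_step <- t(apply(A, 1, sort, decreasing = TRUE))

print(one_step)
# rows: 9 7 5 3 / 11 9 7 3 / 8 7 5 3
print(identical(two_step, one_step))
```

Both give the same matrix here; the one-step form just saves the column reversal.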

Friday, April 4, 2014

hello new period!


Here in Japan, "seeing the sakura flowers" is something you enjoy with someone, if you know what I mean (though yeah, you can enjoy it alone as well). And as the sakura bloom, life moves on to a new stage.
I've got to say hello April, hello new period of time, hello (preparation for the) new life.
Well, actually, I saw this after seeing her off, and I feel lonely...
But as the new term has begun, my new spirit is overflowing with joy.

I will try my best!


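In Japan, cherry-blossom viewing is a custom you enjoy with someone, and you can guess who I mean by "someone" (well, sometimes you enjoy it by yourself too). The cherry blossoms are in bloom, and life is stepping onto a new stage.
Hello to April, hello to a new period, and hello to (the preparation for) a new life.
Actually, I walked along this road after seeing her off, and I feel lonely...
But since the new term has started, I am overflowing with new spirit.

Of course, I will keep doing my best from here on!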

Wednesday, January 22, 2014

never trust excel!


Conclusion: never trust Excel for even slightly complicated statistics!

As a "wanna-be researcher", of course, I work closely with data and numbers. Well, sometimes I use Excel to do some simple things with the data, like drawing a graph, and so on.
But you know what?!
Today my professor and I did several things using Excel, and we got a real surprise. Imagine... We only added a simple linear trendline, a super simple thing, I mean a really simple thing. The result was y = 29018x - 31865. That looked okay, but then when we tested it, it went completely wrong!
After that we decided to calculate the slope and the intercept manually. And yes, it was wrong: the intercept should have been -318658. We found out that the last digit was missing!
What the heck!

No more Excel for this kind of thing!
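
For reference, here is a minimal sketch of that kind of manual check, done in R. The x values are hypothetical and the y values are generated from the corrected line y = 29018x - 318658, since the original data is not shown in this post. The slope is cov(x, y) / var(x) and the intercept is mean(y) - slope * mean(x); lm() is used as a cross-check.

```r
# Hypothetical x values; y is generated from the corrected line in this post
x <- c(11, 12, 13, 14, 15)
y <- 29018 * x - 318658

# Least-squares slope and intercept computed by hand
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# Cross-check against R's linear model fit
fit <- coef(lm(y ~ x))

print(c(intercept = intercept, slope = slope))
print(fit)
```

If the hand calculation and lm() agree with each other but not with the equation shown on a chart, the chart label is the part to distrust.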



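To state the conclusion first: Excel cannot be trusted for even slightly complicated statistics!

As someone "aiming to be a researcher", I naturally deal with data and numbers a lot. For simple things, I sometimes use Excel to draw graphs or do simple calculations.
Today, while consulting with my professor, we tried various things in Excel, and something odd happened that surprised us. Picture this... we drew a scatter plot, added a linear trendline, and got y = 29018x - 31865. When it appeared I just thought "ah, there it is", but when we checked it: "Huh?! That is completely wrong!"
Then we calculated the slope and the intercept by hand, and it was completely different, Microsoft. The correct intercept was -318658; the last digit was missing.
Are you kidding me!

This kind of thing is a no-go in Excel after all!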