1. arrange()
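- Sorts the rows of a data frame by the values of one or more columns, so the data can be put in whatever order you need.
- Syntax of arrange()
arrange(<dataset>, ...)
# ... : names of the columns to sort by; each additional column breaks ties in the columns before it
ex)
arrange(flights, year, month, day)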
# A tibble: 336,776 × 19
year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin dest air_time distance hour
<int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr> <int> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 2013 1 1 517 515 2 830 819 11 UA 1545 N14228 EWR IAH 227 1400 5
2 2013 1 1 533 529 4 850 830 20 UA 1714 N24211 LGA IAH 227 1416 5
3 2013 1 1 542 540 2 923 850 33 AA 1141 N619AA JFK MIA 160 1089 5
4 2013 1 1 544 545 -1 1004 1022 -18 B6 725 N804JB JFK BQN 183 1576 5
5 2013 1 1 554 600 -6 812 837 -25 DL 461 N668DN LGA ATL 116 762 6
6 2013 1 1 554 558 -4 740 728 12 UA 1696 N39463 EWR ORD 150 719 5
7 2013 1 1 555 600 -5 913 854 19 B6 507 N516JB EWR FLL 158 1065 6
8 2013 1 1 557 600 -3 709 723 -14 EV 5708 N829AS LGA IAD 53 229 6
9 2013 1 1 557 600 -3 838 846 -8 B6 79 N593JB JFK MCO 140 944 6
10 2013 1 1 558 600 -2 753 745 8 AA 301 N3ALAA LGA ORD 138 733 6
# ℹ 336,766 more rows
# ℹ 2 more variables: minute <dbl>, time_hour <dttm>
# ℹ Use `print(n = ...)` to see more rows
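The tie-breaking behavior is easier to see on a handful of rows than on the full flights table. The following is a minimal sketch on a made-up tibble (`toy` and its values are hypothetical, not part of nycflights13), assuming dplyr is installed:

```r
library(dplyr)

# Hypothetical toy data, not part of nycflights13
toy <- tibble(
  month = c(2, 1, 1, 2),
  day   = c(1, 15, 3, 1),
  id    = c("a", "b", "c", "d")
)

# Sort by month first; rows with the same month are then ordered by day
sorted <- arrange(toy, month, day)
print(sorted)
```

The two month 1 rows come first, ordered by day (3, then 15), followed by the month 2 rows.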
- To sort the rows by a column in descending order, wrap that column in desc().
ex)
arrange(flights, desc(year), desc(month), desc(day))
# A tibble: 336,776 × 19
year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin dest air_time distance hour
<int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr> <int> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 2013 12 31 13 2359 14 439 437 2 B6 839 N566JB JFK BQN 189 1576 23
2 2013 12 31 18 2359 19 449 444 5 DL 412 N713TW JFK SJU 192 1598 23
3 2013 12 31 26 2245 101 129 2353 96 B6 108 N374JB JFK PWM 50 273 22
4 2013 12 31 459 500 -1 655 651 4 US 1895 N557UW EWR CLT 95 529 5
5 2013 12 31 514 515 -1 814 812 2 UA 700 N470UA EWR IAH 223 1400 5
6 2013 12 31 549 551 -2 925 900 25 UA 274 N577UA EWR LAX 346 2454 5
7 2013 12 31 550 600 -10 725 745 -20 AA 301 N3CXAA LGA ORD 127 733 6
8 2013 12 31 552 600 -8 811 826 -15 EV 3825 N14916 EWR IND 118 645 6
9 2013 12 31 553 600 -7 741 754 -13 DL 731 N333NB LGA DTW 86 502 6
10 2013 12 31 554 550 4 1024 1027 -3 B6 939 N552JB JFK BQN 195 1576 5
# ℹ 336,766 more rows
# ℹ 2 more variables: minute <dbl>, time_hour <dttm>
# ℹ Use `print(n = ...)` to see more rows
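desc() applies only to the column it wraps, so ascending and descending columns can be mixed in one call. A small sketch on made-up data (the `toy` tibble and its `delay` column are hypothetical):

```r
library(dplyr)

# Hypothetical toy data for illustration only
toy <- tibble(
  month = c(1, 2, 2, 1),
  delay = c(5, 30, 10, 20)
)

# month in ascending order, delay in descending order within each month
mixed <- arrange(toy, month, desc(delay))
print(mixed)
```

Within each month, the row with the largest delay appears first.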
3. Exercises
Q1) Sort flights to find the most delayed flights (here measured by departure delay, dep_delay).
arrange(flights, desc(dep_delay))
# A tibble: 336,776 × 19
year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin dest air_time distance hour
<int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr> <int> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 2013 1 9 641 900 1301 1242 1530 1272 HA 51 N384HA JFK HNL 640 4983 9
2 2013 6 15 1432 1935 1137 1607 2120 1127 MQ 3535 N504MQ JFK CMH 74 483 19
3 2013 1 10 1121 1635 1126 1239 1810 1109 MQ 3695 N517MQ EWR ORD 111 719 16
4 2013 9 20 1139 1845 1014 1457 2210 1007 AA 177 N338AA JFK SFO 354 2586 18
5 2013 7 22 845 1600 1005 1044 1815 989 MQ 3075 N665MQ JFK CVG 96 589 16
6 2013 4 10 1100 1900 960 1342 2211 931 DL 2391 N959DL JFK TPA 139 1005 19
7 2013 3 17 2321 810 911 135 1020 915 DL 2119 N927DA LGA MSP 167 1020 8
8 2013 6 27 959 1900 899 1236 2226 850 DL 2007 N3762Y JFK PDX 313 2454 19
9 2013 7 22 2257 759 898 121 1026 895 DL 2047 N6716C LGA ATL 109 762 7
10 2013 12 5 756 1700 896 1058 2020 878 AA 172 N5DMAA EWR MIA 149 1085 17
# ℹ 336,766 more rows
# ℹ 2 more variables: minute <dbl>, time_hour <dttm>
# ℹ Use `print(n = ...)` to see more rows
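If only the single most delayed flight is needed, one option is to keep just the first row of the sorted result. The sketch below uses a small hypothetical stand-in (`toy_flights`) so it runs without nycflights13; the same pattern should apply to flights itself. It also includes a missing delay, since arrange() places NA values at the end even when desc() is used:

```r
library(dplyr)

# Hypothetical stand-in for flights with only the columns needed here
toy_flights <- tibble(
  flight    = c(51, 3535, 301, 725),
  dep_delay = c(1301, 1137, NA, -1)
)

# Sort by departure delay, largest first; the NA row ends up last
by_delay <- arrange(toy_flights, desc(dep_delay))

# Keep only the first row, i.e. the most delayed flight
most_delayed <- head(by_delay, 1)
print(most_delayed)
```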