Introduction
A corpus is a large, structured collection of digitized linguistic data. It is essential in the many academic fields that study language because it gives a comprehensive view of linguistic variation. The Yonsei Corpus project began in 1986 with the founding of the Korean Dictionary Society, and in 1988 we began building a corpus for compiling dictionaries. We later broadened the corpus to include a wider range of linguistic data for research in Korean linguistics, Korean education, Human Linguistics, and Teaching Korean as a Foreign Language.
List of Corpora
Number | Name | Size |
---|---|---|
1 | Yonsei Corpus 1 | 2,900,000 |
2 | Yonsei Corpus 2 | 1,100,000 |
3 | Yonsei Corpus 3 | 5,980,000 |
4 | Yonsei Corpus 4 | 770,000 |
5 | Yonsei Corpus 5 | 8,00,000 |
6 | Yonsei Corpus 6 | 7,230,000 |
7 | Yonsei Corpus 7 | 13,670,000 |
8 | Yonsei Corpus 8 | 870,000 |
9 | Yonsei Corpus 9 | 1,500,000 |
10 | Yonsei Corpus 10 | 780,000 |
11 | Yonsei Corpus 11 | 730,000 |
12 | Yonsei Corpus of Korean in the 20th Century | 150,378,870 |
13 | Corpus of Korean Textbooks (Complete) | 724,856 |
14 | Corpus of Korean Textbooks (Conversation) | 119,598 |
15 | Yonsei Korean Learner Corpus | 278,542 |
16 | Korean Elementary Textbook Corpus after Independence | 1,496,280 |
17 | The 6th and 7th Korean Elementary Textbook Corpus | 1,681,769 |
18 | Yonsei Balanced Corpus of Written Discourse | 1,054,362 |
19 | Yonsei Balanced Corpus of Spoken Discourse | 998,934 |
20 | Yonsei Corpus of Polysemy | 1,165,224 |
21 | Yonsei Corpus of Hangul Tripitaka | 386,472 |
22 | Corpus of <Tongnip Sinmun> Newspaper | 144,309 |
23 | Corpus of Popular Songs in the Modern Era | 29,339 |
24 | Yonsei Corpus of Multimodal Data | 18,986 |
25 | Twitter Corpus | 945,175,620 |
26 | Political Discourse Corpus | 306,681 |
Total | | 1,148,089,842 |