Application of Text Mining and K-Medoids for Job Clustering Based on Job Description Analysis

Authors

  • Teguh Aries Wahyudi Universitas PGRI Ronggolawe, Indonesia
  • Amaludin Arifia Universitas PGRI Ronggolawe
  • Andik Adi Suryanto Universitas PGRI Ronggolawe

DOI:

https://doi.org/10.26905/jtmi.v11i1.15863

Keywords:

Text Mining, K-Medoids, Clustering, Job Group, Job Description

Abstract

The rapid growth of job description data makes manual job classification difficult because the data are unstructured. This study develops a job clustering model using a text mining approach and the K-Medoids algorithm. Job description data were obtained from the O*NET Online website and supplemented with data from several prominent job portals in Indonesia. The data were preprocessed through tokenization, stopword removal, and lemmatization, then converted to vector representations using SentenceTransformer. Clustering was performed with the K-Medoids algorithm using the Euclidean distance metric, and the results were evaluated with the Silhouette Score and the Davies-Bouldin Index. The model produced a sufficiently representative grouping, with a Silhouette Score of 0.48770 and a Davies-Bouldin Index of 0.815, performing better than the Agglomerative Clustering method. This approach is effective for supporting automated, data-driven human resource management.
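
The clustering and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a handful of toy 2-D points standing in for the SentenceTransformer vectors, a deterministic farthest-first seeding (the abstract does not say how medoids were initialized), the simple alternating assign/update variant of K-Medoids, and the centroid-based definition of the Davies-Bouldin Index. All function names are hypothetical.

```python
import math

def farthest_first_init(points, k):
    """Deterministic seeding (an assumption): start at point 0, then keep
    adding the point farthest from the medoids chosen so far."""
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(
            range(len(points)),
            key=lambda i: min(math.dist(points[i], points[m]) for m in medoids)))
    return medoids

def assign(points, medoids):
    """Label each point with the position of its nearest medoid (Euclidean)."""
    return [min(range(len(medoids)),
                key=lambda c: math.dist(p, points[medoids[c]]))
            for p in points]

def k_medoids(points, k, max_iter=100):
    """Alternating K-Medoids: assign points, then re-pick each medoid."""
    medoids = farthest_first_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            # New medoid: the member with the smallest total distance to the
            # other members; an empty cluster keeps its old medoid.
            new_medoids.append(min(
                members or [medoids[c]],
                key=lambda i: sum(math.dist(points[i], points[j]) for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)

def silhouette_score(points, labels):
    """Mean silhouette coefficient; needs at least two clusters."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0
        a = sum(math.dist(points[i], points[j]) for j in own) / (len(own) - 1)
        b = min(sum(math.dist(points[i], points[j]) for j in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        total += (b - a) / max(a, b)
    return total / len(points)

def davies_bouldin_index(points, labels):
    """Centroid-based Davies-Bouldin Index (lower means better separation)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    centroids = {lab: tuple(sum(axis) / len(ms) for axis in zip(*ms))
                 for lab, ms in clusters.items()}
    scatter = {lab: sum(math.dist(p, centroids[lab]) for p in ms) / len(ms)
               for lab, ms in clusters.items()}
    worst = [max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
                 for j in clusters if j != i)
             for i in clusters]
    return sum(worst) / len(worst)

# Toy stand-ins for sentence embeddings: three well-separated groups.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.3, 5.1), (5.1, 4.8),
          (10.0, 0.0), (10.2, 0.3), (9.9, 0.2)]

medoids, labels = k_medoids(points, k=3)
print("medoid indices:", medoids)
print("labels:", labels)
print("silhouette:", round(silhouette_score(points, labels), 3))
print("davies-bouldin:", round(davies_bouldin_index(points, labels), 3))
```

Unlike K-Means centroids, each medoid is an actual data point, so in a job-clustering setting every cluster would be represented by a real job description. The toy scores printed here are much better than the values reported in the abstract only because the toy groups are artificially well separated.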

Published

04-07-2025

Section

Articles