Application of Text Mining and K-Medoids for Job Clustering Based on Job Description Analysis
DOI: https://doi.org/10.26905/jtmi.v11i1.15863
Keywords: Text Mining, K-Medoids, Clustering, Job Group, Job Description
Abstract
The rapid growth of job description data makes manual job classification difficult because the data is unstructured. This study develops a job clustering model using a text mining approach and the K-Medoids algorithm. Job description data was obtained from the O*NET Online website and supplemented with data from several prominent job portals in Indonesia. The data was processed through tokenization, stopword removal, and lemmatization, then converted to vector representations using SentenceTransformer. Clustering was performed with the K-Medoids algorithm using the Euclidean distance metric, and the result was evaluated with the Silhouette Score and the Davies-Bouldin Index. The model produced a sufficiently representative grouping, with a Silhouette Score of 0.48770 and a Davies-Bouldin Index of 0.815, performing better than the Agglomerative Clustering method. The approach is effective for supporting automated, data-driven human resource management.
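The clustering and evaluation steps named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the toy 2-D vectors stand in for SentenceTransformer embeddings, the function names (`k_medoids`, `silhouette`, `davies_bouldin`) are hypothetical, the alternating assign-and-update loop is only one common K-Medoids heuristic, and the Davies-Bouldin Index is computed from cluster centroids as in its usual definition.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(dist, medoids):
    """Label each point with the position of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(points, k, init=None, max_iter=100, seed=0):
    """Alternating K-Medoids; returns (medoid indices, cluster labels)."""
    n = len(points)
    dist = [[euclidean(p, q) for q in points] for p in points]
    if init is not None:
        medoids = list(init)
    else:
        medoids = random.Random(seed).sample(range(n), k)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster ends up empty
                new_medoids.append(medoids[c])
                continue
            # The medoid is the member with the smallest total distance to its cluster.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


def silhouette(points, labels):
    """Mean silhouette coefficient (needs at least two clusters)."""
    n, total = len(points), 0.0
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:  # a singleton cluster contributes 0 by convention
            continue
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(euclidean(points[i], points[j])
                for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


def davies_bouldin(points, labels):
    """Davies-Bouldin Index based on cluster centroids (lower is better)."""
    cents, scatter = {}, {}
    for c in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == c]
        cents[c] = [sum(col) / len(members) for col in zip(*members)]
        scatter[c] = sum(euclidean(p, cents[c]) for p in members) / len(members)
    worst = [max((scatter[c] + scatter[d]) / euclidean(cents[c], cents[d])
                 for d in cents if d != c) for c in cents]
    return sum(worst) / len(worst)


# Toy 2-D vectors standing in for sentence embeddings of job descriptions.
embeddings = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
medoids, labels = k_medoids(embeddings, k=2)
print("medoids:", medoids)
print("labels:", labels)
print("silhouette:", round(silhouette(embeddings, labels), 3))
print("davies-bouldin:", round(davies_bouldin(embeddings, labels), 3))
```

A higher Silhouette Score and a lower Davies-Bouldin Index both indicate tighter, better-separated clusters, which is how the two reported values are read. In practice the toy vectors would be replaced by the embeddings of the preprocessed job descriptions.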
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
(1) Copyright of the published articles will be transferred to the journal as the publisher of the manuscripts. Therefore, the author confirms that the copyright has been managed by the journal.
(2) Publisher of JTMI: Jurnal Teknologi dan Manajemen Informatika is University of Merdeka Malang.
(3) The copyright follows the Creative Commons Attribution-ShareAlike License (CC BY-SA). This license allows users to share (copy and redistribute the material in any medium or format) and adapt (remix, transform, and build upon the material) for any purpose, even commercially.