publications
2025
- [Dataset] KARRIEREWEGE: A large scale Career Path Prediction Dataset. Elena Senger, Yuri Campbell, Rob van der Goot, and Barbara Plank. To appear in COLING Industry Track, Jan 2025.
Accurate career path prediction can support many stakeholders, like job seekers, recruiters, HR, and project managers. However, publicly available data and tools for career path prediction are scarce. In this work, we introduce KARRIEREWEGE, a comprehensive, publicly available dataset containing over 500k career paths, significantly surpassing the size of previously available datasets. We link the dataset to the ESCO taxonomy to offer a valuable resource for predicting career trajectories. To tackle the problem of free-text inputs typically found in resumes, we enhance it by synthesizing job titles and descriptions resulting in KARRIEREWEGE+. This allows for accurate predictions from unstructured data, closely aligning with real-world application challenges. We benchmark existing state-of-the-art (SOTA) models on our dataset and a previous benchmark and see increased performance and robustness by synthesizing the data for the free-text use cases.
@inproceedings{senger-etal-2025-data,
  title     = {KARRIEREWEGE: A large scale Career Path Prediction Dataset},
  author    = {Senger, Elena and Campbell, Yuri and van der Goot, Rob and Plank, Barbara},
  booktitle = {To appear in COLING Industry Track},
  month     = jan,
  year      = {2025}
}
2024
- [Survey] Deep Learning-based Computational Job Market Analysis: A Survey on Skill Extraction and Classification from Job Postings. Elena Senger, Mike Zhang, Rob van der Goot, and Barbara Plank. In Proceedings of the First Workshop on Natural Language Processing for Human Resources (NLP4HR 2024), Mar 2024.
Recent years have brought significant advances to Natural Language Processing (NLP), which enabled fast progress in the field of computational job market analysis. Core tasks in this application domain are skill extraction and classification from job postings. Because of its quick growth and its interdisciplinary nature, there is no exhaustive assessment of this field. This survey aims to fill this gap by providing a comprehensive overview of deep learning methodologies, datasets, and terminologies specific to NLP-driven skill extraction. Our comprehensive cataloging of publicly available datasets addresses the lack of consolidated information on dataset creation and characteristics. Finally, the focus on terminology addresses the current lack of consistent definitions for important concepts, such as hard and soft skills, and terms relating to skill extraction and classification.
@inproceedings{senger-etal-2024-deep,
  title     = {Deep Learning-based Computational Job Market Analysis: A Survey on Skill Extraction and Classification from Job Postings},
  author    = {Senger, Elena and Zhang, Mike and van der Goot, Rob and Plank, Barbara},
  editor    = {Hruschka, Estevam and Lake, Thom and Otani, Naoki and Mitchell, Tom},
  booktitle = {Proceedings of the First Workshop on Natural Language Processing for Human Resources (NLP4HR 2024)},
  month     = mar,
  year      = {2024},
  address   = {St. Julian{'}s, Malta},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2024.nlp4hr-1.1},
  pages     = {1--15}
}