publications
2024
- From Text to Maps: LLM-Driven Extraction and Geotagging of Epidemiological Data. Karlyn K. Harrod, Prabin Bhandari, and Antonios Anastasopoulos. In Proceedings of the Third Workshop on NLP for Positive Impact, Nov 2024.
Epidemiological datasets are essential for public health analysis and decision-making, yet they remain scarce and often difficult to compile due to inconsistent data formats, language barriers, and evolving political boundaries. Traditional methods of creating such datasets involve extensive manual effort and are prone to errors in accurate location extraction. To address these challenges, we propose utilizing large language models (LLMs) to automate the extraction and geotagging of epidemiological data from textual documents. Our approach significantly reduces the manual effort required, limiting human intervention to validating a subset of records against text snippets and verifying the geotagging reasoning, as opposed to reviewing multiple entire documents manually to extract, clean, and geotag. Additionally, the LLMs identify information often overlooked by human annotators, further enhancing the dataset’s completeness. Our findings demonstrate that LLMs can be effectively used to semi-automate the extraction and geotagging of epidemiological data, offering several key advantages: (1) comprehensive information extraction with minimal risk of missing critical details; (2) minimal human intervention; (3) higher-resolution data with more precise geotagging; and (4) significantly reduced resource demands compared to traditional methods.
@inproceedings{harrod-etal-2024-text,
  title = {From Text to Maps: {LLM}-Driven Extraction and Geotagging of Epidemiological Data},
  author = {Harrod, Karlyn K. and Bhandari, Prabin and Anastasopoulos, Antonios},
  editor = {Dementieva, Daryna and Ignat, Oana and Jin, Zhijing and Mihalcea, Rada and Piatti, Giorgio and Tetreault, Joel and Wilson, Steven and Zhao, Jieyu},
  booktitle = {Proceedings of the Third Workshop on NLP for Positive Impact},
  month = nov,
  year = {2024},
  address = {Miami, Florida, USA},
  publisher = {Association for Computational Linguistics},
  pages = {258--270},
  outstanding_paper = {true}
}
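A minimal sketch of the extract-then-geotag flow the paper describes. The prompt wording, the JSON record schema, and the `llm` callable are illustrative assumptions rather than the paper's exact setup, and geopy's Nominatim is used here only as one possible geocoder.

```python
import json
from geopy.geocoders import Nominatim  # one possible geocoder; not necessarily the paper's choice

# Hypothetical record schema and prompt; the paper's exact fields are not specified here.
EXTRACTION_PROMPT = (
    "Extract every epidemiological record from the report below as a JSON list. "
    "Each record needs: disease, case_count, location (most specific place name), date.\n\n"
    "Report:\n{document}\n\nJSON:"
)

def extract_and_geotag(document, llm):
    """llm: any callable that maps a prompt string to the model's text completion."""
    raw = llm(EXTRACTION_PROMPT.format(document=document))
    records = json.loads(raw)  # in practice, validate a subset of records against the source text

    geolocator = Nominatim(user_agent="epi-geotagging-sketch")
    for record in records:
        hit = geolocator.geocode(record["location"])
        record["lat"], record["lon"] = (hit.latitude, hit.longitude) if hit else (None, None)
    return records
```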
- Urban Mobility Assessment Using LLMs. Prabin Bhandari, Antonios Anastasopoulos, and Dieter Pfoser. In Proceedings of the 32nd ACM International Conference on Advances in Geographic Information Systems, Nov 2024.
In urban science, understanding mobility patterns and analyzing how people move around cities helps improve the overall quality of life and supports the development of more livable, efficient, and sustainable urban areas. A challenging aspect of this work is the collection of mobility data through user tracking or travel surveys, given the associated privacy concerns, noncompliance, and high cost. This work proposes an innovative AI-based approach for synthesizing travel surveys by prompting large language models (LLMs), aiming to leverage their vast amount of relevant background knowledge and text generation capabilities. Our study evaluates the effectiveness of this approach across various U.S. metropolitan areas by comparing the results against existing survey data at different granularity levels. These levels include (i) pattern level, which compares aggregated metrics such as the average number of locations traveled and travel time, (ii) trip level, which focuses on comparing trips as whole units using transition probabilities, and (iii) activity chain level, which examines the sequence of locations visited by individuals. Our work covers several proprietary and open-source LLMs, revealing that open-source base models like Llama-2, when fine-tuned on even a limited amount of actual data, can generate synthetic data that closely mimics the actual travel survey data and, as such, provides an argument for using such data in mobility studies.
@inproceedings{10.1145/3678717.3691221,
  author = {Bhandari, Prabin and Anastasopoulos, Antonios and Pfoser, Dieter},
  title = {Urban Mobility Assessment Using LLMs},
  year = {2024},
  isbn = {9798400711077},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3678717.3691221},
  doi = {10.1145/3678717.3691221},
  booktitle = {Proceedings of the 32nd ACM International Conference on Advances in Geographic Information Systems},
  pages = {67--79},
  numpages = {13},
  keywords = {Large Language Models, Travel Data, Travel Survey, Travel Survey Data Simulation},
  location = {Atlanta, GA, USA},
  series = {SIGSPATIAL '24},
  best_paper = {true}
}
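A rough sketch of the trip-level comparison mentioned in the abstract above: build transition-probability matrices over activity types from real and synthetic activity chains and measure how far apart they are. The activity label set and the Jensen-Shannon comparison are illustrative assumptions; the paper's exact metrics may differ.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

ACTIVITIES = ["home", "work", "shop", "leisure"]  # illustrative label set
IDX = {a: i for i, a in enumerate(ACTIVITIES)}

def transition_matrix(chains):
    """Row-normalized counts of activity-to-activity transitions."""
    counts = np.zeros((len(ACTIVITIES), len(ACTIVITIES)))
    for chain in chains:
        for src, dst in zip(chain, chain[1:]):
            counts[IDX[src], IDX[dst]] += 1
    return counts / counts.sum(axis=1, keepdims=True).clip(min=1)

def trip_level_gap(real_chains, synthetic_chains):
    """Mean per-row Jensen-Shannon distance between the two transition matrices."""
    real, synth = transition_matrix(real_chains), transition_matrix(synthetic_chains)
    return np.mean([jensenshannon(r, s) for r, s in zip(real, synth)])

# Example: compare a (toy) real survey against LLM-generated chains.
real = [["home", "work", "shop", "home"], ["home", "leisure", "home"]]
synthetic = [["home", "work", "home"], ["home", "shop", "leisure", "home"]]
print(trip_level_gap(real, synthetic))
```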
- Using Contextual Information for Sentence-level Morpheme Segmentation. Prabin Bhandari and Abhishek Paudel. arXiv preprint arXiv:2403.15436, Mar 2024.
Recent advancements in morpheme segmentation primarily emphasize word-level segmentation, often neglecting the contextual relevance within the sentence. In this study, we redefine the morpheme segmentation task as a sequence-to-sequence problem, treating the entire sentence as input rather than isolating individual words. Our findings reveal that the multilingual model consistently exhibits superior performance compared to monolingual counterparts. While our model did not surpass the performance of the current state-of-the-art, it demonstrated comparable efficacy with high-resource languages while revealing limitations in low-resource language scenarios.
@article{bhandari2024using,
  title = {Using Contextual Information for Sentence-level Morpheme Segmentation},
  author = {Bhandari, Prabin and Paudel, Abhishek},
  journal = {arXiv preprint arXiv:2403.15436},
  year = {2024}
}
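A small sketch of the sentence-level framing described above: the whole sentence is the source sequence, and the target is the same sentence with morpheme boundaries marked. The `@@` boundary marker and the toy examples are assumptions for illustration, not the paper's data format.

```python
# Source/target pairs for a sentence-level seq2seq segmenter.
# '@@' marks a morpheme boundary inside a word (an assumed convention, not the paper's).
training_pairs = [
    ("the dogs barked loudly", "the dog@@s bark@@ed loud@@ly"),
    ("she is rereading it",    "she is re@@read@@ing it"),
]

def segment(sentence, seq2seq_model):
    """seq2seq_model: any callable mapping a source sentence to its boundary-marked form."""
    marked = seq2seq_model(sentence)
    return [word.split("@@") for word in marked.split()]

# e.g. segment("the dogs barked loudly", model) ->
# [["the"], ["dog", "s"], ["bark", "ed"], ["loud", "ly"]]
```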
- A Survey on Prompting Techniques in LLMs. Prabin Bhandari. arXiv preprint arXiv:2312.03740, 2024.
Autoregressive Large Language Models have transformed the landscape of Natural Language Processing. The pre-train-and-prompt paradigm has replaced the conventional approach of pre-training and fine-tuning for many downstream NLP tasks. This shift has been possible largely due to LLMs and innovative prompting techniques. LLMs have shown great promise for a variety of downstream tasks owing to their vast parameters and huge datasets that they are pre-trained on. However, in order to fully realize their potential, their outputs must be guided towards the desired outcomes. Prompting, in which a specific input or instruction is provided to guide the LLMs toward the intended output, has become a tool for achieving this goal. In this paper, we discuss the various prompting techniques that have been applied to fully harness the power of LLMs. We present a taxonomy of existing literature on prompting techniques and provide a concise survey based on this taxonomy. Further, we identify some open problems in the realm of prompting in autoregressive LLMs which could serve as a direction for future research.
@article{bhandari2024surveypromptingtechniquesllms,
  title = {A Survey on Prompting Techniques in LLMs},
  author = {Bhandari, Prabin},
  journal = {arXiv preprint arXiv:2312.03740},
  year = {2024}
}
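As a small illustration of the kind of techniques such a taxonomy spans, three widely used prompt families are shown below as plain templates (zero-shot, few-shot, and chain-of-thought). The template wording is illustrative and not taken from the paper.

```python
# Zero-shot: the task instruction alone.
zero_shot = "Classify the sentiment of this review as positive or negative:\n{review}"

# Few-shot: a handful of worked examples precede the query.
few_shot = (
    "Review: Great battery life. Sentiment: positive\n"
    "Review: Broke after a week. Sentiment: negative\n"
    "Review: {review} Sentiment:"
)

# Chain-of-thought: ask the model to reason step by step before answering.
chain_of_thought = (
    "Q: {question}\n"
    "A: Let's think step by step."
)
```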
2023
- Are Large Language Models Geospatially Knowledgeable? Prabin Bhandari, Antonios Anastasopoulos, and Dieter Pfoser. In Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems, Nov 2023.
Despite the impressive performance of Large Language Models (LLM) for various natural language processing tasks, little is known about their comprehension of geographic data and related ability to facilitate informed geospatial decision-making. This paper investigates the extent of geospatial knowledge, awareness, and reasoning abilities encoded within such pretrained LLMs. With a focus on autoregressive language models, we devise experimental approaches related to (i) probing LLMs for geo-coordinates to assess geospatial knowledge, (ii) using geospatial and non-geospatial prepositions to gauge their geospatial awareness, and (iii) utilizing a multidimensional scaling (MDS) experiment to assess the models’ geospatial reasoning capabilities and to determine locations of cities based on prompting. Our results confirm that it does not only take larger but also more sophisticated LLMs to synthesize geospatial knowledge from textual information. As such, this research contributes to understanding the potential and limitations of LLMs in dealing with geospatial information.
@inproceedings{10.1145/3589132.3625625,
  author = {Bhandari, Prabin and Anastasopoulos, Antonios and Pfoser, Dieter},
  title = {Are Large Language Models Geospatially Knowledgeable?},
  year = {2023},
  isbn = {9798400701689},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3589132.3625625},
  doi = {10.1145/3589132.3625625},
  booktitle = {Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems},
  articleno = {75},
  numpages = {4},
  keywords = {large language models, geospatial knowledge, geospatial awareness, geospatial reasoning},
  location = {Hamburg, Germany},
  series = {SIGSPATIAL '23}
}
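A compact sketch of the MDS-style experiment described above: a matrix of pairwise city distances, as they might be elicited from an LLM, is embedded into two dimensions with metric MDS, and the recovered layout can then be compared against the cities' true positions. The distance values below are placeholders, not model outputs.

```python
import numpy as np
from sklearn.manifold import MDS

cities = ["New York", "Chicago", "Miami", "Seattle"]

# Symmetric pairwise distances (km) as an LLM might report them; placeholder values.
llm_distances = np.array([
    [0,    1150, 1760, 3870],
    [1150, 0,    1920, 2790],
    [1760, 1920, 0,    4400],
    [3870, 2790, 4400, 0   ],
], dtype=float)

# Embed into 2-D; the layout is recovered only up to rotation/reflection/translation,
# so comparing against true coordinates needs an alignment step (e.g. Procrustes).
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(llm_distances)
for city, (x, y) in zip(cities, coords):
    print(f"{city}: ({x:.0f}, {y:.0f})")
```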
- Trustworthiness of Children Stories Generated by Large Language Models. Prabin Bhandari and Hannah Brennan. In Proceedings of the 16th International Natural Language Generation Conference, Sep 2023.
Large Language Models (LLMs) have shown a tremendous capacity for generating literary text. However, their effectiveness in generating children’s stories has yet to be thoroughly examined. In this study, we evaluate the trustworthiness of children’s stories generated by LLMs using various measures, and we compare and contrast our results with both old and new children’s stories to better assess their significance. Our findings suggest that LLMs still struggle to generate children’s stories at the level of quality and nuance found in actual stories.
@inproceedings{bhandari-brennan-2023-trustworthiness,
  title = {Trustworthiness of Children Stories Generated by Large Language Models},
  author = {Bhandari, Prabin and Brennan, Hannah},
  booktitle = {Proceedings of the 16th International Natural Language Generation Conference},
  month = sep,
  year = {2023},
  address = {Prague, Czechia},
  publisher = {Association for Computational Linguistics},
  pages = {352--361}
}
- Estimation of Vehicular Velocity based on Non-Intrusive stereo camera. Bikram Adhikari and Prabin Bhandari. arXiv preprint arXiv:2304.05298, Apr 2023.
The paper presents a modular approach for estimating a leading vehicle’s velocity from a non-intrusive stereo camera: SiamMask tracks the leading vehicle, a kernel density estimate (KDE) smooths the distance predictions obtained from the disparity map, and LightGBM estimates the leading vehicle’s velocity. Our approach yields an RMSE of 0.416, outperforming the baseline RMSE of 0.582 for the SUBARU Image Recognition Challenge.
@article{adhikari2023estimation,
  title = {Estimation of Vehicular Velocity based on Non-Intrusive stereo camera},
  author = {Adhikari, Bikram and Bhandari, Prabin},
  journal = {arXiv preprint arXiv:2304.05298},
  year = {2023}
}
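An illustrative sketch of the modular pipeline in the abstract above: per-frame distances derived from the disparity map inside the tracked bounding box are smoothed by taking the mode of a kernel density estimate, and a LightGBM regressor maps a window of smoothed distances to a velocity. The feature choices, window length, and hyperparameters are assumptions; SiamMask tracking and disparity computation are outside this sketch.

```python
import numpy as np
from scipy.stats import gaussian_kde
from lightgbm import LGBMRegressor

def kde_distance(distances_in_box):
    """Smooth noisy per-pixel distance estimates for one frame by taking the KDE mode."""
    kde = gaussian_kde(distances_in_box)
    grid = np.linspace(distances_in_box.min(), distances_in_box.max(), 200)
    return grid[np.argmax(kde(grid))]

def window_features(smoothed_distances, fps=30, window=10):
    """Per-window features: frame-to-frame range rates plus the current distance."""
    d = np.asarray(smoothed_distances)
    rates = np.diff(d) * fps  # metres per second toward/away from the ego vehicle
    X = [np.concatenate([rates[i:i + window], [d[i + window]]])
         for i in range(len(rates) - window)]
    return np.array(X)

# In a real run, kde_distance would produce the per-frame distances fed to window_features.
# Shapes and labels below are stand-ins purely to show the training/prediction interface.
X_train = window_features(np.random.uniform(8, 30, size=300))
y_train = np.random.uniform(0, 25, size=len(X_train))  # stand-in velocity labels
model = LGBMRegressor(n_estimators=200)
model.fit(X_train, y_train)
velocity = model.predict(window_features(np.random.uniform(8, 30, size=50)))
```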