Subham Sah

Generating Analytic Specifications for Data Visualization from Natural Language Queries using Large Language Models

This work was presented at the NLVIZ Workshop at IEEE VIS 2024.

Released in 2024, this version of NL4DV enables developers to use a large language model (GPT) to translate a natural language (NL) query about a dataset into a relevant visualization. It also adds features such as multi-turn conversational interaction and ambiguity resolution.

We present a comprehensive text prompt that, given a tabular dataset and an NL query about it, generates an analytic specification comprising (detected) data attributes, (inferred) analytic tasks, and (recommended) visualizations. This specification captures key aspects of the query translation process, which makes the translation both explainable and debuggable. For instance, it maps each detected entity to the corresponding phrase in the input query and records the specific visual design principles that determined each visualization recommendation. Moreover, unlike prior LLM-based approaches, our prompt supports conversational interaction and ambiguity detection.

In our paper, we detail the iterative process of curating the prompt, present a preliminary performance evaluation using GPT-4, and discuss the strengths and limitations of LLMs at various stages of query translation. Check it out at https://nl4dv.github.io/nl4dv/
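To make the structure of an analytic specification more concrete, here is a minimal Python sketch (Python 3.9+). It is a hypothetical illustration, not NL4DV's actual output schema or API: the classes `DetectedAttribute`, `InferredTask`, and `AnalyticSpec`, their fields, the function `ungrounded_phrases`, and the movie-dataset column names are all assumptions made for this example. The function checks the explainability property described above, namely that every query phrase the specification cites actually appears in the query.

```python
from dataclasses import dataclass, field


@dataclass
class DetectedAttribute:
    # Hypothetical record: a dataset column and the query phrase that triggered it.
    name: str
    query_phrase: str


@dataclass
class InferredTask:
    # Hypothetical record: an analytic task, the attributes it uses, and its evidence phrase.
    task: str
    attributes: list[str]
    query_phrase: str


@dataclass
class AnalyticSpec:
    # Hypothetical container mirroring the three parts of the specification.
    query: str
    attributes: list[DetectedAttribute] = field(default_factory=list)
    tasks: list[InferredTask] = field(default_factory=list)
    # Each recommended visualization is paired with the design rationale behind it.
    visualizations: list[dict] = field(default_factory=list)
    # Ambiguous query phrase -> candidate attributes it could refer to.
    ambiguities: dict[str, list[str]] = field(default_factory=dict)


def ungrounded_phrases(spec: AnalyticSpec) -> list[str]:
    """Return cited phrases that do not occur (case-insensitively) in the query."""
    query = spec.query.lower()
    cited = [a.query_phrase for a in spec.attributes]
    cited += [t.query_phrase for t in spec.tasks]
    return [p for p in cited if p.lower() not in query]


# Usage: a hand-written spec for a query over a hypothetical movies table.
spec = AnalyticSpec(
    query="Show the relationship between budget and gross",
    attributes=[
        DetectedAttribute("Production Budget", "budget"),
        DetectedAttribute("Worldwide Gross", "gross"),
    ],
    tasks=[
        InferredTask("correlation", ["Production Budget", "Worldwide Gross"], "relationship"),
    ],
    visualizations=[
        {"mark": "point", "rationale": "two quantitative attributes suggest a scatterplot"},
    ],
    ambiguities={"gross": ["Worldwide Gross", "US Gross"]},
)
print(ungrounded_phrases(spec))  # prints []
```

Checking that cited phrases occur in the query is one simple sanity check on an LLM's output: a phrase that is not in the query suggests the model produced a mapping that the query does not support.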

Citation:

    @misc{sah2024nl4dvllm,
        title={Generating Analytic Specifications for Data Visualization from Natural Language Queries using Large Language Models}, 
        author={{Sah}, Subham and {Mitra}, Rishab and {Narechania}, Arpit and {Endert}, Alex and {Stasko}, John and {Dou}, Wenwen},
        year={2024},
        eprint={2408.13391},
        archivePrefix={arXiv},
        primaryClass={cs.HC},
        url={https://arxiv.org/abs/2408.13391}, 
        howpublished={Presented at the NLVIZ Workshop, IEEE VIS 2024}
    }