Wang Yuqi's Blog

Papers About Creating a Dataset or Benchmark

Benchmark

Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task

procedure


DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation


TimelineQA: A Benchmark for Question Answering over Timelines


Large Language Models of Code Fail at Completing Code with Potential Bugs

Dataset

A parallel corpus of Python functions and documentation strings for automated code documentation and code generation


Program Synthesis with Large Language Models


MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms


Table Pre-training: A Survey on Model Architectures, Pre-training Objectives, and Downstream Tasks


Tabular dataset 1


GitTables: A Large-Scale Corpus of Relational Tables


The Stack: 3 TB of permissively licensed source code


Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs


The Pile: An 800GB Dataset of Diverse Text for Language Modeling


The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only