Wang Yuqi's Blog

Literature Review

RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation

Fine-tuning BART to do data preparation tasks, including data cleaning, data transformation, entity resolution, and information extraction.
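To make the idea concrete, here is a minimal sketch of how a table row could be turned into a text sequence for a sequence-to-sequence model such as BART, with one value masked so the model learns to fill it in (a data-cleaning style task). The marker tokens `[A]`, `[V]`, and `[M]`, the helper name `serialize_tuple`, and the example row are illustrative assumptions for this note, not a format confirmed here from the paper.

```python
def serialize_tuple(row, mask_attr=None, mask_token="[M]"):
    """Flatten a row (dict of attribute -> value) into one text sequence.

    Assumed format: "[A] <attribute> [V] <value>" per column. If mask_attr
    is given, that column's value is replaced by mask_token.
    """
    parts = []
    for attr, value in row.items():
        shown = mask_token if attr == mask_attr else str(value)
        parts.append(f"[A] {attr} [V] {shown}")
    return " ".join(parts)


# Hypothetical example row; the values are made up for illustration.
row = {"name": "Michael Jordan", "expertise": "Machine Learning", "city": "Berkeley"}

# Source/target pair for a fill-in-the-missing-value training example.
source = serialize_tuple(row, mask_attr="city")
target = str(row["city"])

print(source)
print(target)
```

The (source, target) pair is what a fine-tuning loop would consume; the model and training code are left out on purpose, since they depend on the library and checkpoint used.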

GitTables: A Large-Scale Corpus of Relational Tables

Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes

architecture

Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation

Thinking: prompt an LLM to synthesize programs that call random functions, then run those programs to generate test cases.
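A rough sketch of this thought, assuming a reference solution is available to label the generated inputs: a small input-generator program (the kind of thing an LLM could be prompted to write) draws random inputs, the reference solution supplies the expected outputs, and a candidate solution is checked against them. All function names and the toy task (summing even numbers) are hypothetical and chosen only for illustration; this is not the procedure from the paper.

```python
import random


def reference_solution(xs):
    """Ground-truth implementation: sum of the even numbers in xs."""
    return sum(x for x in xs if x % 2 == 0)


def candidate_solution(xs):
    """Hypothetical generated candidate with a bug: it skips negative evens."""
    return sum(x for x in xs if x > 0 and x % 2 == 0)


def generate_input(rng):
    """Random input generator: a short list of small integers."""
    length = rng.randint(0, 8)
    return [rng.randint(-50, 50) for _ in range(length)]


def build_test_cases(n, seed=0):
    """Produce n (input, expected_output) pairs using the reference solution."""
    rng = random.Random(seed)  # seeded so the test suite is reproducible
    cases = []
    for _ in range(n):
        xs = generate_input(rng)
        cases.append((xs, reference_solution(xs)))
    return cases


def find_failures(candidate, cases):
    """Return the cases on which the candidate disagrees with the expected output."""
    return [(xs, expected) for xs, expected in cases if candidate(list(xs)) != expected]


cases = build_test_cases(200)
failures = find_failures(candidate_solution, cases)
print(f"{len(failures)} of {len(cases)} generated cases expose the bug")
```

A hand-written test such as `[2, 4, 5]` would pass the buggy candidate; the random generator reaches negative inputs that a small manual suite can easily miss.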

DocPrompting: Generating Code by Retrieving the Docs

Binding Language Models in Symbolic Languages

A Static Evaluation of Code Completion by Large Language Models

Textbooks Are All You Need