Hypothetical Document Embeddings (HyDE)
- Given a query, HyDE first zero-shot prompts an instruction-following language model (e.g. InstructGPT) to generate a hypothetical document that answers it.
- The generated document captures relevance patterns but is not a real document and may contain factual errors.
- Then, an unsupervised contrastively learned encoder (e.g. Contriever) encodes the document into an embedding vector.
- This vector identifies a neighborhood in the corpus embedding space, where similar real documents are retrieved based on vector similarity.
- This second step grounds the generated document in the actual corpus, with the encoder’s dense bottleneck filtering out incorrect details.
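The two steps above can be sketched end to end. Everything here is a stand-in: `generate_hypothetical_document` is a stub in place of the zero-shot LLM call, and `embed` is a hashed bag-of-words toy in place of a dense encoder like Contriever; only the control flow (embed the generated document rather than the query, then rank real documents by similarity) mirrors HyDE.

```python
import re
import zlib

import numpy as np

# Toy corpus of "real" documents to retrieve from.
CORPUS = [
    "Contriever is an unsupervised dense retriever trained contrastively.",
    "HyDE retrieves real documents near a generated document's embedding.",
    "The capital of France is Paris, a city on the Seine.",
]


def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy encoder: hashed bag-of-words mapped to a unit vector.
    A real system would use a dense encoder such as Contriever."""
    vec = np.zeros(dim)
    for tok in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(tok.encode()) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec


def generate_hypothetical_document(query: str) -> str:
    """Stand-in for step 1, the zero-shot LLM call; a real system would
    prompt an instruction-following model with something like
    'Write a passage that answers the question: {query}'."""
    return f"A passage answering: {query}. Paris is the capital of France."


def hyde_retrieve(query: str, corpus: list[str], top_k: int = 1) -> list[str]:
    """Step 2: embed the *generated* document rather than the query,
    then return the nearest real documents by cosine similarity."""
    hypo = generate_hypothetical_document(query)
    q_vec = embed(hypo)
    scores = np.stack([embed(d) for d in corpus]) @ q_vec
    return [corpus[i] for i in np.argsort(-scores)[:top_k]]
```

Because the generated passage shares vocabulary (and, with a real encoder, semantics) with the matching real document, `hyde_retrieve("What is the capital of France?", CORPUS)` surfaces the Paris sentence even though the fake passage itself never enters the index.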
Based on the example from llama-index
Reference: https://github.com/run-llama/llama_index/blob/main/docs/examples/query_transformations/HyDEQueryTransformDemo.ipynb
Failure cases
- HyDE may mislead retrieval when the query is ambiguous and gets misinterpreted without additional context.
- For open-ended queries, the generated document may bias retrieval toward one particular answer.
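The first failure mode can be reproduced in a toy setting: when a query is ambiguous ("Java" the island vs. the programming language), the generator commits to one reading, and retrieval then follows that reading rather than the user's intent. The corpus, the hypothetical passage, and the hashed bag-of-words `embed` stand-in are all invented for illustration.

```python
import re
import zlib

import numpy as np


def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy hashed bag-of-words encoder standing in for a dense model."""
    vec = np.zeros(dim)
    for tok in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(tok.encode()) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec


CORPUS = [
    "Java is an island of Indonesia, home to the city of Jakarta.",
    "Java is an object-oriented programming language that runs on the JVM.",
]

# A travel-minded user asks "Tell me about Java", meaning the island,
# but the generator, seeing no context, writes about the language:
hypothetical = (
    "Java is a popular programming language used to build applications "
    "that run on the Java Virtual Machine."
)

scores = [float(embed(doc) @ embed(hypothetical)) for doc in CORPUS]
best = CORPUS[int(np.argmax(scores))]  # retrieval inherits the wrong reading
```

Because the hypothetical document, not the query, drives retrieval, any misreading at generation time is baked into the search results.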
Reference: https://boston.lti.cs.cmu.edu/luyug/HyDE/HyDE.pdf
#rag #llm #prompting