Join the SAA Web Archiving Section on Friday, June 14, 12-1 pm ET for a discussion with Matteo Cargnelutti and Kristi Mukk from the Harvard Library Innovation Lab about web archiving and AI!
Description:
Can the techniques used to ground and augment the responses of Large Language Models (LLMs) also help explore web archive collections? That question led us to develop and release WARC-GPT, an experimental open-source Retrieval Augmented Generation (RAG) tool for exploring collections of WARC files using AI. WARC-GPT functions as a highly customizable boilerplate that the web archiving community can use to explore the intersection of web archiving and AI. Specifically, WARC-GPT is a RAG pipeline: it builds a knowledge base from a set of WARC files, then draws on that knowledge base to help answer questions posed to an LLM of the user's choosing. In this session, we will demo the tool and explain how it works, discuss our experience testing it so far, and share our perspective on how web archivists can respond to this AI moment.
Blog post: https://lil.law.harvard.edu/blog/2024/02/12/warc-gpt-an-open-source-tool-for-exploring-web-archives-with-ai/
WARC-GPT on GitHub: https://github.com/harvard-lil/warc-gpt
Registration:
Please register in advance for this meeting: https://harvard.zoom.us/meeting/register/tJElceGrrDwpGdBcPae2eMWwrv4j_TqDmLcT
After registering, you will receive a confirmation email containing information about joining the meeting.
This presentation will be recorded, and the recording link will be made available afterward.
Allison Fischbach, MLIS (she/her)
Digital Archivist
Alan Mason Chesney Medical Archives
Johns Hopkins University and Medicine
afischbach@jhmi.edu
410-735-6782
medicalarchives.jhmi.edu