Improving semantic search

Claude skills, Python notebooks, new research directions

Videlicet has changed a lot in the past few months! This week, we continued fine-tuning our search, experimenting with Claude skills, and building Python notebooks.

We have an evaluation process to provide feedback on computer-generated transcriptions, but what about our search results? With an LLM that has not been trained for this task, semantic search returns too many results, and they are too broad to be useful to researchers. We are narrowing the scope by training the LLM to think like a historian!

This week, we explored Claude skills and wrote Python notebooks to train the LLM to sort the pages returned from a semantic search into direct matches, somewhat related pages, and unrelated pages. We also set up a supervised fine-tuning workflow to speed up the training process.
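To make the three-way labeling concrete, here is a minimal sketch of how labeled search results could be turned into prompt/completion pairs for supervised fine-tuning. It is an illustration under stated assumptions, not our actual notebook code: the label names, the prompt wording, the `make_example` and `to_jsonl` helpers, and the sample rows are all hypothetical.

```python
import json

# The three relevance categories described above.
# The exact label strings are an assumption for this sketch.
LABELS = ("direct_match", "somewhat_related", "unrelated")


def make_example(query, page_text, label):
    """Build one prompt/completion pair from a labeled search result."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    prompt = (
        f"Research question: {query}\n"
        f"Page transcription: {page_text}\n"
        "Is this page a direct match, somewhat related, or unrelated?"
    )
    return {"prompt": prompt, "completion": label}


def to_jsonl(examples):
    """Serialize examples as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(e, ensure_ascii=False) for e in examples)


# Made-up rows for illustration only; real labels would come from a historian's review.
rows = [
    ("debt suits paid in tobacco",
     "Ordered that the defendant pay the plaintiff 500 pounds of tobacco.",
     "direct_match"),
    ("debt suits paid in tobacco",
     "The court appoints surveyors of the highways for the ensuing year.",
     "somewhat_related"),
    ("debt suits paid in tobacco",
     "A list of marriages recorded in the parish register.",
     "unrelated"),
]

examples = [make_example(*row) for row in rows]
jsonl = to_jsonl(examples)
print(jsonl)
```

Keeping the labels in one fixed tuple means a mistyped label fails loudly instead of slipping into the training set, which matters when the examples are labeled by hand.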

I also began the long-overdue endeavor of updating the documentation on our website, Videlicet Help! This is an ongoing process, so check back regularly to learn about our new functions.

Lastly, I’ve been brainstorming ideas for my Master’s research project. For my undergraduate honors thesis, I explored 1700s Virginia county court records at the Library of Virginia. Those court cases raised more questions than they answered! However, the records are primarily accessible via microfilm, making them hard to search. I’m in the process of contacting Virginia counties and collaborating with them to digitize the sources, transcribe them, and make them searchable through Videlicet. I’m looking forward to working with these records and discovering new history!

As always, we’re looking for collaborators from archives, academic labs, and faculty interested in making their historical sources searchable.
