Problem/Idea in 2-3 sentences

When editing a video, editors spend a significant share of their time searching for B-roll footage, which is both slow and tedious. They typically index footage first by tagging clips and filtering out low-quality ones, a process that consumes about one-third of the total editing time. For instance, for a 4-minute video, 3-4 out of 10 days of editing are spent on indexing, according to Michael Pikelj (interviewee). Even when editors know that a specific shot or media asset exists, they often still have to hunt for it in their folder structure. The problem is most acute for unscripted videos, such as documentaries, event aftermovies, and vlogs, and for companies with large footage archives, like news agencies.

Our idea is to develop a tool that allows users to search through their image and video archives semantically using natural language. Users can describe a) the content, b) the type of shot (e.g., panning camera, close-up), or c) the emotions conveyed in the clip. Additionally, users can search for text, logos, and spoken words within the footage.
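
To make the search idea concrete, here is a minimal sketch of ranking clips against a natural-language query by cosine similarity. It is only an illustration under stated assumptions: the clip names, the descriptions, and the functions `embed`, `cosine`, and `search` are all hypothetical, and the word-count `embed` is a toy stand-in for a learned text/video embedding model, so it matches shared words rather than meaning.

```python
import math
from collections import Counter

# Hypothetical archive: clip file name -> text description (e.g. from tagging).
CLIPS = {
    "clip_001.mp4": "close-up of a chef plating food",
    "clip_002.mp4": "panning shot of a city skyline at sunset",
    "clip_003.mp4": "crowd cheering at a concert",
}

def embed(text: str) -> Counter:
    """Toy stand-in for a learned embedding: lowercase word counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def search(query: str, clips: dict, top_k: int = 3) -> list:
    """Return up to top_k clip names, best match first, dropping zero scores."""
    q = embed(query)
    scored = [(cosine(q, embed(desc)), name) for name, desc in clips.items()]
    scored.sort(reverse=True)
    return [name for score, name in scored[:top_k] if score > 0]

results = search("panning shot of skyline", CLIPS)
```

In a real tool, `embed` would be replaced by a model that maps both queries and footage into a shared vector space; the ranking step would stay essentially the same.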

1-3 year vision: Users can talk to the app to give it context, such as who the people regularly appearing in the footage are and how different pieces of footage relate to each other. The AI model uses this information to return even better results.

2-5 year vision: The tool can create complete edits from a prompt. At first these would be TikTok-length videos, then 1-2 minute videos (e.g. event aftermovies), and finally videos of 5 minutes or more.

Why are/were we excited about it

  1. One of us has experienced this problem firsthand and is an end-user (Dom).
  2. Both user interviews confirmed a need for this product.
  3. Medium to strong product-founder fit: There is a machine learning problem to be solved that requires research, and we are both excited about the technology.
  4. There is a vision for a much larger product (storytelling through AI), with semantic video search as the beachhead market. We also identified many adjacent markets and business opportunities that could open up if we solve semantic video search.

End user

The end user is an editor working alone or in a small team. He/she either works on a single project over a longer period or regularly works on smaller projects lasting 1-4 weeks on average.

User groups we thought of: