This project was created for the Software Engineering Project course at the University of Helsinki.
The project introduces an AI-powered memory assistant that integrates with Even Realities G1 smart glasses. The core concept is to let the glasses record work meetings and show real-time, contextually relevant information on the glasses' display during ongoing conversations. Key data such as budgets, deadlines, and action items are automatically extracted from recorded meetings and stored in a vector database, which serves as the knowledge source for information retrieval.
- Python
- FastAPI
- PostgreSQL
- Gemini
- Firebase
Audio is streamed to Google Cloud Speech-to-Text, which produces an accurate transcript. The transcript is forwarded to Gemini Live, which makes tool calls to query the vector database when it determines that stored information may be relevant to the ongoing conversation. The vector database results, the transcript, and Gemini Live's context are passed to a Gemini model that decides whether the retrieved information is useful. If it is, the model generates a concise one-sentence summary and sends it to the client, where it is displayed on the glasses. After the meeting ends, the transcript is sent to a Gemini model that generates a summary of the conversation and extracts key information. The extracted information is embedded as vectors and stored in the vector database for Gemini Live to use in future conversations.
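The retrieval step above boils down to cosine similarity between an embedded query and the stored meeting facts. A minimal, self-contained sketch of that lookup is below; the real backend uses an actual embedding model and a PostgreSQL vector store, so the tiny fixed-vocabulary `embed` here is only a stand-in to make the example runnable, and the sample facts and function names are illustrative, not taken from the codebase:

```python
import math

# Toy vocabulary; a real embedding model replaces this entirely.
VOCAB = ["budget", "deadline", "action", "q3", "friday", "backlog"]

def embed(text: str) -> list[float]:
    """Bag-of-words stand-in for a real embedding model, L2-normalized."""
    words = text.lower().split()
    vec = [float(words.count(w)) for w in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    """Dot product of two unit vectors = cosine similarity."""
    return sum(x * y for x, y in zip(a, b))

def top_k(query: str, store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k stored texts most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda text: cosine(q, store[text]), reverse=True)
    return ranked[:k]

# Facts extracted from earlier meetings (illustrative examples).
store = {t: embed(t) for t in [
    "budget for Q3 is 40k euros",
    "deadline for the demo is Friday",
    "action item: Alice updates the backlog",
]}

print(top_k("what was the q3 budget", store, k=1))
```

The top-ranked facts, together with the live transcript, are what the downstream Gemini model judges for relevance before anything is pushed to the glasses.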
Google products were used at the client's request.
At first, for the MVP, we started with just a speech-to-text model returning the spoken words as text to the glasses.
Later, when we started implementing the actual product, we chose the Gemini Live native-audio model for processing raw audio, generating the transcript, and making tool calls. We figured an audio-native model would have the lowest latency because it can generate the transcript and make tool calls concurrently. Simplicity was also a big reason.
However, while Gemini Live was good at making tool calls and understanding the audio, the transcript it produced was utter garbage. This was a major setback, as our application relies heavily on the transcript for making decisions and storing information in the vector database. This is why we switched to a dedicated speech-to-text service, which produces a far more accurate transcript. The STT-produced transcript is then forwarded to Gemini Live. This adds latency, but the improved transcript enabled our software to work properly. We also considered feeding raw audio to both speech-to-text and Gemini Live for lower latency, but since audio tokens for Gemini Live are much more expensive than text tokens, we decided to feed the transcript to Gemini Live.
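A rough back-of-envelope illustrates the token-cost gap. All constants below are assumptions worth re-checking against current Gemini pricing docs: roughly 32 tokens per second of audio, ~4 characters per English text token, and typical conversational speech of ~150 words per minute at ~5 characters per word:

```python
AUDIO_TOKENS_PER_SEC = 32   # approximate Gemini audio tokenization rate (assumption, verify)
CHARS_PER_TEXT_TOKEN = 4    # common rule of thumb for English text (assumption)
WORDS_PER_MIN = 150         # typical conversational speech rate (assumption)
CHARS_PER_WORD = 5          # average English word length incl. space (assumption)

def audio_tokens(minutes: float) -> int:
    """Tokens consumed if raw audio is fed to the model."""
    return int(minutes * 60 * AUDIO_TOKENS_PER_SEC)

def transcript_tokens(minutes: float) -> int:
    """Tokens consumed if only the STT transcript is fed instead."""
    return int(minutes * WORDS_PER_MIN * CHARS_PER_WORD / CHARS_PER_TEXT_TOKEN)

minutes = 30  # a typical meeting
print(audio_tokens(minutes))       # 57600
print(transcript_tokens(minutes))  # 5625
```

Under these assumptions, sending raw audio to Gemini Live costs on the order of 10x more tokens than sending the transcript, which is why we accepted the extra STT hop.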
- Flutter
- Even Realities G1 (communication with glasses)
We chose WebSockets instead of WebRTC for audio streaming because of ease of implementation.
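Streaming over a WebSocket means slicing the raw microphone PCM into fixed-duration binary frames before sending. A small sketch of that chunking is below; the 16 kHz / 16-bit / 100 ms values are illustrative defaults, not necessarily what the app actually uses:

```python
def pcm_frames(pcm: bytes,
               sample_rate: int = 16_000,  # illustrative; check the app's actual rate
               frame_ms: int = 100,        # illustrative frame duration
               sample_width: int = 2) -> list[bytes]:
    """Split raw mono PCM into fixed-duration frames for WebSocket streaming.

    Frame size in bytes = samples per frame * bytes per sample; the last
    frame may be shorter if the buffer doesn't divide evenly.
    """
    frame_bytes = sample_rate * frame_ms // 1000 * sample_width
    return [pcm[i:i + frame_bytes] for i in range(0, len(pcm), frame_bytes)]

# One second of 16 kHz, 16-bit mono silence -> ten 100 ms frames of 3200 bytes.
frames = pcm_frames(b"\x00" * 32_000)
print(len(frames), len(frames[0]))
```

Each frame is then sent as a binary WebSocket message to the backend, which forwards the audio to Speech-to-Text.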
The official Even Realities demo app was incomplete, and we had problems getting it to connect to the glasses. We use a community fork of even_realities_g1 for BLE communication with the glasses. Since the package is probably not actively maintained, we vendored our fork as a local dependency (packages/even_realities_g1) to allow fixes and modifications.
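In Flutter, a vendored local package like this is wired in via a path dependency in `pubspec.yaml`; the snippet below shows the standard form using the path mentioned above:

```yaml
dependencies:
  even_realities_g1:
    path: packages/even_realities_g1
```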
- The built-in microphone of the glasses isn't good enough to record both discussion participants' voices, so we decided to use the phone's microphone to improve the accuracy of the audio recording.
- The documentation and the glasses' internal functionality aren't described well by Even Realities, so we rely on community findings and reverse engineering.
- Connecting to the glasses can be a headache the first time. Once a connection has been established, reconnecting is much easier.
- Developed with only Android in mind; iOS might work but has not been tested accordingly.
- Connecting to the G1 glasses still sometimes causes problems.

