Talking to Sylva | Sylva Help Center

Typing Messages

Type your message in the input bar at the bottom of any thread. You can:

Press Enter to send (or switch to Ctrl+Enter in settings if you prefer Enter for newlines)
Use the formatting toolbar for bold, italic, lists, and emoji
Write multi-line messages freely

Input bar with message context

When you're ready, hit Enter or click the send button:

Send your message

The formatting toolbar sits above the input area with buttons for bold, italic, lists, and more.

Searching Your Meeting Transcripts

When you ask Sylva a question in any thread — including the Main Thread and every other conversation — it automatically searches across all of your completed meeting transcripts. Meetings don't need to be linked to the current thread, tagged, or attached in any way for their content to surface. You don't need to tell Sylva where to look. Ask your question naturally and Sylva finds the right context for you.

This means a question you ask in a project-specific thread can still pull excerpts from an unrelated one-on-one or all-hands meeting when their content is relevant. Linking a meeting to a thread is useful for organization, but it has no effect on search: Sylva searches your entire transcript library regardless.

The search works by semantic similarity, meaning Sylva matches the meaning of your question against the content of your transcripts rather than looking for exact keywords. If you ask "What did the team decide about the launch date?", Sylva finds relevant excerpts even if no one in the meeting used the words "decide" or "launch date" verbatim. It looks across every completed meeting in your workspace (one-on-ones, team standups, client calls, all-hands, and any other meeting where a transcript was captured), so you get the broadest possible context.
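
To make the idea of matching by meaning concrete, here is a minimal sketch of similarity-based ranking. It assumes each excerpt and the question have already been turned into numeric vectors (embeddings). The three-dimensional vectors, the excerpt texts, and the function names are all invented for illustration; how Sylva actually computes or stores embeddings is not described in this article.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: real ones come from a model and have far more dimensions.
excerpts = {
    "We agreed to ship on the 29th instead.": [0.9, 0.1, 0.2],
    "Lunch order for Friday is pizza.": [0.1, 0.9, 0.1],
}

# Stands in for "What did the team decide about the launch date?"
question_vector = [0.8, 0.2, 0.3]

def rank_excerpts(query, candidates):
    """Return excerpt texts ordered from most to least similar to the query."""
    return sorted(
        candidates,
        key=lambda text: cosine_similarity(query, candidates[text]),
        reverse=True,
    )

ranked = rank_excerpts(question_vector, excerpts)
print(ranked[0])
```

In this toy data the shipping excerpt ranks first even though it shares no keywords with the question, which is the behavior the paragraph above describes.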

When Sylva finds relevant transcript excerpts, it weaves them directly into its response with source attribution showing the meeting title. Sylva works from the actual words spoken in your meetings, not a summary or approximation, so each answer traces back to a real conversation. A typical exchange might look like this:

You type: "What did Sarah say about the Q3 timeline?"

Sylva responds:

Based on your product sync from Tuesday, Sarah mentioned that the Q3 timeline needs to shift by two weeks due to the dependency on the API redesign. She suggested moving the beta launch from July 15 to July 29 and asked Marcus to update the roadmap deck before the board meeting.

The more meetings you have transcribed, the richer the context Sylva can draw from when answering your questions. This search runs automatically — there's nothing to enable or configure.

Chat response with meeting transcript context

Voice Input

Click the microphone button in the input bar to speak instead of type.

Start voice input

By default, voice audio is recorded at 32 kbps and automatically converted to WebM/Opus format before upload. At this bitrate voices sound full, with good clarity on consonants and vocal inflection, which makes recordings easier to review during playback. Each recording can be up to 5 minutes (300 seconds) long. If you reach the 5-minute mark, Sylva automatically stops the recording and submits it for transcription, and a notification lets you know the limit was reached. You won't lose any audio; everything captured up to that point proceeds through the normal pipeline.

After Sylva transcribes your speech, it runs the transcript through a smart format pipeline before you see it. This pipeline removes filler words (like "um" and "uh"), applies punctuation and capitalization, processes any voice commands you spoke (such as "new paragraph" or "scratch that"), and formats lists — so the text that appears in your input bar reads like polished, typed text rather than a raw transcription. You then review the cleaned-up result and press Send when you're satisfied. Sylva never auto-sends voice messages, so you always have a chance to edit before sending.
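
The cleanup steps described above can be pictured as a chain of small text transformations. The sketch below is illustrative only: the step order, the function names, the filler list, and the handful of punctuation commands are assumptions, not Sylva's actual implementation. It is also deliberately naive, since it would treat the word "period" in "trial period" as a command.

```python
import re

# Assumed filler words; \b keeps words like "umbrella" intact.
FILLERS = re.compile(r"\b(?:um|uh)\b[,.]?\s*", flags=re.IGNORECASE)

# A small, illustrative subset of spoken punctuation commands.
PUNCTUATION = {
    "full stop": ".",
    "period": ".",
    "comma": ",",
    "question mark": "?",
    "exclamation point": "!",
}

# Try longer phrases first; the leading \s* attaches the symbol to the previous word.
_alternatives = "|".join(
    re.escape(phrase) for phrase in sorted(PUNCTUATION, key=len, reverse=True)
)
COMMANDS = re.compile(r"\s*\b(" + _alternatives + r")\b", flags=re.IGNORECASE)

def remove_fillers(text: str) -> str:
    """Strip filler words and any whitespace that follows them."""
    return FILLERS.sub("", text).strip()

def apply_punctuation_commands(text: str) -> str:
    """Replace spoken punctuation with the matching symbol."""
    return COMMANDS.sub(lambda m: PUNCTUATION[m.group(1).lower()], text)

def capitalize_sentences(text: str) -> str:
    """Uppercase the first letter of the text and of each new sentence."""
    return re.sub(
        r"(^|[.?!]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )

def smart_format(text: str) -> str:
    """Run each step in order, feeding the output of one into the next."""
    for step in (remove_fillers, apply_punctuation_commands, capitalize_sentences):
        text = step(text)
    return text

raw = "um so the launch moves to july period uh any questions question mark"
print(smart_format(raw))
```

Keeping each rule as its own function means a step can be reordered or tested in isolation, which is one reasonable way to structure a pipeline like this.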

All voice recordings are automatically converted to WebM/Opus format before upload. Opus is a modern audio codec well suited to speech, and WebM is the container that holds it; together they produce much smaller files than raw browser recordings while keeping speech clear across all major browsers. At the default 32 kbps, voices sound clear and natural during playback, which makes it easy to review recordings and catch nuances in dictated content, while files stay compact (roughly 2.4 MB per 10 minutes of audio) and upload quickly. The conversion happens transparently, so you don't need to do anything.

Pauses During Recording

You don't need to speak in one continuous stream. When you pause to collect your thoughts — whether for a couple of seconds or longer — the recording keeps running and the audio stream stays intact. Sylva doesn't interpret silence as a signal to stop; it waits for you to either resume speaking or explicitly end the recording. This means you can pause mid-sentence, think through what you want to say next, and pick back up without losing context or creating a fragmented transcript. The final transcription treats the entire recording as one continuous piece of audio regardless of any gaps in speech.

Your workspace also has a configurable recording duration set by an admin, which may be shorter than the 5-minute hard cap. The default limit is 2 minutes, but your admin may have set it to 5 minutes, 10 minutes, or 20 minutes — though any setting above 5 minutes is effectively capped at the 300-second maximum. Check with your admin if you're unsure what your workspace allows. If you hit the configured duration while speaking, Sylva automatically stops the recording and submits it for transcription. You won't lose any audio — everything captured up to that point is transcribed and placed in the input bar as usual. If you have more to say, start a new recording.

You can choose your preferred voice mode in Settings > AI & Voice:

Push-to-talk — Hold the mic button (or spacebar) while speaking
Toggle — Click once to start recording, click again to stop
Both — Use either method

Recording Quality and File Size

You can control the audio bitrate of your voice recordings in Settings > AI & Voice under Recording quality. This setting determines how much detail is captured when encoding your audio to WebM/Opus format — and directly affects how much of your audio storage quota each recording consumes.

Three options are available:

Low (13 kbps) — Optimized for voice. Uses roughly 1 MB per 10 minutes of recording. Works well for dictation and conversational input where storage efficiency is the priority. Deepgram's speech-to-text engine is designed for speech-frequency audio, so even this bitrate captures everything the transcription model needs.
Standard (32 kbps) — The default. Provides better audio quality than the Low setting — voices sound fuller and more natural during playback, with improved clarity on consonants and vocal inflection. Uses roughly 2.4 MB per 10 minutes. Good for everyday use where you want higher-quality playback without a large storage footprint.
High (128 kbps) — Near-lossless voice quality. Uses roughly 10 MB per 10 minutes. Choose this if you plan to download and listen back to your recordings and want the best fidelity.

If you frequently record long voice notes, use Low quality to stay within your audio quota. If you dictate in a noisy environment or use specialized terminology and want the most audio detail available, use High quality. Even the Low setting captures enough detail for reliable speech-to-text in most conditions, so the choice is primarily about storage and playback fidelity rather than transcription performance.

Your workspace has an audio storage quota set by your admin (the default varies by plan). You can see how much quota you've used in Settings > Admin. Choosing a lower bitrate stretches your quota significantly — a 10-minute recording at Low quality is roughly 1/10th the size of the same recording at High quality.
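
The per-setting sizes quoted above follow from simple arithmetic: bitrate multiplied by duration, divided by 8 to convert bits to bytes. The sketch below reproduces those estimates. The function name is hypothetical, and the calculation ignores container overhead and the fact that real encoders vary their bitrate, so actual files will differ slightly.

```python
def estimated_size_mb(bitrate_kbps: float, duration_seconds: float) -> float:
    """Approximate encoded size in megabytes (1 MB = 1,000,000 bytes)."""
    bits = bitrate_kbps * 1000 * duration_seconds
    return bits / 8 / 1_000_000

TEN_MINUTES = 600  # seconds

for name, kbps in [("Low", 13), ("Standard", 32), ("High", 128)]:
    size = estimated_size_mb(kbps, TEN_MINUTES)
    print(f"{name} ({kbps} kbps): about {size:.1f} MB per 10 minutes")
```

The estimates come out at roughly 1.0, 2.4, and 9.6 MB, matching the figures in the list, and 13 kbps divided by 128 kbps is about one tenth, which is where the Low versus High comparison comes from.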

Recording quality setting

Uploading Audio Files for Transcription

You can also upload pre-recorded audio files for transcription through the upload dialog. If the file you upload is not already in WebM format, Sylva automatically converts it to WebM/Opus before uploading. A progress bar appears showing both stages — first the conversion progress, then the upload progress — so you always know where things stand. The conversion uses the same speech-optimized encoding as live recordings, keeping file sizes small and transcription quality high.

Audio conversion progress bar during upload

Max Recording Duration

Sylva enforces a time limit on voice recordings so that long recordings don't consume unnecessary resources or exceed transcription limits. There are two layers to this:

Hard maximum of 300 seconds (5 minutes) — This is a system-wide cap that applies to all workspaces regardless of admin configuration. The server-side transcription pipeline supports recordings up to this full 300-second length without timeout, so recordings at or near the maximum reliably complete transcription. When a recording reaches 5 minutes, Sylva automatically stops it without any user action and submits it for transcription.
Workspace limit (admin-configurable) — Your admin can set a shorter limit for your workspace. If the workspace limit is lower than 5 minutes, the recording stops at the workspace limit. If the workspace limit is set higher than 5 minutes (e.g., 10 or 20 minutes) or unlimited, the 300-second hard cap still applies.

In either case, the auto-stop is seamless — the recording ends, a notification appears letting you know the limit was reached, and your audio is sent for transcription. You won't lose any audio — everything captured up to that point is transcribed and placed in the input bar as usual. If you have more to say, start a new recording.

The default limit for new Sylva instances is 2 minutes.
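
The interaction between the two layers amounts to taking the smaller of the workspace limit and the hard cap. The sketch below is a minimal illustration of that rule; the function and constant names are hypothetical, and `None` is used here to stand for a workspace with no admin-configured limit.

```python
from typing import Optional

HARD_CAP_SECONDS = 300          # system-wide maximum (5 minutes)
DEFAULT_WORKSPACE_MINUTES = 2   # default for new instances

def effective_limit_seconds(
    workspace_minutes: Optional[int] = DEFAULT_WORKSPACE_MINUTES,
) -> int:
    """Return the recording limit that actually applies, in seconds.

    None represents an unlimited workspace setting; the hard cap still applies.
    """
    if workspace_minutes is None:
        return HARD_CAP_SECONDS
    return min(workspace_minutes * 60, HARD_CAP_SECONDS)

for setting in (None, 2, 5, 10, 20):
    label = "Unlimited" if setting is None else f"{setting} minutes"
    print(f"{label}: recording stops at {effective_limit_seconds(setting)} seconds")
```

Under these assumptions only the 2-minute setting stops a recording before the hard cap; every other setting resolves to 300 seconds.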

Configuring the Limit (Admin)

Admins can change the maximum recording duration in Settings > Admin under the Voice section. Open the Voice Max Recording Minutes dropdown and choose from:

Unlimited — No admin-configured time limit, but the 300-second hard cap still applies.
2 minutes — The default.
5 minutes
10 minutes
20 minutes

Settings above 5 minutes are effectively capped at 300 seconds due to the server-side transcription limit. The change takes effect immediately for all users in your workspace.

Max recording duration setting

Voice Commands

While dictating, you can speak punctuation and formatting commands naturally. Sylva recognizes them during the smart format pipeline and converts them to the correct symbols before you see the final text:

Punctuation:

"period" / "full stop", "comma", "question mark", "exclamation point", "colon", "semicolon"
"apostrophe", "ellipsis" / "dot dot dot"
"dash", "em dash"

Quotes & Brackets:

"quote" / "quotation mark" / "end quote" (toggles open/close)
"open paren" / "close paren", "open bracket" / "close bracket", "open brace" / "close brace"

Symbols:

"at sign", "ampersand", "hashtag" / "hash", "asterisk", "underscore"
"dollar sign", "percent" / "percent sign"
"plus sign", "minus sign", "equals" / "equals sign"
"slash", "backslash"
"trademark" (TM), "copyright" (C), "degree" / "degree sign" (deg)

Line Breaks:

"new line" / "begin a new line" — starts a new line
"new paragraph" / "start a new paragraph" / "skip a line" — starts a new paragraph with a blank line

Corrections:

"scratch that" — removes the previous sentence or clause
"actually [word]" — replaces the preceding word with the one you say after "actually"

Smart Formatting

Beyond explicit voice commands, the smart format pipeline applies several automatic formatting rules to your transcript before it appears in the input bar. All of this processing happens after transcription and before you see the text — so what lands in the input bar is already clean and polished:

Filler removal — Words like "um", "uh", "er", and "hmm" are stripped out automatically. You don't need