Transcribe

The transcribing feature uses WhisperX (an open-source wrapper around Whisper that adds start and end times for each word) to transcribe audio or video. Transcribing content produces a Transcription object containing the full text along with character-level, word-level, and sentence-level timestamps. Content must be transcribed before it can be clipped.


Usage

    from clipsai import Transcriber

    transcriber = Transcriber()
    # transcribe() returns a Transcription object.
    transcription = transcriber.transcribe(
        audio_file_path="/abs/path/to/video.mp4"
    )
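To inspect the returned object quickly, you can print each sentence with its time range. The sketch below relies only on the `sentences`, `start_time`, `end_time`, and `text` properties documented for the Transcription and Sentence classes; the helper name `describe_sentences` is hypothetical and not part of clipsai.

```python
def describe_sentences(transcription) -> list[str]:
    """Return one '[start - end] text' line per sentence (hypothetical helper)."""
    lines = []
    for sentence in transcription.sentences:
        # start_time and end_time are expressed in seconds
        lines.append(
            f"[{sentence.start_time:.2f}s - {sentence.end_time:.2f}s] {sentence.text}"
        )
    return lines


# Usage with the `transcription` returned by transcriber.transcribe(...):
# for line in describe_sentences(transcription):
#     print(line)
```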

Transcriber Class

A class for transcribing audio or video using WhisperX.

Methods

  • Name
    transcribe
    Type
    -> Transcription
    Description

    Transcribes an audio or video file.

Required Parameters

  • Name
    audio_file_path
    Type
    string
    Description

    Absolute path to the audio or video file to transcribe.

Optional Parameters

  • Name
    iso6391_lang_code
    Type
    string (default: None)
    Description

    ISO 639-1 code of the language to transcribe the media in. Defaults to None, in which case the media's language is detected automatically.

  • Name
    batch_size
    Type
    int (default: 16)
    Description

    Batch size passed to WhisperX. Reduce it if GPU memory is limited.

  • Name
    detect_language
    Type
    -> string
    Description

    Detects the language of an audio or video file.

Required Parameters

  • Name
    audio_file_path
    Type
    string
    Description

    Absolute path to the audio or video file whose language should be detected.

Optional Parameters

  • Name
    iso6391_lang_code
    Type
    string (default: None)
    Description

    ISO 639-1 code of the language to transcribe the media in. Defaults to None, in which case the media's language is detected automatically.

  • Name
    batch_size
    Type
    int (default: 16)
    Description

    Batch size passed to WhisperX. Reduce it if GPU memory is limited.
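The `batch_size` guidance suggests lowering the batch size when GPU memory runs out. One way to automate that is to retry with a halved batch size after an out-of-memory failure. This is a minimal sketch under stated assumptions: `transcribe_with_backoff` is a hypothetical helper, not part of clipsai, and it assumes an out-of-memory failure surfaces as a `RuntimeError` whose message contains "out of memory" (PyTorch's CUDA out-of-memory error is a `RuntimeError` subclass with such a message).

```python
def transcribe_with_backoff(transcriber, audio_file_path: str,
                            batch_size: int = 16, min_batch_size: int = 1):
    """Call transcriber.transcribe, halving batch_size after each OOM error.

    Assumption: GPU out-of-memory errors are raised as RuntimeError with
    "out of memory" in the message; any other error is re-raised unchanged.
    """
    while True:
        try:
            return transcriber.transcribe(
                audio_file_path=audio_file_path, batch_size=batch_size
            )
        except RuntimeError as err:
            is_oom = "out of memory" in str(err).lower()
            if not is_oom or batch_size <= min_batch_size:
                raise  # not an OOM error, or the batch cannot shrink further
            batch_size = max(min_batch_size, batch_size // 2)


# Usage (hypothetical):
# transcription = transcribe_with_backoff(Transcriber(), "/abs/path/to/video.mp4")
```

Halving keeps the number of retries logarithmic in the starting batch size, so a failing run gives up quickly instead of stepping down one batch at a time.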


Transcription Class

The Transcription class offers a detailed breakdown of an audio or video transcription. It provides structured access to the content at three levels: individual characters, words, and full sentences, each with its own start and end times.

Properties

  • Name
    characters
    Type
    list[Character]
    Description

    The transcription's characters as Character objects, ordered by start time.

  • Name
    words
    Type
    list[Word]
    Description

    The transcription's words as Word objects, ordered by start time.

  • Name
    sentences
    Type
    list[Sentence]
    Description

    The transcription's sentences as Sentence objects, ordered by start time.

  • Name
    text
    Type
    string
    Description

    The full textual content of the transcription.

  • Name
    language
    Type
    string
    Description

    The ISO 639-1 language code of the transcription's language.

  • Name
    created_time
    Type
    datetime
    Description

    The time when the transcription was created.

  • Name
    start_time
    Type
    float
    Description

    The start time of the transcription in seconds.

  • Name
    end_time
    Type
    float
    Description

    The end time of the transcription in seconds.

  • Name
    source_software
    Type
    string
    Description

    The software used for transcribing.
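The `words`, `start_time`, and `end_time` properties are enough to derive simple statistics. As a sketch, the hypothetical helper below computes the speaking rate in words per minute over the transcription's time span; it treats every entry in `words` as one spoken word and assumes `end_time` is greater than `start_time`.

```python
def words_per_minute(transcription) -> float:
    """Speaking rate over the transcription's time span (hypothetical helper)."""
    duration = transcription.end_time - transcription.start_time  # seconds
    if duration <= 0:
        raise ValueError("transcription has no positive duration")
    return len(transcription.words) / (duration / 60.0)
```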

Methods

  • Name
    find_word_index
    Type
    -> int
    Description

    Finds the index of the word in the transcription whose start or end time is closest to 'target_time' (in seconds).

Required Parameters

  • Name
    target_time
    Type
    float
    Description

    The time in seconds to search for.

  • Name
    type_of_time
    Type
    string ("start" or "end")
    Description
    • start: returns the index of the word with the closest start time before target_time.
    • end: returns the index of the word with the closest end time after target_time.

  • Name
    find_sentence_index
    Type
    -> int
    Description

    Finds the index of the sentence in the transcription whose start or end time is closest to 'target_time' (in seconds).

Required Parameters

  • Name
    target_time
    Type
    float
    Description

    The time in seconds to search for.

  • Name
    type_of_time
    Type
    string ("start" or "end")
    Description
    • start: returns the index of the sentence with the closest start time before target_time.
    • end: returns the index of the sentence with the closest end time after target_time.
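The start/end semantics of `find_word_index` and `find_sentence_index` can be illustrated with a binary search over time-ordered items. This is not clipsai's implementation; `find_index` is a hypothetical standalone function that follows the documented behavior ("start": the last item starting at or before `target_time`; "end": the first item ending at or after `target_time`). It works on any list of objects with `start_time` and `end_time`, such as `transcription.words` or `transcription.sentences`, and assumes end times are sorted as well as start times. Clamping out-of-range times to the first or last index is also an assumption.

```python
from bisect import bisect_left, bisect_right


def find_index(items, target_time: float, type_of_time: str) -> int:
    """Index of the item whose start or end time is closest to target_time.

    "start": last item whose start_time <= target_time.
    "end":   first item whose end_time >= target_time.
    Out-of-range times are clamped to a valid index (an assumption).
    """
    if not items:
        raise ValueError("items must not be empty")
    if type_of_time == "start":
        starts = [item.start_time for item in items]
        return max(bisect_right(starts, target_time) - 1, 0)
    if type_of_time == "end":
        ends = [item.end_time for item in items]
        return min(bisect_left(ends, target_time), len(items) - 1)
    raise ValueError("type_of_time must be 'start' or 'end'")
```

Using "start" for a clip's beginning and "end" for its ending widens the range outward, so a clip never cuts a word or sentence in half.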

Sentence Class

Represents a sentence in a transcription.

Properties

  • Name
    start_time
    Type
    float
    Description

    The start time of the sentence in seconds.

  • Name
    end_time
    Type
    float
    Description

    The end time of the sentence in seconds.

  • Name
    start_char
    Type
    int
    Description

    The index of the sentence's start character in the full text.

  • Name
    end_char
    Type
    int
    Description

    The index of the sentence's end character in the full text.

  • Name
    text
    Type
    string
    Description

    The text of the sentence.

Methods

  • Name
    to_dict
    Type
    -> dict
    Description

    Returns the properties of the sentence as a dictionary.
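Sentence-level timestamps map naturally onto subtitle cues. The sketch below renders a list of sentences as SubRip (SRT) text using only the documented `start_time`, `end_time`, and `text` properties; the helper names `srt_timestamp` and `sentences_to_srt` are hypothetical and not part of clipsai.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def sentences_to_srt(sentences) -> str:
    """Build SRT text with one numbered cue per sentence."""
    cues = []
    for number, sentence in enumerate(sentences, start=1):
        start = srt_timestamp(sentence.start_time)
        end = srt_timestamp(sentence.end_time)
        cues.append(f"{number}\n{start} --> {end}\n{sentence.text.strip()}\n")
    # SRT separates cues with a blank line
    return "\n".join(cues)


# Usage (hypothetical):
# with open("subtitles.srt", "w", encoding="utf-8") as f:
#     f.write(sentences_to_srt(transcription.sentences))
```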


Word Class

Represents a word in a transcription.

Properties

  • Name
    start_time
    Type
    float
    Description

    The start time of the word in seconds.

  • Name
    end_time
    Type
    float
    Description

    The end time of the word in seconds.

  • Name
    start_char
    Type
    int
    Description

    The index of the word's start character in the full text.

  • Name
    end_char
    Type
    int
    Description

    The index of the word's end character in the full text.

  • Name
    text
    Type
    string
    Description

    The text of the word.

Methods

  • Name
    to_dict
    Type
    -> dict
    Description

    Returns the properties of the word as a dictionary.
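Because each word carries its own `start_time` and `end_time`, the gaps between consecutive words reveal pauses, which can help when choosing where to cut a clip. A minimal sketch, assuming the word list is ordered by start time (as `Transcription.words` is documented to be); `find_pauses` and its 0.5-second default threshold are illustrative choices, not clipsai features.

```python
def find_pauses(words, min_gap: float = 0.5) -> list[tuple[float, float]]:
    """Return (gap_start, gap_end) pairs where the silence between words is >= min_gap seconds."""
    pauses = []
    for previous, current in zip(words, words[1:]):
        gap = current.start_time - previous.end_time
        if gap >= min_gap:
            pauses.append((previous.end_time, current.start_time))
    return pauses
```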


Character Class

Represents a character in a transcription.

Properties

  • Name
    start_time
    Type
    float
    Description

    The start time of the character in seconds.

  • Name
    end_time
    Type
    float
    Description

    The end time of the character in seconds.

  • Name
    word_index
    Type
    int
    Description

    The index (in the transcription's word list) of the word that contains this character.

  • Name
    sentence_index
    Type
    int
    Description

    The index (in the transcription's sentence list) of the sentence that contains this character.

  • Name
    text
    Type
    string
    Description

    The text of the character.

Methods

  • Name
    to_dict
    Type
    -> dict
    Description

    Returns the properties of the character as a dictionary.
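The `word_index` property links each character back to the word that contains it. As a sketch, the hypothetical helper below groups character texts by `word_index` to rebuild each word's string. It assumes characters that belong to no word (for example, spaces) have a `word_index` of None, which may not match clipsai's actual representation.

```python
from collections import defaultdict


def words_from_characters(characters) -> dict[int, str]:
    """Map each word index to the text assembled from its characters."""
    grouped = defaultdict(list)
    for char in characters:
        if char.word_index is None:  # assumption: non-word characters such as spaces
            continue
        grouped[char.word_index].append(char.text)
    return {index: "".join(parts) for index, parts in sorted(grouped.items())}
```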