Clip

The clipping feature uses the TextTiling algorithm to segment long-form audio content into coherent clips based on its transcript. TextTiling, introduced by Marti A. Hearst in the 1990s, detects topic shifts in a piece of content by analyzing word usage and distribution patterns. Thanks to recent advances in NLP, TextTiling with BERT embeddings improves significantly on the original formulation and can be readily applied to state-of-the-art transcriptions produced with Whisper. The algorithm segments the text at sentence granularity and focuses on detecting topic shifts rather than the topics themselves. This makes it particularly effective at identifying distinct sections within a narrative and, consequently, at producing clips of varying lengths suited to both short and extended audio segments.
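To make the idea of topic-shift detection concrete, here is a minimal sketch of embedding-based TextTiling. It is not the library's implementation: the function names, the toy two-dimensional "embeddings", and the fixed depth threshold are all assumptions chosen for illustration (real sentence embeddings would come from a BERT-style model, and TextTiling variants usually derive the cutoff from the score distribution).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gap_similarities(embeddings):
    """Similarity across each gap between adjacent sentences."""
    return [
        cosine(embeddings[i], embeddings[i + 1])
        for i in range(len(embeddings) - 1)
    ]


def depth_scores(sims):
    """Depth of each gap: how far it sits below the nearest peak on each side."""
    scores = []
    for i, sim in enumerate(sims):
        left = sim
        for j in range(i - 1, -1, -1):  # climb left while similarity rises
            if sims[j] < left:
                break
            left = sims[j]
        right = sim
        for j in range(i + 1, len(sims)):  # climb right while similarity rises
            if sims[j] < right:
                break
            right = sims[j]
        scores.append((left - sim) + (right - sim))
    return scores


def find_boundaries(embeddings, threshold=0.5):
    """Indices of sentences that start a new segment.

    The fixed threshold is an assumption made for this sketch.
    """
    depths = depth_scores(gap_similarities(embeddings))
    return [i + 1 for i, depth in enumerate(depths) if depth > threshold]


# Toy vectors: three sentences on one topic, then two on another.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.05], [0.0, 1.0], [0.1, 0.9]]
print(find_boundaries(toy_embeddings))
```

Only the deep valley between the third and fourth sentences exceeds the threshold, so the sketch reports a single boundary there. Note that the output describes where topics change, not what the topics are.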


Usage

The following transcribes a media file, finds its clips, and prints the start and end time of the first clip.

from clipsai import ClipFinder, Transcriber

transcriber = Transcriber()
transcription = transcriber.transcribe(audio_file_path="/abs/path/to/video.mp4")

clipfinder = ClipFinder()
clips = clipfinder.find_clips(transcription=transcription)

print("StartTime: ", clips[0].start_time)
print("EndTime: ", clips[0].end_time)
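Since clips vary in length, it can be useful to keep only those within a target duration. The helper below is a hypothetical sketch, not part of the library; it relies only on the start_time and end_time properties documented for Clip, assumes both are numeric values in seconds, and uses stand-in objects so it runs without a transcription.

```python
from types import SimpleNamespace


def clips_within_duration(clips, min_seconds, max_seconds):
    """Return the clips whose duration lies in [min_seconds, max_seconds].

    Hypothetical helper: works on any object exposing numeric
    start_time and end_time attributes (in seconds).
    """
    return [
        clip
        for clip in clips
        if min_seconds <= (clip.end_time - clip.start_time) <= max_seconds
    ]


# Stand-in objects used here in place of real Clip instances.
example_clips = [
    SimpleNamespace(start_time=0.0, end_time=12.5),
    SimpleNamespace(start_time=12.5, end_time=95.0),
    SimpleNamespace(start_time=95.0, end_time=400.0),
]

# Keep clips between 30 seconds and 3 minutes long.
for clip in clips_within_duration(example_clips, 30, 180):
    print(clip.start_time, clip.end_time)
```

With real data, the list returned by find_clips would be passed in place of example_clips.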

To trim the video using the returned clips, run the following code.

import clipsai

media_editor = clipsai.MediaEditor()

# choose ONE of the two options below (the second assignment overrides the first)
# use this if the file contains an audio stream only
media_file = clipsai.AudioFile("/abs/path/to/audio_only_file.mp4")
# use this if the file contains both audio and video streams
media_file = clipsai.AudioVideoFile("/abs/path/to/video.mp4")

clip = clips[0]  # select the clip you'd like to trim
clip_media_file = media_editor.trim(
    media_file=media_file,
    start_time=clip.start_time,
    end_time=clip.end_time,
    trimmed_media_file_path="/abs/path/to/clip.mp4",  # doesn't exist yet
)

ClipFinder Class

A class for finding engaging clips based on the input transcript.

Methods

  • Name
    find_clips
    Type
    -> list[Clip]
    Description

    Finds clips in the transcription of an audio or video file using the TextTiling algorithm.

Required Parameters

  • Name
    transcription
    Type
    Transcription
    Description

    The transcription of an audio or video file to find clips from.


Clip Class

Represents a clip of a video or audio file.

Properties

  • Name
    start_time
    Type
    float
    Description

    The start time of the clip in seconds.

  • Name
    end_time
    Type
    float
    Description

    The end time of the clip in seconds.

  • Name
    start_char
    Type
    int
    Description

    The index of the clip's first character in the transcription.

  • Name
    end_char
    Type
    int
    Description

    The index of the clip's last character in the transcription.

Methods

  • Name
    copy
    Type
    -> Clip
    Description

    Returns a copy of the Clip instance.

  • Name
    to_dict
    Type
    -> dict
    Description

    Returns a dictionary representation of the clip.
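As an illustration of how to_dict might be used, the sketch below serializes a list of clips to JSON. StandInClip is a hypothetical class that mirrors only the properties documented above so the snippet runs without the library installed; the numeric field types and the keys of the returned dictionary are assumptions, and the real Clip.to_dict output may differ.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class StandInClip:
    """Hypothetical stand-in mirroring the documented Clip properties."""

    start_time: float
    end_time: float
    start_char: int
    end_char: int

    def to_dict(self):
        # Assumed shape: one key per documented property.
        return asdict(self)


def clips_to_json(clips):
    """Serialize any objects exposing to_dict() into a JSON array."""
    return json.dumps([clip.to_dict() for clip in clips], indent=2)


print(clips_to_json([StandInClip(0.0, 12.5, 0, 240)]))
```

Storing clips this way lets the time ranges be reused for trimming later without re-running transcription.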