Score your predictions

Evaluate your model on the TurnBench public dev set by generating a predictions JSON file and dropping it below.


Scoring runs locally in your browser against the public dev set.

Submission format

Submit one JSON file that lists, for every conversation, the times at which your model commits to each event.

{
  "schema_version": 1,
  "predictions": [
    {
      "conversation_id": "20",
      "speaker_1": {
        "eot": [12.84, 58.43, 104.71],
        "interruption": [31.20]
      },
      "speaker_2": {
        "eot": [27.10, 71.92],
        "interruption": []
      }
    },
    // one entry per conversation...
  ]
}

schema_version: Always 1.
predictions: Exactly one entry per conversation.
conversation_id: The conversation's ID exactly as it appears in the dataset (a string, as in the example above).
speaker_1 / speaker_2: Events for the corresponding audio channel of the conversation.
eot: Times at which this speaker's turn ends (end of turn).
interruption: Times at which this speaker interrupts the other speaker.
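A minimal Python sketch of assembling and writing a submission in this format. The helper names `build_submission` and `write_submission`, and the shape of the input dictionary, are hypothetical; only the JSON field names come from the format above. It converts times to floats and sorts each list, but it does not remove duplicate times.

```python
import json
from pathlib import Path
from typing import Dict, List

# Hypothetical input shape: conversation id -> speaker key -> event type -> times (s).
Events = Dict[str, Dict[str, Dict[str, List[float]]]]

SPEAKERS = ("speaker_1", "speaker_2")
EVENT_TYPES = ("eot", "interruption")


def build_submission(events_by_conversation: Events) -> dict:
    """Assemble a predictions dict with one entry per input conversation."""
    predictions = []
    for conv_id, speakers in events_by_conversation.items():
        # The example format uses string ids such as "20".
        entry = {"conversation_id": str(conv_id)}
        for speaker in SPEAKERS:
            channel = speakers.get(speaker, {})
            # Always emit both lists (possibly empty), as floats sorted ascending.
            # Duplicate times are NOT removed here.
            entry[speaker] = {
                kind: sorted(float(t) for t in channel.get(kind, []))
                for kind in EVENT_TYPES
            }
        predictions.append(entry)
    return {"schema_version": 1, "predictions": predictions}


def write_submission(submission: dict, path: str) -> None:
    """Serialize the submission to a JSON file."""
    Path(path).write_text(json.dumps(submission, indent=2))


# Example usage with the conversation shown above (lists given out of order).
sub = build_submission({
    "20": {
        "speaker_1": {"eot": [58.43, 12.84, 104.71], "interruption": [31.20]},
        "speaker_2": {"eot": [27.10, 71.92]},
    }
})
write_submission(sub, "predictions.json")
```

Because `predictions` needs exactly one entry per conversation, the input should include every dev-set conversation, even those with no predicted events; they come out with empty lists.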

Submissions are causal: A model's output at time t may depend only on audio up to t. Each event is reported at the time the model committed to it, not the time it occurred in the audio: an event located at 0.7 s but detected using audio through 1.0 s is reported at 1.0 s. By submitting, you affirm that your predictions follow this rule.
Event times are strictly increasing: Each list holds times in float seconds on the shared conversation clock, sorted in strictly increasing order (no duplicates), and every time falls within the audio's duration.
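One way to respect the causality rule is to stream audio through the model in chunks and stamp each committed event with the end time of the audio consumed so far. The sketch below assumes a hypothetical `detector` callable that receives all samples seen so far and returns the events it newly commits to, each with its own estimated onset; `stream_events`, the chunked input, and the single-stream simplification are illustrative and not part of TurnBench (a real turn-taking model would likely look at both channels).

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# Hypothetical detector: takes all samples seen so far and returns the events it
# newly commits to at this step, as (event_type, estimated_onset_seconds).
Detector = Callable[[Sequence[float]], List[Tuple[str, float]]]


def stream_events(
    chunks: Iterable[Sequence[float]], sample_rate: int, detector: Detector
) -> Dict[str, List[float]]:
    """Run a detector causally and record commit times for one channel."""
    seen: List[float] = []
    events: Dict[str, List[float]] = {"eot": [], "interruption": []}
    for chunk in chunks:
        seen.extend(chunk)  # the detector never sees samples beyond this chunk
        now = len(seen) / sample_rate  # seconds of audio consumed so far
        for event_type, _onset in detector(seen):
            # Report when the model committed (now), not the estimated onset.
            events[event_type].append(now)
    return events


# Toy detector: after 1.0 s of audio it decides a turn ended at about 0.7 s.
def toy_detector(seen: Sequence[float]) -> List[Tuple[str, float]]:
    return [("eot", 0.7)] if len(seen) == 10 else []


print(stream_events([[0.0] * 5] * 3, sample_rate=10, detector=toy_detector))
# {'eot': [1.0], 'interruption': []}
```

If a detector commits two events of the same type in one step, they share a timestamp, which the strictly-increasing rule does not allow, so a real pipeline would need to handle that case.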
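The format and ordering rules can be checked locally before uploading. A minimal Python sketch follows; `check_times`, `check_submission`, and the `durations` mapping (conversation id to audio length in seconds) are hypothetical. Treating the valid range as 0 to the duration inclusive, accepting JSON integers as seconds, and requiring both `eot` and `interruption` lists for each speaker are assumptions. Causality cannot be verified from the file itself.

```python
import math
from collections import Counter
from typing import Dict, List

SPEAKERS = ("speaker_1", "speaker_2")
EVENT_TYPES = ("eot", "interruption")


def check_times(times: list, duration: float) -> List[str]:
    """Return problems with one event list; an empty list means it passes."""
    problems = []
    for i, t in enumerate(times):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(t, bool) or not isinstance(t, (int, float)) or not math.isfinite(t):
            problems.append(f"index {i}: {t!r} is not a finite number of seconds")
    if problems:
        return problems
    for i, t in enumerate(times):
        # Assumption: valid times lie in [0, duration].
        if not 0.0 <= t <= duration:
            problems.append(f"index {i}: {t} is outside [0, {duration}]")
        if i > 0 and t <= times[i - 1]:
            problems.append(f"index {i}: {t} does not exceed previous time {times[i - 1]}")
    return problems


def check_submission(submission: dict, durations: Dict[str, float]) -> List[str]:
    """Check schema_version, one entry per conversation, and every event list."""
    problems = []
    if submission.get("schema_version") != 1:
        problems.append("schema_version must be 1")
    entries = submission.get("predictions", [])
    counts = Counter(e.get("conversation_id") for e in entries)
    for conv_id, n in counts.items():
        if n > 1:
            problems.append(f"{conv_id!r}: {n} entries, expected exactly one")
        if conv_id not in durations:
            problems.append(f"{conv_id!r}: not a known conversation id")
    for conv_id in durations:
        if conv_id not in counts:
            problems.append(f"{conv_id!r}: missing from predictions")
    for entry in entries:
        conv_id = entry.get("conversation_id")
        if conv_id not in durations:
            continue
        for speaker in SPEAKERS:
            channel = entry.get(speaker, {})
            for kind in EVENT_TYPES:
                times = channel.get(kind)
                where = f"{conv_id}/{speaker}/{kind}"
                if not isinstance(times, list):
                    problems.append(f"{where}: expected a list of times")
                    continue
                problems.extend(
                    f"{where}: {msg}" for msg in check_times(times, durations[conv_id])
                )
    return problems


# Example: the conversation from the format example, assuming it is 120 s long.
example = {
    "schema_version": 1,
    "predictions": [{
        "conversation_id": "20",
        "speaker_1": {"eot": [12.84, 58.43, 104.71], "interruption": [31.20]},
        "speaker_2": {"eot": [27.10, 71.92], "interruption": []},
    }],
}
print(check_submission(example, {"20": 120.0}))  # []
```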