TurnBench

Score your predictions

Test your model on the TurnBench public dev set: generate a predictions JSON file and drop it below.

Scored locally in your browser against the public dev set.

Submission format

One JSON file: for every conversation, the times at which your model commits to each event.

{
  "schema_version": 1,
  "predictions": [
    {
      "conversation_id": "20",
      "speaker_1": {
        "eot": [12.84, 58.43, 104.71],
        "interruption": [31.20]
      },
      "speaker_2": {
        "eot": [27.10, 71.92],
        "interruption": []
      }
    },
    // one entry per conversation...
  ]
}
schema_version
Always 1.
predictions
Exactly one entry per conversation.
conversation_id
Same as the conversation id in the dataset.
speaker_1 / speaker_2
The corresponding channel in the conversation.
eot
End-of-turn times: the times at which this speaker's turn ends.
interruption
Times at which this speaker interrupts the other speaker.
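As a rough illustration of the fields above, the following Python sketch assembles a submission from per-conversation event lists and serializes it. The function name `build_submission`, its input layout, and the output filename are hypothetical; writing `conversation_id` as a string is an assumption taken from the sample above.

```python
import json


def build_submission(events_by_conversation):
    """Assemble a submission dict (hypothetical helper, not part of TurnBench).

    events_by_conversation maps a conversation id to
    {"speaker_1": {"eot": [...], "interruption": [...]}, "speaker_2": {...}}.
    Missing speakers or event lists become empty lists.
    """
    predictions = []
    for conversation_id, speakers in events_by_conversation.items():
        # Assumption: ids are written as strings, as in the sample ("20").
        entry = {"conversation_id": str(conversation_id)}
        for speaker in ("speaker_1", "speaker_2"):
            events = speakers.get(speaker, {})
            entry[speaker] = {
                "eot": sorted(float(t) for t in events.get("eot", [])),
                "interruption": sorted(
                    float(t) for t in events.get("interruption", [])
                ),
            }
        predictions.append(entry)
    return {"schema_version": 1, "predictions": predictions}


if __name__ == "__main__":
    submission = build_submission(
        {
            "20": {
                "speaker_1": {"eot": [12.84, 58.43, 104.71], "interruption": [31.20]},
                "speaker_2": {"eot": [27.10, 71.92], "interruption": []},
            }
        }
    )
    # "predictions.json" is an arbitrary filename chosen for this example.
    with open("predictions.json", "w") as f:
        json.dump(submission, f, indent=2)
```

The sketch sorts each list but does not remove duplicates or check times against the audio duration.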
Submissions are causal
A model's output at time t may only depend on audio up to t. Each event is reported at the time the model committed to it, not the time the event happened in the audio: an event detected at 0.7 s using audio through 1.0 s is reported at 1.0 s. By submitting, you affirm that your predictions meet this requirement.
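One way to picture the causal rule is a streaming loop that feeds audio to the model chunk by chunk and, whenever the model commits to an event, records the end time of the audio consumed so far. This is a minimal sketch under assumptions: `stream_commit_times`, the fixed chunk length, and the `detect` callback are all hypothetical and stand in for whatever streaming interface your model has.

```python
def stream_commit_times(chunks, chunk_seconds, detect):
    """Return the times to report for events found while streaming.

    chunks: iterable of audio chunks, each chunk_seconds long (assumption).
    detect: hypothetical callback taking (history, audio_end) and returning
        the model's estimate of when the event happened, or None if the
        model does not commit to an event at this step.
    """
    reported = []
    history = []
    for index, chunk in enumerate(chunks):
        history.append(chunk)
        # End of the audio the model has been given so far.
        audio_end = (index + 1) * chunk_seconds
        estimated_event_time = detect(history, audio_end)
        if estimated_event_time is not None:
            # Report the commit time (audio seen so far), not the estimate.
            reported.append(audio_end)
    return reported


if __name__ == "__main__":
    # Toy detector: after two 0.5 s chunks it decides an event happened at 0.7 s.
    def toy_detector(history, audio_end):
        return 0.7 if len(history) == 2 else None

    print(stream_commit_times(["a", "b", "c"], 0.5, toy_detector))
```

With the toy detector, the event estimated at 0.7 s is reported at 1.0 s, the end of the audio the model had seen when it committed.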
Event times are strictly increasing
Each list holds float seconds on the shared conversation clock, sorted in strictly increasing order (no repeated values), and within the audio's duration.
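Before uploading, it can help to check each list against these constraints yourself. The sketch below is a hypothetical helper, not the scorer's own validation; treating the valid range as 0 to the audio duration inclusive is an assumption.

```python
import math


def validate_event_times(times, audio_duration):
    """Return a list of human-readable problems found in one event list.

    Checks that every entry is a finite number, lies in
    [0, audio_duration] (inclusive bounds are an assumption), and is
    strictly greater than the previous valid entry.
    """
    errors = []
    previous = None
    for index, t in enumerate(times):
        is_number = isinstance(t, (int, float)) and not isinstance(t, bool)
        if not is_number or not math.isfinite(t):
            errors.append(f"index {index}: {t!r} is not a finite number")
            continue
        if t < 0 or t > audio_duration:
            errors.append(f"index {index}: {t} is outside [0, {audio_duration}]")
        if previous is not None and t <= previous:
            errors.append(f"index {index}: {t} is not greater than {previous}")
        previous = t
    return errors


if __name__ == "__main__":
    print(validate_event_times([12.84, 58.43, 104.71], audio_duration=120.0))
    print(validate_event_times([5.0, 3.0, 3.0], audio_duration=10.0))
```

An empty result means the list passed all three checks; each string otherwise names the offending index.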