Score your predictions
Test on the TurnBench public dev set by generating a predictions JSON and dropping it below.
Scored locally in your browser against the public dev set.
Submission format
One JSON file: for every conversation, the times at which your model commits to each event.
{
  "schema_version": 1,
  "predictions": [
    {
      "conversation_id": "20",
      "speaker_1": { "eot": [12.84, 58.43, 104.71], "interruption": [31.20] },
      "speaker_2": { "eot": [27.10, 71.92], "interruption": [] }
    },
    // one entry per conversation...
  ]
}
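As a rough illustration, a file in this shape could be assembled as sketched below. The function names `build_submission` and `write_submission` and the input dictionary layout are hypothetical, not part of TurnBench. The sketch assumes conversation ids are written as strings (as in the sample above) and that sorting and de-duplicating each list before writing is acceptable.

```python
import json

SPEAKERS = ("speaker_1", "speaker_2")
EVENTS = ("eot", "interruption")


def build_submission(events_by_conversation):
    """Build the submission dict from per-conversation event times.

    `events_by_conversation` is a hypothetical input layout:
    {conversation_id: {"speaker_1": {"eot": [...], "interruption": [...]},
                       "speaker_2": {...}}}
    Missing speakers or event types become empty lists.
    """
    predictions = []
    for conversation_id, speakers in events_by_conversation.items():
        # Assumption: ids are serialized as strings, matching the sample.
        entry = {"conversation_id": str(conversation_id)}
        for speaker in SPEAKERS:
            events = speakers.get(speaker, {})
            entry[speaker] = {
                # Sort and drop exact duplicates so each list is increasing.
                event: sorted({float(t) for t in events.get(event, [])})
                for event in EVENTS
            }
        predictions.append(entry)
    return {"schema_version": 1, "predictions": predictions}


def write_submission(submission, path):
    """Serialize the submission as plain JSON (no comments)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(submission, f, indent=2)
```

Note that the `// one entry per conversation...` line in the sample is an annotation; `json.dump` emits strict JSON, which has no comment syntax.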
- schema_version: Always 1.
- predictions: Exactly one entry per conversation.
- conversation_id: Same as the conversation id in the dataset.
- speaker_1 / speaker_2: The corresponding channel in the conversation.
- eot: Times at which this speaker's turn ends.
- interruption: Times at which this speaker interrupts the other speaker.
- Submissions are causal: A model's output at time t may only depend on audio up to t. Each event is reported at the time the model committed to it, not the time the event happened in the audio: an event detected at 0.7 s using audio through 1.0 s is reported at 1.0 s. By submitting, you affirm that your predictions meet this requirement.
- Event times are strictly increasing: Each list holds times as float seconds on the shared conversation clock, sorted in strictly increasing order and within the audio's duration.
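Before uploading, the structural rules above could be checked with something like the following sketch. `validate_submission` and the `durations` mapping (conversation id to audio length in seconds) are hypothetical names, and treating "within the audio's duration" as the closed interval from 0 to the duration is an assumption. Causality cannot be verified from the file alone, so this only covers the format rules.

```python
SPEAKERS = ("speaker_1", "speaker_2")
EVENTS = ("eot", "interruption")


def validate_submission(submission, durations):
    """Return a list of human-readable problems; an empty list means none found.

    `durations` maps conversation_id -> audio duration in seconds
    (a hypothetical helper input, not part of the submission file).
    """
    errors = []
    if submission.get("schema_version") != 1:
        errors.append("schema_version must be 1")

    seen = set()
    for entry in submission.get("predictions", []):
        cid = entry.get("conversation_id")
        if cid in seen:
            errors.append(f"{cid}: more than one entry")
        seen.add(cid)
        duration = durations.get(cid)
        if duration is None:
            errors.append(f"{cid}: unknown conversation_id")
            continue
        for speaker in SPEAKERS:
            for event in EVENTS:
                label = f"{cid}/{speaker}/{event}"
                times = entry.get(speaker, {}).get(event)
                if not isinstance(times, list):
                    errors.append(f"{label}: expected a list")
                    continue
                if not all(isinstance(t, (int, float)) and not isinstance(t, bool)
                           for t in times):
                    errors.append(f"{label}: times must be numbers")
                    continue
                # Strictly increasing: each time must exceed the previous one.
                if any(b <= a for a, b in zip(times, times[1:])):
                    errors.append(f"{label}: times must be strictly increasing")
                # Assumption: valid times lie in [0, duration].
                if any(t < 0 or t > duration for t in times):
                    errors.append(f"{label}: times must lie within the audio")

    # Every conversation needs exactly one entry.
    for cid in sorted(set(durations) - seen):
        errors.append(f"{cid}: missing entry")
    return errors
```

Calling `validate_submission(submission, durations)` on a parsed file and printing the returned list gives a quick pre-flight check; it is not a substitute for the scorer's own validation.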