Microsoft's MiniLM-L12-H384-uncased language model achieved state-of-the-art results on the SQuAD 2.0 question-answering benchmark, with exact match and F1 scores of 76.13% and 79.54%, respectively. The model was trained on the SQuAD 2.0 dataset using a batch size of 12, learning rate of 4e-5, and 4 epochs. The authors suggest using their model as a starting point for building large language models for downstream NLP tasks.
You can use cURL or any other HTTP client to run inference:
curl -X POST \
-d '{"question": "Who jumped?", "context": "The quick brown fox jumped over the lazy dog."}' \
-H "Authorization: bearer $DEEPINFRA_TOKEN" \
-H 'Content-Type: application/json' \
'https://api.deepinfra.com/v1/inference/deepset/minilm-uncased-squad2'
This returns a response similar to:
{
  "answer": "fox",
  "score": 0.1803228110074997,
  "start": 16,
  "end": 19,
  "request_id": null,
  "inference_status": {
    "status": "unknown",
    "runtime_ms": 0,
    "cost": 0.0,
    "tokens_generated": 0,
    "tokens_input": 0
  }
}
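The same request can also be sent from a script. The following is a minimal Python sketch using only the standard library, not an official client. The helper names `build_request`, `ask`, and `answer_span` are hypothetical. It also assumes that `start` and `end` are character offsets into the context with `end` exclusive, which matches the sample response above (16 to 19 covers "fox") but is not stated by the API description.

```python
import json
import os
import urllib.request

# Endpoint taken from the cURL example above.
API_URL = "https://api.deepinfra.com/v1/inference/deepset/minilm-uncased-squad2"


def build_request(question, context, token):
    """Build the same POST request that the cURL command sends."""
    payload = json.dumps({"question": question, "context": context}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask(question, context, token=None):
    """Send the request and return the decoded JSON body (needs network access)."""
    # Falls back to the DEEPINFRA_TOKEN environment variable used by the cURL example.
    token = token or os.environ["DEEPINFRA_TOKEN"]
    with urllib.request.urlopen(build_request(question, context, token)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def answer_span(context, response):
    """Slice the answer out of the context using the returned offsets."""
    # Assumption: start/end are character offsets and end is exclusive.
    return context[response["start"]:response["end"]]
```

Calling `ask("Who jumped?", "The quick brown fox jumped over the lazy dog.")` with `DEEPINFRA_TOKEN` set should return a dictionary shaped like the sample response, and `answer_span` can then be used to check that the offsets point at the same text as the `answer` field.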
webhook (file): The webhook to call when inference is done. By default, you will get the output in the response to your inference request.