ONNX Runtime Performance Tuning
ONNX Runtime provides high performance across a range of hardware options through its Execution Providers interface, which targets different execution environments.
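As a rough illustration of the Execution Providers interface, the sketch below picks providers from a preference list and passes them to a session. The preference order, the helper names `pick_providers` and `create_session`, and the model path are assumptions for illustration only; `onnxruntime` is imported lazily so the selection logic can be read and tested on its own.

```python
from typing import List, Sequence

# Hypothetical preference order; adjust it to the hardware you actually target.
PREFERRED = ("CUDAExecutionProvider", "CPUExecutionProvider")


def pick_providers(available: Sequence[str],
                   preferred: Sequence[str] = PREFERRED) -> List[str]:
    """Return the preferred providers that are available, in preference order."""
    return [p for p in preferred if p in available]


def create_session(model_path: str):
    """Create an InferenceSession using the best available providers."""
    import onnxruntime as ort  # imported lazily; requires the onnxruntime package

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

Calling `create_session("model.onnx")` (a placeholder path) on a CPU-only build would select only `CPUExecutionProvider`, since the CUDA provider would not appear in the available list.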
Along with this flexibility come decisions about tuning and usage. For each model and execution provider, a few settings (thread count, wait policy, and so on) can be tuned to improve performance.
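The settings mentioned above might be applied through `SessionOptions` roughly as follows. This is a minimal sketch, not a recommended configuration: the helper names are hypothetical, the thread count is whatever you pass in, and treating the `session.intra_op.allow_spinning` config entry as the "wait policy" knob is an assumption about which setting is meant.

```python
from typing import Dict


def tuning_entries(allow_spinning: bool) -> Dict[str, str]:
    """Build session config entries; values are strings, "1" or "0"."""
    # Assumption: spin-waiting of intra-op threads stands in for "wait policy".
    return {"session.intra_op.allow_spinning": "1" if allow_spinning else "0"}


def build_session_options(intra_op_threads: int, allow_spinning: bool):
    """Return SessionOptions with the thread count and spin setting applied."""
    import onnxruntime as ort  # imported lazily; requires the onnxruntime package

    opts = ort.SessionOptions()
    # Threads used to parallelize work inside a single operator.
    opts.intra_op_num_threads = intra_op_threads
    # Run operators one after another rather than in parallel.
    opts.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
    for key, value in tuning_entries(allow_spinning).items():
        opts.add_session_config_entry(key, value)
    return opts
```

The resulting options object can be passed to `onnxruntime.InferenceSession(model_path, sess_options=opts)`. Which values help depends on the model and hardware, so measure before and after each change.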
This document covers basic tools and troubleshooting checklists that you can use to optimize how your model performs on your hardware with ONNX Runtime (ORT).
For a simple demo, refer to the example of deploying and optimizing a distilled BERT model for on-device inference in the browser.
Here are some additional topics to explore for more information on performance tuning ONNX Runtime.