ONNX Runtime Performance Tuning

ONNX Runtime delivers high performance across a range of hardware through its Execution Providers interface, which lets the same model run on different hardware and in different environments.
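As a rough illustration of how execution providers are selected, the sketch below filters a preference list down to the providers available in the installed build. The helper `pick_providers` and the file name `model.onnx` are hypothetical and used only for illustration; the provider names and `get_available_providers()` come from the `onnxruntime` Python package.

```python
def pick_providers(preferred, available):
    """Keep preferred providers that are available, in preference order.

    Hypothetical helper for illustration; it is not part of ONNX Runtime.
    """
    chosen = [p for p in preferred if p in available]
    # The CPU provider ships with every build, so keep it as the final fallback.
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


if __name__ == "__main__":
    import onnxruntime as ort  # requires the onnxruntime package

    providers = pick_providers(
        ["TensorrtExecutionProvider", "CUDAExecutionProvider"],
        ort.get_available_providers(),
    )
    print(providers)
    # A session would then be created with this list, for example:
    # ort.InferenceSession("model.onnx", providers=providers)  # hypothetical path
```

ONNX Runtime tries the providers in list order, so the ordering expresses priority rather than a single choice.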

With this flexibility come decisions about tuning and usage. For each model and execution provider, a few settings (thread count, wait policy, and so on) can be tuned to improve performance.
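A minimal sketch of what tuning these settings can look like in the Python API, assuming a CPU session. The helpers `candidate_settings` and `build_session_options` are hypothetical names introduced here; `intra_op_num_threads` and the `session.intra_op.allow_spinning` config entry are existing ONNX Runtime session options, with the latter controlling whether idle worker threads spin while waiting for work.

```python
import itertools
import os


def candidate_settings(max_threads=None):
    """List (intra_op_threads, allow_spinning) pairs worth benchmarking.

    Hypothetical helper: thread counts grow in powers of two up to max_threads.
    """
    max_threads = max_threads or os.cpu_count() or 1
    counts = []
    n = 1
    while n < max_threads:
        counts.append(n)
        n *= 2
    counts.append(max_threads)
    return list(itertools.product(counts, (True, False)))


def build_session_options(intra_threads, allow_spinning):
    """Create SessionOptions for one candidate configuration."""
    # Imported here so candidate_settings works without onnxruntime installed.
    import onnxruntime as ort

    opts = ort.SessionOptions()
    opts.intra_op_num_threads = intra_threads
    # "1" lets idle threads spin (lower latency, higher CPU use); "0" makes them block.
    opts.add_session_config_entry(
        "session.intra_op.allow_spinning", "1" if allow_spinning else "0"
    )
    return opts
```

Each configuration would then be passed as `sess_options` when creating an `InferenceSession` and timed on representative inputs. Which combination is fastest depends on the model and the machine, so measure rather than assume.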

This document covers basic tools and troubleshooting checklists that you can use to optimize how your model runs with ONNX Runtime (ORT) on your hardware.

For a worked example, refer to the simple demo of deploying and optimizing a distilled BERT model for on-device inference in the browser.

The following topics cover performance tuning for ONNX Runtime in more detail.

1. Performance Tuning Tools

2. Choosing the Execution Provider for best performance

3. Tips for Tuning Performance

4. Troubleshooting Performance Issues

5. Mobile Performance Tuning