INT8 has significantly lower precision and dynamic range compared to FP32.
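To make the precision and range loss concrete, here is a minimal sketch of symmetric per-tensor quantization. The helper names `quantize` and `dequantize` are hypothetical, and the saturation range of [-127, 127] and the round-to-nearest rule are assumptions made for illustration, not a statement of what any particular library does internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: map a float to INT8 using one scale for the whole tensor.
// `scale` is the real value represented by a single INT8 step.
inline std::int8_t quantize(float x, float scale) {
    assert(scale > 0.0f);
    float q = std::round(x / scale);
    // Values outside the representable range saturate (assumed range: [-127, 127]).
    q = std::max(-127.0f, std::min(127.0f, q));
    return static_cast<std::int8_t>(q);
}

// Hypothetical helper: recover an approximate float from the INT8 value.
inline float dequantize(std::int8_t q, float scale) {
    return static_cast<float>(q) * scale;
}
```

With a scale of 0.5, an input of 0.3 comes back as 0.5 (rounding error), and an input of 1000 comes back as 63.5 (saturation). Choosing a scale that balances these two errors is what calibration is for.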
High-throughput INT8 math
DP4A, the INT8 dot-product instruction, requires sm_61 or later (Pascal Titan X, GTX 1080, Tesla P4, P40, and others).
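As a rough illustration of what a DP4A-style operation computes, the following plain C++ reference model multiplies four packed signed 8-bit lanes pairwise and adds the products to a 32-bit accumulator. It is a sketch of the arithmetic only, not the hardware instruction; the names `dp4a_reference` and `pack4` and the choice of putting lane 0 in the low byte are assumptions made for this example.

```cpp
#include <cstdint>

// Reference model (not accelerated): treat each 32-bit word as four signed
// 8-bit lanes, multiply lane by lane, and add the products to `acc`.
inline std::int32_t dp4a_reference(std::uint32_t a, std::uint32_t b, std::int32_t acc) {
    for (int lane = 0; lane < 4; ++lane) {
        // Extract one byte and reinterpret it as a signed 8-bit value.
        auto ai = static_cast<std::int8_t>(static_cast<std::uint8_t>(a >> (8 * lane)));
        auto bi = static_cast<std::int8_t>(static_cast<std::uint8_t>(b >> (8 * lane)));
        acc += static_cast<std::int32_t>(ai) * static_cast<std::int32_t>(bi);
    }
    return acc;
}

// Hypothetical helper: pack four INT8 values into one word, lane 0 in the low byte.
inline std::uint32_t pack4(std::int8_t x0, std::int8_t x1, std::int8_t x2, std::int8_t x3) {
    return  static_cast<std::uint32_t>(static_cast<std::uint8_t>(x0))
         | (static_cast<std::uint32_t>(static_cast<std::uint8_t>(x1)) << 8)
         | (static_cast<std::uint32_t>(static_cast<std::uint8_t>(x2)) << 16)
         | (static_cast<std::uint32_t>(static_cast<std::uint8_t>(x3)) << 24);
}
```

For example, `dp4a_reference(pack4(1, 2, 3, 4), pack4(5, 6, 7, 8), 10)` accumulates 5 + 12 + 21 + 32 on top of 10. Accumulating in 32 bits is what keeps the sum of INT8 products from overflowing.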
Calibration Dataset
When preparing the calibration dataset, you should capture the expected distribution of data in typical inference scenarios. Make sure that the calibration dataset covers all the expected scenarios, for example, clear weather, rainy days, night scenes, etc. If you are creating your own dataset, we recommend creating a separate calibration dataset. The calibration dataset shouldn't overlap with the training, validation, or test datasets, in order to avoid a situation where the calibrated model only works well on these datasets. (Note: the calibration set should be representative, ideally a subset of the validation set.)
Calibration can be slow, so the IInt8Calibrator interface provides methods for caching intermediate data. Using these methods effectively requires a more detailed understanding of calibration.
When building an INT8 engine, the builder performs the following steps:
Builds a 32-bit engine, runs it on the calibration set, and records, for each tensor, a histogram of the distribution of its activation values.
Builds a calibration table from the histograms.
Builds the INT8 engine from the calibration table and the network definition.
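The first two steps above can be sketched roughly as follows: collect a histogram of activation magnitudes per tensor, then derive one scale per tensor from it. Everything here is hypothetical: the `ActivationHistogram` type, the use of absolute values, and especially the coverage rule in `scale_from_histogram`, which is a simple stand-in chosen for brevity and not the algorithm the builder actually uses to turn histograms into a calibration table.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-tensor statistics gathered while running the 32-bit engine.
struct ActivationHistogram {
    std::vector<std::size_t> bins;  // counts of |activation| per equal-width bin
    float max_abs;                  // upper edge of the last bin

    ActivationHistogram(std::size_t num_bins, float max_abs_value)
        : bins(num_bins, 0), max_abs(max_abs_value) {
        assert(num_bins > 0 && max_abs_value > 0.0f);
    }

    // Record one activation value; magnitudes above max_abs land in the last bin.
    void record(float activation) {
        float a = std::min(std::fabs(activation), max_abs);
        auto idx = static_cast<std::size_t>(a / max_abs * static_cast<float>(bins.size()));
        ++bins[std::min(idx, bins.size() - 1)];
    }
};

// Stand-in rule (an assumption): clip at the smallest bin edge that covers
// `coverage` of the recorded samples, then map that threshold onto 127 steps.
inline float scale_from_histogram(const ActivationHistogram& h, double coverage) {
    std::size_t total = 0;
    for (std::size_t c : h.bins) total += c;
    float threshold = h.max_abs;
    std::size_t running = 0;
    for (std::size_t i = 0; total > 0 && i < h.bins.size(); ++i) {
        running += h.bins[i];
        if (static_cast<double>(running) >= coverage * static_cast<double>(total)) {
            threshold = h.max_abs * static_cast<float>(i + 1) / static_cast<float>(h.bins.size());
            break;
        }
    }
    return threshold / 127.0f;
}
```

A lower `coverage` clips more outliers and gives finer resolution to common values, while a coverage of 1.0 preserves the full observed range at coarser resolution. The resulting per-tensor scales are the kind of data a calibration table would hold.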
The calibration table can be cached. Caching is useful when building the same network multiple times, for example, on multiple platforms. The table captures data derived from the network and the calibration set, and the resulting parameters are recorded in it. If the network or calibration set changes, it is the application's responsibility to invalidate the cache.
The cache is used as follows:
If a calibration table is found, calibration is skipped. Otherwise, the calibration table is built from the histograms and parameters.
The INT8 network is then built from the network definition and the calibration table.
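The cache flow above could be backed by a file, as in the following standalone sketch. The class name `FileCalibrationCache`, the `invalidate` method, and the file-based storage are assumptions for illustration. The two cache methods are modeled on the pointer-and-length style of the IInt8Calibrator interface, but this class does not derive from it, so check the signatures against the TensorRT headers for your version before adapting it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Standalone sketch of file-backed cache handling (hypothetical class).
class FileCalibrationCache {
public:
    explicit FileCalibrationCache(std::string path) : path_(std::move(path)) {}

    // Returns nullptr with length 0 when no cache exists, meaning calibration
    // has to run. The returned pointer stays valid until the next call.
    const void* readCalibrationCache(std::size_t& length) {
        buffer_.clear();
        std::ifstream in(path_, std::ios::binary);
        if (in) {
            buffer_.assign(std::istreambuf_iterator<char>(in),
                           std::istreambuf_iterator<char>());
        }
        length = buffer_.size();
        return buffer_.empty() ? nullptr : buffer_.data();
    }

    // Persists the calibration table so later builds can skip calibration.
    void writeCalibrationCache(const void* data, std::size_t length) {
        assert(data != nullptr || length == 0);
        std::ofstream out(path_, std::ios::binary | std::ios::trunc);
        out.write(static_cast<const char*>(data), static_cast<std::streamsize>(length));
    }

    // The application calls this when the network or calibration set changes.
    void invalidate() { std::remove(path_.c_str()); }

private:
    std::string path_;
    std::vector<char> buffer_;
};
```

Keeping the buffer as a member matters because the caller receives only a raw pointer and a length, so the storage has to outlive the call. Deleting the file is the simplest way to honor the rule that the application, not the builder, decides when a cache is stale.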
Cached data is passed as a pointer and length. After you have implemented the calibrator, you can configure the builder to use it:
builder->setInt8Calibrator(calibrator);
The make_plan program must run on the target system in order for the TensorRT engine to be optimized correctly for that system. However, if an INT8 calibration cache was produced on the host, the cache may be re-used by the builder on the target when generating the engine (in other words, there is no need to do INT8 calibration on the target system itself).
In short, the INT8 calibration cache can be re-used across systems, whereas the engine cannot.