Intel® VTune™ Profiler

Locate Performance Bottlenecks Fast

Stop guessing why software is slow. Advanced sampling and profiling techniques quickly analyze your code, isolate issues, and deliver insights for optimizing performance on modern processors.
Intel® VTune™ Profiler is also available as part of the Professional and Cluster Editions of Intel® Parallel Studio XE (for Linux and Windows).


Why Performance Analysis?

Without the right data, you’re guessing about how to improve software performance and are unlikely to make the most effective improvements. Intel® VTune™ Profiler collects key profiling data and presents it with a powerful interface that simplifies its analysis and interpretation. Optimize software for:

  • High-performance computing (HPC) in weather forecasting, finite element analysis, and bioinformatics
  • Embedded applications for IoT, transportation, and manufacturing
  • Media software for video transcoding and image processing
  • Cloud applications or Java* services in containers
  • Device drivers
  • Game engines
  • Storage that includes Storage Performance Development Kit (SPDK) and Data Plane Development Kit (DPDK) polled software

Additional Capabilities

  • Single Thread - Optimize single-threaded performance.
  • Multithreaded - Effectively use all available cores.
  • System - See a system-level view of application performance.
  • Media & OpenCL™ Applications - Deliver high-performance image and video processing pipelines.
  • HPC & Cloud - Access specialized, in-depth analyses for HPC and cloud computing.
  • Memory & Storage Management - Diagnose memory, storage, and data plane bottlenecks.
  • Analyze & Filter Data - Mine data for answers.
  • Environment - Fits your environment and workflow.

Intel® VTune™ Profiler is available as part of the Intel® oneAPI Toolkits.

Using OpenMP & Intel® VTune™ Profiler for Faster Code and Improved ROI


This training video shows the performance effect of adding a single OpenMP statement to Fortran code that is then compiled with Intel® Fortran.
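To illustrate what "a single OpenMP statement" can look like, here is a minimal sketch in C rather than the Fortran used in the video. The function names `row_work` and `total_work` and the triangular workload are hypothetical and are not taken from the video. The only difference from a serial loop is the one `#pragma` line.

```c
/* Hypothetical workload (an assumption, not the video's code):
   the cost of iteration i grows with i, so later iterations
   take longer than earlier ones. */
static double row_work(int i)
{
    double acc = 0.0;
    for (int j = 0; j <= i; ++j)
        acc += (double)j * 0.5;
    return acc;
}

/* Sums row_work(i) for i in [0, n).
   The single pragma is the only change relative to the serial loop.
   If OpenMP is not enabled at compile time, the pragma is ignored
   and the loop runs serially with the same result. */
double total_work(int n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i)
        sum += row_work(i);
    return sum;
}
```

Because the iterations differ in cost, a loop like this is likely to leave some threads idle while others are still working. That is the kind of load imbalance the video goes on to analyse.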

Because the example contains a load-balance problem, Intel® VTune™ Profiler is used to analyse the performance of the demonstration program.

VTune pinpoints the hotspot that causes the imbalance.

After explaining the imbalance, the video shows how to change the OpenMP statement so that the run-time performance of the example program increases further.
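The text does not say which change is made to the OpenMP statement. One common remedy for this kind of imbalance is a `schedule(dynamic)` clause, sketched below in C under that assumption. The helper `static_block_cost` is hypothetical: it only counts how much work each thread would receive if the iterations were split into contiguous, near-equal blocks, which is how many OpenMP implementations divide a loop by default (the default is implementation-defined). The chunk size of 4 is an arbitrary illustrative value.

```c
/* Hypothetical cost model: iteration i costs i + 1 work units. */
static long iteration_cost(int i)
{
    return (long)i + 1;
}

/* Work units handed to thread `tid` when n iterations are split into
   contiguous blocks of near-equal length across `nthreads` threads.
   This mirrors a typical static split; it is an illustration only. */
long static_block_cost(int n, int nthreads, int tid)
{
    int base  = n / nthreads;
    int extra = n % nthreads;
    int start = tid * base + (tid < extra ? tid : extra);
    int len   = base + (tid < extra ? 1 : 0);
    long cost = 0;
    for (int i = start; i < start + len; ++i)
        cost += iteration_cost(i);
    return cost;
}

/* The same loop with a dynamic schedule: idle threads fetch the next
   chunk of 4 iterations, so expensive iterations are spread out.
   The pragma is ignored, and the result unchanged, without OpenMP. */
long balanced_total(int n)
{
    long sum = 0;
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:sum)
    for (int i = 0; i < n; ++i)
        sum += iteration_cost(i);
    return sum;
}
```

With 8 iterations on 4 threads, the static split gives the first thread 3 work units and the last thread 15, so the last thread determines the run time. A dynamic schedule trades a little scheduling overhead for a more even distribution.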

Steps taken to improve run-time performance

  • Adding an OpenMP statement to the Fortran code
  • Setting up the performance analysis in VTune
  • Explaining the load imbalance and how to change the OpenMP code
  • The results: serial / with OpenMP / with OpenMP load-balanced / with OpenMP load-balanced and optimised binary code

Finally, the video shows that performance can be maximised by using the compiler optimisation options of Intel® Fortran.

The maximum speed-up on an Intel Core i7 (4 cores): 3.6 times faster than the original serial code.

Additional information


Full (new lic.) incl. 1 year support, Pre-Expiry Support Service Renewal (SSR), Post-Expiry Support Service Renewal (SSR), Upgrade


single user, floating network, 1-seat


educational, regular / commercial