One Library with Multiple Fabric Support
Intel® MPI Library is a multifabric message-passing library that implements the open-source MPICH specification. Use the library to create, maintain, and test advanced, complex applications that perform better on HPC clusters based on Intel® processors.
- Develop applications that can run on multiple cluster interconnects chosen by the user at run time.
- Quickly deliver maximum end-user performance without having to change the software or operating environment.
- Achieve the best latency, bandwidth, and scalability through automatic tuning for the latest Intel® platforms.
- Reduce the time to market by linking to one library and deploying on the latest optimized fabrics.
The Intel® MPI Library is part of the Intel® oneAPI HPC Toolkit.
OpenFabrics Interfaces (OFI) Support
This framework is among the most optimized and popular tools for exposing and exporting communication services to high-performance computing (HPC) applications. Its key components include APIs, provider libraries, kernel services, daemons, and test applications.
Intel® MPI Library uses OFI to handle all communications, enabling a more streamlined path that starts at the application code and ends with data communications. Tuning for the underlying fabric can now happen at run time through simple environment settings, including network-level features like multirail for increased bandwidth. Additionally, this support helps developers deliver optimal performance on exascale solutions based on Intel® Omni-Path Architecture.
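As a sketch of run-time tuning through environment settings (variable names follow common libfabric and Intel MPI controls, but exact names vary by release, so treat these as illustrative):

```shell
# Select the underlying libfabric provider at run time (no rebuild needed).
# FI_PROVIDER is the generic libfabric control; recent Intel MPI releases
# also accept I_MPI_OFI_PROVIDER.
export FI_PROVIDER=psm2

# Print fabric-selection details at startup so the choice can be verified.
export I_MPI_DEBUG=5

# Enable multirail for increased bandwidth on Intel Omni-Path (PSM2).
# PSM2_MULTIRAIL is a PSM2-level control; availability depends on the fabric.
export PSM2_MULTIRAIL=1

mpiexec.hydra -n 64 ./my_app
```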
The result: increased communication throughput, reduced latency, simplified program design, and a common communication infrastructure.
Implementing the high-performance MPI 3.1 standard on multiple fabrics, the library lets you quickly deliver maximum application performance (even if you change or upgrade to new interconnects) without requiring major modifications to the software or operating systems.
- Scaling verified up to 262,000 processes
- Thread safety allows you to trace hybrid multithreaded MPI applications for optimal performance on multicore and many-core Intel® architecture
- Support for multi-endpoint communications lets an application efficiently split data communication among threads, maximizing interconnect utilization
- Improved start scalability through the mpiexec.hydra process manager (Hydra is a process management system for starting parallel jobs. It is designed to natively work with multiple network protocols such as ssh, rsh, pbs, slurm, and sge.)
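A sketch of the last point (the host names and process counts below are made up; the flags are standard Hydra options):

```shell
# Launch 8 ranks across two nodes, 4 per node, using ssh as the bootstrap
# mechanism:
mpiexec.hydra -n 8 -ppn 4 -hosts node01,node02 -bootstrap ssh ./my_app

# Under a resource manager, delegate startup to it instead:
mpiexec.hydra -bootstrap slurm -n 8 ./my_app
```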
Whether you need to run Transmission Control Protocol (TCP) sockets, shared memory, or one of many interconnects based on Remote Direct Memory Access (RDMA)—including Ethernet and InfiniBand*—Intel MPI Library covers all configurations by providing an accelerated, universal, multifabric layer for fast interconnects via OFI.
Intel MPI Library establishes connections dynamically and only when needed, which reduces the memory footprint. It also automatically chooses the fastest transport available.
- Develop MPI code independent of the fabric, knowing it will run efficiently on whatever network you choose at run time.
- Use a two-phase communication buffer enlargement capability to allocate only the memory space required.
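For example, the same unmodified binary can be launched over different transports (the `I_MPI_FABRICS` values shown are those used by Intel MPI 2019 and later; check the reference for your version):

```shell
# Shared memory within a node plus OFI between nodes (the usual default):
I_MPI_FABRICS=shm:ofi mpiexec.hydra -n 16 ./my_app

# Force plain TCP sockets through the OFI tcp provider (e.g., for debugging):
FI_PROVIDER=tcp mpiexec.hydra -n 16 ./my_app
```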
Application Binary Interface Compatibility
An application binary interface (ABI) is the low-level nexus between two program modules. It determines how functions are called and also the size, layout, and alignment of data types. With ABI compatibility, applications conform to the same set of runtime naming conventions.
Intel MPI Library offers ABI compatibility with existing MPI-1.x and MPI-2.x applications. So even if you’re not ready to move to the MPI 3.1 standard, you can take advantage of the library’s performance improvements without recompiling, and continue to use its runtimes.
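In practice, ABI compatibility means an executable built against another MPICH-ABI-compatible library can be run with the Intel MPI runtime without recompiling. A minimal sketch (the installation path is illustrative):

```shell
# Build once against any MPICH-ABI-compatible library...
mpicc -o my_app my_app.c

# ...then run with the Intel MPI runtime by putting Intel's libmpi.so first
# on the loader path; the oneAPI setvars.sh script arranges this.
source /opt/intel/oneapi/setvars.sh   # path varies by installation
mpiexec.hydra -n 4 ./my_app
```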
Performance & Tuning Utilities
Two additional utilities help you achieve top performance from your applications.
Intel® MPI Benchmarks
This utility performs a set of MPI performance measurements for point-to-point and global communication operations across a range of message sizes. Run all of the supported benchmarks or specify a single executable file in the command line to get results for a particular subset.
The generated benchmark data fully characterizes:
- Performance of a cluster system, including node performance, network latency, and throughput
- Efficiency of the MPI implementation
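A typical invocation (the executable and benchmark names are those shipped with the Intel MPI Benchmarks suite):

```shell
# Run the full IMB-MPI1 suite on two ranks:
mpiexec.hydra -n 2 IMB-MPI1

# Or measure only a particular subset, e.g., point-to-point latency and
# bandwidth, or one collective:
mpiexec.hydra -n 2 IMB-MPI1 PingPong
mpiexec.hydra -n 16 IMB-MPI1 Allreduce
```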
Sometimes the library’s default parameters don’t deliver the highest performance. When that happens, use mpitune to adjust your cluster or application parameters, iteratively fine-tuning them until you achieve the best performance.
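The mpitune command line has changed across releases, so the following is only a rough sketch of application-specific tuning as it has historically looked (the flags and file names are illustrative; consult the documentation for your version):

```shell
# Collect tuned settings for one specific application run (illustrative):
mpitune --application "mpiexec.hydra -n 32 ./my_app" --output-file ./my_app.conf

# Reuse the tuned settings on later runs of the same application:
mpiexec.hydra -tune ./my_app.conf -n 32 ./my_app
```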