A European HPC-centric Benchmark Framework
European Commission
- Date closing: March 24, 2026
- Industry focus: All
- Entity type: Public Agency
- Vertical focus: All
- Status: Open
- Geographic focus: EU
- Public/Private: Public
Overview
Expected Outcome:
The project is expected to contribute to the following outcomes:
- Enhanced decision-making through comprehensive system comparisons that improve the procurement process for exascale and post-exascale supercomputers and supercomputers with dedicated AI capabilities. This will enable more informed choices regarding the acquisition of new systems and upgrades of existing ones
- HPC application developers and end-users who are competent in selecting systems that best meet their needs, balancing quality factors such as accuracy against cost considerations such as time-to-solution
- Overall improved operation and fine-tuning of HPC and HPC-AI systems leading to improved performance, throughput and energy optimization, and improved end-user experience
- A unified, extensible and well-documented benchmarking framework to easily accommodate new, community-contributed benchmarks with common standards, versioning and control
- A well-maintained and continuously updated benchmarking suite for exascale and post-exascale HPC, including a set of applications as well as AI models.
Scope:
A. Deployment of a benchmarking framework for designing, developing and executing exascale HPC and HPC-AI benchmarks. The envisioned benchmarking framework will:
- offer a fine-grained and fair comparison methodology among different HPC systems, i.e. all benchmarks, benchmark run rules[1] and benchmark submission rules must be designed to ensure reproducibility, repeatability and replicability of metrics on the same system, as well as fairness and comparability of metrics across different systems
- define precise performance metrics with a clear focus on energy-related performance indicators
- standardise all benchmarking input and output formats
- collect and report all benchmarking results while offering statistically sound result analyses
- ensure that all benchmarks are executable on the respective target environment(s)
- offer a standardized structured workflow capturing and streamlining the entire benchmarking process
- offer a standardised repository with transparent version control
- provide a reference implementation for each benchmark
- use a EuroHPC reference system, where applicable, to normalize the performance metrics produced by the benchmarking suite, i.e. each benchmark is run and measured on this system to establish a reference value for that benchmark[2]; the normalized performance is then the quotient of the performance value attained on the EuroHPC reference machine and the one attained on the system under test
- be of production quality and ready to assess all EuroHPC supercomputers and supercomputers with AI capabilities
- provide all required templates with relevant input data to properly execute the benchmarking suite on every EuroHPC system.
The benchmarking framework along with its workflows will be realised in a software implementation that offers to the end-user a dynamic workspace for the entire workflow.
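The normalization rule described in the scope, together with the invariance property of footnote [2], can be sketched in a few lines of Python. The function name and all numeric values below are illustrative, not prescribed by the call:

```python
def normalized_performance(perf_reference: float, perf_under_test: float) -> float:
    """Quotient of the performance value attained on the EuroHPC reference
    machine and the one attained on the system under test (same benchmark,
    same metric), as defined in the call text."""
    if perf_under_test <= 0:
        raise ValueError("performance values must be positive")
    return perf_reference / perf_under_test

# Footnote [2]: when two systems are compared with the same benchmark,
# their relative performance must be invariant under the choice of
# reference machine. Illustrative numbers only:
ref_a, ref_b = 100.0, 250.0   # two hypothetical reference values
sys_1, sys_2 = 80.0, 160.0    # two hypothetical systems under test
ratio_a = normalized_performance(ref_a, sys_1) / normalized_performance(ref_a, sys_2)
ratio_b = normalized_performance(ref_b, sys_1) / normalized_performance(ref_b, sys_2)
assert abs(ratio_a - ratio_b) < 1e-12  # the reference value cancels out
```

The invariance holds because the reference value cancels in the quotient of two normalized scores, which is exactly why the scheme permits different reference machines over time.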
B. Establishing a comprehensive exascale HPC and HPC-AI benchmarking suite utilizing the framework developed in the first objective. This benchmarking suite, with its associated performance metrics, will be designed to measure and assess the performance of HPC, as well as HPC-AI[3] systems at various levels of granularity, encompassing:
- Microbenchmarks: Microbenchmarks focus on small or very small building blocks of real programs. They are typically characterized by a narrow focus on a single subsystem and used by component developers or system integrators for assessing the performance and optimizing specific parts of the system, e.g. the memory subsystem, or the interconnect. Examples include: dense and sparse linear algebra operations including tensor operations, spectral methods, n-body methods, (un)structured grid methods and others.
- Application and workflow benchmarks: Application benchmarks measure the performance of a system under typical user workloads. Applications are comprehensive, with a broad focus covering multiple components and their interactions. They are used by end-users, system administrators, and procurement authorities who need to evaluate overall system performance and compare different systems for their specific purposes: system selection, system optimization or system procurement. Examples include CFD, molecular dynamics simulation, numerical weather prediction, atomic-scale materials modelling, AI model training, and AI serving/inference. Note that the concept of an application benchmark encompasses real application benchmarks and their synthetic flavours: proxy applications, mini-apps, kernels and similar. Workflow benchmarks go beyond application benchmarks by accounting for the system-performance effects of the data-flow and control-flow complexities of integrated scientific workflows. These workflows couple computational and data-manipulation steps across simulation and modelling, end-to-end AI workflows, and high-performance data analytics.
- System benchmarks: System benchmarks offer a comprehensive system performance assessment under conditions where multiple, diverse workloads are concurrently executed and orchestrated by job schedulers and workload managers, reflecting a realistic, multi-user production environment. This involves running a curated portfolio of applications and is used by system administrators for optimizing the performance of schedulers and workload managers and by procurement authorities to assist in system procurement decision-making. An example is running an ensemble of large multimodal AI model training jobs simultaneously with large-scale lattice Boltzmann simulations.
The envisioned benchmarking suite is expected to:
- be generally hardware agnostic
- provide documentation for developers and end-users
- catalogue well-established benchmarks of both technical areas
- continuously update the portfolio with novel benchmarks of both technical areas
- ensure that each benchmark produces at least one metric; examples are time-to-solution (under a quality constraint), throughput or utilization
- define reliable and appropriate common metrics to compare the different architectures based on pre-defined criteria (e.g. efficiency)
- ensure that all benchmarks and associated metrics will comprehensively cover all relevant workloads and performance aspects ensuring to meet the diverse needs of the European HPC-AI community in a future-proof manner
- offer a comprehensive coverage of contemporary and upcoming architectures, utilizing current representative and upcoming workloads from the HPC and HPC-AI domains
- be application oriented, reflecting actual use patterns, use cases and diverse workloads in both technical areas (exascale HPC as well as HPC-AI), ensuring that the genuine capabilities and limitations of each system are well captured
- ensure the scalability of each benchmark by identifying relevant scale parameters[4].
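A "time-to-solution under a quality constraint" metric, as listed above, counts a run as valid only when its result meets a predefined accuracy threshold. The following minimal sketch illustrates the idea together with the scale parameters of footnote [4]; the class, field names and thresholds are hypothetical, not part of the call:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class BenchmarkRun:
    wall_time_s: float   # measured time-to-solution in seconds
    accuracy: float      # quality metric of the produced result
    scale: dict = field(default_factory=dict)  # scale parameters, cf. footnote [4]

def time_to_solution(run: BenchmarkRun, min_accuracy: float) -> Optional[float]:
    """Return the wall time only if the quality constraint is met;
    otherwise the run yields no valid metric."""
    return run.wall_time_s if run.accuracy >= min_accuracy else None

run = BenchmarkRun(wall_time_s=342.5, accuracy=0.97,
                   scale={"dataset_size": 10**9, "model_size": 7 * 10**9})
print(time_to_solution(run, min_accuracy=0.95))  # 342.5
print(time_to_solution(run, min_accuracy=0.99))  # None
```

Coupling the time metric to a quality gate prevents a system from "winning" a benchmark by trading result accuracy for speed.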
Proposals should provide a thorough justification for the selection of each benchmark and performance metric, clearly explaining how they align with the specific requirements and priorities of the European HPC-AI landscape. The inclusion or integration of existing benchmarks under the umbrella of this initiative is encouraged, provided there are prior agreements with the benchmark owners and compatibility with licensing conditions.
Proposals must outline a strategy for ensuring the sustainability and availability of the benchmarking suite beyond the duration of the action, specifically focusing on how to transform it into a community-driven effort. The proposal must also outline a clear IP plan targeting industry needs.
The consortium will actively coordinate with international collaborators to establish common and objective benchmarking standards.
The project will also propose and maintain a detailed strategic development roadmap for the action, which:
- anticipates future developments in HPC, including emerging technologies and prospective AI models
- identifies and addresses novel opportunities for exascale systems with a clear focus on energy efficiency
- foresees hardware-agnostic (ARM, x86, RISC-V) and hardware-inclusive (processors, accelerators and hybrid systems) support for heterogeneous systems
The consortium will actively engage with industry and research communities through workshops, working groups, and feedback loops to receive continuous feedback ensuring that all benchmarks are relevant and up to date.
Requirements:
- The proposal will eliminate duplication of effort by building on existing European benchmarking efforts and initiatives in HPC, such as the Unified European Application Benchmark Suite (UEABS). Each proposal is expected to outline a strategy for aligning with these efforts and incorporating their results
- The benchmark suite must specify a workload in an implementation independent way
- Define a dataset and quality criteria
- Detail benchmark specifications to test current and future supercomputing systems against state-of-the-art (SOTA) metrics
- Address relevant metrics of the target system per technical area, i.e. Performance, Scalability, Resource utilization and Energy efficiency, and extend where necessary
- Contribute to relevant standardization efforts, including security standards such as ISO/IEC 27001, and AI standards such as ISO/IEC 22989 and ISO/IEC 23053
- Define a methodology to deal with legacy applications and legacy systems.
- Define HPC utilization metrics including breakdown by benchmarking area (microbenchmarks, application benchmarks, mixed-workload benchmarks) and corresponding qualitative and quantitative KPIs to drive the development towards the objectives:
- Define effective KPIs between the different benchmarking areas
- Collect and analyse user feedback to evaluate how the benchmark suite efficiently and fairly compares diverse systems
- Define a mechanism to monitor the benchmarking framework and pool appropriate existing benchmark suites, relevant for architectures of all participating HPC centres for deployment in a common data repository:
- The developed automation framework together with the benchmarks will be onboarded to a common software repository created within other EuroHPC initiatives
- Enable continuous improvement, e.g. within an automated integration and testing workflow for the benchmark suite and framework repository, with appropriate tools, including version tracking of the benchmarks (where applicable including the data sets, build infrastructure, etc.)
- Define a mechanism for extending the benchmark suite: identification, selection, and standardisation of future relevant benchmarks, governance
- Extensive user documentation must be prepared and deemed sufficient by the users to effectively understand and use the benchmark suite.
- The consortium should demonstrate complementary expertise in the two main technical areas/key topics that make up the modular, layered benchmark framework
- The benchmarking framework and the encompassing benchmarking suite will be made available to the user communities under the European Union Public Licence (EUPL)
- The benchmarking framework will be defined through a consensus among stakeholders representing the HPC and HPC-AI communities, ensuring alignment with their diverse needs. This collaborative approach will establish a single point of agreement, providing a unified standard that accommodates the evolving landscape of high-performance computing and its related fields
- All technical and legal aspects should already be addressed at the proposal stage and not deferred to a later time or the consortium agreement. Where required, an appropriate modification of, e.g., the general terms and conditions for users of supercomputers should be elaborated and implemented by the participating HPC operators.
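The utilization breakdown by benchmarking area required above could be reported along the lines of the following sketch; the accounting records and node-hour figures are invented for illustration, and the area names follow the call text:

```python
from collections import defaultdict

# Hypothetical accounting records: (benchmarking area, node-hours consumed)
records = [
    ("microbenchmarks", 120.0),
    ("application benchmarks", 940.0),
    ("application benchmarks", 310.0),
    ("mixed-workload benchmarks", 630.0),
]

# Aggregate node-hours per benchmarking area
usage: defaultdict = defaultdict(float)
for area, node_hours in records:
    usage[area] += node_hours

# Report absolute usage and the share of the total per area
total = sum(usage.values())
for area, hours in sorted(usage.items()):
    print(f"{area:28s} {hours:8.1f} node-h  {100 * hours / total:5.1f} %")
```

Such a per-area breakdown gives the quantitative basis for the KPIs the call asks for, since shifts in the shares over time show where benchmarking effort is actually being spent.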
[1] Run rules define required and forbidden hardware, software, optimization, tuning, and procedures.
[2] When two different systems are compared with the same benchmark, their performance relative to each other must be invariant, even if different reference machines are used.
[3] We shall refer to conventional HPC and HPC-AI systems and benchmarks collectively as HPC-AI systems and benchmarks.
[4] For example, the scale parameter for an FFT benchmark is the window size and the scale parameters for AI model training applications include the size of the dataset, model size, and, in some cases, the number of models being trained simultaneously (e.g., in bagging scenarios).