GPU Computing Project

theory - documentation - lab

GOAL: implement an algorithm on GPU, analyze and model its performance and optimize it.

Topics

You are free in choosing a topic. For instance, you can parallelize the algorithm of your thesis, or another one that interests you.
Discuss your choice with the professor.
Inspiration can be found in the project section of the PPP course and below you also find GPU-specific topics.

Special topics

You can also choose among the following topics:

Test and use SYCL (see khronos) for your GPU implementation

device code is specified in source code with advanced C++ constructs (Lambdas)
see also coldplay.com -> ComputeCpp based on SYCL - for linux

Make a memory access logging and analysis system

The memory access pattern greatly determines the performance.
Log at runtime each memory access (adding 'probes' to the code that write memory address to an output array)
Analyze the memory log: check the patterns for coalesced/uncoalesced access and bank conflicts
Predict the runtime

Develop and test more intricate microbenchmarks

e.g. detect the scheduler of GPUs
in-depth analysis of Intel/AMD GPUs which are different

Objectives

For a (frequently occurring) computational problem you should develop an efficient GPU solution in OpenCL.

Implement or get the sequential version.
Create a naive, GPU implementation which is correct (compare with sequential result).
Also measure speedup, performance (flops) and bandwidth. We advice you to make some management code (or script) that automates this!
Optimize the naive implementation by trying to overcome the performance limits it contains.

Study alternative optimizations and compare their performance.

Model the performance with the pipeline model, test how close the simulated performance is to the real performance.

Make sure to invest enough time in the comparisons and modeling. The focus is not only on the complexity of the GPU implementation! It's better to have a less complex implementation with a good analysis!

Organization

Here are the rules for the project:

The project will be under the guidance of Jan Lemeire.
You can work alone or in groups of two/three.
The deadline to choose a topic is 25th March, 2025.
Meet us somewhere halfway the project to discuss your current problems, to let us give advice and to define the expected end result. Make an individual appointment.
The deadline for the project is [to be discussed with students].

Programming and Optimization Tips: see end of lesson 5.

We expect the following deliverables:

All relevant code related to the project.

sequential code + parallel implementations
the parallel versions should check their result with the one of the sequential version to proof that the outcome is correct!

A short report that describes:
- The problem (brief).
- The different implementations. You can be brief here, since we have your source code. A diagram or scheme might be helpful here.
- Links to sources of information and of source code.
- Description of the GPUs used. Refer to the sharepoint excel sheet.
- Most important: a discussion of the performance of the different implementations

Additional topics

a) Tree Construction and Tree Traversal (can be split into 2 separate topics)

Given a mesh of vertices (forming triangles) describing a 3D volume:

We want to know in each point in space the distance to the volume, i.e. the distance to the closest vertex/triangle:

A naive way to do this is to traverse for each point in space over all vertices and find the closest one.
If the mesh contains about 1 million vertices and for the space 512x512x512 points are considered, this will take ages...
Finding the closest vertex fast happens by tree describing a Bounding Volume Hierarchy:

the tree and tree traversal is explained here (although it is not exactly the same problem).
construction of the tree is explained here.

CUDA-code is provided. Use if for our problem, make it OpenCL and think about optimizations. On a CPUquadcore i7 the above mesh takes 15 minutes to complete... On a GPU?

we will provide you the data.

Project 2023

Assignment
Plane Sweep Code

Q&A project: 17 April
deadline project: May 15
exam: 13-14 June