Preferably use Visual C++ (the Community edition is free).
Consider the following basic algorithms:
Implement and compare the performance based on the following
approaches:
Compare the outcomes to make sure the algorithms are doing the
same thing (use random data as input).
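The correctness check can be sketched as follows. This is only an illustration, assuming the element-wise vector product as the algorithm under test and `float` data; the names `randomVector`, `productSequential` and `sameResult` are hypothetical and not taken from the template. Results are compared with a tolerance rather than for exact equality, because vectorized or GPU versions may round slightly differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Fill a vector with reproducible pseudo-random floats between 0 and 1.
// A fixed seed gives every version of the algorithm the same input.
std::vector<float> randomVector(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<float> v(n);
    for (float& x : v) x = dist(gen);
    return v;
}

// Reference version: plain sequential element-wise product.
std::vector<float> productSequential(const std::vector<float>& a,
                                     const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) c[i] = a[i] * b[i];
    return c;
}

// True if two result vectors agree element by element within a tolerance.
// The tolerance is relative for large values and absolute near zero.
bool sameResult(const std::vector<float>& x, const std::vector<float>& y,
                float tol = 1e-5f) {
    if (x.size() != y.size()) return false;
    for (std::size_t i = 0; i < x.size(); ++i) {
        float scale = std::max(1.0f, std::max(std::fabs(x[i]), std::fabs(y[i])));
        if (std::fabs(x[i] - y[i]) > tol * scale) return false;
    }
    return true;
}
```

Each other version (manually optimized, parallel, GPU) would be run on the same two input vectors and its output passed to `sameResult` together with the output of `productSequential`.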
Measure the run time and calculate the computational performance
(operations/second), the bandwidth (bytes per second) and the cycles
per basic instruction (query the clock frequency).
Compare these metrics across all the versions.
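The three metrics follow directly from the measured time. The sketch below is one possible way to compute them, not the template's method: `Metrics`, `computeMetrics` and `timeSeconds` are hypothetical names, and the clock frequency is assumed to be supplied by the caller (for example, looked up in the processor specification). For an element-wise product of n floats one would typically count n operations and 3 * n * sizeof(float) bytes (two reads and one write per element).

```cpp
#include <cassert>
#include <chrono>
#include <limits>

struct Metrics {
    double seconds;         // measured run time
    double opsPerSecond;    // computational performance
    double bytesPerSecond;  // achieved memory bandwidth
    double cyclesPerOp;     // clock cycles spent per basic operation
};

// Derive the metrics from a measured run time.
// operations: number of basic operations the kernel performed
// bytes:      number of bytes the kernel read plus wrote
// clockHz:    processor clock frequency in Hz (assumed known by the caller)
Metrics computeMetrics(double seconds, double operations, double bytes,
                       double clockHz) {
    assert(seconds > 0.0 && operations > 0.0);
    Metrics m;
    m.seconds = seconds;
    m.opsPerSecond = operations / seconds;
    m.bytesPerSecond = bytes / seconds;
    m.cyclesPerOp = clockHz * seconds / operations;
    return m;
}

// Run the kernel several times and return the shortest time in seconds.
// Taking the minimum reduces noise from other processes and cold caches.
template <typename F>
double timeSeconds(F&& kernel, int repetitions = 5) {
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < repetitions; ++r) {
        auto t0 = std::chrono::steady_clock::now();
        kernel();
        auto t1 = std::chrono::steady_clock::now();
        double s = std::chrono::duration<double>(t1 - t0).count();
        if (s < best) best = s;
    }
    return best;
}
```

The kernel should run long enough (large vectors or an inner repeat loop) that the measured time is well above the timer resolution, otherwise the derived numbers are meaningless.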
Calculate the speedup relative to the sequential version compiled with
full optimization (-O2 flag; /O2 in Visual C++).
Also compare the sequential version with versions compiled at lower
optimization levels and with manually optimized versions. From the
differences, try to deduce which optimizations the compiler applies.
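One common manual optimization is loop unrolling. The sketch below illustrates it under the assumption that the algorithm is the element-wise vector product; the function names are hypothetical and the unroll factor of 4 is an arbitrary choice. At full optimization the compiler may already unroll or vectorize the plain loop itself, in which case the manual version brings little or nothing, which is exactly what the comparison is meant to reveal.

```cpp
#include <cstddef>

// Plain loop: one multiplication per iteration.
void productPlain(const float* a, const float* b, float* c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) c[i] = a[i] * b[i];
}

// Manually unrolled by 4: fewer loop-counter updates and branches per element.
void productUnrolled4(const float* a, const float* b, float* c, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        c[i]     = a[i]     * b[i];
        c[i + 1] = a[i + 1] * b[i + 1];
        c[i + 2] = a[i + 2] * b[i + 2];
        c[i + 3] = a[i + 3] * b[i + 3];
    }
    // Remainder loop for sizes that are not a multiple of 4.
    for (; i < n; ++i) c[i] = a[i] * b[i];
}
```

Timing both functions at a low optimization level and again at full optimization shows whether the compiler closes the gap on its own.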
Run your code on at least 2 systems.
Describe in detail the processor and GPU characteristics.
Estimate the peak performance of the processor (the theoretical
maximal performance which the processor can deliver in ideal
situations, for both memory access and computational performance)
and measure the GPUs with our microbenchmarks: www.gpuperformance.org.
Test in advanced mode and upload the results to the database.
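A rough way to estimate the theoretical CPU peak is to multiply the hardware's parallel resources together. The sketch below shows the arithmetic only; the function names are hypothetical, and the actual core count, clock frequency, SIMD width, number of floating-point units and memory configuration must be taken from the specification of your own processor.

```cpp
// Peak compute rate in floating-point operations per second:
// cores x clock frequency x SIMD lanes x operations per lane per cycle.
double peakFlops(int cores, double clockHz, int simdLanes,
                 int opsPerLanePerCycle) {
    return cores * clockHz * simdLanes * opsPerLanePerCycle;
}

// Peak memory bandwidth in bytes per second:
// memory channels x transfers per second x bytes per transfer.
double peakBandwidth(int channels, double transfersPerSecond,
                     int bytesPerTransfer) {
    return channels * transfersPerSecond * bytesPerTransfer;
}
```

As a purely hypothetical example, a 4-core processor at 3.0 GHz with 8 single-precision SIMD lanes and 4 operations per lane per cycle (two fused multiply-add units, each counted as 2 operations) gives `peakFlops(4, 3.0e9, 8, 4)`, which is 384 GFLOP/s. Two memory channels at 2400 million transfers per second and 8 bytes per transfer give `peakBandwidth(2, 2.4e9, 8)`, which is 38.4 GB/s.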
You can work in pairs or alone.
You can visit me to discuss the results and get feedback.
Try to explain the different results. Is the peak performance
reached? If not, why not? Why do the optimizations help, or why not?
Did auto-parallelization and auto-vectorization work? Check the
compiler reports!
Check the assembler code for automatic loop unrolling and
vectorization.
Additional questions:
As results, we expect:
The code, preferably following my template.
The experimental results on at least 2 systems (runtime, speed,
bandwidth, ...; see my template). You can write your results to a file
with the following command in cmd.exe:
VectorElementWiseProduct.exe > results.txt
(do not forget to press Enter if required by the code).
A description of the CPUs and GPUs used, a theoretical peak
performance estimate for the CPU, and the results of the GPU
microbenchmarks in our database.
An analysis of the results: explain why the optimization tricks and
parallelization work or do not work.