The performance of GPU computing

    Outline of the presentation given during the tutorial of 13/10/2011


  1. Inefficient patterns (inefficient in the 'serial sense')
  2. APP0: Limited parallelism
  3. APP1: Branching
  4. APP2: Parallel Memory Access
  5. APP3: Synchronization
  6. APP4: Dependent Instructions (in-thread parallelism)
  7. Configuration
  8. Latency Hiding

Algorithm Optimization:
  1. Resource trade-off: reduce the usage of one resource by using more of another
  2. Patterns:

References (used to 'prove' that our coverage is complete)