Choosing a CUDA Toolkit (Tools & Utilities) version
For work that オフィスサイトウ (Office Saito) delivers to clients, I use CUDA Toolkit 3.2. The reason: many of the requirements I am given depend on 3.2.
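One way to make a "3.2 only" requirement explicit is to check the version integer that the CUDA runtime exposes. That integer is encoded as major * 1000 + minor * 10, so 3.2 is 3020 and 4.0 is 4000. The sketch below is plain C++ and takes the integer as a parameter, so it compiles without the toolkit installed. The names `ToolkitVersion`, `decode_toolkit_version` and `matches_required` are hypothetical helpers for illustration, not part of any NVIDIA API.

```cpp
#include <cassert>

// Hypothetical helper type: a toolkit version split into its two parts.
struct ToolkitVersion {
    int major_version;
    int minor_version;
};

// Decode an integer in the CUDART_VERSION encoding
// (major * 1000 + minor * 10), e.g. 3020 for 3.2 and 4000 for 4.0.
ToolkitVersion decode_toolkit_version(int encoded) {
    assert(encoded >= 0);  // the encoding has no negative values
    ToolkitVersion v;
    v.major_version = encoded / 1000;
    v.minor_version = (encoded % 1000) / 10;
    return v;
}

// True when the encoded version is exactly the required major.minor.
bool matches_required(int encoded, int required_major, int required_minor) {
    const ToolkitVersion v = decode_toolkit_version(encoded);
    return v.major_version == required_major &&
           v.minor_version == required_minor;
}
```

In a real build the argument would presumably be the `CUDART_VERSION` macro from `cuda_runtime_api.h`, or the value written by `cudaRuntimeGetVersion()`. With the values above, `matches_required(3020, 3, 2)` is true and `matches_required(4000, 3, 2)` is false.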
That said, I do not avoid 4.0 entirely. The description of CUDA Toolkit 4.0 follows, quoted from:
http://developer.nvidia.com/cuda-toolkit-40
Easier Application Porting
- Share GPUs across multiple threads
- Use all GPUs in the system concurrently from a single host thread
- No-copy pinning of system memory, a faster alternative to cudaMallocHost()
- C++ new/delete and support for virtual functions
- Support for inline PTX assembly
- Thrust library of templated performance primitives such as sort, reduce, etc.
- NVIDIA Performance Primitives (NPP) library for image/video processing
- Layered Textures for working with same size/format textures at larger sizes and higher performance
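For readers who have not used Thrust: its interface is modeled on the C++ standard library, so the sort and reduce primitives named in the list have the same call shape as `std::sort` and `std::accumulate`. The sketch below uses the standard library as a stand-in, not Thrust itself, so that it compiles on the host without the toolkit. The function names `sort_values` and `sum_values` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Sort ascending in place.
// Thrust counterpart: thrust::sort(first, last).
void sort_values(std::vector<int>& values) {
    std::sort(values.begin(), values.end());
    assert(std::is_sorted(values.begin(), values.end()));
}

// Sum all elements, starting from 0.
// Thrust counterpart: thrust::reduce(first, last, init, binary_op).
int sum_values(const std::vector<int>& values) {
    return std::accumulate(values.begin(), values.end(), 0, std::plus<int>());
}
```

With Thrust, the container would typically be a `thrust::device_vector<int>` and the two standard-library calls would become `thrust::sort` and `thrust::reduce`, which run on the GPU.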
Faster Multi-GPU Programming
- Unified Virtual Addressing
- GPUDirect v2.0 support for Peer-to-Peer Communication
New & Improved Developer Tools
- Automated Performance Analysis in Visual Profiler
- C++ debugging in CUDA-GDB for Linux and MacOS
- GPU binary disassembler for Fermi architecture (cuobjdump)
- Parallel Nsight 2.0 now available for Windows developers with new debugging and profiling features.
Check out the NEW CUDA 4.0 Math Library Performance Review
The latest released NVIDIA Drivers are always available at www.nvidia.com/drivers
===
CUDA 4.0 Library Performance Overview
The performance of many math functions has improved with the release of the CUDA 4.0 Toolkit.
This presentation includes the performance results of many of the key functions.
Results include performance measurements for:
- cuFFT – Fast Fourier Transforms Library
- cuBLAS – Complete BLAS Library
- cuSPARSE – Sparse Matrix Library
- cuRAND – Random Number Generation (RNG) Library
- NPP – Performance Primitives for Image & Video Processing
- Thrust – Templated Parallel Algorithms & Data Structures
- math.h – C99 floating-point Library
CUDA_4 0_Math_Libraries_Performance_6_14.pdf: A review of the performance of CUDA 4.0 Math Libraries, including cuFFT, cuBLAS, cuSPARSE, cuRAND, NPP, Thrust and others
That is all.