Modern GPUs such as the NVIDIA GeForce-8 series are massively parallel, many-core compute engines. According to the semiconductor industry scaling roadmap, these compute engines could reach up to 10,000 times the peak performance of 2007 microprocessors by the end of 2016. Such a dramatic increase in computational power is likely to trigger multiple major scientific discoveries as well as revolutions in consumer applications. We are experiencing a once-in-a-lifetime opportunity in our profession. As with any other massively parallel computer system, achieving high performance currently requires the application programmer to understand the desirable parallel programming idioms, potential performance pitfalls, and proven coding strategies for the hardware. However, the programming and code optimization models of GPU computing are quite different from those of parallel systems based on traditional CPUs. In this presentation, I will describe recent results of a collaborative effort between the University of Illinois and NVIDIA to build an infrastructure of CUDA optimization tools, educational materials (courses.ece.uiuc.edu/ece498/al), and coding frameworks that enable application developers to fully exploit current and future GPU computing platforms. I will then discuss the coming challenges and some promising work to address them.