Friday, December 23, 2011

openGL iterating post 4

4th and final day, here's what I've got for you:

Iteration 14: View ports
Iteration 15: Adding animation and resizing

Only 2 more iterations today, but that wraps up my series of commentaries, I think. There is more: the tutorial goes on to texturing and loading an obj. But as of iteration 15 you have all of the most basic systems in place for a video game, and you should feel well equipped to continue on to other tutorials. I might describe a little more about how graphics systems work later, but for now I'm happy with what I've accomplished in this series. Enjoy.

Thursday, December 22, 2011

openGL iterating post 3

And now it looks like I posted twice in one day! False. I was just not quite as quick to publish as I am tonight. Shazam! With seconds to spare I post today's exploits!

Iteration 8: uniforms
Iteration 9: glutIdleFunc
Iteration 10: glm
Iteration 11: 3D coordinates
Iteration 12: transform
Iteration 13: the CUBE!

Links to the individual files have been replaced by a single link to a github for my blog:
This tutorial's files can be found under the folder "OpenGL"

Wednesday, December 21, 2011

openGL iterating post 2

Technically this is on the 22nd, but I really meant it to be 20 minutes ago. Anyway, I've got the next batch of commentaries done. I've taken to calling them commentaries in contrast to tutorials, because if you want the tutorial, go to the wiki book:
That is just the tutorial I'm following. I'm just adding to it to help some people along. Mostly these are honestly just to help me think through things, but on the off chance that they help someone else, sweet. Enjoy:

Post #2:
Iteration 3: Reworking to a more realistic structure.
Iteration 4: Introducing: vertex buffer objects!
Iteration 5: Passing info to the vertex shader.
Iteration 6: Passing info from vertex to fragment shaders.
Iteration 7: Interweaving in buffer objects.

Links to the individual files have been replaced by a single link to a github for my blog:
This tutorial's files can be found under the folder "OpenGL"

Tuesday, December 20, 2011

openGL iterating post 1

I've started to write up my iterations toward making an openGL program. I'm really just following along with this tutorial: but I hope my comments will help newbies ease into openGL a little better. Here they are, enjoy:

iteration 1: Making a bloody window
iteration 2: Making a basic render pipeline

Links to the individual files have been replaced by a single link to a github for my blog:

Sunday, December 18, 2011

Interface to the World

I've been trying to get it working so that I can display the values generated by my OpenCL gravity simulation. Right now it outputs the info to a text file. The Blender script I had once upon a time has since vanished, and since I'm not really interested in rewriting that script, I've set about trying to make things in OpenGL. And I've come to a realization.

I've come to the conclusion that there are two approaches to learning these sorts of things (programming APIs, languages, interfaces, etc). You can take a case example and accomplish only and exactly what you want. A good example of this is me with Blender scripting. I'm relatively useless at scripting Blender, but the things I have accomplished have proven to be quite handy. The advantages of this approach are clear: it is fast, nimble, and gets you your end result in a satisfactory way. The cons are that it is very narrow in potential, and what you learn has very little reusability.

Then, there is the approach of learning by building blocks. In this approach you learn more than what you want and spend a lot of time doing it. A good example of this would be my classes. I don't know when I'll use half of the things I hear about, but I know what to do when I come across them, because I have been familiarized with the topic. The advantage here is that you know the whole system. When you are given a new task, you can start up right away with a clear idea of where to go and what you need. The result is you get from point A to point B really quickly because you've taken the route 15 times before.

Now, this is coming out because I'm trying to work with openGL and openCL at the same time. I've managed to accomplish little things in both. But when I want to share information between them I need to use openGL buffers, and I've not used these in either. For openCL it won't be too hard, but I need to learn them in openGL as well. And so I'm taking a step back. Before I can actually accomplish what I want to accomplish, I need to get a better understanding of how openGL works. So openCL, my love, you will have to wait.

Starting tomorrow, I'm going to post a series of code files like I did for my openCL example. My objectives for this tutorial are as follows:
1. Platform independent code. How you compile is up to you to figure out. (Psst, if you use Ubuntu, compiling is "gcc -o test code_file_name_here.c -lGL -lGLU -lglut". Glad we had this talk.)
2. Leave no step uncommented. Like I did with the CL tutorial: explain what every step is, what it does, why it matters. This I think is the problem a lot of tutorials I've tried to read have had. Sure, when you have a general idea about how the system works it is a bit more information than you want to read. But when you are new, it is the only thing keeping you afloat. These tutorials are for someone who is completely new. Which happens to include myself. Funny how that works.
3. Iterative code files. Explain how to accomplish one aspect, create the code file, have the code file available for download, and repeat. Slowly build to a final result which is finally capable of getting the reader of the tutorial somewhere.

In terms of what I hope to accomplish, I don't really know yet. I don't know the scope of what there is to cover well enough to say what I want to cover, but I'll post updates here as I iterate through:

Post #1:
Step 1: Make a bloody window.
Step 2: Making a basic render pipeline.

Post #2:
Step 3: Reworking to a more realistic structure.
Step 4: Introducing: vertex buffer objects!
Step 5: Passing info to the vertex shader.
Step 6: Passing info from vertex to fragment shaders.
Step 7: Interweaving in buffer objects.

Post #3:
Step 8: uniforms
Step 9: glutIdleFunc
Step 10: glm
Step 11: 3D coordinates
Step 12: transform
Step 13: the CUBE!

Post #4:
Step 14: View ports
Step 15: Adding animation and resizing

Saturday, December 3, 2011

Memory Access

I figure I should post something, but I haven't had a chance to really come up with anything about OpenCL or GL to post. I have been looking at information, just not had the chance to make any exploits. I watched through all 6 videos on OpenCL at Mac Research, which are fantastic, and I highly advise you to go watch those if you are reading this. But one of the videos comments that, depending on how you use the memory banks referred to as "Local" on the graphics card, they can essentially act as registers. Then it hit me: not every programmer knows what that means. Not many really understand how memory flows when you are dealing with a CPU, let alone a GPU. So I thought I'd take a moment to explain how this works.

I'm going to actually explain it, but I thought I'd give you a little analogy to reference before I get going. There are 4 kinds of memory: registers, cache, RAM, and disk. If you were to equate them to notebooks, registers would be the last couple and the next couple of letters you wrote (a modern 32-bit processor has 6 4-byte registers, a 64-bit processor has 14 8-byte registers). The cache is the current page of the notebook you are writing on. RAM is the stack of notebooks on your desk. And disk is the dozens of bookshelves filled with notebooks around the room.

The reason for this is simply put here:

Speed:  Blazing fast   Fast        Slow        Painfully slow
        Register       Cache       RAM         Disk

Size:   Bytes          Kilobytes   Gigabytes   Terabytes
        Register       Cache       RAM         Disk

Now that you get the general idea for these various kinds of memory, let's start with where they are located. Both the registers and the cache are in the actual processor, and therefore when you buy a computer, the processor is the piece that determines the size of these two memories. When the processor executes a machine-level instruction, the only memory it is capable of accessing is the registers. Therefore it makes sense that these are the fastest. The reason for the tiny size is a matter of addressing and locality. To keep the instructions small, the number of registers must be small. A typical instruction has to address 3 registers. Take addition, for example: a+b, store in c. It has to address a, b, and c. The other reason is that the further the information has to travel, the longer it takes to get there. Registers are right there, ready to go into the circuitry that is about to be executed.
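To put a number on the addressing argument above, here is a minimal sketch in C. The function names are my own, and it assumes the simplest possible encoding, where each register operand is just a binary register number packed into the instruction. Real instruction sets are messier than this.

```c
/* Bits needed to give each of `num_registers` registers a unique number.
 * Assumes a plain binary encoding (an illustration, not any real ISA). */
unsigned register_field_bits(unsigned num_registers)
{
    unsigned bits = 0;
    /* Stop at 31 so the shift below never overflows an unsigned int. */
    while (bits < 31 && (1u << bits) < num_registers)
        bits++;
    return bits;
}

/* Bits an instruction spends just naming its register operands,
 * e.g. 3 operands for "a + b, store in c". */
unsigned operand_bits(unsigned num_registers, unsigned operands)
{
    return register_field_bits(num_registers) * operands;
}
```

With 6 or 8 registers, a three-operand instruction spends 9 bits naming registers. With 4096 of them it would spend 36 bits, which is more than a whole 32-bit instruction.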

Cache is the next level in memory. Because of its proximity to the processing, it is also very fast. But when the processor wants to use information saved in the cache, it must do a load command to pull the information from the cache to the registers so it can be manipulated. Registers run on the order of 752 gigabits per second while cache runs closer to 16-24 gigabits per second. A bit of a speed difference.

Now, to the processor, the cache and RAM actually look like the same thing. The hardware will direct the processor's load command to the correct location, be it cache or RAM. So why not have a larger cache? Take a look at a processor die (the die is the actual circuitry of the processor), which you can see here: When you look at this you can see there is a large dark portion on the left that is just a repeating pattern, and a lighter portion with lots of paths and a bunch of constructs of some sort. The right side is the actual processor, the left side is the cache. Where are the registers, you might ask? Well, they are far too small to even see. Each of the 4-byte registers in this processor is the same size as each of the 4k registers that make up the cache. 4k registers versus 6 registers: you can imagine why you might not be able to see them. But back to my point. The cache is half of this chip. Half. This is why the cache isn't bigger. Memory, compared to the actual processing circuitry, is quite large.

The next level of memory is the system memory, or RAM. RAM stands for Random Access Memory. The cache and system memory are technically both RAM, but we make the distinction between the two based mostly on the cache being housed by the processor. In terms of speed, modern RAM is typically capable of about 8-16 gigabits per second, which is slower than the cache, but nowhere near the speed difference between the cache and the registers. RAM is where we start working with latency, though. Latency is the amount of time it takes from when you give a system input to when it starts giving output. If you go buy RAM online it will give its latency as one of the statistics about it. That figure, typically 6-9, is counted in memory clock cycles rather than milliseconds; the full delay between the processor sending a request and getting the response is on the order of tens of nanoseconds. That may not sound like much, but to the processor it is huge. Staggeringly huge.

Let's say a request takes 60 ns (an illustrative figure) and you have a fast processor at 3GHz. 3GHz just means that in 1 second it makes 3,000,000,000 cycles, or 3 cycles per nanosecond. If your latency is 60 ns, that means 3*60 or 180 processor cycles pass in the time it takes to get your request from the RAM. That is a lot of wasted resources. Modern processors have memory controllers that deal with getting information from the RAM to the cache, to minimize the frequency of what are referred to as "cache misses", where you have to wait for information to be transferred from RAM to the cache.
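The cycle count above is just the clock rate multiplied by the latency. A minimal sketch in C follows. The function name and the choice of nanoseconds as the latency unit are my own assumptions, and it uses integer math so the result is exact.

```c
#include <stdint.h>

/* CPU cycles that pass while the processor waits on one memory request.
 * clock_hz:   processor clock in cycles per second (3 GHz = 3000000000)
 * latency_ns: delay between request and response, in nanoseconds */
uint64_t stall_cycles(uint64_t clock_hz, uint64_t latency_ns)
{
    /* There are 1,000,000,000 nanoseconds in a second. */
    return clock_hz * latency_ns / 1000000000ULL;
}
```

With a 3 GHz clock, a 60 ns delay costs 180 cycles, and a delay of 6,000,000 ns (6 ms) would cost 18,000,000.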

The last level of memory is disk. This may be solid state drives or magnetic disk drives. Solid state drives are actually a system similar to RAM. They use a type of memory called flash memory. One of the features of RAM is that if it loses its power, all the information is lost. Flash memory, on the other hand, does not lose its information if it loses power. Solid state drives are pretty fast, capable of 1-3 gigabits per second (though you'll typically see them listed in megabytes per second; translating what I said into that same form, 125-375 megabytes per second). Magnetic disks are sluggish in comparison at 40-200 megabits per second. That's 0.04-0.2 gigabits per second.

So why not SSDs all the way? This one I'm sure most of you know, as these systems aren't quite as obscure and mysterious as registers and caches. For $100 you can get a 64 gigabyte SSD, about an order of magnitude larger than your RAM. For $100 you could also get a 750 gigabyte magnetic disk drive. So the constraining feature of SSDs is cost.

So to summarize what I've said. Registers: extremely fast, capable of dumping their contents every cycle, but because of limitations imposed by the size of instructions and the speed at which light travels, there can only be 6 of them. Cache: quite fast, resides on the processor itself, but due to size limitations, can't be bigger. RAM: slow, and it holds vastly more data than the cache, but again because of limitations of size and the speed of light (a photon at one end of a stick of RAM will only be able to travel as far as the other end of the stick in the time it takes for your CPU to complete a cycle) as well as heat (if we try to pack RAM closer, it is going to start to melt itself due to an inability to disperse the heat it produces), RAM is limited to a couple of gigabytes. Lastly there are hard drives. These hold huge amounts of data, but are quite slow in getting to and reading that information. So when you compare the two ends of the spectrum, the registers hold about 32 bytes of information and can transfer about 752 gigabits per second. In contrast, a magnetic disk can hold a terabyte or more, which is 31,250,000,000 times as much data, but can only transfer about 0.2 gigabits of it per second, which makes registers about 3760 times faster.
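The end-of-spectrum comparison can be checked with a few lines of C. This is only a sketch: the struct and function names are my own, the figures are the ones quoted in this post, and speeds are stored in megabits per second so everything stays in whole numbers.

```c
#include <stdint.h>

/* One level of the memory hierarchy, using the figures from the post. */
struct mem_tier {
    const char *name;
    uint64_t bytes;            /* capacity in bytes */
    uint64_t megabits_per_s;   /* transfer rate in megabits per second */
};

static const struct mem_tier REGISTERS = { "registers", 32ULL, 752000ULL };
static const struct mem_tier MAGNETIC_DISK = { "magnetic disk", 1000000000000ULL, 200ULL };

/* How many times more data `big` holds than `small`. */
uint64_t capacity_ratio(const struct mem_tier *big, const struct mem_tier *small)
{
    return big->bytes / small->bytes;
}

/* How many times faster `fast` transfers data than `slow`. */
uint64_t speed_ratio(const struct mem_tier *fast, const struct mem_tier *slow)
{
    return fast->megabits_per_s / slow->megabits_per_s;
}
```

Here capacity_ratio(&MAGNETIC_DISK, &REGISTERS) comes out to 31,250,000,000 and speed_ratio(&REGISTERS, &MAGNETIC_DISK) comes out to 3760.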

All the kinds of memory are quite necessary to get your computer running quickly and to hold the large quantities of data that we have come to enjoy. In my mind it is kind of like the orbits of planets. Further planets orbit very slowly but have huge orbits, and planets that are painfully close to the sun have tiny orbits but orbit ridiculously fast.

So, when the Mac Research guy said that the local memory buffers on the GPU are so fast they can function as registers, it is amazing. It turns the small handful of registers into a large group of almost 4k. That has amazing potential. Storing information in the local memory, you can get calculations to just scream. Typically the GPU is clocked at about a 5th of the speed of a CPU. If you can get things running in the local memory buffer, then you can get each individual stream processor to start to give the CPU a run for its money. That is crazy fast.

That was a lot longer than I had intended to write, and there are some things that I left out. But I'll wrap this up here for now. Besides, it's not like anyone reads this, so it doesn't really matter. :p

Thursday, November 3, 2011


Code writing has been put on hold for the month of November for the most part. I'm partaking in NaNoWriMo, so I'll be spending the majority of my time writing a novel. I've done some research on concurrency, which I'll write about here soon. But for now, I probably won't post again till much later. Not that anyone reads this to care. xD

Thursday, October 27, 2011

Working with openCL programs

So. It turns out, programming and debugging is hard. Not that that should surprise anyone, particularly if you are reading this. But what I'm referring to particularly is trying to program and debug openCL kernels. I like to think I'm a pretty slick programmer. I should really use a debugger more often than I do, but I can typically manage to figure out all of my class assignments using only compiler output and a lot of thinking. Trying the same approach with these OpenCL kernels has failed miserably. The reason being, I really have no clues as to why compiling kernels fails. This whole system is still very new to me, so having no clues is remarkably unhelpful.

So I went out in search of help. What I found was the Intel SDK offline compiler. But it was a .rpm, not a .deb. I decided to press my luck and give it a go anyway. I converted the .rpm with "alien sdk.rpm" and then installed it. It took me a moment, but I eventually figured out that the program names were "ioc" and "". Then I tried ioc, which gave me an error about a shared library not being found. So I tried, it gave me a pair of errors but the GUI still came up. I tried to compile, and no luck. Nothing happened, save for the build log turning red. The fix was simple and obvious, though to my great embarrassment I didn't realize it for almost an hour and a half: "sudo apt-get install libnuma-dev" was all I needed to get it working. On Red Hat and its kin this may not be a problem.

So now I have intel sdk offline compiler, or IOC. What can I do with this then eh? Well, I can start with code that I think works like this:

__kernel void force(float** Galaxy, const unsigned int starc)
{
    int i = get_global_id(0);

    float x,y,z,d,force;
    int j;
    for(j = 0; j < starc; j++)
    {
        if (j == i) continue;
        //find relative distance
        x = Galaxy[i][1] - Galaxy[j][1];
        y = Galaxy[i][2] - Galaxy[j][2];
        z = Galaxy[i][3] - Galaxy[j][3];
        d = x*x+y*y+z*z;
        if (d == 0) continue;
        force = ((0.00000066742799999999995)*Galaxy[i][0]*Galaxy[j][0])/(d);
        Galaxy[i][7] = (x*x)*force*(-1)/d;
        Galaxy[i][8] = (y*y)*force*(-1)/d;
        Galaxy[i][9] = (z*z)*force*(-1)/d;
    }//end for loop
}

but I know it doesn't, because my program fails to build it every time. Enter the code in the IOC and you get compiler output like this:

Using default instruction set architecture.
Intel OpenCL CPU device was found!
Device name: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
Device version: OpenCL 1.1 (Build 15293.6649)
Device vendor: Intel(R) Corporation
Device profile: FULL_PROFILE
:1:1: error: kernel argument can't be a pointer to pointer in OpenCL
:1:1: error: kernel argument can't be a pointer to pointer in OpenCL

Build failed!

Well dang. There goes a base assumption on how I was going to work this. It would seem that you can't pass an array of arrays to an openCL kernel. Not too big of a problem really. In this case, my array is an array of arrays of uniform size. Each inner array is exactly 10 elements, making it real easy to simply have it be 1 big array. If I want Galaxy[10][3], I just give it Galaxy[10*10+3] instead. Rewriting it that way, I get this:
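The index arithmetic above can be wrapped up so it only gets written once. Here is a minimal sketch in plain C. The helper names and the FIELDS_PER_STAR constant are my own inventions for illustration; the only thing taken from the post is the layout of 10 floats per star.

```c
#include <stddef.h>

/* Each star is a row of 10 floats, stored back to back in one big array. */
#define FIELDS_PER_STAR 10

/* Position of Galaxy[star][field] inside the flattened array. */
size_t flat_index(size_t star, size_t field)
{
    return star * FIELDS_PER_STAR + field;
}

/* Read one field of one star out of the flattened array. */
float star_field(const float *galaxy, size_t star, size_t field)
{
    return galaxy[flat_index(star, field)];
}
```

So flat_index(10, 3) is 103, the same slot as Galaxy[10*10+3].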

__kernel void force(float* Galaxy, const unsigned int starc)
{
    int i = get_global_id(0);

    float x,y,z,d,force;
    int j;
    for(j = 0; j < starc; j++)
    {
        if (j == i) continue;
        //find relative distance
        x = Galaxy[i*10+1] - Galaxy[j*10+1];
        y = Galaxy[i*10+2] - Galaxy[j*10+2];
        z = Galaxy[i*10+3] - Galaxy[j*10+3];
        d = x*x+y*y+z*z;
        if (d == 0) continue;
        force = ((0.00000066742799999999995)*Galaxy[i*10]*Galaxy[j*10])/(d);
        Galaxy[i*10+7] = (x*x)*force*(-1)/d;
        Galaxy[i*10+8] = (y*y)*force*(-1)/d;
        Galaxy[i*10+9] = (z*z)*force*(-1)/d;
    }//end for loop
}

and it says this:

/*blah blah blah you don't care about my computer architecture so I'll leave that out*/
:1:1: error: Arguments to __kernel that are pointers must be declared with the __global, __constant or __local qualifier in OpenCL
:1:1: error: Arguments to __kernel that are pointers must be declared with the __global, __constant or __local qualifier in OpenCL

Build failed!

Avast! So that is where my major logical disconnect is. Galaxy needs to be global! So I change my first line to:

__kernel void force(__global float* Galaxy, const unsigned int starc)

and it says:

Build started
Kernel was not vectorized
Build succeeded!

Not sure what it means by "Kernel was not vectorized". So I tried looking at the assembly code. Upon looking at said code it became incredibly clear that I don't know what the heck I'm looking at. I mean, I know assembly, but not for my graphics card. So I decided that if the build succeeded, I'll be content and try plugging this back into my program. But in doing so I took a cue from the big blob blog and moved the code to a separate code file. This was done largely because with this offline compiler I can work with kernels in it and then save them as .cl files. Trying to then turn one into a string just sounded like more trouble than it was worth, so I'm going to read it in instead. His method for loading the kernel source is this:

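// needed at the top of the file for fopen/fread/fprintf and malloc/exit
#include <stdio.h>
#include <stdlib.h>

#define MAX_SOURCE_SIZE (0x100000)
// Load the kernel source code into the array source_str
FILE *fp;
char *source_str;
size_t source_size;

fp = fopen("", "r");
if (!fp) {
    fprintf(stderr, "Failed to load kernel.\n");
    exit(1); // nothing to build without the source, so bail out
}
source_str = (char*)malloc(MAX_SOURCE_SIZE);
// fread returns the number of bytes read; it does not add a '\0'
source_size = fread( source_str, 1, MAX_SOURCE_SIZE, fp);
fclose( fp );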

And that looks like a pretty good idea to me, so I shamelessly copied it. This also meant a minor alteration of my program creation line from:
program = clCreateProgramWithSource(context, 1, (const char **) & KernelSourceForce, NULL, &err);
to:
program = clCreateProgramWithSource(context, 1, (const char **) & KernelSourceForce, (const size_t *) & source_size, &err);
Passing NULL for the lengths tells OpenCL to treat each source string as null terminated. A buffer filled by fread is not null terminated unless you add the terminator yourself, so since I have "source_size", I may as well pass it along. Also, you'll notice the cast that he had that I have also: "(const size_t *) &". Reasons for this: firstly, you can't actually pass the value, it has to be a pointer. That is because the argument is really an array of lengths, one per source string, and with a count of 1 the address of a single size_t serves as a one-element array. Even though passing a 32-bit int is smaller than a 64-bit pointer, it just must be this way. As for size_t, that is a platform dependent unsigned data type used for sizes, so the length has the width OpenCL expects on whatever machine you compile on.
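Here is a variation on the loading code above, sketched under my own assumptions rather than taken from the other blog. It sizes the buffer from the file itself instead of using a fixed MAX_SOURCE_SIZE, and it adds the null terminator, so the result works whether you pass the length or NULL. The name read_text_file is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads a whole file into a freshly malloc'd, null-terminated buffer.
 * On success returns the buffer and stores the byte count in *out_size.
 * On failure returns NULL. The caller frees the buffer. */
char *read_text_file(const char *path, size_t *out_size)
{
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return NULL;

    /* Find the file length by seeking to the end. */
    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return NULL; }
    long len = ftell(fp);
    if (len < 0) { fclose(fp); return NULL; }
    rewind(fp);

    /* One extra byte for the terminator. */
    char *buf = malloc((size_t)len + 1);
    if (!buf) { fclose(fp); return NULL; }

    size_t got = fread(buf, 1, (size_t)len, fp);
    fclose(fp);

    buf[got] = '\0';
    *out_size = got;
    return buf;
}
```

The returned pointer and size would then go to clCreateProgramWithSource in place of source_str and source_size, and the buffer can be freed once the program object has been created.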

with those simple changes I compiled and ran and the kernel built! oh happy happy day!

Tuesday, October 18, 2011

Library wrappers

I'm trying to work as hard as I can on this project, but trying to swallow OpenCL and OpenGL is certainly no simple task. I'm also being made aware of cross platform issues. While it is true that openGL and CL are just specifications and are therefore platform independent, it doesn't mean that the platform doesn't factor into it. The vast majority of the API is portable and will run on Linux/Mac/Windows just fine without any trouble. Where the problems lie is in making calls to the system.

In Linux, in most cases, this means GL/glx.h; in Windows it means windows.h. And I don't actually know what it is for Mac (and I'm only mostly guessing with Windows). While these libraries are portable, there are implementation differences between them. David Rosen, in his post on the Wolfire blog, said "'re going to be wrapping these low-level APIs in an abstraction layer anyway..." and I disagreed at first. I thought that with larger structures, sure, or if there were more programmers than just me and I wasn't trying to do it quick and dirty, probably. But otherwise it just seemed like an extra step for nothing.

As I plunge further into these libraries I get where he was coming from. Allow me to explain the reasons why you must write a library to abstract away from the graphics/etc API, and what that abstraction might look like.

1. Portability:
As I said earlier, just because openGL and openCL are portable, doesn't mean that they don't have implementation differences. The abstraction layer exists to prevent you from having to think about those differences more than once. And to port your engine to a new system with a different implementation, all that needs to be done is write a new abstraction for that system and the rest of your graphics (or physics) works fine and dandy.

2. Simplicity:
It is actually faster. Portability aside, you will achieve your result faster with a layer of abstraction than you will trying to go without one. OpenGL and OpenCL are not just game libraries, they are incredibly powerful general purpose graphics and computing libraries. Meaning, there is a lot to them, and in the average game you'll probably only use a quarter of what they are capable of doing. Trying to keep track of the whole thing is a waste of time and energy. Abstract out what you need, and forget the rest.

3. Personality:
This follows along with simplicity. Your program will be unique to what it needs to do. It will have (and if it doesn't, it should) its own naming conventions, best practices, and ways of doing things in general. CL/GL are not your program. CL/GL do not have your naming conventions, best practices, or ways of doing things, nor will you have theirs. Making a layer of abstraction makes the API your own. It also allows you to optimize as your program needs. If you use cubes a lot you can abstract out a data type that is specific to cubes, and this allows your code to assume things. If you know something is a cube, you need not say it has 8 vertices, you can assume that. You need not say the sides are all equal, you can assume that. And thus, you can describe a cube with only 4 data points (locx, locy, locz, scale) in contrast to 24 (3*8 vertex locations). The library allows your program's personality to be polished.

4. Debugging:
When you abstract the library away and you have one problem, it shows up everywhere, but if you fix that problem, it is fixed everywhere. If you don't abstract and you have a lot of code that is kind of similar, you can spend a lot of time chasing down similar bugs, or miss them completely and have error ridden code. Having one location for the code makes it easier to keep it working properly, clean, and efficient.
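The cube example from point 3 can be sketched in a few lines of C. The struct and function names are my own, and I'm assuming that (locx, locy, locz) is the cube's minimum corner and that scale is the edge length. The post doesn't pin either of those down.

```c
/* A cube described by 4 numbers instead of 24.
 * Assumption: (x, y, z) is the minimum corner, scale is the edge length. */
struct cube {
    float x, y, z, scale;
};

/* Expands a cube into its 8 vertices, written as 24 floats (x, y, z each).
 * Bit 0 of the vertex number picks the x side, bit 1 the y side,
 * and bit 2 the z side. */
void cube_vertices(const struct cube *c, float out[24])
{
    int k;
    for (k = 0; k < 8; k++) {
        out[3 * k + 0] = c->x + ((k & 1) ? c->scale : 0.0f);
        out[3 * k + 1] = c->y + ((k & 2) ? c->scale : 0.0f);
        out[3 * k + 2] = c->z + ((k & 4) ? c->scale : 0.0f);
    }
}
```

The rest of the program only ever stores and passes around the 4-number form. The 24 floats get generated at the last moment, right before they are handed to the graphics API.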

How this might look, then, is a simple hierarchy. (Hierarchies are your friends.) You have at the top a single library interface for the rest of your code. Inside of it, you have platform/implementation specific plug-ins for each target platform. Those are all wrapped in a uniform package to make the API calls the same regardless of implementation. Then you have the data structures and functions that are specific to your program, allowing you to optimize and specialize relatively easily.
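One way that hierarchy might be laid out in C is a table of function pointers per platform. This is a rough sketch under my own assumptions. Every name in it is hypothetical, and the two backends are stand-ins that only check their arguments, where real ones would call into GLX or the Windows API.

```c
#include <stddef.h>
#include <string.h>

/* One platform plug-in: the same operations, implemented per platform. */
struct gfx_backend {
    const char *name;
    int (*create_window)(int width, int height);
};

/* Stand-in for the Linux/GLX implementation (hypothetical). */
static int glx_create_window(int width, int height)
{
    return (width > 0 && height > 0) ? 0 : -1;
}

/* Stand-in for the Windows implementation (hypothetical). */
static int win32_create_window(int width, int height)
{
    return (width > 0 && height > 0) ? 0 : -1;
}

static const struct gfx_backend BACKENDS[] = {
    { "glx",   glx_create_window },
    { "win32", win32_create_window },
};

/* Picks a plug-in by name, or returns NULL if there is no such platform. */
const struct gfx_backend *gfx_select(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof BACKENDS / sizeof BACKENDS[0]; i++)
        if (strcmp(BACKENDS[i].name, name) == 0)
            return &BACKENDS[i];
    return NULL;
}

/* The single interface the rest of the program calls.
 * Returns 0 on success, -1 on failure. */
int gfx_open_window(const struct gfx_backend *backend, int width, int height)
{
    if (backend == NULL)
        return -1;
    return backend->create_window(width, height);
}
```

The rest of the code only ever calls gfx_open_window, so porting to a new platform means adding one more entry to the table.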

Just some simple thoughts. I'm still learning, so take my advice with a grain of salt. But it is certainly food for thought.

Friday, September 30, 2011

Open CL first program

I struggled to find a really good program for getting started on openCL. So, in typical programmer fashion, I decided to make my own that fixed where others came up short. I tried a couple of places to post this, but nowhere had really good support for posting my code. What I ended up doing was realizing that I can host files from UbuntuOne. You can get the file from here.

The entire program is in one file for simplicity (first of my complaints against other examples). I go through the minimum steps to accomplish what is needed (another complaint) and explain what each step does and what the arguments that go into it are (my other big annoyance).

Remember, I'm learning this too as I blog, so I explain everything to the best of my understanding (by basically reading the openCL reference guide and rewording it). There are a couple of things in that file that I outright said I don't understand. But considering the amount of comments I put in, if you read all of it, you should have a very good grasp on the basic steps. I might format it more down the road to make it prettier, easier to read, and more consistent. Any comments for improvements are welcome.

But from this point you have a code base to start from. And it is always, always easier to start from a code base than to start from scratch. Iterative development ftw!

Tuesday, September 27, 2011

Open Interfaces

So, it has been my intent to figure out how to program a game engine from the ground up for some time now. But I've never managed to get over the difficulty of the interfaces. Let me first define "ground", because this could be vague. Ground in this case starts where driver interfaces end. Ground level stands on top of OpenGL (for graphics), OpenAL (for audio), and OpenCL (for physics). With these 3 interfaces a professional level game engine can be made. Am I going to make a professional level game engine in this blog? Lord no. But with my faithful copy of "Game Engine Architecture" by my side, I'm certainly going to make something for the betterment of all young game developers just trying to learn how things work.

For me, one of the hardest steps has been simply getting started. What I'm particularly referring to is getting those 3 open interfaces installed and working. Today, for the first time ever, I have succeeded at doing so. This took me months. Don't be upset if it takes you some time, and if my information here doesn't help you, start asking on Stack Overflow. That being said, I'm running Ubuntu 11.04 right now. I will probably post later about how to do this in Windows, but if you want to be serious about developing the C and C++ code that I will be talking about here, get yourself on Linux. There are all sorts of options available for doing so: if you are scared, try virtualisation; if you are confident, dual boot; if you are bold, reinstall and put Windows to rest once and for all.

With OpenGL and OpenCL the difficulty rested largely in figuring out why things weren't compiling. I read at least a dozen getting started guides. None helped, save for the last line of one in the OpenGL wiki: "gcc -o example example.c -lX11 -lGL -lGLU". That is the line for compiling code for OpenGL in Ubuntu. The thing to know is that Ubuntu comes with OpenGL. Stupid simple, but it's true. When I had figured that out, things started falling into place. In fact, if you are on a computer w/o a graphics card (a netbook, for example), the example code from the wiki above should compile from a fresh install.

OpenCL takes a little more. I haven't successfully implemented it on a computer without a graphics card yet, so this assumes you have a graphics card. If you are an nVidia user, your graphics card needs to be CUDA capable (if it has "GeForce" in its name, you are good). If you are an ATI user, your card needs to be compatible with the ATI Stream SDK, which I believe is Radeon 5000+. For OpenCL, your graphics card driver should have come with the capacity, but it hasn't been "enabled" yet, in a manner of speaking. For nVidia you need to get the most recent CUDA toolkit. It will include the headers needed to compile code. For ATI, the most recent Stream toolkit. With that you have the bare minimum of what you need to get working.

But if you try to compile (gcc -o example example.c -lOpenCL) it will fail. It will tell you that the .h file can't be found. What needs to happen is you need to make a symbolic link. For OpenCL it is a matter of pointing the includes to the right spot. gcc will look in the folder /usr/include/ for .h files (or links to them). Create a symbolic link here that will point to where the .h files are:

sudo ln -s /usr/local/cuda/include/CL/ /usr/include/

This will create the link to the OpenCL headers. If that doesn't work, then you can tell gcc to add the CUDA include path to its search list. This is done with the flag -I. So type in "gcc -I /usr/local/cuda/include -o example example.c -lOpenCL" (assuming your code includes <CL/cl.h>) and it will now look in that directory for the .h files, and you should be able to compile.

Lastly there is OpenAL. It is important to note here that OpenGL and OpenCL have no affiliation with OpenAL. GL and CL are open standards kept by the Khronos Group. AL is a cross platform API that sits on top of drivers (such as ALSA, Pulse Audio, EAX, or Direct Audio on Windows). Because audio doesn't take quite the effort that graphics and physics do, we can get away with the extra overhead created here, for the benefit of having a system that is simple to take cross platform. OpenAL is developed chiefly by Creative Audio. It was inspired by OpenGL: where OpenGL strove to create a cross platform, hardware agnostic graphics card API, OpenAL sought to do the same for 3D audio in computer games. This is where OpenAL differs from Khronos's OpenSL. OpenAL provides desktop sound and is made entirely with games in mind, while OpenSL is aimed chiefly at handhelds.

Now, there aren't a lot of resources out there for OpenAL. But installation in Ubuntu is very simple: "sudo apt-get install libopenal-dev" will install all you need. Wa bam.

The next hardest thing for me has been each interface's hello world program: something that shows that something has happened. The link I gave earlier for OpenGL was what finally worked for me. I haven't taken it all apart yet to figure out what does what, but I have something OpenGL specific running. That is one interface down.

I've never found a good openCL example. The simplest are all so long and do so much. So I'm going to have to create my own. That will be the next post.