Friday, December 23, 2011

openGL iterating post 4

4th and final day, here's what I've got for you:

Iteration 14: View ports
Iteration 15: Adding animation and resizing

Only 2 more iterations today, but that wraps up my series of commentaries, I think. There is more: the tutorial goes on to texturing and loading an obj file, but as of iteration 15 you have all of the most basic systems in place for a video game. In particular, you should feel well equipped to continue on to other tutorials. I might describe a little more of how graphics systems work later, but for now I'm happy with what I've accomplished in this series. Enjoy.

Thursday, December 22, 2011

openGL iterating post 3

And now it looks like I posted twice in one day! False. I was just not quite as quick to publish as I am tonight. Shazam! With seconds to spare, I post today's exploits!

Iteration 8: uniforms
Iteration 9: glutIdleFunc
Iteration 10: glm
Iteration 11: 3D coordinates
Iteration 12: transform
Iteration 13: the CUBE!

Links to the individual files have been replaced by a single link to a github for my blog:
This tutorial's files can be found under the folder "OpenGL"

Wednesday, December 21, 2011

openGL iterating post 2

Technically this is on the 22nd, but I really meant it to be 20 minutes ago. Anyway, I've got the next batch of commentaries done. I've taken to calling them commentaries in contrast to tutorials, because if you want the tutorial, go to the wiki book:
That is just the tutorial I'm following. I'm just adding to it to help some people along. Mostly these are honestly just to help me think through things, but on the off chance that they help someone else, sweet. Enjoy:

Post #2:
Iteration 3: Reworking to a more realistic structure.
Iteration 4: Introducing: vertex buffer objects!
Iteration 5: Passing info to the vertex shader.
Iteration 6: Passing info from vertex to fragment shaders.
Iteration 7: Interweaving in buffer objects.

Links to the individual files have been replaced by a single link to a github for my blog:
This tutorial's files can be found under the folder "OpenGL"

Tuesday, December 20, 2011

openGL iterating post 1

I've started to write up my iterations on the way to making an OpenGL program. I'm really just following along with this tutorial: but I hope my comments will help newbies ease into OpenGL a little better. Here they are, enjoy:

iteration 1: Making a bloody window
iteration 2: Making a basic render pipeline

Links to the individual files have been replaced by a single link to a github for my blog:

Sunday, December 18, 2011

Interface to the World

I've been trying to get it working so that I can display the values generated by my OpenCL gravity simulation. Right now it outputs the info to a text file. The Blender script I had once upon a time has since vanished, and since I'm not really interested in rewriting that script, I've set about trying to make things in OpenGL, and I've come to a realization.

I've come to the conclusion that there are two approaches to learning these sorts of things (programming APIs, languages, interfaces, etc.). You can take a case example and accomplish only and exactly what you want. A good example of this is me with Blender scripting. I'm relatively useless at scripting Blender, but the things I have accomplished have proven to be quite handy. The advantages of this approach are clear. It is fast, nimble, and gets you your end result in a satisfactory way. The cons are that it is very narrow in potential, and what you learn has very little reusability.

Then, there is the approach of learning by building blocks. In this approach you learn more than what you want and spend a lot of time doing it. A good example of this would be my classes. I don't know when I'll use half of the things I hear about, but I know what to do when I come across them, because I have been familiarized with the topic. The advantage here is that you know the whole system. When you are given a new task, you can start up right away with a clear idea of where to go and what you need. The result is you get from point A to point B really quickly because you've taken the route 15 times before.

Now, this is coming out because I'm trying to work with OpenGL and OpenCL at the same time. I've managed to accomplish little things in both. But when I want to share information between them I need to use OpenGL buffers. I've not used these in either. For OpenCL it won't be too hard, but I need to learn them in OpenGL as well. And so I'm taking a step back. Before I can actually accomplish what I want to accomplish, I need to get a better understanding of how OpenGL works. So OpenCL, my love, you will have to wait.

I'm going to start posting a series of code files tomorrow, like I did for my OpenCL example. My objectives for this tutorial are as follows:
1. Platform-independent code. How you compile is up to you to figure out. (Psst, if you use Ubuntu, compiling is "gcc -o test code_file_name_here.c -lGL -lGLU -lglut". Glad we had this talk.)
2. Leave no step uncommented, like I did with the CL tutorial. Explain what every step is, what it does, and why it matters. This, I think, is the problem a lot of tutorials I've tried to read have had. Sure, when you have a general idea about how the system works, it is a bit more information than you want to read. But when you are new, it is the only thing keeping you afloat. These tutorials are for someone who is completely new. Which happens to include myself. Funny how that works.
3. Iterative code files. Explain how to accomplish one aspect of it, create the code file, have the code file for download, and repeat. Slowly build to a final result which is finally capable of getting the reader of the tutorial somewhere.

In terms of what I hope to accomplish, I don't really know yet. I don't know the scope of what there is to cover well enough to write down what I want to cover, but I'll post updates here as I iterate through:

Post #1:
Step 1: Make a bloody window.
Step 2: Making a basic render pipeline.

Post #2:
Step 3: Reworking to a more realistic structure.
Step 4: Introducing: vertex buffer objects!
Step 5: Passing info to the vertex shader.
Step 6: Passing info from vertex to fragment shaders.
Step 7: Interweaving in buffer objects.

Post #3:
Step 8: uniforms
Step 9: glutIdleFunc
Step 10: glm
Step 11: 3D coordinates
Step 12: transform
Step 13: the CUBE!

Post #4:
Step 14: View ports
Step 15: Adding animation and resizing

Saturday, December 3, 2011

Memory Access

I figure I should post something, but I haven't had a chance to really come up with anything about OpenCL or GL to post. I have been looking at information, just not had the chance to make any exploits. I watched through all 6 videos on OpenCL at Mac Research, which are fantastic, and I highly advise you to go watch those if you are reading this. But one of the videos comments that, depending on how you use the memory banks referred to as "Local" on the graphics card, they can essentially act as registers. Then it hit me: not every programmer knows what that means. Not many really understand how memory flows when you are dealing with a CPU, let alone a GPU. So I thought I'd take a moment to explain how this works.

I'm going to actually explain it, but I thought I'd give you a little analogy to reference before I get going. There are 4 kinds of memory: registers, cache, RAM, and disk. If you were to equate them to notebooks, registers would be the last couple and the next couple of letters you wrote (a modern 32-bit processor has 6 4-byte registers, a 64-bit processor has 14 8-byte registers). The cache is the current page of the notebook you are writing on. RAM is the stack of notebooks on your desk. And disk is the dozens of bookshelves filled with notebooks around the room.

The reason for this is simply put here:

         Register       Cache       RAM         Disk
Speed    Blazing fast   Fast        Slow        Painfully slow
Size     Bytes          Kilobytes   Gigabytes   Terabytes

Now that you get the general idea of these various kinds of memory, let's start with where they are located. Both the registers and the cache are in the actual processor, and therefore when you buy a computer, the processor is the piece that determines the size of these two memories. When the processor executes a machine-level instruction, the only memory it is capable of accessing is the registers. Therefore it makes sense that these are the fastest. The reason for the tiny size is a matter of addressing and locality. To keep the instructions small, the number of registers must be small. A typical instruction has to address 3 registers. Take addition, for example: a + b, store in c. It has to address a, b, and c. The other reason is that the further the information has to travel, the longer it takes to get there. Registers are right there, ready to go into the circuitry that is about to be executed.
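To put rough numbers on the instruction-size argument, here is a small Python sketch. It assumes a simplified, hypothetical instruction format where each of the three operands (a, b, and c) is a register number stored in its own fixed-width bit field. Real instruction sets are messier than this, and the function name is made up for illustration.

```python
def operand_bits(num_registers, operands=3):
    """Bits needed to name `operands` registers out of `num_registers`.

    Assumes a simplified, hypothetical encoding: one fixed-width
    field per operand, just wide enough to number every register.
    """
    # Bits needed to represent register numbers 0 .. num_registers - 1.
    bits_per_operand = max(1, (num_registers - 1).bit_length())
    return operands * bits_per_operand


# 6 and 14 are the register counts mentioned earlier; 4096 is a
# deliberately oversized register file for comparison.
for count in (6, 14, 4096):
    print(count, "registers ->", operand_bits(count), "operand bits per instruction")
```

Under this assumption a handful of registers costs around a dozen bits per instruction, while thousands of registers would need more operand bits than a whole 32-bit instruction has to offer.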

Cache is the next level in memory. Because of its proximity to the processing, it is also very fast. But when the processor wants to use information saved in the cache, it must do a load command to pull the information from the cache into the registers so it can be manipulated. Registers run on the order of 752 gigabits per second while cache runs closer to 16-24 gigabits per second. A bit of a speed difference.

Now, to the processor, the cache and RAM actually look like the same thing. The hardware will actually direct the processor's load command to the correct location, be it cache or RAM. So why not have a larger cache? Consider a processor die (the die is the actual circuitry of the processor), which you can see here: When you look at this, you can see there is a large dark portion on the left that is just a repeating pattern, and a lighter portion with lots of paths and a bunch of constructs of some sort. The right side is the actual processor; the left side is the cache. Where are the registers, you might ask? Well, they are far too small to even see. Each of the 4-byte registers in this processor is the same size as each of the 4k registers that make up the cache. 4k registers versus 6 registers: you can imagine why you might not be able to see them. But back to my point. The cache is half of this chip. Half. This is why the cache isn't bigger. Memory, compared to the actual processing circuitry, is quite large.

The next level of memory is the system memory, or RAM. RAM stands for Random Access Memory. The cache and system memory are technically both RAM, but we make the distinction between the two based mostly on the cache being housed by the processor. In terms of speed, modern RAM is typically capable of about 8-16 gigabits per second, which is slower than the cache, but nowhere near the speed difference between the cache and the registers. RAM is where we start working with latency, though. Latency is the amount of time from when you give a system input to when it starts giving output. If you go buy RAM online, it will list its latency as one of the statistics about it. That figure (typically 6-9) is counted in memory clock cycles, which works out to roughly 10-15 ns (nanoseconds). That means that when you send a request from the processor to the RAM, there is a delay of at least that long between the request and the response, and the full round trip takes longer still. To a processor, this is huge.

Let's say you have fast RAM at 10 ns and a fast processor at 3 GHz. 3 GHz just means that in 1 second it makes 3,000,000,000 cycles, or 3 cycles per ns. If your latency is 10 ns, that means 3*10 or 30 processor cycles pass in the time it takes to get your request back from the RAM. That is a lot of wasted resources. Modern processors have memory controllers that deal with getting information from the RAM to the cache, to minimize the frequency of what are referred to as "cache misses", where you have to wait for information to be transferred from RAM to the cache.
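The cycle counting above is just clock rate multiplied by latency. Here is a minimal Python sketch of that arithmetic. The function name and the sample latencies are illustrative assumptions of mine, not measurements, so plug in whatever figures apply to your hardware.

```python
def cycles_waiting(clock_hz, latency_ns):
    """Processor cycles that pass while waiting `latency_ns` nanoseconds."""
    # Integer arithmetic keeps the result exact: cycles = Hz * seconds.
    return clock_hz * latency_ns // 1_000_000_000


clock_hz = 3_000_000_000  # a 3 GHz processor
for latency_ns in (10, 100, 1_000):  # illustrative latencies, not measurements
    print(latency_ns, "ns ->", cycles_waiting(clock_hz, latency_ns), "cycles")
```

Every one of those cycles is a cycle in which the processor could have been doing work if the data had already been sitting in the cache.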

The last level of memory is disk. This may be solid state drives or magnetic disk drives. Solid state drives are actually a system similar to RAM. They use a type of memory called flash memory. One of the features of RAM is that if it loses its power, all the information is lost. Flash memory, on the other hand, does not lose its information if it loses power. Solid state drives are pretty fast, capable of 1-3 gigabits per second (though you'll typically see them listed in megabytes per second; translating what I said into that same form, 125-375 megabytes per second). Magnetic disks are sluggish in comparison at 40-200 megabits per second. That's 0.04-0.2 gigabits per second.
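Since drive speeds get quoted in a mix of bits and bytes, here is a small Python sketch of the conversions used above. It assumes decimal units (1 gigabit = 1,000,000,000 bits, 1 megabyte = 1,000,000 bytes), and the function names are my own.

```python
def gigabits_to_megabytes(gigabits_per_second):
    """Convert gigabits per second to megabytes per second (decimal units)."""
    bits_per_second = gigabits_per_second * 1_000_000_000
    bytes_per_second = bits_per_second / 8  # 8 bits in a byte
    return bytes_per_second / 1_000_000


def megabits_to_gigabits(megabits_per_second):
    """Convert megabits per second to gigabits per second (decimal units)."""
    return megabits_per_second / 1000


# The SSD range from the text, in megabytes per second.
print(gigabits_to_megabytes(1), gigabits_to_megabytes(3))
# The magnetic disk range from the text, in gigabits per second.
print(megabits_to_gigabits(40), megabits_to_gigabits(200))
```

The factor of 8 between bits and bytes is the one that usually trips people up when comparing a drive's advertised speed against a figure like the ones in this post.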

So why not SSDs all the way? This one I'm sure most of you know, as these systems are not quite as obscure and mysterious as registers and caches. For $100 you can get a 64-gigabyte SSD, about an order of magnitude larger than your RAM. For $100 you could also get a 750-gigabyte magnetic disk drive. So the constraining feature of SSDs, then, is cost.

So, to summarize what I've said: Registers are extremely fast, capable of dumping their contents every cycle, but because of limitations imposed by the size of instructions and the speed at which light travels, there can only be 6 of them. Cache is quite fast and resides on the processor itself, but due to size limitations it can't be bigger. RAM is slow. It holds vastly more data than the cache, but again because of limitations of size and the speed of light (a photon at one end of a stick of RAM will only be able to travel about as far as the other end of the stick in the time it takes for your CPU to complete a cycle) as well as heat (if we try to pack RAM closer, it is going to start to melt itself due to an inability to disperse the heat it produces), RAM is limited to a couple of gigabytes. Lastly, there are hard drives. These hold huge amounts of data, but are quite slow in getting to and reading that information. So when you compare the two ends of the spectrum, the registers hold about 32 bytes of information and can transfer about 752 gigabits per second. In contrast, a magnetic disk can hold a terabyte or more, which is 31,250,000,000 times as much data, but can only transfer about 0.2 gigabits of it per second, which makes registers about 3,760 times faster.
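The two ratios in that comparison are easy to check. Here is a short Python sketch that uses only the figures quoted in this post (which are themselves rough) and assumes a decimal terabyte. The variable names are mine.

```python
register_bytes = 32             # "about 32 bytes" of registers, per the text
disk_bytes = 1_000_000_000_000  # one terabyte, assuming decimal units
register_gbit_s = 752           # register transfer rate quoted in the text
disk_gbit_s = 0.2               # magnetic disk transfer rate quoted in the text

# How many times more data the disk holds than the registers.
capacity_ratio = disk_bytes // register_bytes
# How many times faster the registers move data than the disk.
speed_ratio = register_gbit_s / disk_gbit_s

print(f"disk holds {capacity_ratio:,} times as much data")
print(f"registers transfer about {speed_ratio:,.0f} times faster")
```

Note how lopsided the trade is: the capacity gap is billions to one, while the speed gap is only thousands to one.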

All the kinds of memory are quite necessary to get your computer running quickly and to hold the large quantities of data that we have come to enjoy. In my mind it is kind of like the orbits of planets. Farther planets orbit very slowly but have huge orbits, and planets that are painfully close to the sun have tiny orbits but orbit ridiculously fast.

So when the Mac Research guy said that the local memory buffers on the GPU are so fast they can function as registers, it is amazing. It turns the small handful of registers into a large group of almost 4k. That has amazing potential. By storing information in the local memory, you can get calculations to just scream. Typically, GPUs are clocked at about a 5th of the speed of a CPU. If you can get things running in the local memory buffer, then you can get each individual stream processor to start to give the CPU a run for its money. That is crazy fast.

That was a lot longer than I had intended to write, and there are some things that I left out. But I'll wrap this up here for now. Besides, it's not like anyone reads this, so it doesn't really matter. :p