Intro
So this morning is the first morning that I am working without a day job. I must say it’s liberating... but enough about that crap! It’s time to get to work!
This morning was a programming morning. The considerable performance drop in my scene during my modeling efforts led me to investigate methods for improving rendering performance, both in the editor and in the game itself (they use the exact same rendering engine). Back in the day, I used to laboriously accomplish this with frustum culling. The way I did it, frustum culling uses a spatial data structure (an octree) to sort the mesh objects in a scene into cubic regions, and then mathematically tests whether each region intersects the viewing volume (the frustum) of the current camera. This technique works okay, but it’s a pain in the ass to implement and I’d rather not if I don’t absolutely need to. Furthermore, it doesn’t handle occlusion: when a very large 3D object sits in front of a smaller one, eclipsing it, the smaller one is entirely invisible and useless to render since all of its fragments will fail the z-buffer test.
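To make the "mathematically detect" part concrete, here is a minimal sketch in plain C of testing one cubic region against the six frustum planes. It is an illustration only, not code from the engine described in this post: the `Plane` and `Box` types and the `box_in_frustum` function are hypothetical names, and the planes are assumed to be stored with their normals pointing into the frustum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A plane a*x + b*y + c*z + d = 0 whose normal (a, b, c) is assumed to
   point toward the INSIDE of the frustum. */
typedef struct { float a, b, c, d; } Plane;

/* Axis-aligned box, e.g. one octree cell. */
typedef struct { float min[3], max[3]; } Box;

/* Returns false only when the box lies entirely behind at least one plane.
   The test is conservative: a box near a frustum corner can be reported
   as visible even though it is slightly outside. */
bool box_in_frustum(const Box *box, const Plane planes[6])
{
    assert(box != NULL && planes != NULL);
    for (int i = 0; i < 6; ++i) {
        const Plane *p = &planes[i];
        /* Pick the box corner that lies farthest along the plane normal. */
        float x = (p->a >= 0.0f) ? box->max[0] : box->min[0];
        float y = (p->b >= 0.0f) ? box->max[1] : box->min[1];
        float z = (p->c >= 0.0f) ? box->max[2] : box->min[2];
        /* If even that corner is behind the plane, the whole box is out. */
        if (p->a * x + p->b * y + p->c * z + p->d < 0.0f)
            return false;
    }
    return true;
}
```

In an octree, a cell that fails this test lets you skip every mesh stored beneath it. Note that nothing here knows about other objects in the scene, which is why this kind of test cannot catch the occlusion case.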
Enter “occlusion queries”: a very cool OpenGL technique that lets you ask the GPU whether (or how many samples of) a 3D object actually passed the depth test, and then decide whether or not to keep rendering it in the future. This is exactly what I needed. It all sounds great in theory, so now let me tell you about some of the issues I had implementing it. In my snippets, I’ll try to avoid some of the ugly Objective-C syntax that surrounds this code in my actual system.
Technique Overview
So in practice, occlusion culling is quite simple. There are basically 3 steps.
First, you render the scene using very simple solid bounding-box geometry, i.e., for each discrete mesh object in your scene, you render a solid box that entirely bounds that object. You render this box with a very simple flat-color shader, which keeps the query render very fast. You don’t actually want these boxes to appear in your scene, so you do this step with color buffer writes and depth buffer writes turned off (masks set to FALSE). The depth test itself stays on; that’s what decides whether any part of a box counts as visible.
Next, you query the results of the above for each box rendered and determine which models were visible (not occluded). You make a note of the ones that were.
Finally, you render the scene normally, with the extra check to ensure that you don’t render the model objects that were not visible.
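Step one depends on having a box that entirely bounds each mesh. As a rough sketch of how such a box could be derived, the plain C below computes an axis-aligned bounding box from a vertex array and expands it into the 8 corners you would feed to a cube draw. The `Bounds` type and both function names are hypothetical, and the vertices are assumed to be tightly packed x, y, z floats; the engine in this post may store its geometry differently.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float min[3], max[3]; } Bounds;

/* Computes the axis-aligned box enclosing `count` vertices stored as
   tightly packed x, y, z floats (an assumed layout). */
Bounds bounds_from_vertices(const float *xyz, size_t count)
{
    assert(xyz != NULL && count > 0);
    Bounds b = { { xyz[0], xyz[1], xyz[2] }, { xyz[0], xyz[1], xyz[2] } };
    for (size_t i = 1; i < count; ++i) {
        for (int axis = 0; axis < 3; ++axis) {
            float v = xyz[i * 3 + axis];
            if (v < b.min[axis]) b.min[axis] = v;
            if (v > b.max[axis]) b.max[axis] = v;
        }
    }
    return b;
}

/* Writes the 8 corners of the box as 24 floats. Bit 0 of the corner index
   selects min/max on x, bit 1 on y, bit 2 on z. */
void bounds_corners(const Bounds *b, float out[24])
{
    assert(b != NULL && out != NULL);
    for (int c = 0; c < 8; ++c) {
        out[c * 3 + 0] = (c & 1) ? b->max[0] : b->min[0];
        out[c * 3 + 1] = (c & 2) ? b->max[1] : b->min[1];
        out[c * 3 + 2] = (c & 4) ? b->max[2] : b->min[2];
    }
}
```

Since the box only changes when the mesh geometry changes, it makes sense to compute it once at load time rather than every frame.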
Setup
So the first thing I needed was a single occlusion query per mesh object in my scene. In OpenGL, these (like many things in GL) are GLuint ids. I dropped these into my entity mesh class:
|
//Per each model
GLuint occlusionQuery;
BOOL occluded;
EntityMeshOcclusionState occlusionState; //hidden, visible, waiting |
Then, in the model init and dealloc code, I generate and delete the query objects as needed.
|
//Init
glGenQueries(1, &occlusionQuery);
//dealloc
if(occlusionQuery)
{
    glDeleteQueries(1, &occlusionQuery);
} |
I then set up a special method that renders the solid bounding box geometry used during the occlusion query. Now here’s where things get tricky. There’s a way to do occlusion queries wrong (which I found out the hard way), so much so that the performance benefit they offer is entirely negated by the pipeline stalls you can inadvertently cause. Note the check against the EntityMeshOcclusionWaiting state. This will be explained in the next section.
|
-(void) renderOcclusionQuery
{
    if(occlusionState != EntityMeshOcclusionWaiting)
    {
        occlusionState = EntityMeshOcclusionWaiting;
        glBeginQuery(GL_ANY_SAMPLES_PASSED, occlusionQuery);
        [self renderSolidCubeBBox];
        glEndQuery(GL_ANY_SAMPLES_PASSED);
    }
} |
Scene Rendering
To kick this off, I added a new special method to my Scene class called renderOcclusionQueries. I then inserted a call to this method in my scene’s main render method like so. Note the usage of glColorMask and glDepthMask to ensure the query bounding boxes don’t actually render to the screen.
|
-(void) render
{
    if(occlusionCulling)
    {
        [self renderOcclusionQueries];
    }
    //don't waste time if we don't need to
    if(![renderPassManager numberOfPasses])
    {
        [self renderSinglePass];
    }
    else
    {
        [renderPassManager renderAll:self];
    }
}

-(void) renderOcclusionQueries
{
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDepthMask(GL_FALSE);
    [VGL modelViewLoadIdentity];
    //camera (viewing) transform
    [activeCamera apply];
    [VGL enableFlatSolidColorRendering:YES];
    [VGL setPrimaryColor:make_float4(1, 1, 1, 1)];
    for(Entity *entity in entities)
    {
        if(![self isMeshEntity:entity])
            continue;
        if(!entity.hidden && ![entity.passRendererExclusions containsObject:[renderPassManager currentlyRenderingNode]])
        {
            [entity renderOcclusionQuery];
        }
    }
    [VGL enableFlatSolidColorRendering:NO];
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthMask(GL_TRUE);
} |
With the queries properly set up, I can now use them during my main rendering pass of all the entities to ensure I only draw what’s necessary. Again, this was tricky. I had to make absolutely sure I never stall the pipeline. No matter what. This means I don’t retrieve a query result unless GL_QUERY_RESULT_AVAILABLE is true. If it isn’t, I leave the query in the “waiting” state and keep using the last known result. I also don’t start a new query while one is in the waiting state (note the check against this in the entity’s renderOcclusionQuery method above). This essentially means that the occlusion queries are entirely asynchronous with respect to the main rendering.
|
//main geometry rendering
for(Entity *entity in entities)
{
    if(occlusionCulling)
    {
        GLuint passed = INT_MAX;
        GLuint available = 0;
        //poll only; this never blocks
        glGetQueryObjectuiv(entity.occlusionQuery, GL_QUERY_RESULT_AVAILABLE, &available);
        if(available)
        {
            //the result is ready, so reading it back won't stall
            passed = 0;
            glGetQueryObjectuiv(entity.occlusionQuery, GL_QUERY_RESULT, &passed);
            entity.occlusionState = (passed) ? EntityMeshOcclusionVisible : EntityMeshOcclusionHidden;
            entity.occluded = (passed) ? NO : YES;
        }
        //otherwise stay in the waiting state and reuse the last known result
    }
    else
    {
        entity.occlusionState = EntityMeshOcclusionVisible;
        entity.occluded = NO;
    }
    if(!entity.occluded)
    {
        [entity render];
    }
} |
That’s pretty much all there is to it.
Considerations
Now ain’t nothin’ in this world fo’ free, so there are some things I should mention. First, if I weren’t also targeting iOS, I probably would have used OpenGL’s conditional rendering (glBeginConditionalRender / glEndConditionalRender), which essentially does a lot of the above checking for me automatically. I noodled around with this and couldn’t get equivalently good performance, so I just moved on to the manual way. I also don’t like that, with conditional rendering, I still have to submit all the expensive calls (drawing and non-drawing alike) and essentially trust the driver to do what’s best. My method ensures NOTHING is run if the object isn’t visible. The downside is that it initiates readbacks from the OpenGL device to the CPU. However, I’m getting very decent performance with this, so I’m happy.
Also, because the queries are truly async, I can get myself into trouble when running this code on very slow or buggy graphics cards (ahem... Intel... ahem). The problem is that if a query takes too long, you may look at a space where an object should be and not see it for a few frames while waiting for the query to catch up. This finally explains to me why, when playing some games on my Wii U (such as Call of Duty), I sometimes turn real fast and see an object suddenly appear a few frames late.
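The pop-in can be reasoned about without a GPU. Below is a small CPU-only model of the waiting/visible/hidden flow, written in plain C as a sketch. Everything in it is an assumption made for illustration: `FakeQuery` stands in for a GL query object, the `latency` parameter pretends a result becomes readable a fixed number of frames after the query is issued, and none of the names come from the actual engine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { STATE_VISIBLE, STATE_HIDDEN, STATE_WAITING } OcclusionState;

/* Stand-in for a GL query object: the answer is captured when the query is
   issued and only becomes readable `latency` frames later (an assumption). */
typedef struct {
    bool in_flight;
    bool result;
    int  ready_frame;
} FakeQuery;

typedef struct {
    FakeQuery query;
    OcclusionState state;
    bool occluded;
} Mesh;

/* One frame of the flow: issue a query unless one is pending, poll without
   blocking, then decide whether to draw. Returns true if the mesh is drawn. */
bool simulate_frame(Mesh *m, int frame, bool truly_visible, int latency)
{
    assert(m != NULL && latency >= 0);
    /* Query pass: never start a new query while one is still pending. */
    if (m->state != STATE_WAITING) {
        m->state = STATE_WAITING;
        m->query.in_flight = true;
        m->query.result = truly_visible;
        m->query.ready_frame = frame + latency;
    }
    /* Main pass: read the result only once it is available. */
    if (m->query.in_flight && frame >= m->query.ready_frame) {
        m->query.in_flight = false;
        m->state = m->query.result ? STATE_VISIBLE : STATE_HIDDEN;
        m->occluded = !m->query.result;
    }
    return !m->occluded;
}

/* The mesh starts occluded and becomes truly visible at `appear_frame`.
   Returns the first frame on which it is drawn, or -1 if it never is. */
int first_drawn_frame(int latency, int appear_frame, int max_frames)
{
    Mesh m = { { false, false, 0 }, STATE_HIDDEN, true };
    for (int frame = 0; frame < max_frames; ++frame) {
        if (simulate_frame(&m, frame, frame >= appear_frame, latency))
            return frame;
    }
    return -1;
}
```

In this model, `first_drawn_frame(3, 2, 100)` returns 7: the object becomes visible on frame 2, but a stale "hidden" query is still in flight, so a fresh query is not issued until frame 4 and its result lands on frame 7. In other words, the lag can be longer than a single query's latency, because the query already pending has to drain first.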