Common C++ Performance Mistakes in Games презентация

Содержание

About the Data ATG reviews code to find bottlenecks and make perf recommendations 50 titles per year 96% use C++ 1 in 3 use “advanced” features like templates or generics

Слайд 1Common C++ Performance Mistakes in Games
Pete Isensee
Xbox Advanced Technology Group


Слайд 2About the Data
ATG reviews code to find bottlenecks and make perf

recommendations
50 titles per year
96% use C++
1 in 3 use “advanced” features like templates or generics

Слайд 3Why This Talk Is Important
The majority of Xbox games are CPU

bound
The CPU bottleneck is often a language or C++ library issue
These issues are not usually specific to the platform

Слайд 4Format
Definition of the problem
Examples
Recommendation
For reference
A frame is 17 or 33 ms

(60fps / 30fps)
Bottlenecks given in ms per frame

Слайд 5Issue: STL
Game using std::list
Adding ~20,000 objects every frame
Rebuilding the list every

frame
Time spent: 6.5 ms/frame!
~156K overhead (2 pointers per node)
Objects spread all over the heap


Слайд 6std::set and map
Many games use set/map as sorted lists
Inserts are slow

(log(N))
Memory overhead: 3 ptrs + color
Worst case in game: 3.8 ms/frame



Слайд 7std::vector
Hundreds of push_back()s per frame
VS7.1 expands vector by 50%
Question: How many

reallocations for 100 push_back()s?
Answer: 13! (1,2,3,4,5,7,10,14,20,29,43,64,95)


Слайд 8Clearly, the STL is Evil


Слайд 9Use the Right Tool for the Job
The STL is powerful, but

it’s not free
Filling any container is expensive
Be aware of container overhead
Be aware of heap fragmentation and cache coherency
Prefer vector, vector::reserve()

Слайд 10The STL is Evil, Sometimes
The STL doesn’t solve every problem
The STL

solves some problems poorly
Sometimes good old C-arrays are the perfect container
Mike Abrash puts it well:
“The best optimizer is between your ears”


Слайд 11Issue: NIH Syndrome
Example: Custom binary tree
Sorted list of transparent objects
Badly unbalanced
1

ms/frame to add only 400 items
Example: Custom dynamic array class
Poorer performance than std::vector
Fewer features

Слайд 12Optimizations that Aren’t
void appMemcpy( void* d, const void* s, size_t b

)
{
// lots of assembly code here ...
}
appMemcpy( pDest, pSrc, 100 ); // bottleneck

appMemcpy was slower than memcpy for anything under 64K

Слайд 13Invent Only What You Need
std::set/map more efficient than the custom tree

by 10X
Tested and proven
Still high overhead
An even better solution
Unsorted vector or array
Sort once
20X improvement

Слайд 14Profile
Run your profiler
Rinse. Repeat.
Prove the improvement.
Don’t rewrite the C runtime or

STL just because you can. There are more interesting places to spend your time.

Слайд 15Issue: Tool Knowledge
If you’re a programmer, you use C/C++ every day
C++

is complex
CRT and STL libraries are complex
The complexities matter
Sometimes they really matter

Слайд 16vector::clear
Game reused global vector in frame loop
clear() called every frame to

empty the vector
C++ Standard
clear() erases all elements (size() goes to 0)
No mention of what happens to vector capacity
On VS7.1/Dinkumware, frees the memory
Every frame reallocated memory

Слайд 17Zero-Initialization
struct Array { int x[1000]; };
struct Container {
Array arr;

Container() : arr() { }
};
Container x; // bottleneck

Costing 3.5 ms/frame
Removing : arr() speeds this by 20X

Слайд 18Know Thine Holy Standard
Use resize(0) to reduce container size without affecting

capacity
T() means zero-initialize PODs. Don’t use T() unless you mean it.
Get a copy of the C++ Standard. Really.
www.techstreet.com; search on 14882
Only $18 for the PDF


Слайд 19Issue: C Runtime
void BuildScore( char* s, int n )
{
if(

n > 0 )
sprintf( s, “%d”, n );
else
sprintf( s, “” );
}
n was often zero
sprintf was a hotspot

Слайд 20qsort
Sorting is important in games
qsort is not an ideal sorting function
No

type safety
Comparison function call overhead
No opportunity for compiler inlining
There are faster options

Слайд 21Clearly, the CRT is Evil


Слайд 22Understand Your Options
itoa() can replace sprintf( s, “%d”, n )
*s

= ‘\0’ can replace sprintf( s, “” )
std::sort can replace qsort
Type safe
Comparison can be inlined
Other sorting options can be even faster: partial_sort, partition

Слайд 23Issue: Function Calls
50,000-100,000 calls/frame is normal
At 60Hz, Xbox has 12.2M cycles/frame
Function

call/return averages 20 cycles
A game calling 61,000 functions/frame spends 10% CPU (1.7 ms/frame) in function call overhead

Слайд 24Extreme Function-ality
120,000 functions/frame
140,000 functions/frame
130,000 calls to a single function/frame (ColumnVec::operator[])
And the

winner:
340,000 calls per frame!
9 ms/frame of call overhead


Слайд 25Beware Elegance
Elegance → levels of indirection → more functions → perf

impact
Use algorithmic solutions first
One pass through the world
Better object rejection
Do AI/physics/networking less often than once/frame

Слайд 26Inline Judiciously
Remember: inline is a suggestion
Try “inline any suitable” compiler option
15

to 20 fps
68,000 calls down to 47,000
Try __forceinline or similar keyword
Adding to 5 funcs shaved 1.5 ms/frame
Don’t over-inline


Слайд 27Issue: for loops
// Example 1: Copy indices to push buffer
for( DWORD

i = 0; i < dwIndexCnt; ++i )
*pPushBuffer++ = arrIndices[ i ];

// Example 2: Initialize vector array
for( DWORD i = 0; i < dwMax; ++i )
mVectorArr[i] = XGVECTOR4(0,0,0,0);

// Example 3: Process items in world
for( itr i = c.begin(); i < c.end(); ++i )
Process( *i );


Слайд 28Watch Out For For
Never copy/clear a POD with a for loop
std::algorithms

are optimized; use them

memcpy( pPushBuffer, arrIndices,
dwIndexCnt * sizeof(DWORD) );

memset( mVectorArr, 0, dwMax * sizeof(XGVECTOR4) );

for_each( c.begin(), c.end(), Process );

Слайд 29Issue: Exception Handling
Most games never throw
Most games never catch
Yet, most games

enable EH
EH adds code to do stack unwinding
A little bit of overhead to a lot of code
10% size increase is common
2 ms/frame in worst case


Слайд 30Disable Exception Handling
Don’t throw or catch exceptions
Turn off the C++ EH

compiler option
For Dinkumware STL
Define “_HAS_EXCEPTIONS=0”
Write empty _Throw and _Raise_handler; see stdthrow.cpp and raisehan.cpp in crt folder
Add #pragma warning(disable: 4530)


Слайд 31Issue: Strings
Programmers love strings
Love hurts
~7000 calls to stricmp in frame loop
1.5

ms/frame
Binary search of a string table
2 ms/frame


Слайд 32Avoid strings
String comparisons don’t belong in the frame loop
Put strings in

an table and compare indices
At least optimize the comparison
Compare pointers only
Prefer strcmp to stricmp

Слайд 33Issue: Memory Allocation
Memory overhead
Xbox granularity/overhead is 16/16 bytes
Overhead alone is often

1+ MB
Too many allocations
Games commonly do thousands of allocations per frame
Cost: 1-5 ms/frame


Слайд 34Hidden Allocations
push_back(), insert() and friends typically allocate memory
String constructors allocate
Init-style calls

often allocate
Temporary objects, particularly string constants that convert to string objects


Слайд 35Minimize Per-Frame Allocations
Use memory-friendly data structures, e.g. arrays, vectors
Reserve memory in

advance
Use custom allocators
Pool same-size allocations in a single block of memory to avoid overhead
Use the explicit keyword to avoid hidden temporaries
Avoid strings


Слайд 36Other Tidbits
Compiler settings: experiment
dynamic_cast: just say no
Constructors: performance killers
Unused static array

space: track this
Loop unrolling: huge wins, sometimes
Suspicious comments: watch out
“Immensely slow matrix multiplication”

Слайд 37Wrap Up
Use the Right Tool for the Job
The STL is Evil,

Sometimes
Invent Only What You Need
Profile
Know Thine Holy Standard
Understand Your Options


Beware Elegance
Inline Judiciously
Watch Out For For
Disable Exception Handling
Avoid Strings
Minimize Per-frame Allocations


Слайд 38Call to Action: Evolve!
Pass the rubber chicken
Share your C++ performance mistakes

with your team
Mentor junior programmers
So they only make new mistakes
Don’t stop learning
You can never know enough C++

Слайд 39Questions
Fill out your feedback forms
Email: pkisensee@msn.com
This presentation: www.tantalon.com/pete.htm


Обратная связь

Если не удалось найти и скачать презентацию, Вы можете заказать его на нашем сайте. Мы постараемся найти нужный Вам материал и отправим по электронной почте. Не стесняйтесь обращаться к нам, если у вас возникли вопросы или пожелания:

Email: Нажмите что бы посмотреть 

Что такое ThePresentation.ru?

Это сайт презентаций, докладов, проектов, шаблонов в формате PowerPoint. Мы помогаем школьникам, студентам, учителям, преподавателям хранить и обмениваться учебными материалами с другими пользователями.


Для правообладателей

Яндекс.Метрика