ast.h |
-*- c++ -*- |
37345 |
ast_array_index.cpp |
If \c ir is a reference to an array for which we are tracking the max array
element accessed, track that the given element has been accessed.
Otherwise do nothing.
This function also checks whether the array is a built-in array whose
maximum size is too small to accommodate the given index, and if so uses
loc and state to report the error.
|
15340 |
ast_expr.cpp |
|
2272 |
ast_function.cpp |
We need to process the parameters first in order to know whether we can
raise an uninitialized-variable warning. Calling set_is_lhs silences the
warning for now; whether to actually raise it is decided later, in
verify_parameter_modes.
|
95627 |
ast_to_hir.cpp |
\file ast_to_hir.cpp
Convert abstract syntax to high-level intermediate representation (HIR).
During the conversion to HIR, the majority of the semantic checking is
performed on the program. This includes:
* Symbol table management
* Type checking
* Function binding
The majority of this work could be done during parsing, and the parser could
probably generate HIR directly. However, this results in frequent changes
to the parser code. Since we do not assume that every system this compiler
is built on will have Flex and Bison installed, we have to store the code
generated by these tools in our version control system. In other parts of
the system we've seen problems where a parser was changed but the generated
code was not committed, merge conflicts were created because two developers
had slightly different versions of Bison installed, etc.
I have also noticed that running Bison generated parsers in GDB is very
irritating. When you get a segfault on '$$ = $1->foo', you can't very
well 'print $1' in GDB.
As a result, my preference is to put as little C code as possible in the
parser (and lexer) sources.
|
349136 |
ast_type.cpp |
'subroutine' isn't a real qualifier. |
36025 |
builtin_functions.cpp |
\file builtin_functions.cpp
Support for GLSL built-in functions.
This file is split into several main components:
1. Availability predicates
A series of small functions that check whether the current shader
supports the version/extensions required to expose a built-in.
2. Core builtin_builder class functionality
3. Lists of built-in functions
The builtin_builder::create_builtins() function contains lists of all
built-in function signatures, where they're available, what types they
take, and so on.
4. Implementations of built-in function signatures
A series of functions which create ir_function_signatures and emit IR
via ir_builder to implement them.
5. External API
A few functions the rest of the compiler can use to interact with the
built-in function module. For example, searching for a built-in by
name and parameters.
|
374255 |
builtin_functions.h |
extern "C" |
2549 |
builtin_int64.h |
IF CONDITION |
43996 |
builtin_types.cpp |
\file builtin_types.cpp
The glsl_type class has static members to represent all the built-in types
(such as the glsl_type::_float_type flyweight) as well as convenience pointer
accessors (such as glsl_type::float_type). Those global variables are
declared and initialized in this file.
This also contains _mesa_glsl_initialize_types(), a function which populates
a symbol table with the available built-in types for a particular language
version and set of enabled extensions.
|
19917 |
builtin_variables.cpp |
Building this file with MinGW g++ 7.3 or 7.4 with:
scons platform=windows toolchain=crossmingw machine=x86 build=profile
triggers an internal compiler error.
Overriding the optimization level to -O1 works around the issue.
MinGW 5.3.1 does not seem to have the bug, neither does 8.3. So for now
we're simply testing for version 7.x here.
|
63550 |
float64.glsl |
|
58605 |
generate_ir.cpp |
for SWIZZLE_X, &c. |
1359 |
glcpp |
|
|
glsl_lexer.cpp |
A lexical scanner generated by flex |
163592 |
glsl_lexer.ll |
|
42804 |
glsl_optimizer.cpp |
|
25958 |
glsl_optimizer.h |
Main GLSL optimizer interface.
See ../../README.md for more instructions.
General usage:
ctx = glslopt_initialize();
for (lots of shaders) {
shader = glslopt_optimize (ctx, shaderType, shaderSource, options);
if (glslopt_get_status (shader)) {
newSource = glslopt_get_output (shader);
} else {
errorLog = glslopt_get_log (shader);
}
glslopt_shader_delete (shader);
}
glslopt_cleanup (ctx);
|
3181 |
glsl_parser.cpp |
A Bison parser, made by GNU Bison 3.5. |
246677 |
glsl_parser.h |
A Bison parser, made by GNU Bison 3.5. |
6544 |
glsl_parser.yy |
|
97360 |
glsl_parser_extras.cpp |
for PRIx64 macro |
80777 |
glsl_parser_extras.h |
Most of the definitions here only apply to C++
|
34759 |
glsl_symbol_table.cpp |
-*- c++ -*- |
9229 |
glsl_symbol_table.h |
-*- c++ -*- |
3727 |
hir_field_selection.cpp |
There are two kinds of field selection. There is the selection of a
specific field from a structure, and there is the selection of a
swizzle / mask from a vector. Which is which is determined entirely
by the base type of the thing to which the field selection operator is
being applied.
|
3127 |
int64.glsl |
|
2680 |
ir.cpp |
Modify the swizzle make to move one component to another
\param m IR swizzle to be modified
\param from Component in the RHS that is to be swizzled
\param to Desired swizzle location of \c from
|
61767 |
ir.h |
-*- c++ -*- |
75126 |
ir_array_refcount.cpp |
\file ir_array_refcount.cpp
Provides a visitor which produces a list of variables referenced.
|
6103 |
ir_array_refcount.h |
\file ir_array_refcount.h
Provides a visitor which produces a list of variables referenced.
|
3649 |
ir_basic_block.cpp |
\file ir_basic_block.cpp
Basic block analysis of instruction streams.
|
3402 |
ir_basic_block.h |
GLSL_IR_BASIC_BLOCK_H |
1424 |
ir_builder.cpp |
|
11482 |
ir_builder.h |
This little class exists to let the helper expression generators
take either an ir_rvalue * or an ir_variable * to be automatically
dereferenced, while still providing compile-time type checking.
You don't have to explicitly call the constructor -- C++ will see
that you passed an ir_variable, and silently call the
operand(ir_variable *var) constructor behind your back.
|
7225 |
ir_builder_print_visitor.cpp |
for PRIx64 macro |
23909 |
ir_builder_print_visitor.h |
-*- c++ -*- |
1361 |
ir_clone.cpp |
The only possible instantiation is the generic error value. |
13153 |
ir_constant_expression.cpp |
\file ir_constant_expression.cpp
Evaluate and process constant valued expressions
In GLSL, constant valued expressions are used in several places. These
must be processed and evaluated very early in the compilation process.
* Sizes of arrays
* Initializers for uniforms
* Initializers for \c const variables
|
31637 |
ir_equals.cpp |
Helper for checking equality when one instruction might be NULL, since you
can't access a's vtable in that case.
|
5478 |
ir_expression_flattening.cpp |
\file ir_expression_flattening.cpp
Takes the leaves of expression trees and makes them dereferences of
assignments of the leaves to temporaries, according to a predicate.
This is used for breaking down matrix operations, where it's easier to
create a temporary and work on each of its vector components individually.
|
2704 |
ir_expression_flattening.h |
\file ir_expression_flattening.h
Takes the leaves of expression trees and makes them dereferences of
assignments of the leaves to temporaries, according to a predicate.
This is used for automatic function inlining, where we want to take
an expression containing a call and move the call out to its own
assignment so that we can inline it at the appropriate place in the
instruction stream.
|
1815 |
ir_expression_operation.h |
Sentinels marking the last of each kind of operation. |
4687 |
ir_expression_operation.py |
Basic iterator for a set of type signatures. Various kinds of sequences of
types come in, and an iteration of type_signature objects comes out.
|
43788 |
ir_expression_operation_constant.h |
|
64728 |
ir_expression_operation_strings.h |
|
5690 |
ir_function.cpp |
< Match requires implicit conversion. |
13922 |
ir_function_can_inline.cpp |
\file ir_function_can_inline.cpp
Determines if we can inline a function call using ir_function_inlining.cpp.
The primary restriction is that we can't return from the function other
than as the last instruction. In lower_jumps.cpp, we can lower return
statements not at the end of the function to other control flow in order to
deal with this restriction.
|
2472 |
ir_function_detect_recursion.cpp |
|
11764 |
ir_function_inlining.h |
\file ir_function_inlining.h
Replaces calls to functions with the body of the function.
|
1410 |
ir_hierarchical_visitor.cpp |
|
9573 |
ir_hierarchical_visitor.h |
-*- c++ -*- |
9455 |
ir_hv_accept.cpp |
\file ir_hv_accept.cpp
Implementations of all hierarchical visitor accept methods for IR
instructions.
|
12304 |
ir_optimization.h |
\file ir_optimization.h
Prototypes for optimization passes to be called by the compiler and drivers.
|
8814 |
ir_print_glsl_visitor.cpp |
samplerExternal uses texture2D |
54026 |
ir_print_glsl_visitor.h |
-*- c++ -*- |
2763 |
ir_print_visitor.cpp |
for PRIx64 macro |
16924 |
ir_print_visitor.h |
-*- c++ -*- |
3253 |
ir_reader.cpp |
anonymous namespace |
34867 |
ir_reader.h |
-*- c++ -*- |
1386 |
ir_rvalue_visitor.cpp |
\file ir_rvalue_visitor.cpp
Generic class to implement the common pattern we have of wanting to
visit each ir_rvalue * and possibly change that node to a different
class.
|
6915 |
ir_rvalue_visitor.h |
\file ir_rvalue_visitor.h
Generic class to implement the common pattern we have of wanting to
visit each ir_rvalue * and possibly change that node to a different
class. Just implement handle_rvalue() and you will be called with
a pointer to each rvalue in the tree.
|
3852 |
ir_set_program_inouts.cpp |
\file ir_set_program_inouts.cpp
Sets the inputs_read and outputs_written of Mesa programs.
Mesa programs (gl_program, not gl_shader_program) have a set of
flags indicating which varyings are read and written. Computing
which are actually read from some sort of backend code can be
tricky when variable array indexing is involved. So this pass
provides support for setting inputs_read and outputs_written right
from the GLSL IR.
|
15335 |
ir_uniform.h |
stdbool.h is necessary because this file is included in both C and C++ code.
|
6387 |
ir_unused_structs.cpp |
|
3591 |
ir_unused_structs.h |
|
1232 |
ir_validate.cpp |
\file ir_validate.cpp
Attempts to verify that various invariants of the IR tree are true.
In particular, at the moment it makes sure that no single
ir_instruction node except for ir_variable appears multiple times
in the ir tree. ir_variable does appear multiple times: Once as a
declaration in an exec_list, and multiple times as the endpoint of
a dereference chain.
|
36134 |
ir_variable_refcount.cpp |
\file ir_variable_refcount.cpp
Provides a visitor which produces a list of variables referenced,
how many times they were referenced and assigned, and whether they
were defined in the scope.
|
4628 |
ir_variable_refcount.h |
\file ir_variable_refcount.h
Provides a visitor which produces a list of variables referenced,
how many times they were referenced and assigned, and whether they
were defined in the scope.
|
2934 |
ir_visitor.h |
-*- c++ -*- |
3948 |
link_atomics.cpp |
Atomic counter uniform as seen by the program.
|
12731 |
link_functions.cpp |
If ir is an ir_call from a function that was imported from another
shader callee will point to an ir_function_signature in the original
shader. In this case the function signature MUST NOT BE MODIFIED.
Doing so will modify the original shader. This may prevent that
shader from being linkable in other programs.
|
11940 |
link_interface_blocks.cpp |
\file link_interface_blocks.cpp
Linker support for GLSL's interface blocks.
|
20114 |
link_uniform_block_active_visitor.cpp |
If a block with this block-name has not previously been seen, add it.
If a block with this block-name has been seen, it must be identical to
the block currently being examined.
|
10434 |
link_uniform_block_active_visitor.h |
Size of the array before array-trimming optimizations.
Locations are only assigned to active array elements, but the location
values are calculated as if all elements are active. The total number
of elements in an array including the elements in arrays of arrays before
inactive elements are removed is needed to perform that calculation.
|
2747 |
link_uniform_blocks.cpp |
empty |
21082 |
link_uniform_initializers.cpp |
These functions are put in a "private" namespace instead of being marked
static so that the unit tests can access them. See
http://code.google.com/p/googletest/wiki/AdvancedGuide#Testing_Private_Code
|
11330 |
link_uniforms.cpp |
\file link_uniforms.cpp
Assign locations for GLSL uniforms.
\author Ian Romanick <ian.d.romanick@intel.com>
|
64179 |
link_varyings.cpp |
\file link_varyings.cpp
Linker functions related specifically to linking varyings between shader
stages.
|
122925 |
link_varyings.h |
\file link_varyings.h
Linker functions related specifically to linking varyings between shader
stages.
|
8426 |
linker.cpp |
\file linker.cpp
GLSL linker implementation
Given a set of shaders that are to be linked to generate a final program,
there are three distinct stages.
In the first stage shaders are partitioned into groups based on the shader
type. All shaders of a particular type (e.g., vertex shaders) are linked
together.
- Undefined references in each shader are resolved to definitions in
another shader.
- Types and qualifiers of uniforms, outputs, and global variables defined
in multiple shaders with the same name are verified to be the same.
- Initializers for uniforms and global variables defined
in multiple shaders with the same name are verified to be the same.
The result, in the terminology of the GLSL spec, is a set of shader
executables for each processing unit.
After the first stage is complete, a series of semantic checks are performed
on each of the shader executables.
- Each shader executable must define a \c main function.
- Each vertex shader executable must write to \c gl_Position.
- Each fragment shader executable must write to either \c gl_FragData or
\c gl_FragColor.
In the final stage individual shader executables are linked to create a
complete executable.
- Types of uniforms defined in multiple shader stages with the same name
are verified to be the same.
- Initializers for uniforms defined in multiple shader stages with the
same name are verified to be the same.
- Types and qualifiers of outputs defined in one stage are verified to
be the same as the types and qualifiers of inputs defined with the same
name in a later stage.
\author Ian Romanick <ian.d.romanick@intel.com>
|
183709 |
linker.h |
-*- c++ -*- |
8795 |
linker_util.cpp |
for gl_uniform_storage |
13825 |
linker_util.h |
Sometimes there are empty slots left over in UniformRemapTable after we
allocate slots to explicit locations. This struct represents a single
contiguous block of empty slots in UniformRemapTable.
|
3762 |
list.h |
\file list.h
\brief Doubly-linked list abstract container type.
Each doubly-linked list has a sentinel head and tail node. These nodes
contain no data. The head sentinel can be identified by its \c prev
pointer being \c NULL. The tail sentinel can be identified by its
\c next pointer being \c NULL.
A list is empty if either the head sentinel's \c next pointer points to the
tail sentinel or the tail sentinel's \c prev pointer points to the head
sentinel. The head sentinel and tail sentinel nodes are allocated within the
list structure.
Do note that this means that the list nodes will contain pointers into the
list structure itself and as a result you may not \c realloc() an \c
exec_list or any structure in which an \c exec_list is embedded.
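The sentinel layout described above can be sketched outside the real
exec_list API. This is a hypothetical miniature (the names node,
sentinel_list, push_tail are ours, not Mesa's), showing why the emptiness
test is a single pointer comparison and why the sentinels must not move:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical miniature of the layout above: head and tail sentinels
// live inside the list struct itself and carry no data.
struct node {
    node *next;
    node *prev;
};

struct sentinel_list {
    node head;   // head sentinel: prev == NULL
    node tail;   // tail sentinel: next == NULL

    sentinel_list() {
        head.prev = nullptr;
        head.next = &tail;
        tail.prev = &head;
        tail.next = nullptr;
    }

    // Empty exactly when the head sentinel's next points at the tail sentinel.
    bool is_empty() const { return head.next == &tail; }

    void push_tail(node *n) {
        n->next = &tail;
        n->prev = tail.prev;
        tail.prev->next = n;
        tail.prev = n;
    }
};
```

Because every member node ends up holding pointers at the embedded
sentinels, a realloc() of the structure containing the list would leave
all those pointers dangling — which is exactly the warning in the comment.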
|
22417 |
loop_analysis.cpp |
Find an initializer of a variable outside a loop
Works backwards from the loop to find the pre-loop value of the variable.
This is used, for example, to find the initial value of loop induction
variables.
\param loop Loop where \c var is an induction variable
\param var Variable whose initializer is to be found
\return
The \c ir_rvalue assigned to the variable outside the loop. May return
\c NULL if no initializer can be found.
|
24012 |
loop_analysis.h |
-*- c++ -*- |
6438 |
loop_unroll.cpp |
anonymous namespace |
19481 |
lower_blend_equation_advanced.cpp |
f(Cs,Cd) = Cs*Cd |
18897 |
lower_buffer_access.cpp |
\file lower_buffer_access.cpp
Helper for IR lowering pass to replace dereferences of buffer object based
shader variables with intrinsic function calls.
This helper is used by lowering passes for UBOs, SSBOs and compute shader
shared variables.
|
17305 |
lower_buffer_access.h |
\file lower_buffer_access.h
Helper for IR lowering pass to replace dereferences of buffer object based
shader variables with intrinsic function calls.
This helper is used by lowering passes for UBOs, SSBOs and compute shader
shared variables.
|
2718 |
lower_builtins.cpp |
\file lower_builtins.cpp
Inline calls to builtin functions.
|
1858 |
lower_const_arrays_to_uniforms.cpp |
\file lower_const_arrays_to_uniforms.cpp
Lower constant arrays to uniform arrays.
Some driver backends (such as i965 and nouveau) don't handle constant arrays
gracefully, instead treating them as ordinary writable temporary arrays.
Since arrays can be large, this often means spilling them to scratch memory,
which usually involves a large number of instructions.
This must be called prior to link_set_uniform_initializers(); we need the
linker to process our new uniform's constant initializer.
This should be called after optimizations, since those can result in
splitting and removing arrays that are indexed by constant expressions.
|
4846 |
lower_cs_derived.cpp |
\file lower_cs_derived.cpp
For hardware that does not support the gl_GlobalInvocationID and
gl_LocalInvocationIndex system values, replace them with fresh
globals. Note that we can't rely on gl_WorkGroupSize or
gl_LocalGroupSizeARB being available, since they may only have been defined
in a non-main shader.
[ This can happen if only a secondary shader has the layout(local_size_*)
declaration. ]
This is meant to be run post-linking.
|
7668 |
lower_discard.cpp |
\file lower_discard.cpp
This pass moves discards out of if-statements.
Case 1: The "then" branch contains a conditional discard:
---------------------------------------------------------
if (cond1) {
s1;
discard cond2;
s2;
} else {
s3;
}
becomes:
temp = false;
if (cond1) {
s1;
temp = cond2;
s2;
} else {
s3;
}
discard temp;
Case 2: The "else" branch contains a conditional discard:
---------------------------------------------------------
if (cond1) {
s1;
} else {
s2;
discard cond2;
s3;
}
becomes:
temp = false;
if (cond1) {
s1;
} else {
s2;
temp = cond2;
s3;
}
discard temp;
Case 3: Both branches contain a conditional discard:
----------------------------------------------------
if (cond1) {
s1;
discard cond2;
s2;
} else {
s3;
discard cond3;
s4;
}
becomes:
temp = false;
if (cond1) {
s1;
temp = cond2;
s2;
} else {
s3;
temp = cond3;
s4;
}
discard temp;
If there are multiple conditional discards, we need only deal with one of
them. Repeatedly applying this pass will take care of the others.
Unconditional discards are treated as having a condition of "true".
|
4785 |
lower_discard_flow.cpp |
@file lower_discard_flow.cpp
Implements the GLSL 1.30 revision 9 rule for fragment shader
discard handling:
"Control flow exits the shader, and subsequent implicit or
explicit derivatives are undefined when this control flow is
non-uniform (meaning different fragments within the primitive
take different control paths)."
There seem to be two conflicting things here. "Control flow exits
the shader" sounds like the discarded fragments should effectively
jump to the end of the shader, but that breaks derivatives in the
case of uniform control flow and causes rendering failure in the
bushes in Unigine Tropics.
The question, then, is whether the intent was "loops stop at the
point that the only active channels left are discarded pixels" or
"discarded pixels become inactive at the point that control flow
returns to the top of a loop". This implements the second
interpretation.
|
4761 |
lower_distance.cpp |
\file lower_distance.cpp
This pass accounts for the difference between the way
gl_ClipDistance is declared in standard GLSL (as an array of
floats), and the way it is frequently implemented in hardware (as
a pair of vec4s, with four clip distances packed into each).
The declaration of gl_ClipDistance is replaced with a declaration
of gl_ClipDistanceMESA, and any references to gl_ClipDistance are
translated to refer to gl_ClipDistanceMESA with the appropriate
swizzling of array indices. For instance:
gl_ClipDistance[i]
is translated into:
gl_ClipDistanceMESA[i>>2][i&3]
Since some hardware may not internally represent gl_ClipDistance as a pair
of vec4's, this lowering pass is optional. To enable it, set the
LowerCombinedClipCullDistance flag in gl_shader_compiler_options to true.
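The index rewrite above is just shift-and-mask arithmetic: the upper bits
of the flat index select the vec4, the low two bits select the component.
A minimal sketch (the helper name is ours, for illustration):

```cpp
#include <cassert>
#include <utility>

// gl_ClipDistance[i] -> gl_ClipDistanceMESA[i >> 2][i & 3]:
// first = which vec4, second = which component within it.
static std::pair<int, int> clip_distance_mesa_index(int i)
{
    return { i >> 2, i & 3 };
}
```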
|
24752 |
lower_if_to_cond_assign.cpp |
\file lower_if_to_cond_assign.cpp
This flattens if-statements to conditional assignments if:
- the GPU has limited or no flow control support
(controlled by max_depth)
- small conditional branches are more expensive than conditional assignments
(controlled by min_branch_cost, the cost above which a branch is
preserved)
It can't handle other control flow being inside of its block, such
as calls or loops. Hopefully loop unrolling and inlining will take
care of those.
Drivers for GPUs with no control flow support should simply call
lower_if_to_cond_assign(instructions)
to attempt to flatten all if-statements.
Some GPUs (such as i965 prior to gen6) do support control flow, but have a
maximum nesting depth N. Drivers for such hardware can call
lower_if_to_cond_assign(instructions, N)
to attempt to flatten any if-statements appearing at depth > N.
|
10977 |
lower_instructions.cpp |
\file lower_instructions.cpp
Many GPUs lack native instructions for certain expression operations, and
must replace them with some other expression tree. This pass lowers some
of the most common cases, allowing the lowering code to be implemented once
rather than in each driver backend.
Currently supported transformations:
- SUB_TO_ADD_NEG
- DIV_TO_MUL_RCP
- INT_DIV_TO_MUL_RCP
- EXP_TO_EXP2
- POW_TO_EXP2
- LOG_TO_LOG2
- MOD_TO_FLOOR
- LDEXP_TO_ARITH
- DFREXP_DLDEXP_TO_ARITH
- CARRY_TO_ARITH
- BORROW_TO_ARITH
- SAT_TO_CLAMP
- DOPS_TO_DFRAC
SUB_TO_ADD_NEG:
---------------
Breaks an ir_binop_sub expression down to add(op0, neg(op1))
This simplifies expression reassociation, and for many backends
there is no subtract operation separate from adding the negation.
For backends with native subtract operations, they will probably
want to recognize add(op0, neg(op1)) or the other way around to
produce a subtract anyway.
FDIV_TO_MUL_RCP, DDIV_TO_MUL_RCP, and INT_DIV_TO_MUL_RCP:
---------------------------------------------------------
Breaks an ir_binop_div expression down to op0 * (rcp(op1)).
Many GPUs don't have a divide instruction (945 and 965 included),
but they do have an RCP instruction to compute an approximate
reciprocal. By breaking the operation down, constant reciprocals
can get constant folded.
FDIV_TO_MUL_RCP lowers single-precision and half-precision
floating point division;
DDIV_TO_MUL_RCP only lowers double-precision floating point division.
DIV_TO_MUL_RCP is a convenience macro that sets both flags.
INT_DIV_TO_MUL_RCP handles the integer case, converting to and from floating
point so that RCP is possible.
EXP_TO_EXP2 and LOG_TO_LOG2:
----------------------------
Many GPUs don't have a base e log or exponent instruction, but they
do have base 2 versions, so this pass converts exp and log to exp2
and log2 operations.
POW_TO_EXP2:
-----------
Many older GPUs don't have an x**y instruction. For these GPUs, convert
x**y to 2**(y * log2(x)).
MOD_TO_FLOOR:
-------------
Breaks an ir_binop_mod expression down to (op0 - op1 * floor(op0 / op1))
Many GPUs don't have a MOD instruction (945 and 965 included), and
if we have to break it down like this anyway, it gives an
opportunity to do things like constant fold the (1.0 / op1) easily.
Note: we previously implemented this as op1 * fract(op0 / op1), but that
implementation had significant precision errors.
LDEXP_TO_ARITH:
-------------
Converts ir_binop_ldexp to arithmetic and bit operations for float sources.
DFREXP_DLDEXP_TO_ARITH:
---------------
Converts ir_binop_ldexp, ir_unop_frexp_sig, and ir_unop_frexp_exp to
arithmetic and bit ops for double arguments.
CARRY_TO_ARITH:
---------------
Converts ir_carry into (x + y) < x.
BORROW_TO_ARITH:
----------------
Converts ir_borrow into (x < y).
SAT_TO_CLAMP:
-------------
Converts ir_unop_saturate into min(max(x, 0.0), 1.0)
DOPS_TO_DFRAC:
--------------
Converts double trunc, ceil, floor, round to fract
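Several of the identities above can be checked by hand outside any IR.
A minimal sketch of three of them, using plain scalar arithmetic (the
function names are ours, for illustration):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// SUB_TO_ADD_NEG: a - b == a + (-b)
static float sub_to_add_neg(float a, float b) { return a + (-b); }

// MOD_TO_FLOOR: mod(a, b) == a - b * floor(a / b)
static float mod_to_floor(float a, float b) { return a - b * std::floor(a / b); }

// CARRY_TO_ARITH: the carry out of a 32-bit unsigned add is (x + y) < x,
// because the sum wraps modulo 2^32 exactly when a carry occurs.
static bool carry_to_arith(uint32_t x, uint32_t y)
{
    return (uint32_t)(x + y) < x;
}
```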
|
68546 |
lower_int64.cpp |
\file lower_int64.cpp
Lower 64-bit operations to 32-bit operations. Each 64-bit value is lowered
to a uvec2. For each operation that can be lowered, there is a function
called __builtin_foo with the same number of parameters that takes uvec2
sources and produces uvec2 results. An operation like
uint64_t(x) * uint64_t(y)
becomes
packUint2x32(__builtin_umul64(unpackUint2x32(x), unpackUint2x32(y)));
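What a lowered 64-bit multiply has to compute can be sketched with plain
32-bit arithmetic on lo/hi pairs. This is an illustrative model, not the
real __builtin_umul64 (the struct and function names here are ours):

```cpp
#include <cassert>
#include <cstdint>

// A 64-bit value lowered to a pair of 32-bit words, low word first.
struct uvec2 { uint32_t lo, hi; };

static uvec2 umul64(uvec2 a, uvec2 b)
{
    // 32x32 -> 64 product of the low halves supplies the low result word
    // and the carry into the high word.
    uint64_t lo_prod = (uint64_t)a.lo * b.lo;
    uint32_t lo = (uint32_t)lo_prod;
    // High word: that carry plus the cross terms; the cross terms' own
    // upper halves overflow past bit 63 and are discarded (mod 2^32 wrap).
    uint32_t hi = (uint32_t)(lo_prod >> 32) + a.lo * b.hi + a.hi * b.lo;
    return { lo, hi };
}
```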
|
11894 |
lower_jumps.cpp |
\file lower_jumps.cpp
This pass lowers jumps (break, continue, and return) to if/else structures.
It can be asked to:
1. Pull jumps out of ifs where possible
2. Remove all "continue"s, replacing them with an "execute flag"
3. Replace all "break" with a single conditional one at the end of the loop
4. Replace all "return"s with a single return at the end of the function,
for the main function and/or other functions
Applying this pass gives several benefits:
1. All functions can be inlined.
2. nv40 and other pre-DX10 chips without "continue" can be supported
3. nv30 and other pre-DX10 chips with no control flow at all are better
supported
Continues are lowered by adding a per-loop "execute flag", initialized to
true, that when cleared inhibits all execution until the end of the loop.
Breaks are lowered to continues, plus setting a "break flag" that is checked
at the end of the loop and triggers that single "break".
Returns are lowered to breaks/continues, plus adding a "return flag" that
causes loops to break again out of their enclosing loops until all the
loops are exited: then the "execute flag" logic will ignore everything
until the end of the function.
Note that "continue" and "return" can also be implemented by adding
a dummy loop and using break.
However, this is bad for hardware with limited nesting depth, and
prevents further optimization, and thus is not currently performed.
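The per-loop "execute flag" lowering of "continue" can be applied by hand
to an ordinary loop to see that behavior is preserved. A minimal sketch
(both functions and their names are ours, for illustration):

```cpp
#include <cassert>

// Original form: skip even i via "continue".
static int with_continue(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (i % 2 == 0)
            continue;
        sum += i;
    }
    return sum;
}

// Lowered form: the "continue" clears a per-iteration execute flag, and
// everything after the former continue point is guarded by that flag.
static int with_execute_flag(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        bool execute = true;     // reset to true each trip through the loop
        if (i % 2 == 0)
            execute = false;     // was: continue
        if (execute)
            sum += i;
    }
    return sum;
}
```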
|
39923 |
lower_mat_op_to_vec.cpp |
\file lower_mat_op_to_vec.cpp
Breaks matrix operation expressions down to a series of vector operations.
Generally this is how we have to codegen matrix operations for a
GPU, so this gives us the chance to constant fold operations on a
column or row.
|
12704 |
lower_named_interface_blocks.cpp |
\file lower_named_interface_blocks.cpp
This lowering pass converts all interface blocks with instance names
into interface blocks without an instance name.
For example, the following shader:
out block {
float block_var;
} inst_name;
main()
{
inst_name.block_var = 0.0;
}
Is rewritten to:
out block {
float block_var;
};
main()
{
block_var = 0.0;
}
This takes place after the shader code has already been verified with
the interface name in place.
The linking phase will use the interface block name rather than the
interface's instance name when linking interfaces.
This modification to the ir allows our currently existing dead code
elimination to work with interface blocks without changes.
|
11062 |
lower_offset_array.cpp |
\file lower_offset_array.cpp
IR lower pass to decompose ir_texture ir_tg4 with an array of offsets
into four ir_tg4s with a single ivec2 offset, select the .w component of each,
and return those four values packed into a gvec4.
\author Chris Forbes <chrisf@ijw.co.nz>
|
2745 |
lower_output_reads.cpp |
\file lower_output_reads.cpp
In GLSL, shader output variables (such as varyings) can be both read and
written. However, on some hardware, reading an output register causes
trouble.
This pass creates temporary shadow copies of every (used) shader output,
and replaces all accesses to use those instead. It also adds code to the
main() function to copy the final values to the actual shader outputs.
|
6109 |
lower_packed_varyings.cpp |
\file lower_packed_varyings.cpp
This lowering pass generates GLSL code that manually packs varyings into
vec4 slots, for the benefit of back-ends that don't support packed varyings
natively.
For example, the following shader:
out mat3x2 foo; // location=4, location_frac=0
out vec3 bar[2]; // location=5, location_frac=2
main()
{
...
}
Is rewritten to:
mat3x2 foo;
vec3 bar[2];
out vec4 packed4; // location=4, location_frac=0
out vec4 packed5; // location=5, location_frac=0
out vec4 packed6; // location=6, location_frac=0
main()
{
...
packed4.xy = foo[0];
packed4.zw = foo[1];
packed5.xy = foo[2];
packed5.zw = bar[0].xy;
packed6.x = bar[0].z;
packed6.yzw = bar[1];
}
This lowering pass properly handles "double parking" of a varying vector
across two varying slots. For example, in the code above, two of the
components of bar[0] are stored in packed5, and the remaining component is
stored in packed6.
Note that in theory, the extra instructions may cause some loss of
performance. However, hopefully in most cases the performance loss will
either be absorbed by a later optimization pass, or it will be offset by
memory bandwidth savings (because fewer varyings are used).
This lowering pass also packs flat floats, ints, and uints together, by
using ivec4 as the base type of flat "varyings", and using appropriate
casts to convert floats and uints into ints.
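The flat-varying packing above depends on those casts being bit-exact: a
float travels through the ivec4 slot as its raw bit pattern and is
reinterpreted unchanged on the other side. A C++ sketch of that round trip
(helper names are ours, for illustration):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a float's bits as an int without any value conversion,
// analogous to GLSL's floatBitsToInt().
static int32_t float_bits_to_int(float f)
{
    int32_t i;
    std::memcpy(&i, &f, sizeof i);
    return i;
}

// The inverse direction, analogous to intBitsToFloat().
static float int_bits_to_float(int32_t i)
{
    float f;
    std::memcpy(&f, &i, sizeof f);
    return f;
}
```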
This lowering pass also handles varyings whose type is a struct or an array
of struct. Structs are packed in order and with no gaps, so there may be a
performance penalty due to structure elements being double-parked.
Lowering of geometry shader inputs is slightly more complex, since geometry
inputs are always arrays, so we need to lower arrays to arrays. For
example, the following input:
in struct Foo {
float f;
vec3 v;
vec2 a[2];
} arr[3]; // location=4, location_frac=0
Would get lowered like this if it occurred in a fragment shader:
struct Foo {
float f;
vec3 v;
vec2 a[2];
} arr[3];
in vec4 packed4; // location=4, location_frac=0
in vec4 packed5; // location=5, location_frac=0
in vec4 packed6; // location=6, location_frac=0
in vec4 packed7; // location=7, location_frac=0
in vec4 packed8; // location=8, location_frac=0
in vec4 packed9; // location=9, location_frac=0
main()
{
arr[0].f = packed4.x;
arr[0].v = packed4.yzw;
arr[0].a[0] = packed5.xy;
arr[0].a[1] = packed5.zw;
arr[1].f = packed6.x;
arr[1].v = packed6.yzw;
arr[1].a[0] = packed7.xy;
arr[1].a[1] = packed7.zw;
arr[2].f = packed8.x;
arr[2].v = packed8.yzw;
arr[2].a[0] = packed9.xy;
arr[2].a[1] = packed9.zw;
...
}
But it would get lowered like this if it occurred in a geometry shader:
struct Foo {
float f;
vec3 v;
vec2 a[2];
} arr[3];
in vec4 packed4[3]; // location=4, location_frac=0
in vec4 packed5[3]; // location=5, location_frac=0
main()
{
arr[0].f = packed4[0].x;
arr[0].v = packed4[0].yzw;
arr[0].a[0] = packed5[0].xy;
arr[0].a[1] = packed5[0].zw;
arr[1].f = packed4[1].x;
arr[1].v = packed4[1].yzw;
arr[1].a[0] = packed5[1].xy;
arr[1].a[1] = packed5[1].zw;
arr[2].f = packed4[2].x;
arr[2].v = packed4[2].yzw;
arr[2].a[0] = packed5[2].xy;
arr[2].a[1] = packed5[2].zw;
...
}
|
36858 |
lower_packing_builtins.cpp |
A visitor that lowers built-in floating-point pack/unpack expressions
such as packSnorm2x16.
|
47382 |
lower_precision.cpp |
\file lower_precision.cpp
|
21418 |
lower_shared_reference.cpp |
\file lower_shared_reference.cpp
IR lower pass to replace dereferences of compute shader shared variables
with intrinsic function calls.
This relieves drivers of the responsibility of allocating space for the
shared variables in the shared memory region.
|
17680 |
lower_subroutine.cpp |
\file lower_subroutine.cpp
Lowers subroutines to an if ladder.
|
3830 |
lower_tess_level.cpp |
\file lower_tess_level.cpp
This pass accounts for the difference between the way gl_TessLevelOuter
and gl_TessLevelInner are declared in standard GLSL (as arrays of
floats), and the way they are frequently implemented in hardware (as a
vec4 and a vec2).
The declaration of gl_TessLevel* is replaced with a declaration
of gl_TessLevel*MESA, and any references to gl_TessLevel* are
translated to refer to gl_TessLevel*MESA with the appropriate
swizzling of array indices. For instance:
gl_TessLevelOuter[i]
is translated into:
gl_TessLevelOuterMESA[i]
Since some hardware may not internally represent gl_TessLevel* as a pair
of vec4's, this lowering pass is optional. To enable it, set the
LowerTessLevel flag in gl_shader_compiler_options to true.
|
16174 |
lower_texture_projection.cpp |
\file lower_texture_projection.cpp
IR lower pass to perform the division of texture coordinates by the texture
projector if present.
Many GPUs have a texture sampling opcode that takes the projector
and does the divide internally, thus the presence of the projector
in the IR. For GPUs that don't, this saves the driver needing the
logic for handling the divide.
\author Eric Anholt <eric@anholt.net>
|
3208 |
lower_ubo_reference.cpp |
\file lower_ubo_reference.cpp
IR lower pass to replace dereferences of variables in a uniform
buffer object with usage of ir_binop_ubo_load expressions, each of
which can read data up to the size of a vec4.
This relieves drivers of the responsibility to deal with tricky UBO
layout issues like std140 structures and row_major matrices on
their own.
|
38867 |
lower_variable_index_to_cond_assign.cpp |
\file lower_variable_index_to_cond_assign.cpp
Turns non-constant indexing into array types to a series of
conditional moves of each element into a temporary.
Pre-DX10 GPUs often don't have a native way to do this operation,
and this works around that.
The lowering process proceeds as follows. Each non-constant index
found in an r-value is converted to a canonical form \c array[i]. Each
element of the array is conditionally assigned to a temporary by comparing
\c i to a constant index. This is done by cloning the canonical form and
replacing all occurrences of \c i with a constant. Each remaining occurrence
of the canonical form in the IR is replaced with a dereference of the
temporary variable.
L-values with non-constant indices are handled similarly. In this case,
the RHS of the assignment is assigned to a temporary. The non-constant
index is replaced with the canonical form (just like for r-values). The
temporary is conditionally assigned to each element of the canonical form
by comparing \c i with each index. The same clone-and-replace scheme is
used.
|
18874 |
lower_vec_index_to_cond_assign.cpp |
\file lower_vec_index_to_cond_assign.cpp
Turns indexing into vector types to a series of conditional moves
of each channel's swizzle into a temporary.
Most GPUs don't have a native way to do this operation, and this
works around that. For drivers using both this pass and
ir_vec_index_to_swizzle, there's a risk that this pass will happen
before sufficient constant folding to find that the array index is
constant. However, we hope that other optimization passes,
particularly constant folding of assignment conditions and copy
propagation, will result in the same code in the end.
|
8203 |
lower_vec_index_to_swizzle.cpp |
\file lower_vec_index_to_swizzle.cpp
Turns constant indexing into vector types to swizzles. This will
let other swizzle-aware optimization passes catch these constructs,
and codegen backends not have to worry about this case.
|
3336 |
lower_vector.cpp |
\file lower_vector.cpp
IR lowering pass to remove some types of ir_quadop_vector
\author Ian Romanick <ian.d.romanick@intel.com>
|
6207 |
lower_vector_derefs.cpp |
anonymous namespace |
7344 |
lower_vector_insert.cpp |
anonymous namespace |
4833 |
lower_vertex_id.cpp |
\file lower_vertex_id.cpp
There exists hardware, such as i965, that does not implement the OpenGL
semantic for gl_VertexID. Instead, that hardware does not include the
value of basevertex in the gl_VertexID value. To implement the OpenGL
semantic, we'll have to convert gl_VertexID to
gl_VertexIDMESA+gl_BaseVertexMESA.
|
4857 |
lower_xfb_varying.cpp |
\file lower_xfb_varying.cpp
|
6646 |
main.cpp |
@file main.cpp
This file is the main() routine and scaffolding for producing
builtin_compiler (which doesn't include builtins itself and is used
to generate the profile information for builtin_function.cpp), and
for glsl_compiler (which does include builtins and can be used to
offline compile GLSL code and examine the resulting GLSL IR).
|
3453 |
opt_add_neg_to_sub.h |
empty |
2034 |
opt_algebraic.cpp |
\file opt_algebraic.cpp
Takes advantage of associativity, commutativity, and other algebraic
properties to simplify expressions.
|
33281 |
opt_array_splitting.cpp |
\file opt_array_splitting.cpp
If an array is always dereferenced with a constant index, then
split it apart into its elements, making it more amenable to other
optimization passes.
This skips uniform/varying arrays, which would need careful
handling due to their ir->location fields tying them to the GL API
and other shader stages.
|
14869 |
opt_conditional_discard.cpp |
\file opt_conditional_discard.cpp
Replace
if (cond) discard;
with
(discard <condition>)
|
2724 |
opt_constant_folding.cpp |
\file opt_constant_folding.cpp
Replace constant-valued expressions with references to constant values.
|
6243 |
opt_constant_propagation.cpp |
\file opt_constant_propagation.cpp
Tracks assignments of constants to channels of variables, and replaces
usage of those constant channels with direct usage of the constants.
This can lead to constant folding and algebraic optimizations in
those later expressions, while causing no increase in instruction
count (due to constants being generally free to load from a
constant push buffer or as instruction immediate values) and
possibly reducing register pressure.
|
15425 |
opt_constant_variable.cpp |
\file opt_constant_variable.cpp
Marks variables assigned a single constant value over the course
of the program as constant.
The goal here is to trigger further constant folding and then dead
code elimination. This is common with vector/matrix constructors
and calls to builtin functions.
|
7084 |
opt_copy_propagation_elements.cpp |
\file opt_copy_propagation_elements.cpp
Replaces usage of recently-copied components of variables with the
previous copy of the variable.
This should reduce the number of MOV instructions in the generated
programs and help triggering other optimizations that live in GLSL
level.
|
21094 |
opt_dead_builtin_variables.cpp |
Pre-linking, optimize unused built-in variables
Uniforms, constants, system values, inputs (vertex shader only), and
outputs (fragment shader only) that are not used can be removed.
|
3398 |
opt_dead_builtin_varyings.cpp |
\file opt_dead_builtin_varyings.cpp
This eliminates the built-in shader outputs which are either not written
at all or not used by the next stage. It also eliminates unused elements
of gl_TexCoord inputs, which reduces the overall varying usage.
The varyings handled here are the primary and secondary color, the fog,
and the texture coordinates (gl_TexCoord).
This pass is necessary, because the Mesa GLSL linker cannot eliminate
built-in varyings like it eliminates user-defined varyings, because
the built-in varyings have pre-assigned locations. Also, the elimination
of unused gl_TexCoord elements requires its own lowering pass anyway.
It's implemented by replacing all occurrences of dead varyings with
temporary variables, which creates dead code. It is recommended to run
a dead-code elimination pass after this.
If any texture coordinate slots can be eliminated, the gl_TexCoord array is
broken down into separate vec4 variables with locations equal to
VARYING_SLOT_TEX0 + i.
The same is done for the gl_FragData fragment shader output.
|
21052 |
opt_dead_code.cpp |
\file opt_dead_code.cpp
Eliminates dead assignments and variable declarations from the code.
|
7286 |
opt_dead_code_local.cpp |
\file opt_dead_code_local.cpp
Eliminates local dead assignments from the code.
This operates on basic blocks, tracking assignments and finding if
they're used before the variable is completely reassigned.
Compare this to ir_dead_code.cpp, which operates globally looking
for assignments to variables that are never read.
|
9767 |
opt_dead_functions.cpp |
\file opt_dead_functions.cpp
Eliminates unused functions from the linked program.
|
3971 |
opt_flatten_nested_if_blocks.cpp |
\file opt_flatten_nested_if_blocks.cpp
Flattens nested if blocks such as:
if (x) {
   if (y) {
      ...
   }
}
into a single if block with a combined condition:
if (x && y) {
   ...
}
|
2811 |
opt_flip_matrices.cpp |
\file opt_flip_matrices.cpp
Convert (matrix * vector) operations to (vector * matrixTranspose),
which can be done using dot products rather than multiplies and adds.
On some hardware, this is more efficient.
This currently only does the conversion for built-in matrices which
already have transposed equivalents. Namely, gl_ModelViewProjectionMatrix
and gl_TextureMatrix.
|
3960 |
opt_function_inlining.cpp |
\file opt_function_inlining.cpp
Replaces calls to functions with the body of the function.
|
13628 |
opt_if_simplification.cpp |
\file opt_if_simplification.cpp
Moves constant branches of if statements out to the surrounding
instruction stream, and inverts if conditionals to avoid empty
"then" blocks.
|
3811 |
opt_minmax.cpp |
\file opt_minmax.cpp
Drop operands from an expression tree of only min/max operations if they
can be proven to not contribute to the final result.
The algorithm is similar to alpha-beta pruning on a minmax search.
|
15421 |
opt_rebalance_tree.cpp |
\file opt_rebalance_tree.cpp
Rebalances a reduction expression tree.
For reduction operations (e.g., x + y + z + w) we generate an expression
tree like
          +
         / \
        +   w
       / \
      +   z
     / \
    x   y
which we can rebalance into
       +
      / \
     /   \
    +     +
   / \   / \
  x   y z   w
to get a better instruction scheduling.
See "Tree Rebalancing in Optimal Editor Time and Space" by Quentin F. Stout
and Bette L. Warren.
Also see http://penguin.ewu.edu/~trolfe/DSWpaper/ for a very readable
explanation of the tree_to_vine() (rightward rotation) and
vine_to_tree() (leftward rotation) algorithms.
|
9666 |
opt_redundant_jumps.cpp |
\file opt_redundant_jumps.cpp
Remove certain types of redundant jumps
|
3664 |
opt_structure_splitting.cpp |
\file opt_structure_splitting.cpp
If a structure is only ever referenced by its components, then
split those components out to individual variables so they can be
handled normally by other optimization passes.
This skips structures like uniforms, which need to be accessible as
structures for their access by the GL.
|
11074 |
opt_swizzle.cpp |
\file opt_swizzle.cpp
Optimize swizzle operations.
First, compact a sequence of swizzled swizzles into a single swizzle.
If the final resulting swizzle doesn't change the order or count of
components, then remove the swizzle so that other optimization passes see
the value behind it.
|
3364 |
opt_tree_grafting.cpp |
\file opt_tree_grafting.cpp
Takes assignments to variables that are dereferenced only once and
pastes the RHS expression into where the variable is dereferenced.
In the process of various operations like function inlining and
ternary op handling, we'll end up with our expression trees having
been chopped up into a series of assignments of short expressions
to temps. Other passes like ir_algebraic.cpp would prefer to see
the deepest expression trees they can to try to optimize them.
This is a lot like copy propagation. In comparison, copy
propagation only acts on plain copies, not arbitrary expressions on
the RHS. Generally, we wouldn't want to go pasting some
complicated expression everywhere it got used, though, so we don't
handle expressions in that pass.
The hard part is making sure we don't move an expression across
some other assignments that would change the value of the
expression. So we split this into two passes: First, find the
variables in our scope which are written to once and read once, and
then go through basic blocks seeing if we find an opportunity to
move those expressions safely.
|
11564 |
opt_vectorize.cpp |
\file opt_vectorize.cpp
Combines scalar assignments of the same expression (modulo swizzle) to
multiple channels of the same variable into a single vectorized expression
and assignment.
Many generated shaders contain scalarized code. That is, they contain
r1.x = log2(v0.x);
r1.y = log2(v0.y);
r1.z = log2(v0.z);
rather than
r1.xyz = log2(v0.xyz);
We look for consecutive assignments of the same expression (modulo swizzle)
to each channel of the same variable.
For instance, we want to convert these three scalar operations
(assign (x) (var_ref r1) (expression float log2 (swiz x (var_ref v0))))
(assign (y) (var_ref r1) (expression float log2 (swiz y (var_ref v0))))
(assign (z) (var_ref r1) (expression float log2 (swiz z (var_ref v0))))
into a single vector operation
(assign (xyz) (var_ref r1) (expression vec3 log2 (swiz xyz (var_ref v0))))
|
12647 |
program.h |
extern "C" |
2009 |
propagate_invariance.cpp |
\file propagate_invariance.cpp
Propagate the "invariant" and "precise" qualifiers to variables used to
compute invariant or precise values.
The GLSL spec (depending on what version you read) says, among the
conditions for getting bit-for-bit the same values on an invariant output:
"All operations in the consuming expressions and any intermediate
expressions must be the same, with the same order of operands and same
associativity, to give the same order of evaluation."
This effectively means that if a variable is used to compute an invariant
value then that variable becomes invariant. The same should apply to the
"precise" qualifier.
|
3720 |
README |
Welcome to Mesa's GLSL compiler. A brief overview of how things flow: |
10776 |
s_expression.cpp |
-*- c++ -*- |
6159 |
s_expression.h |
-*- c++ -*- |
4733 |
serialize.cpp |
\file serialize.cpp
GLSL serialization
Supports serializing and deserializing glsl programs using a blob.
|
48965 |
serialize.h |
extern "C" |
1687 |
shader_cache.cpp |
\file shader_cache.cpp
GLSL shader cache implementation
This uses disk_cache.c to write out a serialization of various
state that's required in order to successfully load and use a
binary written out by a drivers backend, this state is referred to as
"metadata" throughout the implementation.
The hash key for glsl metadata is a hash of the hashes of each GLSL
source string as well as some API settings that change the final program
such as SSO, attribute bindings, frag data bindings, etc.
In order to avoid caching any actual IR we use the put_key/get_key support
in the disk_cache to put the SHA-1 hash for each successfully compiled
shader into the cache, and optimistically return early from glCompileShader
(if the identical shader had been successfully compiled in the past),
in the hope that the final linked shader will be found in the cache.
If anything goes wrong (shader variant not found, backend cache item is
corrupt, etc) we will use a fallback path to compile and link the IR.
|
9568 |
shader_cache.h |
SHADER_CACHE_H |
1576 |
standalone.cpp |
@file standalone.cpp
Standalone compiler helper lib. Used by standalone glsl_compiler and
also available to drivers to implement their own standalone compiler
with a driver backend.
|
22129 |
standalone.h |
GLSL_STANDALONE_H |
1756 |
standalone_scaffolding.cpp |
This file declares stripped-down versions of functions that
normally exist outside of the glsl folder, so that they can be used
when running the GLSL compiler standalone (for unit testing or
compiling builtins).
|
9516 |
standalone_scaffolding.h |
This file declares stripped-down versions of functions that
normally exist outside of the glsl folder, so that they can be used
when running the GLSL compiler standalone (for unit testing or
compiling builtins).
|
3933 |
string_to_uint_map.cpp |
\file string_to_uint_map.cpp
\brief Dumb wrappers so that C code can create and destroy maps.
\author Ian Romanick <ian.d.romanick@intel.com>
|
1546 |
string_to_uint_map.h |
Map from a string (name) to an unsigned integer value
\note
Because of the way this class interacts with the \c hash_table
implementation, values of \c UINT_MAX cannot be stored in the map.
|
5185 |
test_optpass.h |
TEST_OPTPASS_H |
1274 |
TODO |
|
689 |
xxd.py |
|
3639 |