Name Description Size
ast.h -*- c++ -*- 37345
ast_array_index.cpp If \c ir is a reference to an array for which we are tracking the max array element accessed, track that the given element has been accessed. Otherwise do nothing. This function also checks whether the array is a built-in array whose maximum size is too small to accommodate the given index, and if so uses loc and state to report the error. 15340
ast_expr.cpp 2272
ast_function.cpp We need to process the parameters first in order to know whether or not to raise an uninitialized warning. Calling set_is_lhs silences the warning for now. Whether to raise the warning will be checked at verify_parameter_modes. 95627
ast_to_hir.cpp \file ast_to_hir.c Convert abstract syntax to high-level intermediate representation (HIR). During the conversion to HIR, the majority of the semantic checking is performed on the program. This includes: * Symbol table management * Type checking * Function binding The majority of this work could be done during parsing, and the parser could probably generate HIR directly. However, this results in frequent changes to the parser code. Since we do not assume that every system this compiler is built on will have Flex and Bison installed, we have to store the code generated by these tools in our version control system. In other parts of the system we've seen problems where a parser was changed but the generated code was not committed, merge conflicts were created because two developers had slightly different versions of Bison installed, etc. I have also noticed that running Bison generated parsers in GDB is very irritating. When you get a segfault on '$$ = $1->foo', you can't very well 'print $1' in GDB. As a result, my preference is to put as little C code as possible in the parser (and lexer) sources. 349136
ast_type.cpp 'subroutine' isn't a real qualifier. 36025
builtin_functions.cpp \file builtin_functions.cpp Support for GLSL built-in functions. This file is split into several main components: 1. Availability predicates A series of small functions that check whether the current shader supports the version/extensions required to expose a built-in. 2. Core builtin_builder class functionality 3. Lists of built-in functions The builtin_builder::create_builtins() function contains lists of all built-in function signatures, where they're available, what types they take, and so on. 4. Implementations of built-in function signatures A series of functions which create ir_function_signatures and emit IR via ir_builder to implement them. 5. External API A few functions the rest of the compiler can use to interact with the built-in function module. For example, searching for a built-in by name and parameters. 374255
builtin_functions.h extern "C" 2549
builtin_int64.h IF CONDITION 43996
builtin_types.cpp \file builtin_types.cpp The glsl_type class has static members to represent all the built-in types (such as the glsl_type::_float_type flyweight) as well as convenience pointer accessors (such as glsl_type::float_type). Those global variables are declared and initialized in this file. This also contains _mesa_glsl_initialize_types(), a function which populates a symbol table with the available built-in types for a particular language version and set of enabled extensions. 19917
builtin_variables.cpp Building this file with MinGW g++ 7.3 or 7.4 with: scons platform=windows toolchain=crossmingw machine=x86 build=profile triggers an internal compiler error. Overriding the optimization level to -O1 works around the issue. MinGW 5.3.1 does not seem to have the bug, neither does 8.3. So for now we're simply testing for version 7.x here. 63550
float64.glsl 58605
generate_ir.cpp for SWIZZLE_X, &c. 1359
glcpp
glsl_lexer.cpp A lexical scanner generated by flex 163592
glsl_lexer.ll 42804
glsl_optimizer.cpp 25958
glsl_optimizer.h Main GLSL optimizer interface. See ../../README.md for more instructions. General usage: ctx = glslopt_initialize(); for (lots of shaders) { shader = glslopt_optimize (ctx, shaderType, shaderSource, options); if (glslopt_get_status (shader)) { newSource = glslopt_get_output (shader); } else { errorLog = glslopt_get_log (shader); } glslopt_shader_delete (shader); } glslopt_cleanup (ctx); 3181
glsl_parser.cpp A Bison parser, made by GNU Bison 3.5. 246677
glsl_parser.h A Bison parser, made by GNU Bison 3.5. 6544
glsl_parser.yy 97360
glsl_parser_extras.cpp for PRIx64 macro 80777
glsl_parser_extras.h Most of the definitions here only apply to C++ 34759
glsl_symbol_table.cpp -*- c++ -*- 9229
glsl_symbol_table.h -*- c++ -*- 3727
hir_field_selection.cpp There are two kinds of field selection. There is the selection of a specific field from a structure, and there is the selection of a swizzle / mask from a vector. Which is which is determined entirely by the base type of the thing to which the field selection operator is being applied. 3127
int64.glsl 2680
ir.cpp Modify the swizzle mask to move one component to another \param m IR swizzle to be modified \param from Component in the RHS that is to be swizzled \param to Desired swizzle location of \c from 61767
ir.h -*- c++ -*- 75126
ir_array_refcount.cpp \file ir_array_refcount.cpp Provides a visitor which produces a list of variables referenced. 6103
ir_array_refcount.h \file ir_array_refcount.h Provides a visitor which produces a list of variables referenced. 3649
ir_basic_block.cpp \file ir_basic_block.cpp Basic block analysis of instruction streams. 3402
ir_basic_block.h GLSL_IR_BASIC_BLOCK_H 1424
ir_builder.cpp 11482
ir_builder.h This little class exists to let the helper expression generators take either an ir_rvalue * or an ir_variable * to be automatically dereferenced, while still providing compile-time type checking. You don't have to explicitly call the constructor -- C++ will see that you passed an ir_variable, and silently call the operand(ir_variable *var) constructor behind your back. 7225
ir_builder_print_visitor.cpp for PRIx64 macro 23909
ir_builder_print_visitor.h -*- c++ -*- 1361
ir_clone.cpp The only possible instantiation is the generic error value. 13153
ir_constant_expression.cpp \file ir_constant_expression.cpp Evaluate and process constant valued expressions In GLSL, constant valued expressions are used in several places. These must be processed and evaluated very early in the compilation process. * Sizes of arrays * Initializers for uniforms * Initializers for \c const variables 31637
ir_equals.cpp Helper for checking equality when one instruction might be NULL, since you can't access a's vtable in that case. 5478
ir_expression_flattening.cpp \file ir_expression_flattening.cpp Takes the leaves of expression trees and makes them dereferences of assignments of the leaves to temporaries, according to a predicate. This is used for breaking down matrix operations, where it's easier to create a temporary and work on each of its vector components individually. 2704
ir_expression_flattening.h \file ir_expression_flattening.h Takes the leaves of expression trees and makes them dereferences of assignments of the leaves to temporaries, according to a predicate. This is used for automatic function inlining, where we want to take an expression containing a call and move the call out to its own assignment so that we can inline it at the appropriate place in the instruction stream. 1815
ir_expression_operation.h Sentinels marking the last of each kind of operation. 4687
ir_expression_operation.py Basic iterator for a set of type signatures. Various kinds of sequences of types come in, and an iteration of type_signature objects comes out. 43788
ir_expression_operation_constant.h 64728
ir_expression_operation_strings.h 5690
ir_function.cpp < Match requires implicit conversion. 13922
ir_function_can_inline.cpp \file ir_function_can_inline.cpp Determines if we can inline a function call using ir_function_inlining.cpp. The primary restriction is that we can't return from the function other than as the last instruction. In lower_jumps.cpp, we can lower return statements not at the end of the function to other control flow in order to deal with this restriction. 2472
ir_function_detect_recursion.cpp 11764
ir_function_inlining.h \file ir_function_inlining.h Replaces calls to functions with the body of the function. 1410
ir_hierarchical_visitor.cpp 9573
ir_hierarchical_visitor.h -*- c++ -*- 9455
ir_hv_accept.cpp \file ir_hv_accept.cpp Implementations of all hierarchical visitor accept methods for IR instructions. 12304
ir_optimization.h \file ir_optimization.h Prototypes for optimization passes to be called by the compiler and drivers. 8814
ir_print_glsl_visitor.cpp samplerExternal uses texture2D 54026
ir_print_glsl_visitor.h -*- c++ -*- 2763
ir_print_visitor.cpp for PRIx64 macro 16924
ir_print_visitor.h -*- c++ -*- 3253
ir_reader.cpp anonymous namespace 34867
ir_reader.h -*- c++ -*- 1386
ir_rvalue_visitor.cpp \file ir_rvalue_visitor.cpp Generic class to implement the common pattern we have of wanting to visit each ir_rvalue * and possibly change that node to a different class. 6915
ir_rvalue_visitor.h \file ir_rvalue_visitor.h Generic class to implement the common pattern we have of wanting to visit each ir_rvalue * and possibly change that node to a different class. Just implement handle_rvalue() and you will be called with a pointer to each rvalue in the tree. 3852
ir_set_program_inouts.cpp \file ir_set_program_inouts.cpp Sets the inputs_read and outputs_written of Mesa programs. Mesa programs (gl_program, not gl_shader_program) have a set of flags indicating which varyings are read and written. Computing which are actually read from some sort of backend code can be tricky when variable array indexing is involved. So this pass provides support for setting inputs_read and outputs_written right from the GLSL IR. 15335
ir_uniform.h stdbool.h is necessary because this file is included in both C and C++ code. 6387
ir_unused_structs.cpp 3591
ir_unused_structs.h 1232
ir_validate.cpp \file ir_validate.cpp Attempts to verify that various invariants of the IR tree are true. In particular, at the moment it makes sure that no single ir_instruction node except for ir_variable appears multiple times in the ir tree. ir_variable does appear multiple times: Once as a declaration in an exec_list, and multiple times as the endpoint of a dereference chain. 36134
ir_variable_refcount.cpp \file ir_variable_refcount.cpp Provides a visitor which produces a list of variables referenced, how many times they were referenced and assigned, and whether they were defined in the scope. 4628
ir_variable_refcount.h \file ir_variable_refcount.h Provides a visitor which produces a list of variables referenced, how many times they were referenced and assigned, and whether they were defined in the scope. 2934
ir_visitor.h -*- c++ -*- 3948
link_atomics.cpp Atomic counter uniform as seen by the program. 12731
link_functions.cpp If ir is an ir_call from a function that was imported from another shader, callee will point to an ir_function_signature in the original shader. In this case the function signature MUST NOT BE MODIFIED. Doing so will modify the original shader. This may prevent that shader from being linkable in other programs. 11940
link_interface_blocks.cpp \file link_interface_blocks.cpp Linker support for GLSL's interface blocks. 20114
link_uniform_block_active_visitor.cpp If a block with this block-name has not previously been seen, add it. If a block with this block-name has been seen, it must be identical to the block currently being examined. 10434
link_uniform_block_active_visitor.h Size of the array before array-trimming optimizations. Locations are only assigned to active array elements, but the location values are calculated as if all elements are active. The total number of elements in an array including the elements in arrays of arrays before inactive elements are removed is needed to perform that calculation. 2747
link_uniform_blocks.cpp empty 21082
link_uniform_initializers.cpp These functions are put in a "private" namespace instead of being marked static so that the unit tests can access them. See http://code.google.com/p/googletest/wiki/AdvancedGuide#Testing_Private_Code 11330
link_uniforms.cpp \file link_uniforms.cpp Assign locations for GLSL uniforms. \author Ian Romanick <ian.d.romanick@intel.com> 64179
link_varyings.cpp \file link_varyings.cpp Linker functions related specifically to linking varyings between shader stages. 122925
link_varyings.h \file link_varyings.h Linker functions related specifically to linking varyings between shader stages. 8426
linker.cpp \file linker.cpp GLSL linker implementation Given a set of shaders that are to be linked to generate a final program, there are three distinct stages. In the first stage shaders are partitioned into groups based on the shader type. All shaders of a particular type (e.g., vertex shaders) are linked together. - Undefined references in each shader are resolved to definitions in another shader. - Types and qualifiers of uniforms, outputs, and global variables defined in multiple shaders with the same name are verified to be the same. - Initializers for uniforms and global variables defined in multiple shaders with the same name are verified to be the same. The result, in the terminology of the GLSL spec, is a set of shader executables for each processing unit. After the first stage is complete, a series of semantic checks are performed on each of the shader executables. - Each shader executable must define a \c main function. - Each vertex shader executable must write to \c gl_Position. - Each fragment shader executable must write to either \c gl_FragData or \c gl_FragColor. In the final stage individual shader executables are linked to create a complete executable. - Types of uniforms defined in multiple shader stages with the same name are verified to be the same. - Initializers for uniforms defined in multiple shader stages with the same name are verified to be the same. - Types and qualifiers of outputs defined in one stage are verified to be the same as the types and qualifiers of inputs defined with the same name in a later stage. \author Ian Romanick <ian.d.romanick@intel.com> 183709
linker.h -*- c++ -*- 8795
linker_util.cpp for gl_uniform_storage 13825
linker_util.h Sometimes there are empty slots left over in UniformRemapTable after we allocate slots to explicit locations. This struct represents a single contiguous block of empty slots in UniformRemapTable. 3762
list.h \file list.h \brief Doubly-linked list abstract container type. Each doubly-linked list has a sentinel head and tail node. These nodes contain no data. The head sentinel can be identified by its \c prev pointer being \c NULL. The tail sentinel can be identified by its \c next pointer being \c NULL. A list is empty if either the head sentinel's \c next pointer points to the tail sentinel or the tail sentinel's \c prev pointer points to the head sentinel. The head sentinel and tail sentinel nodes are allocated within the list structure. Do note that this means that the list nodes will contain pointers into the list structure itself and as a result you may not \c realloc() an \c exec_list or any structure in which an \c exec_list is embedded. 22417
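The sentinel layout that list.h describes can be sketched in a few lines. This is a minimal Python illustration of the idea, not Mesa's actual exec_list C API (the class and method names here are invented for the sketch): both sentinels live inside the list object itself, carry no payload, and the empty check is simply "the head sentinel's next is the tail sentinel".

```python
class Node:
    """A list node; the two sentinels carry no payload."""
    def __init__(self, data=None):
        self.prev = None
        self.next = None
        self.data = data

class ExecList:
    """Sketch of the layout described above (names are illustrative).

    The head sentinel's prev stays None and the tail sentinel's next
    stays None, which is how each sentinel is identified."""
    def __init__(self):
        self.head = Node()            # head sentinel
        self.tail = Node()            # tail sentinel
        self.head.next = self.tail
        self.tail.prev = self.head

    def is_empty(self):
        # Equivalent to "tail sentinel's prev is the head sentinel".
        return self.head.next is self.tail

    def push_tail(self, node):
        # Splice the node in just before the tail sentinel.
        node.prev = self.tail.prev
        node.next = self.tail
        self.tail.prev.next = node
        self.tail.prev = node

    def __iter__(self):
        n = self.head.next
        while n.next is not None:     # stop at the tail sentinel
            yield n.data
            n = n.next
```

Because the sentinels are embedded in the list structure, moving or realloc'ing the structure would invalidate the first and last nodes' pointers, which is exactly the restriction the comment warns about.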
loop_analysis.cpp Find an initializer of a variable outside a loop Works backwards from the loop to find the pre-loop value of the variable. This is used, for example, to find the initial value of loop induction variables. \param loop Loop where \c var is an induction variable \param var Variable whose initializer is to be found \return The \c ir_rvalue assigned to the variable outside the loop. May return \c NULL if no initializer can be found. 24012
loop_analysis.h -*- c++ -*- 6438
loop_unroll.cpp anonymous namespace 19481
lower_blend_equation_advanced.cpp f(Cs,Cd) = Cs*Cd 18897
lower_buffer_access.cpp \file lower_buffer_access.cpp Helper for IR lowering pass to replace dereferences of buffer object based shader variables with intrinsic function calls. This helper is used by lowering passes for UBOs, SSBOs and compute shader shared variables. 17305
lower_buffer_access.h \file lower_buffer_access.h Helper for IR lowering pass to replace dereferences of buffer object based shader variables with intrinsic function calls. This helper is used by lowering passes for UBOs, SSBOs and compute shader shared variables. 2718
lower_builtins.cpp \file lower_builtins.cpp Inline calls to builtin functions. 1858
lower_const_arrays_to_uniforms.cpp \file lower_const_arrays_to_uniforms.cpp Lower constant arrays to uniform arrays. Some driver backends (such as i965 and nouveau) don't handle constant arrays gracefully, instead treating them as ordinary writable temporary arrays. Since arrays can be large, this often means spilling them to scratch memory, which usually involves a large number of instructions. This must be called prior to link_set_uniform_initializers(); we need the linker to process our new uniform's constant initializer. This should be called after optimizations, since those can result in splitting and removing arrays that are indexed by constant expressions. 4846
lower_cs_derived.cpp \file lower_cs_derived.cpp For hardware that does not support the gl_GlobalInvocationID and gl_LocalInvocationIndex system values, replace them with fresh globals. Note that we can't rely on gl_WorkGroupSize or gl_LocalGroupSizeARB being available, since they may only have been defined in a non-main shader. [ This can happen if only a secondary shader has the layout(local_size_*) declaration. ] This is meant to be run post-linking. 7668
lower_discard.cpp \file lower_discard.cpp This pass moves discards out of if-statements. Case 1: The "then" branch contains a conditional discard: --------------------------------------------------------- if (cond1) { s1; discard cond2; s2; } else { s3; } becomes: temp = false; if (cond1) { s1; temp = cond2; s2; } else { s3; } discard temp; Case 2: The "else" branch contains a conditional discard: --------------------------------------------------------- if (cond1) { s1; } else { s2; discard cond2; s3; } becomes: temp = false; if (cond1) { s1; } else { s2; temp = cond2; s3; } discard temp; Case 3: Both branches contain a conditional discard: ---------------------------------------------------- if (cond1) { s1; discard cond2; s2; } else { s3; discard cond3; s4; } becomes: temp = false; if (cond1) { s1; temp = cond2; s2; } else { s3; temp = cond3; s4; } discard temp; If there are multiple conditional discards, we need only deal with one of them. Repeatedly applying this pass will take care of the others. Unconditional discards are treated as having a condition of "true". 4785
lower_discard_flow.cpp @file lower_discard_flow.cpp Implements the GLSL 1.30 revision 9 rule for fragment shader discard handling: "Control flow exits the shader, and subsequent implicit or explicit derivatives are undefined when this control flow is non-uniform (meaning different fragments within the primitive take different control paths)." There seem to be two conflicting things here. "Control flow exits the shader" sounds like the discarded fragments should effectively jump to the end of the shader, but that breaks derivatives in the case of uniform control flow and causes rendering failure in the bushes in Unigine Tropics. The question, then, is whether the intent was "loops stop at the point that the only active channels left are discarded pixels" or "discarded pixels become inactive at the point that control flow returns to the top of a loop". This implements the second interpretation. 4761
lower_distance.cpp \file lower_distance.cpp This pass accounts for the difference between the way gl_ClipDistance is declared in standard GLSL (as an array of floats), and the way it is frequently implemented in hardware (as a pair of vec4s, with four clip distances packed into each). The declaration of gl_ClipDistance is replaced with a declaration of gl_ClipDistanceMESA, and any references to gl_ClipDistance are translated to refer to gl_ClipDistanceMESA with the appropriate swizzling of array indices. For instance: gl_ClipDistance[i] is translated into: gl_ClipDistanceMESA[i>>2][i&3] Since some hardware may not internally represent gl_ClipDistance as a pair of vec4's, this lowering pass is optional. To enable it, set the LowerCombinedClipCullDistance flag in gl_shader_compiler_options to true. 24752
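The index remapping that lower_distance.cpp performs (gl_ClipDistance[i] becoming gl_ClipDistanceMESA[i>>2][i&3]) is plain bit arithmetic: the high bits of the index select which vec4 holds the value, and the low two bits select the component. A small sketch of just that mapping (the helper name is illustrative):

```python
def clip_distance_index(i):
    """Map a gl_ClipDistance element index to the (vec4 index,
    component) pair used by the packed gl_ClipDistanceMESA form:
    gl_ClipDistance[i] -> gl_ClipDistanceMESA[i >> 2][i & 3]."""
    return (i >> 2, i & 3)
```

So elements 0..3 land in components x..w of the first vec4, elements 4..7 in the second, and so on.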
lower_if_to_cond_assign.cpp \file lower_if_to_cond_assign.cpp This flattens if-statements to conditional assignments if: - the GPU has limited or no flow control support (controlled by max_depth) - small conditional branches are more expensive than conditional assignments (controlled by min_branch_cost, that's the cost for a branch to be preserved) It can't handle other control flow being inside of its block, such as calls or loops. Hopefully loop unrolling and inlining will take care of those. Drivers for GPUs with no control flow support should simply call lower_if_to_cond_assign(instructions) to attempt to flatten all if-statements. Some GPUs (such as i965 prior to gen6) do support control flow, but have a maximum nesting depth N. Drivers for such hardware can call lower_if_to_cond_assign(instructions, N) to attempt to flatten any if-statements appearing at depth > N. 10977
lower_instructions.cpp \file lower_instructions.cpp Many GPUs lack native instructions for certain expression operations, and must replace them with some other expression tree. This pass lowers some of the most common cases, allowing the lowering code to be implemented once rather than in each driver backend. Currently supported transformations: - SUB_TO_ADD_NEG - DIV_TO_MUL_RCP - INT_DIV_TO_MUL_RCP - EXP_TO_EXP2 - POW_TO_EXP2 - LOG_TO_LOG2 - MOD_TO_FLOOR - LDEXP_TO_ARITH - DFREXP_TO_ARITH - CARRY_TO_ARITH - BORROW_TO_ARITH - SAT_TO_CLAMP - DOPS_TO_DFRAC SUB_TO_ADD_NEG: --------------- Breaks an ir_binop_sub expression down to add(op0, neg(op1)) This simplifies expression reassociation, and for many backends there is no subtract operation separate from adding the negation. For backends with native subtract operations, they will probably want to recognize add(op0, neg(op1)) or the other way around to produce a subtract anyway. FDIV_TO_MUL_RCP, DDIV_TO_MUL_RCP, and INT_DIV_TO_MUL_RCP: --------------------------------------------------------- Breaks an ir_binop_div expression down to op0 * (rcp(op1)). Many GPUs don't have a divide instruction (945 and 965 included), but they do have an RCP instruction to compute an approximate reciprocal. By breaking the operation down, constant reciprocals can get constant folded. FDIV_TO_MUL_RCP lowers single-precision and half-precision floating point division; DDIV_TO_MUL_RCP only lowers double-precision floating point division. DIV_TO_MUL_RCP is a convenience macro that sets both flags. INT_DIV_TO_MUL_RCP handles the integer case, converting to and from floating point so that RCP is possible. EXP_TO_EXP2 and LOG_TO_LOG2: ---------------------------- Many GPUs don't have a base e log or exponent instruction, but they do have base 2 versions, so this pass converts exp and log to exp2 and log2 operations. POW_TO_EXP2: ----------- Many older GPUs don't have an x**y instruction. For these GPUs, convert x**y to 2**(y * log2(x)). 
MOD_TO_FLOOR: ------------- Breaks an ir_binop_mod expression down to (op0 - op1 * floor(op0 / op1)) Many GPUs don't have a MOD instruction (945 and 965 included), and if we have to break it down like this anyway, it gives an opportunity to do things like constant fold the (1.0 / op1) easily. Note: before we used to implement this as op1 * fract(op / op1) but this implementation had significant precision errors. LDEXP_TO_ARITH: ------------- Converts ir_binop_ldexp to arithmetic and bit operations for float sources. DFREXP_DLDEXP_TO_ARITH: --------------- Converts ir_binop_ldexp, ir_unop_frexp_sig, and ir_unop_frexp_exp to arithmetic and bit ops for double arguments. CARRY_TO_ARITH: --------------- Converts ir_carry into (x + y) < x. BORROW_TO_ARITH: ---------------- Converts ir_borrow into (x < y). SAT_TO_CLAMP: ------------- Converts ir_unop_saturate into min(max(x, 0.0), 1.0) DOPS_TO_DFRAC: -------------- Converts double trunc, ceil, floor, round to fract 68546
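Several of the lower_instructions.cpp transforms above are pure arithmetic identities, which makes them easy to check in isolation. A sketch of MOD_TO_FLOOR, CARRY_TO_ARITH, BORROW_TO_ARITH, and SAT_TO_CLAMP as plain Python (function names are illustrative; the carry helper models 32-bit wrapping addition explicitly, which in the IR is implicit in unsigned integer types):

```python
import math

def mod_to_floor(op0, op1):
    # MOD_TO_FLOOR: mod(op0, op1) = op0 - op1 * floor(op0 / op1)
    return op0 - op1 * math.floor(op0 / op1)

def carry(x, y, bits=32):
    # CARRY_TO_ARITH: carry is set when the wrapped sum (x + y)
    # comes out less than x.
    return int((x + y) % (1 << bits) < x)

def borrow(x, y):
    # BORROW_TO_ARITH: borrow(x, y) = (x < y)
    return int(x < y)

def saturate(x):
    # SAT_TO_CLAMP: saturate(x) = min(max(x, 0.0), 1.0)
    return min(max(x, 0.0), 1.0)
```

Note that mod_to_floor gives GLSL's sign convention (the result follows the sign of op1), which is exactly why the older op1 * fract(op0 / op1) form was replaced: the floor-based identity is both the spec definition and better behaved numerically.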
lower_int64.cpp \file lower_int64.cpp Lower 64-bit operations to 32-bit operations. Each 64-bit value is lowered to a uvec2. For each operation that can be lowered, there is a function called __builtin_foo with the same number of parameters that takes uvec2 sources and produces uvec2 results. An operation like uint64_t(x) * uint64_t(y) becomes packUint2x32(__builtin_umul64(unpackUint2x32(x), unpackUint2x32(y))); 11894
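The uvec2 representation that lower_int64.cpp describes can be sketched numerically. The following Python models the pack/unpack step and one lowered operation, an unsigned 64x64 -> 64 multiply built from 32-bit partial products; this is an illustration of the technique, not Mesa's generated __builtin_umul64 body.

```python
MASK32 = (1 << 32) - 1

def unpack_uint2x32(x):
    """Split a 64-bit value into (lo, hi) 32-bit words."""
    return (x & MASK32, (x >> 32) & MASK32)

def pack_uint2x32(lo, hi):
    """Reassemble a 64-bit value from (lo, hi) 32-bit words."""
    return (hi << 32) | lo

def umul64(a, b):
    """64x64 -> 64 multiply on (lo, hi) pairs using 32-bit pieces.

    The a_hi * b_hi partial product only contributes above bit 64,
    so it is dropped entirely."""
    a_lo, a_hi = a
    b_lo, b_hi = b
    lo = a_lo * b_lo
    hi = (lo >> 32) + a_lo * b_hi + a_hi * b_lo
    return (lo & MASK32, hi & MASK32)

def mul_u64(x, y):
    """packUint2x32(__builtin_umul64(unpackUint2x32(x), unpackUint2x32(y)))"""
    return pack_uint2x32(*umul64(unpack_uint2x32(x), unpack_uint2x32(y)))
```

This mirrors the shape the pass produces: the shader-visible operation wraps a uvec2-in, uvec2-out builtin between an unpack and a pack.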
lower_jumps.cpp \file lower_jumps.cpp This pass lowers jumps (break, continue, and return) to if/else structures. It can be asked to: 1. Pull jumps out of ifs where possible 2. Remove all "continue"s, replacing them with an "execute flag" 3. Replace all "break" with a single conditional one at the end of the loop 4. Replace all "return"s with a single return at the end of the function, for the main function and/or other functions Applying this pass gives several benefits: 1. All functions can be inlined. 2. nv40 and other pre-DX10 chips without "continue" can be supported 3. nv30 and other pre-DX10 chips with no control flow at all are better supported Continues are lowered by adding a per-loop "execute flag", initialized to true, that when cleared inhibits all execution until the end of the loop. Breaks are lowered to continues, plus setting a "break flag" that is checked at the end of the loop and triggers the unique "break". Returns are lowered to breaks/continues, plus adding a "return flag" that causes loops to break again out of their enclosing loops until all the loops are exited: then the "execute flag" logic will ignore everything until the end of the function. Note that "continue" and "return" can also be implemented by adding a dummy loop and using break. However, this is bad for hardware with limited nesting depth, and prevents further optimization, and thus is not currently performed. 39923
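The "execute flag" lowering for continue that lower_jumps.cpp describes can be illustrated by two loops with identical behavior. This is a behavioral sketch, not the pass itself: the continue disappears and instead a per-iteration flag, cleared where the continue used to be, guards the remainder of the loop body.

```python
def with_continue(values):
    """Original form: skip negative values via continue."""
    out = []
    for v in values:
        if v < 0:
            continue
        out.append(v)
    return out

def with_execute_flag(values):
    """Same loop after the lowering described above: the continue is
    replaced by clearing an execute flag that inhibits the rest of
    the iteration."""
    out = []
    for v in values:
        execute = True
        if v < 0:
            execute = False      # was: continue
        if execute:              # remainder of the body is guarded
            out.append(v)
    return out
```

Breaks and returns then build on this: a break becomes a continue plus a "break flag" tested once at the loop's end, and a return becomes breaks out of every enclosing loop plus a "return flag".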
lower_mat_op_to_vec.cpp \file lower_mat_op_to_vec.cpp Breaks matrix operation expressions down to a series of vector operations. Generally this is how we have to codegen matrix operations for a GPU, so this gives us the chance to constant fold operations on a column or row. 12704
lower_named_interface_blocks.cpp \file lower_named_interface_blocks.cpp This lowering pass converts all interface blocks with instance names into interface blocks without an instance name. For example, the following shader: out block { float block_var; } inst_name; main() { inst_name.block_var = 0.0; } Is rewritten to: out block { float block_var; }; main() { block_var = 0.0; } This takes place after the shader code has already been verified with the interface name in place. The linking phase will use the interface block name rather than the interface's instance name when linking interfaces. This modification to the ir allows our currently existing dead code elimination to work with interface blocks without changes. 11062
lower_offset_array.cpp \file lower_offset_array.cpp IR lower pass to decompose ir_texture ir_tg4 with an array of offsets into four ir_tg4s with a single ivec2 offset, select the .w component of each, and return those four values packed into a gvec4. \author Chris Forbes <chrisf@ijw.co.nz> 2745
lower_output_reads.cpp \file lower_output_reads.cpp In GLSL, shader output variables (such as varyings) can be both read and written. However, on some hardware, reading an output register causes trouble. This pass creates temporary shadow copies of every (used) shader output, and replaces all accesses to use those instead. It also adds code to the main() function to copy the final values to the actual shader outputs. 6109
lower_packed_varyings.cpp \file lower_varyings_to_packed.cpp This lowering pass generates GLSL code that manually packs varyings into vec4 slots, for the benefit of back-ends that don't support packed varyings natively. For example, the following shader: out mat3x2 foo; // location=4, location_frac=0 out vec3 bar[2]; // location=5, location_frac=2 main() { ... } Is rewritten to: mat3x2 foo; vec3 bar[2]; out vec4 packed4; // location=4, location_frac=0 out vec4 packed5; // location=5, location_frac=0 out vec4 packed6; // location=6, location_frac=0 main() { ... packed4.xy = foo[0]; packed4.zw = foo[1]; packed5.xy = foo[2]; packed5.zw = bar[0].xy; packed6.x = bar[0].z; packed6.yzw = bar[1]; } This lowering pass properly handles "double parking" of a varying vector across two varying slots. For example, in the code above, two of the components of bar[0] are stored in packed5, and the remaining component is stored in packed6. Note that in theory, the extra instructions may cause some loss of performance. However, hopefully in most cases the performance loss will either be absorbed by a later optimization pass, or it will be offset by memory bandwidth savings (because fewer varyings are used). This lowering pass also packs flat floats, ints, and uints together, by using ivec4 as the base type of flat "varyings", and using appropriate casts to convert floats and uints into ints. This lowering pass also handles varyings whose type is a struct or an array of struct. Structs are packed in order and with no gaps, so there may be a performance penalty due to structure elements being double-parked. Lowering of geometry shader inputs is slightly more complex, since geometry inputs are always arrays, so we need to lower arrays to arrays. 
For example, the following input: in struct Foo { float f; vec3 v; vec2 a[2]; } arr[3]; // location=4, location_frac=0 Would get lowered like this if it occurred in a fragment shader: struct Foo { float f; vec3 v; vec2 a[2]; } arr[3]; in vec4 packed4; // location=4, location_frac=0 in vec4 packed5; // location=5, location_frac=0 in vec4 packed6; // location=6, location_frac=0 in vec4 packed7; // location=7, location_frac=0 in vec4 packed8; // location=8, location_frac=0 in vec4 packed9; // location=9, location_frac=0 main() { arr[0].f = packed4.x; arr[0].v = packed4.yzw; arr[0].a[0] = packed5.xy; arr[0].a[1] = packed5.zw; arr[1].f = packed6.x; arr[1].v = packed6.yzw; arr[1].a[0] = packed7.xy; arr[1].a[1] = packed7.zw; arr[2].f = packed8.x; arr[2].v = packed8.yzw; arr[2].a[0] = packed9.xy; arr[2].a[1] = packed9.zw; ... } But it would get lowered like this if it occurred in a geometry shader: struct Foo { float f; vec3 v; vec2 a[2]; } arr[3]; in vec4 packed4[3]; // location=4, location_frac=0 in vec4 packed5[3]; // location=5, location_frac=0 main() { arr[0].f = packed4[0].x; arr[0].v = packed4[0].yzw; arr[0].a[0] = packed5[0].xy; arr[0].a[1] = packed5[0].zw; arr[1].f = packed4[1].x; arr[1].v = packed4[1].yzw; arr[1].a[0] = packed5[1].xy; arr[1].a[1] = packed5[1].zw; arr[2].f = packed4[2].x; arr[2].v = packed4[2].yzw; arr[2].a[0] = packed5[2].xy; arr[2].a[1] = packed5[2].zw; ... } 36858
lower_packing_builtins.cpp A visitor that lowers built-in floating-point pack/unpack expressions such as packSnorm2x16. 47382
lower_precision.cpp \file lower_precision.cpp 21418
lower_shared_reference.cpp \file lower_shared_reference.cpp IR lower pass to replace dereferences of compute shader shared variables with intrinsic function calls. This relieves drivers of the responsibility of allocating space for the shared variables in the shared memory region. 17680
lower_subroutine.cpp \file lower_subroutine.cpp lowers subroutines to an if ladder. 3830
lower_tess_level.cpp \file lower_tess_level.cpp This pass accounts for the difference between the way gl_TessLevelOuter and gl_TessLevelInner are declared in standard GLSL (as arrays of floats), and the way they are frequently implemented in hardware (as a vec4 and vec2). The declaration of gl_TessLevel* is replaced with a declaration of gl_TessLevel*MESA, and any references to gl_TessLevel* are translated to refer to gl_TessLevel*MESA with the appropriate swizzling of array indices. For instance: gl_TessLevelOuter[i] is translated into: gl_TessLevelOuterMESA[i] Since some hardware may not internally represent gl_TessLevel* as a vec4 and vec2, this lowering pass is optional. To enable it, set the LowerTessLevel flag in gl_shader_compiler_options to true. 16174
lower_texture_projection.cpp \file lower_texture_projection.cpp IR lower pass to perform the division of texture coordinates by the texture projector if present. Many GPUs have a texture sampling opcode that takes the projector and does the divide internally, thus the presence of the projector in the IR. For GPUs that don't, this saves the driver needing the logic for handling the divide. \author Eric Anholt <eric@anholt.net> 3208
lower_ubo_reference.cpp \file lower_ubo_reference.cpp IR lower pass to replace dereferences of variables in a uniform buffer object with usage of ir_binop_ubo_load expressions, each of which can read data up to the size of a vec4. This relieves drivers of the responsibility to deal with tricky UBO layout issues like std140 structures and row_major matrices on their own. 38867
lower_variable_index_to_cond_assign.cpp \file lower_variable_index_to_cond_assign.cpp Turns non-constant indexing of array types into a series of conditional moves of each element into a temporary. Pre-DX10 GPUs often don't have a native way to do this operation, and this works around that. The lowering process proceeds as follows. Each non-constant index found in an r-value is converted to a canonical form \c array[i]. Each element of the array is conditionally assigned to a temporary by comparing \c i to a constant index. This is done by cloning the canonical form and replacing all occurrences of \c i with a constant. Each remaining occurrence of the canonical form in the IR is replaced with a dereference of the temporary variable. L-values with non-constant indices are handled similarly. In this case, the RHS of the assignment is assigned to a temporary. The non-constant index is replaced with the canonical form (just like for r-values). The temporary is conditionally assigned to each element of the canonical form by comparing \c i with each index. The same clone-and-replace scheme is used. 18874
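The r-value case described above can be sketched as a before/after pair (array size and names hypothetical):

```glsl
// Before: a[] is indexed with the non-constant i.
float x = a[i];

// After: each element is conditionally moved into a temporary by
// comparing i against every constant index, and the use of a[i]
// is replaced with the temporary.
float tmp;
if (i == 0) tmp = a[0];
if (i == 1) tmp = a[1];
if (i == 2) tmp = a[2];
if (i == 3) tmp = a[3];
float x = tmp;
```

The l-value case is the mirror image: the RHS lands in a temporary first, and the temporary is conditionally assigned to each a[k].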
lower_vec_index_to_cond_assign.cpp \file lower_vec_index_to_cond_assign.cpp Turns indexing of vector types into a series of conditional moves of each channel's swizzle into a temporary. Most GPUs don't have a native way to do this operation, and this works around that. For drivers using both this pass and ir_vec_index_to_swizzle, there's a risk that this pass will run before enough constant folding has happened to discover that the array index is constant. However, we hope that other optimization passes, particularly constant folding of assignment conditions and copy propagation, will result in the same code in the end. 8203
lower_vec_index_to_swizzle.cpp \file lower_vec_index_to_swizzle.cpp Turns constant indexing of vector types into swizzles. This lets other swizzle-aware optimization passes catch these constructs and spares codegen backends from having to worry about this case. 3336
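For instance (hypothetical snippet), a constant vector index simply becomes the corresponding single-component swizzle:

```glsl
// Before: constant index into a vec4.
float y = v[2];
// After: the equivalent swizzle.
float y = v.z;
```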
lower_vector.cpp \file lower_vector.cpp IR lowering pass to remove some types of ir_quadop_vector \author Ian Romanick <ian.d.romanick@intel.com> 6207
lower_vector_derefs.cpp anonymous namespace 7344
lower_vector_insert.cpp anonymous namespace 4833
lower_vertex_id.cpp \file lower_vertex_id.cpp Some hardware, such as i965, does not implement the OpenGL semantic for gl_VertexID: the hardware's gl_VertexID value does not include the value of basevertex. To implement the OpenGL semantic, we have to convert gl_VertexID to gl_VertexIDMESA + gl_BaseVertexMESA. 4857
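Using the MESA variable names from the description, the rewrite amounts to (illustrative GLSL; the pass itself works on IR):

```glsl
// OpenGL semantic: gl_VertexID includes basevertex.
int id = gl_VertexID;
// Lowered form for hardware whose vertex ID omits basevertex:
int id = gl_VertexIDMESA + gl_BaseVertexMESA;
```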
lower_xfb_varying.cpp \file lower_xfb_varying.cpp 6646
main.cpp @file main.cpp This file is the main() routine and scaffolding for producing builtin_compiler (which doesn't include builtins itself and is used to generate the profile information for builtin_function.cpp), and for glsl_compiler (which does include builtins and can be used to offline compile GLSL code and examine the resulting GLSL IR). 3453
opt_add_neg_to_sub.h empty 2034
opt_algebraic.cpp \file opt_algebraic.cpp Takes advantage of associativity, commutativity, and other algebraic properties to simplify expressions. 33281
opt_array_splitting.cpp \file opt_array_splitting.cpp If an array is always dereferenced with a constant index, then split it apart into its elements, making it more amenable to other optimization passes. This skips uniform/varying arrays, which would need careful handling due to their ir->location fields tying them to the GL API and other shader stages. 14869
opt_conditional_discard.cpp \file opt_conditional_discard.cpp Replace if (cond) discard; with (discard <condition>) 2724
opt_constant_folding.cpp \file opt_constant_folding.cpp Replace constant-valued expressions with references to constant values. 6243
opt_constant_propagation.cpp \file opt_constant_propagation.cpp Tracks assignments of constants to channels of variables, and usage of those constant channels with direct usage of the constants. This can lead to constant folding and algebraic optimizations in those later expressions, while causing no increase in instruction count (due to constants being generally free to load from a constant push buffer or as instruction immediate values) and possibly reducing register pressure. 15425
opt_constant_variable.cpp \file opt_constant_variable.cpp Marks variables assigned a single constant value over the course of the program as constant. The goal here is to trigger further constant folding and then dead code elimination. This is common with vector/matrix constructors and calls to builtin functions. 7084
opt_copy_propagation_elements.cpp \file opt_copy_propagation_elements.cpp Replaces usage of recently-copied components of variables with the previous copy of the variable. This should reduce the number of MOV instructions in the generated programs and help trigger other optimizations that live at the GLSL level. 21094
opt_dead_builtin_variables.cpp Pre-linking, optimize unused built-in variables Uniforms, constants, system values, inputs (vertex shader only), and outputs (fragment shader only) that are not used can be removed. 3398
opt_dead_builtin_varyings.cpp \file opt_dead_builtin_varyings.cpp This eliminates the built-in shader outputs which are either not written at all or not used by the next stage. It also eliminates unused elements of gl_TexCoord inputs, which reduces the overall varying usage. The varyings handled here are the primary and secondary color, the fog, and the texture coordinates (gl_TexCoord). This pass is necessary, because the Mesa GLSL linker cannot eliminate built-in varyings like it eliminates user-defined varyings, because the built-in varyings have pre-assigned locations. Also, the elimination of unused gl_TexCoord elements requires its own lowering pass anyway. It's implemented by replacing all occurrences of dead varyings with temporary variables, which creates dead code. It is recommended to run a dead-code elimination pass after this. If any texture coordinate slots can be eliminated, the gl_TexCoord array is broken down into separate vec4 variables with locations equal to VARYING_SLOT_TEX0 + i. The same is done for the gl_FragData fragment shader output. 21052
opt_dead_code.cpp \file opt_dead_code.cpp Eliminates dead assignments and variable declarations from the code. 7286
opt_dead_code_local.cpp \file opt_dead_code_local.cpp Eliminates local dead assignments from the code. This operates on basic blocks, tracking assignments and finding if they're used before the variable is completely reassigned. Compare this to ir_dead_code.cpp, which operates globally looking for assignments to variables that are never read. 9767
opt_dead_functions.cpp \file opt_dead_functions.cpp Eliminates unused functions from the linked program. 3971
opt_flatten_nested_if_blocks.cpp \file opt_flatten_nested_if_blocks.cpp Flattens nested if blocks such as: if (x) { if (y) { ... } } into a single if block with a combined condition: if (x && y) { ... } 2811
opt_flip_matrices.cpp \file opt_flip_matrices.cpp Convert (matrix * vector) operations to (vector * matrixTranspose), which can be done using dot products rather than multiplies and adds. On some hardware, this is more efficient. This currently only does the conversion for built-in matrices which already have transposed equivalents. Namely, gl_ModelViewProjectionMatrix and gl_TextureMatrix. 3960
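As an illustration (the pass rewrites IR, but the effect in source terms is):

```glsl
// Before: full matrix * vector multiply.
gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
// After: vector * transposed matrix, i.e. four dot products
// against the columns of the pre-transposed built-in.
gl_Position = gl_Vertex * gl_ModelViewProjectionMatrixTranspose;
```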
opt_function_inlining.cpp \file opt_function_inlining.cpp Replaces calls to functions with the body of the function. 13628
opt_if_simplification.cpp \file opt_if_simplification.cpp Moves constant branches of if statements out to the surrounding instruction stream, and inverts if conditionals to avoid empty "then" blocks. 3811
opt_minmax.cpp \file opt_minmax.cpp Drop operands from an expression tree of only min/max operations if they can be proven to not contribute to the final result. The algorithm is similar to alpha-beta pruning on a minmax search. 15421
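A hand-worked example of the kind of pruning this enables (not taken from the pass itself):

```glsl
// max(min(x, 4.0), min(x, 6.0)), by case analysis:
//   x <= 4.0       -> max(x, x)      == x
//   4.0 < x <= 6.0 -> max(4.0, x)    == x
//   x > 6.0        -> max(4.0, 6.0)  == 6.0
// The min(x, 4.0) operand never determines the result, so:
float r = max(min(x, 4.0), min(x, 6.0));
// can be reduced to:
float r = min(x, 6.0);
```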
opt_rebalance_tree.cpp \file opt_rebalance_tree.cpp Rebalances a reduction expression tree. For reduction operations (e.g., x + y + z + w) we generate an expression tree like

              +
             / \
            +   w
           / \
          +   z
         / \
        x   y

which we can rebalance into

            +
           / \
          /   \
         +     +
        / \   / \
       x   y z   w

to get better instruction scheduling. See "Tree Rebalancing in Optimal Editor Time and Space" by Quentin F. Stout and Bette L. Warren. Also see http://penguin.ewu.edu/~trolfe/DSWpaper/ for a very readable explanation of the tree_to_vine() (rightward rotation) and vine_to_tree() (leftward rotation) algorithms. 9666
opt_redundant_jumps.cpp \file opt_redundant_jumps.cpp Remove certain types of redundant jumps 3664
opt_structure_splitting.cpp \file opt_structure_splitting.cpp If a structure is only ever referenced by its components, then split those components out to individual variables so they can be handled normally by other optimization passes. This skips structures like uniforms, which need to be accessible as structures for their access by the GL. 11074
opt_swizzle.cpp \file opt_swizzle.cpp Optimize swizzle operations. First, compact a sequence of swizzled swizzles into a single swizzle. If the final resulting swizzle doesn't change the order or count of components, then remove the swizzle so that other optimization passes see the value behind it. 3364
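Two illustrative cases (hypothetical snippets):

```glsl
// Compacting a sequence of swizzled swizzles into one swizzle:
vec2 a = v.zy.yx;   // becomes the single swizzle v.yz
// Removing a swizzle that changes neither order nor count,
// so later passes see the value behind it:
vec4 b = v.xyzw;    // becomes a plain reference to v
```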
opt_tree_grafting.cpp \file opt_tree_grafting.cpp Takes assignments to variables that are dereferenced only once and pastes the RHS expression into where the variable is dereferenced. In the process of various operations like function inlining and ternary op handling, we'll end up with our expression trees having been chopped up into a series of assignments of short expressions to temps. Other passes like ir_algebraic.cpp would prefer to see the deepest expression trees they can to try to optimize them. This is a lot like copy propagation. In comparison, copy propagation only acts on plain copies, not arbitrary expressions on the RHS. Generally, we wouldn't want to go pasting some complicated expression everywhere it got used, though, so we don't handle expressions in that pass. The hard part is making sure we don't move an expression across some other assignments that would change the value of the expression. So we split this into two passes: first, find the variables in our scope which are written to once and read once, and then go through basic blocks seeing if we find an opportunity to move those expressions safely. 11564
opt_vectorize.cpp \file opt_vectorize.cpp Combines scalar assignments of the same expression (modulo swizzle) to multiple channels of the same variable into a single vectorized expression and assignment. Many generated shaders contain scalarized code. That is, they contain

    r1.x = log2(v0.x);
    r1.y = log2(v0.y);
    r1.z = log2(v0.z);

rather than

    r1.xyz = log2(v0.xyz);

We look for consecutive assignments of the same expression (modulo swizzle) to each channel of the same variable. For instance, we want to convert these three scalar operations

    (assign (x) (var_ref r1) (expression float log2 (swiz x (var_ref v0))))
    (assign (y) (var_ref r1) (expression float log2 (swiz y (var_ref v0))))
    (assign (z) (var_ref r1) (expression float log2 (swiz z (var_ref v0))))

into a single vector operation

    (assign (xyz) (var_ref r1) (expression vec3 log2 (swiz xyz (var_ref v0))))
12647
program.h extern "C" 2009
propagate_invariance.cpp \file propagate_invariance.cpp Propagate the "invariant" and "precise" qualifiers to variables used to compute invariant or precise values. The GLSL spec (depending on what version you read) says, among the conditions for getting bit-for-bit the same values on an invariant output: "All operations in the consuming expressions and any intermediate expressions must be the same, with the same order of operands and same associativity, to give the same order of evaluation." This effectively means that if a variable is used to compute an invariant value then that variable becomes invariant. The same should apply to the "precise" qualifier. 3720
README Welcome to Mesa's GLSL compiler. A brief overview of how things flow: 10776
s_expression.cpp -*- c++ -*- 6159
s_expression.h -*- c++ -*- 4733
serialize.cpp \file serialize.cpp GLSL serialization Supports serializing and deserializing glsl programs using a blob. 48965
serialize.h extern "C" 1687
shader_cache.cpp \file shader_cache.cpp GLSL shader cache implementation This uses disk_cache.c to write out a serialization of various state that's required in order to successfully load and use a binary written out by a driver's backend; this state is referred to as "metadata" throughout the implementation. The hash key for GLSL metadata is a hash of the hashes of each GLSL source string, as well as some API settings that change the final program, such as SSO, attribute bindings, frag data bindings, etc. In order to avoid caching any actual IR, we use the put_key/get_key support in the disk_cache to put the SHA-1 hash for each successfully compiled shader into the cache, and optimistically return early from glCompileShader (if the identical shader had been successfully compiled in the past), in the hope that the final linked shader will be found in the cache. If anything goes wrong (shader variant not found, backend cache item is corrupt, etc.), we fall back to compiling and linking the IR. 9568
shader_cache.h SHADER_CACHE_H 1576
standalone.cpp @file standalone.cpp Standalone compiler helper lib. Used by standalone glsl_compiler and also available to drivers to implement their own standalone compiler with driver backend. 22129
standalone.h GLSL_STANDALONE_H 1756
standalone_scaffolding.cpp This file declares stripped-down versions of functions that normally exist outside of the glsl folder, so that they can be used when running the GLSL compiler standalone (for unit testing or compiling builtins). 9516
standalone_scaffolding.h This file declares stripped-down versions of functions that normally exist outside of the glsl folder, so that they can be used when running the GLSL compiler standalone (for unit testing or compiling builtins). 3933
string_to_uint_map.cpp \file string_to_uint_map.cpp \brief Dumb wrappers so that C code can create and destroy maps. \author Ian Romanick <ian.d.romanick@intel.com> 1546
string_to_uint_map.h Map from a string (name) to an unsigned integer value \note Because of the way this class interacts with the \c hash_table implementation, values of \c UINT_MAX cannot be stored in the map. 5185
test_optpass.h TEST_OPTPASS_H 1274
TODO 689
xxd.py 3639