Introduction to Shader Programming

In this post, I will show you how to take full advantage of the GPU by creating custom shaders. Shaders are used in two places in the FireMonkey framework: for creating materials that you can apply to 3D objects, and for creating filter effects that you can apply to bitmaps. I will focus mostly on the first use case in this post, although what you learn here can be applied to filter effects as well.

About 3D in FireMonkey

Although most developers probably use FireMonkey as a 2D application framework (much like the VCL), FireMonkey has had support for 3D applications from the beginning. This post will not go into details on how to create 3D applications with FireMonkey. That would leave little room to talk about the actual topic of this article (and it is long enough already). So I assume you have some basic understanding of the 3D concepts in FireMonkey, or are willing to just jump in and get your feet wet. For information about FireMonkey 3D, you can consult the Delphi documentation, Delphi sample projects, blog posts like Bruce McGee’s 3D article or Andrea Magni’s excellent FireMonkey book.

2 Ways to do 3D

I do want to point out that there are two ways you can add 3D content to your application:

  1. You can use a 3D form (TForm3D) and add your 3D controls to the form. If you also need to use 2D controls, you will need to add a TLayer3D (or TBufferedLayer3D) to the form and add your 2D controls to that layer. You usually want to change the Projection property of that layer to Screen so it doesn’t get distorted by the camera.
  2. You can use a regular (2D) form and add a TViewPort3D to it. You then add your 3D controls to the viewport.

Which option you should use depends on the situation. In general, if most of your form contains 3D content, you should use the first option. Otherwise, the second option is more efficient.
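
For example, here is a minimal sketch of the second option (the names are hypothetical, TViewport3D lives in FMX.Viewport3D and TCube in FMX.Objects3D; normally you would simply drop these components on the form in the designer):

procedure TFormMain.FormCreate(Sender: TObject);
begin
  // Add a 3D viewport to a regular 2D form...
  var Viewport := TViewport3D.Create(Self);
  Viewport.Parent := Self;
  Viewport.Align := TAlignLayout.Client;

  // ...and add your 3D controls to the viewport.
  var Cube := TCube.Create(Self);
  Cube.Parent := Viewport;
end;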

3D for 2D

You may assume that 3D forms and applications are only useful for specific use cases where you need to work with 3D models or other types of 3D content. However, 3D forms can also be used to create a purely 2D user interface, by using plane-like objects without a camera so they look like regular 2D rectangles. A reason you may want to do this is so you can utilize GPU shaders to render those rectangles. For example, our Lumicademy product looks like a 2D application, but the main form is actually a 3D form. This enables us to very efficiently render multiple live videos and other content to the screen using specialized GPU shaders, thereby freeing the CPU to perform other tasks (such as decoding those videos).

About GPUs and Shaders

The GPU or video card is responsible for rendering 2D and 3D content to the screen. When it comes to 3D content, the GPU only works with 3D triangles (and sometimes points and lines). It converts the coordinates of these triangles to screen coordinates, clips the triangles to the screen and renders them using colors, textures or custom effects.

In the early days, video cards and APIs used a fixed graphics pipeline. This means that the GPU would take care of most of the hard work for you: you would give the API a list of triangles, colors, texture coordinates and other data, and the GPU would figure out how to render it all. If the GPU wasn’t capable of certain functionality, the API (e.g. DirectX) would use software emulation if configured to do so.

Nowadays, virtually all GPUs and APIs use a programmable graphics pipeline. In some regards, you could say this is a step back since the GPU and API do less work for you. Instead, you need to tell the GPU how triangles (vertices) must be converted to screen coordinates, and how each pixel must be rendered. This is a lot more work for the developer, but also offers much more flexibility and enables functionality that is simply not possible with a fixed pipeline.

The newest graphics APIs (DirectX 12, Metal and Vulkan) shed even more abstraction layers to give developers even more control over the GPU hardware. This is at the cost of increasingly complicated APIs but allows for even more efficient rendering.

Depending on the GPU and API, the graphics pipeline can have multiple programmable stages. Every programmable GPU has at least two programmable stages: a vertex transformation stage and a pixel shader stage. Newer GPUs and APIs may add additional stages, such as a tessellation stage. But unless you are creating a sophisticated game engine, you only need the first two stages. These are also the only stages that you can customize in FireMonkey.

The following image shows a possible graphics pipeline with 4 stages of which the first and last ones are programmable and the two middle ones are fixed.

In the vertex transformation stage, individual triangle vertices are transformed from 3D world space to 2D screen space. You program this stage by creating a vertex shader. This shader is also used to pass data that is needed for rendering to the pixel shader stage. The vertex shader is called once for each vertex.

The next two stages are fixed. The shape assembly stage converts the transformed vertices to shapes (triangles). The rasterization stage then rasterizes these triangles into a set of pixels.

Finally, the pixel shader stage determines the color of each pixel. This is the second programmable stage, for which you write a pixel shader (aka fragment shader in OpenGL and Metal). This shader calculates the final pixel color based on some algorithm and/or textures supplied to the shader. It can receive input data from the vertex shader if needed. The pixel shader is called once for each pixel.

Shader Languages

Although most concepts of vertex and pixel shaders are the same across all graphics APIs, each API has its own shader language that is used to write these shaders:

  • On Windows, the DirectX API is used, which uses the High Level Shader Language (HLSL).
  • On Android, macOS and iOS, the OpenGL API is used, which comes with the GL Shader Language (GLSL).
  • OpenGL is deprecated on macOS and iOS (although still supported at the time of writing). You should use the Metal API on these platforms and write your shaders in the Metal Shader Language (MSL). One of my previous blog posts shows how to enable Metal on these platforms.

There is also a new cross-platform API called Vulkan, which is the official successor of OpenGL. Unfortunately, since this API is not supported by Apple, it hasn’t gained much traction yet. If it becomes popular in the future, Delphi may add support for it. This would also mean another shader language to learn (although it is very similar to GLSL).

All APIs provide the option to compile the shader source code on-the-fly in your app. Some APIs also provide the option to compile the shader off-line. In that case, you pass the compiled bytecode to the API.

FireMonkey uses compiled bytecode for DirectX shaders, and source code for OpenGL and Metal shaders. To compile DirectX shaders, you need to use the Direct3D Shader Compiler tool (fxc.exe), which ships with the DirectX SDK. I also included this tool in the GitHub repository that accompanies this post.

FireMonkey Materials

In FireMonkey, the vertex shader and pixel shader are encapsulated in a material (derived from TCustomMaterial). However, you cannot apply a material directly to a 3D object. Instead, FireMonkey uses the concept of a material source (derived from TMaterialSource), which is used to link a material to a 3D object. You usually add a component derived from TMaterialSource to your form and set the MaterialSource property of a 3D control to this component. The material source will then create the corresponding TMaterial descendant when it is needed to render the control.

So materials usually come in pairs: a material source and a material. For example, Delphi’s TColorMaterialSource is used at design time. It creates a TColorMaterial at runtime for rendering.
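
For example, setting up this built-in pair in code could look like this (a sketch; Cube is a hypothetical TCube control, and TColorMaterialSource lives in the FMX.MaterialSources unit):

procedure TFormMain.Form3DCreate(Sender: TObject);
begin
  // Create the material source and link it to a 3D control.
  // The source creates its TColorMaterial when the cube is rendered.
  var ColorSource := TColorMaterialSource.Create(Self);
  ColorSource.Color := TAlphaColors.Red;
  Cube.MaterialSource := ColorSource;
end;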

The remainder of this post focuses mostly on creating some custom material sources and materials for rendering various kinds of effects.

Demo Applications

To demonstrate the use of custom materials and shaders, I added a GpuProgramming folder to our JustAddCode GitHub repository with 6 sample projects. We start simple with a shader that renders just blue pixels and work our way up to (slightly) more advanced scenarios that use textures and finally a simple plasma effect (which is also used for the title image of this post).

Most sample applications just show a spinning TPlane control with our custom material (source) applied to it. Since GPUs work exclusively with points, lines and triangles, a rectangular plane is represented with two triangles. FireMonkey takes care of creating these triangles for you and passing them to the GPU.

You can install your material sources into a design time package so you can place them on a form and link them to controls visually. However, to keep the samples simple and avoid dependencies on packages, the material sources are created and linked in code. This is done in the OnCreate event of the form. For example, for the first demo app, this method looks like this:

procedure TFormMain.Form3DCreate(Sender: TObject);
begin
  FMaterialSource := TBlueMaterialSource.Create(Self);
  Plane.MaterialSource := FMaterialSource;
end;

The code just creates a material source (for a solid blue material in this example) and links it to the plane. We will look at how to create this material next.

Alternative Backends

By default, the sample applications use the default graphics backend for the platform. This is DirectX 11 on Windows and OpenGL on all other platforms.

However, you can also build each sample with an alternative backend by choosing the “Debug_AlternativeBackend” configuration. This configuration uses DirectX 9 on Windows and Metal on macOS and iOS.

Each sample app has a header that shows the name of the context class (and thus graphics backend) that is currently used (for example, TDX11Context when DirectX 11 is used).

Sample 1: Getting Started

This is the easiest example: it just renders every pixel using a blue color. Even so, this example takes up the most space in this article due to the fact that there is a lot of scaffolding to put up. Before we look at the Delphi side, let’s take a look at what shader source code looks like.

Vertex Shader – HLSL

Let’s start with the HLSL (DirectX) code of the vertex shader (in the file VertexShader.DX.txt):

float4x4 MVPMatrix;

float4 main(float4 position: POSITION0): SV_Position0
{
  return mul(MVPMatrix, position);
}

All shader languages are C-like languages, and HLSL is no exception. The sole purpose of this vertex shader is to convert a vertex coordinate (a 4D position vector) from world space to screen space. This is done by multiplying the position with a model-view-projection matrix (MVPMatrix). This 4×4 matrix (of type float4x4) is calculated for you by FireMonkey based on the camera and viewport. It is a combination of 3 matrices:

  • The Model matrix converts the model (vertex) coordinates from local space to world space.
  • The View matrix converts the result from world space to camera space.
  • And finally, the Projection matrix converts from camera space to screen space, taking things like lens size and perspective projection into account.

The MVPMatrix variable is a so-called uniform input, which means that its value is constant for multiple invocations of the shader (that is, its value is the same for each vertex that is transformed). FireMonkey calculates this matrix in Delphi code and passes it to the shader.

The shader must have a function called main that returns the transformed position (a 4D vector of type float4). Its input is the source position in local space. This type of variable is called a varying input in HLSL (or attribute in GLSL), meaning that its value is unique to each invocation of the shader. These inputs must be marked with a semantic, which is a name used to link the input and output of parts of the graphics pipeline. The POSITION0 semantic used here means that this parameter represents position data.

The entire function is also marked with a (system value) semantic called SV_Position0, meaning that the function result represents the transformed position.

FireMonkey requires that HLSL shaders are compiled to bytecode. This is done with the following command lines:

fxc /T vs_3_0 /E main /O3 /FoVertexShader.DX9 VertexShader.DX.txt
fxc /T vs_4_0 /E main /O3 /FoVertexShader.DX11 VertexShader.DX.txt

The parameters of interest are:

  • /T to specify the target profile. For our purposes, the following values are used:
    • vs_3_0: a version 3.0 vertex shader (used by DirectX 9)
    • vs_4_0: a version 4.0 vertex shader (used by DirectX 11)
    • ps_3_0: a version 3.0 pixel shader (used by DirectX 9)
    • ps_4_0: a version 4.0 pixel shader (used by DirectX 11)
  • /E to specify the name of the entry point, which is main in our case.
  • /Fo to specify the name of the output file

This is followed by the name of the input file. Each sample project has a Shaders directory with the source code of all shaders, as well as a batch file (Build.bat) that compiles the shaders and generates a resource file.
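
For illustration, the resource script that such a batch file generates could look something like this (a hypothetical sketch; the actual file names in the repository may differ). The resource names match the ones the material looks up later, and RCDATA corresponds to the RT_RCDATA resource type used when loading them:

VERTEX_SHADER_DX9  RCDATA "VertexShader.DX9"
PIXEL_SHADER_DX9   RCDATA "PixelShader.DX9"
VERTEX_SHADER_DX11 RCDATA "VertexShader.DX11"
PIXEL_SHADER_DX11  RCDATA "PixelShader.DX11"
VERTEX_SHADER_GL   RCDATA "VertexShader.GL.txt"
PIXEL_SHADER_GL    RCDATA "PixelShader.GL.txt"
VERTEX_SHADER_MTL  RCDATA "VertexShader.MTL.txt"
PIXEL_SHADER_MTL   RCDATA "PixelShader.MTL.txt"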

Vertex Shader – GLSL

The GLSL (OpenGL) version looks a bit different:

uniform vec4 _MVPMatrix[4];

attribute vec4 a_Position;

void main()
{
  gl_Position.x = dot(_MVPMatrix[0], a_Position);
  gl_Position.y = dot(_MVPMatrix[1], a_Position);
  gl_Position.z = dot(_MVPMatrix[2], a_Position);
  gl_Position.w = dot(_MVPMatrix[3], a_Position); 
}

The main differences compared to the HLSL version are:

  • The data types are named differently (eg. vec4 instead of float4 and mat4 instead of float4x4).
  • Uniform inputs (constants) must be marked with a uniform keyword. These variables must start with an underscore (as in _MVPMatrix). This is not a GLSL rule but a FireMonkey requirement.
  • The per-vertex inputs are not passed as a parameter to the main function, but must be declared with an attribute keyword instead. Attributes can have any name, but FireMonkey requires fixed names so it knows what these attributes represent (in HLSL, this is not required since the semantic is used for this purpose). So position attributes must be named a_Position (the a_ prefix is also a FireMonkey requirement).
  • GLSL uses some predefined variable names (starting with a gl_ prefix) for common variables in the pipeline. The output vertex coordinate is always stored in the predefined variable gl_Position.
  • GLSL also supports matrix multiplication, so the _MVPMatrix variable could be of type mat4. However, FireMonkey requires that matrices are passed as arrays of vectors instead. This also means that the matrix multiplication has to be split into 4 separate vector dot operations. Fortunately, this part of the vertex shader is boilerplate and you will use the same code in most GLSL vertex shaders you write.

There is no need to compile this shader off-line, since FireMonkey will compile it for you when needed.

Vertex Shader – MSL

Finally, we have the MSL (Metal) version:

using namespace metal;

struct Vertex 
{
  <#VertexDeclaration#>
};

struct ProjectedVertex
{
  float4 position [[position]];
};

vertex ProjectedVertex vertexShader(
  constant Vertex *vertexArray [[buffer(0)]],
  const unsigned int vertexId [[vertex_id]],
  constant float4x4 &MVPMatrix [[buffer(1)]]) 
{
  Vertex in = vertexArray[vertexId];
  ProjectedVertex out;
  out.position = float4(in.position[0], in.position[1],
    in.position[2], 1) * MVPMatrix;
  return out;
}

This looks a bit more complicated:

  • The using namespace metal part means that the code can use standard types and functions from the metal namespace (like a uses clause in Delphi).
  • Next, it declares two structs (records):
    • The Vertex struct represents the type of input vertex. FireMonkey will fill this in for you by replacing <#VertexDeclaration#> with the source code needed to represent the vertex (this <#...#> tag is not a Metal feature).
    • The ProjectedVertex structure represents the output vertex. In this case, it only contains an output position, but we will add more in later examples. The [[position]] attribute is like a semantic in HLSL and indicates that this field represents a vertex position.
  • Finally, we have the main function, which starts with the keyword vertex to indicate this is a vertex shader (not to be confused with Vertex with an uppercase V, which is the type of the input vertices). It returns the transformed vertex of type ProjectedVertex.
  • This shader has 3 parameters:
    • The vertexArray parameter is an array of input vertices (of type Vertex). It has a constant qualifier, meaning that the vertex array is stored in the constant (read-only) address space (not to be confused with the const qualifier). Parameters in the constant address space must have a [[buffer(index)]] attribute, where index is the location of the buffer. This index is later used in Delphi code to link this parameter to the vertices supplied by FireMonkey.
    • The vertexId parameter contains the index of the vertex in the vertexArray parameter that is currently being processed. This parameter has a const (not constant) qualifier meaning it is read-only (much like the const qualifier for parameters in Delphi). It must have a [[vertex_id]] attribute to tell Metal what the parameter is used for.
    • Finally the MVPMatrix parameter contains the 4×4 model-view-projection matrix, also stored in the constant address space. It uses a different buffer index than the vertexArray parameter.
  • The body of the function just extracts the vertex with the given Id from the array and multiplies it with the transformation matrix.

Again, there is no need to compile this shader off-line.

Pixel Shader – HLSL

The pixel shader is pretty simple since it always returns just a blue color:

float4 main(): SV_Target0
{
  return float4(0.0, 0.0, 1.0, 1.0);
}

The function returns a 4D vector, which is not only used for positions (X, Y, Z, W), but also for colors (R, G, B, A). Color components range from 0.0 (fully off) to 1.0 (fully on). Alpha components also range from 0.0 (fully transparent) to 1.0 (fully opaque). If you use values outside of this range, the GPU will automatically clip (or saturate) them.
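
For reference, this is how the 8-bit channels of a Delphi TAlphaColor relate to these floating-point values (an illustrative sketch only; FireMonkey performs this kind of conversion for you when you pass a color to a shader, and TAlphaColorRec is declared in System.UITypes):

function ColorToShaderValues(const AColor: TAlphaColor): TArray<Single>;
begin
  // Map each 0..255 channel to the 0.0..1.0 range the shader works with,
  // in the same (R, G, B, A) order as the float4 above.
  var Rec := TAlphaColorRec(AColor);
  Result := [Rec.R / 255, Rec.G / 255, Rec.B / 255, Rec.A / 255];
end;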

The body of the function sets the Blue and Alpha components of the color to 1.0. Remember to always set the Alpha value as well.

Note that the function is marked with an SV_Target0 semantic, meaning it returns the target pixel color.

Pixel Shader – GLSL

The GLSL version is just as simple:

void main()
{
  gl_FragColor = vec4(0.0, 0.0, 1.0, 1.0);
}

Here, the predefined gl_FragColor variable must be set to the output color.

Pixel Shader – MSL

Finally, the Metal version looks like this:

fragment float4 fragmentShader()
{
  return float4(0.0, 0.0, 1.0, 1.0);    
}

Here, the function must start with the fragment keyword to indicate that this is a pixel (or fragment) shader.

Shader Material

Now we are finally ready to use these shaders to create a material source and material (remember, these always come in pairs). Since these materials have no configurable properties, the implementation is pretty simple. Let’s start with the material source:

type
  TBlueMaterialSource = class(TMaterialSource)
  protected
    function CreateMaterial: TMaterial; override;
  end;

function TBlueMaterialSource.CreateMaterial: TMaterial;
begin
  Result := TBlueMaterial.Create;
end;

You must always override the CreateMaterial method and create the actual material instance in its implementation.

If you look at the source code of materials that ship with Delphi (in the FMX.Materials unit), you will see that an off-line tool is used to convert the (byte) code of HLSL and GLSL shaders to Delphi byte arrays. The MSL shaders are included as Delphi strings of MSL source code.

In the sample projects, we use a different approach which doesn’t require an external tool: we link the shaders into the executable as resources and use a TResourceStream to load them. The interface of the TBlueMaterial class looks like this:

type
  TBlueMaterial = class(TCustomMaterial)
  private class var
    FShaderArch: TContextShaderArch;
    FVertexShaderData: TBytes;
    FPixelShaderData: TBytes;
    FMatrixIndex: Integer;
    FMatrixSize: Integer;
  private
    class procedure LoadShaders; static;
    class function LoadShader(
      const AResourceName: String): TBytes; static;
  protected
    procedure DoInitialize; override;
  end;

The class defines 5 static class variables (because we don’t need different values of these variables for each instance):

  • FShaderArch contains the shader architecture that is currently used, based on the graphics backend. It will have the value DX9, DX11, GLSL or Metal.
  • FVertexShaderData and FPixelShaderData contain the byte code or source code of the shaders, as read from the resource.
  • FMatrixIndex contains the index of the MVPMatrix variable in the shader, which is needed to link the Delphi matrix to the corresponding matrix in the shader. This value can be different depending on the shader architecture.
  • FMatrixSize contains the size of a matrix. This also depends on the shader architecture. For DX11, this is the size of a matrix in bytes (which is 4 x 4 x 4 = 64). For other architectures, this is the size of a matrix as the number of 4D vectors it contains (which is 4).

You must always override the DoInitialize method to register the shaders:

procedure TBlueMaterial.DoInitialize;
begin
  inherited;
  if (FShaderArch = TContextShaderArch.Undefined) then
    LoadShaders;

  FVertexShader := TShaderManager.RegisterShaderFromData('blue.fvs',
    TContextShaderKind.VertexShader, '', [
    TContextShaderSource.Create(FShaderArch, FVertexShaderData,
    [TContextShaderVariable.Create('MVPMatrix', 
     TContextShaderVariableKind.Matrix, FMatrixIndex, FMatrixSize)])
  ]);

  FPixelShader := TShaderManager.RegisterShaderFromData('blue.fps',
    TContextShaderKind.PixelShader, '', [
    TContextShaderSource.Create(FShaderArch, FPixelShaderData, [])
  ]);
end;

If this is the first time this material is used, then the shaders are loaded from the resource using the static class method LoadShaders. Next, it registers the vertex shader and pixel shader. Note that FVertexShader and FPixelShader are fields of the TCustomMaterial class, from which TBlueMaterial derives.

The TShaderManager.RegisterShaderFromData method has these parameters:

  • A name that uniquely identifies that shader. You can use any name you want as long as it is unique in your code. A convention is to use a name based on the material class name, with an .fvs suffix for vertex shaders and .fps suffix for pixel (fragment) shaders.
  • The kind of shader (either VertexShader or PixelShader).
  • An optional string containing the original shader source code for reference. This is usually left empty though.
  • An array of shader sources (each of type TContextShaderSource, which is a record type). When you look at the built-in materials that ship with Delphi, you will notice that it registers shaders for all languages here. However, since we already know what graphics backend is being used, we only use a single TContextShaderSource element with the following parameters:
    • The shader architecture (DX9, DX11, GLSL or Metal).
    • The shader byte code or source code.
    • An array of shader (uniform) variables that are passed from Delphi to the shader.
  • This simple shader only has a single shader variable: the model-view-projection matrix. The TContextShaderVariable record has a constructor with the following parameters:
    • The name of the shader variable (MVPMatrix in this case).
    • The type of the variable (TContextShaderVariableKind.Matrix here).
    • The index of the variable in the shader. For DirectX and OpenGL shaders, this is the zero-based index of the variable as it appears in the source code (that is, the first variable has index 0, the second variable has index 1 etc.). For Metal shaders, this is the index as specified in the [[buffer(index)]] attribute discussed above. In our sample, we pass the value of FMatrixIndex here, which is set in the LoadShaders method.
    • The size of the variable in units that depend on the shader architecture. As mentioned earlier, DirectX 11 uses a size in bytes, and other architectures use a size in number of 4D vector units. We use FMatrixSize which is also set in the LoadShaders method.

The LoadShaders method detects the current shader architecture, sets the FMatrixIndex and FMatrixSize fields accordingly and loads the corresponding vertex and pixel shaders from the resource:

class procedure TBlueMaterial.LoadShaders;
begin
  var Suffix := '';
  var ContextClass := TContextManager.DefaultContextClass;

  {$IF Defined(MSWINDOWS)}
  if (ContextClass.InheritsFrom(TCustomDX9Context)) then
  begin
    FShaderArch := TContextShaderArch.DX9;
    FMatrixIndex := 0;
    FMatrixSize := 4;
    Suffix := 'DX9';
  end
  else if (ContextClass.InheritsFrom(TCustomDX11Context)) then
  begin
    FShaderArch := TContextShaderArch.DX11;
    FMatrixIndex := 0;
    FMatrixSize := 64;
    Suffix := 'DX11';
  end;
  {$ELSE}
  if (ContextClass.InheritsFrom(TCustomContextOpenGL)) then
  begin
    FShaderArch := TContextShaderArch.GLSL;
    FMatrixIndex := 0;
    FMatrixSize := 4;
    Suffix := 'GL';
  end;
  {$ENDIF}

  {$IF Defined(MACOS)}
  if (ContextClass.InheritsFrom(TCustomContextMetal)) then
  begin
    FShaderArch := TContextShaderArch.Metal;
    FMatrixIndex := 1;
    FMatrixSize := 4;
    Suffix := 'MTL';
  end;
  {$ENDIF}

  if (FShaderArch = TContextShaderArch.Undefined) then
    raise EContext3DException.Create('Unknown or unsupported 3D context class');

  FVertexShaderData := LoadShader('VERTEX_SHADER_' + Suffix);
  FPixelShaderData := LoadShader('PIXEL_SHADER_' + Suffix);
end;

class function TBlueMaterial.LoadShader(const AResourceName: String): TBytes;
begin
  var Stream := TResourceStream.Create(HInstance, AResourceName, RT_RCDATA);
  try
    SetLength(Result, Stream.Size);
    Stream.ReadBuffer(Result, Length(Result));
  finally
    Stream.Free;
  end;
end;

Now we can finally apply this material to a 3D control and run the application:

I know, pretty boring for that much work. But we have to take baby steps before we run. Fortunately, with most of the scaffolding set up now, building on this in the next examples will be faster.

Sample 2: Custom Color

An obvious improvement to the first sample is to make the color configurable. This can easily be done by adding a uniform input to the pixel shader for the color.

Pixel Shaders

The vertex shader remains unchanged. Below are the updated pixel shaders, where the new uniform Color input replaces the hard-coded blue:

// HLSL

float4 Color;

float4 main(): SV_Target0
{
  return Color;
}

// GLSL

uniform vec4 _Color;

void main()
{
  gl_FragColor = _Color;
}

// MSL

fragment float4 fragmentShader(
  constant float4 &Color [[buffer(0)]])
{
  return Color;    
}

Material

On the Delphi side, we also need to add a Color property to our new material (TCustomColorMaterial):

type
  TCustomColorMaterial = class(TCustomMaterial)
    ...
  private
    FColor: TAlphaColor;
    procedure SetColor(const AValue: TAlphaColor);
  protected
    procedure DoApply(const Context: TContext3D); override;
    procedure DoInitialize; override;
  public
    constructor Create; override;

    property Color: TAlphaColor read FColor write SetColor;
  end;

{ TCustomColorMaterial }

constructor TCustomColorMaterial.Create;
begin
  inherited;
  FColor := TAlphaColors.Blue;
end;

procedure TCustomColorMaterial.DoApply(const Context: TContext3D);
begin
  inherited;
  Context.SetShaderVariable('Color', FColor);
end;

procedure TCustomColorMaterial.DoInitialize;
begin
  inherited;
  ...
  FPixelShader := TShaderManager.RegisterShaderFromData('CustomColor.fps',
    TContextShaderKind.PixelShader, '', [
    TContextShaderSource.Create(FShaderArch, FPixelShaderData,
    [TContextShaderVariable.Create('Color', 
     TContextShaderVariableKind.Vector, 0, FColorSize)])
  ]);
end;

procedure TCustomColorMaterial.SetColor(const AValue: TAlphaColor);
begin
  if (AValue <> FColor) then
  begin
    FColor := AValue;
    DoChange;
  end;
end;

This code introduces a couple of new concepts:

  • The uniform shader input is represented through a property (Color) of the Delphi material class.
  • To pass the color to the GPU shader, you must override the DoApply method. There, you call the SetShaderVariable method to pass the color to the shader. You must supply the name of the uniform shader input as it appears in the shader source code (case sensitive) and the value to set.
  • The DoInitialize method has been slightly updated to add a shader variable to the pixel shader. Its type is a 4D vector and its size (FColorSize) is 4 when using DirectX 11, or 1 on all other architectures.
  • The property setter for the Color property follows a common pattern: check if the value is actually different, and if so, update the backing field and call DoChange. DoChange notifies all 3D controls that use the material that the material has changed, which will result in a repaint request.

We also need to update the material source (TCustomColorMaterialSource), which gets a (published) Color property as well. The getter and setter just delegate to the underlying material:

type
  TCustomColorMaterialSource = class(TMaterialSource)
  private
    function GetColor: TAlphaColor;
    procedure SetColor(const AValue: TAlphaColor);
  protected
    function CreateMaterial: TMaterial; override;
  published
    property Color: TAlphaColor read GetColor write SetColor;
  end;

{ TCustomColorMaterialSource }

function TCustomColorMaterialSource.CreateMaterial: TMaterial;
begin
  Result := TCustomColorMaterial.Create;
end;

function TCustomColorMaterialSource.GetColor: TAlphaColor;
begin
  Result := TCustomColorMaterial(Material).Color;
end;

procedure TCustomColorMaterialSource.SetColor(const AValue: TAlphaColor);
begin
  TCustomColorMaterial(Material).Color := AValue;
end;

The result is slightly more interesting:

This material is very similar to Delphi’s built-in TColorMaterial (but without an opacity component).

Sample 3: Using a Texture

The next sample applies a texture to the plane instead of a single color. A texture is a bitmap that lives on the GPU. The pixel shader “samples” pixels in the texture based on a pair of texture coordinates (aka UV coordinates). Unlike a bitmap, where the X and Y coordinates range from 0 to the width and height of the bitmap, the U and V texture coordinates are floating-point values that range from (0.0, 0.0) for the top-left corner to (1.0, 1.0) for the bottom-right corner (regardless of bitmap dimensions and whether the bitmap is square or rectangular):

FireMonkey TPlane controls automatically set the texture coordinates of the top-left corner of the plane to (0, 0) and the bottom-right corner of the plane to (1, 1), so the plane always shows the entire texture. In this case, each vertex has two sets of coordinates: a 4D position vector and a 2D texture coordinate vector.

Vertex Shaders

The vertex shader must be adjusted to accept both vectors for each vertex; the texture coordinate handling is the new part:

// HLSL

float4x4 MVPMatrix;

void main(
  float4 inPosition: POSITION0,
  float2 inTexCoord: TEXCOORD0,
  out float4 outPosition: SV_Position0,
  out float2 outTexCoord: TEXCOORD0)
{
  outPosition = mul(MVPMatrix, inPosition);
  outTexCoord = inTexCoord;
}

The TEXCOORD0 semantic is used to indicate this is a texture coordinate. There is an additional out-parameter that is set to the input texture coordinate. The rasterization stage will automatically interpolate the texture coordinates for each pixel before passing them to the pixel shader (which will be presented later).

// GLSL

uniform vec4 _MVPMatrix[4];

attribute vec4 a_Position;
attribute vec2 a_TexCoord0;

varying vec2 TexCoord;

void main()
{
  gl_Position.x = dot(_MVPMatrix[0], a_Position);
  gl_Position.y = dot(_MVPMatrix[1], a_Position);
  gl_Position.z = dot(_MVPMatrix[2], a_Position);
  gl_Position.w = dot(_MVPMatrix[3], a_Position); 
  
  TexCoord = a_TexCoord0;
}

The output texture coordinate (TexCoord) is marked with a varying keyword. This keyword tells the rasterization stage to vary (interpolate) this value for each pixel. (This is different from a varying input in HLSL). So to recap, there are 4 types of variables in GLSL:

  • Regular local or global variables (or parameters).
  • Attributes (per-vertex values).
  • Uniforms (values that are constant for all vertices).
  • Varyings (values that will be interpolated between the vertex and pixel shader).

// MSL

using namespace metal;

struct Vertex 
{
  <#VertexDeclaration#>
};

struct ProjectedVertex
{
  float4 position [[position]];
  float2 texCoord;
};

vertex ProjectedVertex vertexShader(
  constant Vertex *vertexArray [[buffer(0)]],
  const unsigned int vertexId [[vertex_id]],
  constant float4x4 &MVPMatrix [[buffer(1)]]) 
{
  Vertex in = vertexArray[vertexId];
  ProjectedVertex out;
  out.position = float4(in.position[0], in.position[1],
    in.position[2], 1) * MVPMatrix;
  out.texCoord = in.texcoord0;
  return out;
}

Here, the texture coordinate is added to the ProjectedVertex structure/record. It does not have a specific attribute attached to it, which means that the rasterization stage will interpolate its value.

Pixel Shaders

The pixel shaders have to be modified as well to accept the (interpolated) texture coordinates from the vertex shader, as well as a texture (and sampler) that is set on the Delphi side:

// HLSL

Texture2D Texture;
SamplerState Sampler;

float4 main(float4 position: SV_POSITION,
  float2 texCoord: TEXCOORD0): SV_Target0
{
  return Texture.Sample(Sampler, texCoord);
}

The shader has 2 additional uniform inputs: the texture (of type Texture2D) and a sampler (of type SamplerState). A sampler is an object that is used to “sample” (retrieve) colors from a texture. It can be configured to customize how pixels should be filtered (e.g. nearest-neighbor filtering, linear interpolation, mipmapping etc.) among other things. The same sampler can be used with multiple textures.

The main function has an additional texCoord parameter that is passed from the vertex shader (and interpolated by the previous rasterization stage). It uses the texture and the sampler to return the color from the texture at the given texture coordinates.

// GLSL

uniform sampler2D _Texture;

varying vec2 TexCoord;

void main()
{
  gl_FragColor = texture2D(_Texture, TexCoord);
}

In GLSL, the sampler2D type contains both the texture and sampler state. The TexCoord varying is retrieved from the vertex shader. The main function uses the built-in texture2D function to sample the texture at the given texture coordinates.

// MSL

using namespace metal;

struct ProjectedVertex
{
  float4 position [[position]];
  float2 texCoord;
};

fragment float4 fragmentShader(
  const ProjectedVertex in [[stage_in]],
  const texture2d<float> Image [[texture(0)]],
  const sampler ImageSampler [[sampler(0)]])
{
  return Image.sample(ImageSampler, in.texCoord);
}

On the MSL side, we need to duplicate the ProjectedVertex declaration from the vertex shader. It is passed as input to the main function (marked with a [[stage_in]] attribute). The function also retrieves the texture and sampler as parameters. The [[texture(index)]] and [[sampler(index)]] attributes are used to link the texture on the Delphi side to these parameters in the shader.

Material

On the Delphi side, the material is updated to reflect the new shaders:

type
  TImageMaterial = class(TCustomMaterial)
  ...
  private
    FTexture: TTexture; // Reference
    procedure SetTexture(const AValue: TTexture);
  protected
    procedure DoInitialize; override;
    procedure DoApply(const Context: TContext3D); override;
  public
    property Texture: TTexture read FTexture write SetTexture;
  end;

{ TImageMaterial }

procedure TImageMaterial.DoApply(const Context: TContext3D);
begin
  inherited;
  Context.SetShaderVariable('Texture', FTexture);
end;

procedure TImageMaterial.DoInitialize;
begin
  ...
  FPixelShader := TShaderManager.RegisterShaderFromData('image.fps',
    TContextShaderKind.PixelShader, '', [
    TContextShaderSource.Create(FShaderArch, FPixelShaderData,
    [TContextShaderVariable.Create('Texture', 
     TContextShaderVariableKind.Texture, 0, 0)])
  ]);
end;

procedure TImageMaterial.SetTexture(const AValue: TTexture);
begin
  FTexture := AValue;
  DoChange;
end;

Most of this should be familiar by now. The DoApply method is overridden to bind the texture to the (pixel) shader. And the DoInitialize method is overridden accordingly to add the Texture shader variable to the pixel shader. This variable is of type TContextShaderVariableKind.Texture, and its size is always 0. The index is the index of the texture in the shader source code, and in case of MSL, the index passed to the [[texture(index)]] attribute.

Note that the property setter SetTexture always calls DoChange, even if the texture instance is the same. This is because the underlying texture pixel data may have changed.

The material source has to be updated as well:

type
  TImageMaterialSource = class(TMaterialSource)
  private
    FImage: TBitmap;
    procedure SetImage(const AValue: TBitmap);
    procedure HandleImageChanged(Sender: TObject);
  protected
    function CreateMaterial: TMaterial; override;
  public
    constructor Create(AOwner: TComponent); override;
    destructor Destroy; override;
  published
    property Image: TBitmap read FImage write SetImage;
  end;

{ TImageMaterialSource }

constructor TImageMaterialSource.Create(AOwner: TComponent);
begin
  inherited;
  FImage := TTextureBitmap.Create;
  FImage.OnChange := HandleImageChanged;
end;

function TImageMaterialSource.CreateMaterial: TMaterial;
begin
  Result := TImageMaterial.Create;
end;

destructor TImageMaterialSource.Destroy;
begin
  FImage.Free;
  inherited;
end;

procedure TImageMaterialSource.HandleImageChanged(Sender: TObject);
begin
  if (not FImage.IsEmpty) then
    TImageMaterial(Material).Texture := TTextureBitmap(FImage).Texture;
end;

procedure TImageMaterialSource.SetImage(const AValue: TBitmap);
begin
  FImage.Assign(AValue);
end;

The material source manages the image bitmap. This bitmap is of type TTextureBitmap, which is a special kind of bitmap that is backed by a texture on the GPU. We attach an OnChange event handler to it so we can update the underlying material when the bitmap changes.

The result is what you would expect:

Sample 4: Circle Cutout

We build on the previous example to cut out a circular area from the texture. You see this a lot for profile pictures. Of course, we could modify the texture itself and set all pixels outside of the circle to transparent:

However, this requires a manual change to the texture. By using a modified pixel shader, you can make the cutout for any image. One way to do this is to think of a unit circle that covers the entire texture. This circle has its center at location (0, 0) and a radius of 1:

For each rendered pixel, we can then calculate the distance from that pixel to the center of the circle. If the distance is <= 1.0, the pixel is inside the circle. Otherwise, it is outside the circle and we set it to transparent.

The shader already receives a set of texture coordinates for the image. Remember that these range from (0.0, 0.0) to (1.0, 1.0). If we multiply these by 2 and subtract (1.0, 1.0), then the result will range from (-1.0, -1.0) to (1.0, 1.0), which covers the unit circle. Then, the calculations become trivial.
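
For example, here is the same test written as plain Delphi code (a CPU-side sketch of what the pixel shaders below do for every pixel):

function IsInsideUnitCircle(const U, V: Single): Boolean;
begin
  // Remap the 0.0..1.0 texture coordinates to the -1.0..1.0 range.
  var X := (2 * U) - 1;
  var Y := (2 * V) - 1;
  // The pixel is inside the circle when its distance to the center is at most 1.
  Result := Sqrt((X * X) + (Y * Y)) <= 1;
end;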

Pixel Shaders

We can keep the same vertex shaders as the previous example. The pixel shaders have to be updated though:

// HLSL

Texture2D Texture;
SamplerState Sampler;

float4 main(float4 position: SV_POSITION, 
  float2 texCoord: TEXCOORD0): SV_Target0
{
  // texCoord ranges from 0.0 to 1.0
  // For calculating a unit circle, remap to -1.0 to 1.0
  float2 locationFromCenter = (2.0 * texCoord) - float2(1.0, 1.0);
  
  // Calculate distance from center
  float distanceFromCenter = length(locationFromCenter);
  
  // A distance greater than 1 means we are outside the circle.
  // Return a transparent color in that case
  if (distanceFromCenter > 1.0)
    return float4(0.0, 0.0, 0.0, 0.0);
    
  return Texture.Sample(Sampler, texCoord);
}

First, the texture coordinate is remapped from the range (0.0, 0.0) – (1.0, 1.0) to the range (-1.0, -1.0) – (1.0, 1.0) as discussed above. This value is then used to calculate the distance from the center, using the built-in length function, which calculates the length of a 2D, 3D or 4D vector.

If this length is greater than 1.0, we are outside of the circle and return a fully transparent color. Otherwise, we sample the texture as we did before.

// GLSL

uniform sampler2D _Texture;

varying vec2 TexCoord;

void main()
{
  vec2 locationFromCenter = (2.0 * TexCoord) - vec2(1.0, 1.0);
  float distanceFromCenter = length(locationFromCenter);
  
  if (distanceFromCenter > 1.0)
  {
    gl_FragColor = vec4(0.0, 0.0, 0.0, 0.0);
    return;
  }
    
  gl_FragColor = texture2D(_Texture, TexCoord);
}

// MSL

using namespace metal;

struct ProjectedVertex
{
  float4 position [[position]];
  float2 texCoord;
};

fragment float4 fragmentShader(
  const ProjectedVertex in [[stage_in]],
  const texture2d<float> Image [[texture(0)]],
  const sampler ImageSampler [[sampler(0)]])
{
  float2 locationFromCenter = (2.0 * in.texCoord) - float2(1.0, 1.0);
  float distanceFromCenter = length(locationFromCenter);
  
  if (distanceFromCenter > 1.0)
    return float4(0.0, 0.0, 0.0, 0.0);

  return Image.sample(ImageSampler, in.texCoord);
}

The GLSL and MSL versions are very similar and should make sense by now.

We don’t need to update the materials on the Delphi side, since all changes are handled by the shader.

This is what the result looks like:

Sample 5: Feathered Edge

If you look closely at the edge of these circles, you will see that they are very jagged:

This is because the inside/outside circle decision is binary: a pixel is either inside the circle or outside the circle. If you are running on a high DPI monitor or retina display, this may not be very noticeable, but on other displays it looks ugly. We want to create a softer edge to reduce the aliasing artifacts (that is, we want to anti-alias the edge).

One way to do this is to look at the distanceFromCenter value calculated in the pixel shader. We know that the pixel lies outside of the circle if this value is greater than 1.0 and inside otherwise. To create an anti-aliased edge we could do something different if the value is close to 1.0. For example, if the value is between 0.99 and 1.0, then we could gradually change the alpha (opacity) value of the pixel depending on how far the value is between 0.99 and 1.0. We would set the alpha value to 1.0 (opaque) when the distance is 0.99, and to 0.0 (transparent) if the distance is 1.0. For values in between, we interpolate the alpha value.

We take it one step further by making this threshold configurable. I call this feathering, since it is similar to feathering a selection in tools like Photoshop. The feather value ranges from 0.0 (no feathering, jagged edges) to 1.0 (full feathering, where pixels become partially transparent as soon as we move away from the center).

Pixel Shaders

Again, we can keep the same vertex shaders and only need to update the pixel shaders. I will only show the HLSL version this time, since the other versions are very similar:

Texture2D Texture;
SamplerState Sampler;

float Feather;

float4 main(float4 position: SV_POSITION,
  float2 texCoord: TEXCOORD0): SV_Target0
{
  // texCoord ranges from 0.0 to 1.0
  // For calculating a unit circle, remap to -1.0 to 1.0
  float2 locationFromCenter = (2.0 * texCoord) - float2(1.0, 1.0);
  
  // Calculate distance from center, but subtract it from 1.0 so
  // 0.0 is at the edge, and 1.0 is at the center. This makes 
  // subsequent calculations easier.
  float distance = 1.0 - length(locationFromCenter);
  
  // A distance less than 0 means we are outside the circle.
  // Return a transparent color in that case
  if (distance < 0.0)
    return float4(0.0, 0.0, 0.0, 0.0);
    
  // Get pixel color from texture
  float4 color = Texture.Sample(Sampler, texCoord);
  
  // If the distance is between 0.0 and Feather, then smoothly
  // interpolate the Alpha value so the edge fades out to fully 
  // transparent
  if (distance < Feather)
  {
    float alpha = smoothstep(0.0, Feather, distance);
    color = color * alpha;
  }
  
  return color;
}

The Feather value is passed from Delphi as a uniform shader input.

For ease of calculation, the distance value is calculated a bit differently: now a value of 1.0 is at the center of the circle, and 0.0 at the edge. So if the distance is smaller than 0.0, we can return a fully transparent color.

Otherwise, we sample the texture. Then, if the distance is smaller than the Feather value, we are near the edge of the circle and need to calculate an alpha value. We do this by interpolating the alpha value between 0.0 and 1.0, based on the distance and Feather values. We can use the built-in smoothstep function for this, which has 3 parameters:

  • min: the minimum range of the x parameter.
  • max: the maximum range of the x parameter.
  • x: the value to be interpolated.

It returns 0.0 if x < min, 1.0 if x > max, and otherwise interpolates smoothly between 0.0 and 1.0 depending on the position of x between min and max.

Then it multiplies the sampled color with this alpha value.
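
If you want to reason about what smoothstep computes, this Delphi equivalent may help (a sketch; EnsureRange comes from System.Math, and the GPU’s built-in version performs the same clamped Hermite interpolation):

function SmoothStep(const AMin, AMax, X: Single): Single;
begin
  // Normalize X to the 0..1 range and clamp it.
  var T := EnsureRange((X - AMin) / (AMax - AMin), 0, 1);
  // Hermite interpolation: 0.0 at AMin, 1.0 at AMax, smooth in between.
  Result := T * T * (3 - 2 * T);
end;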

You might think that it would be sufficient to just set the alpha value of the color to this alpha value (eg. Color.a = alpha instead of Color = Color * alpha). However, unlike most graphic engines, FireMonkey requires that the output of the pixel shader is alpha pre-multiplied. This means that it is not sufficient to just set the alpha value; you need to pre-multiply the R, G and B components of the color with the alpha value as well. Not doing so will result in visual artifacts (like weirdly blended edges).

Material

On the Delphi side, the material must be updated accordingly. However, the only change is the addition of a Feather property, and you know by now how to do this. So I will not show the code here, but you can of course always check the sample project yourself.

The sample app is slightly modified. It adds a slider that the user can change to update the feather value:
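
On the code side, the slider’s OnChange handler only needs to forward its value to the material source. A hypothetical version (assuming a TTrackBar named TrackBarFeather with Min 0 and Max 1):

procedure TFormMain.TrackBarFeatherChange(Sender: TObject);
begin
  // Forward the 0.0..1.0 slider value to the material source,
  // which passes it on to the Feather uniform in the pixel shader.
  FMaterialSource.Feather := TrackBarFeather.Value;
end;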

Sample 6: Plasma Effect

Finally, just for fun, let’s create an animated plasma effect. This effect takes a set of texture coordinates as input (without a texture), as well as a timestamp in seconds. The pixel shader then performs a bunch of (trigonometric) calculations on these values to spit out something that looks like swirling plasma.

I didn’t write this plasma effect myself. Instead I adapted it from a sample on the very cool ShaderToy website. This site uses WebGL pixel shaders (a derivative of GLSL) to create all sorts of effects. I created an HLSL, GLSL and MSL version of klk’s Simple Plasma shader on this site.

Vertex Shaders

float4x4 MVPMatrix;

void main(
  float4 inPosition: POSITION0,
  float2 inTexCoord: TEXCOORD0,
  out float4 outPosition: SV_Position0,
  out float2 outUV: TEXCOORD0)
{
  outPosition = mul(MVPMatrix, inPosition);
  outUV = (inTexCoord - 0.5) * 8.0;
}

This HLSL vertex shader is very similar to the one used before. The only change is that we scale the input texture coordinate before we pass it to the pixel shader. We could also do this in the pixel shader. However, if a calculation can be performed in either the vertex shader or the pixel shader, it is usually more efficient to perform it in the vertex shader: the vertex shader is invoked far less often, because a scene usually contains far fewer vertices than pixels.

The GLSL and MSL versions are very similar and not shown here.

Pixel Shaders

The HLSL pixel shader performs the calculations based on the texture coordinates and timestamp:

float Time;

float4 main(float4 position: SV_POSITION, 
  float2 uv: TEXCOORD0): SV_Target0
{
  float i0 = 1.0;
  float i1 = 1.0;
  float i2 = 1.0;
  float i4 = 0.0;
  
  for (int s = 0; s < 7; s++) 
  {
    float2 r = float2(
      cos(uv.y * i0 - i4 + Time / i1),
      sin(uv.x * i0 - i4 + Time / i1)) / i2;
      
    r += float2(-r.y, r.x) * 0.3;    
    uv += r;
    
    i0 *= 1.93;
    i1 *= 1.15;
    i2 *= 1.7;
    i4 += 0.05 + 0.1 * Time * i1;
  }
  
  float r = sin(uv.x - Time) * 0.5 + 0.5;
  float b = sin(uv.y + Time) * 0.5 + 0.5;
  float g = sin((uv.x + uv.y + sin(Time * 0.5)) * 0.5) * 0.5 + 0.5;
  
  return float4(r, g, b, 1.0);
}

I will not go into details of the calculations here, since that is not the point. The code introduces a couple of new concepts, such as a for-loop and the built-in sin and cos functions.

Again, the GLSL and MSL versions are very similar and not shown here.

The same goes for the materials on the Delphi side. These are very similar to previous versions and only add a TimeInSeconds property, which you know by now how to implement.
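
Something does need to advance the time to drive the animation. A hypothetical sketch using a TTimer component (assuming the material source exposes the TimeInSeconds property and the timer fires a few dozen times per second):

procedure TFormMain.TimerTimer(Sender: TObject);
begin
  // TThread.GetTickCount returns milliseconds since system start;
  // the shader expects a time value in seconds.
  FMaterialSource.TimeInSeconds := TThread.GetTickCount / 1000;
end;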

The result is pretty interesting:

Note that the pixel shader is very calculation-heavy: it calculates 18 sine/cosine values for every single pixel. Still, it runs pretty fast, even on my old iPad mini with a 2048 x 1536 retina display (that is over 56 million sine/cosine calculations per frame!). This is because many GPUs have lots of processing cores that run in parallel to render the final image. Still, you will see that mobile GPUs are (much) less powerful than even cheap video cards in desktop PCs and laptops.

Where To Go From Here

Knowing how to program the GPU can be a useful skill since the GPU is much more efficient for graphical operations than the CPU, and allows for effects that are impractical to perform in real-time on the CPU.

At Grijjy, we even use GPU shaders for (arguably) simple operations like rendering videos to the screen in our Lumicademy app. This offloads a lot of work from the CPU, giving it more time to keep the user interface responsive and to perform background tasks.

For more information, check out the official specifications of the shader languages (HLSL, GLSL and MSL) used in this post.

I know that this was a very long post, but still, I was only able to cover the tip of the iceberg (one that is 99% underwater that is). Still, I hope you found it useful. The only real way to get the hang of this is to try it yourself. So don’t be afraid to experiment…
