Yonatan Fedaeli: .Net Garbage Collection (GC)

Abstract

In this post I'd like to write about .Net Garbage Collector (GC), its characteristics, algorithm, different mechanisms and its significant contribution to .Net applications.

Furthermore, I'll strive to present best practices and extensively describe how the GC behavior influences the application's performance and on the entire development process.

Most of this information is from a presentation I prepared & presented in the past, in my previous work places, as part of a learning series I created about .Net CLR Internals.

The aim of this series was to enrich the teams with advanced topics while creating practical and concise presentations.

Naturally, there is lots of information out there about the .Net GC, but from my experience, I found that by presenting such advance topic while focusing on the main features, and not by explaining every irrelevant detail, the learning experience is much more productive. Especially for developers that are interested in understanding how the Garbage Collector works and how to write better performance oriented software.

Having said that, the Garbage Collector topic is very large and full with internal information, so when required I'll probably present more detailed internal information.

Hope you'll achieve benefit from this presentation, and most importantly enjoy the ride… J

Remark: Framework Version

In this article I'll describe the general behavior of the .Net Garbage Collector; some information could be modified from one CLR version to another.

Related articles:

1. .Net GC – Finalization Mechanism & the Dispose Pattern

2. .Net Garbage Collection Modes

3. .Net Type Internals

4. Sync Block Index (SBI) \ Object Header Word

5. .Net debugging – Get started

Contents

1. Introduction

- .Net managed environment

- Virtual Address Space

- Managed (GC) Heap

- Allocation

- De-allocation

2. The GC Algorithm (Mark, Sweep & Compact)

3. Mark Phase

- 'Live objects' graph

- GC Roots

1. Local variables.

2. Static variables.

3. GC handles.

4. The f-reachable queue.

4. Sweep & Compact Phases

5. Generations

- Partial Garbage Collections

- Generational Model Theory

- Generation 0

- Generation 1

- Generation 2

- The Large Object Heap (LOH)

6. GC Modes

1. Workstation GC:

1. Concurrent Workstation GC.

2. Non-Concurrent Workstation GC.

2. Server GC.

7. Finalization

- Finalization list

- F-reachable queue

- Finalizer thread

- The Finalization mechanism drawback

8. Dispose pattern

- Dispose Pattern Implementation

- The Using Statement

9. Summary

Introduction

In the introduction section I'll cover the following topics:

- .Net managed environment

- Virtual Address Space

- Managed (GC) Heap

- Allocation

- De-allocation

.Net managed environment

The Garbage Collector's main purpose is to manage the application's memory and to allocate and de-allocate objects on the Managed Heap (a.k.a. GC Heap).

This means that (as opposed to languages such as C++) the .Net Framework (CLR & FCL) provide managed environment (automatic memory management), that is to say, we don’t need to manually (by code) free memory of unused objects, the GC performs all the necessary in the most effective way.

- CLR – Common Language Runtime.

- FCL – Framework Class Library.

One of the main reasons for unpredictable issues at runtime (when and where they occur), used to be a result of misused 'memory management' by developers. (Mainly by forgetting to release memory or by trying to use memory that was already released)

It's reasonable to say that the importance of the GC mechanism is priceless, since in addition to the automatic memory management it provides, the GC also allows developers to focus on the business logic development, without any concern of the memory management, as we did in the past.

Moreover, it's important to understand how the GC works since it's one of the primary mechanisms affecting the performance of .NET applications.

Virtual Address Space

Prior to diving into the .Net Garbage Collector internals, I'd like to elaborate a bit about Virtual Address Space, with a distinction between 32bit & 64bit Win OS.

A Virtual Address Space is the set of ranges of virtual addresses that a process could use while executing, provided & managed by the operating system.

On a 32-bit architecture:

When running a process on a 32bit system, the OS assign 4GB of memory for its virtual address space (2³²).

However, by default, the OS allows the process to use only 2GB of memory (user-mode), while the other 2GB are used by the OS itself (kernel-mode).

On some Win 32bit system, it’s possible to enlarge this virtual address space to 3GB by setting the IMAGE_FILE_LARGE_ADDRESS_AWARE flag.

On a 64-bit architecture:

As is known, on a 64bit system, we could run both 32bit and 64bit processes.

- 32bit process: up to 4GB.

- 64bit process: up to 8TB (and more…).

As this is not the main topic of this article, I'm not going to elaborate additionally on this, just mention that we could also control these extensions by using the LARGEADDRESSAWARE flag.

But it's very important to understand that the OS is controlling and assigning the virtual address space for the process, the relevant sizes for different OS versions and processes, since it impacts on our application's behavior, and its overall performance.

Virtual Address Space – States

The VAS could be in 1 of the following 3 states:

1. Free

A particular range of virtual addresses (block of memory) is free for allocations, meaning there are no references from the process to this memory.

2. Reserved

This block of memory is reserved for a particular object(s), meaning no other object(s) could use it. However, it can't contain data until the memory is committed.

3. Committed

This block of memory was committed, meaning it was assigned to physical memory and could contain the objects data.

Managed (GC) Heap

When the CLR initializes a .Net process, the GC preserve a segment in memory (from the process's virtual address space) designated for both the Managed Heap and for the Large Object Heap (LOH).

The GC also maintains a 'Next Object Pointer' (NextObjPtr\NOP) that points to the next available address in the Managed Heap, which is been used in the allocation process. (Also modified in the collection process – I'll elaborate later on).

In addition, this allocation approach ensures that when allocating several objects together, they would be stored contiguously in memory. Most of the time these objects are logically related, thus it creates 'locality of reference' which directly impact on the application performance. (The Working Set is smaller and sometimes the CPU Cache could contain all relevant objects, thus Cache Misses could be prevented).

Remark: The LOH contains 'large objects' (of size 85 KB and more) and is treated a bit differently than the Managed Heap. (Additional information in the Large Objects Heap section below)

Allocation

The allocation operation is relatively cheap operation, meaning it usually involves incrementing the 'Next Object Pointer' (NOP) and not using a complex algorithm, which cause high performance issues. (Even for a large amount of objects)

Remarks:

- Multi-Processor systems are treated differently (support additional synchronization mechanism), and not discussed in this article, but still the use of the NOP for allocating objects provide good performance.

- As noted above, 'Large Objects' are treated a bit differently and are stored in the LOH.

De-allocation

The GC 'releases' old objects by reclaiming their memory from the managed heap, after they are no longer needed in the application.

The De-allocation operation (with respect to the allocation operation) is relatively high performance operation, since its algorithm is more complex, and includes the Mark, Sweep & Compact phases.

GC collection Time

The GC performs the de-allocation when a memory usage threshold is reached.

Remark:

Later in this article, I'll describe the GC Generation feature, and elaborate how and when objects are being allocated and de-allocated (collected), and what exactly a 'memory usage threshold' is?

The GC Algorithm

There are several different memory management strategies, such as: 'Reference Counting GC', 'Free List Management' implemented in different frameworks.

Each of these (mostly old) algorithms has its share of disadvantages.

I won’t elaborate on these mechanisms, only indicate that the .NET CLR uses the Tracing GC mechanism, which is a significant improved implementation, that revoked all of the weaknesses of the old strategies.

Tracing GC comprises 3 phases:

1. Mark Phase:

The GC looks for all the live objects that are still referenced by the application. (Using GC Roots)

2. Sweep Phase:

The GC 'takes back' the space occupied by dead objects.

3. Compact Phase:

The GC moves live objects in memory, so that they would remain continuously in memory.

After completing the Mark, Sweep & Compact phases the GC collected back unused memory space (by freeing unreferenced\old objects), rearranged the remaining referenced objects (a.k.a live objects) in a continuous manner, and updated the live objects' addresses.

These phases prevent fragmentation, and other new allocations could find free space in the managed heap.

Remark: Sweep & Compact Phase

Some documentation address the Sweep & Compact phase as only one phase, which reclaims the unreachable (dead) objects memory, moves the remaining reachable (live) objects down the Managed Heap, so they would remain continuously in memory, and finally updates these objects addresses respectively.

The naming of these phases is not really significant, more important is to understand these phases algorithm.

In this article I'll address the Sweep & Compact phase as if it was one.

Mark Phase

When it's time for collection the GC starts with the Mark Phase.

In the Mark Phase the GC builds a graph of all objects that are still 'alive', meaning they are still referenced and reachable by other objects in the application.

How does the GC build the 'live objects' graph?

1. GC Roots:

- In order to know which objects are still alive, the GC uses GC Roots.

- GC Roots are objects that the GC uses as a starting point for building the graph.

- Objects that could be GC Roots are:

o Local variables

o Static variables

o GC handles

o The f-reachable queue

Remark: Further detailed information in the 'GC Root' section below.

2. Exploring GC Roots internal reference fields:

- After the GC established these GC Roots, it examines their internal reference fields and uses them to continue building the 'live objects' graph.

- The GC keeps doing this operation, until all reference objects have been examined.

How & why does the GC 'mark' the objects it examined?

The GC marks the objects it examined in order to avoid 'Reference Cycles', meaning, since objects in .Net could reference each other, the GC have to mark the objects that it already examined.

The GC marks these objects using the Sync Block Index (SBI) field.

The GC explores the thread's stack memory, resolves all roots and modify a bit in the 'Sync Block Index' filed of every object it examined.

Remark: Sync Block Index (SBI)

This article focuses on the Garbage Collector algorithm, additional mechanisms and performance issues, so I won't elaborate on the SBI field.

Additional detailed information could be found in previous articles I wrote:

- .Net Type Internals,

- Sync Block Index (SBI) \ Object Header Word.

However, I'll briefly mention that every Reference Type in .Net architecture, in addition to its data members (that were declared by the developer), also contains additional overhead, the following 2 fields:

1. Method Table Pointer – Which directs to the type's Method Table (internal CLR structure that contains relevant information on the type, such as: method addresses, flags, additional pointers to other CLR internal data structures etc.)

2. Sync Block Index – Serves for multiple usages:

- Thread Synchronization (using the CLR Monitor Mechanism with the lock keyword)

- Store the object's Hash Code (used in a hash-based collection, such as: Dictionary)

- As part of the GC work (Mark phase)

- As part of the GC Finalization mechanism.

The CLR uses these fields at run-time, and developers can't access or use them.

The GC uses the 'Sync Block Index' as part of its algorithm and part of the Finalization mechanism.

Mark phase completed

At the end of the Mark phase the GC completed building the 'live objects' graph, meaning now the GC could reclaim the unreachable objects memory in the Sweep & Compact phase.

GC Roots

As aforementioned, the CG uses a set of roots as starting points to build the 'live objects graph'.

There are 4 types of GC Roots:

1. Local variables:

Local 'Reference Type' variables are roots for other objects they might contain, thus they are used by the GC as 'GC Roots' as long as their declaring method is on the Thread Stack.

This means that, in the Mark Phase, the GC examines all methods on the Thread Stack and uses the reference types – local variables as roots (starting points) for building the 'live objects' graph.

Eager Root Collection

A local 'reference type' variable is not necessarily an active root until the end of the method it was declared in.

In some scenarios, additional time-consuming code could appear after the last use of the local variable, and a GC might occur at that time.

In such cases, the reference variable is no longer an active root, and the GC needs to reclaim its memory.

This behavior is called – Eager Root Collection.

How does 'Eager Root Collection' work?

We should ask: how does the GC know when a reference type object is no longer an active root in the method, meaning where was it last used, in order to know when and where it could be collected?

Well, actually the GC doesn’t know it by itself; it uses the Just-In-Time Compiler (a.k.a JIT Compiler or Jitter).

The JIT Compiler builds the methods at runtime (from MSIL instructions to native machine code), and at that time, the JIT Compiler indicates the first and last use of every reference type in this method in a designated internal CLR structures (tables).

Actually the first and last use of the reference type is their instruction pointer addresses.

This way the GC could use these tables of addresses in the Mark Phase to indicate whether a GC Root is still an active root, since it knows the exact place in code (address) in which it started to operate.

2. Static variables:

As we know, static variables are created when their declaring type is loaded.

Static variables would remain 'alive' in the Managed Heap for the entire lifetime of the AppDomain (until it is unloaded).

Thus, in case a static reference type contains other objects, it is considered as their roots, and the GC could use this static variable when it builds the 'live objects' graph, in the Mark Phase.

3. GC handles:

The CLR uses the 'GC Handles' as a mechanism which provides protection for objects that need to be transferred from the managed environment to an unmanaged code and back. (Their reference pointer is transferred)

These objects are set to be 'GC Handles', and by that, preventing the garbage collector from reclaiming their memory and form moving them around the managed heap (they are pined in memory), even though they have no other GC Roots in the managed environment.

Otherwise, in case the GC was to operate when these objects in the unmanaged code:

- These objects' memory could be reclaimed (if no other GC Root was found).

- Or the GC was to move them around the managed memory and update their new location (as part of the sweep & compact phase).

This behavior would cause unpredictable issues in the application and potentially severe errors.

Thus, prior to sending an object from the managed environment to an unmanaged code, we need to pin it in memory by setting it as a GC Handle.

GC Handle Table

When the CLR initialize a .Net process, it provides each AppDomain a 'GC handle table'.

The 'GC Handles table' contains all the pointers (addresses), of objects in the managed environment that were passed to an unmanaged code.

When it's time for collection, the GC examines this table and operates accordingly.

Meaning, since the GC can't reclaim these objects memory or move them around memory, they serve as GC Roots to their own internal reference fields.

Remark: Fragmentation

By creating GC handles we are instructing the GC to leave handle objects in their original location in memory, during the sweep & compact phase. This would result in a memory fragmentation, which directly effects on the entire application performance.

4. F-Reachable queue:

The f-reachable queue is part of the GC Finalization mechanism which I'll describe later in this article.

I'll just mention that the f-reachable queue is an internal CLR structure that maintain addresses of objects that are part of the Finalization mechanism (These objects implemented a 'Finalizer').

These objects are not needed by the application anymore, and still the GC can’t reclaim their memory yet.

Therefore, they serve as roots for their own internal reference objects.

Sweep & Compact Phases

At the end of the Mark phase the GC completed building the 'live objects' graph.

The Managed Heap contains these 'live objects' (the marked objects) which are still reachable from the application, and also the unreachable objects which are considered as 'garbage' and their memory could be reclaimed.

The GC uses the 'live objects' graph to perform the following:

1. Reclaim the unreachable (dead) objects memory.

2. Move the remaining reachable (live) objects down the Managed Heap, so they would remain continuously in memory.

3. Update these objects addresses to points to their new location.

4. Update the 'Next Object Pointer' to point to the new next available address in memory.

The Sweep and Compact phase is influenced by the 'Next Object Pointer' model, and its algorithm was implemented accordingly.

Meaning, the GC compacts the live objects in the Managed Heap in order to allow the 'Next Object Pointer' to point to a new free and large address space in the Managed Heap, by this allowing new allocations to be performed smoothly and very fast.

In addition, this model also prevents fragmentation which naturally effects on the entire application's performance.

The following image illustrates a GC Heap prior and post garbage collection.

Description

The red objects (Object 1 & Object 4) are not needed nor referenced by the application at this stage.

Meaning, the GC didn't add them into the 'live objects' graph at the Mark Phase.

Therefore, at the Sweep & Compact Phase the GC could reclaim their memory and update the remaining objects' (Object 2, Object 3 and Object 5) addresses.

In addition, the GC updates the 'Next Object Pointer' to point to the next available place in the Heap memory.

Sweep & Compact Phase - Performance

In terms of performance, moving objects in memory (copying from one place to another) is a very expensive operation.

In cases that the Managed Heap contains many live objects that the GC needs to move, the Sweep & Compact phase would cause a very bad performance hit.

Thus, the GC uses the Generational Model, in which it divides the Managed Heap into regions – Generations, while each generation could be collected separately (partial collection) and thus limiting the amount of objects that could be moved in memory, and by that effecting directly on the performance.

Remark: The generational model also improves the Mark Phase performance, since in partial collection, the GC operates in a particular generation, which doesn't contain all the objects in the heap, thus less objects to examine.

Additional detailed information in the 'Generation' section below.

Generations

The generational GC model (a.k.a ephemeral GC) was introduced in order to improve the collection performance, which naturally effects on the entire application's performance.

The Managed Heap is divided into 3 Generations (regions in memory): Generation 0, Generation 1 and Generation 2, while every region contains relevant objects in the lifetime of the application.

Large objects also have a designated region in the heap, the Large Object Heap (LOH), and are treated differently when a garbage collection occurs. (Detailed information later on)

These generations allows the garbage collector to perform partial collections.

Partial Garbage Collections

Partial collections are performed on a single generation, which significantly reduces the number of objects the GC needs to examine in the mark phase, and to move in the sweep & compact phases.

The partial collection mechanism provides the performance improvement of the GC work.

Generational Model Theory

The generational model was created based on a simple theory:

As far as the object is newer, it is expected to die sooner, and as far as the object is older, it is expected to die later.

Simply put, temporary objects would not be in use shortly after they were created, and objects that were created early in the application lifecycle, would live longer.

Meaning there is a direct linkage between objects age, and how long they are expected to live – their life expectancy.

Each generation would contain objects according to their ages, and their life expectancy, and the GC would move objects between generations as older as they be with each collection:

1. Generation 0 – Contains the youngest objects.

2. Generation 1 – Contains mature objects.

3. Generation 2 – Contains old objects.

In the following sections I'll describe each generation separately.

Generation 0

Gen 0: Description

- Most new objects are created in Gen 0.

o Except for Large Objects. (detailed information later on)

- Gen 0 contains objects that are frequently accessed, usually together, for a short period of time.

- Gen 0 is very small.

o Its size usually starts between 256KB – 4MB.

- Performing GC on Gen 0 is very fast & efficient:

o Does not contain many objects. (it's small)

o It’s likely that all objects in Gen 0 could appear in the CPU cache, which is very fast!

Gen 0: Invoking Garbage Collection

The GC is invoked when the application is trying to allocate memory for a new object(s), and Generation 0 is full.

Meaning, a new allocation request can NOT be satisfied in Gen 0 since it's full, thus we need to clear room for the new object(s).

The garbage collector Mark, Sweep & Compact only objects in Gen 0.

This is a partial garbage collection.

Gen 0: After GC completion

- Most Gen 0 objects are collected!

o New objects are expected to die quickly.

o Most of the objects are temporary or unreferenced.

- Some objects might survive:

o These objects could be result of incorrect implementation.

o Or perhaps the GC occurred while creating temporary objects.

Gen 0: Promotion to Gen 1

- Objects that have survived GC are promoted to Gen 1.

o These objects are copied from Gen 0 to Gen 1 (as part of the sweep & compact phase).

o This indicates that they are expected to live longer.

Generation 1

Gen 1: Description

- Gen 1 contains objects that have survived 1 garbage collection.

- Objects that reached Gen 1 are still considered temporary short-lived objects.

- Gen 1 is small.

o Its size usually starts between 512KB – 4MB.

o Slightly larger than Gen 0.

- Performing GC on Gen 1 is still relatively very fast & efficient.

Gen 1: Invoking Garbage Collection

- A new allocation request can NOT be satisfied in Gen 1.

o Gen 1 space is full.

- A prior collection in Gen 0 triggered the allocation request.

o Objects are promoted from Gen 0 to Gen 1.

- The garbage collector Mark, Sweep & Compact only objects in Gen 1.

o This is still a partial garbage collection.

Gen 1: After GC completion

- Most Gen 1 objects are collected!

o These objects weren’t collected in Gen 0, but most likely would be collected in Gen 1.

- Some objects might survive.

Gen 1: Promotion to Gen 2

- Objects that have survived Gen 1 GC are promoted to Gen 2.

o This indicates that they are now considered old objects.

- Mid-life Crisis:

o Mid-life crisis occur when temporary objects that should have been collected in Gen 1, would be prompt to Gen 2, and die shortly afterwards.

o We need to ensure that temporary objects won’t survive Gen 1.

Generation 2

Gen 2: Description

- Gen 2 contains objects that have survived at least 2 GC.

- Gen 2 objects are considered old.

o They won’t be collected soon.

- Gen 2 size:

o Its size is not limited.

o It can use the entire memory space of the .Net process.

§ Up to 2 GB on a 32-bit system.

§ Up to 8 TB on a 64-bit system.

Gen 2: Invoking Garbage Collection

- In Gen 2, the GC is invoked based on dynamic thresholds, and not based on new allocation requests that can't be satisfied. (As in Gen 0 and Gen 1)

o Gen 2 size is extremely high, and waiting for it to be filled would cause severe performance issues.

- Performing GC in Gen 2 is far more expensive than in Gen 0 and Gen 1.

o In Gen 2 the GC performs full garbage collection.

- Performing GC in Gen 2 is inefficient since most of the objects are still referenced and probably won't be collected.

Partial GC – Objects in different generations

As described in the previous sections, partial collections are performed on a single generation, which significantly reduces the number of objects the GC needs to examine in the mark phase, and to move in the sweep & compact phases.

But what happens when an object from Gen 2 (or Gen 1) is referencing an object in Gen 0 for example. Meaning an 'older object' is referencing a 'younger one', and they DO NOT reside in the same generation…?

This would mean that the GC can’t collect this object and need to promote it to the next generation.

However, in partial collections, in the Mark Phase, the GC only marks objects in the generation it's collecting (in order to improve performance).

Meaning the GC doesn't know that the object in Gen 0 has a reference from an object in Gen 2, since it doesn’t examined Gen 2 at all, and not using any GC roots from this generation.

This could lead to a severe error since the GC might collect the younger object.

Fortunately, the JIT-Compiler comes to aid.

When the JIT-Compiler compiles a method that might have a reference between objects in different generations, it adds relevant information to an internal table - Card Table.

The GC uses this table in the Mark Phase and adds all relevant objects to its GC Roots set, and by that, insures that all relevant objects would be examined correctly.

The Large Object Heap (LOH)

Description

The Large Object Heap (LOH) is a designated region in the Heap used to store large objects.

Large objects are objects lager than 85 KB. (Most likely, XML, JSON, large byte arrays and so on)

The LOH was created in order to avoid the Sweep & Compact phases of the Garbage Collection, since this would mean copying large objects in memory (promoting them between generations), which is a severe performance hit.

Thus, large objects are allocated directly in the LOH.

GC Implementation

The original Sweep & Compact implementation is not suited for the LOH, and another algorithm is used for managing and collecting large objects.

This algorithm uses a Linked List that maintains a free 'memory blocks' in the LOH.

For every new allocation request, the GC needs to search in the list for a suitable free memory block. When an object is dead, the de-allocation operation is actually returning the object's memory block to the linked list.

This implementation complicates the GC works and causes additional performance cost, however it’s still preferable to copying large objects in memory.

Invoking GC in the LOH

Large objects are considered part of Generation 2.

When Generation 2 reached a dynamic threshold and needs to be collected, it also invokes a collection on the LOH, which is a full garbage collection.

In addition, when a dynamic threshold is reached in the LOH, Generation 2 is collected as well.

LOH Fragmentation

Since the original Sweep & Compact phases are not implemented for the LOH, memory fragmentation appears in the LOH, which again cause additional performance cost.

GC Modes

As this article grew longer, I’ve decided to extract the GC Modes topic to a different post, for the reader convenience:

.Net Garbage Collection Modes

Finalization

As this article grew longer, I’ve decided to extract the Finalization Mechanism & the Dispose Pattern topics to a different post, for the reader convenience:

.Net GC – FinalizationMechanism & the Dispose Pattern

Dispose Pattern

As this article grew longer, I’ve decided to extract the Finalization Mechanism & the Dispose Pattern topics to a different post, for the reader convenience:

.Net GC – FinalizationMechanism & the Dispose Pattern

Summary

In this article we extensively explored the .Net Garbage Collection, its features, algorithm, its role in .Net process, and its great contribution to the performance of our applications.

We learned:

- How the GC manages the heap memory…

- How the GC allocates and de-allocates objects…

- When the GC collects memory…

- How the GC algorithm (Mark, Sweep & Compact) works…

- How & Why the GC manages generations…

- How could we choose a proper GC mode for our application…

- How & Why the finalization mechanism works…

- How & Why to use the dispose pattern…

- And many more…

To conclude, I would say that after reading this article you should have a deep understanding on the GC algorithm, different phases, its work and its great contribution to the performance and credibility of .Net applications.

Furthermore, we should appreciate its enormous role in the development cycle, since developers shouldn't consider memory management at all and could focus only on business (and infrastructure) development.

Related articles:

1. .Net GC – Finalization Mechanism & the Dispose Pattern

2. .Net Garbage Collection Modes

3. .Net Type Internals

4. Sync Block Index (SBI) \ Object Header Word

5. .Net debugging – Get started

The End

Hope you enjoyed!

Appreciate your comments…

Yonatan Fedaeli

.Net Garbage Collection (GC)

No comments: