
Part 2 of writing an experimental code analyzer and sort of interpreter for C#

This is a continuation of my blog series about writing a hybrid C# code analyzer and interpreter.
Compared to part one, I renamed the variable references from "TrackedVariableReference" to "EvaluatedObjectReference", and a variable is now known as an "EvaluatedObject" or one of the subtypes derived from it.

This part focuses on simulating code execution. At the core is the object that stores the current execution state, which is of type "CodeEvaluatorExecutionState". The most important things stored in this state object are the execution frames, of type "CodeEvaluatorExecutionFrame". Each frame represents one method call and stores everything local to that call: the local "this" reference, the references to objects created inside the method, the parameters passed by the caller of the method, and a special slot for a reference that holds the result of an expression. This last slot is important: when we access a member of a variable, the member is stored in this slot, and when we call a method, its result is stored there as well. Basically, any operation that returns some sort of object stores it in this slot.

The structure of the "CodeEvaluatorExecutionFrame" looks like this:

using System.Collections.Generic;
using Microsoft.CodeAnalysis;

public class CodeEvaluatorExecutionFrame
{
    private readonly EvaluatedObjectReference _returningMethodParameters = new EvaluatedObjectReference();
    private EvaluatedObjectReference _memberAccessResult;

    #region Public Properties

    /// <summary>
    /// Gets or sets the accessed reference (the expression result slot).
    /// </summary>
    /// <value>
    /// The accessed reference.
    /// </value>
    public EvaluatedObjectReference MemberAccessResult
    {
        get { return _memberAccessResult; }
        set { _memberAccessResult = value; }
    }

    /// <summary>
    /// Gets or sets the current method.
    /// </summary>
    /// <value>
    /// The current method.
    /// </value>
    public EvaluatedMethodBase CurrentMethod { get; set; }

    /// <summary>
    /// Gets or sets the current syntax node.
    /// </summary>
    /// <value>
    /// The current syntax node.
    /// </value>
    public SyntaxNode CurrentSyntaxNode { get; set; }

    /// <summary>
    /// Gets the local variables.
    /// </summary>
    /// <value>
    /// The local variables.
    /// </value>
    public List<EvaluatedObjectReference> LocalReferences { get; } = new List<EvaluatedObjectReference>();

    /// <summary>
    /// Gets the parameters passed by the caller, keyed by an integer index.
    /// </summary>
    /// <value>
    /// The passed method parameters.
    /// </value>
    public Dictionary<int, EvaluatedObjectReference> PassedMethodParameters { get; } =
        new Dictionary<int, EvaluatedObjectReference>();

    /// <summary>
    /// Gets the returning method parameters.
    /// </summary>
    /// <value>
    /// The returning method parameters.
    /// </value>
    public EvaluatedObjectReference ReturningMethodParameters
    {
        get { return _returningMethodParameters; }
    }

    /// <summary>
    /// Gets or sets the this reference.
    /// </summary>
    /// <value>
    /// The this reference.
    /// </value>
    public EvaluatedObjectReference ThisReference { get; set; }

    #endregion
}
When we first start simulating the execution of a program, an initial execution frame is generated from the method that we passed to the simulator as a starting point. When a method is called inside the code, another execution frame is pushed onto the execution state object and becomes the active execution frame. From then on, all newly created object references are stored in the new frame, and newly created objects are only accessible through references stored in that frame.
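
To make the frame handling more concrete, here is a minimal sketch of how the execution state could manage its frames. This is not the actual "CodeEvaluatorExecutionState" code: the names SketchReference, SketchFrame, SketchExecutionState, EnterMethod and DeclareLocal are made up for illustration, and I assume the frames are kept in a plain stack.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for EvaluatedObjectReference (hypothetical shape).
public class SketchReference
{
    public string Name { get; set; }
    public object Target { get; set; }
}

// Simplified stand-in for CodeEvaluatorExecutionFrame.
public class SketchFrame
{
    public string MethodName { get; set; }
    public List<SketchReference> LocalReferences { get; } = new List<SketchReference>();
}

// Hypothetical execution state: a stack of frames whose top is the active frame.
public class SketchExecutionState
{
    private readonly Stack<SketchFrame> _frames = new Stack<SketchFrame>();

    public SketchExecutionState(string entryMethodName)
    {
        // The initial frame is created from the entry-point method.
        _frames.Push(new SketchFrame { MethodName = entryMethodName });
    }

    public SketchFrame ActiveFrame => _frames.Peek();

    public int Depth => _frames.Count;

    // Called when the simulated code invokes a method: the new frame becomes active.
    public SketchFrame EnterMethod(string methodName)
    {
        var frame = new SketchFrame { MethodName = methodName };
        _frames.Push(frame);
        return frame;
    }

    // New references always land in the active frame.
    public SketchReference DeclareLocal(string name, object target)
    {
        var reference = new SketchReference { Name = name, Target = target };
        ActiveFrame.LocalReferences.Add(reference);
        return reference;
    }
}
```

After EnterMethod is called, DeclareLocal only touches the new frame, so the caller's frame keeps exactly the references it had before the call.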

When the end of a called method is reached, the current execution frame is removed from the execution state object and discarded. If the method returns an object, it is stored in the expression result slot of the previous frame, the one from which the method was called. This result can then be used by another expression, which produces another result, and so on. An expression result is effectively "consumed" by the next expression in a chain of expressions. For example, we can chain multiple method calls, each one calling a method on the result returned by the previous call.
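
The way a result travels back to the caller and gets consumed by the next call can be sketched roughly like this. Again, this is only an illustration under my own assumptions: ResultFrame, ResultFlowState, ReturnFromMethod and SimulateCall are hypothetical names, and plain .NET values stand in for the evaluated objects that the real simulator passes around.

```csharp
using System;
using System.Collections.Generic;

public class ResultFrame
{
    public string MethodName { get; set; }

    // Plays the role of the expression result slot (MemberAccessResult in the real frame).
    public object ExpressionResult { get; set; }
}

public class ResultFlowState
{
    private readonly Stack<ResultFrame> _frames = new Stack<ResultFrame>();

    public ResultFrame ActiveFrame => _frames.Peek();

    public void EnterMethod(string methodName)
    {
        _frames.Push(new ResultFrame { MethodName = methodName });
    }

    // Removes the finished frame and hands its return value to the caller's slot.
    public void ReturnFromMethod(object returnValue)
    {
        _frames.Pop();
        if (_frames.Count > 0)
        {
            ActiveFrame.ExpressionResult = returnValue;
        }
    }

    // Simulates one call in a chain: the previous result is consumed as the receiver.
    public void SimulateCall(string methodName, Func<object, object> body)
    {
        var receiver = ActiveFrame.ExpressionResult;
        ActiveFrame.ExpressionResult = null; // the slot has been consumed
        EnterMethod(methodName);
        var result = body(receiver);
        ReturnFromMethod(result);
    }
}
```

With this sketch, simulating a chain such as " hello ".Trim().ToUpper() means placing " hello " in the slot of the calling frame and calling SimulateCall twice; each call consumes the slot and refills it with its own result.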

A constructor or a property is treated as a normal method call. For a constructor we do not pass the "this" reference; instead we create it from scratch, and the call always returns a result: the newly created object. Properties are treated like normal methods: the "set" accessor is a method with one predefined, standard parameter, and the "get" accessor is a method that always returns something.
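
Here is a small sketch of what treating constructors and property accessors as method calls could look like. None of these names (SimObject, SimCallFrame, SimCalls and its methods) exist in the real project, and I assume here that the "set" value is passed at parameter index 0 and that a property simply reads or writes a backing field.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical simulated object: just a type name and named fields.
public class SimObject
{
    public string TypeName { get; set; }
    public Dictionary<string, object> Fields { get; } = new Dictionary<string, object>();
}

// Hypothetical, heavily reduced execution frame.
public class SimCallFrame
{
    public SimObject ThisReference { get; set; }
    public Dictionary<int, object> PassedMethodParameters { get; } = new Dictionary<int, object>();
    public object Result { get; set; }
}

public static class SimCalls
{
    // Constructor: no "this" is passed in; it is created from scratch
    // and is always the result of the call.
    public static SimCallFrame InvokeConstructor(string typeName, Action<SimCallFrame> body, params object[] args)
    {
        var frame = new SimCallFrame { ThisReference = new SimObject { TypeName = typeName } };
        for (var i = 0; i < args.Length; i++)
        {
            frame.PassedMethodParameters[i] = args[i];
        }
        body(frame);
        frame.Result = frame.ThisReference;
        return frame;
    }

    // "set" accessor: a method with a single predefined parameter (index 0 in this sketch).
    public static SimCallFrame InvokeSetter(SimObject target, string backingField, object value)
    {
        var frame = new SimCallFrame { ThisReference = target };
        frame.PassedMethodParameters[0] = value;
        target.Fields[backingField] = frame.PassedMethodParameters[0];
        return frame;
    }

    // "get" accessor: a method without parameters that always produces a result.
    public static SimCallFrame InvokeGetter(SimObject target, string backingField)
    {
        var frame = new SimCallFrame { ThisReference = target };
        frame.Result = target.Fields[backingField];
        return frame;
    }
}
```

The point of the sketch is that all three cases go through the same frame shape as an ordinary method call; only the way "this", the parameters and the result are filled in differs.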

Data is passed around inside the simulator as objects of type "EvaluatedObject" or one of its subtypes, as I mentioned at the beginning. For each object we only store the fields, as normal references. The structure of an object looks like this:

using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;

public class EvaluatedObject
{
    #region Public Methods and Operators

    /// <summary>
    /// Pushes the history.
    /// </summary>
    /// <param name="expression">The expression.</param>
    /// <param name="executionFrame">The execution frame.</param>
    public void PushHistory(SyntaxNode expression, CodeEvaluatorExecutionFrame executionFrame)
    {
        // Note: the recorded node comes from the frame; the expression parameter is not used here.
        var evaluatedObjectHistory = new EvaluatedObjectHistory
        {
            SyntaxNode = executionFrame.CurrentSyntaxNode
        };
        _history.Add(evaluatedObjectHistory);
    }

    #endregion

    #region Constructors and Destructors

    public EvaluatedObject(EvaluatedTypeInfo typeInfo, List<EvaluatedObjectReference> fields)
    {
        ObjectFactory.BuildUp(this);
        _typeInfo = typeInfo;
        _fields.AddRange(fields);
    }

    protected EvaluatedObject(EvaluatedTypeInfo typeInfo)
    {
        ObjectFactory.BuildUp(this);
        _typeInfo = typeInfo;
    }

    #endregion

    #region Fields

    protected readonly List<EvaluatedObjectHistory> _history = new List<EvaluatedObjectHistory>();
    protected readonly List<EvaluatedObjectReference> _fields = new List<EvaluatedObjectReference>();
    protected EvaluatedTypeInfo _typeInfo;

    #endregion

    #region Public Properties

    /// <summary>
    /// Gets the member variables: the instance fields plus the static fields shared through the type.
    /// </summary>
    /// <value>
    /// The member variables.
    /// </value>
    public virtual IReadOnlyList<EvaluatedObjectReference> Fields
    {
        get { return _fields.Union(_typeInfo.SharedStaticObject._fields).ToList(); }
    }

    /// <summary>
    /// Gets the history.
    /// </summary>
    /// <value>
    /// The history.
    /// </value>
    public virtual List<EvaluatedObjectHistory> History
    {
        get { return _history; }
    }

    /// <summary>
    /// Gets or sets the parent heap.
    /// </summary>
    /// <value>
    /// The parent heap.
    /// </value>
    public IEvaluatedObjectsHeap ParentHeap { get; set; }

    /// <summary>
    /// Gets the type information.
    /// </summary>
    /// <value>
    /// The type information.
    /// </value>
    public virtual EvaluatedTypeInfo TypeInfo
    {
        get { return _typeInfo; }
    }

    #endregion
}
The rest of the object data, such as methods and properties, is stored in a common place: the object's type. So for each object we have a type definition called "EvaluatedTypeInfo". This type stores all the methods that an object has, plus links to the base types and interfaces. Its structure looks like this:

using System.Collections.Generic;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public class EvaluatedTypeInfo : EvaluatedMember
{
    private readonly EvaluatedStaticObject _sharedStaticObject;

    public EvaluatedTypeInfo()
    {
        _sharedStaticObject = new EvaluatedStaticObject(this);
    }

    public EvaluatedStaticObject SharedStaticObject
    {
        get { return _sharedStaticObject; }
    }

    #region Fields
    #endregion

    #region Public Properties

    /// <summary>
    /// Gets all fields.
    /// </summary>
    /// <value>
    /// All fields.
    /// </value>
    public List<EvaluatedField> AllFields { get; } = new List<EvaluatedField>();
    public List<EvaluatedMethod> AllMethods { get; } = new List<EvaluatedMethod>();
    public List<EvaluatedProperty> AllProperties { get; } = new List<EvaluatedProperty>();
    public List<EvaluatedTypeInfo> BaseTypeInfos { get; } = new List<EvaluatedTypeInfo>();

    /// <summary>
    /// Gets the constructors.
    /// </summary>
    /// <value>
    /// The constructors.
    /// </value>
    public List<EvaluatedConstructor> Constructors { get; } = new List<EvaluatedConstructor>();
    public List<EvaluatedField> Fields { get; } = new List<EvaluatedField>();
    public Dictionary<int, EvaluatedTypeInfo> GenericTypeInfos { get; } = new Dictionary<int, EvaluatedTypeInfo>();
    public bool IsDelegateRelatedType { get; set; }
    public bool IsGenericDefinition { get; set; }
    public bool IsGenericRealization { get; set; }
    public bool IsInterfaceType { get; set; }
    public bool IsReferenceType { get; set; }
    public bool IsValueType { get; set; }
    public List<EvaluatedMethod> Methods { get; } = new List<EvaluatedMethod>();
    public List<MemberDeclarationSyntax> NamespaceDeclarations { get; } = new List<MemberDeclarationSyntax>();
    public List<EvaluatedProperty> Properties { get; } = new List<EvaluatedProperty>();
    public List<UsingDirectiveSyntax> UsingDirectives { get; } = new List<UsingDirectiveSyntax>();

    #endregion
}
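
The type definition keeps both "Methods" and "AllMethods", plus the links to the base types. How "AllMethods" gets filled is not shown here, so the following is only one plausible sketch: a list that starts with the type's own methods and then adds whatever the base types contribute. TypeInfoSketch and CollectAllMethods are hypothetical names, and methods are reduced to plain names to keep the example short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, reduced version of a type definition: methods are just names here.
public class TypeInfoSketch
{
    public string Name { get; set; }
    public List<string> Methods { get; } = new List<string>();
    public List<TypeInfoSketch> BaseTypeInfos { get; } = new List<TypeInfoSketch>();

    // Own methods come first; a name already declared closer to this type
    // hides the same name coming from a base type.
    public List<string> CollectAllMethods()
    {
        var result = new List<string>(Methods);
        foreach (var baseType in BaseTypeInfos)
        {
            foreach (var method in baseType.CollectAllMethods())
            {
                if (!result.Contains(method))
                {
                    result.Add(method);
                }
            }
        }
        return result;
    }
}
```

Matching by name alone ignores overloads, so a real implementation would have to compare full signatures instead.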
Another thing worth mentioning is that when we access a method, it is wrapped in a special "delegate" object. This object can be passed around just like any other object, but the method stored inside it can also be invoked. It also stores a reference to the object from which the method was retrieved.
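
A rough sketch of such a wrapper could look like the code below. SketchObject, SketchMethod and SketchDelegate are illustrative names only, and the method body is modelled as a plain Func, which is an assumption made to keep the example runnable.

```csharp
using System;

// Hypothetical stand-in for an evaluated object.
public class SketchObject
{
    public string Name { get; set; }
}

// Hypothetical stand-in for an evaluated method: a name plus a body
// that receives the object it is invoked on.
public class SketchMethod
{
    public string Name { get; set; }
    public Func<SketchObject, object> Body { get; set; }
}

// The wrapper is itself an object, so it can be stored and passed around,
// but it also remembers the method and the object the method was retrieved from.
public class SketchDelegate : SketchObject
{
    public SketchDelegate(SketchObject target, SketchMethod method)
    {
        Target = target;
        Method = method;
        Name = "delegate:" + method.Name;
    }

    public SketchObject Target { get; }

    public SketchMethod Method { get; }

    // Invokes the wrapped method on the remembered target.
    public object Invoke()
    {
        return Method.Body(Target);
    }
}
```

Because SketchDelegate derives from SketchObject, it fits anywhere a normal object reference is expected, and whoever receives it can still call Invoke later without knowing the original target.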

Each object type definition also contains a special static object that holds all the static references for that type. This static object is shared by all objects of that type.
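
The effect of sharing can be shown with a reduced sketch. FieldRef, TypeSketch and InstanceSketch are hypothetical stand-ins for "EvaluatedObjectReference", "EvaluatedTypeInfo" and "EvaluatedObject"; the Fields property mirrors the Union call used in the "EvaluatedObject" class above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a field reference.
public class FieldRef
{
    public string Name;
    public object Value;
}

// Hypothetical type definition: one list of static references per type.
public class TypeSketch
{
    public string Name { get; set; }
    public List<FieldRef> StaticFields { get; } = new List<FieldRef>();
}

public class InstanceSketch
{
    private readonly List<FieldRef> _fields = new List<FieldRef>();

    public InstanceSketch(TypeSketch type, params FieldRef[] fields)
    {
        Type = type;
        _fields.AddRange(fields);
    }

    public TypeSketch Type { get; }

    // Instance fields followed by the static fields shared through the type.
    public IReadOnlyList<FieldRef> Fields
    {
        get { return _fields.Union(Type.StaticFields).ToList(); }
    }
}
```

Two instances of the same TypeSketch each have their own instance fields, but both see the very same static FieldRef, so a change made through one instance is visible through the other.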

That's about it for now. In part 3 I will explain how to handle the actual simulation of code execution.
