Pointers, loose coupling, indirection and the interesting link between them, with some surprising hidden pitfalls
Lately at work I have been dealing with various services, maybe one or two microservices. I also worked a bit on an open source project, Entity Framework Core. And then I remembered pointers, which I learned about in my first year at university.
Now you may ask, what do these things have in common? Let's start with pointers. Usually in programming you can access a resource directly from the code. This resource can be a number, a string or just about anything else. But when you use a pointer, you don't access the resource directly. The pointer is pretty much a number that only has meaning in a context: it is the address where the resource lives.
To access that resource, you need to do a bit of translation. You don't access it directly in the code anymore. You have to figure out where the pointer points to and then see what value was last assigned to that place. Only then do you know the actual value. So there is a kind of jump in the normal workflow when accessing a resource. That resource can also be shared and accessed through multiple pointers at the same time. The same is true of references. And you can have multiple levels of indirection. The application becomes more "spread out" once you start using pointers and references. You need to zoom out a bit more to see the bigger picture.
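Here is a minimal sketch of that sharing in Python. Python has no raw pointers, so I am using references to a list instead, and all the names are made up for illustration. It shows two references to one resource, plus a second level of indirection.

```python
# A one-element list plays the role of the shared resource.
balance = [100]

# Two names that refer to the same list, like two pointers to one address.
alias_a = balance
alias_b = balance

# Write through one reference, then read through the other.
alias_a[0] = 75
seen_through_b = alias_b[0]

# Second level of indirection: a container that holds a reference to the resource.
holder = {"target": balance}
seen_through_holder = holder["target"][0]

print(seen_through_b, seen_through_holder)
```

Nothing on the line that reads `alias_b[0]` tells you that `alias_a` changed the value. You have to follow the references back to find out.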
Moving on to a different concept, in object oriented programming we have virtual methods and interfaces. They don't look like it, but they are actually related to pointers and references. Instead of calling a method on an object directly, the object contains a token or reference to a method that is stored somewhere else. When you call that method on the object, the runtime has to resolve the actual method based on that reference. The reference works like an index or "pointer" into a list or array which contains all the possible methods that can be called with that signature.
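To make the lookup visible, here is a rough sketch of a method table built by hand in Python. This is a simplified model and not how any specific runtime lays things out. The slot number, the table names and the `call` helper are all assumptions made for the example.

```python
# Hypothetical slot number: every type stores its "speak" method at this index.
SPEAK = 0


def animal_speak(self):
    return "..."


def dog_speak(self):
    return "Woof"


# One method table per type.
ANIMAL_TABLE = [animal_speak]
DOG_TABLE = [dog_speak]


def make_object(table):
    # The object does not contain the method, only a reference to its type's table.
    return {"table": table}


def call(obj, slot):
    # Resolved at call time: object -> table -> slot -> function.
    method = obj["table"][slot]
    return method(obj)


generic = make_object(ANIMAL_TABLE)
dog = make_object(DOG_TABLE)

print(call(generic, SPEAK), call(dog, SPEAK))
```

The call site is identical for both objects. Which function runs depends only on the table the object points to.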
Things become more complicated here, because when you call a method on an object you need to figure out which method is actually being called. The translation is harder: you have to work out what kind of object you are calling the method on and where it came from before you can understand how the program will continue. It gets harder still when the object is handed to you by a specialized mechanism which has its own logic for deciding what kind of object to provide.
The simplest example of this is the WebRequest class in the .NET Framework. It has a static Create method which returns a class derived from WebRequest, chosen based on the address you pass in. For an http address you get an http request. If the address used a different protocol than http, a different request type would be created instead. For an ftp address, for example, an ftp request would be generated.
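A minimal sketch of this kind of factory in Python looks something like this. It is not the .NET API: the `Request`, `HttpRequest` and `FtpRequest` classes and the `create_request` function are hypothetical, and the example addresses are placeholders.

```python
from urllib.parse import urlparse


class Request:
    def __init__(self, url):
        self.url = url


class HttpRequest(Request):
    pass


class FtpRequest(Request):
    pass


# Which concrete type to build for each URL scheme.
_REQUEST_TYPES = {
    "http": HttpRequest,
    "https": HttpRequest,
    "ftp": FtpRequest,
}


def create_request(url):
    # The caller asks for "a request"; the scheme decides the concrete type.
    scheme = urlparse(url).scheme.lower()
    try:
        request_type = _REQUEST_TYPES[scheme]
    except KeyError:
        raise ValueError(f"unsupported scheme: {scheme!r}") from None
    return request_type(url)


web = create_request("http://example.com/")
files = create_request("ftp://example.com/file.txt")
print(type(web).__name__, type(files).__name__)
```

The calling code only ever mentions `create_request`, so you have to look at the address and the table to know which class you are really holding.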
But this is a simple case. You can also use dependency injection mechanisms, which are responsible for providing you with actual implementations of an interface. This makes things more loosely coupled and makes your application more flexible to change. But it also makes it a bit harder to understand how everything fits together, to see the bigger picture and what your application actually does for the user. You need to "zoom out" even more to understand things at this level. And the dependency injection mechanism can return different instances of an interface based on the current context. Usually this context is the current thread, or the current web request if you are developing a website, especially in ASP.NET.
It might give you the same instance for multiple calls because they are in the same context, or a separate instance for each call. If one instance is shared within a context and used multiple times, a previous call might have changed or corrupted its state. Or you might attempt to execute incompatible actions on the same instance. For example, you might have a database repository: you get the instance and delete a record from the database, and later, when you get the same instance again, you try to read the deleted record. Sometimes you might not even be aware that you are getting the same instance, and you get an exception when you try to execute conflicting actions.
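The difference between the two lifetimes can be sketched in a few lines of Python. This is not the API of any real dependency injection library: `Container`, `resolve` and `RecordRepository` are hypothetical names. The repository keeps its records in memory, standing in for whatever per-instance state a real one would hold.

```python
class RecordRepository:
    """Hypothetical in-memory repository standing in for a database-backed one."""

    def __init__(self):
        self._records = {1: "alice", 2: "bob"}

    def delete(self, record_id):
        del self._records[record_id]

    def get(self, record_id):
        # Raises KeyError if the record was deleted on this instance.
        return self._records[record_id]


class Container:
    def __init__(self, factory, scoped):
        self._factory = factory
        self._scoped = scoped
        self._instances = {}  # context id -> cached instance

    def resolve(self, context_id):
        if not self._scoped:
            # A new instance on every call.
            return self._factory()
        # One instance per context, created on first use.
        if context_id not in self._instances:
            self._instances[context_id] = self._factory()
        return self._instances[context_id]


scoped = Container(RecordRepository, scoped=True)
first = scoped.resolve("request-1")
first.delete(1)

# Somewhere else in the same context: same instance, record already gone.
second = scoped.resolve("request-1")
try:
    second.get(1)
    outcome = "found"
except KeyError:
    outcome = "missing"

print(first is second, outcome)
```

The code that calls `second.get(1)` looks perfectly innocent on its own. The problem only shows up once you know that `second` is the same object that an earlier call already modified.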
But there is more. Applications can get so big and so loosely coupled that they are deployed as multiple separate services on various machines. Things become even more complicated here, as there is virtually no limit to how these services can be organized and depend on one another. You might have multiple instances of an application that share some common services but also have a lot of separate services of their own. Or only some of the instances might have services in common. Or you might have multiple environments, each with its own separate instances of the common services used across all the instances of the application. It can become tricky to figure out which service is actually being used.
Maybe an instance of an application does something bad which corrupts data in a service that is used by another application. Then, when that other application tries to use the data from the common service, it might crash.
And the entire system might get so big, spread across tens, hundreds or thousands of machines, that very few people have the bigger picture and can understand what it is actually supposed to do and how it does it. Fortunately I haven't had a lot of experience here, though at one point I did have to implement some services used by a system like this. They linked multiple services together behind a simple interface for the major front end websites, which made life easier for the people working on those websites, who otherwise had to figure out how to locate and use each of those services.
The funny thing is that these services are referenced by addresses. Pointers are addresses too, memory addresses to be more precise, but in the case of services we are dealing with web addresses, or URLs. And you still rely on a system to locate those addresses, but in this case it's not the memory system anymore, it's DNS and IP.
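As a small illustration of the parallel, here is a sketch of looking a service up by name and environment before calling it. The registry, the environment names and the URLs are all made up for the example, and a real system might keep this in configuration or a discovery service instead.

```python
# Hypothetical registry: (environment, service name) -> base URL.
SERVICE_REGISTRY = {
    ("staging", "accounts"): "https://accounts.staging.example.test",
    ("production", "accounts"): "https://accounts.example.test",
}


def locate(environment, service):
    # Like dereferencing a pointer: the name alone tells you nothing
    # until something translates it into an address.
    try:
        return SERVICE_REGISTRY[(environment, service)]
    except KeyError:
        raise LookupError(
            f"no {service!r} service registered for {environment!r}"
        ) from None


print(locate("staging", "accounts"))
```

The same service name resolves to a different address depending on the environment, and that address is only the first hop. DNS still has to turn the host name into a machine.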
And when the components are very loosely coupled, it is much easier to forget about the adjacent components or services that use one of them when you change it. Because of this, you can introduce breaking changes. Maybe that service is shared across many clients, and they all expect the same behavior from it. I actually ran into this issue when I changed a major service used across multiple websites. Each website passed the request to my service with 2 parameters: the name of the website and the username. The username used to have the format username@website.com. But we had to change the service to only accept usernames which did not contain the domain name. There were some websites which nobody remembered to update, because they were rarely used and forgotten by most people. When I updated the service I broke those websites, which were actually used by some key clients.
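Here is a rough sketch of what that kind of change looks like from the caller's side. The function names are hypothetical and this is not the code of the actual service. The second function shows one possible way to keep old callers working during a transition, which is an assumption on my part and not what we did.

```python
def strict_lookup(username):
    # New contract: bare usernames only.
    if "@" in username:
        raise ValueError("username must not contain a domain")
    return username


def tolerant_lookup(username):
    # Transitional variant: accept the old format and drop the domain part.
    local_part, _, _domain = username.partition("@")
    return strict_lookup(local_part)


print(tolerant_lookup("jane@website.com"), tolerant_lookup("jane"))
```

A forgotten website that still sends `jane@website.com` fails against `strict_lookup` the moment the new version is deployed, and nothing in that website's own code has changed.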
Finally, when you have so many services, a lot of effort goes into managing the communication between them. You need some kind of data transfer objects to carry data between the services, and clients to access the other services. You also need to code the "glue" which keeps everything together.
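To give an idea of what that glue looks like, here is a minimal data transfer object in Python that carries a website name and a username, like the two parameters from my story above. The class and function names are hypothetical, and JSON is just one possible wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class UserLookupRequest:
    """Hypothetical DTO sent from a website to the shared service."""

    website: str
    username: str


def to_wire(dto):
    # Sender side: turn the object into text that can cross the network.
    return json.dumps(asdict(dto), sort_keys=True)


def from_wire(payload):
    # Receiver side: rebuild the object from the text.
    return UserLookupRequest(**json.loads(payload))


sent = UserLookupRequest(website="shop", username="jane")
payload = to_wire(sent)
received = from_wire(payload)
print(payload)
```

None of this code does anything for the user. It exists only so that two services can agree on what they are saying to each other, and both sides have to change together when the shape of the data changes.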
So the bottom line is, it's generally a good thing to keep things well organized and as separated as possible inside big applications to reduce coupling. But it comes with costs that a lot of people are not always aware of. With these systems made of very loosely coupled components, it is harder to put everything together and understand how it all works and what it is actually supposed to do in the bigger picture. And it takes extra effort to manage, because you also have to create and maintain the "glue" that keeps everything working together.