
The cost of modeling


Language of the problem

In one of the previous tales, we discussed the language of the problem.

Essentially, that is what Domain-Driven Design is about - carefully listening to another human being and capturing the concepts (along with related activities) in the form of a model.

Understanding comes first: the mental model we build in our minds so that we are able to communicate effectively (or at least we should be).

Another representation might be a model specified using our favorite programming language. Then we refer to it as "code".

I mentioned one observation in "Essentially bounded, accidentally unlimited":

Conclusion 🔍

There is no single solution, but there is a single problem.

Based on this premise, we are able to create various models to tackle the essential complexity, the one that never disappears (provided that we are solving the right problem!).

Listen to the expert

The last part of "Language of the problem" contains a brain teaser, or better, a modeling teaser.

Given the model:

using System.Collections.Generic;
using System.Linq;

public class Emails
{
  private readonly List<Email> _emails; // details

  public Emails(List<Email> emails) =>
    _emails = emails; // ctor

  public List<Email> GetInvalidEmails() => // is it ANY email?
    _emails.Where(email => !email.IsValid).ToList();
}

Is the method GetInvalidEmails telling the whole truth? Its name clearly states we can expect only invalid emails, but the return type doesn't indicate that.

One can easily notice that Email contains an IsValid property. This might tell us the story that all the emails returned would have IsValid set to false.

But wait, given a collection of invalid emails, what could one do with them?

Let's assume that our expert says that in this marathon, oops, I was about to say sprint, we only need to "print" them.

That's not fancy.

Unfortunately, a collection of invalid emails, represented by the compile-time model List<Email> with the runtime IsValid indication set to false, offers a lot more than printing, e.g. a very cryptic (to our expert) Aggregate operation.
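To make the "a lot more" tangible, here is a minimal sketch. The Email record and its Address member are assumptions made for illustration (the post only shows IsValid), and the class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical Email shape, assumed for illustration only.
public record Email(string Address, bool IsValid);

public static class LeakyInvalidEmails
{
    // The only thing the expert asked for: "print" the invalid emails.
    public static string Print(List<Email> invalidEmails) =>
        string.Join(", ", invalidEmails.Select(email => email.Address));

    // List<Email> also allows operations the expert never mentioned,
    // such as folding the address lengths into one cryptic string.
    public static string CrypticAggregate(List<Email> invalidEmails) =>
        invalidEmails.Aggregate("", (acc, email) => acc + email.Address.Length);

    // Nothing in the type stops a valid email from joining the "invalid" list.
    public static void Corrupt(List<Email> invalidEmails) =>
        invalidEmails.Add(new Email("fine@example.com", IsValid: true));
}
```

None of these methods is wrong from the compiler's point of view, which is exactly the problem: the type says "any list of any emails", while the expert said "invalid emails that we print".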

Can we do better?

Domain out of balance

In my observation, people tend to watch shorter and shorter videos. If a content creator publishes a video longer than 20 minutes, there is a high probability it won't get as many views as an 8-minute video (common sense, isn't it?).

We are all busy. We need to be fast, efficient, productive.

Even with the support of the greatest IDEs on the market, it's "still" quicker to leave List<Email> as a representation of invalid emails.

Physical typing is just the tip of the iceberg. There's an even bigger cost - the mental, or psychological, one.

By carefully listening to our imaginary expert, we know what should be expressed in the "code", and it is not a list of emails with an IsValid boolean flag.

To me it looks as if we got out of balance with our domain. We no longer align with the mental model presented by the expert.

And what is worse, it's easy to get used to such an approach.

Constraints liberate

What if, just as a mental experiment, we created a new concept called, wait for it, InvalidEmails?

public class InvalidEmails { }

Let's think of it in terms of "Organization-Driven Design" - and imagine that InvalidEmails is a local expert in working with invalid emails.

What would he be responsible for?

For now, it's just printing the invalid emails in the form of text.

We could probably use the well-known ToString method, right? Technically speaking yes, but as an expert, he doesn't know what "ToString" means.

But when you ask him to print the invalid emails, he would eagerly do so.

public class InvalidEmails
{
   public string Print() =>
      throw new System.NotImplementedException(); // implementation goes here...
}

What to print if there's nothing available?

public class InvalidEmails
{
   private readonly List<InvalidEmail> _invalidEmails; // received from ctor

   public string Print() =>
      throw new System.NotImplementedException(); // implementation goes here...
}

We introduced another "concept" - InvalidEmail. There is no longer ANY Email with a runtime IsValid property set to false, but a very precise specification.

Both for the responsibility and for the related contract details.
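Filled in, the two concepts could look like the sketch below. The post never shows what an InvalidEmail carries, so RawValue and Reason are assumptions, as is the choice to print an empty collection as an empty string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical members: the raw text and the rejection reason are
// assumptions made for this sketch, not part of the original model.
public sealed class InvalidEmail
{
    public string RawValue { get; }
    public string Reason { get; }

    public InvalidEmail(string rawValue, string reason)
    {
        RawValue = rawValue ?? throw new ArgumentNullException(nameof(rawValue));
        Reason = reason ?? throw new ArgumentNullException(nameof(reason));
    }
}

public sealed class InvalidEmails
{
    private readonly IReadOnlyList<InvalidEmail> _invalidEmails;

    public InvalidEmails(IEnumerable<InvalidEmail> invalidEmails) =>
        _invalidEmails = invalidEmails.ToList(); // defensive copy

    // The only responsibility the expert asked for: printing.
    // Assumption: an empty collection prints as an empty string.
    public string Print() =>
        string.Join("\n", _invalidEmails.Select(e => $"{e.RawValue} ({e.Reason})"));
}
```

There is no Add, no Aggregate, and no IsValid flag to check: the public surface is exactly what the expert described.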

Hold on cowboy!

I hear it (currently in my head only) - you are complicating things so much, keep it simple, you ain't gonna need it, etc.

Well, undoubtedly we introduced "more" code. "More" readable code.

Things got segregated, which by definition implies a bit more "infrastructure" - characters in the given file (or separate files).

Are we really optimizing for the number of characters? Nope, we all know it's not the case.

Should we be afraid that such code will soon become an unmaintainable piece of mud? Who knows.

Are we communicating the intentions? Yup, that's one of the points.

Just having List<Email> with the runtime property IsValid set to false as a representation of invalid emails is not enough.

The same goes for the IsValid boolean flag itself - it is a so-called status property. You could easily imagine an enum property called, you won't guess it, Status with two possible values: Valid and Invalid.

Mixed language, mixed worlds

I dare say that most of the primitive obsession examples are there because of mixing up two aspects: data and model.

Both a boolean flag and an enum are suitable for working with infrastructure components:

  • serializing them to JSON so that we can return them from the controller
  • putting them inside of a message so that we are able to send them to a queue
  • inserting them into a database

What does your model resemble - a data structure?

Shouldn't we use the right "abstractions" at various levels?

When talking to an expert, a model clearly communicates what is possible.

When talking to a database, a data structure has its own benefits.

Aren't we introducing a model-to-data (impedance) mismatch? Surely we are.
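One way to picture that mismatch is a small mapping layer between the two worlds. This is only a sketch: every type and member name below is hypothetical, and it shows the model-to-data direction only.

```csharp
using System;

// Data side: a flat, status-based structure that is easy to serialize,
// queue, or store (hypothetical names).
public enum EmailStatus { Valid, Invalid }
public sealed record EmailRecord(string Address, EmailStatus Status);

// Model side: one type per state, no status flag (hypothetical names).
public sealed record ValidEmail(string Address);
public sealed record InvalidEmail(string RawValue);

public static class EmailMapping
{
    // Crossing the boundary turns a state (a type) into a status (a value).
    public static EmailRecord ToRecord(ValidEmail email) =>
        new EmailRecord(email.Address, EmailStatus.Valid);

    public static EmailRecord ToRecord(InvalidEmail email) =>
        new EmailRecord(email.RawValue, EmailStatus.Invalid);
}
```

The mapping is the price of the mismatch: a couple of small functions at the edge, in exchange for a model that does not have to look like a database row.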

How much does it hurt?

Language cost

There's also another point on "proper" modeling - what your tools allow you to do.

Representing two cases of an email, a valid one and an invalid one, might be troublesome in some languages.

It might even be undesirable to model them separately, as such usage won't earn the magical label "idiomatic" in terms of the given language.

Given the need to express the model as closely as possible to what the experts are telling us, I would say that some languages encourage modeling, whereas others rather discourage it.

I am not stating that C# is "worse" (or something) than F#, but it would be really uncommon to see two classes for representing two separate cases of an email.

Then it would "break" one of the CUPID properties - I - idiomatic.

On the other hand, in F# it is quite common to have:

type Email =
    | Invalid of InvalidEmail
    | Valid of ValidEmail

The cost (I am not talking about a runtime, performance-wise cost) of using such a construct is practically nothing; no additional effort is required.
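For comparison, here is a sketch of one way to approximate the F# union above in C# 9 or later. It is one possible encoding rather than an established idiom, and the member names Address and RawValue are assumptions.

```csharp
using System;

// A closed hierarchy approximating the two-case union (a sketch, C# 9+).
public abstract record Email
{
    // Private constructor: only the nested cases below can derive from Email.
    private Email() { }

    public sealed record Valid(string Address) : Email;
    public sealed record Invalid(string RawValue) : Email;
}

public static class EmailDescription
{
    public static string Describe(Email email) => email switch
    {
        Email.Valid valid => $"valid: {valid.Address}",
        Email.Invalid invalid => $"invalid: {invalid.RawValue}",
        // The compiler does not treat the hierarchy as closed,
        // so a fallback arm is still needed.
        _ => throw new InvalidOperationException("Unknown email case")
    };
}
```

Compared with the three lines of F#, the C# version needs a base type, a private constructor trick, and a defensive fallback arm. That extra ceremony is the "language cost" in concrete form.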

Model states, not statuses

In our particular example, statuses are reserved for the infrastructure side - one might say they are "optimized" for it.

What about states? States are for the essential complexity, for representing the concepts available in the domain you are designing the software for.

The question is how easily your language allows you to model.

Modeling states also gives a very nice property - we are reasoning with the types we see on the screen. This means we don't need to "load" the model into our brain. We keep it externalized.

That's what I tried to convey in "The value of Value Objects" blog post.

Additionally, you are able to restrict some of the operations/responsibilities to work only with a specific state.

This is what we mean by "embedding the invariants/rules in the type".
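A minimal sketch of such a restriction follows. The Newsletter class and the rule "we only send to valid emails" are hypothetical, chosen only to show a rule living in a method signature.

```csharp
using System;

// Hypothetical state types; only their existence matters for this sketch.
public sealed record ValidEmail(string Address);
public sealed record InvalidEmail(string RawValue);

public static class Newsletter
{
    // The rule "we only send to valid emails" is part of the signature:
    // there is no IsValid check inside because none is needed.
    public static string SendTo(ValidEmail email) =>
        $"Newsletter sent to {email.Address}";
}

// Newsletter.SendTo(new InvalidEmail("oops"));
// The line above would not compile: InvalidEmail is not a ValidEmail.
```

With a status flag, the same rule would be a runtime check that every caller has to remember. With states, forgetting it is a compile-time error.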

Are you capable of paying the cost?

What is really crucial here is that this applies not only to domain/business concepts.

We are solving problems all the time, and some of them are technical in nature, which is a domain in itself.

Then the language of the domain is highly technical. Still, we are able to express it in a decent way, and the good part is that we are the experts.

Materializing the domain in the code costs - sometimes you need to put in some additional keystrokes to reveal the intention. But again, are we supposed to minimize the keystrokes?

Scattering pure data structures along the entire codebase costs too - it might seem "simpler", but isn't it deceiving to think that less "code" expresses the same intentions as an expressive language?

Maybe you are afraid that such a modeling approach will soon turn your code into a turdpile because it will leak toward undesirable places of your codebase? Watch your "layers" and pay attention to the local architecture - maybe some layers see too much? If you would like to see a metaphor for working with such a state, please check "Organization-Driven Design".

The biggest cost?

Nothing is so limiting to us as ourselves. Even if you think that such reasoning is utter nuts, give it a try.

Might it be that there are some interesting assumptions you made a long time ago that have since gotten a bit... stiff?

Challenge your mental representation (hence mental model) of what the model looks like.

Slow down, start by specifying through tests what the model should do (you don't know what I am talking about? Please check "The ambiguity of TDD").

Listen to your problem experts. If you don't have any, listen to the language, pay attention to the words, and be free with your tempo. Be free to yourself.

And the last thing.

Should you always design rich, well-encapsulated models? Yes, my zealot, always (I hope you see the irony here!).

A bonus: modeling teaser

Imagine that an expert says that a patient can be either healthy or sick.

When he or she is sick, we need to send a text message with a drug prescription.

How would you model that in your favorite language?