When I wrote previously about refactoring I introduced the clean coding concepts of creating a table of contents, removing duplicated instructions, and never remembering more than is required.
There often isn’t much disagreement from developers over removing duplicated instructions. Never remembering more than is required sometimes conflicts with style guides that ask for all variable declarations at the top of a function, but they are counteracted by use of shorter functions to limit variable scopes.
Creating a table of contents seems to be the one that causes the most disagreement, starting when the whole routine falls within a screen and raising to the highest pitch when our logic consists of a single line of code. There are heated debates over refactorings that encapsulate simple logic statements within a function that is only called once.
But sometimes, it is still the right thing to do.
Why refactor the smallest details? Readability, Reuse, Re-requirements
Let’s be clear about perspective. What we’re arguing for is very small individual improvements to code. Taken in one at a time, they are insignificant. But good programming practices pay off in aggregate, compounding their value every time the code is read, reused, or requirements change across an entire network of connected applications. They also pay off in the writing, becoming more natural and easier to apply with consistent practice. Yes, there is a cost, but it is one worth paying, at least in regular small installments every time you revisit a section of code.
Students and beginners often fail to see the point, feeling that this addition of helper functions makes the code unreadable or at least unwieldy. It is true that at the byte code level a function is merely a GOTO statement, and terrible spaghetti code can be created via a breeding ground of many functions. But academic instructors also often leverage very plain text editing software in coursework, which exacerbates the notion that nested functions can be hard to read. In modern Integrated Development Environments a.k.a IDEs (by which I include Vim, Emacs and other powerful “text” editors with all their related developer plugins), you can jump or peek at function declarations easily. If the function is short and only called once, declare it directly under or above the caller, and it can appear on the same screen.
But beyond this, the student is really forgetting in business you will not be in the details of your application most of the time. You will be reading, rewriting, or re-architecting it to fix and adapt it more often. You’ll have other people’s code to read and they will be reading yours. For these reasons you need to optimize your code for reading at a higher level. You use automated tests to ensure lower level functions and components operate as expected without actually reading the logic directly. This is why books like Clean Code make a point of asking for what students feel are “extremely” short methods.
Finally, a function should have a single purpose and be named correspondingly. This is invaluable for knowing where to look for bugs, where code changes should go, and when your design is not adequate for the task. In introductory learning programs this is not important because specifications are relatively clear, programs rarely continue to be used after the assignment is due, and requirements almost never change. In business you will often learn the specifications by making a prototype and receiving unclear feedback, the programs will be used longer than you are at the company, and the requirements will change rapidly and often. When you are learning bugs are quashed in single all night coding session, they don’t wake you up at 3 am costing your company thousands or millions of dollars for every minute they go unfixed. Bugs in homework assignments won’t cost you your job and reputation if you don’t have a speedy fix.
When to refactor the smallest details? Comments, Calculations, Concrete implementations
The most obvious code smell indicating you haven’t followed the concept of creating a table of contents to its logical extremity, is the presence of comments within a function.
Every time you write a comment, you should grimace and feel the failure of your ability of expression.
– Bob Martin
While comments are often spread liberally by college professors throughout introductory code, they are actually not very appropriate for long lived business code. Comments often rot, becoming less and less relevant, and sometimes become just plain wrong, as the code that they describe changes. Function names don’t suffer from this to nearly the same degree. If nothing else the parameters and return values self document at least some of the players involved. Comments and function names are not equivalent documentation tools. In IDEs, renaming a function is a much more consistent and fluid affair. In Visual Studio, renaming the function is easy as right clicking a function name a picking the rename option.
Comments have a place in scaffolding pseudo code during development, or sometimes at the top of functions declarations to describe high level usage (preferably with tooling that provides automatic documentation export and/or Intellisense), but within a function they are almost always inappropriate. There they are often indicating that there is a correct higher level description of the component logic, which is a perfect way to choose a function name to encapsulate that logic.
Another easy way to identify candidates for extracting methods is when there is an intermediate value that must be calculated. In many algorithms you’ll need a few values calculated that are used to get a final result e.g. half the length of a collection in sorting or searching, or, as we will see in later example, calculating damage in a game. Even if these calculations are simple, we can cut down on the visual noise of the primary logic block by placing them in a separate function. Extracting these details can also limit the lifetime of related variables, making their usage both easier to understand and potentially more performant as that memory is released when the variables leave scope.
A related identifier is the use of concrete, particularly primitive values, as the result of calculations. Extracting single lines from a method can make the use of polymorphic behavior much easier, by pushing value type details lower, so they can be dealt with during refactoring as separate concerns from the overall logic of the application.
How to refactor the smallest details? Practice, Practice, Practice
Think of a function as instructions to get to a local store. If possible, you should be working at the level of detail that is appropriate for the reader. When describing the whole trip, it should be high level details: go down the street, find it on the right. Since this is a program and not a person, we have to be explicit in each step, but those details can be hidden in functions. Each of those can also be driven down further into other methods, until we have changes on language primitives. So going down the street might be described by: go outside, pick a mode of transportation, and travel south on main street; while go outside might be further described by actions like: get dressed, grab your keys, open your door, step outside, lock your door.
Consider this lightly modified example from these Reddit comments:
Before Refactoring
receivedHit(evt) { //Calculate the damage to apply int damage = evt.getDamage() - playerArmor; updatePlayerHealth(damage) }
After Refactoring
receivedHit(evt) { const damage = calculateDamage(evt) updatePlayerHealth(damage) }
In this example we see all 3 of the Whens of method extraction refactorings we previously discussed. There is a comment that describes the line below it. There is a calculation whose value is used later in the logic. Finally, the values are concrete primitive types.
If this is for a game we will write once and then never use again, discarding it after the assignment or code jam, then the first set of code is appropriate. But if the code is long lived, needing to be maintained and adapted, perhaps because your code jam project has become a viral hit, the second is much better, although it could be improved further. We self documented the fact we are calculating the damage we will apply, removed the chirp of noise given off by the math of calculation from the high level behavior, and we replaced our concrete type with an implicit reference. Now if our requirements change and we have damage types and resistances (e.g. fire, acid, etc.) none of the code in receivedHit() will change. We can just change the contents of calculateDamage() and modify the signatures of the other related functions to pass the more detailed Damage object.
This is the scenario you will find yourself in when writing business software more often than not. Therefore, practicing writing software this way until it is a habit is a good career move. Keep it up and you’ll find it is both easier and natural to have “extremely” short self documenting methods.
Before you go
I’d like to get your feedback! Please take a moment to post a comment below to let me know your thoughts on this post. Anything you think should be introduced to a beginner or junior developer ASAP? Or is there a facet to the business of software I’ve overlooked? I would be very happy to hear about either.
Also, I putting together a very focused mailing list on the topics that I regularly cover on this blog. If this post was useful to you, please sign up here to get future emails from me on this and related subjects. Thanks!