At first, I thought this was going to be an article about having zero exceptions, because exceptions aren't allowed, perhaps due to a type system.
This isn't really possible in a language that doesn't have type annotations, because in such a language, you can't make any run-time assumptions. All bets are off! Anything could be undefined! Anything might throw at any time!
In a language like Scala, you can almost completely eliminate the possibility of exceptions. You just have to be disciplined about never throwing. If you always use types that represent the possible outcomes of an operation or the valid inputs a module might accept, then you're forced to handle every possibility, and you're forced to eventually converge all your code paths to produce an output value from some bounded set. The more tightly bounded, the better! (Granted, the Scala type system doesn't guarantee that a function won't throw, and some standard library functions do throw, so all isn't perfect.)
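As a minimal sketch of that idea (all names here are invented for illustration), a sealed trait gives you a bounded set of outcomes, and the compiler warns when a match doesn't cover them all:

```scala
// A bounded set of outcomes for a lookup; the caller must handle every case.
// LookupResult and its cases are hypothetical names, just to show the pattern.
sealed trait LookupResult
case class Found(name: String) extends LookupResult
case object NotFound extends LookupResult
case object AccessDenied extends LookupResult

def describe(r: LookupResult): String = r match {
  // Delete any case here and the compiler warns: "match may not be exhaustive".
  case Found(name)  => s"user: $name"
  case NotFound     => "no such user"
  case AccessDenied => "not allowed"
}
```

The tighter the outcome type, the less room there is for a code path you forgot about.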
Don't want to deal with all the cases right away? No problem, don't throw, just coerce the result to some temporary default value.
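For instance (a hypothetical sketch), an `Either`-returning parser can be collapsed to a placeholder with `getOrElse` until you're ready to handle the `Left` case properly:

```scala
// Surface the failure in the type instead of throwing.
def parsePort(s: String): Either[String, Int] =
  s.toIntOption.toRight(s"not a number: $s")

// Temporary default while the error path is still unhandled:
val port = parsePort("oops").getOrElse(8080)
```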
If you're disciplined enough in this approach, you'll eliminate all business logic exceptions. You'll still have to worry about errors in the underlying runtime, like OutOfMemoryError, but those you can defer to the operations tier.
I wrote the first version of the API I'm currently working on in that exact mindset in Scala with the Play framework. It basically boils down to everything returning `Either[Error, X]` where X is whatever the service is returning.
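In sketch form (the names are invented, not the actual API), that shape looks something like:

```scala
// Hypothetical service shape: every operation surfaces failure in its type.
case class AppError(message: String)

def findUser(id: Long): Either[AppError, String] =
  if (id == 1L) Right("alice") else Left(AppError(s"no user $id"))

// Callers compose with map/flatMap and must deal with Left explicitly:
val greeting: Either[AppError, String] = findUser(1L).map(name => s"hello, $name")
```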
I came to dislike this approach because most of the time the error doesn't happen, and it's just a bit of verbose, wasted code. So now I'm moving to Akka, where I'll allow well-defined exceptions to be thrown and handled by a supervisor.
It was nice to know errors were handled, but I came around to the opposite opinion: exceptions truly are exceptional, and an error is an instance of the exceptional taking place. Shrug, the bottom line is being disciplined in error handling. I know there are varying views on this approach, but I really like crafting method signatures that indicate what they're intended to return rather than using `Either` for everything. There are places where `Either` is great, like registration, where a bad password isn't really an "exception" but user error.
If you don't really care about the details of the unhappy path, just use Option. The real problem with exceptions is that they crash the whole program, and it's really difficult to know you've accounted for all possible exceptions. With pattern matching on an Option, Either, or Validation, you'll get a compile-time warning if you haven't accounted for all branches. Checked exceptions, like in Java, also have this property; they're just really awkward to use.
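A tiny example of that exhaustiveness check on `Option`:

```scala
def render(opt: Option[Int]): String = opt match {
  case Some(n) => s"got $n"
  case None    => "nothing"  // remove this case and the compiler warns the match may not be exhaustive
}
```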
I'm not sure what the recommended alternative to this scenario would be based upon your comment:
I like to check invariants at certain points in my code. Them not being correct is an obvious bug. I would prefer to fail in the most obvious fashion if they aren't correct, which for our production environment is an exception.
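In Scala, for example, `require` is one way to express this kind of invariant check; it throws `IllegalArgumentException` the moment the invariant doesn't hold. (A sketch with invented names, not anyone's production code.)

```scala
// Check invariants at a boundary; a violation is an obvious bug, so fail loudly.
def withdraw(balance: BigDecimal, amount: BigDecimal): BigDecimal = {
  require(amount > 0, s"amount must be positive, got $amount")       // throws IllegalArgumentException
  require(amount <= balance, s"amount $amount exceeds balance $balance")
  balance - amount
}
```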
Getting to zero exceptions is not a sensible goal (in languages where error handling is done through exceptions). Not all exceptions are bugs. What's supposed to happen when a network connection goes down?
To turn something into useful metrics there needs to be more than zero of it in the first place. So maybe the right way of putting it would be to say "Don't just log bugs. Fix them!". Calling that "zero exceptions" is misleading.
That's why the article is "Getting to Zero" not "Starting at Zero"; the target audience is teams where the opening quotes sound like something you'd overhear.
Not understanding the negativity. It was a reasonably well-done little blog post, and if you're on, or responsible for, a team where that situation and attitude are occurring, it provides some good, concise guidelines on how to make life better.
I think it is a wholly sensible goal for a web app. If a network connection goes down, it is helpful to the user to inform them of what is going on rather than 500ing; or, if it's an async job, retries are appropriate. An exception is the end of a flow and is generally unhelpful for customer-facing tools. An exception should mean 'hey, a user did some edge case that illuminated an issue we should fix'.
Why is an Exception only something final? Exceptions can be caught, even by type. I regularly use exceptions to break and resume long-running daemons. These exceptions have nothing to do with the user nor the end to the program. I don't think Exception == Fatal Error.
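A sketch of that pattern (names invented): a custom exception type used as a control signal and caught by type, without treating it as fatal:

```scala
// A control-flow signal for a long-running loop, not a fatal error.
final class StopDaemon extends RuntimeException

def runBatches(batches: List[Int]): List[Int] = {
  val done = scala.collection.mutable.ListBuffer[Int]()
  try {
    for (b <- batches) {
      if (b < 0) throw new StopDaemon  // signal: stop processing, keep what we have
      done += b
    }
  } catch {
    case _: StopDaemon => ()  // caught by type; the program carries on
  }
  done.toList
}
```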
The terminology is important here. If by error handling you mean the usual flow control when something off the happy path occurs, that is subtly but crucially different from handling a condition or event that could not have been foreseen. On all the platforms I have developed on that have exceptions, exceptions are for situations that I don't know how to recover from with logic that is really part of my app anyway. Network connections dropping falls into the flow-control category, since that's something I can foresee.
Sometimes if you work with this approach you find that the designers of the libraries upon which you build have, frustratingly, decided to force you to catch exceptions to deal with business as usual, but generally I find that treating exceptions as 'cannot continue' has worked well for most code I've written in the last 12 years.
While that is a good counter example, it would be nice to have 0 exceptions (ERROR and the like) logged on most days.
I have seen too many apps/modules that throw up and log just because of bad user input. This noise gets tedious to wade through to find the real problems.
I'd write an error handler which may issue retries, or roll back transactions, and then log that it timed out (instead of letting it bubble up to the top).
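A minimal retry wrapper along those lines might look like this (a sketch, not production code; it doesn't log or back off, and gives up by returning the final `Failure`):

```scala
import scala.util.{Try, Success, Failure}

// Run op up to `attempts` times; return the first Success, or the last Failure.
def retry[A](attempts: Int)(op: () => A): Try[A] =
  Try(op()) match {
    case Failure(_) if attempts > 1 => retry(attempts - 1)(op)
    case other                      => other
  }
```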
While this is a good goal, and I do think it's a sensible one, the amount of developer bandwidth required to take the approach given in the article is enormous.
Fixing exceptions as they come in is great if you've got someone on deck specifically for fixing them, but what happens if everyone is already working on something else? Especially if that exception is something that is relatively unimportant? The idea of setting aside time for working on it is much better. I like to do a refactor Friday and work specifically on this type of goal.
I've been in environments where it worked just fine. Bug fixes (and an exception counts as a bug to fix) always took priority over features. And yet not only did we never get delayed, we were routinely ahead of schedule.
As the article points out, the trick is to get it into your blood -at the start-. If it feels like fixing exceptions takes all your time it's probably because you have let a lot of them creep in.
How many known exceptions do you introduce per sprint? It should be zero: nothing new that you're writing should create an exception that isn't fixed within the sprint.
So what's left? Exceptions that are uncovered outside of the sprint. That is, you had a feature written, tests for it written, QA test it, client demos show it off, and any exception you saw you fixed already. So outside of all of that, where can exceptions even occur? Well, after release, obviously. Weird race conditions, deviations off the critical path, users doing unexpected things, etc. But the frequency of those should be pretty low. Maybe one, two a sprint, max? Surely you can take the time to dig in and fix those without causing the sprint to slip.
Now, as the article also mentions, when you start getting a huge, complex codebase (either due to time, or team size), this gets harder. More complexity = more edge cases. That's one of the reasons to reduce complexity wherever possible, to isolate functionality as much as possible. And this also assumes you have a development process that has you delivering every sprint, and having the stuff used. If you're doing some waterfall "we'll code like crazy for the next year, then test, then deliver", yeah, you probably can't do it. You already said the feature was done; no time to fix that bug! Sorry.
In ObjectStudio Smalltalk, there was a "top level exception handler." So long as you weren't doing something involving call-outs to C or custom primitives, you could just set the top level exception handler to an empty block. (Empty lambda)
Tada! No exceptions! (Of course, this is the worst possible thing to do.)
It's kind of cheating because Smalltalk exceptions were mostly just ordinary Smalltalk execution. You could start writing your own debugger and be browsing stack traces in a matter of minutes.
Apple's Cocoa framework (for Mac; I don't think this is true on iOS) surrounds each event callout with an exception handler that logs any exceptions and otherwise swallows them and allows execution to proceed. It's a bit crazy, and frustrating if you cause an exception to be thrown but don't notice until it's too late to figure out where it came from, or don't know why the app is suddenly behaving strangely because the exception left things in an inconsistent state.
You can do that easily on Windows by adding a Vectored Exception Handler (VEH).
But of course that doesn't make exceptions go away; you just save on stack unwinding (if you discard the exception). There is still a pretty big overhead.
The best part about trying to measure exceptions or work towards zero exceptions is the system you put in place to detect them.
What you care about eliminating are the ones that happen a lot, or the ones that happen immediately following a release or feature flag being turned on.
If you aren't measuring or capturing things in a reasonable way, that's where you'll feel tons of pain. Otherwise I agree with a few of the comments; getting to zero is a noble goal but perhaps not the point.
I've always been surprised that there aren't widely-used tools (I think they do exist but just aren't very popular) that monitor logs and analyze them for irregularities. Have it learn what messages are 'normal' and only send up an alarm when something statistically significant crops up. It doesn't even have to be heavy machine-learning, just some basic statistical analysis would probably help a great deal. Add in correlating multiple logs with one another (increasingly an issue when you've got dozens of servers actually running your application, any of which could be raising important messages while all of them clutter the log with usually-unimportant messages). Taking a firehose of data and breaking it down into what needs attention is something software should be much better at than human beings.
So it's really a "zero exceptions, except for the ones you deliberately choose to ignore" policy. But that's pretty much what we do now, except I'm not aware of a good method for muting certain exceptions. (There may be one! It just never occurred to me that we could mute certain exceptions.)
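One simple way such muting could work (purely a hypothetical sketch, not an existing tool's API) is a set of exception classes that your error-reporting hook drops before logging:

```scala
// Hypothetical mute list: exception classes we've deliberately decided to ignore.
val muted: Set[Class[_]] = Set(classOf[java.net.SocketTimeoutException])

def shouldReport(e: Throwable): Boolean = !muted.contains(e.getClass)
```

Matching on the exact class keeps the mute list explicit, at the cost of not muting subclasses.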
I think we could take it a step farther and say certain kinds of exception should never occur and should also never be ignored, like NoMethodError / nil pointer exceptions.