Norris numbers

npsimons · on July 23, 2014

The real interesting corollary here is that no matter what level you are at, you will probably want to leverage a language that's as powerful as possible, since you will be able to accomplish more with less lines of code[1].

A few months back, I expressed disappointment with Google Maps' dropping of support for displaying KMZs from other servers, and my intent to put together a replacement. When I finally got around to it, I found a JS library[2] (plus plugin[3]) that allowed me to replicate what I needed in about 10 lines of JS[4]. Of course, this is more the power of leveraging a library (and public tile servers), but I think it's still instructive to my point.

[1] - http://www.paulgraham.com/power.html

[2] - http://leafletjs.com/

[3] - https://github.com/mpetazzoni/leaflet-gpx

[4] - http://hardcorehackers.com/~npsimons/photos/2014/07-13:%20Ow...

smsm42 · on July 23, 2014

I think this is not the right understanding, since LOC here is being used as a proxy for complexity, but language that packs more complexity into single line would not necessarily reduce the complexity. It might, but it might not - e.g. it is true that you can implement Conway's life game as one-liner in APL [1], but does it become much simpler than multi-line implementation of the same in more mainstream language? I would not really say so.

[1] http://catpad.net/michael/apl/

DanielBMarkham · on July 23, 2014

The real trick is knowing when a new feature adds linear complexity (its own weight only) or geometric complexity (interacts with other features).

This is why so many folks love DSLs. Fully abstract and get your geometric features out early; test them and make them flexible without having to re-code anything.

sp332 · on July 23, 2014

Alan Kay is heading up a project to write a whole OS - network stack, graphics and all - in 20,000 lines of code. http://vpri.org/html/work/ifnct.htm The whole project is basically made out of DSLs.

renox · on July 23, 2014

Is the project still alive?

The last progress report about the project was in 2011..

sp332 · on July 23, 2014

Well, they're still writing papers about it at least. http://vpri.org/html/writings.php

agumonkey · on July 23, 2014

Recent papers mention STEPS but it seems a done experiment.

sgt101 · on July 23, 2014

There is shockingly little data on this available. The most systematic research I have found was led by Lehman at Imperial in the 90's : http://www.eis.mdx.ac.uk/staffpages/mml/feast2/papers/abstra...

Interestingly I am in a position to revisit the evolution of one of the systems in the FEAST study. The key insight from FEAST was the idea of an "S-curve" where complexity overwhelms drive for features. Initial development progress is slow because the infrastructure and framework for the system is not in place, then comes rapid development then a slow down as complexity kicks in.

The reviewed system behaved like that in the study (I checked the data) and for a few years after. But then a period of explosive growth occurred, halted only by a strategic decision to move away from the platform for technology management (obsolescence of mainframes) reasons.

NathanKP · on July 23, 2014

I've never coded a piece of software that needed 3 million lines of code. I'm at the 200,000 LOC level right now.

However, from my software architect experience I imagine that any project in the millions of lines of code would be best broken into smaller services that communicate through a common backbone.

This type of architecture would allow a large dev team to be broken into smaller groups that each focus on smaller, manageable subsets of the large code base with each subset being a service which then communicates with other subsets through a backbone.

It would require a lot of internal documentation and back and forth communication between internal teams to get the services to integrate with each other flawlessly but it shouldn't be too difficult with proper care and talent.

chipsy · on July 23, 2014

Yes, I point to how we accomplished networking. "Internet-scale" code is basically a lot of interacting systems of protocols. Protocol development seems like the "final step" in scaling a system.

walshemj · on July 23, 2014

Is this 1500 lines in a monolithic program with no structure?

Certainly I found no problems in working with some of the billing systems I worked on at BT which had way way beyond 2k loc and probably around the 200k.

I recall 1400 long line if blocks in one bit of fortran 77 code :-)

But we did have a real genius as team leader

adamtj · on July 23, 2014

Genius?

A weak man can lift a small rock. A strong man can lift a large rock. But to lift a ten ton boulder, you need a crane. Even a weak man can operate a crane, if he knows how.

Your genius is like a strong but stupid man. Maybe he or she can lift a 1400 line 'if' rock, I mean block, but that's probably the limit. Any more would require a different mode of operation -- the programming equivalent of using a crane.

The point is that people who haven't broken the 1500 line wall will write code with no structure. That's why they can't break through the wall. That's the heaviest code they can lift without assistance.

Certainly there are well structured 1500 line programs. But those overwhelmingly come from people who have broken through that limit. The people who know how to use a crane will tend to use them even for smaller projects.

Why would I go to all the trouble to use a giant crane just to lift a smallish 20 kg (40 lbs) rock? Because then I won't get tired or strain my back. So I can lift another. And another, and another. I can do this all day without breaking a sweat. The strong man is tired and sore and worn out from just a few dozen small rocks, while I'm still fresh, still going, and still able to lift multi-ton boulders.

walshemj · on July 23, 2014

And your point is?

We where doing MR back in the early 80's on the largest cluster of super mini's in the UK - dam right Cliff was a genius the project would have not worked with out him.

adamtj · on July 23, 2014

My point? What, are you feeling insulted? If so, stop. After all, I don't know him or you or anything about the project you were on. Anybody reading my comment would know that.

I thought maybe you missed the point of the article, that's all. Read it again, more closely. Perhaps then you'll see Cliff in a different light.

If you missed the point of the article and he wrote a 1400 line 'if' statement, then it's at least plausible that you're both stuck behind that first or second wall. I could easily be wrong. Then again, maybe this article has the key you need to catapult yourself far ahead of Cliff. He may well be a genius, but you could potentially be far more effective and productive. You may even be able to accomplish things that Cliff would fail at.

So, give the article another read.

metaphorm · on July 23, 2014

Fortran77 though. pretty limited set of organizational tools to work with there. Procs and Modules is it, I think (though I'm no expert). Many of the patterns we're used to using for code organization just wouldn't be available.

So how you would deal with a really big if statement without having more sophisticated tools? The business logic must have had a very large number of conditional branches. Even if you encapsulate the instructions of each branch in their own proc, it still doesn't change the fact that you're going to have a large number of conditions.

This isn't particularly a solved problem even in modern languages. We can use all kinds of layers of abstraction to make individual units of code smaller, but it's sort of an illusion. Splatting your business logic across a dozen or more files/modules/classes whatever might make it easier to look at in a text editor but it doesn't make it easier to develop or test. If you've got complex business logic you're going to have a complex program.

walshemj · on July 23, 2014

the system was broken down into a lot of sub processes from memory the 1400 long branch if was just a branching if then else statement - most weren't as bad.

A core part of the documentation was large number of A1 sheets which covered an entre wall of our office ;-)

The core of the system working out what do to with all the log records was a collection of Pl/1G.

We did have to build a lot of extra stuff you get for free nowadays we had a custom build system that you could use to build any part or the whole system written in primes JCL.

walshemj · on July 23, 2014

Ladies and gentleman I give you a prime example of unconscious incompetence

sp332 · on July 23, 2014

Yes. Thoughtful class and package decomposition will let you scale up to 20,000 lines.

walshemj · on July 23, 2014

In the olden days what we used to call subroutines or subprograms :-)

GregBuchholz · on July 23, 2014

How many lines of code is our DNA? Could there be a similar effect at work in other systems, concerning network size and orthogonality that limits their size? Things like the amount of complexity in DNA of organisms, or the size of cells, etc.?

vanderZwan · on July 24, 2014

Well, there's this:

For years, the prevailing assumption was simply that modules evolved because entities that were modular could respond to change more quickly, and therefore had an adaptive advantage over their non-modular competitors. But that may not be enough to explain the origin of the phenomena.

The team discovered that evolution produces modules not because they produce more adaptable designs, but because modular designs have fewer and shorter network connections, which are costly to build and maintain. As it turned out, it was enough to include a "cost of wiring" to make evolution favor modular architectures.

http://www.astrobio.net/topic/exploration/robotics-a-i/evolv...

philh · on July 23, 2014

The size of our (non-junk) DNA is limited by its mutation rate. (Though I don't know how the limit grows as a function of mutation rate, and I don't know if we've hit that limit as a species.)

GregBuchholz · on July 23, 2014

I thought that "junk DNA" was now a discarded misnomer, and that while there are sections that don't code for protein synthesis, it still has a biological function. It looks like it is still up in the air as to what percentage of non-coding DNA is biologically vital.

http://en.wikipedia.org/wiki/Noncoding_DNA

philh · on July 23, 2014

If junk DNA doesn't exist, the limit applies to the whole genome.

(I lean towards thinking that some non-junk DNA has been previously mistakenly classified as junk, but that junk DNA still exists in a meaningful sense. But I'm a layman, and I don't even have a strong idea of what the expert consensus is.)

jamesdutc · on July 23, 2014

This is a very interesting phenomenon, but I think one should be wary of using it for decision-making in the absence of distinguishing principles. "Keep things simple": yes, of course, and I also look both ways before I cross the street.

Believing in this phenomenon has promotional value - those other guys just don't "get it." I worry that deeply internalising it bears great risk of self-delusion. Acknowledgement of this phenomenon, irrespective of whether it exists or not, may prove detrimental.

One need not heed the arguments of those lesser 2,000-liners; only a 200,000-liner possesses the je ne sais quoi to know the right choices.

simgidacav · on July 23, 2014

I'm not convinced by the "number of lines" measure unit. It's clearly something which depends on the programming language, so one should gather different statistics depending on the language. Not to mention the fact that different projects might be prone to get messed up in different ways...

habitue · on July 23, 2014

I don't see how "refuse to add features" allows complexity to scale past 20K. That seems more like a strategy to stay under a given number of LOC, dodging the issue (which isn't to say it's a bad strategy).

cfallin · on July 23, 2014

I see it more as "refuse to add the wrong features". The key to scaling IMHO is to avoid the ball-of-mud where every feature interacts with every other feature, making the design impossible to disentangle and modify sanely. In software engineering terms, we want low coupling. So we choose the right primitives at the lowest level, such that additional requirements/feature requests can be built by composing primitives or attaching add-ons or using a well-defined hook interface or somesuch. This keeps the core clean and easy to reason about, and avoids surprises.

mechsin · on July 23, 2014

I don't quiet see it as just refusing to add the wrong features either. While that is defiantly part of it the article states not to add something "unless you need it right now, and need it badly". That seems like quite a high threshold in our minds sometimes but it really isn't. I think it just points to needing to think about what you’re adding and how you’re adding before you just go and do it. Stuff like do I really have to do this with a tall stack of if statements and check each case or is there a better way or what happens if I modify this class method, etc. As pointed out the advice to move from 2,000 to 20,000 is thoughtful classes and proper packaging. Those are what most would consider just BBP but if you’re a novice who’s major experience is writing script type things you don't normal consider things like how is your process going to fit into a over arching frame work and even if you do you many not have the raw experience necessary to let you see the shape your project is going to take in the future.

chipsy · on July 23, 2014

I have to wonder if "breaking the wall" is correlated with being able to write more densely and create more functionality with less code.

not_kurt_godel · on July 23, 2014

It's not about density. In fact, the opposite. It's about making each piece of the code dead-simple in terms of readability and functionality. When every class does exactly what you would expect it to, you can build complex structures reliably (which themselves act as dead-simple abstractions on top of the building blocks, and so on).

chipsy · on July 23, 2014

I disagree. Simple things are _necessarily_ dense in their implementation because they're so exacting. Recall Simple Made Easy[0].

[0] http://www.infoq.com/presentations/Simple-Made-Easy

Terr_ · on July 23, 2014

Yeah: Once a good abstraction is in place, you don't really care how dense the innards are because you don't have to look. (And when you do have to crack it open you can give it you undivided attention.)

agzam · on July 23, 2014

2000 lines of C++ and 2000 lines of Haskell code are definitely not the same things

ExpiredLink · on July 23, 2014

The article has some valid points. But a person who needs to emphasize his superiority against novice programmers in such a way lacks maturity of personality.

interpol_p · on July 23, 2014

I didn't get that from the article. It felt like the author was emphasising experience, not superiority. We all went through these stages, the author acknowledges that.

The article doesn't say anything negative about novice programmers, just that these are the things we go through when we learn. I certainly went through the stages described, and I'm hitting the near-million-line projects now and wonder what happens next.

ahris · on July 23, 2014

As a novice programmer, I found this post extremely helpful! It's given me insights on a problem I wasn't even aware I had.