PyPy is not an option everywhere. As far as I know, many, many enhancements and libraries are simply not available for PyPy, only for plain old CPython.
PyPy might be an interesting project with a lot of potential, but it seems to me there is a long way to go before it can be a drop-in replacement for CPython.
That's the cost of being a dynamic language. Since Python objects can be dynamically enhanced anywhere (from inheriting classes and even from outside the class), they need dictionaries. But those can be very memory-inefficient, especially on modern 64-bit hardware. One dict can easily take 1-2k for very few stored attributes (the size can even depend on the actual names used, because of the nature of dicts). So when it comes to millions of object instances, it is better to use __slots__, but that comes at a cost: Those objects are not enhance-able any more. You have to know all attributes of the objects in advance. So you should only use it on objects that are really used a lot or are really simple.
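To make the trade-off above concrete, here is a minimal sketch. The class names Plain and Slotted are made up for illustration; the only thing it relies on is the documented behaviour of __slots__.

```python
class Plain:
    """Regular class: each instance carries its own __dict__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class Slotted:
    """Same fields, declared up front; instances get no __dict__."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Plain(1, 2)
p.z = 3                      # fine: the new attribute lands in p.__dict__

s = Slotted(1, 2)
try:
    s.z = 3                  # "z" was never declared in __slots__
    extended = True
except AttributeError:
    extended = False         # this is the "not enhance-able" cost

has_dict = hasattr(s, "__dict__")
print(extended, has_dict)
```

So the slotted class rejects attributes it was not told about, which is exactly why you have to know them all in advance.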
No, this is a cost of this particular style of dynamic object model. Not all dynamic languages are dynamic in this way.
> Those objects are not enhance-able any more.
I don't see any reason why Python can't do what Clojure's defrecord does: Provide fixed fields for pre-declared slots, while still using a dictionary for extensions. It has been a while since I've used Python, but I'm almost certain that there is some __special__ magic that can make this work with relative ease.
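For what it's worth, one way to get something in this spirit is to list "__dict__" itself among the slots. This is only a sketch (the Record class and its field names are hypothetical), and how much memory it actually saves depends on the implementation and on how many instances ever grow extra attributes.

```python
class Record:
    # Declaring "__dict__" as a slot asks for a per-instance dict
    # in addition to the fixed, pre-declared fields.
    __slots__ = ("name", "size", "__dict__")

    def __init__(self, name, size):
        self.name = name     # stored in a fixed slot
        self.size = size     # stored in a fixed slot


r = Record("a.log", 10)
r.note = "added later"       # undeclared attribute goes to the dict

extras = dict(r.__dict__)    # only the extension attributes live here
print(r.name, r.size, extras)
```

The declared fields never show up in the instance dictionary; only the extensions do.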
It's also worth pointing out that most modern JITs, like V8 or PyPy, can automatically detect "hidden classes" like this and optimize these objects to pack such static fields.
"modern JITs" aren't so modern at all. All that work was originally done on Smalltalk in the 1980s — it's also entirely tangential to JITing compilers, as it can easily be done with interpreters too, so it's not even a cost of this particular style of dynamic object model — it's a cost of this implementation strategy of this particular style of dynamic object model. The fact that PyPy manages fine shows it is not the language, or any model to which it subscribes, that is at fault.
I did not say that every dynamic language has to implement it this way, but Python does. And Python was not intended to be a language for building up huge masses of data in memory, though it does not do badly in most cases (leaving out the ones mentioned).
Python's style has some advantages though. Simplicity and a huge amount of flexibility are two of them.
(Traditional) Python is normally not a truly compiled language; there is just a rather simple precompilation step that makes life easier for the interpreter. When you compile or "JIT" the code, you have more options. You see the whole program. Python's precompiler does not! It sees only the local module. So it cannot find all the classes that might enhance a base class.
If you have any ideas on how to implement a better language, why don't you implement your own? It's up to you! (I guess creating new, more or less useful, programming languages is the hobby of computer scientists anyhow.)
--
I'm working on a Ruby compiler and have taken pretty much [the PyPy] approach [of automatically using slots when possible]: Any instance variable names I can statically determine are candidates for the equivalent treatment (allocating a slot at a fixed offset in the object structure).
Anything else will still go in a dictionary. In practice my experience is that a huge proportion of objects will have a fairly static set of attributes, and the dynamic set is often small enough that having pointers to them included in every instance is still often cheaper than using dictionaries.
---
In a static language, your options are generally to either statically allocate slots, or explicitly use a dictionary anyway.
You are right. But Python is normally applied in circumstances where you are not under such pressure anyway, and where flexibility and ease of use matter more than maximum speed or minimal memory requirements. And of course it is specifically the CPython implementation that I described... but CPython is still the most widely used implementation and the one for which most enhancements, libraries and so on exist. Other implementations might do better in many ways, but what made Python great is still available in its original form, the CPython environment.
I did something similar for a batch log processing system I wrote in Python some time ago. All the log messages could be classified as representing one of a few dozen 'packet' types, each represented by an object instance (so I could do some additional processing later), so predefining each type's fixed sets of fields using slots noticeably decreased memory usage. Of course, it was the first time I had ever done anything like that in Python, so I may have been doing it wrong...
Anyways, definitely a good short read, thanks for posting!
Per the Python 3 docs (for some reason not in the Python 2 docs, but the same holds): "Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to."
Most of the space for the NonSlotted version is in the __dict__, and if you print the size of ni.__dict__ you'll probably get a couple of hundred bytes.
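A quick way to check this yourself, as a rough sketch: NonSlotted and ni are the names used above, while Slotted, si and the attributes a and b are assumptions added for comparison. The exact numbers vary by Python version and platform, so none are claimed here.

```python
import sys


class NonSlotted:
    def __init__(self):
        self.a = 1
        self.b = 2


class Slotted:
    __slots__ = ("a", "b")

    def __init__(self):
        self.a = 1
        self.b = 2


ni = NonSlotted()
si = Slotted()

# getsizeof is shallow: it does not follow the reference to __dict__,
# so the dict has to be measured separately and added by hand.
shallow_ni = sys.getsizeof(ni)
dict_ni = sys.getsizeof(ni.__dict__)
shallow_si = sys.getsizeof(si)   # no __dict__ to add for the slotted one

print("NonSlotted:", shallow_ni, "+ dict", dict_ni)
print("Slotted:   ", shallow_si)
```

Even this undercounts, since the attribute values themselves are separate objects too, but it is enough to see where the space goes.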
Useful tip. Anecdotally this helped me save 40% of memory on some data I need to store in memory for analysis: Used to be about 1KB per object, after adding __slots__ it came down to 590 bytes.
Or CLOS.
It's a similar problem with keyword arguments passed in hash tables, I think. The space occupied is less of an issue (unless in a deep recursion), but it's slower than constructing a list of pairs, plus the order of parameters is lost.
I didn't say that __slots__ makes JIT possible. However, it does make writing one easier. (Also makes writing a faster one easier.)
EDIT: Is it the new modus operandi on HN: If a statement isn't seemingly 100% in support of your pet language, automatically read the statement in the dimmest and narrowest way possible?
Just think of it as there was some confusion and possible ambiguity, and your clarification has cleared it up for anyone interested in the subject but not yet knowledgeable enough. Someone can skim through and have their mental model corrected slightly now - a very nice thing!
Probably missing a lot of context here, but wondering why you wouldn't use something like nginx or squid for serving static content, as they are designed for this kind of use case.
Good question -- however, it's not completely static content. The hotel reviews and photos are more or less static (updated only on deployment), however a fair number of the features of the site are dynamic: user accounts, real-time pricing, search, recently-viewed hotels, etc.
Have you considered using something like Memcached or Redis then? There'd be some overhead sending data over a local TCP connection, but I think it would be a lot more memory-efficient.
Extra-nice thing about this feature: it can be enabled and disabled for a class with very little effort. So you can check correctness first, and optimize later.
I echo what they said in that post, though: don't prematurely optimize. If you find you have tons of objects and need the RAM, or you're actually paying a premium for hash accesses, then fields can save you some effort... but if you have a small use case, don't bother.
I'm not entirely sure, but based on my experience with OO in Perl, I guess that it simply uses an array in a special attribute, instead of putting the various attributes into dict keys on the actual object. Possibly it even uses some kind of inside-out implementation where the arrays are stored via closure in some other scope and only visible to accessor methods.
I believe CPython just allocates slightly more memory than the structure describing the object requires and stores the attributes in fixed locations immediately after it. It's basically the same way that it handles attributes of built-in types, except that those also have a C struct describing the attribute layout.
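You can't see the memory layout from Python itself, but you can see the accessor side of it. A small sketch, assuming CPython (the Point class is made up): each name in __slots__ becomes a descriptor on the class, and attribute access on instances goes through it rather than through a per-instance dict.

```python
class Point:
    __slots__ = ("x", "y")


# Each slot name shows up on the class as a descriptor object.
descriptor = Point.__dict__["x"]
kind = type(descriptor).__name__     # "member_descriptor" on CPython

p = Point()
descriptor.__set__(p, 5)             # same effect as p.x = 5
value = descriptor.__get__(p, Point) # same effect as reading p.x

print(kind, value, p.x)
```

Other implementations are free to do this differently, so treat the descriptor type name as a CPython detail.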
Because you are one of those rare HN creatures that enjoy using it, after using most consumer OS since the early 80's, and need it for coding/system administration.