Abstraction without overhead: traits in Rust (rust-lang.org)
212 points by steveklabnik on May 11, 2015 | hide | past | favorite | 137 comments


Great article. The main bit of new information for me was that while Rust supports dynamic dispatch, its implementation has a noticeable difference from C++. In C++, the vtable pointer is in the object itself. In Rust, it's stored inside what is essentially a "fat pointer." Pointers to traits ("trait objects") are actually two pointers: the pointer to the vtable and the pointer to the actual object.

This seems to have one major downside (pointers are twice as big), but lots of upsides:

    - allows traits to be implemented for existing types, as opposed
      to C++ where the type's declaration has to list all base classes.

    - allows a type to be used through dynamic dispatch while allowing users
      who don't need this to avoid the vtable overhead.

    - one less indirection in the call sequence for dynamic dispatch.
I like it a lot overall, though the idea of 16-byte pointers on 64-bit architectures does make me slightly queasy.
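To make the fat-pointer layout concrete, here's a quick check (using today's `dyn Trait` syntax; `Draw` is just a made-up trait):

```rust
use std::mem::size_of;

trait Draw {
    fn draw(&self);
}

fn main() {
    // A plain reference is one machine word.
    assert_eq!(size_of::<&u64>(), size_of::<usize>());
    // A trait-object reference is a (data pointer, vtable pointer) pair:
    // two machine words, i.e. 16 bytes on a 64-bit target.
    assert_eq!(size_of::<&dyn Draw>(), 2 * size_of::<usize>());
}
```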


It also makes multiple inheritance (which Rust has, through Java-like interfaces) easy to implement, and fast at runtime. The virtual inheritance of C++ is a real mess, by contrast [1].

[1]: http://www.phpcompiler.org/articles/virtualinheritance.html

Edit: I don't mean to bash C++ here, BTW; the skinny-pointer approach has a lot of benefits when all you need is single inheritance (and there are early-stage proposals to add it to Rust too). But I don't think it works well for multiple inheritance.


I apologise in advance that this may not be most appropriate place to ask this question.

I am looking for a Rust tutorial. It looks like there was one, but it was deprecated in favour of 'the book'. But 'the book' doesn't seem to have a tutorial yet.

Are there any tutorials running through how to build some small piece of working software?

I found the Golang tutorial, where you build a very basic blog extremely enjoyable. Does Rust have anything similar?

https://golang.org/doc/articles/wiki/

Thanks


The book basically has two sets of tutorials: the "Syntax and Semantics" section is a bottom-up tutorial, and the "Learn Rust" section is a project-based, more top-down one. It's true that only one chapter of Learn Rust has landed at the moment. It's basically what I'm doing right now. Should have two or three more chapters over the next few days.


Great, thanks Steve.


What is the representation for a type like Box<Reader+Writer>? Is there an extra indirection to get each of the vtables? Is there synthesized combined trait type and just one vtable pointer?


I am pretty sure you can't actually construct such a type (at least not at the moment) so Rust sort of sidesteps the question.


IIRC you can.


I get this when I try something like `as Box<Read+Write>`

    error: only the builtin traits can be used as closure or object bounds [E0225]


TIL; thanks :)
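For anyone hitting this today: adding a second non-auto trait to an object type is still rejected, but the usual workaround is a combined supertrait with a blanket impl, which gives you exactly one synthesized vtable. A sketch (the `ReadWrite` name is invented):

```rust
use std::io::{Read, Write};

// `Box<dyn Read + Write>` still doesn't compile (only auto traits like
// Send/Sync may be added to an object type), but a supertrait works:
trait ReadWrite: Read + Write {}
impl<T: Read + Write> ReadWrite for T {}

fn main() {
    // Cursor<Vec<u8>> implements both Read and Write.
    let rw: Box<dyn ReadWrite> = Box::new(std::io::Cursor::new(Vec::<u8>::new()));
    let _ = rw; // a single vtable, built for the blanket ReadWrite impl
}
```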


    >- allows traits to be implemented for existing types, as opposed
    >  to C++ where the type's declaration has to list all base classes.

    >- allows a type to be used through dynamic dispatch while allowing users
    >  who don't need this to avoid the vtable overhead.
In C++ you can have this as well. For example, std::function is able to wrap any callable type without them having to have any base classes. Sean Parent gives a great talk about this: http://channel9.msdn.com/Events/GoingNative/2013/Inheritance...


This only works for callables, though; if you want a virtual dispatch wrapper for some other set of methods, you have to write it yourself (right?). On the other hand, if you do have a relevant base class and you know the exact type, you can avoid virtual calls in C++ just by declaring the relevant methods final, although the vtable pointer will still bloat the struct a bit.

However, I'd say Rust traits are more elegant in unifying what are two separate worlds in C++:

(1) calling virtual methods on a base class pointer, which can be (usually is) dispatched at runtime, but thus can't support methods with generic parameters or having a 'virtual type' (instead of a method), and

(2) accessing members of template parameter classes, including (possibly generic) methods, constants, typedefs, etc. - much more flexible, but doesn't work at runtime. Also dynamically typed, for better or worse; Rust thinks worse.

In Rust, (1) is a trait object, and (2) can be done with a generic parameter specified to implement a certain trait with methods, associated types, and constants. Rust's compiler doesn't try to be magic (unlike, say, Haskell), and so traits with generic methods and such can't be made into trait objects, but they're still traits - they feel like the same basic kind of thing.
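A minimal illustration of (1) vs (2) as they look in Rust (trait and type names invented for the example):

```rust
trait Greet {
    fn greet(&self) -> String;
}

struct En;
impl Greet for En {
    fn greet(&self) -> String { "hello".into() }
}

// (2)-style: static dispatch, monomorphized per concrete type.
fn greet_static<T: Greet>(g: &T) -> String {
    g.greet()
}

// (1)-style: dynamic dispatch through a trait object's vtable.
fn greet_dyn(g: &dyn Greet) -> String {
    g.greet()
}

fn main() {
    assert_eq!(greet_static(&En), "hello");
    assert_eq!(greet_dyn(&En), "hello");
}
```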


C++ is Turing-complete, so you can do whatever you want. What Rust offers is a hierarchy-free (i.e., no inheritance) polymorphism mechanism as a core part of the language.


The turing-complete argument makes no sense here.


I think he means that, through templates, the C++ type system is Turing-complete.


Rust's is too.


Though the skimpy recursion limit makes it fairly impossible to do anything interesting with this property, much to the chagrin of some people. :P


Sure it does. I can pass around a struct containing any bytes I want and do anything I want with it. If I want to define FatPointer that contains an opaque pointer (or some equivalent union) and a polymorphic Strategy object of some sort, I can.

You can do the same in C, for that matter. Or in assembly.

The reason we still come up with new languages is so that you can write code that conforms to certain patterns easily.


No, Turing-completeness does not mean you can "do whatever you want". It means you can compute whatever you want that is computable. If the way you want to compute things is to embed a Rust compiler, yes you can do that, but there are many things that we'd colloquially refer to as "do whatever you want" that apply to the semantics of C++ itself.

You cannot, for instance, create a new abstract base class and have a pre-existing class in someone else's header implement your abstract base class, such that a pointer to that pre-existing class can be dynamic_cast to your class.

Because that statement refers to C++ semantics, not your ability to compute things, it's not affected by the claim of Turing-completeness.


> You cannot, for instance, create a new abstract base class and have a pre-existing class in someone else's header implement your abstract base class, such that a pointer to that pre-existing class can be dynamic_cast to your class.

Sure you can, for some value of base class and cast. You might not be able to use particular keywords, but data is data and math is math. You might not be able to use the built-in type system to do it.

C and javascript programmers write their own inheritance-based type systems all the time. And anything you can do in C, you can do in C++ if you really want to.


JS is a bad example because people who do that are preferring inheritance to composition, which is absurd.

In the case of C, yes you can do that, and where performance is critical you might prefer to do it, but it's obviously not going to be your first choice if you have any sense.


That has nothing to do with Turing-completeness. There are Turing-complete languages which don't even have pointers.


Does it have to be 16 bytes? I could imagine code opting in to compressed pointers (with alignment/heap restrictions). And you'd only need enough vtable bits to cover all unique types right? But imagining is easier than implementing of course. It would require recompiling all code to use the same pointer style, but whenever unwind-free Rust exists, we'll already need different compilation options for everything anyways, right?

Though the added complexity could offset the 16-to-8 byte size reduction.


I don't think that's feasible. The vtable is stored in the binary's data block, so that's usually a pretty compact region of memory, maybe even referenceable by 16 bits. Except that's no longer true if the type's impl for the trait is defined in a shared library.

It should be possible as a compiler option if you are compiling every object that you link to and statically linking them into a single binary. Even then there'd be a performance hit.


Would there be a hit? C++ has to do this:

object pointer -> load vtable -> load function pointer

while a compressed fat pointer would be like:

fat pointer -> shift to extract vtable index -> shift to make into a global table offset -> load global table base from a global variable -> load function pointer

Two loads in both cases, but in the compressed case the first one will almost always be cached; more code bloat, but not that much, and you save a bit of memory in the objects themselves.
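A rough sketch of that shift-and-load sequence, modelled by hand with an index into a process-global table (everything here is hypothetical; this is not anything rustc actually does):

```rust
// Hypothetical "compressed fat pointer": a 48-bit address in the low bits
// plus a 16-bit vtable index in the high bits. The "vtable" here is a
// plain table of fn pointers for illustration.

fn inc(x: u64) -> u64 { x + 1 }
fn dbl(x: u64) -> u64 { x * 2 }

static VTABLE: [fn(u64) -> u64; 2] = [inc, dbl];

fn pack(addr: usize, idx: u16) -> u64 {
    debug_assert!(addr < (1 << 48));
    ((idx as u64) << 48) | addr as u64
}

fn call(compressed: u64, arg: u64) -> u64 {
    let idx = (compressed >> 48) as usize;    // shift to extract vtable index
    let _addr = compressed & ((1 << 48) - 1); // recover the data pointer
    VTABLE[idx](arg)                          // load from the global table, then call
}

fn main() {
    let p = pack(0x1000, 1); // pretend 0x1000 points at the object
    assert_eq!(call(p, 21), 42);
}
```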


The compiler could even choose, say, the 256 most popular vtables and use this optimization for those. Not sure how sane that would be.


This is also how Go does it. It looks like Rust traits can do everything Go interfaces can do, and more?


More or less, yeah. The main missing features are that downcasting isn't implemented at present (since Rust's other features remove the need for most of it), and implementation is explicit instead of implicit (a design decision which I've elaborated on in the past).


There are some key differences, like Rust traits using generics, and Go interfaces using an empty interface for generic code.


Go's interfaces have these additional features:

- They auto-implement on types that have the appropriate methods (a feature that is probably unwanted in Rust)

- They can be runtime downcast via `.(type_name_here)`

- They can be runtime type switched via `switch foo.(type)`

I'm not sure of the vtable internals though.
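On the downcasting point, for comparison: Rust's standard library does have `std::any::Any`, which gives something like Go's type assertion for `'static` types. A rough sketch:

```rust
use std::any::Any;

// Roughly analogous to Go's `v.(T)` type assertion / type switch.
fn describe(v: &dyn Any) -> &'static str {
    if v.is::<i32>() {
        "i32"
    } else if v.is::<String>() {
        "String"
    } else {
        "unknown"
    }
}

fn main() {
    assert_eq!(describe(&5i32), "i32");
    assert_eq!(describe(&String::from("hi")), "String");
    assert_eq!(describe(&3.5f64), "unknown");
}
```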


The other major downside being the extra level of indirection, which could potentially cause an extra cache miss per method call. A drop in the bucket compared to most other languages, but a serious consideration for many of the areas that C and C++ dominate.


Extra indirection? There is less indirection.

With the C++ approach the data is one pointer away but the vtable is two pointers away.

With the Rust approach both data and vtable are just one pointer away.


That's assuming that a "full pointer" (i.e. an unpredictable address to an arbitrary point in heap memory) is the only way to get a reference to an object. What about if you want to stick a bunch of objects in a vector, and iterate over them linearly? In the C++ approach, the data will have a nice linear cache access pattern. In the Rust approach (unless I'm missing something) you'd be storing a vector of (double sized) pointers, to god knows where in heap memory... suffering a cache miss on every access, on average.


Since you are talking about a heterogeneous [edit: homogeneous] array, you would store the concrete structs contiguously in Rust as well. Rust would also not waste space in the vector for storing a vtable pointer, and would instead construct fat pointers dynamically when needed (since it knows the type, it knows which vtable to inject in the fat pointer).
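Sketch of that: the structs live contiguously in the `Vec`, and the fat pointer only exists at the call site (`Area`/`Square` are invented names):

```rust
trait Area {
    fn area(&self) -> f64;
}

struct Square {
    side: f64,
}

impl Area for Square {
    fn area(&self) -> f64 { self.side * self.side }
}

fn main() {
    // Structs stored contiguously, no per-element vtable pointer...
    let squares = vec![Square { side: 1.0 }, Square { side: 2.0 }];
    // ...and a fat pointer is materialized only where dynamic dispatch
    // is actually requested.
    let total: f64 = squares
        .iter()
        .map(|s| {
            let obj: &dyn Area = s; // (data ptr, vtable ptr) built here
            obj.area()
        })
        .sum();
    assert_eq!(total, 5.0);
}
```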


> you would store the concrete structs contiguously in Rust as well. Rust would also not waste space in the vector for storing a vtable-pointer, and would instead construct fat pointers dynamically when needed (since it knows the type, it knows which vtable to inject in the fat pointer).

Uh, what? How is the compiler just supposed to magically know which of the structures in the array are of what type, without any additional identifying information? I'm assuming that in this optimized case, there's a hidden type field in each struct, that it would use to index into a table of vtable pointers? If so, there you go, that's yet another level of indirection.


I meant to say homogeneous array, which I assume you were talking about.


No, I was talking about iterating over an array of objects calling their virtual functions (or either of the additional cases listed above). Of course it's easy to "do the right thing" with homogeneous arrays, either in the compiler or by hand if need be. But if you're iterating over a homogeneous array, calling the same virtual function on every single one, and your compiler somehow manages to notice this before you do, you probably screwed up in your design somewhere, so that's not the kind of problem I'm talking about.

It usually is smarter for performance to do the "data oriented design" thing and break the heterogeneous arrays into separate homogeneous arrays, so that you can potentially avoid a few levels of indirection, hoist loop invariants out, and maybe even make use of SIMD. But the whole point of the conversation was to talk about a nontrivial abstraction that (supposedly) trades performance for clarity. So I gave a scenario that would exercise that overhead.


> That's assuming that a "full pointer" (i.e. an unpredictable address to an arbitrary point in heap memory) is the only way to get a reference to an object.

No, it's assuming that a "full pointer" is the only way to get a reference to a polymorphic object. This is true in both C++ and Rust.

> What about if you want to stick a bunch of objects in a vector, and iterate over them linearly?

You can do this in both C++ and Rust. But you can only put a bunch of objects in a vector if they have a statically-known type. The point of dynamic dispatch is calling methods of an object where you don't statically know its type.


How would you put or index objects in a vector in C++ if they are virtual / dynamically sized? I'm under the impression that unless you have the dynamically sized object behind a pointer, you get object slicing.


I didn't say they were dynamically sized. You could either have objects of the same type, but with virtual functions, or you could make a union of all applicable objects (thus guaranteed to be constant sized, at the size of the largest object) and store those in the array/vector, switching on a type enum or calling a function from an inherited base class.


> You could either have objects of the same type, but with virtual functions

If objects have the same type, which is statically known, there is no need for virtual functions because the compiler can resolve the specific method implementation at compile-time.

> or you could make a union of all applicable objects (thus guaranteed to be constant sized, at the size of the largest object) and store those in the array/vector, switching on a type enum

You can do this in both C++ and Rust easily, with equivalent efficiency in both. In Rust you would just use an enum type. This design is generally highly discouraged though, because it requires callers to be aware of all possible "derived classes".

My comments were only about the case where you are using true language-level polymorphism.
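The enum approach mentioned above, sketched in Rust (`Shape` is an invented example):

```rust
// A closed set of "derived classes" as an enum: constant size
// (that of the largest variant), dispatch via match instead of a vtable.
enum Shape {
    Circle(f64),
    Rect(f64, f64),
}

impl Shape {
    fn area(&self) -> f64 {
        match self {
            Shape::Circle(r) => std::f64::consts::PI * r * r,
            Shape::Rect(w, h) => w * h,
        }
    }
}

fn main() {
    let shapes = vec![Shape::Circle(1.0), Shape::Rect(2.0, 3.0)];
    let total: f64 = shapes.iter().map(|s| s.area()).sum();
    assert!((total - (std::f64::consts::PI + 6.0)).abs() < 1e-9);
}
```

As noted, the trade-off is that every caller must know the full set of variants; adding a new "derived class" means touching the enum.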


This is also how interfaces are implemented in Go.


And typeclasses in Haskell as well.


I don't think that's true. Dictionary-passing (or something equivalent) attaches the table of methods to the functions which take typeclass instances as arguments (with extra, hidden arguments), while Rust's Box<Trait> attaches the table of methods to the object pointer itself. So you can have a list of Box<Show> in Rust, but you can't have a list of Show in Haskell. You have to set up the extra indirection yourself using something like this: https://wiki.haskell.org/Existential_type#Dynamic_dispatch_m....

I'm not an expert in Rust or Haskell, though, so I welcome corrections.


> but you can't have a list of Show in Haskell

I'm not sure exactly what you mean, but my naive interpretation is that you can:

    Prelude> :t map show
    map show :: Show a => [a] -> [String]


That's not a list of Show, that's a list of a single type that instantiates Show.

The difference being that in your code you can only put in a single type at a time, eg [Int] or [String], but not both Int and String under the common Show interface type.


Aha, thanks (to twic also). Now I get it!


I'm not exactly sure what they mean, since the following works, and is pretty much a drop-in replacement for a Box<...> trait object in Rust.

  data Showable = forall a. Show a => Showable a
This allows for

  [Showable 1, Showable "foo", Showable 'x']
(It requires the ExistentialQuantification language feature.)


Right, this is exactly what I meant by setting up the indirection yourself. The main difference seems to be that in Haskell, you have to write this code separately for each typeclass, and wrap/unwrap it manually. In Rust, you just change Show to &Show and you get dynamic dispatch with no extra code. Here's a gist of the two approaches: https://gist.github.com/evanpw/89d89aae1159c608c476. One other difference is that in the Haskell showStatic, static dispatch is not a guarantee, just an optimization.
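The Rust side of that comparison, using `Display` as a stand-in for Haskell's `Show`:

```rust
use std::fmt::Display;

fn main() {
    // A heterogeneous list behind trait objects: the Rust analogue of the
    // ExistentialQuantification wrapper, with no per-typeclass boilerplate.
    let items: Vec<Box<dyn Display>> =
        vec![Box::new(1), Box::new("foo"), Box::new('x')];
    let rendered: Vec<String> = items.iter().map(|i| i.to_string()).collect();
    assert_eq!(rendered, ["1", "foo", "x"]);
}
```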


This works on single instances, but I'm trying to see how it plays out with lists of `Bar`. As far as I can see, you need to use `Box`: https://play.rust-lang.org/?code=struct%20Foo%3B%0Astruct%20...

Which seems to me to be effectively the same as `Barrable` in this case, although it's more generic.


> As far as I can see, you need to use `Box`

Which is exactly what you said in the first place!

> So you can have a list of Box<Show> in Rust, but you can't have a list of Show in Haskell.

But isn't this comparing two different things? You can't have a list of trait/typeclass in either language (excuse the pseudo-syntax):

    Rust: [Bar]
    Haskell: [Bar a => a]
But you can (with some extensions in Haskell) have:

    Rust: [Box<Bar>]
    Haskell: [Box Bar]


That takes a list of a, where a is some type which is Show. You can't have a grab-bag of different Show things in there.


Typeclasses benefit from the same "zero runtime overhead" in Haskell (but not Scala). This is particularly important in non-strict languages where inlining is a more prominent aspect of making code performant. Fortunately, GHC is a lot easier to understand WRT optimizations than gcc.

You revert to the OOP-style vtables if existential quantification is introduced because you have to pack code with the data, rather than being able to inline the data from a statically known index of methods into the concrete call-sites.

As it stands, I'm more likely to want Clean-style uniqueness typing in Haskell for a non-GC'd life-cycle for memory than I am to use Rust, but it's nice to see what we can accomplish with linear types baked into the compiler. Wish regions hadn't been abandoned, but it seems like somebody wanted Rust pushed into becoming a product quickly.


  > Wish regions hadn't been abandoned, but it seems like 
  > somebody wanted Rust pushed into becoming a product quickly.
No idea what you're referring to here. Lifetimes are entirely based on the regions literature, and four frigging years of design iteration is hardly "pushed into becoming a product quickly". :P


Lifetimes aren't regions.

4 years when it is a recapitulation of existing technology and hasn't had time to push anything forward and we already have PLs with similar facilities? That's a product-oriented rather than research-oriented direction whether you think it's too fast or not. For comparison, Haskell dates to the early 90s, based on non-strict FP languages from the 80s. ML itself started in the 70s. Much of what makes Haskell nice today happened because it had a decade to gestate without the demands of industry coming first. Applicative wasn't discovered until 2008. Those discoveries are a big part of why I happily use Haskell for work today.

This is emphatically not a value judgment, I will likely not use Rust in anger so I'm not your customer anyway at least WRT the programming language. Representations of linear types embeddable in dependently typed languages (such as Brady is figuring out in Idris) will probably be the next step.

In my ideal universe, there's a language for people to experiment with theoretical models and practical applications of linear typing such as Idris provides for DTPLs. This is particularly appealing as it could enable programmers to define their own linearly typed models for the compiler to enforce.


I don't think Rust was ever intended to be primarily a research language. From what I gathered even from early docs (in Graydon's reign), it was intended to implement ideas previously introduced in research languages in a practical systems language. They even made mention of adopting only safe ideas (and I'm too lazy to look for those quotes). I think they've had to innovate some but that wasn't the primary intent.


> I don't think Rust was ever intended to be primarily a research language.

Indeed, Graydon chose the name "Rust" specifically because he wanted to avoid cutting-edge language features in favour of old, well-tested ones.

(Reasonable people could disagree whether the 1.0 language meets that goal, but it was a goal.)


That would explain a lot. How do the goals contrast with C++ then? Smaller language?


For reasons of backwards-compatibility, it's basically impossible to design extensions to C++ that make the language memory-safe by default. C++11/14 are admirable best-effort approaches to doing so, but they provide only tools for helping to enforce memory safety without providing any actionable guarantees. Rust guarantees memory safety, with the only possibility of unsafety being relegated to blocks of code specifically denoted as `unsafe`.

In addition to memory safety, Rust's other goal was to improve the ability of programmers to reason about low-level concurrency, motivated by the enormous pain that both the Firefox and Chrome developers are currently experiencing by trying to adapt their browsers to a multicore world. The serendipitous discovery was that the same mechanism used to guarantee memory safety will also statically guarantee that your program is free of data races.

TL;DR: Rust's goals are guaranteed memory safety, guaranteed freedom from data races, and zero runtime overhead relative to C++.


kibwen, this whole reply should be an FAQ somewhere if it isn't already. Nice precision!


Actually, lifetimes are basically the same as regions, going back to FX87. Lifetimes in Rust are more like regions in Tofte & Talpin 94 than those in FX87, but where inter-region pointers are constrained by the lifetime of regions rather than changing the lifetimes of the regions (as required by Rust's goals for regions).


Rust's goals for linear typing and regions changed. Ownership types didn't happen until typestate was abandoned. The current design is quite different from what was being explored before. If they're happy with what they've got, more power to them, but the project's priorities have changed dramatically in the last 2 years. I think it's worth asking why something that started seemingly as a research language backed off the original work and was converted into a relatively safe technology product in a short span of time.

I find it profoundly disquieting when redactions are created around the history of projects, pushing the impression that the place arrived at was what was wanted all along. Our failures can inform posterity as much as our successes.[1] The core Rust team has been perfectly candid about the history of the project.

[1]: Cf. pre-Monad IO subsystems for Haskell, http://www.scs.stanford.edu/~dbg/readings/haskell-history.pd... and http://research.microsoft.com/en-us/um/people/simonpj/papers...


I didn't claim (and neither did kibwen, who unlike me is a major Rust contributor) that Rust's approaches around memory management didn't change. But your claim that "Lifetimes aren't regions" is false -- lifetimes are squarely in the 30-year research history of region systems.


A region calculus[1], such as Tofte described, is not what I see in Rust as it exists today. It could've been so with typestate, if I understood the intent behind typestate correctly.

If what Rust has is understood to be regions, then I need a couple of words for distinguishing the two. Ordinarily, I refer to what Rust/C++ have as "ownership types" and what exists in research as regions/RBMM/RC.

[1]: http://www.researchgate.net/profile/Simon_Helsen/publication... (the definition provided here is what I understand "regions" to mean, as contrasted with what exists in C++ and Rust)


As someone not terribly familiar with the literature but familiar with Rust's type system, I just spent a few minutes googling various pages about region based memory management (including Tofte + Talpin) as well as skimming your link (not very hard, I admit), and I don't really understand what it lets you do (that doesn't exist in a roughly analogous form in Rust). Well, there's the fact that most of the papers describe region inference, while Rust is fully explicit, but that doesn't seem critical to the scheme to me. Out of curiosity, I'd be interested to see you elaborate on the difference.

Also, I do not know what other 'PLs with similar facilities' exist with anywhere near as much effort put into them (other than Cyclone, which is dead). You mentioned C++, but it doesn't have lifetime checking at all, which is of course a core feature of Rust...


C++ doesn't cover the full extent of safety Rust offers at all, but there is a fair bit of overlap in how the functionality is packaged up WRT ownership types.

I checked the docs and it looks like at least http://doc.rust-lang.org/0.12.0/guide-lifetimes.html#named-l... is covered which was one of my objections. I still don't like that I can't design my own linearly typed constraints, but it appears that was never a goal to begin with. Not surprising given the lack of emphasis on expressive types.

As it stands my only options for linear'ish types are indexed Monads in Haskell or building a model of linear types in a proof assistant or DTPL.


I and some others are working on a rough approximation to linear types here: https://github.com/Manishearth/humpty_dumpty

A perfect implementation can be added to the language, but getting it to work perfectly with backcompat is hard so it might be made in 2.0.


Haskell type classes can not be implemented with zero runtime overhead in all cases, because it is possible to have an unbounded number of instances at runtime for a single type class. The Rust equivalent will just fail at compile time (of the trait implementation, not the trait definition) due to a recursion error.


Could you describe some of the upsides of regions as compared to Rust-style lifetimes and borrowck?


There is always overhead when adding abstractions--the only question is whether you pay at runtime or at compile time. C++ (and presumably Rust) choose the latter, Python and Go choose the former.


Yes, in these discussions, the overhead being referred to is runtime overhead.

There's also overhead in the sense of complexity for the programmer, which isn't really either of those two.


> There's also overhead in the sense of complexity for the programmer

Well, from the programmers' perspectives there are both read-time and write-time overheads. In C++-land, the discussion about the new (to C++) 'auto' keyword is about the trade-offs between the two.


C++ chooses both kinds, with templates and virtual functions. Rust does too.


Technically Rust (and C++, IIRC) choose both, but the mindset of most Rust(/C++) programmers and libraries is to use lower-cost abstractions wherever possible. Rust provides features and libraries that offer a dynamic alternative to many different statically checked things; e.g. trait objects, cell types (internal mutability), shared pointers, etc.

The difference is in the mindset: Most Rust programmers will hem and haw at having to use Box<Trait> until we don't have any other viable alternative (with Rust and its enums, there usually are many better ways to do it with lower cost). Java/Go libraries are completely okay with using dynamic dispatch everywhere. While writing Java code I won't be concerned about using an interface type or `Object` or whatever because zero-cost abstractions are not Java's thing. Of course, if there is a static alternative I will still prefer it, but I won't be too bothered if I just stick with `Object`.


C++ traits, templates, and non-virtual classes have no overhead.


If you want to be really pedantic, there is also the choice between "runtime overhead" and "code overhead" (code specialization).

And why not mention the overhead of the programmer foregoing these "short cuts" and hand-coding it herself.


That first one ain't pedantic; C++ code on the whole tends to generate a lot of assembly bloat with inlined methods and template instantiations, and while executable size probably isn't that important for most use cases, it is for quite a few.


It is pedantic in the sense that "overhead" in a programming context will most of the time mean some kind of runtime overhead, unless noted otherwise.


For being a lower-level language, Rust's abstractions really make it feel closer to a higher-level language than to C. It will be even more so if HKTs land.



> What you do use, you couldn’t hand code any better.

... given you have no information about runtime behavior other than the static code.

How to achieve "zero-cost abstractions" is, as usual, a design tradeoff.

Rust -- like C++ -- lets you choose in your code whether you'd like to pay for an abstraction or not. This does produce good machine code, but has two costs: 1/ the language gets more complicated and the programmer needs to be aware of what she's paying for, and 2/ you might end up paying for stuff you end up not using (for example, all button listeners might end up being the same type, and a vtable adds unnecessary cost), so that the generated code is only optimal if you have no other information about runtime behavior.

There's another way to add zero-cost abstractions: have a single simple abstraction and a JIT to figure out at runtime -- based on observed behavior -- what the optimal machine code is. This is what the JVM does. All method calls are virtual from the programmer's perspective, and no choice needs to be made ahead of time. At runtime, the JIT views the class hierarchy and usage, and decides whether a specialized, inlined version of a function is produced or whether a vtable is actually necessary (HotSpot even makes a special case when there are exactly two implementations, replacing the vtable with an `if`). If runtime behavior changes (a new type of listener is added or even new code is loaded at runtime that adds another implementation), the JIT will notice, reconsider and recompile. So in Java, virtual or even interface method calls are also zero-cost, even though you have no choice about using them; the decision on how to implement the abstraction is done by the (JIT) compiler at runtime. This, too, is a tradeoff -- a simpler language and truly optimal code taking into consideration not only static consideration but actual runtime behavior -- at the cost of a possibly significant warmup time and possible non-optimal "mistakes" by the JIT.


I wouldn't call that zero-cost. It's more like variable-cost, which could be even worse than always doing virtual calls in some application types.


The average case is always as expensive or less than a virtual call. But yes, the JIT most certainly introduces unpredictability, which may be unsuitable for hard realtime applications (hard realtime Java programs employ AOT compilation for those classes that require absolute predictability).


It's a problem for more than just realtime applications. Requiring a JIT means requiring a runtime, and that makes it much harder to do thing like embed Rust libraries in scripting languages or expose a C interface.


JIT has little to do with interoperation. It's quite simple to generate C symbols pointing to stubs. What makes interoperation hard is usually a GC rather than a JIT (it's just that most JITted languages also employ a GC; but if you look at, say, Go, it's just as hard to embed or link against, and it doesn't employ a JIT at all -- its runtime "just" performs scheduling and GC).

---

BTW, Java can be embedded in scripting languages because those languages run on the platform itself and share the runtime. Because of the JIT -- that optimizes across libraries and languages -- the interoperation is cheaper than with C. So much so, that you get the following story: As part of the work being done at Oracle on Graal, HotSpot's next-gen JIT, they've ported various scripting languages to the new JIT, among them Ruby. They've found[1] that if they interpret/JIT the C code of the native Ruby extensions they get better performance than a "plain" Ruby runtime calling into statically compiled C, because the JIT is able to optimize across the language barrier.

[1]: http://www.chrisseaton.com/rubytruffle/cext/


Right, GC is hard to embed and inlining is a powerful optimization. But the runtime that both JIT and GC require makes things hard for JIT on its own.


> Traits are interfaces

So why not use "interface" keyword?


> So why not use "interface" keyword?

They aren't really interfaces. They can be like interfaces (just method signatures) or like classes without data (including definitions for some or all methods). Using a name distinct from either "class" or "interface" limits the degree to which incorrect expectations from similar-but-critically-different constructs in other languages interfere with understanding of the Rust construct.


Traits as defined in Scala can actually have state and data.


Do Scala's traits change behavior depending on order of definition?

That scares me tbh.


There are ways to fix that, but I could never get Martin interested in my exclusive inheritance proposals. In particular, only one trait should be allowed to "implement" an abstract method in an object, although typical overriding is still unrestricted. It does require a new keyword (implement vs. override). As for overriding, traits could be ordered globally based on extension relationships and other fixed markers (e.g. Names).


Depending on order of inheritance? Yes. And yes, it should scare you (though IME problems are rare and are obvious when they do occur). I love Scala but it's very much a language of awkward compromises.


Because trait is a much better word for it: https://en.wikipedia.org/wiki/Trait_(computer_programming)


So in Rust traits can include full method bodies, not only signatures?


Yes. The feature has been proposed for Java too: "defender methods".

Originally I resisted them on the grounds of not being necessary, but they're used all over now. Being able to supply a default implementation is extremely useful.
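For instance, a minimal sketch of a trait mixing a required method with a default implementation (names here are illustrative):

```rust
// Sketch: a Rust trait with one required method and one default method.
trait Greet {
    // required: implementors must provide this
    fn name(&self) -> String;

    // default implementation; implementors get it for free or may override it
    fn greeting(&self) -> String {
        format!("Hello, {}!", self.name())
    }
}

struct World;

impl Greet for World {
    fn name(&self) -> String {
        "world".to_string()
    }
}

fn main() {
    // `World` never defined `greeting`, yet the call works.
    println!("{}", World.greeting()); // prints "Hello, world!"
}
```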


Quick correction: Java 8 does have defender methods.


Default method implementations make traits behave like mixins; it's very useful (and it is a form of code reuse).


Totally agreed. I was wrong to resist them :)


Thanks to all for explanations! :) And now I think Traits are traits, but can be used as interfaces too :)



They can also contain associated constants.
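A small sketch of what that looks like (associated constants were still unstable at the time of this thread; they landed in stable Rust later, in 1.20):

```rust
// Sketch: a trait carrying an associated constant, which each
// implementor must supply and default methods can refer to via `Self`.
trait Polygon {
    const SIDES: u32;

    fn describe() -> String {
        format!("a polygon with {} sides", Self::SIDES)
    }
}

struct Triangle;

impl Polygon for Triangle {
    const SIDES: u32 = 3;
}

fn main() {
    assert_eq!(Triangle::SIDES, 3);
    println!("{}", Triangle::describe()); // prints "a polygon with 3 sides"
}
```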


> Traits are somewhat between an interface and a mixin: an interface is made only of method signatures, while a trait includes also the full method definitions, on the other side mixins include method definitions, but they can also carry state through attributes while traits usually don't.

Rust's traits may specify just a method signature and force the implementor of the trait to implement the method, but they may specify full definitions of methods too.


Java's interfaces are getting default implementations now, aren't they?


Yes, they are in Java 8, though you can't implement an interface for an existing class in Java, so it's not quite a trait.


It sits between interface and implementation. They're... more an `intermediate`.


Probably for the same reason that they call their unions "enums" or their heap blocks "boxes". Standardization of jargon has never been a thing with Rust; just roll with it.


Well, enums aren't unions, they're tagged unions. And other languages call boxed values by that name too.


Stop it, they're unions. :) Among the target market, people are going to be intimately familiar with the use of "enum" and "union" from the C language. Rust's concept of a single object that can store exactly one of several types of sub-objects matches one quite closely, and not the other. Having a runtime tag and affirmative checking doesn't change the nature of the thing. We don't call "cars" something different when we add cruise control or anti-lock brakes.

"Enumerate" in English is just a fancy word for "count" -- it means to assign numbers to a bunch of things. Which is exactly what the C concept did. The Rust usage (to mean "something that can be in one of a few different states") is new, though Java has something fairly close too.

It was a poor choice, sorry. Likewise being deliberately difficult with "trait" vs. "interface" (picking Self's jargon instead of the term that literally everyone already knows from decades of OOP) didn't serve you well. Thus we have blog posts like this needing to tell us what we probably should have been able to figure out from context.

Finally, regarding "box" vs. "block". Other languages (C# is the only one that comes to mind off-hand) have used the idea of "boxing" to imply the allocation of space for and copying of pass-by-reference data. That's sort of a different notion than simple heap allocation, so it sort of gets its own jargon I guess. I didn't complain anyway. But with Rust, a "box" really is used to refer to a dynamic heap block in any context. We sort of already had a perfectly good word for that.

Pretentious jargon isn't the worst crime in the world, but I do think Rust seems needlessly complicated in the way it likes to play Shakespeare with existing concepts.


But a Rust enum is equivalent to a C-style one in the basic case. Rust's is just able to do more as well, to support the common idiom of tagged unions. Unions themselves cannot be expressed safely in Rust, so it'd do little good IMHO to have different terms for them. I guess you could say that they should have avoided "enum" and gone with "data" as in Haskell, but then people would complain that the basic ADTs are just enums and wonder why they can't just use "enum" like before!


I don't really want to argue about enum vs. union, because as far as I'm concerned they're pretty much equally accurate or inaccurate. Enums in C carry tags, but no data. Unions in C carry data, but no tags. Rust ADTs carry both (or one, or neither). So "enum" or "union" are pretty much equally good/bad names as far as I'm concerned. Consensus was in favor of "enum", so we went with it. Swift did too, so the small amount of emerging consensus as to what to call ADTs in C-like languages is nice.

> Likewise being deliberately difficult with "trait" vs. "interface" (picking Self's jargon instead of the term that literally everyone already knows from decades of OOP) didn't serve you well.

But "trait" is the right term. I would agree with you if Rust traits couldn't have implementations via default methods, but they do, and interfaces usually can't (except in Java 8). Interfaces strongly suggest, well, interface, as opposed to implementation; however, traits mix and match both. You can perfectly well have traits that exist only to provide "mixin"-style implementations.

In earlier versions of Rust, traits were called interfaces (and there were separate constructs to provide implementations of interfaces), but one of the design simplifications was to unify all those concepts into one: the trait.

> Finally, regarding "box" vs. "block". Other languages (C# is the only one that comes to mind off-hand) have used the idea of "boxing" to imply the allocation of space for and copying of pass-by-reference data. That's sort of a different notion than simple heap allocation, so it sort of gets its own jargon I guess. I didn't complain anyway. But with Rust, a "box" really is used to refer to a dynamic heap block in any context.

Sorry, I just have to disagree here. I don't think anything would be simpler if the keyword were "block":

    let x: Block<f32> = block 3.0;
"Block" as a verb doesn't mean "allocate" in the same way that "box" does: if anything, "block" implies something related to putting threads to sleep for I/O. And as a type, "block" sounds like a code block—i.e. something like a lambda. Ruby uses "block" for this, for instance.


> And as a type, "block" sounds like a code block—i.e. something like a lambda. Ruby uses "block" for this, for instance.

As does, notably, Objective-C, like Ruby due to the Smalltalk heritage.

I've never heard the term "block" used to refer to heap allocations, so I don't think it would really help.


> regarding "box" vs. "block". Other languages (C# is the only one that comes to mind off-hand)

F# actually uses box much like Rust does:

    // box an int
    let o = box 1
Examples: http://fsharpforfunandprofit.com/posts/cli-types/

MSDN: https://msdn.microsoft.com/en-us/library/ee340516.aspx


I guess we'll have to agree to disagree here. I think what you consider 'pretentious jargon' really depends on what kinds of languages you've used and the way you think about them. I think calling enums unions would have been a very big mistake, as they would mislead C programmers. Even you say that they're 'quite close', but they're not the same thing.


I, uh, don't really follow that logic. A Rust enum isn't even "quite close" to a C enum, yet you used a colliding term. Surely a C programmer who would have been misled about the fact that a union type is runtime-tagged and checked is going to be very misled that an "enum" has no numeric value and can contain variable state at runtime.


  >  A Rust enum isn't even "quite close" to a C enum
This is incorrect. The following Rust enum compiles down to a single byte, whose variants are represented by the numbers 0, 1, and 2:

  enum Foo {
      Zero,
      One,
      Two
  }
You can even give them all numeric values explicitly:

  enum Bar {
      Ten = 10,
      Eighty = 80,
      TwoHundred = 200
  }
And you can also just tell it where to start and let it count from there:

  enum Qux {
      Five = 5,
      Six,
      Seven
  }
If you throw the #[repr(C)] attribute on any of these then Rust will make sure to size them as C would on your particular platform (on my machine this attribute inflates them from 8 bits to 64 bits), making them usable directly from C as well.


I guess my ultimate point is that _everything_ is a colliding term in some way, with some language. Rust isn't C, so you can't assume that things map 1:1 to C.

Names are hard.


I'll have to admit enums are where I gave up the first time I read the Rust tutorial. They seem almost completely unrelated to enums in C and other languages.


Because they are enums done right. Enums in C and many other languages are often just helpers to define integer constants (or are abused in this way) and can be evil.


> They seem almost completely unrelated to enums in C and other languages.

They're not. Payload-less enums devolve to C enums. Rust's enums simply build the enum+union tagged-union pattern into the language, and allow leaving out the "union" part.


Right, because they're unions. :)

Just go back to that tutorial with your C hat on and substitute the word "union" for "enum" and I promise it will all make sense. All your intuition about C unions will cross over just fine, and the new Rust rules (they're tagged at runtime and the compiler enforces that you can only ever use fields of a runtime-checked subtype) are straightforward extensions.
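A minimal sketch of that substitution: the C idiom of a struct holding a tag plus a union corresponds to a single Rust enum, with `match` playing the role of the checked `switch` (the names here are illustrative):

```rust
// Sketch: the C "struct { tag; union { ... } }" idiom collapsed into
// one Rust enum. The discriminant (tag) is implicit, and `match` is
// the tag-checked `switch`.
enum Shape {
    Circle { radius: f64 },
    Rect { w: f64, h: f64 },
}

fn area(s: &Shape) -> f64 {
    // The compiler checks the tag and makes each payload accessible
    // only inside its matching arm.
    match *s {
        Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
        Shape::Rect { w, h } => w * h,
    }
}

fn main() {
    let r = Shape::Rect { w: 3.0, h: 4.0 };
    assert_eq!(area(&r), 12.0);
}
```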

Likewise the linked blog post begins, comfortingly, with "Traits are interfaces". Once you get beyond the new jargon, you find it wraps a concept which is 95% compatible with something you've been using for years.

That the Rust team seems to find no value in this kind of naming, preferring the excess precision that comes with Create-Your-Own-Name, is what I was calling "pretentious jargon" in a previous post in the thread. It's really not that bad (I mean really, they're just names), but it doesn't speak well to where the designers heads were when they invented this stuff.

Really, that's what's starting to creep me out about Rust. Just like C++ 30 years ago, it seems like Rust has caught itself up in an internal rush (among its rock-star language nerd designers) for Masterpiece Status and sort of forgotten the goal of creating a practical tool for working programmers... At some point in the near future I have to wonder if we're going to start seeing blog posts about choosing a "sane subset" of Rust with which to develop software.


> and sort of forgotten the goal of creating a practical tool for working programmers

This is blatantly false. I've been in touch with Rust development for quite some time, and pragmatism has been paramount in all design decisions. Suggesting otherwise on the basis of disagreeing with some naming choices is complete nonsense.


  > it seems like Rust has caught itself up in an internal 
  > rush (among its rock-star language nerd designers) for 
  > Masterpiece Status and sort of forgotten the goal of 
  > creating a practical tool for working programmers
This is complete hogwash. Just because you disagree with the chosen terminology doesn't justify attacks on the character of the Rust developers.


Sigh... it's an opinion. I even used "seems". I was around when we all watched C++ go from "exciting new tool we should all use" to "wait, does anyone else understand that new stuff because I don't anymore". This feels exactly the same.

I'm no dummy, yet Rust is just confusing as hell sometimes. And you guys frankly don't seem to care (again: note the marker "seems", indicating a personal opinion and not a "character attack"). That turns me off. It turns lots of people off. And I don't see any significant effort being made at making it an easy tool to learn and use.


> And I don't see any significant effort being made at making it an easy tool to learn and use.

Just to name a few off the top of my head:

1. Lots of focus on friendly compiler error messages, including typo correction, automatic lifetime suggestions, and error message explanations.

2. A strong worse-is-better approach in many aspects of the language design, such as preventing reference-counted cycles (we don't try to), numeric overflow (we don't try except in debug mode), typeclass decidability (it's deliberately undecidable in corner cases to avoid complex rules), prevention of deadlocks (we don't try), userland threads (we don't implement them anymore), asynchronous I/O (it's out of the domain of libstd for now), etc.

3. Blog posts like this one to introduce aspects of Rust, as well as the tutorial.

4. The Cargo package manager, as well as crates.io.

5. Naming conventions designed to fit well with C, for example choosing "enum" over "data"/"datatype" as in ML, "trait" over "class" as in Haskell (since the latter means something totally different), but modified in some cases to avoid leading programmers of C-like languages astray (for example, "interface" changing to "trait"). This naming process has taken time, but I think Rust is in a pretty good place now. There are obviously disagreements as to the naming, but we can't please everybody.

Certainly we weren't perfect, but there was a lot of effort put into making Rust as easy to use as possible.


I'm updating one year old Rust code currently, it's pretty obvious a tremendous amount of work has gone toward making it a more usable language. Yet it is not a small one, so some effort to master it is to be expected...


> Just go back to that tutorial with your C hat on and substitute the word "union" for "enum" and I promise it will all make sense. All your intuition about C unions will cross over just fine, and the new Rust rules (they're tagged at runtime and the compiler enforces that you can only ever use fields of a runtype-checked subtype) are straightforward extensions.

This is not true. Rust enums allow you to match on which of the types you have. C unions do not. If you want to implement a switch statement over the possible members of a C union, you need to put it inside a struct with a type field. You don't need to do so in Rust, and you can't do so and have it compile.

(That said, if your real complaint is that the official docs on enums are confusing, I'd certainly agree with that.)


> (That said, if your real complaint is that the official docs on enums are confusing, I'd certainly agree with that.)

I'd love more specifics about which docs, if you have some time.


I hesitate to speak because everyone will turn it gray, but it's worth pointing out that the whole idea of enums having C-like "discriminants", which like four people have yelled at me about, is entirely missing from the book.

I had to go look it up in the reference, where it is sort of hidden too.


I just sent in this PR, is this helpful? https://github.com/rust-lang/rust/pull/25348

Specifically I ended up rewriting most of the enum page: https://github.com/geofft/rust/blob/trpl-fix-enums/src/doc/t...

I think there's more that can be done (e.g. the book doesn't document that if every variant of an enum is data-less, you can cast it to an integer), but hopefully this is a start.


trpl/enums.md is super weird, and also in the wrong order (the book documents enums before structs, and way before tuple structs, whereas enums are IMO easiest explained once you've already explained structs). I was going to send you a PR tomorrow or so. :)


Fair enough. Tomorrow is basically the deadline before 1.0 though, so send it early :)


And I guess by "standardization" you mean use a C-like name, even though it is inaccurate and misleading and the data type declaration is taken from another language family (ML)?

(I agree that "enum" is a wrong name to use, but apparently for different reasons.)


"new traits can be implemented for existing types (as with Hash above). That means abstractions can be created after-the-fact, and applied to existing libraries."

Sounds a bit like a Ruby monkey patch. What happens in case of conflict - my trait adds a .hash method, but there already was one?


Trait methods are only visible when the trait is in scope. So a conflict would only appear when both traits are imported and the method is called, in which case you'll have to disambiguate.
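A small sketch of what that disambiguation looks like, with two hypothetical traits each defining a same-named method on an existing type (`i32`):

```rust
// Sketch: two traits, both in scope, each adding `describe` to i32.
// A plain `n.describe()` call is then ambiguous and fails to compile;
// fully qualified syntax picks the intended trait.
trait Loud {
    fn describe(&self) -> String;
}

trait Quiet {
    fn describe(&self) -> String;
}

impl Loud for i32 {
    fn describe(&self) -> String {
        format!("THE NUMBER {}", self)
    }
}

impl Quiet for i32 {
    fn describe(&self) -> String {
        format!("the number {}", self)
    }
}

fn main() {
    let n = 7;
    // `n.describe()` here would be a compile-time error:
    // "multiple applicable items in scope". Disambiguate instead:
    assert_eq!(Loud::describe(&n), "THE NUMBER 7");
    assert_eq!(Quiet::describe(&n), "the number 7");
}
```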


Cool, thanks for explaining. :) Would it be a compile-time or runtime error?


It would be a compile-time error saying (in essence) "multiple applicable methods in scope" and giving you a list of the available methods.


Awesome!



