If you have any interest in programming languages, you’ve probably heard of Rust by now. I had been meaning to try it out for quite some time, since the claims that the borrow checker can make whole classes of memory errors impossible with zero runtime cost are very intriguing. Unfortunately, each time I got far enough to examine some sample code, I would immediately recoil in horror at the syntax and put it away again.
A few months ago, I finally decided to push through my negative syntax
impressions and give it a real try. Now that I’ve spent a couple of months
with Rust and written a few thousand lines of code, there’s a lot to like. The
borrow checker and ownership tracking are indeed powerful. There are many
high-quality libraries available, and pervasive use of things like Option and
Result makes error handling very clean. In many ways, Rust feels like a more
practical version of Haskell: You get (most of) the most useful monads and
(some of) the most popular algebraic data types, but without the pain of
rewriting your algorithms into fully functional versions.
But oh, that syntax! It has so many strange warts that make the language harder to learn and harder to read. Let’s look at a few examples.
First, many of the keywords and common data structures are abbreviations, such as fn, mut, pub, dyn, impl, mod, Vec, and &str. Not only does your brain have to fill back in the full word from each abbreviation, but at least some of them are expected to be pronounced as though all the missing letters were present. For example, the Vec documentation says “written as Vec<T> and pronounced ‘vector’.” What’s the story here? Were the designers being charged by the keystroke? Just spell them out and let people’s editors do completion if it’s so horrifying to type a few extra characters (or should I say chars?). Even worse, some of the important keywords and operators mix these abbreviations with symbols, such as using &mut to mean a mutable reference.
Next up: Poorly-named common data types. For example, if I write let a = [1, 2, 3], I would normally expect to get a vector initially containing 1, 2, and 3. Not in Rust! That syntax makes a into a fixed-size array containing exactly those three elements. If I want a to be able to grow after initialization, I have to write something like let mut a = vec![1, 2, 3] (which presumably generates some code to create a correctly-sized Vec and then copy the static array elements into it). Why would you reserve the shortest syntax for the less-useful fixed array?
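To make the two literals concrete, here’s a minimal sketch. The helper names bump_first and grow are made up for illustration. One detail worth keeping straight: whether you can change elements comes from mut on the binding, while whether you can change the length comes from the type.

```rust
// Hypothetical helper: modifies a fixed-size array in place and returns it.
fn bump_first(mut a: [i32; 3]) -> [i32; 3] {
    a[0] += 10; // element mutation is fine on a `mut` array
    a
}

// Hypothetical helper: builds a growable Vec and appends one element.
fn grow(extra: i32) -> Vec<i32> {
    let mut v = vec![1, 2, 3];
    v.push(extra); // only Vec can change its length
    v
}

fn main() {
    let a = [1, 2, 3]; // type is [i32; 3]; the length is part of the type
    println!("{:?}", bump_first(a));
    println!("{:?}", grow(4));
}
```

There is no push on an array at all, so the bracket literal alone never gets you something resizable.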
Similarly, String represents an owned string, &String represents a reference to a string, and &str represents a read-only string slice. A slice of a vector (oops, Vec<T>) or an array is spelled &[T], so why is a slice of a String spelled &str? Like the previous point, this smacks of some kind of keystroke parsimony.
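Here’s a small sketch of how those spellings show up side by side. The functions first_word and sum are hypothetical names picked for the example, not anything from the standard library.

```rust
// Hypothetical helper: takes a string slice, so it accepts both
// borrowed Strings and string literals.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

// Hypothetical helper: takes a slice of i32, so it accepts borrowed
// Vecs, arrays, and sub-ranges of either.
fn sum(xs: &[i32]) -> i32 {
    xs.iter().sum()
}

fn main() {
    let owned: String = String::from("hello world");
    let whole: &String = &owned;   // reference to the owned String
    let part: &str = &owned[0..5]; // string slice borrowing from it
    println!("{} / {} / {}", whole, part, first_word(&owned));

    let v: Vec<i32> = vec![1, 2, 3, 4];
    let arr: [i32; 3] = [5, 6, 7];
    println!("{} {}", sum(&v[1..3]), sum(&arr));
}
```

The parallel between the two slice types is there in practice; it’s only the spelling that diverges.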
Why does calling a macro require !? One explanation I heard is that it’s a marker to indicate that extra code is going to be generated, but that doesn’t make sense. Calling a function runs an arbitrary amount of code, so why would I need extra scrutiny for the code generated by a macro? Don’t get me wrong: Rust macros are way better than C preprocessor macros. But from an ergonomics point of view, why would I want to write println!() instead of println()? Why would I want to have to remember that it’s a macro instead of a regular function call?
Next, Rust is expression oriented, but it still has statements. The only difference between an expression and a statement is the presence of a trailing semicolon. When you combine this with using the last expression in a block as the result of the block, this makes a mess of moving code around. Like the := operator in Go, it’s very easy to break a working sequence of code by inadvertently introducing or removing a statement where an expression is needed, or vice versa. Also like :=, the compiler can nearly always tell you exactly what you did wrong and where to add (or remove) the semicolon. So why have this distinction at all? Why not make everything an expression?
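A minimal sketch of where the semicolon matters. The function names are invented for the example; the comments describe what I understand the compiler to do.

```rust
fn double(x: i32) -> i32 {
    x * 2 // no semicolon: this expression is the function's return value
    // Writing `x * 2;` instead would turn it into a statement, the body
    // would evaluate to `()`, and the function would no longer type-check.
}

fn describe(n: i32) -> &'static str {
    // `if` is an expression; each branch's final expression is its value.
    let label = if n % 2 == 0 { "even" } else { "odd" };
    label
}

fn block_value(x: i32) -> i32 {
    // A block is an expression too; its value is its final expression.
    let y = {
        let t = x + 1; // `let` is a statement and needs the semicolon
        t * 2
    };
    y
}

fn main() {
    println!("{} {} {}", double(21), describe(3), block_value(2));
}
```

Move the t * 2 line up or append another line after it, and the meaning of the block changes depending on which lines end in a semicolon.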
There are a bunch of shorthands that save a few keystrokes at the expense of making the behavior less obvious for the reader. A couple of examples:
- Using [3; 5] to mean [3, 3, 3, 3, 3]. I definitely see the use of reusing an initializer expression without repeating it, but who would ever guess what this means from reading it? Since when does a semicolon imply repetition? Python uses [3] * 5 for approximately the same thing and Perl uses (3) x 5. Perhaps one of those could have been reused, or even something like [5 of 3].
- Allowing struct { field } to mean struct { field: field }. This gets extra confusing if you mix local variables with explicit assignments; it’s really easy to misread it as an ordered list of field assignments instead of the implicit mapping. It also makes refactoring harder because renaming a variable or a struct member might unexpectedly require you to introduce the explicit field label.
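Both shorthands in one small sketch. Point, repeated, and make_point are names I made up for the illustration.

```rust
#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Hypothetical helper showing the repeat form of an array literal.
fn repeated() -> [i32; 5] {
    [3; 5] // same as [3, 3, 3, 3, 3]
}

// Hypothetical helper mixing the shorthand with an explicit label.
fn make_point(x: i32, height: i32) -> Point {
    // `x` uses the shorthand because the variable and field share a name;
    // `y` needs an explicit label because the variable is named `height`.
    Point { x, y: height }
}

fn main() {
    println!("{:?}", repeated());
    println!("{:?}", make_point(1, 2));
    println!("{:?}", vec![0u8; 4]); // vec! accepts the same repeat form
}
```

Rename the x parameter to something else and Point { x, y: height } stops compiling until you write x: new_name, which is the refactoring friction I mean.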
Rust also reuses the same syntax to mean multiple things. Each of the features
where this happens is separately useful, but why should they be spelled the
same when they’re completely unrelated? For example, consider ..:
- buf[..len] is a slice of the first len elements of buf.
- struct { ..rhs } creates a struct with all the otherwise unset fields set from rhs.
- ( x, .. ) inside a pattern match deconstructs the matched object to extract x and ignore the rest of the fields.
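Here are the three meanings next to each other in a short sketch. Config and the helper functions are hypothetical, chosen only to have something to slice, update, and destructure.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Config {
    name: String,
    retries: u32,
    verbose: bool,
}

// Range: the first `len` elements of the buffer.
fn prefix(buf: &[u8], len: usize) -> &[u8] {
    &buf[..len]
}

// Struct update: fields not listed are taken from a clone of `base`.
fn with_retries(base: &Config, retries: u32) -> Config {
    Config { retries, ..base.clone() }
}

// Rest pattern: bind the first element and ignore the others.
fn first_of(t: (i32, i32, i32)) -> i32 {
    let (x, ..) = t;
    x
}

fn main() {
    let base = Config { name: String::from("demo"), retries: 1, verbose: false };
    let tuned = with_retries(&base, 5);
    println!("{:?}", prefix(&[1, 2, 3, 4], 2));
    println!("{:?}", tuned);
    println!("{}", first_of((7, 8, 9)));
}
```

Same two dots in all three functions, with three unrelated jobs.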
Conversely, it also provides multiple syntaxes for the same thing. As far as I can tell, all of these produce an identical trait bound:
fn f(a: impl T)
fn f<U: T>(a: U)
fn f<R>(a: R) where R: T
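Spelled out as compilable code, with std::fmt::Display standing in for the placeholder trait T (my substitution, purely for illustration) and made-up function names:

```rust
use std::fmt::Display;

// Form 1: `impl Trait` in argument position.
fn show_impl(a: impl Display) -> String {
    format!("<{}>", a)
}

// Form 2: a named type parameter with an inline bound.
fn show_generic<U: Display>(a: U) -> String {
    format!("<{}>", a)
}

// Form 3: a named type parameter with the bound in a `where` clause.
fn show_where<R>(a: R) -> String
where
    R: Display,
{
    format!("<{}>", a)
}

fn main() {
    println!("{} {} {}", show_impl(1), show_generic("two"), show_where(3.5));
}
```

All three accept the same arguments. The one difference I’m aware of is at the call site: with the named forms you can write show_generic::<i32>(2), while the type behind impl Display can’t be named that way.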
As it turns out, all of those complaints are really syntactic complaints. Yes, they increase Rust’s already-steep learning curve. Yes, they put more burden on the reader to mentally simulate Rust’s rules instead of just reading what’s presented. Even so, I don’t think any of them actively thwart useful programming patterns. Even though I don’t love the syntax, I’m going to continue on and see if I can learn to like it. The borrow checker is very cool, and it’s nice to have both automatic memory management and deterministic variable lifetimes.