BLAKE3 1.0 (github.com/blake3-team)
220 points by ta988 on July 26, 2021 | hide | past | favorite | 121 comments


BLAKE3 is a successor to BLAKE2 (used by WireGuard, for example), a really fast cryptographic hash function. It uses a binary tree structure and other approaches that allow it to exploit SIMD and multithreading.


Thank you for the concise explanation and context.


Would a future version of the WireGuard protocol and its implementations benefit from switching to BLAKE3? Enough so that it would make sense for them to make the switch, either as the only change between versions or alongside other changes they may have in mind? And if it makes sense, are they planning on doing it?


It's unlikely, since part of the design of WireGuard is not to allow negotiation of primitives. They'd version the whole protocol, and BLAKE2->BLAKE3 is not a strong incentive to do that.


This is awesome. We have been using BLAKE3 in Wasmer for more than a year [1] and it has been working great for us.

We were trying to find a hash function for WebAssembly files that is crypto-safe. The only alternatives that were fast enough (>1Gbps) were not crypto-safe (ahash, murmur, crc32), so BLAKE3 was the obvious choice to not sacrifice on speed while getting a cryptographic hash function.

[1] https://github.com/wasmerio/wasmer/pull/1140


Yeah, it's a bummer that highwayhash is only cryptographically strong (but not cryptographically secure), since it can be significantly faster (3x).

But BLAKE3 does seem to offer the best compromise when a cryptographically secure hash is required.


> a highwayhash is only cryptographically strong (but not cryptographically secure)

Can you let me know what you mean by strong vs. secure? When would you use one vs. the other? I've heard both of these terms used but they seem almost interchangeable[1].

I've also heard things like "this would be suitable for encrypting a password which is stored at rest" vs. "this could be suitable for a short lived one-time key", but I don't know what the correct terminology is there.

[1] https://en.wikipedia.org/wiki/Strong_cryptography


It's weird terminology. highwayhash aims to be cryptographically secure for its problem domain. But it's designed by (afaik) non-cryptographers, has weird security claims (they gauge security with statistical tests), and hasn't (afaik) received any formal peer review; it can also be instantiated at sub-secure hash sizes. You shouldn't use it to protect secrets, beyond the kind of incidental secrecy you'd ask from something like SipHash.

It would be better if people would be clear about this stuff; you see the same thing from the PCG RNG people, who say that their generator isn't a CSPRNG, but is somehow more secure than other non-CSPRNGs.


Without intending to endorse the wording, I suspect what's being communicated might be something related to the difficulty for an attacker to derail your system by cheaply predicting hashes. IIRC, the standard Java runtime HashMap implementation was susceptible to this at one point - an attacker could easily and cheaply force all values into only a few buckets.

The idea being, you might not care about actual cryptographic security but instead only the feasibility of some sort of cheap online collision attack.


Right, that's what SipHash tries to do too (SipHash was designed by two very reputable cryptographers).


In layman's terms, sounds like it's something which is difficult to guess but not necessarily difficult to crack?


One of my favorite aspects of the b3sum utility is that it has a --no-names flag:

    $ echo hi | b3sum --no-names
    0b8b60248fad7ac6dfac221b7e01a8b91c772421a15b387dd1fb2d6a94aee438


I've noticed in my shells that when I do this I get the newline, so be wary of that if you are passing this hash around. I think echo -n should fix that on most platforms.

I just tried your "hi" against https://connor4312.github.io/blake3/index.html

and get

hi\n - 0b8b60248fad7ac6dfac221b7e01a8b91c772421a15b387dd1fb2d6a94aee438

hi - 85052e9aab1b67b6622d94a08441b09fd5b7aca61ee360416d70de5da67d86ca
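The trailing-newline effect is easy to reproduce offline; a quick sketch with Python's hashlib (SHA-256 here, since BLAKE3 isn't in the stdlib, but the principle is identical):

```python
import hashlib

# `echo hi` writes "hi\n"; `printf '%s' hi` writes just "hi".
with_newline = hashlib.sha256(b"hi\n").hexdigest()
without_newline = hashlib.sha256(b"hi").hexdigest()

# One extra byte produces a completely unrelated digest.
print(with_newline == without_newline)  # → False
```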


Use printf.


Sometimes genius is in the little things :)


If I had a BLAKE3 implementation available in the programming language of choice, is there any reason to still prefer the SHA family over it (for integrity checks, not for password hashing, as mentioned in the readme)?


Aside from the various technical reasons others have given, I would like to say please have a look at the underlying design of SHA-3 - it’s really elegant, with so many applications beyond just hash functions. Ironically, I feel like SHA-3 should obsolete block ciphers like AES more than it obsoletes SHA-2.

https://keccak.team/sponge_duplex.html


> look at the underlying design of SHA-3 - it’s really elegant

Yes, I implemented a whole pile of hash functions, and I agree wholeheartedly. Whereas MD5/SHA seem to have been designed by pouring a hodgepodge of complexity into an algorithm until something indecipherable turned up, SHA-3 is simple. It's just a small number of easily understood operations, each with a clear purpose.

Actually, it looks to me like it's been an evolution. MD5 is insanely complex, the SHA-2 family got simpler, and then we get to SHA-3.

Symmetric algorithms look to be going the same way. DES is insanely complex, AES less so, and Speck is almost unbelievably simple (look at the source code on Wikipedia: https://en.wikipedia.org/wiki/Speck_(cipher)). It seems to be an unfashionable viewpoint, but in my mind that simplicity makes Speck seem more worthy of trust than a lot of its rivals.

Mind you,


SHA-3 is extremely slow compared to common ciphers like AES and ChaCha20. Sponge functions might someday become the building blocks of symmetric ciphers, but it's unlikely that SHA-3 will (without hardware acceleration).


For historical reasons the SHA-3 standard made extremely conservative choices with its security parameters, particularly the number of rounds. The result is that SHA-3 is slower than SHA-2 in a lot of cases, but it didn't have to be that way. The same team of cryptographers published the KangarooTwelve hash in 2016, with half the number of rounds. I think that implies that SHA-3 could've been twice as fast with no loss in security. KangarooTwelve also introduces a tree structure, which enables a lot of the same optimizations that you see in BLAKE3, and the two designs are interesting to compare. (See section 7.6 of the BLAKE3 paper.)


Well SHA-3 is a hash function, and indeed somewhat slow in software. But the team have since enormously expanded the primitives based on the same core design, with much better performance: https://keccak.team/sw_performance.html

You can also look at things like the Strobe framework, which builds essentially all of its symmetric crypto out of the SHA-3 core permutation: https://strobe.sourceforge.io/


BLAKE3 has a larger state: it's a tree hash, so you need to keep a stack of subtree hashes, which is proportional to the logarithm of the message size. SHA has a fixed state for any message.


Huh, I didn't know that.

https://github.com/BLAKE3-team/BLAKE3/blob/b404c851c284ed01f...

I peeked at the reference impl (380 lines of safe Rust) and found this. It has a 1,728-byte array for tree state, which is enough for 2^64 bytes of input.

So in practice it's also fixed.
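The 1,728 bytes fall out of the tree arithmetic: chunks are 1 KiB, so a 2^64-byte input has at most 2^54 leaves, and the incremental hasher only needs one 32-byte chaining value per tree level:

```python
CHUNK_LEN = 1024            # BLAKE3 chunk size in bytes
CV_BYTES = 32               # one chaining value (256 bits)
MAX_INPUT = 2 ** 64         # maximum supported input length

max_chunks = MAX_INPUT // CHUNK_LEN          # 2**54 leaves
stack_depth = max_chunks.bit_length() - 1    # 54 tree levels
print(stack_depth * CV_BYTES)                # → 1728
```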


Federal government (FIPS) compliance.


Security-wise they are roughly equivalent. While SHA has had more eyes on it I doubt either construction will ever be practically broken at hash sizes like 384 or 512 bits. Someone may find an "academic break" at some point.

BLAKE3 is faster, sometimes a lot faster, on hardware without SHA instructions. On hardware with SHA instructions SHA may be faster. Same as the AES story where AES is faster than ChaCha on CPUs with dedicated hardware but slower otherwise.


According to zooko, one of the authors, in new-ish cpus blake3 beats sha256 even with hardware acceleration: https://twitter.com/zooko/status/1419403567320821760


I'd like to see the benchmarks, including power draw. I suspect it is similar to soft ChaCha vs hard AES. A ChaCha software implementation can achieve similar speed as the AES hardware at the cost of significantly higher power draw due to pushing the AVX units at near maximum utilization.


I benchmarked hardware AES vs software ChaCha20, and the former showed an overall performance improvement of more than 50% for an end-to-end QUIC software stack. The pure crypto difference is probably even higher. That's a huge gap, even though it might well be the case that Ring's ChaCha20 implementation is still improvable.

As a result of that, I asked for rustls to default to AES instead of the previous ChaCha20 default [1]

[1] https://github.com/ctz/rustls/issues/509


Some implementations (IIRC Firefox is one of them) choose it dynamically: if hardware AES is available, AES is ordered before ChaCha20, otherwise ChaCha20 is ordered first. Last time I looked, at least for Intel the i3 didn't have hardware AES (but the i5 and i7 have it), so it was not uncommon for lower-end hardware (which needs speed the most) to have AES slower than ChaCha20.


It should be noted that ChaCha20 has an insanely comfortable security margin. By a more accurate estimate, ChaCha8 has a security margin similar to that of AES, and ChaCha20 has 2.5 times as many rounds; decide for yourself whether that's worth the cost.


Unfortunately almost no one can use ChaCha8 since the standards all call for ChaCha20.



And if you're designing hardware (or on an FPGA) ChaCha (and BLAKE2/3) are actually really freakin' fast for the die area they take. IIRC ChaCha20 beats AES in hardware. It's just extremely rare (FPGA only) to find it in hardware, since there's not enough demand. I expect that to eventually change.


Given that all these are ARX cores, I wonder if a fused ARX instruction could cover a wide range of them?


ARMv8.2 has rotate-and-xor and xor-and-rotate so that the (extremely cheap) xor can be saved.


Probably. With a barrel shifter easily. Barrel shifters are a bit slower than wired shifts though, so for ultimate speed you'd end up with hardware shift amounts.


The advantage would be a single instruction that implemented the core of a wide range of things including BLAKE2, BLAKE3, Salsa, ChaCha, Speck, and more.


One aspect of security is also misuse resistance. You can of course create a secure MAC with SHA256 in the HMAC configuration, but it usually takes a master's-level course on cryptography to know what the Merkle–Damgård construction is, and why its design is imperfect:

You can't just do SHA256(key + message) to generate a safe MAC. With BLAKE (and all SHA3 finalists) you can do that safely.

It's true that every time you make an algorithm more misuse-resistant, the universe will come up with a more advanced case of Dunning–Kruger, but despite that, misuse resistance is something that can actually improve, while the raw security is already more than adequate. As Schneier so eloquently put it, "we're building a fence for sheep, it doesn't matter if the fence pole is a mile or two miles high".
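The HMAC configuration mentioned above, sketched with Python's stdlib (key and message values are made up):

```python
import hashlib
import hmac

key, msg = b"secret key", b"amount=100"  # made-up values

# Unsafe with SHA-256: a plain prefix MAC is length-extendable.
naive_tag = hashlib.sha256(key + msg).hexdigest()

# The standard fix: HMAC nests the hash with derived inner/outer keys,
# so the final state can't be extended.
tag = hmac.new(key, msg, hashlib.sha256).hexdigest()

# Always verify with a constant-time comparison.
print(hmac.compare_digest(tag, hmac.new(key, msg, hashlib.sha256).hexdigest()))  # → True
```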


Good point. Misuse resistance is also why I am a fan of SIV constructions for stream ciphers, since "repeat a nonce = instant death" is a footgun.

Repeating a nonce is easier than you might think if you are using threads and accessing a nonce counter non-atomically, have a bad RNG, are on an embedded platform with bad RNG seeding, have a bug that overwrites some memory used to generate nonces, or just transfer a ton of data with the same key (birthday attack). SIV makes nonce reuse fairly benign. The only consequence is that if you happen to reuse a nonce with two identical messages, an attacker could tell that you sent the same message twice. That's generally not catastrophic and statistically is far less likely than repeating a nonce with different messages. Repeating a nonce with different messages generally does nothing in SIV.

You could theoretically use SIV with no nonce, with the only consequence being that an attacker could always tell if you sent duplicate messages. Not sure why you'd do that though.

IMHO since we now have ciphers that are probably "unbreakable for the foreseeable future" (e.g. AES and ChaCha) we should probably concentrate on creating and popularizing misuse-resistant constructions as much as possible. It's good to remove footguns.
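A toy SIV-style sketch of the idea (illustration only: HMAC-SHA256 as the PRF and a SHA-256 counter keystream standing in for a real cipher; actual AES-SIV/GCM-SIV are specified differently):

```python
import hashlib
import hmac

def _keystream(key: bytes, iv: bytes, n: int) -> bytes:
    # SHA-256 in counter mode as a stand-in stream cipher (toy only).
    out = b""
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + iv + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def siv_encrypt(mac_key: bytes, enc_key: bytes, nonce: bytes, msg: bytes) -> bytes:
    # The IV is derived from the message itself, so repeating a nonce
    # with a *different* message still yields a different keystream.
    iv = hmac.new(mac_key, nonce + msg, hashlib.sha256).digest()[:16]
    ct = bytes(a ^ b for a, b in zip(msg, _keystream(enc_key, iv, len(msg))))
    return iv + ct

def siv_decrypt(mac_key: bytes, enc_key: bytes, nonce: bytes, blob: bytes) -> bytes:
    iv, ct = blob[:16], blob[16:]
    msg = bytes(a ^ b for a, b in zip(ct, _keystream(enc_key, iv, len(ct))))
    # Recomputing the IV authenticates the plaintext.
    expect = hmac.new(mac_key, nonce + msg, hashlib.sha256).digest()[:16]
    if not hmac.compare_digest(iv, expect):
        raise ValueError("authentication failed")
    return msg

blob = siv_encrypt(b"m" * 32, b"e" * 32, b"nonce", b"attack at dawn")
print(siv_decrypt(b"m" * 32, b"e" * 32, b"nonce", blob))  # → b'attack at dawn'
```

Note the behavior described above: reusing the nonce with the same message yields the identical blob (an attacker learns only that a message repeated), while a different message changes the derived IV and hence the whole ciphertext.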


Hadn't really looked into SIV as I've only written stuff that always generates XChaCha nonces with getrandom but yeah I can totally see why the platform etc. could cause issues that lead to nonce-reuse. This was most informative post, thank you so much!


SIV is usually done with AES/GMAC constructions but you could do it with ChaChaPoly just fine.

The big downside is that it requires two passes on encrypt: one to create the MAC and derive the IV and another to encrypt. The overhead for this is small for message/packet based systems though since after pass one the data will be sitting hot in the processor's L0 cache. Decryption can be done in one pass.


Aren't you supposed to Mac the encrypted data?


> You can't just do SHA256(key + message) to generate a safe MAC.

Can you explain this?


A SHA-256 hash is just a dump of the internal state of the function. If you know the hash, you can keep running the hash function over more data and calculate a valid hash for the original data with new data appended.


What @dagenix said. See e.g. Thomas Pornin's answer here https://crypto.stackexchange.com/a/3979 for more details


If you have the output

    h = SHA-256(k || m1)
you can easily compute a function `F(h, m2)` such that

    SHA-256(k || m1 || m2) = F(h, m2)
allowing you to forge a verifier for `m1 || m2` under `k` for any `m2` you wish without actually knowing `k`.
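The mechanics are easy to demonstrate on a toy Merkle–Damgård hash. This is a throwaway sketch (the compression function is a made-up FNV-style mixer, not SHA-256), but the padding and extension logic work the same way: knowing only h and the key's length, the attacker forges a valid tag for the extended message.

```python
import struct

BLOCK = 16
IV = 0xCBF29CE484222325

def compress(state: int, block: bytes) -> int:
    # Throwaway compression function, just to show the structure.
    for b in block:
        state = ((state ^ b) * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return state

def padding(total_len: int) -> bytes:
    # MD-style padding: 0x80, zero fill, then the message length.
    zeros = (BLOCK - (total_len + 9) % BLOCK) % BLOCK
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", total_len)

def md_hash(msg: bytes, state: int = IV, total_len: int = None) -> int:
    total_len = len(msg) if total_len is None else total_len
    data = msg + padding(total_len)
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

key, m1, m2 = b"secret!", b"amount=100", b"&admin=1"
h = md_hash(key + m1)  # all the attacker ever sees

# The attacker knows len(key) + len(m1), but not the key itself.
glue = padding(len(key) + len(m1))
forged = md_hash(m2, state=h,
                 total_len=len(key) + len(m1) + len(glue) + len(m2))

print(forged == md_hash(key + m1 + glue + m2))  # → True
```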


You can with the truncated versions, though.


single-threaded blake3 is about 1.9x faster than (hardware) sha256 on my Zen 2 CPU.


Hardware implementation.


Hopefully, like BLAKE2, it makes it into OpenSSL.

One thing I was wondering about: TLS 1.3 dropped support for compression. Would that mean the compression function in BLAKE2 and a future BLAKE3 integration couldn't be used, or is this a different layer? https://en.wikipedia.org/wiki/BLAKE_(hash_function)


The compression function in a hash function has nothing to do with the general data compression that TLS got rid of; it's a component of the hash that takes n bits and outputs less than n bits in an as unstructured manner as possible.


It won't be added to OpenSSL until it's standardized (see [1]). However there's an external provider available of somewhat dubious quality that adds support for it ([2]).

[1] https://github.com/openssl/openssl/issues/11613#issuecomment...

[2] https://github.com/J-Montgomery/blake3-prov


Is this 1.0 ready indeed?

a comment in blake3_neon.c: // TODO: This is probably incorrect for big-endian ARM. How should that work?


1.0 is not the last and final version of a product. There can always be 1.0.1 or 1.1 or 2.0. Numbers are plentiful.

The Rust ecosystem is already overthinking 1.0 releases, which results in tons of crates having 0.x versions while being depended on as de facto stable.


> 1.0 is not the last and final version of a product.

Of course, but if you bother to do semantic versioning, it should strive to be a stable one and not a "we are still experimenting" release.

IMHO, knowing that something doesn't work, or isn't even implemented yet and will be addressed in the next release, is OK for a 0.x.y release, but you shouldn't rush towards 1.0.0 already planning to release it "unfinished" and complete it later in 1.0.1. That IMO kind of misses the point of the versioning semantics.


Big-endian ARM basically does not exist in the wild. It is very much an edge case of an edge case, and definitely not something that should be a blocker for a 1.0 release.


By that logic, would you ever release a 1.0?

Imho, just having a stable API that works on x86-64 without optimizations would be enough for a 1.0 release. Having a stable API with highly optimized implementation for x86 and common ARM systems is a lot for a 1.0 release. The limitations on ARM BE could be better documented though.


> By that logic, would you ever release a 1.0?

Of course! But you need a well-defined feature set you want to have for 1.0 and stick to that. The scope of that is up to the people who run the project; I did not make any demands about what this specific software should include in its 1.0 release. I did not say that it needs to be perfect, include all bells and whistles imaginable and all possible CPU optimizations, and smell nice in order to merit a 1.0 release. What I'm saying is that you should make an effort to ensure that the implementation of this feature set is somewhat stable.

So rather to the contrary, I would also suggest to keep the scope of a 1.0 release smaller than that, just like you suggested:

> Imho, just having a stable API that works on x86-64 without optimizations would be enough for a 1.0 release.

Releasing an un-optimized reference implementation as 1.0, or maybe only one optimized code path for x86_64 would IMO be perfectly fine. If they decide they really want optimizations for all kinds of CPUs in the 1.0 release, also fine with me. What the scope for 1.0 should or should not be is their choice. And it's also completely besides my point.

I'm specifically arguing against doing what pornel seems to imply: My point is, you shouldn't release an implementation you know misbehaves in some cases, because "we can fix it later".

I'm a fan of semantic versioning. And I firmly believe that in addition to API & ABI stability, for a major version release, some effort should be taken to iron out the implementation of that API as well. Of course bugs can, and will, crop up later and can be fixed with a patch level release, but IMO you shouldn't rush towards a 1.0 release with a backlog of known bugs for the next release. That's what I meant with "kind of misses the point of the versioning semantics".


Support for an architecture would be a minor release, no? That's "adding a feature". It wouldn't be a breaking change.


What it being 1.0 means is that the API is stable. There might still be bugs in the implementation (no software is perfect), but once it's reached 1.0, it's hoped that the API is good enough and will not have to change for quite some time. If you look at these release notes for 1.0.0, they were all API changes.


IMO, this shouldn't be a TODO. This should be an #error based on the standard C macros for detecting endianness at compile-time.


This is a good point, and I'd like to fix it. When I look for endianness macros, it doesn't seem like there's a common standard. Is there an approach you'd recommend?


It seems that you're right - ISO C doesn't have standard macros for this. That said, since this code is ARM-specific, and you are already leveraging the ACLE header <arm_neon.h>, I think that a negative test for __ARM_BIG_ENDIAN is appropriate.

https://developer.arm.com/documentation/101028/0012/5--Featu...


I just put up https://github.com/BLAKE3-team/BLAKE3/pull/188 for this. Let me know if you have any suggestions about how to test it?


I know ARM in principle supports BE, but I've never heard of real BE ARM products. Is there one? I am just curious.


The hardware products support setting endianness at boot time. It's just a matter of OS's supporting BE mode.

Netbsd releases big endian versions that work on various ARM boards. Here's an announcement showing it works on an Rpi3 and below, for example: https://mail-index.netbsd.org/port-arm/2020/12/03/msg007117....

They said at the time that it's not yet working on the Rpi4 because of issues getting BE mode and UEFI to work together.

I don't know that it's used often, but it does exist.


Why would anyone want to use BE mode at all?


Handy for testing that your code doesn't have any endian bugs that might crop up if someone tried compiling it on older RISC hardware?

Or perhaps very slightly more efficient networking code, since big endian is the default order for many operations there?


TMS570 dual-lockstep automotive safety controllers are hardwired to be big-endian.

Their close cousins the RM57 line look like nearly identical parts that are little-endian. I believe, but cannot prove, that the only difference in silicon is a factory one-time-programmable setting (fuse).

"Why would anyone want to build a BE machine?" In this case, they are trying to gain market-share from big-endian POWER microcontrollers.


It used to be much more common than it is nowadays; commonality between desktop systems used for development and production systems has slowly gotten rid of it. I know certain TVs and set-top boxes at least used to use big-endian, but most of them migrated to little-endian long ago.


Looks like the 1.0 tag is only about the Rust implementation. The Changelog only lists Rust things.

The algorithm itself hasn't changed, which is great!


So what are the next steps for cryptographic hash functions? Is it going to just be "make even faster" ? Or are we chasing new properties, or stronger guarantees (or hopes?) of existing properties?


Variants of "make even faster" (including "make keys smaller" or "run on more limited hardware") are most of what drives new core cryptographic primitives. There's a tweet from one of the BLAKE authors (@veorq) saying what I think a bunch of cryptography engineers believe: that it's unlikely SHA2 will ever be directly broken.

There are security improvements in new primitives, often based on a notion of "misuse-resistance". SHA2 isn't broken; it remains quite strong. But it's a classic Merkle Damgard design, which means you can take the output of SHA2 and hash more data into it, which breaks simple keyed hash constructions like H(k, m) (this is why we have HMAC!). SHA3 and other modern hashes don't have this property; you can safely build simple keyed hash constructions out of them. You can see the same thing with AEADs (the SIVs vs. GCM) and with curves (Curve25519, for instance, where keys are --- don't @ me --- just simple random byte strings).


There is also more/different functionality (e.g., designing a permutation or a tweakable blockcipher instead of a plain blockcipher, or a hash based on those), or better analyzability---making primitives that are simpler to "prove" (or make an argument for) secure against some classes of attacks.

And these days there are also the primitives purposefully designed to run in blockchain....whatever it is, using large GF(p) or GF(2^n) field operations as components. This mostly falls under the "more limited hardware" umbrella.


> you can take the output of SHA2 and hash more data into it

Isn't this problem fixed by simply dropping some bits from the end? See SHA-224 for instance.


Yes, though in a MAC construction, you'd just use HMAC anyways. People still use HMAC with SHA3, even though it's unnecessary, and I don't see anything wrong with that either.


> Curve25519, for instance, where keys are --- don't @ me --- just simple random byte strings

IIRC, just simple random byte strings with a couple of specific bits forced to 0 and another couple of specific bits forced to 1.


Don't @ me! :)


An associative / non-commutative hash function over lists. This would mean that the hash of a list, H(list1), could be "concatenated" with another hash, H(list2), to find the hash of the concatenation and be equivalent to H(list1+list2). I think such a construction could be huge for cryptographic verification of data in data structures while avoiding baking in incidental details about the data structure itself.

I hesitate to link to it because I already know of a potential attack, but I played around with an implementation that does this via matrix multiplication over Galois-field elements reinterpreted from hash digests [1]. I mean, the associative / non-commutative part works nicely, but I expect that factoring the matrix is possible, which makes it utterly useless as a cryptographic primitive [2]. "Don't roll your own crypto" would apply, but I'm obviously not using such an academic exercise in production.

[1]: https://blog.infogulch.com/2021/07/15/Merklist-GF.html

[2]: https://math.stackexchange.com/questions/4200988/using-rando...
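A minimal sketch of the associativity property over GF(2), with 8x8 bit matrices derived per byte (the byte-to-matrix map via SHA-256 is my own placeholder, not the linked scheme; as noted above, this construction is not cryptographically secure). Because matrix multiplication is associative, hashing a concatenation equals multiplying the pieces' hashes:

```python
import hashlib

def byte_to_mat(b: int) -> list:
    # Placeholder map: 8 bytes of SHA-256(b) used as 8 row bitmasks.
    return list(hashlib.sha256(bytes([b])).digest()[:8])

def mat_mul(a: list, b: list) -> list:
    # 8x8 matrix product over GF(2); each row is an 8-bit mask.
    out = []
    for i in range(8):
        row = 0
        for j in range(8):
            if (a[i] >> j) & 1:
                row ^= b[j]
        out.append(row)
    return out

IDENTITY = [1 << i for i in range(8)]

def mhash(data: bytes) -> list:
    m = IDENTITY
    for byte in data:
        m = mat_mul(m, byte_to_mat(byte))
    return m

left, right = b"hello ", b"world"
# Associative: H(a + b) == H(a) * H(b); generally non-commutative.
print(mhash(left + right) == mat_mul(mhash(left), mhash(right)))  # → True
```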


This is super interesting, thanks! I just wrote a related note in http://canonical.org/~kragen/dernocua.git in text/hash-consing-ropes.md, linking to those two notes; my interest is using it for hash consing, like symbol interning but for all your strings.


Something that might interest you even more is the related questions by Jason Hise that I linked to. They explored using 8x8 bit-element matrices as an associative hash of a string, and apparently it works pretty well as a hash for string keys in a hashmap. The questions, answers, and comments all have some interesting context, and Jason answers some questions I had about their usage in comments on the second question.

Using 8x8 Binary Matrices as a hash - https://math.stackexchange.com/questions/1902462/using-8x8-b...

Finding Prime Binary Matrices - https://math.stackexchange.com/questions/1914853/finding-pri...


It's a pretty interesting idea! If you were implementing it in hardware, it would probably be faster. In software, I suspect that my suggestion of composition of linear forms in the ring ℤ/nℤ would be more efficient.


I agree - such a thing would be really, really useful!


Not an expert, but it would be nice to provide security proofs. Not a must-have thing, perhaps, but certainly nice to have. One such example of a "provably secure" hash function is SWIFFT: https://en.wikipedia.org/wiki/SWIFFT. Unfortunately, it's quite slow, and can't be used as a "random oracle".

A lot of security proofs assume the random oracle model (but not for hash functions, which are supposed to implement the "random oracle"). Strictly speaking, the model is incorrect, because a hash function is never a random oracle. I don't know if a suitable substitute has been found.


Could someone explain how these versions in hash algorithms work please? As I understand it version 0.3.7 from last year would produce the same hashes as version 1.0, so is the algorithm itself finished long ago and it's just the tooling that got updated over time?


Platform support is one aspect, another is re-implementing the algorithm in constant time to prevent side channel leaks. Also perhaps the code was edited to be stable under different fuzzers. Note: this is all speculation and generalization; I haven't looked at BLAKE3 commits at all.


It's not the version of the BLAKE3 hash algorithm (which was finished a long time ago and won't change anymore), it's the version of the reference implementation. The changes since 0.3.7 were a fix for a build issue on MSVC (released as 0.3.8) and breaking changes to the API of the reference implementation, mostly removing parts that weren't stable enough for a 1.0 release.


Can someone who knows Rust tell me what "<&[u8; CHUNK_LEN]" is exactly?


Reference (“&”) to a slice (“[]”) of CHUNK_LEN bytes (“u8”, unsigned 8-bit integers).


Thank you! What is "<"? Full line is:

  let mut chunks = ArrayVec::<&[u8; CHUNK_LEN], NUM_INPUTS>::new();
Edit:

I probably should not comment to everyone one-by-one, so thank you all for the answers!


Generic type. Just like C++ or Java, really.

So `ArrayVec::<T, const CAPACITY: usize>` is an ArrayVec of items of type T, with capacity CAPACITY (of type usize, think size_t). An ArrayVec is a vector backed by a fixed-size array. Basically an array with convenience methods on it.[1]

In this case, T is `&[u8; CHUNK_LEN]` and `CAPACITY` is `NUM_INPUTS`. `&[u8; CHUNK_LEN]` is roughly equivalent to a `uint8_t (*array)[CHUNK_LEN]`, i.e. a pointer to an array of CHUNK_LEN bytes.

So to simplify even more it's a fixed-size vector of byte arrays. The details of why it's not quite equivalent to that C code (how Rust checks the safety) are beyond the scope of this comment.

[1] https://docs.rs/arrayvec/0.7.1/arrayvec/struct.ArrayVec.html


In Rust, <> are used for type arguments. E.g. if you make a function with a generic type, it would be `fn my_func<T>(arg1: T) { ... }`

Now, what we have here is commonly referred to as the Turbo Fish[0] operator (the `::<>` syntax). In this case it's there to help let the compiler know which concrete type you want to create your new `ArrayVec` with :)

[0]: https://doc.rust-lang.org/1.30.0/book/first-edition/generics...


It is the opening of a generic type parameter. In this case ArrayVec[1] takes two, a type T for the element type and a const for the maximum size of the ArrayVec. The use of ::<> is affectionately known as "the turbofish" and is a way to give the compiler type information when it can't infer it.

1. https://docs.rs/arrayvec/0.7.1/arrayvec/struct.ArrayVec.html


Those are used for generics in that case.

Foo<T> works for many types, often bounded by certain traits they must implement.

Here it's passing the concrete types on instantiation; in this case there are two parameters: a reference to a u8 array and a number of inputs (just guesstimating the latter, as I did not check the original definition in the source).

See https://doc.rust-lang.org/book/ch10-01-syntax.html for a better explanation.


<> is used in Rust similar to how C++ uses them in templates.

ArrayVec::new() is a generic function that takes two generic arguments: a type (&[u8; CHUNK_LEN]) and a value (NUM_INPUTS). So ArrayVec is the structure, <...> are the generic arguments (I replaced stuff with ellipses for brevity), and new() is the function call. :: is used as a "path" separator to syntactically differentiate between all these different parts.


That looks like a template instantiation to me. It's "<&[u8; CHUNK_LEN], NUM_INPUTS>" So the angle brackets are wrapping two arguments to the template.


Rust syntax really is crazy. It's designed to be utterly incomprehensible.


Minor correction: it's a reference to an array, not a slice.


Performance aside, I am wondering what is the current state-of-the-art cryptography analysis over BLAKE3?


> Secure, unlike MD5 and SHA-1. And secure against length extension, unlike SHA-2.

Aren't some variants of SHA-2 secure against length extension (like SHA512/256)?


Correct, the truncated versions of SHA2 are secure against length extension.


So SHA224 and SHA384? They're not exactly common. SHA256 is pretty much the standard and SHA512 is usually used for hashing larger files due to the larger block size and thus faster speed. I don't think I've ever seen 224/384 used anywhere.


Importantly SHA-512/256 does not mean "Either of these two different functions" but instead another function, which is similar to (but slightly different from) performing SHA-512 and then throwing away all but 256 bits.

This prevents length extension because you've thrown away 256 bits the attacker needs to perform their attack.


SHA224, SHA384, and SHA512/256 (no, that's not a choice between SHA512 and SHA256; it's essentially SHA512 truncated to 256 bits, with different initialization constants) are the truncated versions of SHA2. Rarely used. Confusingly named (in the case of SHA512/256).


For processors that have hardware SHA256 (AMD Zen, most 64-bit ARM, some Intel models), SHA224 can be computed with the same instructions.

For 64-bit CPUs without hardware SHA256, SHA512 & SHA384 are the fastest and they have identical speed (as only the values of some constants differ, while the algorithm is the same).

Most libraries and hash utilities implement all these variants, so any variant can be chosen without problems.


SHA-512/256 is what I default to since it is faster than SHA-256 but doesn't have the ridiculously long hash length of SHA-512


It is faster on any 64-bit CPU without hardware SHA256, i.e. on most Intel CPUs.

On CPUs with hardware SHA256 (AMD Zen, 64-bit ARM, some recent Intel CPUs), SHA256 is faster.


Wonder if they'll ever get this added to Alpine/Busybox as a default package? Even b2sum is missing.


How does it compare against xxhash?


xxHash is not cryptographic, whereas BLAKE3 is. You can use the former for things such as checking file integrity against accidental errors, and the latter for checking it against malicious tampering. Or: one is for the file system and the other is for the internet. You should use something altogether slower like bcrypt for passwords, though, because there you don't want speed, in case someone gets a copy of your database with a million password hashes and tries a dictionary attack on them.
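The gap is easy to demonstrate: for a 32-bit non-cryptographic checksum like CRC32, the birthday bound predicts a collision after only a few hundred thousand inputs at most, so a simple loop finds one quickly (a sketch; finding a SHA-256 collision, by contrast, is utterly infeasible):

```python
import hashlib
import zlib

# Look for two distinct inputs with the same CRC32. With a 32-bit
# checksum the birthday bound predicts a collision after roughly
# 2**16 pseudo-random inputs.
seen = {}
a = b = None
for i in range(1_000_000):
    s = hashlib.sha256(str(i).encode()).digest()  # pseudo-random inputs
    h = zlib.crc32(s)
    if h in seen:
        a, b = seen[h], s
        break
    seen[h] = s

print(a != b and zlib.crc32(a) == zlib.crc32(b))  # → True
```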


I like to check smhasher for comparison of many hash implementations: https://github.com/rurban/smhasher. It’s a bit clunky to search for the exact items you want to compare, but there’s lots of data and a good summary under the table.


xxhash is not suitable for cryptographic purposes, but much faster.


I assume MD5 is in there too now. Just out of curiosity, what are the most common use cases for non-cryptographic hash functions?


MD5 isn't particularly fast. Basic non-vector BLAKE3 beats it, and SHA1 beats it by even more.


Yes, I'm already aware of that, thank you. Could you please expand upon what the main use cases are for non-cryptographic hashes?


> Yes, I'm already aware of that, thank you.

Oh. Well you said "in there too now" so I thought you were saying it was like xxhash in being insecure and fast. Even if that wasn't your intent, I wanted to make it clear to other people.

> Could you please expand upon what the main use cases are for non-cryptographic hashes?

Very fast hashes are good for hash maps, or even critical for them. That covers a huge amount of uses. And judging by the list on xxhash's website, a bunch of file transfer programs use it as well as bloom filter implementations, along with the databases that are probably using it for various maps.


Cool! I did make a duplicate file finder once. Seems I'll have to update it with this algo then! Thanks for mentioning it! :D


Hash Map/Set keys is a common one, for example.


Xxhash is a non-cryptographic hash and is much faster.


In b4 blakecoin


And I thought that JPA was slacking all this time drinking beers :o



