I was wondering whether UTF-8 support could be added to CPUs.

There are a couple of operations that need to run quite frequently, so hardware acceleration should prove useful, for example counting the length of a UTF-8 string in characters. In UTF-8 a character takes between 1 byte (ASCII) and 4 bytes of storage (the original design allowed sequences of up to 6 bytes, but RFC 3629 limits them to 4). This is efficient for many languages, such as the European ones, while still letting them display any character when needed. (In fact, the common character set “Latin-1”, aka ISO 8859-1, does not contain the European currency sign, the euro, which resulted in a modified character set, “Latin-9”, aka ISO 8859-15.)
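To make the length example concrete, here is roughly what software has to do today. This is only a minimal sketch: the name utf8_length is my own, and it assumes the input is valid, NUL-terminated UTF-8. It relies on the fact that every continuation byte has the bit pattern 10xxxxxx, so counting all the other bytes counts the characters (code points).

```c
#include <stddef.h>

/* Hypothetical helper (the name is an assumption, not a library function).
   Counts the code points in a NUL-terminated string, assuming valid UTF-8:
   every byte that is NOT a continuation byte (10xxxxxx) starts a character. */
size_t utf8_length(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

A call such as utf8_length("\xE2\x82\xAC"), the three-byte encoding of the euro sign, counts one character where strlen would report three bytes.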

Something like this, similar to the “rep” prefix on some CPUs (x86, for example), could prove useful even though you’ll often be slowed down by memory access anyway.

I remember seeing memcpy and memcmp implemented as bytewise operations prefixed with “rep”. For memcpy especially, isn’t it faster to copy 32 or 64 bits at a time and handle the odd few leftover bytes separately?
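What I have in mind for memcpy is something like the following. It is a rough sketch, not how any particular C library does it: copy_wordwise is a made-up name, and the fixed-size memcpy calls inside the loop are only a portable way to express an 8-byte load and store without alignment or aliasing trouble (compilers typically turn them into single instructions, but that is an assumption about the compiler). The regions must not overlap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not a real library function: copies n bytes from
   src to dst, 8 bytes at a time, then the leftover bytes one by one.
   The two regions must not overlap. */
void *copy_wordwise(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Bulk part: move one 64-bit word per iteration. */
    while (n >= sizeof(uint64_t)) {
        uint64_t word;
        memcpy(&word, s, sizeof word);  /* 8-byte load  */
        memcpy(d, &word, sizeof word);  /* 8-byte store */
        d += sizeof word;
        s += sizeof word;
        n -= sizeof word;
    }

    /* Tail: the odd few bytes that do not fill a whole word. */
    while (n > 0) {
        *d++ = *s++;
        --n;
    }
    return dst;
}
```

Whether this actually beats a bytewise “rep” copy depends on the CPU and on how the memory is aligned, so it would have to be measured.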