UTF-8 Validation

Medium · Bit Manipulation · Arrays & Hashing · Time: O(n) · Space: O(1)

Signals to notice

validate byte sequence encoding · multi-byte character rules · bit pattern checking

Brute force first

There is no simpler alternative: the encoding rules have to be checked byte by byte, so the direct left-to-right scan is already the O(n) solution.

The key insight

Scan left to right. Check the leading bits of each byte to determine whether it starts a 1-, 2-, 3-, or 4-byte character, then verify that each of the next N-1 bytes starts with '10'. The only state to carry from one byte to the next is how many continuation bytes are still expected.
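The scan described above can be sketched as follows. This is a minimal illustration, not a reference solution: the function name `valid_utf8` and the input shape (a list of integers, one per byte) are assumptions, as is the choice to mask each value to its low 8 bits. It checks only the structural rules discussed here; the full UTF-8 standard additionally rejects overlong encodings and surrogate code points, which this sketch does not attempt.

```python
def valid_utf8(data: list[int]) -> bool:
    """Return True if data follows the lead-byte / continuation-byte rules."""
    remaining = 0  # continuation bytes still expected for the current character

    for value in data:
        byte = value & 0xFF  # assumption: only the low 8 bits are meaningful

        if remaining:
            # Inside a multi-byte character: the byte must look like 10xxxxxx.
            if byte >> 6 != 0b10:
                return False
            remaining -= 1
        elif byte >> 7 == 0b0:        # 0xxxxxxx: 1-byte character
            remaining = 0
        elif byte >> 5 == 0b110:      # 110xxxxx: 2-byte character
            remaining = 1
        elif byte >> 4 == 0b1110:     # 1110xxxx: 3-byte character
            remaining = 2
        elif byte >> 3 == 0b11110:    # 11110xxx: 4-byte character
            remaining = 3
        else:
            # A stray continuation byte (10xxxxxx) or 11111xxx cannot start a character.
            return False

    # A character that was started must also be finished.
    return remaining == 0
```

A single counter is enough state, which is why the space cost stays O(1): for example, `valid_utf8([197, 130, 1])` accepts a 2-byte character followed by a 1-byte character, while `valid_utf8([235, 140, 4])` rejects a 3-byte lead whose second follower is not a continuation byte.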

What must stay true

The first byte's leading bits determine how many continuation bytes follow: '0' means none, '110' means one, '1110' means two, and '11110' means three. Each continuation byte must start with '10' (bits 7-6). Any violation, including running out of bytes before a character is complete, means the encoding is invalid.
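One way to make this invariant concrete is to write the bit patterns down as data. The sketch below uses mask-and-compare rather than shifts; the names `LEAD_PATTERNS`, `continuation_count`, and `is_continuation` are hypothetical helpers chosen for illustration, and each argument is assumed to already be in the range 0 to 255.

```python
from typing import Optional

# (mask, expected pattern after masking, number of continuation bytes that follow)
LEAD_PATTERNS = [
    (0b10000000, 0b00000000, 0),  # 0xxxxxxx
    (0b11100000, 0b11000000, 1),  # 110xxxxx
    (0b11110000, 0b11100000, 2),  # 1110xxxx
    (0b11111000, 0b11110000, 3),  # 11110xxx
]


def continuation_count(byte: int) -> Optional[int]:
    """How many continuation bytes a lead byte announces, or None if it cannot lead."""
    for mask, pattern, count in LEAD_PATTERNS:
        if (byte & mask) == pattern:
            return count
    return None


def is_continuation(byte: int) -> bool:
    """True when bits 7-6 of the byte are '10'."""
    return (byte & 0b11000000) == 0b10000000
```

Keeping the patterns in a table puts every rule in one place, so a wrong mask is easier to spot than when the same constants are scattered across several branches.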

Easy way to go wrong

Not checking that continuation bytes actually start with '10': a valid first byte followed by wrong continuation bytes is still invalid. Code that reads the length from the first byte and simply skips ahead by that many bytes will accept such sequences.
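The mistake is easy to reproduce. In the sketch below, `lead_length`, `buggy_valid_utf8`, and `fixed_valid_utf8` are hypothetical names, and the input is assumed to be a list of integers in the range 0 to 255. The buggy version trusts the length announced by the lead byte and jumps over the following bytes; the fixed version inspects each one.

```python
def lead_length(byte: int) -> int:
    """Total length of the character a lead byte announces, or 0 if it cannot lead."""
    if byte >> 7 == 0b0:
        return 1
    if byte >> 5 == 0b110:
        return 2
    if byte >> 4 == 0b1110:
        return 3
    if byte >> 3 == 0b11110:
        return 4
    return 0


def buggy_valid_utf8(data: list[int]) -> bool:
    i = 0
    while i < len(data):
        n = lead_length(data[i])
        if n == 0 or i + n > len(data):
            return False
        i += n  # BUG: skips the continuation bytes without looking at them
    return True


def fixed_valid_utf8(data: list[int]) -> bool:
    i = 0
    while i < len(data):
        n = lead_length(data[i])
        if n == 0 or i + n > len(data):
            return False
        # Every byte after the lead byte must start with '10'.
        if any(b >> 6 != 0b10 for b in data[i + 1 : i + n]):
            return False
        i += n
    return True


# 0xE2 announces a 3-byte character, but 0x28 is plain ASCII '(' rather than 10xxxxxx.
bad = [0xE2, 0x28, 0xA1]
print(buggy_valid_utf8(bad))  # True  (wrongly accepted)
print(fixed_valid_utf8(bad))  # False (correctly rejected)
```

As a cross-check, Python's own decoder rejects the same bytes: `bytes(bad).decode("utf-8")` raises `UnicodeDecodeError`.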
