Documentation

Init.Data.String.Decode

UTF-8 decoding #

This file contains the low-level proof that UTF-8 decoding and encoding are inverses. An important corollary is that attempting to decode a byte array that is valid UTF-8 will succeed, which is required for the definition of String.toList.

Char.utf8Size #

theorem Char.utf8Size_eq_two_iff {c : Char} :
c.utf8Size = 2 127 < c.val c.val 2047
theorem Char.utf8Size_eq_three_iff {c : Char} :
c.utf8Size = 3 2047 < c.val c.val 65535

utf8EncodeChar #

utf8EncodeChar low-level API #

Returns the sequence of bytes in a character's UTF-8 encoding.

Equations
    Instances For

      utf8EncodeChar BitVec API #

      Size one #

      Size two #

      Size three #

      Size four #

      parseFirstByte #

      parseFirstByte definition #

      Instances For
        @[inline]
        Equations
          Instances For

            parseFirstByte low-level API #

            parseFirstByte BitVec API #

            Size one #

            Size two #

            Size three #

            Size four #

            isInvalidContinuationByte definition & API #

            @[inline]
            Equations
              Instances For

                parseFirstByte, isInvalidContinuationByte and utf8EncodeChar #

                assemble₁ #

                @[inline]
                Equations
                  Instances For

                    assemble₂ #

                    @[inline]
                    Equations
                      Instances For
                        @[inline]
                        Equations
                          Instances For
                            @[inline]
                            Equations
                              Instances For

                                assemble₃ #

                                @[inline]
                                Equations
                                  Instances For
                                    @[inline]
                                    Equations
                                      Instances For
                                        @[inline]
                                        Equations
                                          Instances For

                                            assemble₄ #

                                            @[inline]
                                            Equations
                                              Instances For
                                                @[inline]
                                                Equations
                                                  Instances For
                                                    @[inline]
                                                    Equations
                                                      Instances For
                                                        @[inline]

                                                        Decodes and returns the Char whose UTF-8 encoding begins at i in bytes.

                                                        Returns none if i is not the start of a valid UTF-8 encoding of a character.

                                                        Equations
                                                          Instances For
                                                            @[inline]
                                                            Equations
                                                              Instances For

                                                                utf8DecodeChar? low-level API #

                                                                Main theorems #

                                                                utf8DecodeChar?_utf8EncodeChar_append and toByteArray_of_utf8DecodeChar?_eq_some are the two main results that together imply that UTF-8 encoding and decoding are inverse.

                                                                Corollaries #

                                                                @[inline]
                                                                def ByteArray.utf8DecodeChar (bytes : ByteArray) (i : Nat) (h : (bytes.utf8DecodeChar? i).isSome = true) :

                                                                Decodes and returns the Char whose UTF-8 encoding begins at i in bytes.

                                                                This function requires a proof that there is, in fact, a valid Char at i. utf8DecodeChar? is an alternative function that returns Option Char instead of requiring a proof ahead of time.

                                                                Equations
                                                                  Instances For
                                                                    theorem ByteArray.utf8DecodeChar_extract_congr {bytes : ByteArray} (i j j' : Nat) {h : ((bytes.extract i j).utf8DecodeChar? 0).isSome = true} {h' : ((bytes.extract i j').utf8DecodeChar? 0).isSome = true} :
                                                                    (bytes.extract i j).utf8DecodeChar 0 h = (bytes.extract i j').utf8DecodeChar 0 h'

                                                                    UInt8.IsUTF8FirstByte #

                                                                    Predicate for whether a byte can appear as the first byte of the UTF-8 encoding of a Unicode scalar value.

                                                                    Equations
                                                                      Instances For
                                                                        @[inline]
                                                                        Equations
                                                                          Instances For