Documentation

Init.Data.String.Decode

UTF-8 decoding #

This file contains the low-level proof that UTF-8 decoding and encoding are inverses. An important corollary is that attempting to decode a byte array that is valid UTF-8 will succeed, which is required for the definition of String.toList.

Char.utf8Size #

theorem Char.utf8Size_eq_two_iff {c : Char} :
c.utf8Size = 2 127 < c.val c.val 2047
theorem Char.utf8Size_eq_three_iff {c : Char} :
c.utf8Size = 3 2047 < c.val c.val 65535

utf8EncodeChar #

utf8EncodeChar low-level API #

Returns the sequence of bytes in a character's UTF-8 encoding.

Instances For

    utf8EncodeChar BitVec API #

    Size one #

    Size two #

    Size three #

    Size four #

    parseFirstByte #

    parseFirstByte definition #

    Instances For

      parseFirstByte low-level API #

      parseFirstByte BitVec API #

      Size one #

      Size two #

      Size three #

      Size four #

      isInvalidContinuationByte definition & API #

      parseFirstByte, isInvalidContinuationByte and utf8EncodeChar #

      assemble₁ #

      @[inline]
      Instances For

        assemble₂ #

        @[inline]
        Instances For
          @[inline]
          Instances For

            assemble₃ #

            @[inline]
            Instances For
              @[inline]
              Instances For

                assemble₄ #

                @[inline]
                Instances For
                  @[inline]
                  Instances For
                    @[inline]
                    Instances For
                      @[inline]

                      Decodes and returns the Char whose UTF-8 encoding begins at i in bytes.

                      Returns none if i is not the start of a valid UTF-8 encoding of a character.

                      Instances For
                        @[inline]
                        Instances For

                          utf8DecodeChar? low-level API #

                          Main theorems #

                          utf8DecodeChar?_utf8EncodeChar_append and toByteArray_of_utf8DecodeChar?_eq_some are the two main results that together imply that UTF-8 encoding and decoding are inverse.

                          Corollaries #

                          @[inline]
                          def ByteArray.utf8DecodeChar (bytes : ByteArray) (i : Nat) (h : (bytes.utf8DecodeChar? i).isSome = true) :

                          Decodes and returns the Char whose UTF-8 encoding begins at i in bytes.

                          This function requires a proof that there is, in fact, a valid Char at i. utf8DecodeChar? is an alternative function that returns Option Char instead of requiring a proof ahead of time.

                          Instances For
                            theorem ByteArray.utf8DecodeChar_extract_congr {bytes : ByteArray} (i j j' : Nat) {h : ((bytes.extract i j).utf8DecodeChar? 0).isSome = true} {h' : ((bytes.extract i j').utf8DecodeChar? 0).isSome = true} :
                            (bytes.extract i j).utf8DecodeChar 0 h = (bytes.extract i j').utf8DecodeChar 0 h'

                            UInt8.IsUTF8FirstByte #

                            Predicate for whether a byte can appear as the first byte of the UTF-8 encoding of a Unicode scalar value.

                            Instances For
                              @[implicit_reducible, inline]
                              @[inline]
                              Instances For