Strings
This file builds on the UTF-8 verification in Init.Data.String.Decode and the preliminary
material in Init.Data.String.Defs to get the theory of strings off the ground. In particular,
in this file we construct the decoding function String.data : String → List Char and show that
it is a two-sided inverse to List.asString : List Char → String. This in turn enables us to
understand the validity predicate on positions in terms of lists of characters, which forms the
basis for all further verification for strings.
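The round trip between strings and lists of characters described above can be observed directly. This is a small sketch, not part of the original file: it assumes a toolchain in which `String.toList` and `List.asString` are available under these names, and it uses the string "L∃∀N" that recurs in the examples on this page.

```lean
-- Illustrative only; the function names are assumed to match the toolchain in use.
#eval "L∃∀N".toList                      -- the four characters L, ∃, ∀, N
#eval "L∃∀N".toList.asString == "L∃∀N"   -- true
#eval "L∃∀N".length                      -- 4 (characters)
#eval "L∃∀N".utf8ByteSize                -- 8 (bytes: 1 + 3 + 3 + 1)
```

The gap between the character count and the byte count is why not every byte offset is a valid position.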
Decodes an array of bytes that encode a string as UTF-8 into the corresponding string, or panics if the array is not a valid UTF-8 encoding of a string.
Converts a string to a list of characters.
Since strings are represented as dynamic arrays of bytes containing the string encoded using UTF-8, this operation takes time and space linear in the length of the string.
Returns true if p is a valid UTF-8 position in the string s.
This means that p ≤ s.rawEndPos and p lies on a UTF-8 character boundary. At runtime, this
operation takes constant time.
Examples:
String.Pos.isValid "abc" ⟨0⟩ = true
String.Pos.isValid "abc" ⟨1⟩ = true
String.Pos.isValid "abc" ⟨3⟩ = true
String.Pos.isValid "abc" ⟨4⟩ = false
String.Pos.isValid "𝒫(A)" ⟨0⟩ = true
String.Pos.isValid "𝒫(A)" ⟨1⟩ = false
String.Pos.isValid "𝒫(A)" ⟨2⟩ = false
String.Pos.isValid "𝒫(A)" ⟨3⟩ = false
String.Pos.isValid "𝒫(A)" ⟨4⟩ = true
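To illustrate why such a check can run in constant time, here is a minimal model on raw byte offsets. It is a sketch, not the library's definition: `isContinuationByte` and `isBoundary` are hypothetical names, and the model assumes that an offset is a character boundary exactly when it equals the byte length or points at a byte that is not a UTF-8 continuation byte.

```lean
/-- Hypothetical helper: `true` when `b` has the form `10xxxxxx`,
that is, when it is a UTF-8 continuation byte. -/
def isContinuationByte (b : UInt8) : Bool :=
  (b &&& 0xC0) == 0x80

/-- Hypothetical model of the validity check: a byte offset `i` is a
boundary if it is the end of the string or points at a non-continuation byte. -/
def isBoundary (s : String) (i : Nat) : Bool :=
  let bytes := s.toUTF8
  if i == bytes.size then true
  else if i < bytes.size then !isContinuationByte (bytes.get! i)
  else false

-- "𝒫(A)" starts with a four-byte character, so offsets 1, 2 and 3 are invalid.
#eval (List.range 6).map (isBoundary "𝒫(A)")
-- [true, false, false, false, true, true]
```

Only a single byte is inspected per query, which matches the constant-time claim above.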
Copies a region of a string to a new string.
The region of s from b (inclusive) to e (exclusive) is copied to a newly-allocated String.
If b's offset is greater than or equal to that of e, then the resulting string is "".
If possible, prefer String.slice, which avoids the allocation.
Efficiently checks whether a position is at a UTF-8 character boundary of the slice s.
Given a valid position on s.str which is within the bounds of the slice s, obtains the
corresponding valid position on s.
Given a slice and a valid position within the slice, obtain a new slice on the same underlying string by replacing the start of the slice with the given position.
Given a slice and a valid position within the slice, obtain a new slice on the same underlying string by replacing the end of the slice with the given position.
Given a slice and two valid positions within the slice, obtain a new slice on the same underlying string formed by the new bounds, or panic if the given end is strictly less than the given start.
Returns the byte at the given position in the string, or panics if the position is the end position.
Returns the character at the position pos of a string, taking a proof that pos is not the
past-the-end position.
This function is overridden with an efficient implementation in runtime code.
Advances a valid position on a slice to the next valid position, or panics if the given position is the past-the-end position.
Returns the previous valid position before the given position, or panics if the position is the start position.
Constructs a valid position on s from a position and a proof that it is valid.
Constructs a valid position on s from a position, panicking if the position is not valid.
Advances a valid position on a string to the next valid position, given a proof that the position is not the past-the-end position, which guarantees that such a position exists.
Advances a valid position on a string to the next valid position, or panics if the given position is the past-the-end position.
Returns the previous valid position before the given position, or panics if the position is the start position.
Constructs a valid position on s from a position and a proof that it is valid.
Constructs a valid position on s from a position, panicking if the position is not valid.
Returns the character at position p of a string. If p is not a valid position, returns the
fallback value (default : Char), which is 'A', but does not panic.
This function is overridden with an efficient implementation in runtime code. See
String.Pos.Raw.utf8GetAux for the reference implementation.
This is a legacy function. The recommended alternative is String.Pos.get, combined with
String.pos or another means of obtaining a String.Pos.
Examples:
"abc".get ⟨1⟩ = 'b'
"abc".get ⟨3⟩ = (default : Char) because byte 3 is at the end of the string.
"L∃∀N".get ⟨2⟩ = (default : Char) because byte 2 is in the middle of '∃'.
Returns the character at position p of a string. If p is not a valid position, returns none.
This function is overridden with an efficient implementation in runtime code. See
String.utf8GetAux? for the reference implementation.
This is a legacy function. The recommended alternative is String.Pos.get, combined with
String.pos? or another means of obtaining a String.Pos.
Returns the character at position p of a string. Panics if p is not a valid position.
See String.pos? and String.Pos.get for a safer alternative.
This function is overridden with an efficient implementation in runtime code. See
String.utf8GetAux for the reference implementation.
This is a legacy function. The recommended alternative is String.Pos.get, combined with
String.pos! or another means of obtaining a String.Pos.
Examples:
"abc".get! ⟨1⟩ = 'b'
The slice from the beginning of s up to p (exclusive).
The slice from p (inclusive) up to the end of s.
Given a string and two valid positions within the string, obtain a slice on the string formed by the two positions.
This happens to be equivalent to the constructor of String.Slice.
Given a string and two valid positions within the string, obtain a slice on the string formed by the new bounds, or panic if the given end is strictly less than the given start.
Copies a region of a slice to a new string.
The region of s from b (inclusive) to e (exclusive) is copied to a newly-allocated String.
If b's offset is greater than or equal to that of e, then the resulting string is "".
If possible, prefer Slice.slice, which avoids the allocation.
Returns the next position in a string after position p. If p is not a valid position or
p = s.endPos, returns the position one byte after p.
A run-time bounds check is performed to determine whether p is at the end of the string. If a
bounds check has already been performed, use String.next' to avoid a repeated check.
This is a legacy function. The recommended alternative is String.Pos.next or one of its
variants like String.Pos.next?, combined with String.pos or another means of obtaining
a String.Pos.
Some examples of edge cases:
"abc".next ⟨3⟩ = ⟨4⟩, since 3 = "abc".endPos
"L∃∀N".next ⟨2⟩ = ⟨3⟩, since 2 points into the middle of a multi-byte UTF-8 character
Returns the position in a string before a specified position, p. If p = ⟨0⟩, returns 0. If p
is greater than rawEndPos, returns the position one byte before p. Otherwise, if p occurs in the
middle of a multi-byte character, returns the beginning position of that character.
For example, "L∃∀N".prev ⟨3⟩ is ⟨1⟩, since byte 3 occurs in the middle of the multi-byte
character '∃' that starts at byte 1.
This is a legacy function. The recommended alternative is String.Pos.prev or one of its
variants like String.Pos.prev?, combined with String.pos or another means of obtaining
a String.Pos.
Examples:
"abc".get ("abc".rawEndPos |> "abc".prev) = 'c'
"L∃∀N".get ("L∃∀N".rawEndPos |> "L∃∀N".prev |> "L∃∀N".prev |> "L∃∀N".prev) = '∃'
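The three cases in the description of the legacy prev function can be modelled on raw byte offsets. The following is a sketch under stated assumptions, not the reference implementation: `backToBoundary` and `prevModel` are hypothetical names, positions are represented as plain natural numbers, and the handling of offsets inside the string assumes that stepping back means moving left to the nearest byte that is not a continuation byte.

```lean
/-- Hypothetical helper: walks left from byte offset `i` until it reaches a
byte that is not a UTF-8 continuation byte (`10xxxxxx`). -/
def backToBoundary (bytes : ByteArray) : Nat → Nat
  | 0 => 0
  | i + 1 =>
    if (bytes.get! (i + 1) &&& 0xC0) == 0x80 then backToBoundary bytes i else i + 1

/-- Hypothetical model of the documented behavior of `prev` on byte offsets. -/
def prevModel (s : String) (p : Nat) : Nat :=
  if p == 0 then 0                          -- nothing before the start
  else if p > s.utf8ByteSize then p - 1     -- past the end: one byte back
  else backToBoundary s.toUTF8 (p - 1)      -- otherwise: back to a character start

#eval prevModel "L∃∀N" 3   -- 1, the start of '∃'
#eval prevModel "L∃∀N" 8   -- 7, the start of 'N'
#eval prevModel "abc" 10   -- 9
```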
Returns true if a specified byte position is greater than or equal to the position which points to
the end of a string. Otherwise, returns false.
Examples:
(0 |> "abc".next |> "abc".next |> "abc".atEnd) = false
(0 |> "abc".next |> "abc".next |> "abc".next |> "abc".next |> "abc".atEnd) = true
(0 |> "L∃∀N".next |> "L∃∀N".next |> "L∃∀N".next |> "L∃∀N".atEnd) = false
(0 |> "L∃∀N".next |> "L∃∀N".next |> "L∃∀N".next |> "L∃∀N".next |> "L∃∀N".atEnd) = true
"abc".atEnd ⟨4⟩ = true
"L∃∀N".atEnd ⟨7⟩ = false
"L∃∀N".atEnd ⟨8⟩ = true
Returns the character at position p of a string. Returns (default : Char), which is 'A', if
p is not a valid position.
Requires evidence, h, that p is within bounds instead of performing a run-time bounds check as
in String.get.
A typical pattern combines get' with a dependent if-expression to avoid the overhead of an
additional bounds check. For example:
def getInBounds? (s : String) (p : String.Pos) : Option Char :=
  if h : s.atEnd p then none else some (s.get' p h)
Even with evidence of ¬ s.atEnd p, p may be invalid if a byte index points into the middle of a
multi-byte UTF-8 character. For example, "L∃∀N".get' ⟨2⟩ (by decide) = (default : Char).
This is a legacy function. The recommended alternative is String.Pos.get, combined with
String.pos or another means of obtaining a String.Pos.
Examples:
"abc".get' 0 (by decide) = 'a'
let lean := "L∃∀N"; lean.get' (0 |> lean.next |> lean.next) (by decide) = '∀'
Returns the next position in a string after position p. The result is unspecified if p is not a
valid position.
Requires evidence, h, that p is within bounds. No run-time bounds check is performed, as in
String.next.
A typical pattern combines String.next' with a dependent if-expression to avoid the overhead of
an additional bounds check. For example:
def next? (s : String) (p : String.Pos) : Option Char :=
  if h : s.atEnd p then none else s.get (s.next' p h)
This is a legacy function. The recommended alternative is String.Pos.next, combined with
String.pos or another means of obtaining a String.Pos.
Returns the first position where the two strings differ.
If one string is a prefix of the other, then the returned position is the end position of the shorter string. If the strings are identical, then their end position is returned.
Examples:
"tea".firstDiffPos "ten" = ⟨2⟩
"tea".firstDiffPos "tea" = ⟨3⟩
"tea".firstDiffPos "teas" = ⟨3⟩
"teas".firstDiffPos "tea" = ⟨3⟩
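The behavior described for firstDiffPos can be mirrored by a simple model on lists of characters. This is a sketch only: `commonPrefixBytes` is a hypothetical name, and the model returns a plain byte count rather than a position, on the assumption that the first differing position equals the UTF-8 length of the longest common prefix.

```lean
/-- Hypothetical model: the number of UTF-8 bytes in the longest common
prefix of two character lists. -/
def commonPrefixBytes : List Char → List Char → Nat
  | x :: xs, y :: ys =>
    if x == y then x.utf8Size + commonPrefixBytes xs ys else 0
  | _, _ => 0

#eval commonPrefixBytes "tea".toList "ten".toList    -- 2
#eval commonPrefixBytes "tea".toList "tea".toList    -- 3
#eval commonPrefixBytes "tea".toList "teas".toList   -- 3
#eval commonPrefixBytes "teas".toList "tea".toList   -- 3
```

When one list runs out first, the count stops at the end of the shorter string, matching the prefix case above.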
Creates a new string that consists of the region of the input string delimited by the two positions.
The result is "" if the start position is greater than or equal to the end position or if the
start position is at the end of the string. If either position is invalid (that is, if either points
at the middle of a multi-byte UTF-8 character) then the result is unspecified.
This is a legacy function. The recommended alternative is String.Pos.extract, but usually
it is even better to operate on String.Slice instead and call String.Slice.copy (only) if
required.
Examples:
"red green blue".extract ⟨0⟩ ⟨3⟩ = "red"
"red green blue".extract ⟨3⟩ ⟨0⟩ = ""
"red green blue".extract ⟨0⟩ ⟨100⟩ = "red green blue"
"red green blue".extract ⟨4⟩ ⟨100⟩ = "green blue"
"L∃∀N".extract ⟨1⟩ ⟨2⟩ = "∃∀N"
"L∃∀N".extract ⟨2⟩ ⟨100⟩ = ""
Returns the character index that corresponds to the provided position (i.e. UTF-8 byte index) in a string.
If the position is at the end of the string, then the string's length in characters is returned. If the position is invalid due to pointing at the middle of a UTF-8 byte sequence, then the character index of the next character after the position is returned.
Examples:
"L∃∀N".offsetOfPos ⟨0⟩ = 0
"L∃∀N".offsetOfPos ⟨1⟩ = 1
"L∃∀N".offsetOfPos ⟨2⟩ = 2
"L∃∀N".offsetOfPos ⟨4⟩ = 2
"L∃∀N".offsetOfPos ⟨5⟩ = 3
"L∃∀N".offsetOfPos ⟨50⟩ = 4
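The mapping from byte positions to character indices in the offsetOfPos examples is consistent with counting the characters that start strictly before the given byte offset. The sketch below models that reading; `charIndexAux` and `charIndexOfByteOffset` are hypothetical names, and positions are represented as plain natural numbers rather than position values.

```lean
/-- Hypothetical helper: scans the characters while tracking the current byte
offset `off` and the current character index `idx`. -/
def charIndexAux (p : Nat) : List Char → Nat → Nat → Nat
  | [], _, idx => idx
  | c :: cs, off, idx =>
    if off < p then charIndexAux p cs (off + c.utf8Size) (idx + 1) else idx

/-- Hypothetical model: the number of characters of `cs` that start strictly
before byte offset `p`. -/
def charIndexOfByteOffset (cs : List Char) (p : Nat) : Nat :=
  charIndexAux p cs 0 0

#eval [0, 1, 2, 4, 5, 50].map (charIndexOfByteOffset "L∃∀N".toList)
-- [0, 1, 2, 2, 3, 4]
```

Offset 2 lies inside '∃', which starts at byte 1, so the model reports index 2, the index of the next character.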
Checks whether substrings of two strings are equal. Substrings are indicated by their starting
positions and a size in UTF-8 bytes. Returns false if the indicated substring does not exist in
either string.
This is a legacy function. The recommended alternative is to construct slices representing the
strings to be compared and use the BEq instance of String.Slice.