String

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally understood as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. A string may also denote more general arrays or other sequences (or list) data types and structures.

Depending on programming language and precise data type used, a variable declared to be a string may either cause storage in memory to be statically allocated for a predetermined maximum length or employ dynamic allocation to allow it to hold a variable number of elements.

When a string appears literally in the source code, it is known as a string literal or an anonymous string.

In formal languages, which are used in mathematical logic and theoretical computer science, a string is a finite sequence of symbols that are chosen from a set called an alphabet.

Formal theory


Let Σ be a non-empty finite set of symbols (alternatively called characters), called the alphabet. No assumption is made about the nature of the symbols. A string (or word) over Σ is any finite sequence of symbols from Σ.[2] For example, if Σ = {0, 1}, then 01011 is a string over Σ.

The length of a string s is the number of symbols in s (the length of the sequence) and can be any non-negative integer; it is often denoted as |s|. The empty string is the unique string over Σ of length 0, and is denoted ε or λ.[2][3]

The set of all strings over Σ of length n is denoted Σn. For example, if Σ = {0, 1}, then Σ2 = {00, 01, 10, 11}. Note that Σ0 = {ε} for any alphabet Σ.

The set of all strings over Σ of any length is the Kleene closure of Σ and is denoted Σ*. In terms of Σn,

{\displaystyle \Sigma ^{*}=\bigcup _{n\in \mathbb {N} \cup \{0\}}\Sigma ^{n}} \Sigma^{*} = \bigcup_{n \in \mathbb{N} \cup \{0\}} \Sigma^{n}
For example, if Σ = {0, 1}, then Σ* = {ε, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, ...}. Although the set Σ* itself is countably infinite, each element of Σ* is a string of finite length.

A set of strings over Σ (i.e. any subset of Σ*) is called a formal language over Σ. For example, if Σ = {0, 1}, the set of strings with an even number of zeros, {ε, 1, 00, 11, 001, 010, 100, 111, 0000, 0011, 0101, 0110, 1001, 1010, 1100, 1111, ...}, is a formal language over Σ.

Concatenation and substrings

Concatenation is an important binary operation on Σ*. For any two strings s and t in Σ*, their concatenation is defined as the sequence of symbols in s followed by the sequence of characters in t, and is denoted st. For example, if Σ = {a, b, ..., z}, s = bear, and t = hug, then st = bearhug and ts = hugbear.

String concatenation is an associative, but non-commutative operation. The empty string ε serves as the identity element; for any string s, εs = sε = s. Therefore, the set Σ* and the concatenation operation form a monoid, the free monoid generated by Σ. In addition, the length function defines a monoid homomorphism from Σ* to the non-negative integers (that is, a function {\displaystyle L:\Sigma ^{*}\mapsto \mathbb {N} \cup \{0\}} L: \Sigma^{*} \mapsto \mathbb{N} \cup \{0\}, such that {\displaystyle L(st)=L(s)+L(t)\quad \forall s,t\in \Sigma ^{*}} L(st)=L(s)+L(t)\quad \forall s,t\in\Sigma^*).

A string s is said to be a substring or factor of t if there exist (possibly empty) strings u and v such that t = usv. The relation "is a substring of" defines a partial order on Σ*, the least element of which is the empty string.

Prefixes and suffixes

A string s is said to be a prefix of t if there exists a string u such that t = su. If u is nonempty, s is said to be a proper prefix of t. Symmetrically, a string s is said to be a suffix of t if there exists a string u such that t = us. If u is nonempty, s is said to be a proper suffix of t. Suffixes and prefixes are substrings of t. Both the relations "is a prefix of" and "is a suffix of" are prefix orders.

Comments

Popular posts from this blog

What's the fastest way to concatenate two Strings in Java?