Character (computing)
Internal representation of a single character (symbol, sequence of one or more bytes) used within a computer or system.
— Wikipedia
What represent character in computer?
Characters represent letters of the alphabet, punctuation, or other symbols
(synonym for code point).
Example of characters can be found in ASCII note.
Characters are typically combined into ==strings==.
Historically character can be used to denote contiguous bits, size of
character was various: 4, 5, 6, 8 bits (most popular). How modern systems
implement characters?
All modern systems use a varying-size sequence of these fixed-sized pieces, for
instance UTF-8 uses a varying number of 8-bit code units to define a “code
point” and Unicode uses varying number of those to define a “character”.
Character sets, coded character sets, and encodings
Some notes from great w3.org character encodings article 1.
A coded character set (unlike a character set) is a mapping, when each character assigned to unique number.
Units of a coded character set are known as code points, which represents
the positon of a character. For example code point for the letter á
in the
Unicode coded character set is 225
in decimal, or 0xE1
in hexadecimal
notation.
How character encoding connected with coded character set?
The character encoding reflects the way the coded character set is mapped to
bytes for manipulation in a computer, some like characters ↔ bytes.
character encoding process.excalidrawThe process of character encoding
Is alphabet character always referring to single code point in Unicode?
Actually no, for user-perceived character (alphabet the smallest component)
in complex scenarion may actually be a sequence of code points. For example, the
Vietnamese letter ề
will be perceived as a single letter even if the
underlying code point sequence is U+0065 LATIN SMALL LETTER E
+ U+0302 COMBINING CIRCUMFLEX ACCENT
+ U+0300 COMBINING GRAVE ACCENT
.
What is a grapheme cluster in Unicode?
A grapheme cluster is a collection of symbols that together represent an
individual character that the user will see within a string on the screen.
Sometimes need to use multiple grapheme clusters to represent one character
(ক্ষ).
What is the relationship between glyphs and code points in a font system?
A font can be described as collection of glyphs. In simple scenario, a glyph
is the visual representation of a code point. In fact, more than one glyph may
be used to represent a single code point, and multiple code points may be
represented by a single glyph (emoji character for “family”
👨👩👧👧).