When applying the BETWEEN operator on TEXT values, how are values compared?

Question

In SQL, when applying the BETWEEN operator on a range of TEXT values, each value must be compared somehow to be ordered correctly. What kind of comparison is done on TEXT values?

Answer

In most programming languages, including SQLite and Python, TEXT or string values are compared based on their lexicographical ordering, and when using the BETWEEN operator for a range of TEXT values in SQL, the values will be sorted in this way.

Lexicographical ordering is basically the ordering you would find words in a dictionary. If we had two words, they would be compared starting from their first letter, second letter, and so on, until we find a non-matching letter. The word which has the letter that comes first in the alphabet would ultimately be sorted to come first in this lexicographical ordering.

If two words have different lengths, but match up to the last letter of the shorter word, the shorter word will appear first in the ordering.

Example

A = "Alien"
B = "Aliens"
C = "Alike"

/* 
   Because A and B share the same sequence of characters 
   up to the last character of A, which is shorter, A < B. 

   Also, because "k" comes after "e" in the alphabet, C will 
   come last in the ordering of these 3 words.

   A < B < C
*/
15 Likes

Is there a reference for the priorty of special character ordering?
Eg, what would the order be for:
Aliens
Alien!
Alien?
Alien3

15 Likes

I believe that due to ASCII TABLE value of each character
the order must be the following:

Alien!
Alien3
Alien?
Aliens

10 Likes

On the difference between lexicographical and alphabetical, because I was interested:
On Quora, I found someone explain that “lexicographical order” is more generalized than “alphabetical order”, as you can define any set of symbols, even mathematical ones, to be ordered in a specific way.

I guess it wouldn’t be a major crime to mix up the both, especially as most of the time we will use what we call alphanumerical order.


(https://www.quora.com/Whats-the-difference-between-lexicographical-and-alphabetical-order)

7 Likes

I’ve noticed in the exercises that

SELECT *
FROM movies
WHERE name BETWEEN 'A' AND 'Fz';

returns a list of movies from “A Beautiful Mind” to “Furious Seven”

But

SELECT *
FROM movies
WHERE name BETWEEN 'A' AND 'FZ';

returns a list of movies from “A Beautiful Mind” to “End of Days”…

Is this because SQLite’s lexicographical ordering looks something like

…ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz…

Or is something else causing this discrepency?

6 Likes

im also wandering why. does anybody more expert then us know the answer?

@celeryt0p @core2941988678 as others have mentioned, the default lexicographical order in SQL and many computing languages uses the ASCII table so they can more generally define the order for numerics and special characters.

So you are correct in that sense: it would sort it as
…ABCDEFGHIJKLMNOPQRSTUVWXYZ]^_`abcdefghijklmnopqrstuvwxyz…

Hope this helps!

1 Like