Why does the uniq function only work for adjacent lines?

Question

We learn in this lesson that in producing the unique entries in a file, uniq will only compare lines which are adjacent and so it is best used along with sort. Why doesn’t uniq actually return unique entries or simply call sort on our behalf?

Answer

One possible response to this question has to do with the so-called Unix philosophy which, in brief, focuses on (1) designing small programs with one well-defined purpose and (2) chaining these programs together to accomplish more complex tasks. With this in mind, let’s think about uniq. An entry in a file is unique if it is not equal to any other entry. This statement is easy to make but invites many different potential solutions, one of which is to first sort the entries and then compare adjacent lines. Remembering that each program should do one thing, uniq shouldn’t sort your file, since sort can already do this. It should only check adjacent lines for duplicates.
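To make the difference concrete, here is a small sketch comparing uniq on its own with sort piped into uniq. The sample input and the variable names (input, adjacent_only, fully_unique) are made up for illustration.

```shell
# Hypothetical sample input: "apple" appears on lines 1, 3 and 4.
input=$(printf 'apple\nbanana\napple\napple\n')

# uniq alone only collapses the adjacent pair at the end,
# so the first "apple" survives as a separate entry.
adjacent_only=$(printf '%s\n' "$input" | uniq)

# Sorting first makes every duplicate adjacent, so uniq can remove them all.
fully_unique=$(printf '%s\n' "$input" | sort | uniq)

printf '%s\n' "$adjacent_only"
echo "---"
printf '%s\n' "$fully_unique"
```

The first result still contains "apple" twice (lines 1 and 3 of the output), while the sorted pipeline leaves one "apple" and one "banana".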

This decision is subjective, of course, but similar decisions are made with many command line utilities, so it is helpful to keep this perspective in mind.

Furthermore, note that this decision is not necessarily bad. Maybe you know that all duplicates are already adjacent and you don’t want the overhead of sorting your file. In that case, uniq on its own is still appropriate.

The Unix philosophy is a big part of it. It’s also worth noting that ‘uniq’ might just be a bad name. You don’t necessarily only use it for producing unique lines.

For example, if I write a list of instructions like “Go home, go to the shops, go home”, then the instructions probably make sense. I don’t want to delete the final instruction. But if the instructions were “Go home, go home, go to the shops” then that’s different. I can’t go home twice in a row. In a list like this, I might want to delete adjacent duplicates (and only adjacent ones).
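The instruction list above can be tried out directly. This is only a sketch: the input text and the variable names (instructions, deduped, sorted_unique) are invented for the example.

```shell
# Hypothetical instruction list with one adjacent repeat ("go home" twice in a row)
# and one legitimate later repeat (the final "go home").
instructions=$(printf 'go home\ngo home\ngo to the shops\ngo home\n')

# uniq on its own removes only the adjacent repeat and keeps the original order.
deduped=$(printf '%s\n' "$instructions" | uniq)

# For contrast: sorting first also drops the final "go home" and reorders the steps.
sorted_unique=$(printf '%s\n' "$instructions" | sort | uniq)

printf '%s\n' "$deduped"
echo "---"
printf '%s\n' "$sorted_unique"
```

Here uniq alone keeps three steps in their original order, while the sorted version keeps only two lines and no longer reads as a sequence of instructions.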

“Delete adjacent duplicate lines” is a decent utility to have, but that’s hard to boil down into a short name. It probably gets used with sort often enough that “uniq” sounded like a better name.

That’s the point of small sharp tools: it gives you a broader range of options. If uniq automatically sorted the text first, then you wouldn’t have the option to use it on its own.
