sort Command

Purpose

Sorts files, merges files that are already sorted, and checks files to determine if they have been sorted.

Syntax

sort [ -A ] [ -b ] [ -c ] [ -d ] [ -f ] [ -i ] [ -m ] [ -n ] [ -r ] [ -u ] [ -o OutFile ] [ -t Character ] [ -T Directory ] [ -y [ Kilobytes ] ] [ -z RecordSize ] [ [ + [ FSkip ] [ .CSkip ] [ b ] [ d ] [ f ] [ i ] [ n ] [ r ] ] [ - [ FSkip ] [ .CSkip ] [ b ] [ d ] [ f ] [ i ] [ n ] [ r ] ] ] ... [ -k KeyDefinition ] ... [ File ... ]

Description

The sort command sorts lines in the files specified by the File parameters and writes the result to standard output. If the File parameter specifies more than one file, the sort command concatenates the files and sorts them as one file. A - (minus sign) in place of a file name specifies standard input. If you do not specify any file names, the command sorts standard input. An output file can be specified with the -o flag.

If no flags are specified, the sort command sorts entire lines of the input file based upon the collation order of the current locale.

Sort Keys

A sort key is a portion of an input line that is specified by a field number and a column number. Fields are parts of input lines that are separated by field separators. The default field separator is a sequence of one or more consecutive blank characters. A different field separator can be specified using the -t flag. The tab and the space characters are the blank characters in the C and English Language locales.

When using sort keys, the sort command first sorts all lines on the contents of the first sort key. Next, all the lines whose first sort keys are equal are sorted upon the contents of the second sort key, and so on. Sort keys are numbered according to the order they appear on the command line. If two lines sort equally on all sort keys, the entire lines are then compared based upon the collation order in the current locale.

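The key ordering described above can be tried on a small sample. This is a minimal sketch, assuming a sort implementation that accepts the -t and -k flags described later in this entry; the sample data and the variable names (sample, by_first_key, by_two_keys) are made up for illustration, and LC_ALL=C is set only to make the collation order predictable.

```shell
# Hypothetical colon-separated sample data (not from the manual).
sample=$(printf '%s\n' 'b:x:0' 'a:z:1' 'a:y:2')

# One sort key (field 1). The two "a" lines tie on that key, so the
# entire lines are compared: "a:y:2" collates before "a:z:1".
by_first_key=$(printf '%s\n' "$sample" | LC_ALL=C sort -t: -k1,1)

# Two sort keys (field 1, then field 3). The tie on field 1 is now
# broken by field 3, so "a:z:1" comes before "a:y:2".
by_two_keys=$(printf '%s\n' "$sample" | LC_ALL=C sort -t: -k1,1 -k3,3)

printf 'one key:\n%s\n' "$by_first_key"
printf 'two keys:\n%s\n' "$by_two_keys"
```

In both runs the b line comes last because the first key is always compared first; only the order of the two a lines changes.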
When numbering columns within fields, the blank characters in a default field separator are counted as part of the following field. Leading blanks are not counted as part of the first field, and field separator characters specified by the -t flag are not counted as parts of fields. Leading blank characters can be ignored using the -b flag.

Sort keys can be defined using the following two methods:

* -k KeyDefinition
* FSkip.CSkip (obsolescent version).

Sort Key Definition Using the -k Flag

The -k KeyDefinition flag uses the following form:

     -k [FStart][.CStart][Modifier][,[FEnd][.CEnd][Modifier]]

The sort key includes all characters beginning with the field specified by the FStart variable and the column specified by the CStart variable and ending with the field specified by the FEnd variable and the column specified by the CEnd variable. Any field or column number in the KeyDefinition variable may be omitted. The default values are:

     FStart    Beginning of the line
     CStart    First column in the field
     FEnd      End of the line
     CEnd      Last column of the field

The value of the Modifier variable can be one or more of the letters b, d, f, i, n, or r. The modifiers apply only to the field definition they are attached to and have the same effect as the flag of the same letter. The modifier letter b applies only to whichever end of the field definition (start or finish) it is attached to. For example:

     -k 3.2b,3r

specifies a sort key beginning in the second nonblank column of the third field and extending to the end of the third field, with the sort on this key to be done in reverse collation order.

If the FStart variable and the CStart variable fall beyond the end of the line or after the FEnd variable and the CEnd variable, then the sort key is ignored.

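The effect of omitting or supplying parts of a KeyDefinition can be seen on a few made-up lines. This is a sketch only, assuming a sort implementation that follows the -k form given above; the data and variable names (rows, to_end_of_line, field_only, from_column_2) are hypothetical, and LC_ALL=C is used so that collation is by byte value.

```shell
# Hypothetical sample data (not from the manual).
rows=$(printf '%s\n' '1:a:z' '2:a:y' '3:b:x')

# FEnd omitted: the key runs from field 2 to the end of the line,
# so the keys are "a:z", "a:y", and "b:x".
to_end_of_line=$(printf '%s\n' "$rows" | LC_ALL=C sort -t: -k2)

# FEnd given: the key is field 2 only. The two "a" keys tie, and the
# tie is broken by comparing the entire lines.
field_only=$(printf '%s\n' "$rows" | LC_ALL=C sort -t: -k2,2)

# CStart given: the key starts at the second column of field 1,
# so the keys are "b", "a", and "c".
from_column_2=$(printf '%s\n' 'xb' 'ya' 'zc' | LC_ALL=C sort -k1.2)

printf '%s\n' "$to_end_of_line" "$field_only" "$from_column_2"
```

The first two commands differ only in the ",2" that ends the key, yet they order the two a lines differently, which is why it is usually worth stating FEnd explicitly when only one field should be compared.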
A sort key can also be specified in the following manner:

     [+[FSkip1][.CSkip1][Modifier]] [-[FSkip2][.CSkip2][Modifier]]

The +FSkip1 variable specifies the number of fields skipped to reach the first field of the sort key, and the .CSkip1 variable specifies the number of columns skipped within that field to reach the first character in the sort key. The -FSkip2 variable specifies the number of fields skipped to reach the first character after the sort key, and the .CSkip2 variable specifies the number of columns to skip within that field. Any of the field and column skip counts may be omitted. The defaults are:

     FSkip1    Beginning of the line
     CSkip1    Zero
     FSkip2    End of the line
     CSkip2    Zero

The modifiers specified by the Modifier variable are the same as in the -k flag key sort definition.

The field and column numbers specified by the +FSkip1.CSkip1 variables are generally one less than the field and column number of the sort key itself because these variables specify how many fields and columns to skip before reaching the sort key. For example:

     +2.1b -3r

specifies a sort key beginning in the second nonblank column of the third field and extending to the end of the third field, with the sort on this key to be done in reverse collation order. The statement +2.1b specifies that two fields are skipped and then the leading blanks and one more column are skipped. If the +FSkip1.CSkip1 variables fall beyond the end of the line or after the -FSkip2.CSkip2 variables, then the sort key is ignored.

Note: The maximum number of fields on a line is 10.

Flags

Note: A -b, -d, -f, -i, -n, or -r flag that appears before any sort key definitions applies to all sort keys. None of the -b, -d, -f, -i, -n, or -r flags may appear alone after a -k KeyDefinition; if they are attached to a KeyDefinition variable as a modifier, they apply only to the attached sort key. If one of these flags follows a +FSkip.CSkip or -FSkip.CSkip sort key definition, the flag only applies to that sort key.

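The note above says that a modifier attached to a KeyDefinition affects only that sort key. The following sketch illustrates this with the -k form; it assumes a sort implementation that behaves as described in this entry, the data and variable names (counts, ascending, descending) are invented for the illustration, and LC_ALL=C is set to keep collation predictable.

```shell
# Hypothetical sample data (not from the manual).
counts=$(printf '%s\n' 'a:2' 'b:1' 'a:10')

# Key 1 (field 1) has no modifier; key 2 (field 2) is numeric.
ascending=$(printf '%s\n' "$counts" | LC_ALL=C sort -t: -k1,1 -k2,2n)

# Adding r to key 2 reverses only the numeric comparison. Key 1 is
# still compared in ascending order, so the "b" line stays last.
descending=$(printf '%s\n' "$counts" | LC_ALL=C sort -t: -k1,1 -k2,2nr)

printf 'n on key 2:\n%s\n' "$ascending"
printf 'nr on key 2:\n%s\n' "$descending"
```

Only the order of the two a lines changes between the runs, which shows that the r modifier did not spill over to the first key.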
-A
     Sorts on a byte-by-byte basis using ASCII collation order instead of collation in the current locale.

-b
     Ignores leading spaces and tabs to find the first or last column of a field.

-c
     Checks that input is sorted according to the ordering rules specified in the flags. A nonzero value is returned if the input file is not correctly sorted.

-d
     Sorts using dictionary order. Only letters, digits, and spaces are considered in comparisons.

-f
     Changes all lowercase letters to uppercase before comparison.

-i
     Ignores all nonprinting characters during comparisons.

-k KeyDefinition
     Specifies a sort key. The format of the KeyDefinition option is:

          [FStart][.CStart][Modifier][,[FEnd][.CEnd][Modifier]]

     The sort key includes all characters beginning with the field specified by the FStart variable and the column specified by the CStart variable and ending with the field specified by the FEnd variable and the column specified by the CEnd variable. The value of the Modifier variable can be b, d, f, i, n, or r. The modifiers are equivalent to the flags of the same letter.

-m
     Merges multiple input files only; the input is assumed to be already sorted.

-n
     Sorts numeric fields by arithmetic value. A numeric field may contain leading blanks, an optional minus sign, decimal digits, thousands-separator characters, and an optional radix character. Numeric sorting of a field containing any nonnumeric character gives unpredictable results.

-o OutFile
     Directs output to the file specified by the OutFile parameter instead of standard output. The value of the OutFile parameter can be the same as the File parameter.

-r
     Reverses the order of the specified sort.

-t Character
     Specifies Character as the single field separator character.

-u
     Suppresses all but the first line in each set of lines that sort equally according to the sort keys and options.

-T Directory
     Places all temporary files that are created into the directory specified by the Directory parameter.

-y[Kilobytes]
     Starts the sort command using the number of kilobytes of main storage specified by the Kilobytes parameter and adds storage as needed. (If the value specified in the Kilobytes parameter is less than the minimum storage size or greater than the maximum, the minimum or maximum is used instead.) If the -y flag is omitted, the sort command starts with the default storage size. The -y0 flag starts with minimum storage, and the -y flag (with no Kilobytes value) starts with maximum storage. The amount of storage used by the sort command affects performance significantly. Sorting a small file in a large amount of storage is wasteful.

-z RecordSize
     Prevents abnormal termination if any of the lines being sorted are longer than the default buffer size. When the -c or -m flags are specified, the sorting phase is omitted and a system default buffer size is used. If sorted lines are longer than this size, sort terminates abnormally. The -z option specifies recording of the longest line in the sort phase so adequate buffers can be allocated in the merge phase. RecordSize must designate a value in bytes equal to or greater than the longest line to be merged.

Examples

1. To sort the fruits file with the LC_ALL, LC_COLLATE, or LANG environment variable set to En_US, enter:

       LANG=En_US sort fruits

   This displays the contents of the fruits file sorted in ascending lexicographic order. The characters in each column are compared one by one, including spaces, digits, and special characters. For instance, if the fruits file contains the text:

       banana
       orange
       Persimmon
       apple
       %%banana
       apple
       ORANGE

   the sort command displays:

       %%banana
       ORANGE
       Persimmon
       apple
       apple
       banana
       orange

   In the ASCII collating sequence, the % (percent sign) precedes uppercase letters, which precede lowercase letters. If your current locale specifies a character set other than ASCII, your results may be different.

2. To sort in dictionary order, enter:

       sort -d fruits

   This sorts and displays the contents of the fruits file, comparing only letters, digits, and spaces. If the fruits file is the same as in example 1, then the sort command displays:

       ORANGE
       Persimmon
       apple
       apple
       %%banana
       banana
       orange

   The -d flag ignores the % (percent sign) character because it is not a letter, digit, or space, placing %%banana with banana.

3. To group lines that contain uppercase and special characters with similar lowercase lines, enter:

       sort -d -f fruits

   The -d flag ignores special characters and the -f flag ignores differences in case. With the LC_ALL, LC_COLLATE, or LANG environment variable set to C, the output for the fruits file becomes:

       apple
       apple
       %%banana
       banana
       ORANGE
       orange
       Persimmon

4. To sort, removing duplicate lines, enter:

       sort -d -f -u fruits

   The -u flag tells the sort command to remove duplicate lines, making each line of the file unique. This displays:

       apple
       %%banana
       orange
       Persimmon

   Not only is the duplicate apple removed, but banana and ORANGE as well. These are removed because the -d flag ignores the %% special characters and the -f flag ignores differences in case.

5. To sort as in example 4, removing duplicate instances unless capitalized or punctuated differently, enter:

       sort -u +0 -d -f +0 fruits

   Entering the +0 -d -f does the same type of sort that is done with -d -f in example 3. Then the +0 performs another comparison to distinguish lines that are not identical. This prevents the -u flag from removing them.

   Given the fruits file shown in example 1, the added +0 distinguishes %%banana from banana and ORANGE from orange. However, the two instances of apple are identical, so one of them is deleted.

       apple
       %%banana
       banana
       ORANGE
       orange
       Persimmon

6. To specify the character that separates fields, enter:

       sort -t: +1 vegetables

   This sorts the vegetables file, comparing the text that follows the first colon on each line.
   The +1 tells the sort command to ignore the first field and to compare from the start of the second field to the end of the line. The -t: flag tells the sort command that colons separate fields. If vegetables contains:

       yams:104
       turnips:8
       potatoes:15
       carrots:104
       green beans:32
       radishes:5
       lettuce:15

   Then, with the LC_ALL, LC_COLLATE, or LANG environment variable set to C, the sort command displays:

       carrots:104
       yams:104
       lettuce:15
       potatoes:15
       green beans:32
       radishes:5
       turnips:8

   Note that the numbers are not in numeric order. This happens because a lexicographic sort compares each character from left to right. In other words, 3 comes before 5, so 32 comes before 5.

7. To sort numbers, enter:

       sort -t: +1 -n vegetables

   This sorts the vegetables file numerically on the second field. If the vegetables file is the same as in example 6, then the sort command displays:

       radishes:5
       turnips:8
       lettuce:15
       potatoes:15
       green beans:32
       carrots:104
       yams:104

8. To sort more than one field, enter:

       sort -t: +1 -2 -n +0 -1 -r vegetables

   OR

       sort -t: -k2,2n -k1,1r vegetables

   This performs a numeric sort on the second field (+1 -2 -n). Within that ordering, it sorts the first field in reverse alphabetic order (+0 -1 -r). With the LC_ALL, LC_COLLATE, or LANG environment variable set to C, the output looks like this:

       radishes:5
       turnips:8
       potatoes:15
       lettuce:15
       green beans:32
       yams:104
       carrots:104

   The command sorts the lines in numeric order. When two lines have the same number, they appear in reverse alphabetic order.

9. To replace the original file with the sorted text, enter:

       sort -o vegetables vegetables

   This stores the sorted output into the vegetables file (-o vegetables).

Implementation Specifics

This command is part of Base Operating System (BOS) Runtime.

Files

     /usr/bin/sort    Contains the sort command.
     /usr/tmp         Temporary space during the sort command processing.
     /tmp             Temporary space during the sort command processing, if needed.

Related Information

Files Overview in AIX Version 3.2 System User's Guide: Base and Devices.

Input and Output Redirection Overview in AIX Version 3.2 System User's Guide: Base and Devices.

National Language Support Overview for Programming in AIX Version 3.2 General Programming Concepts.

The comm command, join command, uniq command.
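As a supplement to examples 6 through 8, the vegetables sorts can be replayed with the -k form of the key definitions. This is a sketch under stated assumptions: the mapping of +1 to -k2 follows the skip-count rule given in this entry, the variable names (vegetables, lexicographic, numeric, two_keys) are made up, the data is supplied through a pipe rather than a file, and LC_ALL=C stands in for setting the locale variables to C.

```shell
# The vegetables data from example 6, held in a variable for the sketch.
vegetables=$(printf '%s\n' 'yams:104' 'turnips:8' 'potatoes:15' \
    'carrots:104' 'green beans:32' 'radishes:5' 'lettuce:15')

# Example 6: compare from the start of field 2 to the end of the line.
lexicographic=$(printf '%s\n' "$vegetables" | LC_ALL=C sort -t: -k2)

# Example 7: the same key, compared by arithmetic value.
numeric=$(printf '%s\n' "$vegetables" | LC_ALL=C sort -t: -k2n)

# Example 8: numeric on field 2, then reverse order on field 1.
two_keys=$(printf '%s\n' "$vegetables" | LC_ALL=C sort -t: -k2,2n -k1,1r)

printf '%s\n\n' "$lexicographic" "$numeric" "$two_keys"
```

Each result should match the corresponding listing in the Examples section, with 32 sorting ahead of 5 only in the lexicographic run.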