Symbol frequency

The program is started with one argument: the name of a file that contains text. Calculate how often each symbol is encountered. Sort the results by increasing ASCII code (read about it online). Example: ','=44, 's'=115, 't'=116. Display the sorted results: [symbol1] frequency1 [symbol2] frequency2
Comments (8)
Dawid Bujnicki Junior Developer
20 June 2021, 12:07
Do not use BufferedReader.readLine() for this one: it strips the line terminators, so newline symbols are never counted and the solution fails requirement 4. Also, do not define the full character list before reading the file; add entries while reading :)
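To illustrate the newline point, here is a minimal sketch that compares line-by-line reading with character-by-character reading. The class and method names (NewlineDemo, countWithReadLine, countWithRead) are made up for this example, and a StringReader stands in for the file so the snippet runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of the task's reference solution.
class NewlineDemo {
    // Counts the characters delivered by readLine(), which drops line terminators.
    static int countWithReadLine(String text) {
        int total = 0;
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                total += line.length();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    // Counts the characters delivered by read(), which returns every character, '\n' included.
    static int countWithRead(String text) {
        int total = 0;
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            while (reader.read() != -1) {
                total++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        String text = "ab\ncd\n";
        System.out.println(countWithReadLine(text)); // 4: the two '\n' are lost
        System.out.println(countWithRead(text));     // 6: every symbol is seen
    }
}
```

Reading with read(), or from a plain FileInputStream, keeps the newline symbols available for counting.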
horst
Level 26 , Not in list
13 September 2020, 10:55
Just for fun: there is a very inefficient way to solve the counting and sorting. You can create an int array with the length of Character.MAX_VALUE (65535). This way, there is a "slot" for every possible Unicode character occurring in the file. Next, you can use the integer values you read from the file to increment the array at the respective index. You'll end up with an int array where the index is equal to the ASCII code of a symbol and the value at that index is its frequency.
int[] positions = new int[Character.MAX_VALUE];
int c;
// read() returns -1 at end of stream; available() is not a reliable end-of-file check
while ((c = inputStream.read()) != -1) {
    positions[c]++;
}
After that, you can iterate through that array and print its index (as char) and its value (if that value is not zero, of course). So, all you have to do is run through a totally oversized array twice while ignoring probably 97% of its contents. :)
Andrei
Level 41
19 January 2021, 09:37
Very interesting solution, but why is this inefficient? Later edit: I assume it's because of the size of the array, which considerably increases memory allocation.
DarthGizka
Level 24 , Wittenberg, Germany
6 June 2021, 18:33
@horst: you are mistaken. Input streams read (unsigned) byte values, not characters. Ergo you only need an int[256], and this makes it very efficient indeed (it fits into the L1 cache even if you use long[] instead of int[]).
In a Western/European context the most efficient way of dealing with *characters* (i.e. UTF-16) is:
  • hope that there aren't any characters from the supplementary planes
  • use int[256] for characters < 256 (99.9% of them)
  • use a map for the rest (e.g. Euro sign, smilies, whatever)
In any case you need to remember that Java uses the UTF-16 encoding, and some Unicode code points need more than one Java char. That is, char-by-char processing falls flat on its face if 'extended' characters can be present (i.e. those outside the 64k Basic Multilingual Plane).
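The byte-level approach described here could look roughly like the sketch below. It is only an illustration: the names ByteFrequency, countBytes and format are invented for this example, and the built-in sample input is used when no file name is passed.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical class name; a sketch of byte counting with a 256-slot array.
class ByteFrequency {
    // InputStream.read() returns 0..255, or -1 at end of stream, so 256 slots suffice.
    static int[] countBytes(InputStream in) {
        int[] counts = new int[256];
        try {
            int b;
            while ((b = in.read()) != -1) {
                counts[b]++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    // Walking the array by index yields increasing byte value; zero counts are skipped.
    static String format(int[] counts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0) {
                sb.append((char) i).append(' ').append(counts[i]).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // With no argument, fall back to a small in-memory sample (an assumption for the demo).
        try (InputStream in = args.length > 0
                ? new FileInputStream(args[0])
                : new ByteArrayInputStream("tst,".getBytes(StandardCharsets.US_ASCII))) {
            System.out.print(format(countBytes(in)));
        }
    }
}
```

Because the array index is the byte value, the output is already sorted and no separate sorting step is needed. Bytes above 127 are printed via a plain cast to char, which is only meaningful for single-byte encodings.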
Dave Andrea
Level 41 , Canada
25 July 2020, 02:44
It won't verify if you include the characters with a frequency of 0, so you'd best skip printing those.
Ewerton Backend Developer
4 July 2019, 14:56
TreeMap and lambda foreach can really simplify this code :)
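As a rough sketch of what the TreeMap idea might look like (the commenter's actual code is not shown, so the class name TreeMapFrequency and the method count are assumptions for illustration):

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class name; one possible way to combine TreeMap and forEach.
class TreeMapFrequency {
    // A TreeMap keeps its keys sorted, so Character keys come out in increasing code order.
    static Map<Character, Integer> count(String text) {
        Map<Character, Integer> frequencies = new TreeMap<>();
        for (char c : text.toCharArray()) {
            // merge() inserts 1 for a new key, otherwise adds 1 to the existing count.
            frequencies.merge(c, 1, Integer::sum);
        }
        return frequencies;
    }

    public static void main(String[] args) {
        // Sample text used instead of file contents to keep the sketch self-contained.
        count("tst,").forEach((symbol, n) -> System.out.println(symbol + " " + n));
    }
}
```

Only symbols that actually occur end up in the map, so there are no zero-frequency entries to filter out.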
Ahmed
Level 23 , Amsterdam, Netherlands
24 August 2019, 11:59
TreeMap saved me again ;). Thanks!
Weichen Ouyang
Level 25 , San Jose, United States
15 September 2019, 22:37
Care to share your code for the sorting? Many thanks. I use an ArrayList of the characters to solve the problem:
ArrayList<Character> sortedKeys = new ArrayList<Character>(kv.keySet());
Collections.sort(sortedKeys);