Sunday, April 20, 2025

Unmasking the Repetition: Counting Duplicate Words in Java

Have you ever wondered how many times a specific word pops up in a lengthy piece of text? Whether you're analyzing user feedback, processing documents, or simply curious about word frequency, identifying duplicate words can be quite insightful. Java, with its rich set of tools, makes this task surprisingly straightforward. Let's dive in and explore a simple yet effective way to count duplicate words in a given sentence using Java.

The Approach: Leveraging the Power of HashMap

The core idea behind our approach is to iterate through each word in the sentence and maintain a count of its occurrences. A HashMap in Java is an ideal data structure for this purpose. Here's why:

 * Key-Value Pairs: HashMap stores data in key-value pairs. We can use each unique word as the key and its frequency as the value.

 * Efficient Lookups: Checking whether a word has already been encountered and updating its count both take constant time on average in a HashMap.
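To make the key-value idea concrete before the full program, here is a minimal sketch. The class name HashMapBasicsDemo, the method sampleCounts, and the sample words are made up for illustration; only the HashMap calls themselves come from the standard library.

```java
import java.util.HashMap;
import java.util.Map;

class HashMapBasicsDemo {

    // Builds a tiny word-to-count map by hand to show the key-value idea.
    static Map<String, Integer> sampleCounts() {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("java", 1);                      // first time we see "java"
        counts.put("java", counts.get("java") + 1); // seen again: overwrite the value with 2
        counts.put("code", 1);
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = sampleCounts();
        System.out.println(counts.containsKey("java"));       // true
        System.out.println(counts.get("java"));               // 2
        System.out.println(counts.getOrDefault("python", 0)); // 0, because the key is absent
    }
}
```

Each word appears as a key only once, and putting the same key again replaces its value, which is what lets the map act as a counter.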

The Java Code in Action

Let's take a look at the Java code that implements this logic:

import java.util.HashMap;
import java.util.Map;

public class DuplicateWordCounter {

    public static void countDuplicateWords(String sentence) {
        // 1. Split the sentence into words
        String[] words = sentence.toLowerCase().split("\\s+");

        // 2. Create a HashMap to store word counts
        Map<String, Integer> wordCounts = new HashMap<>();

        // 3. Iterate through the words and update counts
        for (String word : words) {
            wordCounts.put(word, wordCounts.getOrDefault(word, 0) + 1);
        }

        // 4. Print the duplicate word counts
        for (Map.Entry<String, Integer> entry : wordCounts.entrySet()) {
            if (entry.getValue() > 1) {
                System.out.println(entry.getKey() + ": " + entry.getValue());
            }
        }
    }

    public static void main(String[] args) {
        String sentence = "This is a simple sentence this has multiple duplicate words is is a";
        System.out.println("Duplicate word counts in the sentence:");
        countDuplicateWords(sentence);
    }
}


Breaking Down the Code:

 * Splitting the Sentence:

   String[] words = sentence.toLowerCase().split("\\s+");


   We first convert the input sentence to lowercase using toLowerCase() to ensure that words like "This" and "this" are treated as the same. Then, we use the split("\\s+") method to split the sentence into an array of individual words. The regular expression "\\s+" matches one or more whitespace characters, effectively separating the words.

 * Initializing the HashMap:

   Map<String, Integer> wordCounts = new HashMap<>();


   We create an empty HashMap called wordCounts to store the words and their corresponding counts. The keys will be String (the words), and the values will be Integer (the frequency).

 * Iterating and Counting:

   for (String word : words) {
       wordCounts.put(word, wordCounts.getOrDefault(word, 0) + 1);
   }


   We loop through each word in the words array. For each word:

   * wordCounts.getOrDefault(word, 0): This tries to retrieve the current count of the word from the wordCounts map. If the word is not yet present in the map, it returns a default value of 0.

   * + 1: We increment the count (either the existing count or the default 0) by 1.

   * wordCounts.put(word, ...): We update the wordCounts map with the current word and its updated count. If the word was not present before, it's added to the map with a count of 1.

 * Printing Duplicate Counts:

   for (Map.Entry<String, Integer> entry : wordCounts.entrySet()) {
       if (entry.getValue() > 1) {
           System.out.println(entry.getKey() + ": " + entry.getValue());
       }
   }


   Finally, we iterate through the entries (key-value pairs) in the wordCounts map. For each entry, we check if the value (the count) is greater than 1. If it is, we print the word (the key) and its count, indicating that it's a duplicate word.
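Two details from the walkthrough above are worth trying out in isolation. The sketch below is not part of the original program: the class WalkthroughDemo and the helpers tokenize and countWords are hypothetical names. It shows that split("\\s+") keeps an empty first element when the text starts with whitespace (calling trim() first avoids that), and that Map.merge is an equivalent, slightly shorter way to write the getOrDefault plus put step.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class WalkthroughDemo {

    // Hypothetical helper: trims first so leading whitespace does not create an empty token.
    static String[] tokenize(String sentence) {
        String cleaned = sentence.toLowerCase().trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split("\\s+");
    }

    // Same counting logic as getOrDefault + put, written with Map.merge.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> wordCounts = new HashMap<>();
        for (String word : words) {
            // Stores 1 if the word is absent, otherwise adds 1 to the existing count.
            wordCounts.merge(word, 1, Integer::sum);
        }
        return wordCounts;
    }

    public static void main(String[] args) {
        // Plain split keeps an empty first element when the text starts with whitespace.
        System.out.println(Arrays.toString("  This   is".toLowerCase().split("\\s+"))); // [, this, is]
        System.out.println(Arrays.toString(tokenize("  This   is")));                   // [this, is]

        Map<String, Integer> counts = countWords(tokenize("is a is is"));
        System.out.println(counts.get("is")); // 3
        System.out.println(counts.get("a"));  // 1
    }
}
```

In the original program the stray empty token is harmless because it occurs at most once and is never printed, but it matters as soon as you report total word counts.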

Running the Code

When you run the main method with the example sentence, the output will look like this (HashMap does not guarantee iteration order, so the lines may appear in a different order):

Duplicate word counts in the sentence:
is: 3
a: 2
this: 2
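Because HashMap makes no promise about iteration order, the order of the printed lines is not something to rely on. If a stable order matters, one option is to swap in a LinkedHashMap, which iterates in insertion order. The following is a sketch under that assumption; the class OrderedDuplicateCounter and the method findDuplicates are illustrative names, and the method returns the duplicates instead of printing them so the result is easier to inspect.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class OrderedDuplicateCounter {

    // Returns only the words that occur more than once, in order of first appearance.
    static Map<String, Integer> findDuplicates(String sentence) {
        Map<String, Integer> wordCounts = new LinkedHashMap<>();
        for (String word : sentence.toLowerCase().split("\\s+")) {
            wordCounts.put(word, wordCounts.getOrDefault(word, 0) + 1);
        }
        // Drop every word that appears only once.
        wordCounts.values().removeIf(count -> count < 2);
        return wordCounts;
    }

    public static void main(String[] args) {
        String sentence = "This is a simple sentence this has multiple duplicate words is is a";
        System.out.println(findDuplicates(sentence)); // {this=2, is=3, a=2}
    }
}
```

Updating the count of a word that is already in the map does not change its position, so the words come out in the order they first appeared in the sentence.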


Further Enhancements

This basic implementation can be extended in several ways:

 * Ignoring Punctuation: You could preprocess the sentence to remove punctuation marks before splitting it into words.

 * Case Sensitivity: If you need a case-sensitive count, you can skip the toLowerCase() step.

 * Sorting Results: You could sort the output based on the frequency of the words.

 * Handling Edge Cases: Consider how to handle empty sentences or sentences with only one word.
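As a rough sketch of how several of these enhancements might fit together, the version below strips punctuation, guards against null and empty input, and sorts duplicates by descending count. The class EnhancedDuplicateCounter and the method sortedDuplicates are hypothetical names, and the punctuation rule (drop everything that is not a letter, digit, or whitespace) is an assumption: it turns "don't" into "dont", which may or may not suit your data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class EnhancedDuplicateCounter {

    // Returns duplicate words sorted by descending count, then alphabetically.
    static List<Map.Entry<String, Integer>> sortedDuplicates(String sentence) {
        List<Map.Entry<String, Integer>> duplicates = new ArrayList<>();
        if (sentence == null) {
            return duplicates; // edge case: no input at all
        }
        // Assumption: anything that is not a letter, digit, or whitespace is punctuation.
        String cleaned = sentence.toLowerCase().replaceAll("[^\\p{L}\\p{N}\\s]", "").trim();
        if (cleaned.isEmpty()) {
            return duplicates; // edge case: empty or punctuation-only sentence
        }

        Map<String, Integer> wordCounts = new HashMap<>();
        for (String word : cleaned.split("\\s+")) {
            wordCounts.merge(word, 1, Integer::sum);
        }

        // Keep only the words that occur more than once.
        for (Map.Entry<String, Integer> entry : wordCounts.entrySet()) {
            if (entry.getValue() > 1) {
                duplicates.add(entry);
            }
        }

        duplicates.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue()); // higher counts first
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return duplicates;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> entry
                : sortedDuplicates("Hello, world! Hello again; world. Hello?")) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
        // Prints "hello: 3" followed by "world: 2".
    }
}
```

A sentence with a single word, or with no repeated words, simply yields an empty list, so callers do not need a special case for it.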

Conclusion

Counting duplicate words in Java is a fundamental text processing task that can be efficiently achieved using the HashMap data structure. This approach provides a clear and concise way to identify and quantify word repetition within a given sentence. By understanding this basic technique, you can build upon it to perform more complex text analysis and gain valuable insights from your data.

