Nobal Tech: Text Normalizer - Dealing with accents

I had to sort French texts in alphabetical order. This is not as simple as comparing English strings, because we must deal with French accents such as é and à.

If we do no processing and use the plain string comparison function, équipement sorts after zebra, because the code point of é is higher than that of any unaccented Latin letter. What we want is for équipement to land between the words starting with 'd' and those starting with 'f', i.e. to be sorted as if it were equipement. To solve the problem, we compare the strings after normalizing them and removing the diacritics:

// Requires: import java.text.Normalizer;
String normalizedStr1 = Normalizer.normalize(Text1, Normalizer.Form.NFD).replaceAll("[\u0300-\u036F]", "");

String normalizedStr2 = Normalizer.normalize(Text2, Normalizer.Form.NFD).replaceAll("[\u0300-\u036F]", "");

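NFD splits each accented letter into its base letter followed by a combining mark, and the replaceAll call then deletes the marks in the range U+0300 to U+036F. Now we compare normalizedStr1 and normalizedStr2 instead of Text1 and Text2.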
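
To show how this fits into an actual sort, here is a minimal sketch. The class name AccentSort and the helper names stripAccents and sortIgnoringAccents are made up for this illustration; only Normalizer and the regular expression come from the snippet above. The comparison is still case-sensitive, so lowercase the strings first if that matters for your data.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper class, for illustration only.
class AccentSort {

    // Decompose to NFD, then delete the combining marks (U+0300 to U+036F).
    static String stripAccents(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD)
                .replaceAll("[\\u0300-\\u036F]", "");
    }

    // Return a sorted copy; the original strings keep their accents,
    // only the sort key is normalized.
    static List<String> sortIgnoringAccents(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(Comparator.comparing(AccentSort::stripAccents));
        return sorted;
    }

    public static void main(String[] args) {
        // "\u00e9" is e with an acute accent.
        List<String> words = Arrays.asList("zebra", "fromage", "\u00e9quipement", "dessert");
        System.out.println(sortIgnoringAccents(words));
    }
}
```

The sorted list keeps the accented spelling, and équipement now sits between dessert and fromage.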
