Preprocess a string, removing special characters and handling abbreviations.
Source: R/helper_functions.R
Replaces some common characters and character sequences (e.g., Ä, Ü, "DIPL.-ING.") with their uppercase equivalents, and removes punctuation, whitespace, and the word "Diplom".
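The steps described above can be approximated in base R as follows. This is a minimal, hypothetical sketch (the name preprocess_string_sketch is made up for illustration), not the package's actual implementation: it only uppercases, drops "DIPLOM", and strips punctuation and whitespace, and it does not reproduce the package's specific replacement rules for umlauts or "DIPL.-ING.".

```r
# Hypothetical approximation of the documented steps; NOT the package's code.
# Umlaut and "DIPL.-ING." handling is reduced to plain toupper() here.
preprocess_string_sketch <- function(x) {
  x <- toupper(x)                            # uppercase everything
  x <- gsub("DIPLOM", "", x, fixed = TRUE)   # drop the word "Diplom"
  x <- gsub("[[:punct:]]", "", x)            # remove punctuation
  x <- gsub("[[:space:]]", "", x)            # remove all whitespace
  x
}

preprocess_string_sketch(c(
  "Industriemechaniker",
  "Diplom-Kaufmann",
  "Dipl.-Ing. - Agrarwirtschaft (Landwirtschaft)"
))
```

Because whitespace is removed last, multi-word titles collapse into a single token, which makes exact string matching between differently formatted job titles easier.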
Details
charToRaw() helps to identify UTF-8 characters by showing the raw bytes that encode them.
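For example, charToRaw() reveals the byte sequences behind the umlauts handled above. This sketch assumes a UTF-8 session; utf8ToInt() is added only to show the corresponding Unicode code point, which is what the \u escapes in the examples refer to.

```r
# Inspect the UTF-8 bytes of special characters (assumes a UTF-8 session)
charToRaw("\u00c4")  # "Ä" is stored as the two bytes c3 84
charToRaw("\u00fc")  # "ü" is stored as the two bytes c3 bc

# Code point matching the \u escape used in the examples
utf8ToInt("\u00fc")  # 252, i.e. hexadecimal 00fc
```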
Examples
if (FALSE) { # \dontrun{
preprocess_string(c(
"Verkauf von B\u00fcchern, Schreibwaren",
"Fach\u00e4rztin f\u00fcr Kinder- und Jugendmedizin im \u00f6ffentlichen Gesundheitswesen",
"Industriemechaniker",
"Dipl.-Ing. - Agrarwirtschaft (Landwirtschaft)"
))
} # }