Welcome back, R Programmers! Today, we’ll explore a common task: extracting a substring after a specific character in R. Whether you’re cleaning data or transforming strings, this skill is quite handy. We’ll look at three approaches: using base R, stringr, and stringi. Let’s dive in!
Base R provides several functions to manipulate strings. Here, we’ll use sub and strsplit to extract a substring after a specific character.
sub
The sub function replaces the first match of a pattern in a string. Here’s how to extract the part after a specific character, say a hyphen (-).
    # Example string
    string <- "data-science"

    # Extract substring after the hyphen
    result <- sub(".*-", "", string)
    print(result)  # Output: "science"
[1] "science"
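One thing the single-hyphen example doesn’t show is what happens with several hyphens, or with none. The sketch below uses a made-up vector `x` (not from the example above) to compare the greedy pattern with one that stops at the first hyphen. The `NA` fallback at the end is just one possible convention, not something `sub` does for you.

```r
# Hypothetical inputs: one hyphen, two hyphens, no hyphen
x <- c("data-science", "machine-learning-model", "nohyphen")

# Greedy ".*-" removes everything up to and including the LAST hyphen
after_last <- sub(".*-", "", x)

# "^[^-]*-" can only consume non-hyphen characters, so it stops at the FIRST hyphen
after_first <- sub("^[^-]*-", "", x)

# sub() leaves a string unchanged when the pattern does not match;
# one option is to turn those cases into NA explicitly
after_first_or_na <- ifelse(grepl("-", x, fixed = TRUE), after_first, NA_character_)

print(after_last)
print(after_first)
print(after_first_or_na)
```

Which variant you want depends on whether a second hyphen belongs to the part you are extracting.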
Explanation:
.*- is a regular expression where .* matches any character zero or more times, and - matches the hyphen. Because .* is greedy, the match runs up to the last hyphen in the string.
"" is the replacement, effectively removing everything up to and including the hyphen.
strsplit
The strsplit function splits a string into substrings based on a delimiter.
    # Example string
    string <- "hello-world"

    # Split the string at the hyphen
    parts <- strsplit(string, "-")[[1]]

    # Extract the part after the hyphen
    result <- parts[2]
    print(result)  # Output: "world"
[1] "world"
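The same idea extends to a whole vector of strings. Here is a minimal sketch, assuming a made-up input vector `x`. It also passes `fixed = TRUE`, because `strsplit` treats its `split` argument as a regular expression by default, which matters for delimiters such as `.`.

```r
# Hypothetical inputs: one hyphen, two hyphens, no hyphen
x <- c("hello-world", "a-b-c", "nohyphen")

# fixed = TRUE treats "-" as a literal string rather than a regular expression
parts <- strsplit(x, "-", fixed = TRUE)

# Take the second piece of each split; it is NA when there is no hyphen
second <- vapply(parts, function(p) p[2], character(1))
print(second)

# A literal dot needs fixed = TRUE (or escaping), since "." is a regex wildcard
extension <- strsplit("report.csv", ".", fixed = TRUE)[[1]][2]
print(extension)
```

Note that `[2]` picks only the second piece, so for "a-b-c" the trailing "c" is dropped.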
Explanation:
strsplit(string, "-") splits the string into parts at the hyphen, returning a list.
[[1]] extracts the first element of the list.
[2] extracts the second part of the split string.
stringr
The stringr package, part of the tidyverse, provides consistent and easy-to-use string functions.
str_extract
The str_extract function extracts the first match of a pattern from each string.
    library(stringr)

    # Example string
    string <- "apple-pie"

    # Extract substring after the hyphen
    result <- str_extract(string, "(?<=-).*")
    print(result)  # Output: "pie"
[1] "pie"
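It can help to see how this pattern behaves on less tidy input. The following sketch assumes a made-up vector `x`. Unlike the `sub` approach, `str_extract` starts at the first hyphen and returns `NA` when nothing matches.

```r
library(stringr)

# Hypothetical inputs: one hyphen, two hyphens, no hyphen
x <- c("apple-pie", "a-b-c", "nohyphen")

# Everything after the FIRST hyphen; NA where there is no hyphen
after_first <- str_extract(x, "(?<=-).*")

# Only the text after the LAST hyphen: non-hyphen characters up to the end
after_last <- str_extract(x, "(?<=-)[^-]*$")

print(after_first)
print(after_last)
```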
Explanation:
(?<=-) is a lookbehind assertion, ensuring the match starts immediately after a hyphen.
.* matches any character zero or more times.
str_split
Similar to strsplit in base R, str_split splits a string based on a pattern.
    # Example string
    string <- "open-source"

    # Split the string at the hyphen
    parts <- str_split(string, "-")[[1]]

    # Extract the part after the hyphen
    result <- parts[2]
    print(result)  # Output: "source"
[1] "source"
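If the text after the hyphen may itself contain hyphens, capping the number of pieces keeps the remainder together. This is a small sketch using `str_split_fixed` with `n = 2` on a made-up vector `x`. It returns a character matrix, so the second column holds everything after the first hyphen.

```r
library(stringr)

# Hypothetical inputs: one hyphen, two hyphens, no hyphen
x <- c("open-source", "a-b-c", "nohyphen")

# n = 2 splits at the first hyphen only; the rest stays in the second piece
pieces <- str_split_fixed(x, "-", n = 2)

# Second column = text after the first hyphen ("" when there is no hyphen)
after_first <- pieces[, 2]
print(after_first)
```

Note the empty string rather than `NA` for the input without a hyphen, which is worth checking for if your data may contain such values.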
Explanation:
str_split(string, "-") splits the string into parts at the hyphen, returning a list.
[[1]] extracts the first element of the list.
[2] extracts the second part of the split string.
stringi
The stringi package is another powerful tool for string manipulation, providing high-performance functions.
stri_extract
The stri_extract function extracts substrings based on patterns.
    library(stringi)

    # Example string
    string <- "front-end"

    # Extract substring after the hyphen
    result <- stri_extract(string, regex = "(?<=-).*")
    print(result)  # Output: "end"
[1] "end"
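As a side note, `stri_extract` with `regex =` returns the first match by default, so the more explicit `stri_extract_first_regex` should give the same result. The sketch below checks that on a made-up vector `x` and also shows the `NA` returned for a string without a hyphen.

```r
library(stringi)

# Hypothetical inputs: one hyphen, two hyphens, no hyphen
x <- c("front-end", "a-b-c", "nohyphen")

# Generic wrapper (default mode is the first match)
via_wrapper <- stri_extract(x, regex = "(?<=-).*")

# Explicit "first match" function with the same pattern
via_first <- stri_extract_first_regex(x, "(?<=-).*")

print(via_wrapper)
print(identical(via_wrapper, via_first))
```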
Explanation:
regex = "(?<=-).*" uses a regular expression where (?<=-) is a lookbehind assertion ensuring the match occurs after a hyphen, and .* matches any character zero or more times.
stri_split
Similar to strsplit and str_split, stri_split splits a string based on a pattern.
    # Example string
    string <- "full-stack"

    # Split the string at the hyphen
    parts <- stri_split(string, regex = "-")[[1]]

    # Extract the part after the hyphen
    result <- parts[2]
    print(result)  # Output: "stack"
[1] "stack"
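Since the hyphen here is a literal character, a fixed-pattern split is an alternative to the regex version. Here is a minimal sketch with `stri_split_fixed` on a made-up vector `x`. Setting `n = 2` limits the split to the first hyphen, so the second piece keeps any later hyphens.

```r
library(stringi)

# Hypothetical inputs: one hyphen, two hyphens, no hyphen
x <- c("full-stack", "a-b-c", "nohyphen")

# Literal (non-regex) split into at most two pieces per string
parts <- stri_split_fixed(x, "-", n = 2)

# Second piece of each split; NA when the string has no hyphen
after_first <- vapply(parts, function(p) p[2], character(1))
print(after_first)
```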
Explanation:
stri_split(string, regex = "-") splits the string into parts at the hyphen, returning a list.
[[1]] extracts the first element of the list.
[2] extracts the second part of the split string.
There you have it: three different ways to extract a substring after a specific character in R. Each method has its own benefits and can be handy depending on your specific needs. Give these examples a try and see which one works best for your data!
Happy coding!