public final class HTMLTextTools
extends java.lang.Object
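It includes methods for converting text for inclusion in HTML, escaping or stripping HTML tags, converting HTML to plain text, and converting colors to HTML color codes.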
Modifier and Type | Field and Description |
---|---|
static org.apache.log4j.Logger | log |
static java.util.regex.Pattern | PATTERN_HTML_MESSAGE: Regular expression to match a string that starts with "<html>" and ends with "</html>", possibly with leading or trailing whitespace and content after the closing tag. |
static java.util.regex.Pattern | PATTERN_HTML_TAG: Compiled Pattern object for PATTERN_HTML_TAG_STRING. |
static java.lang.String | PATTERN_HTML_TAG_STRING: Regular expression to match any HTML tag, capturing the tag name in group 1. |
Constructor and Description |
---|
HTMLTextTools() |
Modifier and Type | Method and Description |
---|---|
static java.lang.String | colorToHTML(java.awt.Color color): Converts a Color object to its HTML hexadecimal color code representation. |
static java.lang.String | convertForHTMLInclusion(java.lang.String str): Converts the given String str into a format suitable for inclusion in HTML messages. |
static java.lang.String | convertHTMLtoText(java.lang.String str): Converts HTML to plain text by removing HTML tags, style elements, and converting special characters. |
static java.lang.String | disableHTML(java.lang.String str): Escapes HTML tags in the given string by replacing '<' with "&lt;" and '>' with "&gt;". |
static java.lang.String | replaceHtmlEntities(java.lang.String str): Replaces characters with special meaning in HTML with the corresponding entities using a StringBuilder for efficiency. |
static java.lang.String | stripHTML(java.lang.String str): Removes all HTML tags from the given string. |
static java.lang.String | stripHTMLTags(java.lang.String str, java.util.Collection<java.lang.String> tagNamesToRemove): Removes specific HTML tags from a given string. |
static java.lang.String | stripHTMLTagsExcept(java.lang.String str, java.util.Collection<java.lang.String> tagNamesToKeep): Removes all HTML tags from a string except for those specified in a collection. |
public static final org.apache.log4j.Logger log
public static final java.lang.String PATTERN_HTML_TAG_STRING
This regex is case-insensitive and matches tags with or without a closing slash. It captures the tag name (e.g., "div", "span", "p") in the first capturing group. It does *not* match self-closing tags (e.g., <img/>). It is primarily intended for use in identifying and extracting tag names, not for robust HTML parsing.
The regex breakdown is as follows:
(?i)
: Enables case-insensitive matching.
\\</?
: Matches the opening angle bracket (<) followed by an optional slash ("/"). This allows matching both opening tags (e.g., <div>) and closing tags (e.g., </div>).
(\\w+)
: Matches one or more word characters (letters, digits, and underscore). This represents the tag name and is captured in group 1.
\\s*
: Matches zero or more whitespace characters. This allows for spaces after the tag name.
(?:[^\\>]*\\>)
: Matches the rest of the tag, including attributes.
(?: ... )
: This is a non-capturing group. We don't need to capture the attributes, just match them.
[^\\>]*
: Matches zero or more characters that are *not* a closing angle bracket (>). This allows for attributes within the tag.
\\>
: Matches the closing angle bracket (>).
**Important Considerations:** This regex is a simplified approach and should *not* be used for robust HTML parsing. Parsing HTML with regular expressions can be complex and error-prone, especially with malformed or nested HTML. For reliable HTML parsing, consider using a dedicated HTML parser library.
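As an illustration of the breakdown above, the following sketch compiles a pattern assembled from the listed pieces and collects the tag names from group 1. The regex literal and the class name `TagNameDemo` are assumptions reconstructed from this description, not copied from the HTMLTextTools source; code that has the library on its classpath would presumably use PATTERN_HTML_TAG instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagNameDemo {
    // Assumed regex, rebuilt from the breakdown; not taken from the library source.
    static final Pattern TAG = Pattern.compile("(?i)</?(\\w+)\\s*(?:[^>]*>)");

    /** Returns the name of every tag found in html, in order of appearance. */
    static List<String> tagNames(String html) {
        List<String> names = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            names.add(m.group(1)); // group 1 holds the tag name
        }
        return names;
    }

    public static void main(String[] args) {
        // prints [div, B, B, div]
        System.out.println(tagNames("<div class=\"a\">Hi <B>there</B></div>"));
    }
}
```

The tag names come back exactly as written in the input; any case-insensitive comparison is left to the caller.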
public static final java.util.regex.Pattern PATTERN_HTML_TAG
Compiled Pattern object for PATTERN_HTML_TAG_STRING.
Pre-compiling the pattern improves performance when it is used repeatedly.
public static final java.util.regex.Pattern PATTERN_HTML_MESSAGE
The regex is designed to capture the HTML content within the tags (group 1) and any content that follows the closing tag (group 2). It uses the following features:
(?s)
: Enables DOTALL mode, making the dot (.) match any character, including newline characters. This is essential for matching HTML content that spans multiple lines.
\\s*
: Matches zero or more whitespace characters. This handles leading and trailing whitespace around the <html> tags and the trailing content.
<html>
: Matches the literal start tag <html>.
(.*)
: Captures any characters (including newlines due to DOTALL mode) between the opening and closing HTML tags. This is the HTML content (group 1).
</html>
: Matches the literal closing tag </html>.
(.*)
: Captures any characters following the closing HTML tag (including newlines due to DOTALL mode). This is the trailing content (group 2).
\\s*
: Matches zero or more whitespace characters after the trailing content.
This pattern allows for flexible matching of HTML strings, even if they have extra text outside the <html> tags.
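The two capture groups can be seen in a small sketch. The regex below is an assumption assembled from the pieces listed above, and `HtmlMessageDemo` and `split` are hypothetical names used only for illustration; the literal stored in PATTERN_HTML_MESSAGE may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlMessageDemo {
    // Assumed regex assembled from the description; not the library's literal.
    static final Pattern HTML_MESSAGE =
            Pattern.compile("(?s)\\s*<html>(.*)</html>(.*)\\s*");

    /** Returns {content, trailing} if text is an HTML message, otherwise null. */
    static String[] split(String text) {
        Matcher m = HTML_MESSAGE.matcher(text);
        if (!m.matches()) {
            return null; // not wrapped in <html>...</html>
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("  <html><b>Saved</b></html>\nSee log");
        System.out.println(parts[0]); // prints <b>Saved</b>
        System.out.println(parts[1].trim()); // prints See log
    }
}
```

With this assumed regex, the second `(.*)` is greedy, so any whitespace at the end of the input lands in group 2 and the final `\\s*` matches nothing.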
public static final java.lang.String convertForHTMLInclusion(java.lang.String str)
Converts the given String str into a format suitable for inclusion in HTML messages.
This method handles cases where the string already contains HTML tags and replaces line
breaks in trailing content with <br/> tags. If the string does not contain HTML tags,
it replaces HTML entities to ensure proper rendering.
**Handling HTML Content:** If str starts with "<html>" and contains "</html>" after that, the content between these tags is extracted. Trailing content after the closing "</html>" tag is appended to the result. Line breaks within the extracted trailing content are replaced with "<br/>" and the replaceHtmlEntities(String) method is called to replace HTML entities. This allows for mixed HTML and non-HTML content in messages, without corrupting the structure of the final HTML content.
**Handling Non-HTML Content:** If str does *not* match the HTML pattern, the replaceHtmlEntities(String) method is called to replace HTML entities. This converts characters like <, >, &, and others into their corresponding HTML entities (&lt;, &gt;, &amp;), ensuring correct rendering in the client.
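The two branches described above could be sketched roughly as follows. This is not the library's implementation: the pattern is an assumed reconstruction, `escape` is a hypothetical stand-in for replaceHtmlEntities(String) that only handles &, < and >, and the sketch assumes the trailing text is escaped before the "<br/>" tags are inserted. Whether the real method re-wraps its result in <html> tags is not stated here, so the sketch does not.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlInclusionSketch {
    // Assumed pattern; see the PATTERN_HTML_MESSAGE description.
    static final Pattern HTML_MESSAGE =
            Pattern.compile("(?s)\\s*<html>(.*)</html>(.*)\\s*");

    /** Hypothetical stand-in for replaceHtmlEntities(String): escapes &, < and >. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    static String convert(String str) {
        if (str == null) {
            return null;
        }
        Matcher m = HTML_MESSAGE.matcher(str);
        if (!m.matches()) {
            return escape(str); // non-HTML input: escape everything
        }
        // Assumption: escape the trailing text first, then insert <br/>,
        // so the inserted tags are not escaped themselves.
        String trailing = escape(m.group(2)).replace("\n", "<br/>");
        return m.group(1) + trailing;
    }

    public static void main(String[] args) {
        System.out.println(convert("a < b & c"));
        System.out.println(convert("<html><b>x</b></html>\n1 > 0"));
    }
}
```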
str - The string to convert.
Returns null if the input is null.
public static final java.lang.String replaceHtmlEntities(java.lang.String str)
str - String to escape
public static final java.lang.String disableHTML(java.lang.String str)
This method prevents HTML tags from being interpreted by the browser. It should be called before displaying the string in an HTML context, especially when the string originates from user input or other untrusted sources.
str - the string to escape, may be null
Returns null if the input is null.
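A minimal sketch of the escaping described for disableHTML, assuming it amounts to two plain replacements and a null check. `DisableHtmlSketch.disable` is a hypothetical stand-in, not the library method.

```java
public class DisableHtmlSketch {
    /** Hypothetical equivalent: replaces '<' and '>' with entities; null stays null. */
    static String disable(String str) {
        if (str == null) {
            return null;
        }
        return str.replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // prints &lt;script&gt;alert(1)&lt;/script&gt;
        System.out.println(disable("<script>alert(1)</script>"));
    }
}
```

This sketch leaves ampersands untouched, since the description only mentions the two angle brackets.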
public static final java.lang.String stripHTML(java.lang.String str)
This method uses a regular expression to identify and remove any text enclosed within '<' and '>'. It effectively strips all HTML tags from the input string.
str - the input string, may be null
Returns null if the input is null.
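The behaviour described for stripHTML can be approximated with a single `replaceAll` call. The regex `<[^>]*>` is an assumption, and `StripHtmlSketch.strip` is a hypothetical name; the library may use a different expression.

```java
public class StripHtmlSketch {
    /** Hypothetical equivalent: removes anything between '<' and the next '>'. */
    static String strip(String str) {
        if (str == null) {
            return null;
        }
        return str.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        // prints Hello world
        System.out.println(strip("<p>Hello <b>world</b></p>"));
    }
}
```

An expression like this cannot tell tags from ordinary comparison signs: with the assumed regex, "a < b and c > d" loses everything between the two brackets.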
public static final java.lang.String convertHTMLtoText(java.lang.String str)
This method removes HTML tags and style elements and converts special characters.
str - the input string containing HTML content, may be null
Returns null if the input is null.
public static final java.lang.String stripHTMLTags(java.lang.String str, java.util.Collection<java.lang.String> tagNamesToRemove)
This method identifies and removes HTML tags specified in the tagNamesToRemove collection. It performs a case-insensitive comparison of tag names and removes the entire tag, including any attributes.
str - the input string containing HTML content, may be null
tagNamesToRemove - a collection of tag names to be removed, may be null or empty
null
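One way to get the selective removal described for stripHTMLTags is to walk the tag matches and drop only those whose name is listed. The sketch below is an illustration under assumptions: the regex is reconstructed from the PATTERN_HTML_TAG_STRING description, `StripTagsSketch.stripTags` is a hypothetical name, and returning the input unchanged for a null or empty collection is a guess.

```java
import java.util.Collection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StripTagsSketch {
    // Assumed regex; group 1 is the tag name.
    static final Pattern TAG = Pattern.compile("(?i)</?(\\w+)\\s*(?:[^>]*>)");

    static String stripTags(String str, Collection<String> tagNamesToRemove) {
        if (str == null) {
            return null;
        }
        if (tagNamesToRemove == null || tagNamesToRemove.isEmpty()) {
            return str; // assumption: nothing to remove
        }
        Matcher m = TAG.matcher(str);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            boolean remove = false;
            for (String name : tagNamesToRemove) {
                if (name.equalsIgnoreCase(m.group(1))) {
                    remove = true;
                    break;
                }
            }
            // Drop the whole tag if listed; otherwise copy it through literally.
            m.appendReplacement(out, remove ? "" : Matcher.quoteReplacement(m.group()));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints <p>Hi you</p>
        System.out.println(stripTags("<p>Hi <B class=\"x\">you</B></p>",
                java.util.Arrays.asList("b")));
    }
}
```

Only the tags themselves are removed; the text between an opening and closing tag is kept.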
public static final java.lang.String stripHTMLTagsExcept(java.lang.String str, java.util.Collection<java.lang.String> tagNamesToKeep)
This method identifies all HTML tags in the input string and removes any tag that is not present in the tagNamesToKeep collection. The comparison of tag names is case-insensitive.
If the tagNamesToKeep collection is null, no tags are removed. To remove all tags, use an empty collection or the stripHTML(String) method.
str - the input string containing HTML content, may be null
tagNamesToKeep - a collection of tag names to be kept, may be null
Returns the input with every tag removed except those named in tagNamesToKeep, or the original string if tagNamesToKeep is null, or null if str is null.
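The null and empty-collection cases described above can be seen in a short sketch. As before, the regex is an assumed reconstruction and `StripTagsExceptSketch.stripTagsExcept` is a hypothetical stand-in for the library method.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StripTagsExceptSketch {
    // Assumed regex; group 1 is the tag name.
    static final Pattern TAG = Pattern.compile("(?i)</?(\\w+)\\s*(?:[^>]*>)");

    static String stripTagsExcept(String str, Collection<String> tagNamesToKeep) {
        if (str == null) {
            return null;
        }
        if (tagNamesToKeep == null) {
            return str; // documented: a null collection removes nothing
        }
        // Lower-case the names once so lookups are case-insensitive.
        Set<String> keep = new HashSet<>();
        for (String name : tagNamesToKeep) {
            keep.add(name.toLowerCase(Locale.ROOT));
        }
        Matcher m = TAG.matcher(str);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            boolean kept = keep.contains(m.group(1).toLowerCase(Locale.ROOT));
            m.appendReplacement(out, kept ? Matcher.quoteReplacement(m.group()) : "");
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p>Hi <b>you</b></p>";
        // prints Hi <b>you</b>
        System.out.println(stripTagsExcept(html, java.util.Arrays.asList("B")));
        // prints Hi you
        System.out.println(stripTagsExcept(html, new HashSet<String>()));
    }
}
```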
public static final java.lang.String colorToHTML(java.awt.Color color)
Converts a Color object to its HTML hexadecimal color code representation.
color - The Color object to convert.
Copyright © 2000-2025 OAshi S.à r.l. All Rights Reserved.