This section describes regular expressions in Java.
Regular Expression in Java
Pattern matching in Java is done with regular expressions, which are provided by the 'java.util.regex' package. Its regular expression syntax is very similar to that of the Perl programming language. A regular expression defines a pattern that is used to find a string or a set of strings. Regular expressions can be used to search, edit, or manipulate text and data.
The 'java.util.regex' package primarily consists of the following three classes:
- Pattern Class: A Pattern object is a compiled representation of a regular expression. The Pattern class provides no public constructors. To create a pattern, you must first invoke one of its public static compile methods, which will then return a Pattern object. These methods accept a regular expression as the first argument.
- Matcher Class: A Matcher object is the engine that interprets the pattern and performs match operations against an input string. Like the Pattern class, Matcher defines no public constructors. You obtain a Matcher object by invoking the matcher method on a Pattern object.
- PatternSyntaxException: A PatternSyntaxException object is an unchecked exception that indicates a syntax error in a regular expression pattern.
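How the three classes fit together can be sketched as follows. This is only a minimal illustration; the class name `RegexBasics` and the helper methods `containsDigits` and `isValidRegex` are hypothetical names chosen for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexBasics {

    // Hypothetical helper: true if the input contains at least one run of digits.
    static boolean containsDigits(String input) {
        // Pattern has no public constructor; compile() is the factory method.
        Pattern pattern = Pattern.compile("\\d+");
        // Matcher has no public constructor either; it is obtained from the Pattern.
        Matcher matcher = pattern.matcher(input);
        return matcher.find();
    }

    // Hypothetical helper: true if the regular expression compiles without a syntax error.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // Unchecked exception thrown for malformed patterns such as "(dev".
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(containsDigits("QT3000"));         // true
        System.out.println(containsDigits("no digits here")); // false
        System.out.println(isValidRegex("(dev)"));            // true
        System.out.println(isValidRegex("(dev"));             // false
    }
}
```

Because PatternSyntaxException is unchecked, the try/catch is optional; it is mainly useful when the pattern comes from user input.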
Capturing Groups :
Capturing groups are a way to treat multiple characters as a single unit. For example, the regular expression (dev) creates a single group containing the letters "d", "e", and "v".
Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((X)(Y(Z))), for example, there are four such groups:
1. ((X)(Y(Z)))
2. (X)
3. (Y(Z))
4. (Z)
The 'groupCount' method of a Matcher object returns the number of capturing groups present in the matcher's pattern. There is also a special group, group 0, which always represents the entire expression. This group is not included in the total count returned by groupCount.
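The numbering rule and groupCount can be checked with a short sketch. It assumes X, Y, and Z are taken as literal letters, so the pattern matches the input "XYZ"; the class name `GroupCountDemo` and the helper `groups` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {

    // Hypothetical helper: returns the text of groups 0..groupCount after a
    // full match, or an empty array if the input does not match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        // groupCount() excludes group 0, so one extra slot is needed.
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups("((X)(Y(Z)))", "XYZ");
        System.out.println("groupCount = " + (g.length - 1)); // groupCount = 4
        for (int i = 0; i < g.length; i++) {
            System.out.println("group " + i + " = " + g[i]);
        }
        // group 0 = XYZ, group 1 = XYZ, group 2 = X, group 3 = YZ, group 4 = Z
    }
}
```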
Example :

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegexMatches {
        public static void main(String[] args) {
            // String to be scanned to find the pattern.
            String line = "This order was places for QT3000! OK?";
            String pattern = "(.*)(\\d+)(.*)";

            // Create a Pattern object.
            Pattern r = Pattern.compile(pattern);

            // Now create a Matcher object.
            Matcher m = r.matcher(line);
            if (m.find()) {
                System.out.println("Found value: " + m.group(0));
                System.out.println("Found value: " + m.group(1));
                System.out.println("Found value: " + m.group(2));
            } else {
                System.out.println("NO MATCH");
            }
        }
    }
Output :

    C:\Program Files\Java\jdk1.6.0_18\bin>javac RegexMatches.java
    C:\Program Files\Java\jdk1.6.0_18\bin>java RegexMatches
    Found value: This order was places for QT3000! OK?
    Found value: This order was places for QT300
    Found value: 0
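In the output above, group 1 ends in "QT300" and group 2 holds only "0". This happens because the first (.*) is greedy: it consumes as much input as it can and leaves \d+ just the single digit it needs. One possible way to capture the whole number, shown here as a sketch rather than as part of the original example, is to make the first quantifier reluctant with `.*?`. The class name `GreedyVsReluctant` and the helper `secondGroup` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {

    // Hypothetical helper: returns group 2 of the first match, or null if none.
    static String secondGroup(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String line = "This order was places for QT3000! OK?";

        // Greedy: the first group takes everything up to the last digit.
        System.out.println(secondGroup("(.*)(\\d+)(.*)", line));  // 0

        // Reluctant: the first group stops as soon as a digit can match.
        System.out.println(secondGroup("(.*?)(\\d+)(.*)", line)); // 3000
    }
}
```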
Regular Expression Syntax :
The following table lists the regular expression metacharacter syntax available in Java:
Subexpression | Matches |
---|---|
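^ | Matches the beginning of the input. In MULTILINE mode, (?m), it also matches the beginning of each line. |
$ | Matches the end of the input (or just before a final line terminator). In MULTILINE mode, (?m), it also matches the end of each line. |
. | Matches any single character except a line terminator. In DOTALL mode, (?s), it matches line terminators as well. |
[...] | Matches any single character in brackets. |
[^...] | Matches any single character not in brackets. |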
\A | Beginning of entire string |
\z | End of entire string |
\Z | End of entire string except allowable final line terminator. |
re* | Matches 0 or more occurrences of preceding expression. |
re+ | Matches 1 or more occurrences of preceding expression. |
re? | Matches 0 or 1 occurrence of preceding expression. |
re{n} | Matches exactly n occurrences of preceding expression. |
re{n,} | Matches n or more occurrences of preceding expression. |
re{n,m} | Matches at least n and at most m occurrences of preceding expression. |
a\|b | Matches either a or b. |
(re) | Groups regular expressions and remembers matched text. |
(?:re) | Groups regular expressions without remembering matched text. |
(?>re) | Matches independent pattern without backtracking. |
\w | Matches word characters. |
\W | Matches nonword characters. |
\s | Matches whitespace. Equivalent to [ \t\n\x0B\f\r]. |
\S | Matches nonwhitespace. |
\d | Matches digits. Equivalent to [0-9]. |
\D | Matches nondigits. |
\A | Matches beginning of string. |
\Z | Matches end of string. If a newline exists, it matches just before newline. |
\z | Matches end of string. |
\G | Matches point where last match finished. |
\n | Back-reference to capturing group number n, where n is a digit (for example, \1). |
\b | Matches word boundaries. |
\B | Matches nonword boundaries. |
\n, \t, etc. | Matches newlines, carriage returns, tabs, etc. |
\Q | Escape (quote) all characters up to \E |
\E | Ends quoting begun with \Q |
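A few of the constructs from the table can be tried out with the sketch below. The patterns and inputs are illustrative choices, and the class name `SyntaxDemo` and its helpers `found` and `groupCount` are hypothetical. Note that each backslash in a regular expression must be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

class SyntaxDemo {

    // Hypothetical helper: true if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Hypothetical helper: number of capturing groups in the regex.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // \b: "cat" must stand as a whole word.
        System.out.println(found("\\bcat\\b", "a cat sat"));    // true
        System.out.println(found("\\bcat\\b", "concatenate"));  // false

        // \1: back-reference to group 1, used here to find a repeated word.
        System.out.println(found("\\b(\\w+) \\1\\b", "it is is here")); // true
        System.out.println(found("\\b(\\w+) \\1\\b", "it is here"));    // false

        // \Q...\E: the '+' is treated as a literal character.
        System.out.println(found("\\Q1+1\\E", "1+1=2")); // true
        System.out.println(found("1+1", "1+1=2"));       // false

        // (?:re) groups without capturing, so only (c) is counted.
        System.out.println(groupCount("(?:ab)+(c)"));    // 1

        // '.' skips line terminators unless DOTALL, (?s), is enabled.
        System.out.println(found("a.b", "a\nb"));        // false
        System.out.println(found("(?s)a.b", "a\nb"));    // true
    }
}
```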