
In "Beginners RegEx - part 1" we saw some character classes (the [a-z] style) that match any character between a and z. We also saw that you can combine ranges to include more characters, using styles like [a-z0-9] to match every letter from a to z and every digit from 0 to 9. And if we wanted to allow the uppercase versions of the letters as well, we'd have to write something like [a-zA-Z0-9].
Today we'll look at some metacharacters of the form \char, which can make life easier when using character classes [ characters ].
Here is a short list of some character class metacharacters:
\t = Tab
\n = Newline
\r = Carriage return
\s = Any whitespace character, i.e. space, tab, newline, form feed etc.
\S = Any character which is NOT whitespace
\w = Word character, which is basically [a-zA-Z0-9_]
\W = Any character which is NOT a word character
\d = Digit, which is basically [0-9]
\D = Any character which is NOT a digit
So now we can construct some tighter regular expressions, which will eventually be easier to read once you grow accustomed to them.
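The shorthand classes listed above can be tried out with a small Python sketch. The sample string is made up for illustration, and note that in Python 3 \w and \d also match non-ASCII letters and digits unless the re.ASCII flag is used, so "basically [a-zA-Z0-9_]" holds for plain ASCII text like this one.

```python
import re

# Made-up sample text containing a space, a tab and a newline.
sample = "Order 66:\tshipped\n"

# \d matches one digit, so \d+ grabs runs of digits.
digits = re.findall(r"\d+", sample)

# \w matches letters, digits and underscore, so \w+ grabs whole "words".
words = re.findall(r"\w+", sample)

# \s matches any single whitespace character.
spaces = re.findall(r"\s", sample)

# \D is the opposite of \d; removing every \d leaves only the non-digits.
without_digits = re.sub(r"\d", "", sample)

print(digits)          # ['66']
print(words)           # ['Order', '66', 'shipped']
print(spaces)          # [' ', '\t', '\n']
print(repr(without_digits))
```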
So if we have a situation in an HTML file which looks like this:

Then we can construct a regular expression which matches the URL part of the HTML by writing the following regex:
http://[\w\s/.-]+
This will give us a result like:
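As a rough way to try the regex yourself, here is a minimal Python sketch. The HTML line and the path in it are hypothetical stand-ins made up for this example; only the host name w4nd0rn.blogspot.com and the pattern itself come from the post.

```python
import re

# Hypothetical HTML line, invented for illustration only.
html = '<a href="http://w4nd0rn.blogspot.com/2008/01/example-post.html">Example</a>'

# The regex from the post: "http://" followed by one or more characters
# that are word characters, whitespace, slashes, dots or hyphens.
url_pattern = re.compile(r"http://[\w\s/.-]+")

match = url_pattern.search(html)
url = match.group(0) if match else None

print(url)  # http://w4nd0rn.blogspot.com/2008/01/example-post.html
```

The match stops at the closing double quote, because " is not one of the characters allowed by the character class.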

Let's take the regex apart and explain what is going on.
One of the things you may wonder about, hopefully, is that inside the character class we wrote . (dot) to allow matching of the dots between the parts of the host name w4nd0rn.blogspot.com and the final . just before the file type (.html).
Inside a character class [ ] the . (dot) does not mean any character whatsoever. In a character class it means just what it is, i.e. a literal . (dot).
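The difference between the dot outside and inside a character class can be checked with a short Python sketch; the test string is made up for illustration.

```python
import re

text = "abc a.c a-c"

# Outside a character class, . matches any character except a newline,
# so all three candidates match.
outside = re.findall(r"a.c", text)

# Inside a character class, . is just a literal dot,
# so only "a.c" matches.
inside = re.findall(r"a[.]c", text)

print(outside)  # ['abc', 'a.c', 'a-c']
print(inside)   # ['a.c']
```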