
Topic: Regex work in win32.exe: "Problem with Henry Spencer's Posix windows Port?"

EdPellesC99

  • Guest
   Hi,

   I have been continuing to try to use Regex.h.
   I have run into a problem with a regular expression not working as it should:
   I am having trouble using the * in my regular expression.

It does not seem to work the way it should (neither does the dot). Without being able to use it, you are in trouble.

I have attached a simple RegEx tester file (it exceeds the 2000-character post limit, which seems to happen to me a lot).

It nicely demonstrates whether the hard-coded regex, applied to the edit control entry, gives you
a match or no match.

It shows the issue plainly.

I want to exclude the capital G from an edit control entry.

I want No Match if the G is in the first character position, and No Match if it is in any position after the first.
The proper regex should be "[^G]*" (of course, the caret inside the brackets means Not).
I hope anyone with any experience in regex will agree.

The asterisk not only does *not* repeat the first-character mask for each character after the first, it actually voids
the No Match on the first character (I get Match!).

Has anyone seen a problem using the asterisk or the dot in a regular expression?

EDIT:
==================
I want to point out that:
"[^G]" correctly excludes the capital G for the first character position.
"[^G]{3}" correctly excludes the capital G for the first three character positions. It requires three or more non-G characters to Match; after the third position, a G is accepted.
I modified the attached file ever so slightly from the first one I attached, and attached the updated version instead.
==================

Is this a problem in the Henry Spencer Posix port for windows?

Can anyone show me a regex that will achieve my simple goal?
If I cannot solve this, I cannot do much more work with Regex.h

Thanks, Ed




« Last Edit: March 06, 2011, 04:14:21 PM by EdPellesC99 »

EdPellesC99

  • Guest
Here is a small list of what I know:

//static char regex1[]="^[^G]";  //.........................this wants no G in first character position to Match
//static char regex1[]="^[^G]*"; //........................this lets G be anywhere
static char regex1[]="[[^G]]*"; //..........................as above, allows G anywhere

//static char regex1[]="^[^G]{3}"; //.....................this wants 3 or more non-Gs at string start to Match
//static char regex1[]="^[G]{3}";  //.....................this wants 3 or more Gs at string start to Match

//static char regex1[]="^[^G][[:alnum:]]"; //......................This wants No G first char, and only an alphanumeric in the second position
//static char regex1[]="^[^G][[:alnum:]]*"; //.....This wants No G first char, allows anything on second char and more
 

I will add to this when I learn more, but I am about out of things to try; everything I read says "^[^G]*" should work. Double bracketing does not help here (even static char regex1[]="[[^G]]"; does not work properly).

Here, at http://www.regular-expressions.info/posixbrackets.html, I found:


Quote
POSIX bracket expressions can be negated. [^x-z[:digit:]] matches a single character that is not x, y, z or a digit.

When I use:
static char regex1[]="[^A-z[:digit:]]";

On the first character entered, I get the correct results: I return 0 if a capital letter is entered, and 1 if a lowercase letter or digit is entered.

But if you enter Hj, it returns 0 too. If the asterisk cannot be used, how do I apply this regex token to every character entered in the edit control?

Eventually I want to be able to say "no punctuation marks" for every character entered, no matter how many.



Is there a chance that I should be using a library, like "rxspencer.lib"? I know it does exist.
It is here:
http://gnuwin32.sourceforge.net/packages/rxspencer.htm


EDIT: OK, I tried the library; it does nothing. I even substituted Spencer's Regex.h along with the library: no go, since the Regex.h by Pelle is modified and necessary.

--Ed
« Last Edit: March 05, 2011, 05:32:21 PM by EdPellesC99 »

EdPellesC99

  • Guest
   At this point, until someone can suggest something or tell me where I have gone wrong, I am stuck.

All my input so far was meant to prove that I have no way of using the * or . as I should be able to.

In my actual situation I would like to validate the user input of an absolute file path.

Here is the regex that seems to be as far as I can go to do this:
static char regex1[]="^[C-M]:\\\\[[:alnum:]]";  
// This requires a drive, then a colon, then a backslash, then an alphanumeric as the 4th char.
Once that requirement is fulfilled, any extra input is OK'd.

Ideally I would stop the user from using any punctuation or multiple backslashes in the folder path AFTER the fourth character, but my hands are tied if the ability to use the * or the dot properly is not there.

EDIT:
If I knew how many characters the user would input, then for e.g. 10 total characters I could use:
static char regex1[]="^[C-M]:\\\\[[:alnum:]][[:alnum:]][[:alnum:]][[:alnum:]][[:alnum:]][[:alnum:]][[:alnum:]]";  
This would work, but would require 10 satisfactory characters, or ... No Match.

The above line can be written (with the same results):
static char regex1[]="^[C-M]:\\\\[[:alnum:]]{1,10}";


I "could" count the characters then write the regex dynamically to construct the above, but I should not have to do this !!!


I hope someone will agree that something is not right (either with the compiler's regex behavior, or with something I am doing)!

I notice that the "LCC-Win32 C Library" has Perl regular expressions as a possibility...

I will stop my whining now... I really believe I have gone as far as I can go to figure this out.

Thanks, Ed
« Last Edit: March 06, 2011, 11:52:01 PM by EdPellesC99 »

CommonTater

  • Guest
Quote
In my actual situation I would like to validate the user input of an absolute file path.

Oh boy... I'm thinking you've gotten off on a bit of a tangent here, Ed...

If your user is giving you: C:\users\fredsmith\documents\lastyear\letter001.doc  (for example)

The fastest way to validate the file is to try to open it... If fopen() or CreateFile() fails, the file isn't there.

Alternatively you can call FindFirstFile() with the full path as the search criterion.  If that succeeds you also get the file dates and times, size, mode and attributes... if it fails, it's not there.

TimoVJL

  • Global Moderator
  • Member
GetFileAttributes()/GetFileAttributesEx() are usable too.

EdPellesC99

  • Guest
   I am using the goal of validating a Folder Path to explore the use of Regex in PellesC programming.

   It is very popular in text editor use, but I think its main value would be in programming.

It is not a simple subject: there is a lot of terminology, it is used in so many ways, and it has evolved. In one place it is simple. POSIX seems fairly old; Spencer seems to have done his work from Toronto in about 1997.
The only way I can explore what I can do with the Regex possibilities in the PellesC compiler is to try it.

My next need may be to validate other types of user input info. Perhaps data in a format I have defined.

When I try to understand something, I loosely have a need, and I use it as an excuse to explore a tool / technique.

Quote
The fastest way to validate the file is to try to open it... If fopen() or CreateFile() fails, the file isn't there.

From the start of my first thread, I was (in my mind) talking about a folder path: one that will be created automatically by third-party software if it does not exist, so it has to be legitimate (the software does crazy things if the folder path is illegal). When I started to see problems with the *, I backed up to a simple example to show that I could not get it to work properly even on a single letter.

This IS the sort of thing that regex is used for in all programming languages: verifying user input.

I am not very interested in using regular expressions on files (grep, awk, egrep and a bunch more). I am interested in using it in the most powerful way that I can in C and Windows programming.

This limits the possibilities; I would like to find out how limited it is. Not being able to use the repeat operator (* or .) the way I think it is used would be a serious limit.

I am hoping someone who has used regex in PellesC, and has explored it for the tool potential that it has, can advise me.

If I had wanted a workaround, from the beginning I would have taken the user input (before my program calls the third-party software), tried to create the directory, and then tried to set it as the current directory; upon failure, I would have known the folder path was not legitimate, as the folder does not exist. Or perhaps I would have used GetFileAttributes on the directory, as Timo suggests. These other methods would have been a whole lot easier than messing with regex, but I would not have learned anything new.

Instead, I used this as an excuse to learn how to use the regex possibilities that PellesC gives me. I came a long way, and I was hoping I could go further... Even finding out that it cannot be done would achieve my goal, and I would then start looking at options like the "LCC-Win32 C Library"; but if that were not necessary, it would be going off on a tangent.

If after a couple of weeks no one has replied to confirm things or correct me in some way, I will start to look in that direction.

Thanks, Ed
« Last Edit: March 06, 2011, 11:48:16 PM by EdPellesC99 »

CommonTater

  • Guest
Ahhh... so a deliberate tangent then. 
Good stuff...

EdPellesC99

  • Guest

I did it !!!!!!
I can't believe it !

Quote

There may be a simpler way, but it would not be much simpler.
You have to count the characters and write the regex dynamically. That was easy.
Getting the dog-gone Regex to be as Ultimate as you can was mind-numbing!


Can't Continue. Must take adequate time to pinch myself (hours or a day).... this is too good to be true !
later, Ed
« Last Edit: March 09, 2011, 05:42:55 PM by EdPellesC99 »