Sunday, 28 June 2015

Regex to remove punctuation from tokenized text

I'm trying to remove punctuation from tokenized text using regular expressions. Can anyone explain the following behaviour:

$ STRING='hey , you ! what " are you doing ? say ... ," what '
$ echo $STRING | sed -r 's/ [^[:alnum:][:space:]-]+ / /g;'
hey you what are you doing say ," what
$ echo $STRING | sed -r 's/ [[:punct:]]+ / /g;'
hey you what are you doing say ," what
$ echo $STRING | perl -pe 's/ [^[:alnum:][:space:]-]+ / /g;'
hey you what are you doing say ," what
$ echo $STRING | perl -pe 's/ [[:punct:]]+ / /g;'
hey you what are you doing say ," what

The ," token is preserved in the output, which I don't want. It's possible to match this token with:

$ echo $STRING | perl -pe 's/ [",]+ / /g;'
hey you ! what are you doing ? say ... what
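One possible explanation, offered as a sketch rather than a definitive answer: each match consumes its trailing space, so the space in front of the ," token has already been used up by the preceding ... match and no new match can start there. A lookahead leaves the trailing space unconsumed. The Python version below assumes ASCII punctuation from string.punctuation; the helper name strip_punct_tokens is made up for illustration.

```python
import re
import string

# Character class of ASCII punctuation (an assumption; [[:punct:]] in sed/perl is similar).
PUNCT = "[" + re.escape(string.punctuation) + "]+"

def strip_punct_tokens(text):
    # Match a space plus a punctuation-only token, but only peek at the
    # following space so that the next token can still match against it.
    return re.sub(" " + PUNCT + "(?= )", "", text)

s = 'hey , you ! what " are you doing ? say ... ," what '
print(strip_punct_tokens(s))
```

The same lookahead idea should carry over to perl, e.g. s/ [[:punct:]]+(?= )//g, since perl supports (?= ).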

URL rewrite not found

My URL looks like this:

http://ift.tt/1TWVaiZ

and I want to rewrite URLs like this:

http://ift.tt/1GTUlAH

I thought this was the way

RewriteEngine On
RewriteBase /validar/
RewriteRule ^/([^/]*)/([^/]*)/$/api.php?desde=$1&que=$2 [L]

But I'm getting a 404 and can't make it work.

Any ideas? Thanks

PHP: filter specific pattern out of string

My raw output of socket_recvfrom is:

ID IP PING IDENTIFIERNUMBER USERNAME


0 127.0.0.1:1234 0 ID123456789 Moritz

1 127.0.0.1:1234 46 ID123456789 August Jones

2 127.0.0.1:1234 46 ID123456789 Miller


It is a single string that contains all of this information at once, with only whitespace between the fields. Any field can be longer or shorter.

My problem:

When I run preg_split("/\s+/") on it, I get a good array with usable data, but when the username contains spaces it creates an extra index for each part of the name. All the data that comes after it is then shifted into the wrong fields.

I sort the array like this: ID, USERNAME, PING, IDENTIFIERNUMBER, IP

Example of the sorted output when a username contains one space:


ID: 0, USERNAME: Moritz, PING: 0, IDENTIFIERNUMBER: ID123456789, IP: 127.0.0.1:1234

ID: 1, USERNAME: August, PING: Jones, IDENTIFIERNUMBER: 46, IP: ID123456789

ID: 127.0.0.1:1234, USERNAME: 2, PING: Miller, IDENTIFIERNUMBER: 46, IP: ID123456789


How do I get the information correctly out of the string?

Just forgot to say:

The string begins with a run of dashes (---------------------------------) of variable length, so it can be 10 characters or 12, for example. The string ends with:

 (8 users in total)

The regex method looks good. I only need to filter out the other characters.

--------------------------------- 0 127.0.0.1:1234 0 ID123456789(OK) Moritz 1 127.0.0.1:1234 46 ID123456789(OK) August Jones 2 127.0.0.1:1234 46 ID123456789(OK) Miller (7 users in total)

Last problem: http://ift.tt/1eRW0xy
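A minimal sketch of one way to handle usernames with spaces, assuming every row has the shape "ID IP:PORT PING IDENTIFIER NAME" and that a name ends where the next row or the "(N users in total)" footer begins. It is written in Python rather than PHP; the pattern itself should be usable with preg_match_all. The names ROW and parse_users are illustrative only.

```python
import re

ROW = re.compile(
    r"(\d+)\s+"                      # ID
    r"(\d+\.\d+\.\d+\.\d+:\d+)\s+"   # IP:port
    r"(\d+)\s+"                      # ping
    r"(ID\d+)(?:\(OK\))?\s+"         # identifier; an optional (OK) suffix is dropped
    r"(.+?)"                         # username, may contain spaces
    # Stop the username at the next row, at the footer, or at end of input.
    r"(?=\s+\d+\s+\d+\.\d+\.\d+\.\d+:\d+|\s+\(\d+ users in total\)|\s*$)"
)

FIELDS = ("id", "ip", "ping", "identifier", "username")

def parse_users(raw):
    return [dict(zip(FIELDS, row)) for row in ROW.findall(raw)]

raw = ("--------------------------------- 0 127.0.0.1:1234 0 ID123456789(OK) Moritz "
       "1 127.0.0.1:1234 46 ID123456789(OK) August Jones "
       "2 127.0.0.1:1234 46 ID123456789(OK) Miller (7 users in total)")
for user in parse_users(raw):
    print(user)
```

A username that itself looks like the start of a row (a number followed by an IP:port) would still confuse this pattern.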

Regex PHP: Get specific content from a block of code from another website

I have a site from which I want to get specific content from 7 posts. All seven posts have the same HTML layout (see below):

<div class="eventInfo">
<h3>Z's(矢沢永吉)</h3>
  <h4>Z's TOUR 2015</h4>

<dl>
    <dt><img src="/event/img/btn_day.png" alt="公演日時" width="92" height="20"> </dt>
    <dd>
      <table width="99%" border="0" cellpadding="0" cellspacing="0">
        <tbody><tr>
      <td width="9%" nowrap="nowrap">2015年6月</td>
      <td width="74%">4日 (木) 19:00開演</td>
    </tr>

  </tbody></table>
</dd>
<dt><img src="/event/img/btn_price.png" alt="料金" width="92" height="20"> </dt>
<dd>S¥10,500 A¥7,500 (全席指定・消費税込)<br><span class="attention">※</span>注意事項の詳細を<a href="http://ift.tt/1HpdVrL" target="_blank">矢沢永吉公式サイト</a>より必ずご確認ください</dd>

<dt><img src="/event/img/btn_ticket.png" alt="一般発売" width="92" height="20"> </dt>
<dd>
 <table width="99%" border="0" cellpadding="0" cellspacing="0">
  <tbody><tr>
    <td width="9%" nowrap="nowrap">2015年5月</td>
    <td width="74%">16日(土)</td>
  </tr>
</tbody></table>
  </dd>

  <dt><img src="/event/img/btn_contact.png" alt="お問合わせ" width="92" height="20"> </dt>
  <dd><a href="http://www.siteurl.com/" target="_blank">ソーゴー大阪</a> 06-6344-3326</dd>

  <dt><img src="/event/img/btn_info.png" alt="公演詳細" width="92" height="20"> </dt>
  <dd><a href="http://ift.tt/1GF5ySz" target="_blank">http://ift.tt/1HpdVrN</a></dd>
</dl>
</div>

I just want to fetch the H3 from this layout and the first table in the code. What regex method should I use to get the desired results?

These are 7 posts just like the code above, and I have to get the H3 and the first table from each of them.

I have tested this, but I'm not sure whether it is the correct way: http://ift.tt/1GF5z8P

But as you can see, I have to include unwanted data too, like the H4, DT and IMG :(

This regex doesn't work in c++

It is supposed to match "abababab" since "ab" is repeated more than two times consecutively, but the code isn't printing any output. Is there some other trick to using regex in C++?

I tried with other languages and it works just fine.

#include<bits/stdc++.h>

int main(){

  std::string s ("xaxababababaxax");
  std::smatch m;
  std::regex e ("(.+)\1\1+");   

   while (std::regex_search (s,m,e)) {
    for (auto x:m) std::cout << x << " ";
    std::cout << std::endl;
    s = m.suffix().str();
  }

  return 0;
}
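A likely cause, stated as an assumption since the question does not confirm it: inside an ordinary C++ string literal, "\1" is an octal escape for the character with code 1, so the regex engine never sees a backreference. The usual remedy would be "(.+)\\1\\1+" or a raw literal such as R"((.+)\1\1+)". Python string literals behave the same way, which the sketch below uses to show the difference.

```python
import re

s = "xaxababababaxax"

# Without a raw string, "\1" is the control character U+0001, not a backreference.
broken = re.search("(.+)\1\1+", s)

# With a raw string the regex engine receives the backreference \1.
fixed = re.search(r"(.+)\1\1+", s)

print(broken)
print(fixed.group(0), fixed.group(1))
```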

Redirect Loops and .htaccess

I just moved from a CentOS dedi to an Ubuntu VPS. The site is custom coded PHP.

The frontend works fine (including rewrite rules), but I can't get the rewrite rules to work for the admin backend.

First error:

H00124: Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary. Use 'LogLevel debug' to get a backtrace.

Then after using debug level:

AH00122: redirected from r->uri = /admin/index.php

The relevant bits of my htaccess are:

# mod_rewrite set:

Options +Includes

RewriteEngine on

# Administration
RewriteCond %{REQUEST_URI} ^(/+)admin/(.*)$
RewriteRule (.*) %{DOCUMENT_ROOT}/admin/index.php [L,QSA]

# Rewrite other
RewriteCond %{REQUEST_URI} !^(/+)index.php(.*)$
RewriteCond %{REQUEST_URI} !^(/+)syscmd.php$
RewriteRule ^(.*)$ %{DOCUMENT_ROOT}/index.php?page=$1 [L,QSA]

# If Rewriting Failure, Show error message (Internal backup)
RewriteCond %{REQUEST_URI} !^(/+)index.php$
RewriteCond %{REQUEST_URI} !^(/+)syscmd.php$
RewriteRule (.*) \1 [F]

This was working fine on CentOS too.

Any ideas? I already tried adding the following as the first condition:

RewriteCond %{REQUEST_URI} !/admin/ [NC]

That stopped it rewriting /admin completely.

Thanks

Grouping Output Pattern in Regex- Python

I would like to get only the sentences that match the regex pattern as output and avoid these None values. How do I group the output so that only the lines matching the pattern are shown?

import re

regex = re.compile('(.*)(?:India)')
with open("D:/txt_res/abc.txt") as f:
    for line in f:
        result = regex.search(line)
        print(result)

The output I'm getting is:

None

None

None

<_sre.SRE_Match object; span=(0, 101), match='Email: abc.bitz@gmail.com >

None

None

None

<_sre.SRE_Match object; span=(0, 47), match='XYZ Engineer at ABC Organization, India'>

None

None

<_sre.SRE_Match object; span=(0, 32), match='Intern at S360, India'>

None

None
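A small sketch of one way to skip the None results: search() returns None when a line does not match, so the result can be tested before printing. The sample lines and the helper name matching_lines are invented for illustration; the pattern is the one from the question.

```python
import re

regex = re.compile(r"(.*)(?:India)")

def matching_lines(lines):
    # A failed search returns None, which is falsy, so it is filtered out here.
    return [m.group(0) for m in map(regex.search, lines) if m]

sample = [
    "Email: abc@example.com",
    "XYZ Engineer at ABC Organization, India",
    "Intern at S360, India",
]
for text in matching_lines(sample):
    print(text)
```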

Using replace() replaces too much content

I'm using replace() to transform t into g when t is not followed by the letter p. I'm using this line of code:

"tpto".replace(/(t)[^p]/g, "g");

However, the result of this function is tpg and I was expecting tpgo. Since I don't know which letter will follow the t, I need something dynamic, but I don't know what to do. Any ideas?
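The character class [^p] consumes the letter after the t, which is why the o disappears. A negative lookahead checks the next character without consuming it. The sketch below shows the idea in Python; JavaScript supports the same (?!...) syntax, so /t(?!p)/g should behave the same way. The function name t_to_g is illustrative.

```python
import re

def t_to_g(text):
    # (?!p) asserts that the next character is not "p" but does not consume it.
    return re.sub(r"t(?!p)", "g", text)

print(t_to_g("tpto"))
```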

Java String Replace Using Regular Expression

Original String: Flexible Premium Deferred Annuity (Policy #0410011)

Expected String : Flexible Premium Deferred Annuity

I would appreciate it if someone could provide Java code to accomplish this.

Thanks.
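A hedged sketch of the pattern, assuming the suffix always has the form "(Policy #digits)" at the end of the string. It is shown in Python; the same pattern should work with Java's String.replaceAll once the backslashes are doubled inside the Java string literal. The function name strip_policy is made up.

```python
import re

def strip_policy(text):
    # Remove optional whitespace, the parenthesised policy number, and trailing whitespace.
    return re.sub(r"\s*\(Policy #\d+\)\s*$", "", text)

print(strip_policy("Flexible Premium Deferred Annuity (Policy #0410011)"))
```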

Download site with selected CGI input

So I have this script:
http://ift.tt/1JrcpWF
How do I download, for example, the file for input 1880 (http://ift.tt/1SUjWig) from this site?
They are all CGI files, and when I enter http://ift.tt/Y9onHf it gives me the output for 2013. But I checked the source of the pages and there are inputs for the year and the number of entries. So how do I select the entry I need using Python?

Extracting using a string pattern in Regex- Python

Can't we give a literal string in the regex? For example, re.compile('((.*)?=<Bangalore>)'). In the code below I have mentioned <Bangalore>, but it's not displaying.

I want to extract the text before Bangalore.

import re

regex = re.compile('((.*)?=<>)')

line = ("Kathick Kumar, Bangalore who was a great person and lived from 29th March 1980 - 21 Dec 2014")

result = regex.search(line)

print(result)

Desired output: Kathick Kumar, Bangalore
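A literal word can be placed directly in the pattern; the angle brackets are not needed. The sketch below assumes the goal is everything up to and including the first occurrence of the word. The helper name up_to is illustrative, and re.escape is used in case the word contains regex metacharacters.

```python
import re

line = ("Kathick Kumar, Bangalore who was a great person and lived from "
        "29th March 1980 - 21 Dec 2014")

def up_to(word, text):
    # Lazy .*? stops at the first occurrence of the word.
    m = re.search(r".*?" + re.escape(word), text)
    return m.group(0) if m else None

print(up_to("Bangalore", line))
```

To get only the text before the word, a lookahead such as r".*?(?=Bangalore)" leaves the word out of the match.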

How to censor website links?

I've been working on a regex censor for quite some time and can't seem to find a decent way of censoring address links (and attempts to circumvent that).

Here's what I got so far, ignoring escape sequences:

([a-zA-Z0-9_-]+[\\W[_]]*)+(\\.|[\\W]?|dot|\\(\\.\\)|[\\(]?dot[\\)]?)+([\\w]{2,6})((\\.|[\\W]?|dot|\\(\\.\\)|[\\(]?dot[\\)]?)([\\w]{1,4}))*

I'm not sure what might be causing the problem, but it censors the words "com" and "come" and pretty much anything that is 3 or more letters long.

Problem: I want to know how to censor website links and invalid links that are attempts to circumvent the censor. Examples:

Google.com

goo gle .com

g o o g l e . c o m

go o gl e % com

go og le (.) c om

Also, as a slight addition, is there a way to add links to a whitelist for this? Thank you.

regex encapsulation

I've got a question concerning regex.

I was wondering how one could replace an encapsulated text, something like {key:23}, with something like <span class="highlightable">23</span>, so that the entity still remains encapsulated, but by something else.

I will do this in JS, but the regex is what is important. I have been searching for a while, probably with the wrong terms; I should probably learn more about regex in general.

In any case, is there someone who knows how to perform this operation with simplicity?

Thanks!
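A minimal sketch using a capture group, assuming the key is literally "key" and the value is a run of digits. It is written in Python; in JavaScript the equivalent would be along the lines of text.replace(/\{key:(\d+)\}/g, '<span class="highlightable">$1</span>'). The function name highlight is illustrative.

```python
import re

def highlight(text):
    # \1 in the replacement refers to the digits captured inside the braces.
    return re.sub(r"\{key:(\d+)\}", r'<span class="highlightable">\1</span>', text)

print(highlight("value is {key:23} here"))
```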

Saturday, 27 June 2015

Converting perl snippet to java

I want to rewrite some Perl code in Java:

sub validate_and_fix_regex {
    my $regex = $_[0];
    eval { qr/$regex/ };
    if ($@) {
        $regex = rquote($regex);
    }
    return $regex;
}

sub rquote {
    my $string = $_[0] || return;
    $string =~ s/([^A-Za-z_0-9 "'\\])/\\$1/g;
    return $string;
}

The code takes a regex and, if it does not compile, fixes it by escaping its special characters. I can't find any alternative for eval { qr/$regex/ }; and $string =~ s/([^A-Za-z_0-9 "'\\])/\\$1/g; in Java.

Regex Look Ahead

Today, for a project, I was trying to make use of regular expressions and learnt about groups and how to use them. I am using this site to test it. The problem is that whenever I write the following regex:

(?=\S*\d)

the site gives me an error: "the expression can match 0 characters and therefore can match infinitely",

while this doesn't throw any error:

(?=\S*\d)(\S{6,16})

Can anyone explain to me what the error means?
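A lookahead on its own consumes no characters, so every successful match is empty; some testers warn about that because a naive global search would never advance. The sketch below, with invented sample strings, shows the empty matches in Python and how adding a consuming part changes the result.

```python
import re

# Lookahead only: each match is zero-width, one per position where a digit lies ahead.
lookahead_only = [m.span() for m in re.finditer(r"(?=\S*\d)", "ab1")]

# Lookahead plus a consuming group: the match now has real length.
with_body = [m.group(0) for m in re.finditer(r"(?=\S*\d)(\S{6,16})", "passw0rd short")]

print(lookahead_only)
print(with_body)
```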

PHP regex strip comma and space from beginning and end of string

I have some strings like this

", One "
", One , Two"
"One, Two "
" One,Two, "
" ,Two ,Three "

and I need to remove spaces and/or commas at the beginning and end of the string only. I tried a few regexes with preg_replace(), but they replace all occurrences.

EDIT: Actually would be great to remove all clutter like !@#$%^&*( etc whatever is at the end and beginning of string, but not in between.




Optionally, I need to make the strings look proper by placing a word, then a comma, then a space, then another word (if there's a comma in between words).

Example: "One,Two ,Three , Four" into "One, Two, Three, Four".

P.S. Please provide the answer as two separate regexes, as it's easier to understand.
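A sketch of the two separate patterns, shown in Python; both patterns should be usable unchanged with preg_replace(). The function names are illustrative. For the EDIT about arbitrary clutter, swapping the class [\s,] for \W would be one option, assuming underscores and letters should be kept.

```python
import re

def trim_edges(text):
    # Anchored alternatives: clutter at the very start, or at the very end.
    return re.sub(r"^[\s,]+|[\s,]+$", "", text)

def normalize_commas(text):
    # Any spacing around a comma becomes exactly ", ".
    return re.sub(r"\s*,\s*", ", ", text)

for raw in [", One ", ", One , Two", "One, Two ", " One,Two, ", " ,Two ,Three "]:
    print(repr(normalize_commas(trim_edges(raw))))
```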

In C#, how can I get the start/end indexes of all the replacements by the Regex.Replace() function

I have made a program to highlight the phrases in the input matched by a given Regex expression on the fly.

However, I want to highlight the replacements in the output panel too. To do this, I need to obtain the indexes and lengths found by Regex.Replace(). Unfortunately, it would seem C# doesn't give access to this data. Have I missed something?

I've thought about manually trying to figure out the indexes by accumulating sums given off from the MatchCollection produced by Regex.Matches(). But this is prone to error, and may not take into account the special $ symbol in the replace expression which could throw the figures off.

There must be a more elegant way.
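One approach, sketched here in Python under the assumption that the replacement is built match by match: expand the replacement template for each match and record where the expanded text lands in the output. In .NET the analogous pieces would probably be the MatchEvaluator overload of Regex.Replace together with Match.Result. The function name replace_with_spans is made up.

```python
import re

def replace_with_spans(pattern, template, text):
    """Return the replaced text plus (start, length) of each replacement in it."""
    pieces, spans = [], []
    pos = 0        # position in the input
    out_len = 0    # length of the output built so far
    for m in re.finditer(pattern, text):
        unchanged = text[pos:m.start()]
        pieces.append(unchanged)
        out_len += len(unchanged)
        replacement = m.expand(template)  # resolves \1, \g<0>, etc.
        spans.append((out_len, len(replacement)))
        pieces.append(replacement)
        out_len += len(replacement)
        pos = m.end()
    pieces.append(text[pos:])
    return "".join(pieces), spans

print(replace_with_spans(r"\d+", r"<\g<0>>", "a1b22c"))
```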

Regex Expression replacing spaces in both end of the word

http://ift.tt/1ICDGzE

/(^|\s):bin:(\s|$)/gm

It is unable to scan and replace the one in the middle. How can I fix that without calling replace() twice?
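The middle occurrence is probably skipped because the first match consumes the space that the second one needs in front of it. Lookarounds check the neighbours without consuming them. The sketch below is in Python; in JavaScript, a lookahead for the trailing side, as in /(^|\s):bin:(?=\s|$)/gm with $1 kept in the replacement, would be one way to get the same effect. The name replace_bin and the replacement text are illustrative.

```python
import re

def replace_bin(text, replacement="[bin]"):
    # (?<!\S) = start of string or whitespace before; (?!\S) = end or whitespace after.
    return re.sub(r"(?<!\S):bin:(?!\S)", replacement, text)

print(replace_bin(":bin: :bin: :bin:"))
```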

variable expansion as a pattern in sed not working

I've a simple script to set several parameters in /etc/ssh/sshd_config :

#! /bin/bash

declare -a param=('Banner' 'ClientAliveInterval' 'ClientAliveCountMax' 'Ciphers' \
'PermitUserEnvironment' 'PermitEmptyPasswords' 'PermitRootLogin' \
'HostbasedAuthentication' 'IgnoreRhosts' 'MaxAuthTries' \
'X11Forwarding' 'LogLevel'\
)

declare -a val=('/etc/issue.net' '300' '0' 'aes128-ctr,aes192-ctr,aes256-ctr' \
'no' 'no' 'no' 'no' 'yes' '4' 'no' 'INFO' \
)

for (( i=0;i<12;i++ ))
do
 #echo "${param[$i]} ${val[$i]}"
  egrep "^[ #]*${param[$i]}.*" /etc/ssh/sshd_config &> /dev/null
   if [ $? -eq 0 ];
    then
       sed -i "s|^[ #]*\$param[$i].*|${param[$i]} ${val[$i]}|1" /etc/ssh/sshd_config
  else
       echo "${param[$i]} ${val[$i]}" >> /etc/ssh/sshd_config
  fi
done;

However the variable expansion in sed pattern match is not working as desired:

sed -i "s|^[ #]*\$param[$i].*|${param[$i]} ${val[$i]}|1" /etc/ssh/sshd_config

Can someone help me. My array expansion and everything in the script is fine though. I've checked the same with an echo printout.
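A probable cause, offered as an assumption: in the sed expression the parameter is written as \$param[$i], so the dollar sign is escaped and bash does not expand the array element; ${param[$i]} would be the expanded form. The update-or-append logic itself can be sketched in Python as below, operating on a string instead of /etc/ssh/sshd_config. The function name set_option and the sample config are invented.

```python
import re

def set_option(config, key, value):
    """Replace the first (possibly commented-out) line for key, or append one."""
    pattern = re.compile(r"^[ #]*" + re.escape(key) + r"\b.*$", re.MULTILINE)
    line = f"{key} {value}"
    if pattern.search(config):
        # A function replacement avoids backslash interpretation in the new line.
        return pattern.sub(lambda m: line, config, count=1)
    prefix = config.rstrip("\n") + "\n" if config else ""
    return prefix + line + "\n"

cfg = "#PermitRootLogin yes\nX11Forwarding yes\n"
cfg = set_option(cfg, "PermitRootLogin", "no")
cfg = set_option(cfg, "MaxAuthTries", "4")
print(cfg)
```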

Regex Expressions For Emoji

http://ift.tt/1LneX8A

function custom() {
var str = document.getElementById('original').innerHTML;
var replacement = str.replace(/\B:poop:\B/g,'REPLACED');
document.getElementById('replaced').innerHTML = replacement;
}
custom()

Yes = :poop: should be replaced with "REPLACED" No = :poop: should not be replaced. In other words, remain untouched.

Numbers 4, 5 and 6 don't seem to follow the rule provided. I do know why, but I don't have much idea how to combine multiple expressions into one. I have tried many others but I just can't get them to work the way I want them to. Odds aren't in my favor.

And yes, this is very similar to how Facebook emoji in chat box works.

New issue:

http://ift.tt/1ICDGzE

/(^|\s):bin:(\s|$)/gm

It is unable to scan and replace the one in the middle. How can I fix that?

Python regex: Matching a URL

I have some confusion regarding the pattern matching in the following expression. I tried to look up online but couldn't find an understandable solution:

imgurUrlPattern = re.compile(r'(http://i.imgur.com/(.*))(\?.*)?')

What exactly are the parentheses doing ? I understood up until the first asterisk , but I can't figure out what is happening after that.
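Each pair of parentheses is a capture group, numbered by its opening parenthesis. The sketch below prints the groups for a made-up URL. It also shows that, because (.*) is greedy, the optional third group never captures anything with the original pattern; the stricter variant is an assumption about what was intended, not the author's pattern.

```python
import re

url = "http://i.imgur.com/abc.jpg?1"  # invented example

original = re.compile(r"(http://i.imgur.com/(.*))(\?.*)?")
# Group 1: whole URL, group 2: part after the host, group 3: query string.
original_groups = original.search(url).groups()

# Variant: stop group 2 at the first "?" so that group 3 can capture the query.
stricter = re.compile(r"(http://i\.imgur\.com/([^?]*))(\?.*)?")
stricter_groups = stricter.search(url).groups()

print(original_groups)
print(stricter_groups)
```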

Code from Thinking in Java: I do not understand the output. Please help

I am trying to learn Java regex and came across some code that I have been stuck on for a long while. Please explain how this code works. The command-line arguments are:

abcabcabcdefabc "abc+" "(abc)+" "(abc){2,}"

import java.util.regex.*;

public class PatternMatcher {

public static void main(String[] args) {    
    if(args.length < 2) {     
        System.out.println("Usage:\njava TestRegularExpression " +       
        "characterSequence regularExpression+");       
        System.exit(0);     
        }   

        System.out.println("Input: \"" + args[0] + "\"");    
        for(String arg : args) {      
        System.out.println("Regular expression: \"" + arg + "\"");      
        Pattern p = Pattern.compile(arg);      
        Matcher m = p.matcher(args[0]);       

        while(m.find()) { 
            System.out.println("Match \"" + m.group() + "\" at positions " +        
        m.start() + "-" + (m.end() - 1));      
            }   
        }  
    } 
} 

Output:

Input: "abcabcabcdefabc"
Regular expression: "abcabcabcdefabc"
Match "abcabcabcdefabc" at positions 0-14
Regular expression: "abc+"
Match "abc" at positions 0-2
Match "abc" at positions 3-5
Match "abc" at positions 6-8
Match "abc" at positions 12-14
Regular expression: "(abc)+"
Match "abcabcabc" at positions 0-8
Match "abc" at positions 12-14
Regular expression: "(abc){2,}"
Match "abcabcabc" at positions 0-8

python regex use capture group to define another groups length { }

I am parsing hex data with python regex. I have the following packet structure:

'\xaa\x01\xFF\x44'

  • \xaa - start of packet
  • \x01 - data length [value can vary from 00-FF]
  • \xFF - data
  • \x44 - end of packet

I want to use a Python regex to indicate how much of the data portion of the packet to match, like this:

r = re.compile('\xaa(?P<length>[\x00-\xFF]{1})(.*){?P<length>}\x44')

This compiles without errors, but it doesn't work (I suspect because it cannot convert the hex value to an appropriate integer). Is there a method by which this can be accomplished in Python?

Background: I have been using erlang for packet unpacking and I was looking for something similar in python
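As far as I know, Python's re module cannot use a captured byte as a repetition count, so a length-prefixed packet is usually unpacked by hand. The sketch below assumes the layout described above (0xAA, one length byte, data, 0x44) and uses a hypothetical helper name parse_packet.

```python
def parse_packet(buf):
    """Return the data bytes of one packet at the start of buf, or None."""
    if len(buf) < 3 or buf[0] != 0xAA:
        return None
    length = buf[1]              # data length, 0x00-0xFF
    end = 2 + length             # index of the expected end-of-packet byte
    if len(buf) <= end or buf[end] != 0x44:
        return None
    return bytes(buf[2:end])

print(parse_packet(b"\xaa\x01\xff\x44"))
```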

Using arrays in regular expressions?

Does anyone know if there is a way to use an array in a regular expression? Suppose I want to find out if somefile.txt contains one of an array's elements. Obviously the code below doesn't work, but is there something similar that does work?

array = [thing1 thing2 thing3]
file = File.open("somefile.txt")

file.each_do |line|
if /array/.match(line)
puts line
end

Basically I've got a big list of words to search for, and I'd like to avoid something like this:

($somefile =~ /(thing1|thing2|thing3)/)
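The usual trick is to build the alternation from the list at run time. In Ruby, Regexp.union(array) does this directly; the sketch below shows the same idea in Python with invented sample data and illustrative function names.

```python
import re

def build_matcher(words):
    # Escape each word so that metacharacters in the list are matched literally.
    return re.compile("|".join(re.escape(w) for w in words))

def lines_containing(lines, words):
    matcher = build_matcher(words)
    return [line for line in lines if matcher.search(line)]

words = ["thing1", "thing2", "thing3"]
lines = ["nothing here", "has thing2 inside", "thing3 at start"]
print(lines_containing(lines, words))
```

An empty word list would produce an empty pattern that matches every line, so that case may need a separate check.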

Appending a line just after the matched pattern in sed not working

My /etc/pam.d/system-auth-ac has the below auth parameters set:

auth        required      pam_env.so
auth        sufficient    pam_unix.so nullok try_first_pass
auth        requisite     pam_succeed_if.so uid >= 500 quiet
auth        required      pam_deny.so

I want to insert pam_tally2.so just after pam_env.so. So I want it to be:

auth        required      pam_env.so
auth        required      pam_tally2.so onerr=fail audit silent deny=5 unlock_time=900
auth        sufficient    pam_unix.so nullok try_first_pass
auth        requisite     pam_succeed_if.so uid >= 500 quiet
auth        required      pam_deny.so

The script that I'm using is :

#! /bin/bash

grep "pam_tally2" /etc/pam.d/system-auth-ac &> /dev/null
if [ $? -ne 0 ];
then
   sed -i '/^[]*account[]*required[]*pam_unix.so/aauth\trequired\tpam_tally2.so onerr=fail audit silent deny=5 unlock_time=900' /etc/pam.d/system-auth-ac
else
   sed -i 's/.*pam_tally2.*/auth\trequired\tpam_tally2.so onerr=fail audit silent deny=5 unlock_time=900/1' /etc/pam.d/system-auth-ac
fi

But it gives this error:

sed: -e expression #1, char 116: unterminated address regex

What am I doing wrong ?

Regex to match only the numbers 1 to 10, but it finds every number that contains 1 to 10

I want a regex that matches only the numbers 1 to 10. I use this:

(10|[1-9])

but Notepad++ finds all the numbers that start with 1 to 10. I want to find only the numbers 1 to 10, not every number that contains part of 1 to 10.

Sorry for my poor English.
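Word boundaries are one way to keep the pattern from matching inside longer numbers. The sketch below demonstrates the idea in Python on an invented sample; the same \b(10|[1-9])\b pattern should also work in Notepad++'s regular expression search mode.

```python
import re

def numbers_1_to_10(text):
    # \b on both sides rejects digits that are part of a longer number.
    return re.findall(r"\b(?:10|[1-9])\b", text)

print(numbers_1_to_10("1 5 10 11 100 23 7"))
```

Note that a decimal such as 1.5 would still yield 1 and 5, because the dot counts as a boundary.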

fetch attribute of tag using curl request without domDocument in php using regex only

I have this URL: https://mobile.twitter.com/session/new

I want to find <meta name="csrf_id" content="ce7754058e42523e84e0443138f28f33"> and fetch only the ce7754058e42523e84e0443138f28f33 value as output. The content value may change depending on your browser.

I don't want to use DOMDocument; I have to do this with regex only.

Note: you have to open this URL in Mozilla Firefox; only then will you find <meta name="csrf_id" content="ce7754058e42523e84e0443138f28f33">. You won't find this tag in any other browser.
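A minimal sketch of the extraction step only, assuming the HTML has already been fetched and that the attributes appear in the order shown above. It is written in Python; the pattern should be usable with PHP's preg_match as well. The function name csrf_token is illustrative.

```python
import re

def csrf_token(html):
    # Capture everything between the quotes of the content attribute.
    m = re.search(r'<meta\s+name="csrf_id"\s+content="([^"]+)"', html)
    return m.group(1) if m else None

html = '<head><meta name="csrf_id" content="ce7754058e42523e84e0443138f28f33"></head>'
print(csrf_token(html))
```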

How do I report only one error for an XML validation failure of a single element?

I would like to know how to return only one error message when an element fails XML schema date validation. My goal was to preserve the validations provided by xs:date (such as February 29, 2015 being invalid due to 2015 not being a leap year), but also ensure that the date is of the pattern with the following regex:

"\d{4}[-]\d{2}[-]\d{2}"

I tried out this in an online XML schema tester, and I got the expected results. The entire schema is as follows:

<xs:schema xmlns:xs="http://ift.tt/tphNwY" elementFormDefault="qualified" attributeFormDefault="unqualified">
                        <xs:element name="tripDate">
                                                    <xs:simpleType>
                                                        <xs:restriction base="xs:date">
                                                            <xs:whiteSpace value="collapse"/>
                                                            <xs:pattern value="\d{4}[\-]\d{2}[\-]\d{2}"/>
                                                        </xs:restriction>
                                                    </xs:simpleType>
                                                </xs:element></xs:schema>

And the test element value I used (among others) is:

<tripDate>20115-02-20</tripDate>

When I entered these into the formatter at http://ift.tt/1lmKPLW , I got the following outputs back:

    Cvc-pattern-valid: Value '20115-02-20' Is Not Facet-valid With Respect To Pattern '\d{4}[\-]\d{2}[\-]\d{2}' For Type '#AnonType_tripDate'.. Line '1', Column '33'.

    Cvc-type.3.1.3: The Value '20115-02-20' Of Element 'tripDate' Is Not Valid.. Line '1', Column '33'.

My question is, how can I ensure that only one error will ever get returned? Specifically, I only want the "Cvc-type" error to be returned, because that comes in the cases when the pattern is invalid, even though the date itself is valid. Is there any kind of schema setting to suppress the "Cvc-pattern-valid" error, or to collapse error into the error of the parent restriction, which in this case is xs:date? I want the "Cvc-type" error only to be returned, because this is what is handled in the Java code for the application, and I am trying not to unnecessarily add Java code to handle the other error as well (at this point it throws an SQL exception error).

Thank you.

RegEx to match nested parentheses including the start and end parentheses

string st = "this (a,b) and this (s,(r,t),u) is a test";
var regex = new Regex(@"\(([^()]+| (?<Level>\()| (?<-Level>\)))+(?(Level)(?!))\)", RegexOptions.IgnorePatternWhitespace);

foreach (Match c in regex.Matches(st))
{
  Console.WriteLine(c.Value.Trim('(', ')'));
}

The above C# code in .NET 4.5 correctly returns:

a,b
s,(r,t),u

But I need the output including the parentheses as:

(a,b)
(s,(r,t),u)
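In the C# code the outer parentheses are removed by the Trim('(', ')') call, so printing c.Value without it may already be enough. For comparison, Python's re module has no balancing groups, so the sketch below uses a small depth counter to collect the top-level groups with their parentheses. The function name top_level_groups is made up.

```python
def top_level_groups(text):
    """Return each outermost parenthesised group, parentheses included."""
    groups, depth, start = [], 0, None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i          # remember where the outermost group opens
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                groups.append(text[start:i + 1])
    return groups

st = "this (a,b) and this (s,(r,t),u) is a test"
print(top_level_groups(st))
```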

Is there any named regular expression capture for grep?

I'd like to know if it's possible to get named capture groups with grep -P (Linux, bash) from an unformatted string, or from any string for that matter.

For example: John Smith www.website.com john@website.com jan-01-2001, to be captured as

$name
$website
$email
$date

but it seems I can't pass any variables from the output?

echo "www.website.com" | grep -Po '^(www\.)?(?<domain>.+)$' | echo $domain

has no output
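grep only prints matching text; it does not set shell variables, so $domain stays empty. For illustration, here is how named groups can be read back in Python. The pattern assumes the four fields appear in the order of the example above, and the names RECORD and parse_record are invented.

```python
import re

RECORD = re.compile(
    r"(?P<name>.+?)\s+"          # name, may contain spaces
    r"(?P<website>www\.\S+)\s+"  # website starting with www.
    r"(?P<email>\S+@\S+)\s+"     # email address
    r"(?P<date>\S+)$"            # date token
)

def parse_record(line):
    m = RECORD.match(line)
    return m.groupdict() if m else None

print(parse_record("John Smith www.website.com john@website.com jan-01-2001"))
```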

fetch attribute of tag using curl request without domDocument in php

I have this URL: http://ift.tt/1LvbSlT

I want to find <meta name="p:domain_verify" content="1e8715341cd73cff926f648b6dd4a089"/> and fetch only the 1e8715341cd73cff926f648b6dd4a089 value as output. The content value may change depending on your browser.

I don't want to use DOMDocument.

The regular expression in the FILTER option of gcovr

I use gcovr to read coverage information for program foo.c. The syntax is

gcovr -r . -b --filter=FILE_PATTERN

where the '-r .' part indicates the current directory as the search root, '-b' is for branch coverage, and the FILE_PATTERN part is to keep only the data files that match this regular expression, according to gcovr's user guide http://ift.tt/1HnhkFK.

So I use this following command line

gcovr -r . -b --filter='foo\\.c'

But this command retrieves no information regarding foo.c. It seems I am using the wrong regular expression. What do you think?

Mocha Chai regex are equals

I am trying to test the behaviour of a method that produces a regex.

Using Mocha/Chai test suite I have the following code :

describe('regexTest',function () {
  it('should return a regexp', function () {
    var regex = regexTest();
    assert.equal(regex, /someregex/);
  });
});

But it seems that the code above doesn't work. I tried in the chrome console :

/a/ == /a/ 
> false

For the moment, the only way I have found is to compare the toString output of the two regexes (which should be equal and which I can compare):

describe('regexTest',function () {
  it('should return a regexp', function () {
    var regex = regexTest();
    assert.equal(regex.toString(), '/someregex/');
  });
});

Do you know a better way to do this? I don't find this very clean...

regex validating url

I have the following urls as subjects

localhost/hx
localhost/h.php
localhost/h/x
localhost/h/x/y
localhost/h?hello
localhost/h

Currently my pattern is something like /localhost/h[/]*/i, but this does not work.

I do not want to match line 1 or 2, but the others are valid.

  • Problem 1: I am not able to make / or ? optional, if I remove * from the pattern it does not match localhost/h.

  • Problem 2: I am not able to escape ? for line 5, I tried \\ and \ but neither worked.
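One way to express "h followed by nothing, or by / or ? and anything" is an optional group that starts with a character class. Inside a class the question mark needs no escaping. The sketch below checks the six subjects in Python, assuming the whole string must match; the names PATTERN and is_valid are illustrative.

```python
import re

# After "localhost/h": either the end, or "/" or "?" followed by anything.
PATTERN = re.compile(r"localhost/h(?:[/?].*)?", re.IGNORECASE)

def is_valid(url):
    return PATTERN.fullmatch(url) is not None

subjects = [
    "localhost/hx", "localhost/h.php", "localhost/h/x",
    "localhost/h/x/y", "localhost/h?hello", "localhost/h",
]
for url in subjects:
    print(url, is_valid(url))
```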

Adding dot to rewrite rule

I want to add a dot (.) to my variable in the last rewrite rule, and I have no idea where to put it in this syntax. Does it go after A-Z?

ErrorDocument 404 /
ErrorDocument 403 /

Options ALL -Indexes

RewriteEngine On
RewriteRule ^([0-9/.]+)$ index.php?Patch_No=$1 [NC,L]
RewriteRule ^([0-9/.]+)&([0-9a-zA-Z_-]+)$ index.php?Patch_No=$1&tab=$2 [NC,L]
RewriteRule ^patches php/patches.php [NC,L]
RewriteRule ^([\s'0-9a-zA-Z_-]+)$ index.php?Champion=$1 [NC,L]

Notepad++ replace n characters in a certain pattern

In Notepad++ version 6.7.8.2, I would like to replace the string

<p class=question><b>nn.</b> 

where nn is any number from 1 to 20000 (followed by a .).

I've tried expressions, etc. but can't get it to work.

Any help is appreciated.

Matching floating point range using regular expression

I need a regular expression, that matches within a range of floats.

For the range 12.33 - 13.41 I tried the following regex:

(12.[3-9][3-9]|13.?[0-4][0-1]?)\d{0,2}

but it doesn't match, e.g., 12.41 or 13.39.

Is this possible with regular expressions?

Kind regards, bernie70
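It is possible, but the digit ranges have to be split by case. The sketch below assumes values always have exactly two decimal places and shows both a regex and, for comparison, a plain numeric check, which is usually easier to maintain. All names are illustrative.

```python
import re

# 12.33-12.39, 12.40-12.99, 13.00-13.39, 13.40-13.41
IN_RANGE = re.compile(r"12\.(?:3[3-9]|[4-9]\d)|13\.(?:[0-3]\d|4[01])")

def in_range_regex(value):
    return IN_RANGE.fullmatch(value) is not None

def in_range_numeric(value, low=12.33, high=13.41):
    return low <= float(value) <= high

for value in ["12.32", "12.33", "12.41", "13.39", "13.41", "13.42"]:
    print(value, in_range_regex(value), in_range_numeric(value))
```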

Javascript splitting between number and string with regexp

How can I split following string:

1D11M58S

by means of regexp in javascript?

My result should be:

[1D, 11M, 58S]

thanks
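Rather than splitting, it may be simpler to match each "digits plus letter" chunk. The sketch below shows this in Python; in JavaScript, "1D11M58S".match(/\d+[A-Z]/g) should return the same three pieces. The function name split_duration is made up.

```python
import re

def split_duration(text):
    # Each piece is one or more digits followed by a single capital letter.
    return re.findall(r"\d+[A-Z]", text)

print(split_duration("1D11M58S"))
```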

how to adjust javascript regexpression

I need to adjust a regex I am using for a file name. Currently the file name is structured like 02-2015 VRF WE32.pdf, and the regex splits the string into 3 pieces at the spaces. This works fine. The result is

02-2015
VRF
WE32

But now I need to split the string into 4 pieces, the 4th being the digits in 'WE32'. So it needs to look like this:

02-2015
VRF
WE
32

Here is what I am using, including some screenshots

var matchesPip = file.name.match(/^\d+\D\d+\s*(\S*\s*)(\S*)/i);
var matchesLoc = file.name.match(/^\d+\D\d+\s*?(\S*)\s*(\S*?)\./i);
var matchesLocCode = file.name.match(NEED HELP HERE);
    $scope.pip = $scope.pipeLookup[matchesPip[1]];
    $scope.loc = $scope.locationLookup[matchesLoc[2]];
    $scope.locCode = $scope.locationCodeLookup[matchesLocCode[3]];
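A sketch of a single pattern that yields all four pieces, assuming the last token is always letters followed by digits and then a dot. It is shown in Python; the same pattern should work with JavaScript's String.prototype.match, where the pieces would be at indexes 1 to 4. The function name split_name is illustrative.

```python
import re

def split_name(filename):
    # date, pipeline code, location letters, location digits
    m = re.match(r"^(\d+\D\d+)\s+(\S+)\s+([A-Za-z]+)(\d+)\.", filename)
    return m.groups() if m else None

print(split_name("02-2015 VRF WE32.pdf"))
```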

JavaCC - Match some words except some others

I am trying to create a parser for first-order logic using JavaCC and I have the following problem.

I have this:

< VARIABLE : < LOWER > (< INNER >)* >

where

< #INNER :
( < UPPER >
| < LOWER >
| < DIGIT >
| < UNDERSCORE > ) >

< #DIGIT : [ "0"-"9" ] >
< #LOWER : [ "a"-"z" ] >
< #UPPER : [ "A"-"Z" ] >
< #UNDERSCORE: "_" >

but I want to exclude some words from this. In particular, I want to exclude the words sum, count, avg, max and min. Is there a way to do this? In other words, I want variables to begin with a lowercase letter followed by any number of inner tokens, where an inner token may be an uppercase letter, a lowercase letter, a digit or an underscore. However, I do not want variables to take values such as sum, count, avg, max and min.

C++ regex for checking Gomoku win condition

I construct a string that represents my entire Gomoku game board (5x5 for example), where 0 indicates empty, and 1 or 2 indicates black or white.

A string "x" is placed between each row to separate the rows.

std::string state = "00000x012122x00100x00222x010201"

What I'm trying to do is check for a match of 3 for the current player horizontally (I will deal with vertical and diagonal later); let's say, white, so I am looking for a match of three 2's sequentially in the string, and only three 2's.

Gomoku does not allow overline, which means the regular expression cannot match 4 or more.

Here is my attempt at the problem:

bool Game::check_horizontal(std::string state)
// Checks if the current_player (stored in the class) has won horizontally.
{
    std::string pattern = "(" + std::to_string(current_player) + "{3})"; // Current player is white by default.
    std::regex reg1(pattern);
    if (regex_search(state, reg1) == true)
    {
        std::cout << "Match! Someone has won." << std::endl;
        return true;
    }
    else
    {
        std::cout << "No match... keep going." << std::endl;
        return false;
    }
}

So far it seems that the code works as intended for the state above, but it keeps matching if there are 4 or more of what I'm looking for. If I added another 2 on the 4th row, in the 2nd column, it will still match.

Is there a problem with my regular expression, or my usage of regex?
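The pattern 2{3} also succeeds inside a longer run, because nothing forbids another 2 on either side. Lookarounds can express "exactly three". The sketch below shows the idea in Python with a hypothetical helper has_exact_run. Since std::regex with the default ECMAScript grammar has no lookbehind, a C++ version would probably need something like (^|[^2])2{3}([^2]|$) instead.

```python
import re

def has_exact_run(state, player, length=3):
    p = re.escape(str(player))
    # No identical stone directly before or after the run.
    pattern = rf"(?<!{p}){p}{{{length}}}(?!{p})"
    return re.search(pattern, state) is not None

print(has_exact_run("00000x012122x00100x00222x010201", 2))
print(has_exact_run("00000x012122x00100x02222x010201", 2))
```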

python regular expression in searching pattern

In Python, I have this pattern:

MY_PATTERN='[^\\\\]*\\\\\\w{0,4}First\\\\.+\\.\\w{3}|[^\\\\]*\\\\\\w{0,4}Second\\\\.+\\w{3}'

I use this pattern to search in my string.

Which strings does this pattern match?

Making a [code][/code] for BBcode with php regex

I would like to make a [code][/code] tag for bbcode so that what would be inside wouldn't be taken into account by the php regex that I made.

Example :

Hello [b]newbie[/b], to write in bold, use the following : [code][b](YOURTEXT)[/b][/code]

Should return in HTML :

Hello <strong>newbie</strong>, to write in bold, use the following : [b](YOURTEXT)[/b]

Here is a view of a part of my bbcode function :

<?
function bbcode($var) {
   $var = preg_replace('`\[b\](.+)\[/b\]`isU', '<strong>$1</strong>', $var); 
   $var = preg_replace('`\[i\](.+)\[/i\]`isU', '<em>$1</em>', $var);
   $var = preg_replace('`\[u\](.+)\[/u\]`isU', '<u>$1</u>', $var);
   return $var;
}
?>

Thank you in advance for your kind help !


EDIT : Here is how I finally made it work :

<? 
function bbcode($var) {
$var2 = preg_split('`(\[code].*?\[/code])`isU', $var, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

$var = preg_replace('`\[b\](.+)\[/b\]`isU', '<strong>$1</strong>', $var); 
$var = preg_replace('`\[i\](.+)\[/i\]`isU', '<em>$1</em>', $var);
$var = preg_replace('`\[u\](.+)\[/u\]`isU', '<u>$1</u>', $var);

$var = preg_replace('`(\[code].*?\[/code])`isU', $var2[1], $var);
$var = preg_replace('`\[code\](.+)\[/code\]`isU', '<div>$1</div>', $var);
return $var;
}

$text = 'Hello [b]newbie[/b], to write in bold, use the following [u]lol[/u] : [code][b](YOURTEXT) [u]lol[/u][/b][/code] [b][u]LOL[/u][/b]';

echo bbcode($text); 
?>

However, one problem remains: if the string starts directly with '[code]', for example

[code][b]hello[/b][/code] test

then the result will be:

test test

This is because $var2[1] now refers to what comes after the [/code].

Could someone please help me find a better way to delimit the code block that also works for that second string? Thank you in advance!
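A possible way around the index problem is to never rely on a fixed position such as $var2[1]: split the text on the [code] blocks, keep the blocks themselves in the result, and apply the formatting rules only to the pieces in between. The sketch below shows the idea in Python rather than PHP. The names CODE_BLOCK and TAGS are made up for the illustration, and wrapping code in a div simply follows the edit above.

```python
import re

# Capturing group so that re.split keeps the [code]...[/code] blocks in the result.
CODE_BLOCK = re.compile(r'(\[code\].*?\[/code\])', re.IGNORECASE | re.DOTALL)

# BBCode tag -> HTML tag, as in the question's function.
TAGS = [('b', 'strong'), ('i', 'em'), ('u', 'u')]


def bbcode(text):
    pieces = CODE_BLOCK.split(text)
    out = []
    for index, piece in enumerate(pieces):
        if index % 2 == 1:
            # Odd positions are the captured code blocks: strip the
            # [code] and [/code] markers and leave the inside untouched.
            inner = piece[len('[code]'):-len('[/code]')]
            out.append('<div>' + inner + '</div>')
        else:
            # Even positions are ordinary text (possibly empty).
            for bb, html in TAGS:
                piece = re.sub(
                    r'\[%s\](.+?)\[/%s\]' % (bb, bb),
                    r'<%s>\1</%s>' % (html, html),
                    piece,
                    flags=re.IGNORECASE | re.DOTALL,
                )
            out.append(piece)
    return ''.join(out)


print(bbcode('[code][b]hello[/b][/code] test'))
```

Because empty pieces are kept, the code blocks always sit at odd positions, even when the text starts with [code]. In the PHP version, PREG_SPLIT_NO_EMPTY drops the empty first piece in that case, which would explain why the indices shift.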

SED, deleting lines between the patterns

I want to use sed to delete the lines between two patterns, keeping the lines that contain the patterns themselves. If the second pattern appears two or three times, the lines should be deleted up to its last occurrence. Please help me write the regex for this. Thanks a lot.

Extracting users from twitter status in R. Non-trivial cases like RT@user @user:

I want to extract usernames from a tweet where these usernames may be:

  • followed by some non-alphanumerical characters.
  • not preceded by a white space.

I have this:

    text <- "RT@user1: This is a retweet that mentions @user2."
    tokens <- unlist(strsplit(text, " "))
    mentions.mask <- grepl("@\\w+", tokens)
    mentions <- tokens[mentions.mask]
    mentions
    [1] "RT@user1:" "@user2."

But I would like to get just @user1 and @user2

How can I do it?
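One option is to extract the matches themselves instead of filtering whole tokens. The sketch below shows the idea in Python rather than R, using the same @\w+ pattern as above; in R, regmatches() combined with gregexpr() plays a similar role.

```python
import re

text = "RT@user1: This is a retweet that mentions @user2."

# findall returns only the matched substrings, so the surrounding
# "RT", ":" and "." never make it into the result.
mentions = re.findall(r"@\w+", text)
print(mentions)  # ['@user1', '@user2']
```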

Parse text between multiple lines - Python 2.7 and re Module

I have a text file I want to parse. The file has multiple items I want to extract. I want to capture everything between a colon ":" and a particular word. Take the following example.

Description : a pair of shorts
amount : 13 dollars
requirements : must be blue
ID1 : 199658
----

The following code parses the information out.

import re

f = open ("parse.txt", "rb")
fileRead = f.read()

Description = re.findall("Description :(.*?)amount", fileRead, re.DOTALL)
amount = re.findall("amount :(.*?)requirements", fileRead, re.DOTALL)
requirements = re.findall("requirements :(.*?)ID1", fileRead, re.DOTALL)
ID1 = re.findall("ID1 :(.*?)-", fileRead, re.DOTALL)

print Description[0]
print amount[0]
print requirements[0]
print ID1[0]

f.close()

The problem is that sometimes the text file will have a new line such as this

Description 
: a pair of shorts
amount 
: 13 dollars
requirements: must be blue
ID1: 199658
----

In this case my code does not work, because it cannot find "Description :" now that the label and the colon are on separate lines. If I change the search to ":(.*?)requirements", it does not return just "13 dollars"; it returns "a pair of shorts" and "13 dollars", because all of that text lies between the first colon and the word "requirements". I want a way to parse out the information whether or not there is a line break. I have hit a roadblock and your help would be greatly appreciated.
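A possible approach is to allow any whitespace, including a newline, between each label and its colon by writing \s*: instead of a literal " :". The sketch below assumes the four labels always appear in the order shown and that a value never contains the next label followed by a colon. LABELS and parse_record are made-up names for the illustration.

```python
import re

# Labels in the order they appear in the file (taken from the example).
LABELS = ["Description", "amount", "requirements", "ID1"]


def parse_record(text):
    fields = {}
    for i, label in enumerate(LABELS):
        if i + 1 < len(LABELS):
            # Stop just before the next label and its colon.
            stop = re.escape(LABELS[i + 1]) + r"\s*:"
        else:
            # The last field ends at the dashed separator line.
            stop = "-"
        # \s* lets the colon sit on the same line or on the next one.
        pattern = re.escape(label) + r"\s*:(.*?)(?=" + stop + ")"
        match = re.search(pattern, text, re.DOTALL)
        fields[label] = match.group(1).strip() if match else None
    return fields


sample = """Description 
: a pair of shorts
amount 
: 13 dollars
requirements: must be blue
ID1: 199658
----
"""

record = parse_record(sample)
print(record["amount"])
```

The same function handles the first layout too, since \s* also matches the single space in "Description :".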

How to create regular expression starting with different directory name in Python

Currently I take a directory path as input from the user. The path is different for each user (dir_name differs):

/file/perm/perm13/user123/dir_name/ 

The structure inside dir_name is the same for all users:

tar1 tar2 tar3 tar4 tar5 tar6 source

In each tar there are three folders: build, collateral, and fix. In collateral I have three different files:

 _base.txt, _fullbase.txt and _skip.txt 

I also need corresponding .dot files from source/projectfiles/params/tar_base.dot | tar_fullbase.dot | tar_skip.dot

Here is what I need to do:
1. Take input from the user.
2. Go into the targets one by one (tar1 -> tar2 -> tar3 -> tar4 -> tar5).
3. For each tar, search for the collateral folder.
4. In the collateral folder, search for all three .txt files.
5. For each .txt file, search for the corresponding .dot file.

Currently I do the following: I take the path of the txt file from the user.

txtfilepath = raw_input (" Please provide file path for the desired txt")
dotfilepath = raw_input (" Please enter corresponding .dotfilepath")
<directory path>/<tar1>/collateral/<xyz_base.txt)

The regex:

platform = re.search("?<=/collateral/).(?="_(base|fullbase|skip)\.txt)",txtfilepath).group(0)

Can some Python geek help?
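The regex above does not parse as written: the lookbehind is missing its opening parenthesis and there is a stray quote inside the string. A minimal corrected sketch follows. It assumes the goal is to pull out the part of the file name between /collateral/ and the _base, _fullbase or _skip suffix. PLATFORM_RE and platform_from_path are made-up names, and the sample path is only a guess at the layout described above.

```python
import re

# Lookbehind for the folder, lookahead for one of the three known suffixes.
PLATFORM_RE = re.compile(r"(?<=/collateral/).+(?=_(?:base|fullbase|skip)\.txt$)")


def platform_from_path(txtfilepath):
    match = PLATFORM_RE.search(txtfilepath)
    # Return None instead of raising when the path has an unexpected shape.
    return match.group(0) if match else None


# Hypothetical path following the layout described in the question.
example = "/file/perm/perm13/user123/dir_name/tar1/collateral/xyz_base.txt"
print(platform_from_path(example))  # xyz
```

Checking the match before calling group(0) avoids the AttributeError that re.search(...).group(0) raises when nothing matches.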

python regex to split by comma or space (but leave strings as it is)

I need to split a string on spaces or commas, but single- or double-quoted strings should be left as they are. It makes no difference whether items are separated by a single space or by many. For example:

    """ 1,' unchanged 1' " unchanged  2 "   2.009,-2e15 """

should return

    """ 1,' unchanged 1'," unchanged  2 ",2.009,-2e15 """

There may be zero or more spaces before and after a comma; those spaces are to be ignored. In this particular context, as the example string shows, two adjacent single- or double-quoted strings will always have a space or a comma between them.

I have a previous question at python reg ex to include missing commas, however, for that to work a splitting comma should have a space after.
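Rather than describing the separators, it may be easier to describe the tokens and collect them with re.findall. The sketch below assumes quoted strings contain no escaped quotes of their own kind. TOKEN_RE and split_fields are made-up names for the illustration.

```python
import re

TOKEN_RE = re.compile(
    r"'[^']*'"      # a single-quoted string, kept verbatim
    r'|"[^"]*"'     # or a double-quoted string, kept verbatim
    r"|[^\s,'\"]+"  # or a run of characters with no space, comma or quote
)


def split_fields(s):
    # Spaces and commas between tokens are simply never matched.
    return TOKEN_RE.findall(s)


s = """ 1,' unchanged 1' " unchanged  2 "   2.009,-2e15 """
tokens = split_fields(s)
print(",".join(tokens))
```

Joining the tokens with commas gives the normalised form from the example, without the leading and trailing space.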

How to translate a Ruby regex to JavaScript?

In Ruby I have a regex to get a string formatted like "@xxx":

(/(?<!\S)@[A-Za-z0-9\-]+/)

I also need this regex on the client side, but JavaScript can't read this.

How can I change this regex to JavaScript?

How to isolate only the first space in a string using regex?

I have a foreign language to English dictionary that I'm trying to import into a sql database. This dictionary is in a text file and the lines look like this:

field1 field2 [romanization] /definition 1/definition 2/definition 3/

I'm using regex in python to identify the delimiters. So far I've been able to isolate every delimiter except for the space in-between field 1 and field 2.

(?<=\S)\s\[|\]\s/(?=[A-Za-z])|/
#(?<=\S)\s\[  is the opening square bracket after field 2
#\]\s/(?=[A-Za-z]) is the closing square bracket after the romanization
#/ is the forward slashes in-between definitions.
#????????? is the space between field 1 and field two