Sunday, 28 June 2015

Regex to remove punctuation from tokenized text

I'm trying to remove punctuation from tokenized text using regular expressions. Can anyone explain the following behaviour:

$ STRING='hey , you ! what " are you doing ? say ... ," what '
$ echo $STRING | sed -r 's/ [^[:alnum:][:space:]-]+ / /g;'
hey you what are you doing say ," what
$ echo $STRING | sed -r 's/ [[:punct:]]+ / /g;'
hey you what are you doing say ," what
$ echo $STRING | perl -pe 's/ [^[:alnum:][:space:]-]+ / /g;'
hey you what are you doing say ," what
$ echo $STRING | perl -pe 's/ [[:punct:]]+ / /g;'
hey you what are you doing say ," what

The ," token is preserved in the output, which I don't want. It's possible to match this token with:

$ echo $STRING | perl -pe 's/ [",]+ / /g;'
hey you ! what are you doing ? say ... what
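One likely explanation, sketched in Python as an assumption rather than a confirmed diagnosis: each match consumes its trailing space, so the token that follows has no leading space left to match. Lookarounds assert the spaces without consuming them. Note that `\w` also covers the underscore, so the character class here is only an approximation of `[:alnum:]`.

```python
import re

s = 'hey , you ! what " are you doing ? say ... ," what '

# Same shape as the sed/perl pattern: both surrounding spaces are consumed,
# so a token directly after a match has lost its leading space.
consuming = re.sub(r' [^\w\s-]+ ', ' ', s)

# Lookarounds check for the spaces without consuming them,
# so adjacent punctuation tokens can each match.
lookaround = re.sub(r'(?<= )[^\w\s-]+(?= )', '', s)
# Collapse the runs of spaces left behind.
lookaround = re.sub(r' {2,}', ' ', lookaround)

print(consuming)
print(lookaround)
```

Perl supports the same lookaround syntax; POSIX sed does not, so there a second pass over the text would be one alternative.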

URL rewrite not found

My URL looks like this:

http://ift.tt/1TWVaiZ

and I want to rewrite URLs like this:

http://ift.tt/1GTUlAH

I thought this was the way

RewriteEngine On
RewriteBase /validar/
RewriteRule ^/([^/]*)/([^/]*)/$/api.php?desde=$1&que=$2 [L]

But I'm getting a 404 and can't make it work.

Any ideas? Thanks
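A possible factor, sketched in Python: in .htaccess (per-directory) context, mod_rewrite matches the pattern against the path relative to the directory, without a leading slash, so a pattern starting with `^/` would never match. The path `foo/bar/` is a hypothetical example, since the real URLs are shortened above.

```python
import re

# Pattern from the RewriteRule, up to the end anchor.
rule = re.compile(r'^/([^/]*)/([^/]*)/$')

# Hypothetical request path as seen in per-directory context:
# relative to the directory, with no leading slash.
per_dir_path = 'foo/bar/'

# The leading "/" in the pattern has nothing to match.
print(rule.match(per_dir_path))

# The same pattern without the leading slash captures both segments.
relaxed = re.match(r'^([^/]*)/([^/]*)/$', per_dir_path)
print(relaxed.groups())
```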

PHP: filter specific pattern out of string

My raw output of socket_recvfrom is:

ID IP PING IDENTIFIERNUMBER USERNAME


0 127.0.0.1:1234 0 ID123456789 Moritz

1 127.0.0.1:1234 46 ID123456789 August Jones

2 127.0.0.1:1234 46 ID123456789 Miller


It is a single string that contains all of this information at once, with only whitespace between the fields. Any field can be longer or shorter.

My problem:

When I preg_split("/\s+/") it, I get a good array with usable data, but when the username contains spaces, the username is split across two indexes. That shifts every field that comes after it, so the rest of the data is wrong.

I sort the array like this: ID, USERNAME, PING, IDENTIFIERNUMBER, IP

Example of the sorted output when a username contains one space:


ID: 0, USERNAME: Moritz, PING: 0, IDENTIFIERNUMBER: ID123456789, IP: 127.0.0.1:1234

ID: 1, USERNAME: August, PING: Jones, IDENTIFIERNUMBER: 46, IP: ID123456789

ID: 127.0.0.1:1234, USERNAME: 2, PING: Miller, IDENTIFIERNUMBER: 46, IP: ID123456789


How do I get the information correctly out of the string?

I forgot to mention:

The string begins with a run of dashes (---------------------------------) of varying length, so it can be 10 characters or 12, for example. The string ends with:

 (8 users in total)

The regex method looks good. I only need to filter out the other characters.

--------------------------------- 0 127.0.0.1:1234 0 ID123456789(OK) Moritz 1 127.0.0.1:1234 46 ID123456789(OK) August Jones 2 127.0.0.1:1234 46 ID123456789(OK) Miller (7 users in total)
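A minimal Python sketch of the regex approach, assuming every record has the shape "ID IP:PORT PING IDENTIFIER(STATUS) USERNAME" and that a username never itself contains a number followed by an IP:PORT. The username is matched lazily up to the start of the next record or the "(N users in total)" trailer. The group names are hypothetical.

```python
import re

raw = ('--------------------------------- '
       '0 127.0.0.1:1234 0 ID123456789(OK) Moritz '
       '1 127.0.0.1:1234 46 ID123456789(OK) August Jones '
       '2 127.0.0.1:1234 46 ID123456789(OK) Miller (7 users in total)')

IP_PORT = r'\d{1,3}(?:\.\d{1,3}){3}:\d+'

record_re = re.compile(
    r'(?P<id>\d+)\s+'
    r'(?P<ip>' + IP_PORT + r')\s+'
    r'(?P<ping>\d+)\s+'
    r'(?P<ident>ID\d+)(?:\(\w+\))?\s+'   # optional "(OK)" style suffix is skipped
    r'(?P<username>.+?)'                  # lazy: may contain spaces
    # Stop before the next record, the trailer, or the end of the string.
    r'(?=\s+\d+\s+' + IP_PORT + r'|\s*\(\d+ users in total\)|\s*$)'
)

users = [m.groupdict() for m in record_re.finditer(raw)]
for u in users:
    print(u)
```

The same pattern should carry over to PHP's preg_match_all, since both engines support named groups and lookahead, though that is untested here.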

Last problem: http://ift.tt/1eRW0xy

Regex PHP: Get specific content from a block of code from another website

I have a site from which I want to get specific content from 7 posts. All seven posts have the same HTML layout (see below):

<div class="eventInfo">
<h3>Z's(矢沢永吉)</h3>
  <h4>Z's TOUR 2015</h4>

<dl>
    <dt><img src="/event/img/btn_day.png" alt="公演日時" width="92" height="20"> </dt>
    <dd>
      <table width="99%" border="0" cellpadding="0" cellspacing="0">
        <tbody><tr>
      <td width="9%" nowrap="nowrap">2015年6月</td>
      <td width="74%">4日 (木) 19:00開演</td>
    </tr>

  </tbody></table>
</dd>
<dt><img src="/event/img/btn_price.png" alt="料金" width="92" height="20"> </dt>
<dd>S¥10,500 A¥7,500 (全席指定・消費税込)<br><span class="attention">※</span>注意事項の詳細を<a href="http://ift.tt/1HpdVrL" target="_blank">矢沢永吉公式サイト</a>より必ずご確認ください</dd>

<dt><img src="/event/img/btn_ticket.png" alt="一般発売" width="92" height="20"> </dt>
<dd>
 <table width="99%" border="0" cellpadding="0" cellspacing="0">
  <tbody><tr>
    <td width="9%" nowrap="nowrap">2015年5月</td>
    <td width="74%">16日(土)</td>
  </tr>
</tbody></table>
  </dd>

  <dt><img src="/event/img/btn_contact.png" alt="お問合わせ" width="92" height="20"> </dt>
  <dd><a href="http://www.siteurl.com/" target="_blank">ソーゴー大阪</a> 06-6344-3326</dd>

  <dt><img src="/event/img/btn_info.png" alt="公演詳細" width="92" height="20"> </dt>
  <dd><a href="http://ift.tt/1GF5ySz" target="_blank">http://ift.tt/1HpdVrN</a></dd>
</dl>
</div>

I just want to fetch the H3 from this layout and the first table in the code. What regex method should I use to get the desired results?

There are 7 posts just like the code above, and I have to get the H3 and the first table from each of them.

I have tested it, but I'm not sure whether this is a correct approach: http://ift.tt/1GF5z8P

As you can see, I also have to match unwanted data such as the H4, DT and IMG elements :(
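A rough Python sketch of one regex-based way to skip the unwanted elements, assuming every post is wrapped in `<div class="eventInfo">` and always contains at least one table. The sample HTML is a trimmed, hypothetical version of the layout above. Lazy quantifiers jump over the H4, DT and IMG elements without capturing them.

```python
import re

# Trimmed, hypothetical sample following the layout in the post.
html = '''<div class="eventInfo">
<h3>Z's(矢沢永吉)</h3>
  <h4>Z's TOUR 2015</h4>
<dl>
    <dd>
      <table width="99%">
        <tbody><tr>
      <td nowrap="nowrap">2015年6月</td>
      <td>4日 (木) 19:00開演</td>
    </tr>
  </tbody></table>
</dd>
<dd>
 <table width="99%">
  <tbody><tr><td>2015年5月</td><td>16日(土)</td></tr></tbody></table>
</dd>
</dl>
</div>'''

# One match per post: capture the H3 text and the first table only.
# Every ".*?" is lazy, so it stops at the first closing tag it can.
block_re = re.compile(
    r'<div class="eventInfo">.*?<h3>(.*?)</h3>.*?(<table.*?</table>)',
    re.S,
)

events = []
for title, table in block_re.findall(html):
    # Pull the cell texts out of the captured table.
    cells = re.findall(r'<td[^>]*>(.*?)</td>', table, re.S)
    events.append((title, cells))

print(events)
```

If a post could lack a table, the lazy match would run on into the next post, so an HTML parser is generally the more robust choice for this kind of extraction.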

This regex doesn't work in C++

It is supposed to match "abababab", since "ab" is repeated more than twice consecutively, but the code doesn't print any output. Is there some trick to using regex in C++?

I tried with other languages and it works just fine.

#include<bits/stdc++.h>

int main(){

  std::string s ("xaxababababaxax");
  std::smatch m;
  std::regex e ("(.+)\1\1+");   

   while (std::regex_search (s,m,e)) {
    for (auto x:m) std::cout << x << " ";
    std::cout << std::endl;
    s = m.suffix().str();
  }

  return 0;
}
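A likely cause, illustrated in Python because its ordinary string literals treat `\1` the same way C++ ones do: the compiler reads `\1` as an octal escape for the character with code 1, so the regex engine never sees a backreference. This is a sketch of the escaping issue, not a check of the C++ program itself.

```python
import re

s = "xaxababababaxax"

# In an ordinary string literal, "\1" is the octal escape for the
# character with code 1, not a backreference.
unescaped = "(.+)\1\1+"
print(repr(unescaped))
no_match = re.search(unescaped, s)
print(no_match)

# A raw string keeps the backslashes, so the regex engine sees \1.
m = re.search(r"(.+)\1\1+", s)
print(m.group(0), m.group(1))
```

The C++ counterparts would be doubling the backslashes, `"(.+)\\1\\1+"`, or using a raw string literal, `R"((.+)\1\1+)"`.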

Redirect Loops and .htaccess

I just moved from a CentOS dedi to an Ubuntu VPS. The site is custom coded PHP.

The frontend works fine (including rewrite rules). In the admin backend, I can't get the rewrite rules to work...

First error:

AH00124: Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary. Use 'LogLevel debug' to get a backtrace.

Then after using debug level:

AH00122: redirected from r->uri = /admin/index.php

The relevant bits of my htaccess are:

# mod_rewrite set:

Options +Includes

RewriteEngine on

# Administration
RewriteCond %{REQUEST_URI} ^(/+)admin/(.*)$
RewriteRule (.*) %{DOCUMENT_ROOT}/admin/index.php [L,QSA]

# Rewrite other
RewriteCond %{REQUEST_URI} !^(/+)index.php(.*)$
RewriteCond %{REQUEST_URI} !^(/+)syscmd.php$
RewriteRule ^(.*)$ %{DOCUMENT_ROOT}/index.php?page=$1 [L,QSA]

# If Rewriting Failure, Show error message (Internal backup)
RewriteCond %{REQUEST_URI} !^(/+)index.php$
RewriteCond %{REQUEST_URI} !^(/+)syscmd.php$
RewriteRule (.*) \1 [F]

This was working fine on CentOS too.

Any ideas? I already tried adding the following as the first condition:

RewriteCond %{REQUEST_URI} !/admin/ [NC]

That stopped it rewriting /admin completely.

Thanks
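One thing that can be checked outside Apache, sketched in Python: the URI from the debug log, /admin/index.php, still matches the admin condition, so if the rewritten request is run through the rules again, the same rule can fire repeatedly. The guard pattern below is a hypothetical illustration, not a tested fix for this setup.

```python
import re

# The admin condition from the .htaccess above.
admin_cond = re.compile(r'^(/+)admin/(.*)$')

# URI reported in the debug log after the rewrite.
rewritten_uri = '/admin/index.php'

# The rewritten URI matches the condition again.
still_matches = bool(admin_cond.match(rewritten_uri))
print(still_matches)

# Hypothetical guard: match /admin/... except the target script itself.
guarded = re.compile(r'^/+admin/(?!index\.php$).*$')
print(bool(guarded.match(rewritten_uri)), bool(guarded.match('/admin/users')))
```

In mod_rewrite terms, a roughly equivalent guard would be an extra negated condition on /admin/index.php placed before the admin rule, which, unlike the `!/admin/` condition tried above, would exclude only the target script.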

Grouping Output Pattern in Regex- Python

I would like to get only the sentences that match the regex pattern as output and avoid these None values. How do I group the output that matches the pattern?

import re

regex = re.compile('(.*)(?:India)')
with open("D:/txt_res/abc.txt") as f:
    for line in f:
        result = regex.search(line)
        print(result)

The output I'm getting is:

None

None

None

<_sre.SRE_Match object; span=(0, 101), match='Email: abc.bitz@gmail.com >

None

None

None

<_sre.SRE_Match object; span=(0, 47), match='XYZ Engineer at ABC Organization, India'>

None

None

<_sre.SRE_Match object; span=(0, 32), match='Intern at S360, India'>

None

None
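A minimal sketch of one way to keep only the matching lines: `search()` returns None when nothing matches, so testing the result before printing skips those lines, and `group(0)` gives the matched text instead of the match object. The `lines` list is a hypothetical stand-in for the contents of abc.txt.

```python
import re

regex = re.compile(r'(.*)(?:India)')

# Hypothetical stand-in for the lines of abc.txt.
lines = [
    'Some header line',
    'XYZ Engineer at ABC Organization, India',
    'Another line',
    'Intern at S360, India',
]

matches = []
for line in lines:
    result = regex.search(line)
    if result:                          # None means no match on this line
        matches.append(result.group(0))  # the full matched text

print(matches)
```

Using `result.group(1)` instead would return only the text before "India", since the second group is non-capturing.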