I have a text file i want to parse. The file has multiple items I want to extract. I want to capture everything in between a colon ":" and a particular word. Let's take the following example.
Description : a pair of shorts
amount : 13 dollars
requirements : must be blue
ID1 : 199658
----
The following code parses the information out.
import re
f = open ("parse.txt", "rb")
fileRead = f.read()
Description = re.findall("Description :(.*?)amount", fileRead, re.DOTALL)
amount = re.findall("amount :(.*?)requirements", fileRead, re.DOTALL)
requirements = re.findall("requirements :(.*?)ID1", fileRead, re.DOTALL)
ID1 = re.findall("ID1 :(.*?)-", fileRead, re.DOTALL)
print Description[0]
print amount[0]
print requirements[0]
print ID1[0]
f.close()
The problem is that sometimes the text file will have a new line such as this
Description
: a pair of shorts
amount
: 13 dollars
requirements: must be blue
ID1: 199658
----
In this case my code will not work because it is unable to find "Description :" because it is now separated into a new line. If I choose to change the search to ":(.*?)requirements" it will not return just the 13 dollars, it will return a pair of shorts and 13 dollars because all of that text is in between the first colon and the word, requirements. I want to have a way of parsing out the information no matter if there is a line break or not. I have hit a road block and your help would be greatly appreciated.
Aucun commentaire:
Enregistrer un commentaire