
Python Removing References From A Scientific Paper

NOTE: I am inexperienced with regular expressions. I want to be able to convert scientific articles into iTunes tracks. To do this I copy and paste the text into .txt files and conver

Solution 1:

Something like

import re
text = "..."  # the pasted article text
# remove "(Author, X., ..., 2014)"-style citations
text = re.sub(r'\((?:[\w \.&]+\, )+[0-9]{4}\)', '', text)

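seems to do it. The pattern does not consume the space before the opening parenthesis, though, so the result keeps a doubled space where a mid-sentence citation was and a stray space before any punctuation that followed it. You can use Debuggex to train yourself in regex.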
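A minimal sketch of Solution 1 wrapped in a reusable function, assuming the goal is clean text for reading aloud. The helper name `strip_citations` and the two whitespace clean-up substitutions are additions, not part of the original answer; they are there because the citation pattern leaves the space in front of "(" behind.

```python
import re

# Solution 1's pattern: "(" then one or more "words, " groups, then a 4-digit year and ")"
CITATION = re.compile(r'\((?:[\w \.&]+\, )+[0-9]{4}\)')

def strip_citations(text: str) -> str:
    """Remove parenthetical author-year citations, then tidy leftover spacing."""
    text = CITATION.sub('', text)
    # The pattern does not consume the space before "(", so collapse doubled
    # spaces and drop any space left in front of punctuation.
    text = re.sub(r' {2,}', ' ', text)
    return re.sub(r' +([.,;:])', r'\1', text)

sample = ("This method has been shown to outperform previously discussed methods "
          "(Smith, J. et al., 2014) and while it has its draw-backs, it is clear that "
          "the benefits outweigh the disadvantages (Jones, A. & Karver, B., 2009, "
          "Lubber, H. et al., 2013).")
print(strip_citations(sample))
```

Note that `\w` also matches digits, which is why the multi-citation group "(Jones, A. & Karver, B., 2009, Lubber, H. et al., 2013)" is removed as a single match.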

Solution 2:

This should do the trick:

import re

a = "This method has been shown to outperform previously discussed methods (Smith, J. et al., 2014) and while it has its draw-backs, it is clear that the benefits outweigh the disadvantages (Jones, A. & Karver, B., 2009, Lubber, H. et al., 2013)."

a = re.sub(r"\s\([A-Z][a-z]+,\s[A-Z][a-z]?\.[^\)]*,\s\d{4}\)", "", a)

It replaces with "" (i.e. nothing) every string made of a whitespace character, an opening parenthesis, one uppercase letter followed by one or more lowercase letters (i.e. a surname), a comma, a space, one capital letter and a full stop (optionally with one lowercase letter in between, for names like Christine that would be abbreviated to Ch.), then anything except a closing parenthesis until it reaches a comma, a space, four digits and a closing parenthesis. To summarize, it assumes that everything that looks like (Azdfs, E. stuff, 2343) should be deleted. I think that is specific enough to avoid over-matching.
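The same pattern can be written with Python's `re.VERBOSE` flag so that each piece carries the description above as a comment. This is a sketch; the name `CITATION` and the second example sentence are illustrative. It also shows one limitation: because an initial is required right after the surname, a citation without initials such as (Smith et al., 2014) is left untouched.

```python
import re

# Solution 2's pattern in verbose form. With re.VERBOSE, whitespace and
# "#" comments in the pattern are ignored, so spaces are matched with \s.
CITATION = re.compile(r"""
    \s            # whitespace before the citation
    \(            # opening parenthesis
    [A-Z][a-z]+   # capitalised surname, e.g. Smith
    ,\s           # comma and space
    [A-Z][a-z]?   # initial, optionally followed by one lowercase letter (Ch.)
    \.            # full stop after the initial
    [^\)]*        # anything except a closing parenthesis: et al., co-authors
    ,\s           # comma and space before the year
    \d{4}         # four-digit year
    \)            # closing parenthesis
""", re.VERBOSE)

a = ("This method has been shown to outperform previously discussed methods "
     "(Smith, J. et al., 2014) and while it has its draw-backs, it is clear that "
     "the benefits outweigh the disadvantages (Jones, A. & Karver, B., 2009, "
     "Lubber, H. et al., 2013).")
print(CITATION.sub("", a))

# No initial after the surname, so this citation is NOT removed
print(CITATION.sub("", "as reported earlier (Smith et al., 2014)."))
```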

The output I get with my code is:

This method has been shown to outperform previously discussed methods and while it has its draw-backs, it is clear that the benefits outweigh the disadvantages.
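Since the question's workflow goes through .txt files, here is a hedged sketch of applying Solution 2's pattern to a whole file. The function names `clean_text` and `clean_file` are hypothetical. It uses `re.subn`, which also returns the number of substitutions, so you can sanity-check how many citations were removed before converting the file.

```python
import re
from pathlib import Path

# Solution 2's pattern, compiled once for reuse.
CITATION = re.compile(r"\s\([A-Z][a-z]+,\s[A-Z][a-z]?\.[^\)]*,\s\d{4}\)")

def clean_text(text: str) -> tuple[str, int]:
    """Return the text without citations and the number of citations removed."""
    return CITATION.subn("", text)

def clean_file(src: str, dst: str) -> int:
    """Write a citation-free copy of src to dst and return the removal count."""
    text = Path(src).read_text(encoding="utf-8")
    cleaned, count = clean_text(text)
    Path(dst).write_text(cleaned, encoding="utf-8")
    return count
```

For example, `clean_file("article.txt", "article_clean.txt")` (hypothetical paths) writes the cleaned copy and returns the count. A count far above the number of citations you expect is a quick sign of over-matching.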
