Python Removing References From A Scientific Paper
Solution 1:
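Something like
import re
text = ...
text = re.sub(r'\((?:[\w \.&]+\, )+[0-9]{4}\)', '', text)
seems to do it. Note that re.sub takes the replacement string (here an empty string) as its second argument and returns the new string. You can use Debuggex to practice regex.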
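As a quick check of the pattern above, here is a minimal sketch that applies it to a made-up sentence. The sample sentence and the names `CITATION` and `strip_citations` are assumptions for illustration, and the leading `\s?` is an addition that is not part of the original pattern; it only keeps a double space from being left behind.

```python
import re

# Pattern from this answer, with an optional leading whitespace character
# added (an assumption) so the space before "(" is removed with the citation.
CITATION = re.compile(r'\s?\((?:[\w \.&]+\, )+[0-9]{4}\)')

def strip_citations(text):
    """Return text with parenthesised author-year citations removed."""
    # Pattern.sub takes the replacement string first, then the text to search.
    return CITATION.sub('', text)

# Hypothetical sample sentence, not taken from a real paper.
sample = "Earlier methods were slower (Smith, J. et al., 2014) but simpler."
print(strip_citations(sample))
```

Because each group inside the parentheses must end with a comma and a space, a parenthetical without a comma, such as "(see Figure 2)", should be left alone.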
Solution 2:
This should do the trick:
import re
a = "This method has been shown to outperform previously discussed methods (Smith, J. et al., 2014) and while it has its draw-backs, it is clear that the benefits outweigh the disadvantages (Jones, A. & Karver, B., 2009, Lubber, H. et al., 2013)."
a = re.sub(r"\s\([A-Z][a-z]+,\s[A-Z][a-z]?\.[^\)]*,\s\d{4}\)", "", a)
It replaces with "" (i.e. nothing) every string made of a space, an opening parenthesis, one uppercase letter followed by one or more lowercase letters (i.e. a name), a comma, a space, one capital letter and a period (optionally separated by a lowercase letter, for names like Christine that would be abbreviated to Ch.), then anything but a closing parenthesis until we reach a comma, a space, four digits and a closing parenthesis. To summarize, it assumes that everything that looks like (Azdfs, E. stuff, 2343) should be deleted. I think that is specific enough to avoid over-detection.
The output I get with my code is:
This method has been shown to outperform previously discussed methods and while it has its draw-backs, it is clear that the benefits outweigh the disadvantages.
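For readability, the same pattern can be written with `re.VERBOSE` so that each piece described above carries its own comment. This is only a sketch of that idea: the names `CITATION` and `cleaned` are assumptions, and the pattern itself is unchanged from the one-liner in this answer.

```python
import re

# Same regex as the one-liner above, split up with re.VERBOSE so each part
# can be commented. Literal whitespace inside the pattern is ignored.
CITATION = re.compile(r"""
    \s              # the space before the opening parenthesis
    \(              # opening parenthesis
    [A-Z][a-z]+     # surname: one uppercase letter, then lowercase letters
    ,\s             # comma and space
    [A-Z][a-z]?\.   # initial followed by a period, such as J. or Ch.
    [^\)]*          # anything except a closing parenthesis
    ,\s\d{4}        # comma, space, four-digit year
    \)              # closing parenthesis
""", re.VERBOSE)

a = ("This method has been shown to outperform previously discussed methods "
     "(Smith, J. et al., 2014) and while it has its draw-backs, it is clear "
     "that the benefits outweigh the disadvantages "
     "(Jones, A. & Karver, B., 2009, Lubber, H. et al., 2013).")

cleaned = CITATION.sub("", a)
print(cleaned)
```

Since the surname part only accepts the ASCII letters A-Z and a-z, citations whose first author has an accented, hyphenated, or apostrophised surname would probably be missed and may need a wider character class.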