
Automatic Conversion Of The Advanced String Formatter From The Old Style

Is there an automatic way to convert a piece of code from Python's old-style string formatting (using %) to the new style (using .format)? For example, consider the formatting of

Solution 1:

Use pyupgrade

pyupgrade --py3-plus <filename>

You can convert to f-strings (formatted string literals) instead of .format() with

pyupgrade --py36-plus <filename>

You can install it with

pip install pyupgrade
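
To make the rewrite concrete, here is a small hand-written illustration of the three equivalent spellings involved. It is not captured output from pyupgrade, and the variable names are made up for the example. It sticks to %s on purpose, since the tool may leave more complicated specifiers untouched.

```python
name, item = "Ann", "spam"

# Old style: printf-like % operator.
old_style = "%s ordered %s" % (name, item)

# New style: str.format with automatically numbered fields.
new_style = "{} ordered {}".format(name, item)

# Formatted string literal (Python 3.6+).
f_string = f"{name} ordered {item}"

print(old_style)
print(new_style)
print(f_string)
```

All three lines print the same text, which is the property any automatic conversion has to preserve.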

Solution 2:

The functionality of the two forms does not match up exactly, so there is no way you could automatically translate every % string into an equivalent {} string or (especially) vice-versa.

Of course there is a lot of overlap, and many of the sub-parts of the two formatting languages are the same or very similar, so someone could write a partial converter (which could, e.g., raise an exception for non-convertible code).

For a small subset of the language like what you seem to be using, you could do it pretty trivially with a simple regex: every pattern starts with % and ends with one of [sdf], so if you capture whatever sits in between as \1 and the final letter as \2, something like {:\1\2} as a replacement pattern ought to be most of what you need.
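
A minimal sketch of such a partial converter follows. The function name and the exact regex are assumptions made for illustration, and it only understands an optional width and precision followed by s, d or f. It also shows that the plain {:\1\2} replacement needs a little care: %s accepts any object and right-aligns when given a width, while {:s} accepts only strings and left-aligns, so the sketch emits !s and an explicit > for that case. Anything else it does not recognise raises an exception rather than being converted wrongly.

```python
import re

# '%%', a supported spec (width, precision, type), a literal brace, or a stray '%'.
_TOKEN = re.compile(r'%%|%(\d*)((?:\.\d+)?)([sdf])|[{}]|%')

def percent_to_braces(fmt):
    """Convert a small subset of %-format strings to str.format syntax."""
    def repl(match):
        text = match.group(0)
        if text == '%%':
            return '%'                 # escaped percent becomes a plain one
        if text in ('{', '}'):
            return text * 2            # literal braces must be doubled
        if text == '%':
            raise ValueError('unsupported specifier near position %d'
                             % match.start())
        width, precision, kind = match.groups()
        if kind == 's':
            if not width and not precision:
                return '{}'
            # %s calls str() on any object and right-aligns when padded.
            align = '>' if width else ''
            return '{!s:' + align + width + precision + '}'
        return '{:' + width + precision + kind + '}'
    return _TOKEN.sub(repl, fmt)

template = '%10s|%5.2f|%d%%|{x}'
converted = percent_to_braces(template)
print(converted)
print(converted.format('a', 3.14159, 7))
```

The sketch assumes the arguments already have the types the new specifiers expect; for instance, %d truncates a float while {:d} rejects it.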

But why bother? Except as an exercise in writing parsers, what would be the benefit? The % operator is not deprecated, and using % with an existing % format string will obviously do at least as well as using format with a % format string converted to {}.

If you are looking at this as an exercise in writing parsers, I believe there's an incomplete example buried inside pyparsing.


Some differences that are hard to translate, off the top of my head:

  • * for dynamic field width or precision; format has a similar feature, but does it differently.
  • %(10)s, because format tries to interpret the key name as a number first, then falls back to a dict key.
  • %(a[b])s, because format doesn't quote or otherwise separate the key from the rest of the field, so a variety of characters simply can't be used.
  • %c takes integers or single-char strings; :c only integers.
  • %r/%s/%a analogues are not part of the format string, but a separate part of the field (which also comes on the opposite side).
  • %g and :g have slightly different cutoff rules.
  • %a and !a don't do the exact same thing.
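
A few of the differences above can be checked directly in an interpreter. The snippet below is only a demonstration on Python 3; the helper raises() is a made-up convenience, not part of any library.

```python
def raises(exc_type, func):
    """Return True if calling func() raises exc_type."""
    try:
        func()
    except exc_type:
        return True
    return False

# Dynamic width: % reads the width from the argument tuple (width first),
# while format uses a nested replacement field (value first here).
star_old = "%*d" % (5, 42)
star_new = "{:{}d}".format(42, 5)

# %c accepts an integer or a one-character string; :c accepts only an integer.
char_old = "%c" % "a"
char_new_fails = raises(ValueError, lambda: "{:c}".format("a"))

# A key made of digits is a dict key for %, but a positional index for format.
key_old = "%(10)s" % {"10": "x"}
key_new_fails = raises(IndexError, lambda: "{10}".format(**{"10": "x"}))

# The repr conversion is a type letter after the width for %, and a
# separate "!r" placed before the colon for format.
repr_old = "%10r" % "a"
repr_new = "{!r:>10}".format("a")

print(repr(star_old), repr(star_new))
print(char_old, char_new_fails)
print(key_old, key_new_fails)
print(repr(repr_old), repr(repr_new))
```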

The actual differences aren't listed anywhere; you will have to dig them out by a thorough reading of the Format Specification Mini-Language vs. the printf-style String Formatting language.

Solution 3:

The docs explain some of the differences. As far as I can tell (although I'm not very familiar with old-style format strings), the functionality of the new style is a superset of the functionality of the old style.

You'd have to do more tweaking to handle edge cases, but I think something simple like

re.sub(r'%(\w+)([sbcdoXnf...])', r'{\1\2}', your_string)

would get you 90% of the way there. The remaining translation -- going from things like %x to {0:x} -- will be too complex for a regular expression to handle (without writing some ridiculously complex conditionals inside of your regex).
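
One way around the limits of a plain replacement string is to pass a function to re.sub. The sketch below is an assumption-laden illustration rather than a complete converter: it handles only named keys of the form %(key) followed by a single type letter from a short, arbitrarily chosen list, it does not escape literal braces, and keys made only of digits would be treated by format as positional indexes.

```python
import re

# Hypothetical pattern: '%(' key ')' followed by one conversion letter.
NAMED = re.compile(r'%\((\w+)\)([sdxXof])')

def convert_named(fmt):
    """Rewrite %(key)X fields as {key} or {key:X} fields."""
    def repl(match):
        key, kind = match.groups()
        if kind == 's':
            # %(key)s accepts any object, so emit a bare field.
            return '{' + key + '}'
        return '{' + key + ':' + kind + '}'
    return NAMED.sub(repl, fmt)

template = '%(user)s has id %(uid)x'
converted = convert_named(template)
print(converted)
print(converted.format(user='ann', uid=255))
```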
