python-docx add_style with CTL (Complex Text Layout) language
Solution 1:
After many hours of poking around the docx file I realized, much to my horror, that the answer lay in the document's styles.xml file. Here’s a way to fix it for people with similar problems:
Problems with Text Direction:
- If you’ve ever typed in Arabic or Persian, you might have seen that aligning the text to the right doesn’t fix all your problems. If you don’t also change the text direction, the cursor and punctuation marks stay at the far right of the screen (instead of following the last letter), and there is no right-justify if you need it. I couldn’t change the text direction in python-docx, even by changing the “textDirection” value in document.xml from ‘lrTb’ (Left-Right/Top-Bottom) to ‘rlTb’, so I made a document with LibreOffice and changed its default paragraph style (‘Normal’) to what I had in mind (RTL text direction, etc.). This also saves a lot of time later, because you don’t need to do it in Python.
XML explanation of the font-changing problem:
The document with the altered default style shows a few differences in its styles.xml file. In the Normal paragraph style, under "w:rPr", there is an additional "w:szCs" element that determines the size of the complex script font (which you can’t change through style.font.size), and in "w:rFonts" the value of "cs" is now my specified Persian font. Also, the “bidi” attribute of "w:lang" is now “fa-IR” (for Persian). Here’s the XML part I’m talking about:
<w:rPr><w:rFonts w:ascii="FreeMono" w:hAnsi="FreeMono" w:cs="FreeFarsi"/><w:sz w:val="40"/><w:rtl/><w:cs/><w:szCs w:val="40"/><w:lang w:val="en-US" w:bidi="fa-IR"/></w:rPr>
Now, changing style.font.size only changes the "sz" value (western font size) and does nothing to the "szCs" value (cs font size). Similarly, style.font.name only changes the "ascii" and "hAnsi" values of "w:rFonts" and does nothing to the "cs" value. So to change these values I had to edit the style's XML elements in Python.
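To see the western and complex-script settings side by side, the fragment above can be parsed with Python's standard library. This is only a sketch for inspection: `describe_rpr` and `w` are hypothetical helper names, and the namespace declaration is added by hand because the fragment on its own does not declare the `w:` prefix.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace that the "w:" prefix maps to.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# The <w:rPr> fragment from the altered Normal style, with the
# namespace declared so it parses as a standalone document.
RPR_XML = (
    f'<w:rPr xmlns:w="{W_NS}">'
    '<w:rFonts w:ascii="FreeMono" w:hAnsi="FreeMono" w:cs="FreeFarsi"/>'
    '<w:sz w:val="40"/><w:rtl/><w:cs/><w:szCs w:val="40"/>'
    '<w:lang w:val="en-US" w:bidi="fa-IR"/>'
    '</w:rPr>'
)


def w(name):
    """Return the fully qualified form of a w:-prefixed name."""
    return "{%s}%s" % (W_NS, name)


def describe_rpr(xml_text):
    """Hypothetical helper: summarize western vs. complex-script settings."""
    rpr = ET.fromstring(xml_text)
    r_fonts = rpr.find(w("rFonts"))
    sz = rpr.find(w("sz"))
    sz_cs = rpr.find(w("szCs"))
    lang = rpr.find(w("lang"))
    return {
        "western_font": r_fonts.get(w("ascii")),
        "cs_font": r_fonts.get(w("cs")),
        # sz and szCs are stored in half-points, so divide by 2.
        "western_size_pt": int(sz.get(w("val"))) / 2,
        "cs_size_pt": int(sz_cs.get(w("val"))) / 2,
        "bidi_lang": lang.get(w("bidi")),
    }


info = describe_rpr(RPR_XML)
print(info)
```

The two sizes and the two font names live in separate attributes, which is why setting one through style.font leaves the other untouched.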
Answer:
from docx import Document
from docx.shared import Pt
#path to doc with altered style:
base_doc_location = 'base.docx'
doc = Document(base_doc_location)
my_style = doc.styles['Normal']
# define your desired fonts
user_cs_font_size = 16
user_cs_font_name = 'FreeFarsi'
user_en_font_size = 12
user_en_font_name = 'FreeMono'

# get <w:rPr> element of this style
rpr = my_style.element.rPr

#==================================================
# This probably isn't necessary if you already have a
# document with an altered style, but just to be safe
# I'm going to add it here
if rpr.rFonts is None:
    rpr._add_rFonts()
if rpr.sz is None:
    rpr._add_sz()

#==================================================
# Get the namespace string for rpr. This is what the "w:"
# prefix at the start of element and attribute names in
# the XML stands for, as in:
#     <w:rPr>
#     <w:rFonts>
#     w:val
# The namespace is a URL-like string:
#     http://schemas.openxmlformats.org/...
# so w:rPr translates to:
#     {namespace URL string}rPr
# So I made the w_nsmap string like this:
w_nsmap = '{' + rpr.nsmap['w'] + '}'

#==================================================
# Because I didn't find a better way to get an element
# by its tag, here's a not-so-great way of getting it:
szCs = None
lang = None
for element in rpr:
    if element.tag == w_nsmap + 'szCs':
        szCs = element
    elif element.tag == w_nsmap + 'lang':
        lang = element

# If the style already has szCs and lang elements, the
# variables above now point to them; if not, we make
# those elements and add them to rpr
if szCs is None:
    szCs = rpr.makeelement(w_nsmap + 'szCs', nsmap=rpr.nsmap)
if lang is None:
    lang = rpr.makeelement(w_nsmap + 'lang', nsmap=rpr.nsmap)
# append() moves an element that is already a child of rpr,
# so this never creates duplicates
rpr.append(szCs)
rpr.append(lang)

#==================================================
# Now, to set our desired values on these elements, we
# get each element's attrib dictionary and store our
# value under the namespaced attribute name
szCs_attrib = szCs.attrib
lang_attrib = lang.attrib
rFonts_atr = rpr.rFonts.attrib

# sz and szCs values are strings holding twice the font
# size, so if you want the font size to be 11 you have to
# set sz (for western fonts) or szCs (for CTL fonts) to "22"
szCs_attrib[w_nsmap + 'val'] = str(int(user_cs_font_size * 2))

# Now change the cs font and bidi lang values
rFonts_atr[w_nsmap + 'cs'] = user_cs_font_name
lang_attrib[w_nsmap + 'bidi'] = 'fa-IR'  # For Persian

#==================================================
# Because we changed the default style, we don't need to
# set a style every time we add a new paragraph. And if
# you change the font name or size the normal way, it
# won't change these cs values, so you can have one font
# for the CTL language and a different font for the
# western language
persian_p = doc.add_paragraph('نوشته')
en_font = my_style.font
en_font.name = user_en_font_name
en_font.size = Pt(user_en_font_size)
english_p = doc.add_paragraph('some text')
doc.save('ex.docx')
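Once the file has been saved, it can be worth checking that the complex-script values really ended up in the style. The following is a minimal sketch using only the standard library. `read_cs_settings` is a hypothetical helper, and it assumes the style can be found by its `w:name` value in `word/styles.xml`.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace that the "w:" prefix maps to.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w(name):
    """Return the fully qualified form of a w:-prefixed name."""
    return "{%s}%s" % (W_NS, name)


def read_cs_settings(docx_file, style_name="Normal"):
    """Hypothetical helper: return (cs font, cs size in pt, bidi lang).

    docx_file may be a path or a file-like object. Any value that is
    missing from the style comes back as None.
    """
    # A .docx file is a zip archive; the styles live in word/styles.xml.
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/styles.xml"))

    for style in root.iter(w("style")):
        name = style.find(w("name"))
        if name is None or name.get(w("val")) != style_name:
            continue
        rpr = style.find(w("rPr"))
        if rpr is None:
            return (None, None, None)
        r_fonts = rpr.find(w("rFonts"))
        sz_cs = rpr.find(w("szCs"))
        lang = rpr.find(w("lang"))
        cs_font = r_fonts.get(w("cs")) if r_fonts is not None else None
        # szCs is stored in half-points, so divide by 2 to get points.
        cs_size = int(sz_cs.get(w("val"))) / 2 if sz_cs is not None else None
        bidi = lang.get(w("bidi")) if lang is not None else None
        return (cs_font, cs_size, bidi)
    raise KeyError(style_name)
```

Calling `read_cs_settings('ex.docx')` after running the script should report the complex-script font, size and bidi language that were written into the style. A `None` in any position means the corresponding element or attribute is missing.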
Edit (code improvement): I commented out the lines that could use some improvement and put the better lines underneath them.
#rpr = my_style.element.rPr  # If None it'll throw errors later
rpr = my_style.element.get_or_add_rPr()  # this avoids potential errors

#if rpr.rFonts is None:
#    rpr._add_rFonts()
rFonts = rpr.get_or_add_rFonts()

#if rpr.sz is None:
#    rpr._add_sz()
rpr.get_or_add_sz()

# by importing these you can make elements and set values quicker
from docx.oxml.shared import OxmlElement, qn

#szCs = rpr.makeelement(w_nsmap+'szCs',nsmap=rpr.nsmap)
szCs = OxmlElement('w:szCs')
#lang = rpr.makeelement(w_nsmap+'lang',nsmap =rpr.nsmap)
lang = OxmlElement('w:lang')

#szCs_attrib = szCs.attrib
#lang_attrib = lang.attrib
#rFonts_atr = rpr.rFonts.attrib
#szCs_attrib[w_nsmap+'val'] =str(int(user_cs_font_size*2))
#rFonts_atr[w_nsmap+'cs'] = user_cs_font_name
#lang_attrib[w_nsmap+'bidi'] = 'fa-IR'
szCs.set(qn('w:val'), str(int(user_cs_font_size * 2)))
lang.set(qn('w:bidi'), 'fa-IR')
rFonts.set(qn('w:cs'), user_cs_font_name)
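For readers wondering what `qn('w:val')` produces: it expands the prefixed name into the same `{namespace}name` string that `w_nsmap + 'val'` built by hand in the first version. Below is a minimal stand-in, assuming only the `w` prefix matters here. `expand` and `half_points` are hypothetical illustrations, not python-docx's own implementation.

```python
# Prefix-to-namespace table; only the "w" prefix is needed for this example.
NSMAP = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
}


def expand(prefixed_name):
    """Turn 'w:szCs' into '{namespace-uri}szCs' (Clark notation)."""
    prefix, local_name = prefixed_name.split(":", 1)
    return "{%s}%s" % (NSMAP[prefix], local_name)


def half_points(size_pt):
    """Convert a point size to the string stored in w:sz / w:szCs."""
    return str(int(size_pt * 2))


# The manual form from the first version of the script...
w_nsmap = "{" + NSMAP["w"] + "}"
# ...gives the same key as the expanded prefixed name.
print(expand("w:val") == w_nsmap + "val")
print(half_points(16))
```

Both approaches therefore write to the same attribute; the prefixed form is just shorter and harder to mistype.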
Solution 2:
I had a similar problem and added support for it to the docx library. The forked docx code is at https://github.com/Oritk/python-docx. Usage:
run = p.add_run(line)
#run.font.size = Pt(8)  ### This line is redundant - but you can leave it
run.font.cs_size = Pt(8)
run.font.rtl = True