Parsing Formulas
openpyxl supports limited parsing of formulas embedded in cells. The openpyxl.formula package contains a Tokenizer class to break formulas into their constituent tokens. Usage is as follows:
>>> from openpyxl.formula import Tokenizer
>>> tok = Tokenizer("""=IF($A$1,"then True",MAX(DEFAULT_VAL,'Sheet 2'!B1))""")
>>> print("\n".join("%12s%11s%9s" % (t.value, t.type, t.subtype) for t in tok.items))
IF( FUNC OPEN
$A$1 OPERAND RANGE
, SEP ARG
"then True" OPERAND TEXT
, SEP ARG
MAX( FUNC OPEN
DEFAULT_VAL OPERAND RANGE
, SEP ARG
'Sheet 2'!B1 OPERAND RANGE
) FUNC CLOSE
) FUNC CLOSE
As shown above, tokens have three attributes of interest:
.value: The substring of the formula that produced this token.type: The type of token this represents. Can be one ofToken.LITERAL: If the cell does not contain a formula, its value is represented by a singleLITERALtoken.Token.OPERAND: A generic term for any value in the Excel formula. (See.subtypebelow for more details).Token.FUNC: Function calls are broken up into tokens for the opener (e.g.,SUM(), followed by the arguments, followed by the closer (i.e.,)). The function name and opening parenthesis together form oneFUNCtoken, and the matching parenthesis forms anotherFUNCtoken.Token.ARRAY: Array literals (enclosed between curly braces) get twoARRAYtokens each, one for the opening{and one for the closing}.Token.PAREN: When used for grouping subexpressions (and not to denote function calls), parentheses are tokenized asPARENtokens (one per character).Token.SEP: These tokens are created from either commas (,) or semicolons (;). Commas createSEPtokens when they are used to separate function arguments (e.g.,SUM(a,b)) or when they are used to separate array elements (e.g.,{a,b}). (They have another use as an infix operator for joining ranges). Semicolons are always used to separate rows in an array literal, so always createSEPtokens.Token.OP_PRE: Designates a prefix unary operator. Its value is always+or-Token.OP_IN: Designates an infix binary operator. Possible values are>=,<=,<>,=,>,<,*,/,+,-,^, or&.Token.OP_POST: Designates a postfix unary operator. Its value is always%.Token.WSPACE: Created for any whitespace encountered. Its value is always a single space, regardless of how much whitespace is found.
.subtype: Some of the token types above use the subtype to provide additional information about the token. Possible subtypes are:Token.TEXT,Token.NUMBER,Token.LOGICAL,Token.ERROR,Token.RANGE: these subtypes describe the various forms ofOPERANDfound in formulae.LOGICALis eitherTRUEorFALSE,RANGEis either a named range or a direct reference to another range.TEXT,NUMBER, andERRORall refer to literal values in the formulaToken.OPENandToken.CLOSE: these two subtypes are used byPAREN,FUNC, andARRAY, to describe whether the token is opening a new subexpression or closing it.Token.ARGandToken.ROW: are used by theSEPtokens, to distinguish between the comma and semicolon. Commas produce tokens of subtypeARGwhereas semicolons produce tokens of subtypeROW
Translating formulae from one location to another
It is possible to translate (in the mathematical sense) formulae from one
location to another using the openpyxl.formulas.translate.Translator
class. For example, there a range of cells B2:E7 with a sum of each
row in column F:
>>> from openpyxl.formula.translate import Translator
>>> ws['F2'] = "=SUM(B2:E2)"
>>> # move the formula one colum to the right
>>> ws['G2'] = Translator("=SUM(B2:E2)", origin="F2").translate_formula("G2")
>>> ws['G2'].value
'=SUM(C2:F2)'
Note
This is limited to the same general restrictions of formulae: A1 cell-references only and no support for defined names.