PDF::Builder::Content::Text(3pm) | User Contributed Perl Documentation | PDF::Builder::Content::Text(3pm) |
PDF::Builder::Content::Text - additional specialized text-related formatting methods. Inherits from PDF::Builder::Content
Note: If you have used some of these methods in PDF::Builder with a graphics type object (e.g., $page->gfx()->method()), you may have to change to a text type object (e.g., $page->text()->method()).
Adds text to the page (left justified), at the current position. Note that there is no maximum width, and nothing to keep you from overflowing the physical page on the right! The width used (in points) is returned.
Adds text to the page (centered). The width used (in points) is returned.
Adds text to the page (right justified). Note that there is no maximum width, and nothing to keep you from overflowing the physical page on the left! The width used (in points) is returned.
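A minimal sketch of the three alignment calls follows, using a text-type object as the note above recommends. The coordinates, the sample string, and the 612-point page width are assumptions made for this illustration only.

```perl
use strict;
use warnings;
use PDF::Builder;

my $pdf  = PDF::Builder->new();
my $page = $pdf->page();
my $font = $pdf->corefont('Helvetica');
my $text = $page->text();          # text context, not $page->gfx()

$text->font($font, 12);
my $string = 'Hello, world';

# Left justified: the string starts at x = 72.
$text->translate(72, 700);
my $w_left = $text->text($string);

# Centered: the string is centered on x = 306.
$text->translate(306, 680);
my $w_center = $text->text_center($string);

# Right justified: the string ends at x = 540.
$text->translate(540, 660);
my $w_right = $text->text_right($string);

# None of these calls enforce a maximum width, so check for overflow yourself.
my $page_width = 612;              # assumed page width in points
warn "text runs off the right edge\n" if 72 + $w_left > $page_width;
warn "text runs off the left edge\n"  if 540 - $w_right < 0;

printf "widths used: %.2f %.2f %.2f\n", $w_left, $w_center, $w_right;
```

The returned width is the only information these calls give you about how much room was used, so any margin checking has to be done by the caller.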
The unchanged $width is returned, unless there was some reason to change it (e.g., overflow).
Options:
Word (interword) spacing values (explicit or default) are doubled if nocs is 1. This is to make up for the lack of added/subtracted intercharacter spacing.
If expansion (or reduction) wordspace and charspace changes didn't do enough to make the line fit the desired width, use "hscale()" to finish expanding or condensing the line to fit.
The string is split at regular blanks (spaces), \x20, to find the longest substring that will fit the $width. If a single word is longer than $width, it will overflow. To stay strictly within the desired bounds, set the option "spillover"=>0 to disallow spillover.
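The splitting rule just described can be sketched in plain Perl. This is an illustration of the idea, not PDF::Builder's code: "split_to_fit" and the fixed 6-points-per-character measure are hypothetical stand-ins for the real method and for real font metrics, and only the default spillover behavior is shown.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PDF::Builder): returns the longest leading
# run of words that fits $width, plus the remaining text.
# $measure is a code ref that returns the width of a string in points.
sub split_to_fit {
    my ($string, $width, $measure) = @_;

    my @words = split /\x20+/, $string;
    shift @words while @words && $words[0] eq '';   # drop empty leading field
    return ('', '') unless @words;

    # The first word is always taken, even if it alone is wider than $width
    # (this is the overflow case mentioned above).
    my $line = shift @words;
    while (@words) {
        my $candidate = $line . ' ' . $words[0];
        last if $measure->($candidate) > $width;
        $line = $candidate;
        shift @words;
    }
    return ($line, join(' ', @words));
}

# Crude stand-in for font metrics: every character is 6 points wide.
my $measure = sub { 6 * length $_[0] };

my ($line, $rest) = split_to_fit('the quick brown fox jumps', 60, $measure);
print "line: '$line'\nrest: '$rest'\n";

my ($long, $after) = split_to_fit('supercalifragilistic word', 60, $measure);
print "line: '$long'\nrest: '$after'\n";
```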
Hyphenation
If hyphenation is enabled, those methods which split up a string into multiple lines (the "text fill", paragraph, and section methods) will attempt to split up the word that overflows the line, in order to pack the text even more tightly ("greedy" line splitting). There are a number of controls over where a word may be split, but note that there is nothing language-specific (i.e., following a given language's rules for where a word may be split). This is left to other packages.
There are hard coded minimums of 2 letters before the split, and 2 letters after the split. See "Hyphenate_basic.pm". Note that neither hyphenation nor simple line splitting makes any attempt to prevent widows and orphans, prevent splitting of the last word in a column or page, or otherwise engage in paragraph shaping.
Methods
Note that the entire line is fit to the available width via a call to "text_justified". See "text_justified" for options to control stretch and condense. The last line is unjustified (normal size) and left aligned by default, although the option
Options:
Apply the text within the rectangle and return any leftover text (if could not fit all of it within the rectangle). If called in an array context, the unused height is also returned (may be 0 or negative if it just filled the rectangle).
If $continue is 1, the first line does not get special treatment for indenting or outdenting, because we're printing the continuation of the paragraph that was interrupted earlier. If it's 0, the first line may be indented or outdented.
Options:
$over is 1 or 0, with the default 1 (spills over the width).
Example:
    $txt->font($font,$fontsize);
    $txt->leading($leading);
    $txt->translate($x,$y);
    $overflow = $txt->paragraph( 'long paragraph here ...',
                                 $width,
                                 $y+$leading-$bottom_margin );
Note: if you need to change any text treatment within a paragraph (bold or italicized text, for instance), this can not handle it. Only plain text (all the same font, size, etc.) can be typeset with "paragraph()". Also, there is currently very limited line splitting (hyphenation) to better fit to a given width, and nothing is done for "widows and orphans".
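The example above captures only the leftover text. The sketch below calls "paragraph()" in list context to get the unused height as well. The rectangle size, the sample text, and passing 0 for $continue (a fresh paragraph) are assumptions made for illustration.

```perl
use strict;
use warnings;
use PDF::Builder;

my $pdf  = PDF::Builder->new();
my $page = $pdf->page();
my $txt  = $page->text();
my $font = $pdf->corefont('Times-Roman');

my ($fontsize, $leading) = (12, 14);
# Assumed rectangle: 200 points wide and three lines high.
my ($x, $y, $width, $height) = (72, 700, 200, 3 * $leading);

$txt->font($font, $fontsize);
$txt->leading($leading);
$txt->translate($x, $y);

my $string = join ' ', ('All work and no play makes Jack a dull boy.') x 20;

# List context: leftover text plus the unused height of the rectangle.
my ($overflow, $unused_height) = $txt->paragraph($string, $width, $height, 0);

if (length $overflow) {
    # The leftover text could be passed to another paragraph() call placed
    # elsewhere, with $continue set to 1.
    printf "%d characters did not fit\n", length $overflow;
}
```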
$continue is 0 for the first call of section(), and then use the value returned from the previous call (1 if a paragraph was cut in the middle) to prevent unwanted indenting or outdenting of the first line being printed.
For compatibility with recent changes to PDF::API2, paragraphs is accepted as an alias for "section".
Options:
See "paragraph" for other %opts you can use, such as "align" and "pndnt".
Options:
Other options available to "text", such as underlining, can be used here.
The width used (in points) is returned.
Please note that "textlabel()" was not designed to interoperate with other text operations. It is a standalone operation, and does not leave a "next write" position (or any other setting) for another "text" mode operation. A following write will likely be at "(0,0)", and not at the expected location.
"textlabel()" is intended as an "all in one" convenience function for single lines of text, such as a label on some graphics, and not as part of putting down multiple pieces of text. It is possible to figure out the position of a following write (either "textlabel" or "text") by adding the returned width to the original position's x value (assuming left-justified positioning).
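As a sketch of the positioning arithmetic described above, the second label below is placed at the original x plus the width returned by the first. The argument order shown for "textlabel()" (x, y, font, size, text) is an assumption of this example, as are the coordinates and strings.

```perl
use strict;
use warnings;
use PDF::Builder;

my $pdf  = PDF::Builder->new();
my $page = $pdf->page();
my $text = $page->text();
my $font = $pdf->corefont('Helvetica');

my ($x, $y, $size) = (72, 500, 10);

# First label; the returned width is in points.
my $w1 = $text->textlabel($x, $y, $font, $size, 'Figure 1: ');

# Second label starts where the first one ended (left-justified positioning).
my $w2 = $text->textlabel($x + $w1, $y, $font, $size, 'pump assembly');

printf "labels end at x = %.2f\n", $x + $w1 + $w2;
```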
Tag names, CSS entries, markup type, etc. are case-sensitive (usually lower-case letters only). For example, you cannot give a <P> paragraph in HTML or a P selector in CSS styling.
$page is the page context. Currently, its only use is for page annotations for links ('md1' []() and 'html' <a>), so if you're not using those, you may pass anything such as "undef" for $page if you wish.
$text is the text context, so that various font and text-output operations may be performed. It is often, but not necessarily always, the same as the object containing the "column" method.
$grfx is the graphics (gfx) context. It may be a dummy (e.g., undef) if no graphics are to be drawn, but graphical items such as the column outline ('outline' option) and horizontal rule (<hr> in HTML markup) use it. Currently, text-decoration underline (default for links, 'md1' "[]()" and 'html' "<a>") or line-through or overline use the text context, but may in the future require a valid graphics context. Images (when implemented) will require a graphics context.
$markup is information on what sort of markup is being used to format and lay out the column's text:
The input txt is a list (anonymous array reference) of strings, each containing one or more paragraphs. A single string may also be given. An empty line between paragraphs may be used to separate the paragraphs. Paragraphs may not span array elements.
    * or _ italics, ** bold, *** bold+italic
    bulleted list *, numbered list 1. 2. etc.
    #, ## etc. headings and subheadings
    ---, ===, ___ horizontal rule
    [label](URL) external links (to HTML page or within this document, see 'a')
    ` (backticks) enclose a "code" section
HTML (see below) may be mixed in as desired (although not within "code" blocks marked by backticks, where <, >, and & get turned into HTML entities, disabling the intended tags). Markdown will be converted into HTML, which will then be interpreted into PDF. Note that, for certain features, Text::Markdown may produce HTML that is not yet supported by HTML processing (see 'html' section below). Let us know if you need such a feature!
The input txt is a list (anonymous array reference) of strings, each containing one or more paragraphs and other markup. A single string may also be given. Per Markdown formatting, an empty line between paragraphs may be used to separate the paragraphs. Separate array elements will first be glued together into a single string before processing, permitting paragraphs to span array elements if desired.
There are other flavors of Markdown, so other mdn flavors may be defined in the future, such as POD from Perl code.
    'i'/'em' (italic), 'b'/'strong' (bold)
    'p' (paragraph)
    'font' (font face->font-family, color, size->font-size)
    'span' (needs style= attribute with CSS to do anything useful)
    'ul', 'ol', 'li' (bulleted, numbered lists)
    'img' (TBD, image, empty. hspace->margin-left/right, vspace->margin-top/bottom, width, height)
    'a' (anchor/link, web page URL or this document target #p[-x-y[-z]])
    'pre', 'code' (TBD, preformatted and code blocks)
    'h1' through 'h6' (headings)
    'hr' (horizontal rule)
    'br' (TBD, line break, empty)
    'sup', 'sub' (TBD superscript and subscript)
    's', 'strike', 'del' (line-through)
    'u', 'ins' (underline)
    'ovl' (TBD -- non-HTML, overline)
    'k' (TBD -- non-HTML, kerning left/right shift)
are supported (fully or in part unless "TBD"), along with limited CSS for color, font-size, font-family, etc. <style> tags may be placed in an optional <head> section, or within the <body>. In the latter case, style tags will be pulled out of the body and added (in order) on to the end of any style tag(s) defined in a head section. Multiple style tags will be condensed into a single collection (later definitions of equal precedence overriding earlier). These stylings will have global effect, as though they were defined in the head. As with normal CSS, the hierarchy of a given property (in decreasing precedence) is
    appearance in a style= tag attribute
    appearance in a tag attribute (possibly a different name than the property)
    appearance in a #IDname selector in a <style>
    appearance in a .classname selector in a <style>
    appearance in a tag name selector in a <style>
Selectors are quite simple: a single tag name (e.g., body), a single class (.cname), or a single ID (#iname). There are no combinations (e.g., "p.abstract" or "ol, ul"), hierarchies (e.g., "ol > li"), specified number of appearances, or other such complications as found in a browser's CSS. Sorry!
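The precedence order listed above can be illustrated with a small lookup. This is a sketch only: "resolve_property" and the level names are hypothetical and are not taken from PDF::Builder's internals.

```perl
use strict;
use warnings;

# Hypothetical resolver (not PDF::Builder's code). Each source is a hash ref
# of property => value; a source that does not apply is simply absent.
sub resolve_property {
    my ($prop, %src) = @_;
    # Levels in decreasing precedence, following the list above.
    for my $level (qw(style_attr tag_attr id_selector class_selector tag_selector)) {
        my $props = $src{$level} or next;
        return $props->{$prop} if exists $props->{$prop};
    }
    return;    # property is not set at any level
}

# Models <p id="intro" class="note" style="color: red"> with a <style> of
#   p { color: black; font-size: 12 }
#   .note { color: blue }
#   #intro { font-size: 14 }
my %sources = (
    style_attr     => { 'color' => 'red' },
    id_selector    => { 'font-size' => 14 },
    class_selector => { 'color' => 'blue' },
    tag_selector   => { 'color' => 'black', 'font-size' => 12 },
);

my $color = resolve_property('color', %sources);
my $size  = resolve_property('font-size', %sources);
print "color=$color font-size=$size\n";
```

Here the style= attribute wins for color, and the #ID selector wins for font-size because no higher level sets it.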
Supported CSS properties:
    color (foreground color)
    display (inline/block)
    font-family (name as defined to FontManager, e.g. Times)
    font-size (pt, bare number = pt, % of current size)
    font-style (normal/italic)
    font-weight (normal/bold)
    height (pt, bare number) thickness of horizontal rule
    list-style-position (outside) TBD inside
    list-style-type (marker description, see also _marker-before/after)
    margin-top/right/bottom/left (pt, bare number = pt, % of font-size)
    text-decoration (none, underline, line-through, overline)
    text-height (leading, as ratio of baseline-spacing to font-size)
    text-indent (pt, bare number = pt, % of current font-size)
    width (pt, bare number) width of horizontal rule
Non-standard CSS "properties". You may want to set these in CSS:
    _marker-before (text to insert before <ol> marker, default nothing)
    _marker-after (text to insert after <ol> marker, default period)
Non-standard CSS "properties". You normally would not set these in CSS:
    _fs (current running font size, in points, on the properties stack)
    _href (URL for <a>, normally provided by href= attribute)
    _left (running number of points to indent on the left, from margin-left and list nesting)
    _right (running number of points to indent on the right, from margin-right)
Sizes may be '%' (of font-size), or 'pt' (the default unit). More support may be added over time. CAUTION: comments /* and */ are NOT currently supported in CSS -- perhaps in the future.
Numeric entities (decimal &#nnn; and hexadecimal &#xnnn;) are supported, as well as named entities (&mdash; for example).
The input txt is a list (anonymous array reference) of strings, each containing one or more paragraphs and other markup. A single string may also be given. Per normal HTML practice, paragraph tags should be used to mark paragraphs. Note that HTML::TreeBuilder is configured to automatically mark top body-level text with paragraph tags, in case you forget to do so, although it is probably better to do it yourself, to maintain more control over the processing. Separate array elements will first be glued together into a single string before processing, permitting paragraphs to span array elements if desired.
There are other markup languages out there, such as HTML-like Pango, and man page (troff), that might be supported in the future. It is very unlikely that TeX or LaTeX will ever be supported, as they both already have excellent PDF output.
$txt is the input text: a string, an array reference to multiple strings, or an array reference to hashes. See $markup for details.
%opts Options -- a number of these are, of course, mandatory.
The top text baseline is assumed to be relative to the UL corner (based on the determined line height), and the column outline clips that baseline, as it does additional baselines down the page (interline spacing is "leading" multiplied by the largest "font_size" or image height needed on that line).
Currently, 'rect' is required, as it is the only column shape supported.
This permits a generically-shaped outline to be defined, scaled (perhaps not preserving the aspect ratio) and placed anywhere on the page. This could save you from having to define similarly-shaped columns from scratch multiple times. If you want to define a relative outline, the lower left corner (whether or not it contains a point, and whether or not it's the first one listed) would usually be "0, 0", to have scaling work as expected. In other words, your outline template should be in the lower left corner of the page.
Note that the "x" position will be determined by the column shape and size (the left-most point of the baseline), so there is no place to explicitly set an "x" position to start at.
The starting font size may be set in a number of ways. It may be inherited from a previous "$text->font(..., font-size)" statement; it may be set via the "font_size" option (overriding any font method inheritance); it may default to 12pt (if neither explicit way is given). For HTML markup, it may of course be modified by the "font" tag or by CSS styling "font-size". For Markdown, it may be modified by CSS styling.
The default is 2 times the font_size passed to "column()", and is not adjusted for any changes of font_size in the markup. An explicit value passed in is also not changed -- the gutter width for the marker will be the same in all lists (keeping them aligned). If you plan to have exceptionally long markers, such as an ordered list of years in Roman numerals (e.g., MCMXCIX), you may want to make this gutter a bit wider.
Example: to insert a red cross (X-out) and green tick (check) mark
    'substitute' => [
        [ '%cross%', '<font face="ZapfDingbats" color="red">', '8', '</font>' ],
        [ '%tick%', '<font face="ZapfDingbats" color="green">', '4', '</font>' ],
    ]
should change "%cross%" in Markdown text ('md1') or HTML text ('html') to "<font face="ZapfDingbats" color="red">8</font>" and similarly for "%tick%". This is done after the Markdown is converted to HTML (but before HTML is parsed), so make sure that your macro text (e.g., "%tick%") isn't something that Markdown will try to interpret by itself! Also, Perl's regular expression parser seems to get upset with some characters, such as "|", so don't use them as delimiters (e.g., "|cross|"). You don't have to wrap your macro name in delimiters, but it can make the text structure clearer, and may be necessary in order not to do substitutions in the wrong place.
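The intended effect of the 'substitute' entries above can be sketched in plain Perl. "expand_macros" is a hypothetical helper written for this illustration. It is not the code "column()" uses, so the advice above about avoiding troublesome delimiter characters still applies.

```perl
use strict;
use warnings;

my @substitute = (
    [ '%cross%', '<font face="ZapfDingbats" color="red">',   '8', '</font>' ],
    [ '%tick%',  '<font face="ZapfDingbats" color="green">', '4', '</font>' ],
);

# Hypothetical helper: replace each macro name with the concatenation of the
# three strings that follow it in its entry.
sub expand_macros {
    my ($html, @macros) = @_;
    for my $m (@macros) {
        my ($name, $before, $body, $after) = @$m;
        my $replacement = $before . $body . $after;
        # \Q...\E quotes regex metacharacters in the macro name; this is a
        # choice made for this sketch only.
        $html =~ s/\Q$name\E/$replacement/g;
    }
    return $html;
}

my $out = expand_macros('<p>Passed %tick% Failed %cross%</p>', @substitute);
print "$out\n";
```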
CAUTION: The Font Manager is not synchronized with whatever state the font is returned to. You should not request the 'current' font, but should instead explicitly set it to a specific face, etc., which resets 'current'.
It is equivalent to "restore = 1".
CAUTION: The Font Manager is not synchronized with whatever state the font is returned to. You should not request the 'current' font, but should instead explicitly set it to a specific face, etc., which resets 'current'.
The Font Manager system is used to supply the requested fonts, so it is up to the application to pre-load the desired font information before "column()" is called. Any request to change the encoding within "column()" will be ignored, as the fonts have already been specified for a specific encoding. Needless to say, the encoding used in creating the input text needs to match the specified font encoding.
Absent any markup changing the font face or styling, whatever is defined by Font Manager as the current font will be what is used. This way, you may inherit the font from the previous "column()", or call "$text->font($pdf->get_font(), size)" to set both the font and size, or just call "$pdf->get_font()" to set only the font, relying on the "font_size" option or CSS markup to set the size.
Line fitting (paragraph shaping) is currently quite primitive. Words will not be split (hyphenated). It is planned to eventually add Knuth-Plass paragraph shaping, along with proper language-dependent hyphenation.
Each change of font automatically supplies its maximum ascender and minimum descender, the extents above and below the text line's baseline. Each block of text with a given face and variant, or change of font size, will be given the same vertical extents -- the extents are font-wide, and not determined on a per-glyph basis. So, unfortunately, a block of text "acemnorsuvwz" will have the same vertical extents as a block of text "bdfghijklpqty". For a given line of text, the highest ascender and the lowest descender (plus leading) will be used to position the line at the appropriate distance below the previous line (or the top of the column). No attempt is made to "fit" projections into recesses (jigsaw-puzzle like). If there is an inset into the side of a column, or it is otherwise not a straight vertical line, so long as the baseline fits within the column outline, no check is made whether descenders or ascenders will fall outside the defined column (i.e., project into the inset). We suggest that you try to keep font sizes fairly consistent, to keep reasonably consistent text vertical extents.
Data returned by this call
If there is more text than can be accommodated by the column size, the unused portion is returned, with a return code of 1. It is an empty list if all the text could be formatted, and the return code is 0. "next_y" is the y coordinate where any additional text ("column()" call) could be added to a column (as "start_y") that wasn't completely filled. This would be at the starting point of a new column (i.e., the last paragraph is ended). Note that the application code should check if this position is too far down the page (in the bottom margin) and not blindly use it! Also, as 'md1' is first converted to HTML, any unused portion will be returned as 'pre' markup, rather than Markdown or HTML. Be sure to specify 'pre' for any continuation of the column (with one or more additional "column()" calls), rather than 'none', 'md1', or 'html'.
NOTE: if "restore" has a value of 1, the "column()" call makes no effort to "restore" conditions to any starting values. If your last bit of text left the "current" font with some "odd" face/family, size, italicized, bolded, or colored; that will be what is used by the next column call (or other PDF::Builder text calls). This is done in order to allow you to easily chain from one column to the next, without having to manually tell the system what font, color, etc. you want to return to. On the other hand, in some cases you may want to start from the same initial conditions as usual. You may want to add "get_font()", "font()", "fillcolor()", and "strokecolor()" calls as necessary before the next text output, to get the expected text characteristics. Or, you can simply let "restore" default to 0 to get the same effect.
If "restore" defaults to 0 (or is set to 1), the text settings in the "current" font are left as-is, so that whatever you were doing when you ran out of defined column (as regards to font face/family, size, italic and bold states, and color) should automatically be the same when you make the next "column()" call to make more output.
Additional return codes may be added in the future, to indicate failures of one sort or another.
If the return code $rc was 1 (column was used up), the $next_y returned will be -1, as it would be meaningless to use it.
If $rc is 0 (all input was used up), $unused is an empty anonymous array. It contains nothing to be used.
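The return values described above suggest a continuation loop along the following lines. This is a sketch under several assumptions: the 'rect' layout (x and y of the upper left corner, then width and height), the sample text, the one-column-per-page layout, and the page limit are all choices made for this example, and the default Font Manager font is relied upon.

```perl
use strict;
use warnings;
use PDF::Builder;

my $pdf = PDF::Builder->new();

# Initial content in 'none' markup: one long paragraph.
my $content = [ join ' ', ('Lorem ipsum dolor sit amet, consectetur adipiscing elit.') x 40 ];
my $markup        = 'none';
my $bottom_margin = 72;     # assumed margin in points
my $max_pages     = 10;     # safety limit for this sketch
my $pages_used    = 0;
my ($rc, $next_y, $unused);

do {
    # One column per page; each page needs its own text and graphics contexts.
    my $page = $pdf->page();
    my $text = $page->text();
    my $grfx = $page->gfx();

    ($rc, $next_y, $unused) = $text->column(
        $page, $text, $grfx, $markup, $content,
        'rect'      => [ 72, 720, 200, 150 ],   # assumed: UL x, UL y, width, height
        'font_size' => 12,
    );
    $pages_used++;

    if ($rc == 1) {
        # Column was used up: continue with the leftover content, which is
        # returned as 'pre' markup. $next_y is not meaningful in this case.
        $content = $unused;
        $markup  = 'pre';
    }
} while ($rc == 1 && $pages_used < $max_pages);

# Only when all input was used up is $next_y usable, and it should still be
# checked against the bottom margin before starting more text there.
if ($rc == 0 && $next_y > $bottom_margin) {
    print "text ended at y=$next_y on page $pages_used\n";
}
```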
2023-01-24 | perl v5.36.0 |