Motivation

There are thousands of web pages containing a great deal of information. But what happens if a user wants to store the content of a web site in a local file? The user has to save the whole web site, including all graphics and links, so the amount of data can become very large. There is therefore a need for a program or routine that extracts only the necessary content into a separate file.

What is TYPO3?

TYPO3 is an open source content management framework that stores content in a database, independent of the layout. By defining templates, content can be rendered into, for example, various styles of HTML pages, or even into XML or WAP formats. A detailed explanation of TYPO3 is beyond the scope of this document. To involve the open source community in the development process, users can create so-called extensions, which act as plug-ins to the core program. Extensions can be registered and administered in the extension repository, which makes them available to all TYPO3 users worldwide.

What is content rendering?

A content management system consists of many parts, such as data storage, data handling, presentation, user management, and task management, to name only a few.
In mature content management systems, most parts can be replaced or extended by other parts.
Content rendering is concerned with the presentation of the stored data that has to be delivered to the client (the web browser).
In its simplest form it is just a template or form into which the fields from the database are filled.
TYPO3 content rendering offers more: data is rendered based on if and case structures, conditions (such as the client or browser type), the content of the data fields, and so on.

What is a template?

A template is where the rules and structures that determine how content is rendered are written down.
The pdf_export extension involves two types of templates:
One is the TypoScript template (TS template), which specifies all content rendering rules.
The other template is a file that uses subparts and markers, delimited by tags:
###EXAMPLE### This is my EXAMPLE subpart ###EXAMPLE###
The TS template substitutes values for the subparts.
All control structures are kept in the TS template and in the TYPO3 engine. A person who only has rights to the template file can therefore change only the layout of the fields in the PDF files (e.g. the font).
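
As a rough illustration of how a subpart is replaced by a value, here is a minimal sketch in Python. It is not the extension's code (TYPO3 itself is written in PHP); the function name substitute_subpart and the plain ###NAME### marker form are assumptions made for this example.

```python
def substitute_subpart(template, name, value):
    """Replace the first subpart delimited by a pair of ###NAME### markers.

    Hypothetical helper for illustration only: everything from the opening
    marker to the closing marker (both inclusive) is replaced by `value`.
    If the marker pair is not found, the template is returned unchanged.
    """
    marker = "###" + name + "###"
    start = template.find(marker)
    if start == -1:
        return template
    end = template.find(marker, start + len(marker))
    if end == -1:
        return template
    return template[:start] + value + template[end + len(marker):]


if __name__ == "__main__":
    tex_template = r"\title{###TITLE###dummy###TITLE###}"
    print(substitute_subpart(tex_template, "TITLE", "My page"))
```

The layout around the markers (here the \title{...} command) stays in the template file, while the value comes from the rendering rules, which is the separation described above.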

What is LaTeX?

LaTeX is a document preparation system for high-quality typesetting. It is most often used for medium-to-large technical or scientific documents, but it can be used for almost any form of publishing.
It is not a word processor! Instead, LaTeX as a markup language encourages authors not to worry too much about the appearance of their documents, but to concentrate on getting the content right. In a nutshell, LaTeX is similar to HTML in that different commands produce different appearances. If you are interested in using LaTeX, please visit www.latex-project.org. LaTeX is, of course, free software.
By the way, we want to say thank you to Donald E. Knuth.

What is PDF?

Portable Document Format (PDF) is the open de facto standard for electronic document distribution worldwide. PDF is a universal file format that preserves all the fonts, formatting, graphics, and color of any source document, regardless of the application and platform used to create it. PDF files are compact and can be shared, viewed, navigated, and printed exactly as intended by anyone. (Taken from adobe.com.)

How does it work?

The pdf_export extension consists of three parts:
- A plugin, which can be inserted into a page like a normal content element. It is optional; you can also open the generated PDF document directly by entering its URL in your browser.
- A page that renders the content as LaTeX code. Public access to this page can be disabled (perhaps someone does not want to release their TeX sources).
- A page that opens the TeX page described in the previous item and passes it to the pdflatex compiler. Afterwards, HTTP headers are prepended to the file so that the browser knows it is a PDF document.
This results in an open/save dialog with a filename suggested by the pdf_export extension.
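
The last step can be sketched roughly as follows. This is an illustrative Python sketch, not the extension's PHP code. The file name export.tex, both function names, and the choice of "Content-Disposition: attachment" are assumptions; it also assumes that pdflatex is available on the PATH.

```python
import subprocess
from pathlib import Path


def compile_tex(tex_source, workdir):
    """Write the LaTeX source into workdir, run pdflatex, return the PDF bytes.

    Assumption: pdflatex is installed. nonstopmode keeps it from waiting
    for keyboard input when the source contains an error.
    """
    workdir = Path(workdir)
    tex_file = workdir / "export.tex"  # hypothetical file name
    tex_file.write_text(tex_source, encoding="utf-8")
    subprocess.run(
        ["pdflatex", "-interaction=nonstopmode",
         "-output-directory", str(workdir), str(tex_file)],
        check=True,
        capture_output=True,
    )
    return (workdir / "export.pdf").read_bytes()


def pdf_response_headers(filename, length):
    """Build HTTP headers that tell the browser the body is a PDF document
    and suggest a filename for the open/save dialog."""
    return {
        "Content-Type": "application/pdf",
        "Content-Disposition": 'attachment; filename="%s"' % filename,
        "Content-Length": str(length),
    }


if __name__ == "__main__":
    for key, value in pdf_response_headers("page.pdf", 1234).items():
        print(key + ": " + value)
```

The working directory passed to compile_tex must be writable by the web server user, because pdflatex writes auxiliary and log files next to the PDF.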

Requirements

- Web server (SuSE Linux 7.2 or above)
- An installed TYPO3 system
- The packages te_pdf, te_etex, te_latex, te_pscm and tetex (for example via YaST)

How to install the extension?

- Install the required packages (see the Requirements section); the YaST way should be easy enough.
- Connect to the online repository using the extension manager of your TYPO3 system.
- Install the pdf_export extension.
- The extension generates some files during a pdflatex run, so the workfiles directory has to be made writable for the web server. Using the default constants, type the following in the Linux shell:

cd your/path/to/ext/pdf_export/workfiles
chmod 660 *
chown wwwrun:nogroup *

The work of the program in a nutshell

A user wants to print a web page and clicks on the link to the printable version. This opens a TYPO3 page that renders the content of the database record for that page into a TeX file. A PHP script then calls the pdflatex program and returns the generated file to the browser, where the user can choose to save it to disk or to open it directly.

The Template

Content Objects

The database contents can be rendered by using Content Objects. These are defined in TypoScript, the TYPO3 template language. From a programming perspective, they are an aggregation of arrays that define case or if statements to select the right value for rendering.
To give an example:
cobj = COA
cobj.5 = TEXT
cobj.5.field = date
cobj.5.if.isTrue.field = date
Explanation: The content object cobj is a content object array (COA). Its element number 5 is of type TEXT. The value of that text is taken from the field date of the database record that cobj renders. The fourth line specifies that element 5 is rendered only if the field date contains a value.
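
The behaviour of this example can be modelled in a few lines of Python. This is a simplified sketch for illustration, not how the TYPO3 engine is implemented; the function names and the dictionary layout are assumptions, and only the field and if.isTrue.field properties used above are covered.

```python
def is_true(value):
    """Simplified truth test: empty or missing values and "0" count as false."""
    text = "" if value is None else str(value).strip()
    return text not in ("", "0")


def render_text(conf, record):
    """Render a TEXT-like element from a database record (a plain dict here)."""
    condition_field = conf.get("if.isTrue.field")
    if condition_field is not None and not is_true(record.get(condition_field)):
        return ""  # condition failed, so the element renders nothing
    return str(record.get(conf["field"], ""))


def render_coa(coa, record):
    """Render the numbered elements in ascending order and join the results."""
    return "".join(render_text(coa[key], record) for key in sorted(coa))


# Mirrors the TypoScript example: element 5 is a TEXT taken from "date",
# rendered only if "date" contains a value.
cobj = {5: {"field": "date", "if.isTrue.field": "date"}}

if __name__ == "__main__":
    print(repr(render_coa(cobj, {"date": "2003-05-01"})))
    print(repr(render_coa(cobj, {"date": ""})))
```

The numbers in a COA act as sort keys rather than positions, which is why the sketch sorts the keys before rendering.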

How the LaTeX source code is rendered by TYPO3

This is easiest to show with an example. The text is split and wrapped as follows:
split {
# character code 10 is the line feed (newline) character
token.char = 10
....
1.wrap = \item{|}
}
wrap = \begin{itemize}|\end{itemize}
Explanation: First the CType is checked. CType is a field of the table tt_content. In our example the CType is bullets. Therefore all lines are read and each one is wrapped into:
\item{the line read}
When the end of the field is reached, all the wrapped items are in turn wrapped into:
\begin{itemize} the previously read items \end{itemize}
The result of this processing is:
\begin{itemize}
\item{item 1}
\item{item 2}
\end{itemize}
Afterwards the generated text is inserted into the page.
All other content types, such as text, pictures, or both, are rendered the same way. Quite easy...
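
The split-and-wrap step described above can be sketched in Python as follows. This only models the idea and is not TypoScript or the extension's code; the function names are assumptions, as are the choices to skip empty lines and to put each item on its own line.

```python
def wrap(content, pattern):
    """Insert content at the "|" of a wrap pattern such as r"\item{|}"."""
    before, _, after = pattern.partition("|")
    return before + content + after


def split_and_wrap(text,
                   token="\n",
                   item_wrap=r"\item{|}",
                   outer_wrap=r"\begin{itemize}|\end{itemize}"):
    """Split text at the token, wrap every non-empty line as an item,
    then wrap the whole list in the outer environment."""
    items = [wrap(line.strip(), item_wrap)
             for line in text.split(token)
             if line.strip()]
    return wrap("\n" + "\n".join(items) + "\n", outer_wrap)


if __name__ == "__main__":
    print(split_and_wrap("item 1\nitem 2"))
```

Called with the two-line field content "item 1" and "item 2", this prints the same itemize block as shown above.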

Problems encountered

We had, and still have, problems with links. They are not displayed well because TypoScript has problems wrapping them.
Some people also ask why the extension was not able to convert bold and italic text on the web site into bold and italic in the PDF file. We tried for a long time, until we came across the argument that using them is not good writing style. Note that Donald E. Knuth is the author of TeX (LaTeX was written by Leslie Lamport on top of it), and that LaTeX does provide \textbf{...} and \textit{...}, so this was a limitation of our conversion rather than of LaTeX.

Personal remarks from the authors

We would like to say a big thank you to the TYPO3 community for their enthusiasm in creating this mature masterpiece of software. With this extension we want not only to extend the features of our own class homepage but also to contribute our time and energy to this great bunch of freaks :-).
Save the vinyl!

Supported features

- Title page
- Table of contents
- Text, of course
- Bold and italic text
- Tables
- Lists (like this one here!)
- Marked links
- Caching (an out-of-the-box feature of TYPO3)
- Pre / verbatim text
- German special characters

Tested Environments

The extension was tested with TYPO3 v3.5, deployed on a root server from 1&1 Germany.
If anyone has tried to deploy the extension on a Windows machine (this shouldn't be impossible :-) ), please contact us.

Planned features

- Text beside pictures
- Header and footer
- Document information: author, document title...

Authors' tasks

Steffen Grunwald spent most of his time developing the extension with TYPO3; he really liked that part.
The LaTeX part of this work was done by Marc Rösler, who also learned a lot about TYPO3 by watching Steffen.

Additional information

To see the extension in action, visit www.roggeweck.net
To find out more about TYPO3, see www.typo3.com
To download a free PDF viewer, see www.adobe.com