PDF/Collection feature live on de.wikibooks

Discussion:

Erik Moeller

2008-10-10 01:46:32 UTC

The collection tool, PDF export, and print on demand features are now
live on the German Wikibooks edition. PediaPress (who have developed
these open source features) are a German company, and want to
demonstrate the features at the Frankfurt Book Fair, so it made sense
to start in this language. We hope to add the other Wikibooks
languages really soon. Next stop: Wikipedia.

We'll also be adding OpenDocument and DocBook export once we've tested
them a bit on Wikimedia Labs.

Here's an example full length book rendered with the PDF tool:
http://de.wikibooks.org/wiki/Spezial:Sammlung/load_collection/?colltitle=Benutzer:Eloquence/Kollektionen/Beispiel-Sammlung

(You'll have to click the PDF download button.) As you can see, there
are still some hardcoded English texts to get rid of. In terms of
output quality, the main weak spot is content that relies on raw HTML
in the wiki source: the PDF generator uses wiki-text as its source and
gets a bit confused when it encounters HTML. But it should generally
ignore what it doesn't understand. If you find cases where it dies,
please report them, ideally through the bug tracker at
code.pediapress.com (you have to register).

This feature will make it possible to maintain the hierarchical
structure of wiki-books through dedicated collection meta-files that
are stored in the wiki. The underlying meta-file in the case above is
this one:

http://de.wikibooks.org/wiki/Benutzer:Eloquence/Kollektionen/Beispiel-Sammlung

As you can see, it's a very simple format. These pages can exist
either in the user namespace or in the project namespace, and will be
automatically detected as "collections" that can then be loaded and
exported via the collection toolbox in the sidebar. But for
user-friendly PDF download, it's probably easiest to integrate links
(in the above format) to ready-made collections into templates, like
the existing "printable version" templates.
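
As a rough illustration of such a template link, the sketch below builds a
load URL of the same shape as the example link earlier in this message. The
helper name collection_load_url and its default arguments are assumptions
made for illustration; only the URL pattern itself comes from that example.

```python
from urllib.parse import quote


def collection_load_url(colltitle,
                        base="http://de.wikibooks.org/wiki",
                        special="Spezial:Sammlung"):
    """Return a link that loads the stored collection page `colltitle`.

    Hypothetical helper: the pattern mirrors the example link in this
    thread; it is not taken from the Collection extension's code.
    """
    # Wiki page titles conventionally use underscores instead of spaces.
    title = colltitle.replace(" ", "_")
    # Keep ':' and '/' readable, as in the example link; escape the rest.
    return f"{base}/{special}/load_collection/?colltitle={quote(title, safe=':/')}"


print(collection_load_url("Benutzer:Eloquence/Kollektionen/Beispiel-Sammlung"))
```

A "printable version" style template could then emit one such link per
ready-made collection.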

One of the nicer aspects of this approach is that you can easily have
multiple views on the same Wikibook, or create a book pulling from
multiple sources. But I also see the collection meta-files as
potentially useful for other purposes in the future, such as Wikibooks
statistics.

When this is available on all projects, I'll write a bit more. If you
want to play with an English language version, there's still a demo
running at:

http://en.labs.wikimedia.org/wiki/Main_Page

with a full English Wikibooks snapshot database.

Have fun,
Erik

--
Erik Möller
Deputy Director, Wikimedia Foundation

Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate

Derbeth

2008-10-10 09:15:25 UTC

I wonder about the legal aspects. In my opinion, when you create a ready-to-print version, you have to attach the text of the GFDL license to it directly, not as a link, as is done in http://en.wikibooks.org/wiki/Image:LaTeX.pdf.

Secondly, the current version of the tool commits plagiarism, because it does not mention image authors and does not provide any means (such as making images clickable) to check who they are. It could generate a list of images with their licenses (images can have different licenses: not only CC and GFDL, but also FAL), like the one on page 213 of http://pl.wikibooks.org/wiki/Grafika:C.pdf.

--
http://pl.wikipedia.org - otwarta encyklopedia
http://pl.wikinews.org - otwarte źródło informacji
http://pl.wikibooks.org - otwarte podręczniki

Opera - the fastest browser on Earth!

Erik Moeller

2008-10-10 19:22:49 UTC

Post by Derbeth
I wonder about the legal aspects. In my opinion, when you create a ready-to-print version,
you have to attach the text of the GFDL license to it directly, not as a link, as is done in
http://en.wikibooks.org/wiki/Image:LaTeX.pdf.

Yes, I agree, and I've already noted this. The code accepts any text
insertion here, so that should be reasonably straightforward.
(Localization is trickier.)

Post by Derbeth
Secondly, the current version of the tool commits plagiarism, because it does not mention
image authors and does not provide any means (such as making images clickable) to check
who they are.

Ouch, thanks for pointing that out. Tricky to do this automatically
since it's all wiki-text with templates, but we'll investigate a
solution here.

Derbeth

2008-10-10 19:27:39 UTC

Post by Erik Moeller

Ouch, thanks for pointing that out. Tricky to do this automatically
since it's all wiki-text with templates, but we'll investigate a
solution here.

Mike.lifeguard

2008-10-10 19:53:32 UTC

And also why free content is preferred to be on Commons in the first
place.

Post by Derbeth

Post by Erik Moeller

Ouch, thanks for pointing that out. Tricky to do this automatically
since it's all wiki-text with templates, but we'll investigate a
solution here.

Fortunately, most images on Commons use the {{Information}} template; in other
cases it would be quite reasonable to simply assume that names from
links to the User: namespace in the image description are the names of the authors.
That's a good example of why it's so important to follow standards on Commons.
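
The heuristic described here could be sketched roughly as follows. This is
only an illustration under loud assumptions: it matches an "Author" line
with a plain regular expression instead of really parsing templates, the
function name guess_authors is made up, and a linked user may well be an
uploader rather than the author.

```python
import re

# An "|Author=..." parameter line, as used in {{Information}} (simplified).
INFO_AUTHOR = re.compile(r"\|\s*[Aa]uthor\s*=\s*(.+)")
# The user name inside a [[User:Name]] or [[User:Name|label]] link.
USER_LINK = re.compile(r"\[\[\s*[Uu]ser:([^\]|#/]+)")


def guess_authors(wikitext):
    """Guess image authors from the wikitext of an image description page."""
    match = INFO_AUTHOR.search(wikitext)
    if match:
        author_field = match.group(1).strip()
        linked = [name.strip() for name in USER_LINK.findall(author_field)]
        # Prefer user links inside the Author field, else the raw field text.
        return linked if linked else [author_field]
    # Fallback: assume every linked user page names an author (no repeats).
    authors = []
    for name in USER_LINK.findall(wikitext):
        name = name.strip()
        if name not in authors:
            authors.append(name)
    return authors


page = "{{Information\n|Description=A tree\n|Author=[[User:Alice|Alice]]\n}}"
print(guess_authors(page))
```

Pages that follow neither convention yield an empty list, which is exactly
the case where a human would have to look.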
_______________________________________________
Textbook-l mailing list
https://lists.wikimedia.org/mailman/listinfo/textbook-l

--
Mike.lifeguard
***@fastmail.fm

Andrew Whitworth

2008-10-10 22:16:03 UTC

Maybe now is a good time to revive the "move all our free images to
Commons" crusade. I'll grab my pitchfork...

--Andrew Whitworth

On Fri, Oct 10, 2008 at 3:53 PM, Mike.lifeguard

Post by Mike.lifeguard
And also why free content is preferred to be on Commons in the first
place.

Post by Derbeth

Post by Erik Moeller

Ouch, thanks for pointing that out. Tricky to do this automatically
since it's all wiki-text with templates, but we'll investigate a
solution here.

J***@public.gmane.org

2008-10-10 22:54:29 UTC

Hi all.

I'd like to introduce three new topics to discuss.

For one, I don't like the current payment options of WikiPress. I don't know whether I'm the only one without a credit card, but I think it would be useful to accept PayPal as a payment option.

My second idea is to link to a free or inexpensive PDF editor, or maybe even implement one, so that minor changes (e.g. to the size of letters and images) can be made.

Thirdly, I think the images would look better without the frame. The frames of course have a certain recognition value, so one could argue that they should stay because they symbolise wikis. I'd prefer images without frames, though, or with a frame that doesn't stick out too much.

Best wishes,

John

Erik Moeller

2008-10-10 22:58:23 UTC

Post by J***@public.gmane.org
Thirdly I think that the images would look better without the frame. The frames have of
course a certain recognition value, so one could argue that they should stay, because it
would symbolise Wikis. I'd prefer images without frames though, or with some frame that
doesn't stick out too much.

It would be nice if a lot of the layout options could be customized
through a tab on the collection page - that's definitely high on my
wishlist.

Johannes Beigel

2008-10-14 08:38:04 UTC

Post by Erik Moeller
(Localization is trickier.)

We're already working on a solution, but we have to balance
elegance/maintainability, complexity and practicality for the
community/translators.

For the latter point, I guess an i18n.php file made available on
Betawiki would be preferable. But from the code side, the obvious
(easier) solution would be to use gettext.

Note: The PDF generation code (mwlib and mwlib.rl) is almost
completely separate from the Collection extension: it's Python code
running on another server, and its releases/commits are not
necessarily synchronized with those of the extension. So sending
messages from the extension to the render server could result in
untranslated strings. But so does a poorly maintained gettext .po
file :-)
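
To make the gettext side of that trade-off concrete, here is a minimal
sketch using Python's standard gettext module. The domain name "pdfstrings"
and the helper load_translator are assumptions for illustration, not names
from mwlib. With fallback=True, a missing catalog silently leaves the
English source string in place, which is the "untranslated strings" effect
mentioned above.

```python
import gettext
import tempfile


def load_translator(lang, localedir, domain="pdfstrings"):
    """Load a gettext catalog; `domain` is a hypothetical catalog name."""
    # fallback=True returns a NullTranslations object instead of raising
    # when no compiled catalog (.mo file) exists for the language.
    return gettext.translation(domain, localedir=localedir,
                               languages=[lang], fallback=True)


# An empty directory stands in for a locale tree without a German catalog.
with tempfile.TemporaryDirectory() as empty_locale_dir:
    _ = load_translator("de", empty_locale_dir).gettext
    label = _("Table of contents")

print(label)  # the untranslated source string
```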

-- Johannes Beigel

Erik Moeller

2008-10-15 02:40:59 UTC

Post by Johannes Beigel
For the latter point, I guess some i18n.php file, which is made
available on Betawiki would be preferable. But from the code side, the
obvious (easier) solution would be using gettext.

Understood. I believe there are plans (?) to support gettext in
Betawiki; I've made a separate email introduction regarding this.

That said, I can imagine that in future, the Collection extension
might support a set of advanced export options (layout options, etc.),
and we might have to do a version check against the mw-pdf server
anyway to see whether both are in sync. So if you feel that passing
the i18n strings on to the mw-pdf server would be viable, it would
probably be my preferred solution.
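
A version check of that kind might look something like the sketch below.
Everything in it is hypothetical: the option names, the version numbers and
the idea that the render server reports a dotted version string are all
assumptions for the sake of illustration.

```python
def parse_version(text):
    """Turn '0.9.3' into (0, 9, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Hypothetical: oldest render-server version that understands each option.
MIN_SERVER_VERSION = {
    "i18n_strings": "0.9.0",
    "layout_options": "0.10.0",
}


def usable_options(requested, server_version):
    """Return the requested options the render server is new enough for."""
    server = parse_version(server_version)
    return [name for name in requested
            if name in MIN_SERVER_VERSION
            and server >= parse_version(MIN_SERVER_VERSION[name])]


print(usable_options(["i18n_strings", "layout_options"], "0.9.3"))
```

Comparing integer tuples rather than raw strings matters here, since a
plain string comparison would rank "0.10.0" below "0.9.3".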

Johannes Beigel

2008-10-14 10:12:55 UTC

Post by Erik Moeller

Post by Derbeth
I wonder about the legal aspects. In my opinion, when you create a
ready-to-print version,
you have to attach the text of the GFDL license to it directly, not
as a link, as is done in
http://en.wikibooks.org/wiki/Image:LaTeX.pdf.

As Erik wrote: This is already implemented (either a title of an
article or a URL to some license text can be set in
LocalSettings.php), but it's currently not configured.

Post by Erik Moeller

Ouch, thanks for pointing that out. Tricky to do this automatically
since it's all wiki-text with templates, but we'll investigate a
solution here.

We'd highly appreciate input from the community regarding this topic!

The printed books from PediaPress contain a list of figures where the
license of each image is listed, together with the URL to the image
description page. As some kind of "hotfix" this solution could be
implemented in the PDF export of the Collection extension, too. But
this doesn't really solve the problem.
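
For what it's worth, such a list of figures is easy to sketch once the
license and description URL of each image are known. The function name and
the sample entry below are made up for illustration; this is not the
PediaPress implementation.

```python
def list_of_figures(figures):
    """Render one line per image: number, title, license, description URL."""
    lines = ["List of Figures"]
    for number, figure in enumerate(figures, start=1):
        lines.append(
            f"{number}. {figure['title']} ({figure['license']}): {figure['url']}"
        )
    return "\n".join(lines)


# Hypothetical sample data; real values would come from the wiki.
sample = [
    {"title": "Image:Example.jpg",
     "license": "GFDL",
     "url": "http://commons.wikimedia.org/wiki/Image:Example.jpg"},
]
print(list_of_figures(sample))
```

The hard part remains obtaining a reliable license value for each image in
the first place.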

We think it's more of a technical/software thing, so I cross-posted
(and set Reply-To) to Wikitech-l.

In our opinion, license management/handling must be a core feature of
MediaWiki, because the software is explicitly developed for the
collaborative distribution of free content. Licenses of the contained
articles and images should not be represented via some agreed-upon
convention but via structured (and machine-readable) information,
available for each relevant object in the wiki.

Some information that would be desired:

- Full (official) name of the license(s).
- Whether the full text of the license has to be included or a
reference is sufficient.
- Reference to the full text of the license(s) (in some rigidly
defined format like wikitext).
- Whether attribution is required. If so: The list of required
attributions.

So, basically all the information that's required to check whether
it's possible to take some part of a wiki and use it somewhere else,
plus all the information that has to be included in that other place.
This information could be made accessible via the MediaWiki API, but
ideally it's contained in the wikitext and/or XHTML, too.
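
As a sketch of what such a structured record could look like, the
dataclass below carries one field per item in the list above. The class
name, field names and sample values are assumptions for illustration;
nothing like this exists in MediaWiki as far as this thread says.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LicenseInfo:
    """Hypothetical machine-readable license record for one wiki object."""
    full_name: str              # full (official) name of the license
    full_text_required: bool    # must the full text be included?
    text_reference: str         # where the full license text can be found
    attribution_required: bool  # is attribution required?
    attributions: List[str] = field(default_factory=list)


def reuse_notice(info):
    """Build the notice a reuser would have to ship with the content."""
    parts = [f"License: {info.full_name}"]
    if info.full_text_required:
        parts.append(f"Include full text from: {info.text_reference}")
    else:
        parts.append(f"See: {info.text_reference}")
    if info.attribution_required:
        parts.append("Attribution: " + ", ".join(info.attributions))
    return "\n".join(parts)


# Illustrative values only.
example = LicenseInfo("GNU Free Documentation License", True,
                      "http://www.gnu.org/copyleft/fdl.html",
                      True, ["Alice", "Bob"])
print(reuse_notice(example))
```

A PDF exporter could then decide from the record alone whether to append
the license text or just print a reference.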

All this could be handled via microformats, even inside of templates,
but the main point is that any kind of new technique has to be
enforced, ideally by the MediaWiki software itself. In the commons
wikis there are some conventions that can be used in software by
people/companies like us (although we have to work with hacks and
workarounds), but in wikis with smaller communities this information
often doesn't exist at all.

-- Johannes Beigel

Johannes Beigel

2008-10-14 11:56:19 UTC

BTW: PediaPress has a stand at the Frankfurter Buchmesse (Frankfurt
Book Fair), booth E427 in hall 4.2. We'd be really happy to meet
people from the community to talk about all kinds of MediaWiki-related
stuff.

So, if some of you are there and can make it... we're looking forward
to meeting you!

-- Johannes Beigel
