PediaPress error with multiple pages with the same name

Discussion:

Brianna Laugher

2009-01-28 13:37:54 UTC

Hi,

I'm not sure where to post this, but I'm pretty sure if I put it here
someone from PediaPress will probably read it.

I just found out about the PediaPress bookmarklet "create a collection
from any MediaWiki" thing and I thought I would try it. I added the
"Melbourne" article from en.wikipedia, wikitravel and Wikimedia
Commons, as well as an article called "Street press" from somewhere
else. When I downloaded the PDF, it contained 3 copies of the
en.wikipedia "Melbourne" article and the "Street press" article. So I
guess there is some bug with multiple articles from different sources
that happen to have the same name.
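The thread doesn't say what caused this. One plausible mechanism that would produce exactly this symptom is a fetch cache keyed on the page title alone, so the first "Melbourne" fetched (from en.wikipedia) gets reused for the other two. This is an assumption, not a description of PediaPress or mwlib internals. Below is a minimal Python sketch, with a hypothetical `Item` type and `fetch` callback, that contrasts that key with one that also includes the source wiki:

```python
from typing import Callable, Dict, Hashable, List, NamedTuple


class Item(NamedTuple):
    wiki: str   # hypothetical: host or base URL of the source wiki
    title: str  # page title on that wiki


def render(items: List[Item],
           fetch: Callable[[Item], str],
           key: Callable[[Item], Hashable]) -> List[str]:
    """Fetch each item's text, reusing cached text for items that share a key."""
    cache: Dict[Hashable, str] = {}
    pages: List[str] = []
    for item in items:
        k = key(item)
        if k not in cache:
            cache[k] = fetch(item)  # only the first item with this key is fetched
        pages.append(cache[k])
    return pages


def by_title(item: Item) -> Hashable:
    # Assumed buggy key: ignores which wiki the page came from.
    return item.title


def by_source_and_title(item: Item) -> Hashable:
    # Fixed key: the (wiki, title) pair identifies a page uniquely.
    return (item.wiki, item.title)
```

Keying on (wiki, title) keeps the three "Melbourne" pages distinct while still letting a page that appears twice in the same collection share one fetch.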

Secondly, this is not a bug but a feature request: en.wikipedia in
particular produces an awful amount of crud that is not that useful
for printing: references, external links etc. For the [[Melbourne]]
article, there are 22 pages of beautiful text and images, and no less
than 11 1/2 pages of crud, mostly consisting of 184 references. Would
it be possible to have an option to exclude references? Maybe replace
them all with a note like "To see original references, please visit
[url]."
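Here is a minimal sketch of what such an option might do. It assumes, hypothetically, that the option runs over the article's wikitext before rendering: it drops the inline <ref> footnotes and puts the suggested note where the reference list would appear. Regular expressions only approximate this. They won't catch citation templates such as {{sfn}} or refs produced by other templates. The function name is made up for illustration.

```python
import re

# Reuse of a named footnote, e.g. <ref name="a" />. "\b" stops this matching <references />.
_SELF_CLOSING_REF = re.compile(r"<ref\b[^>]*/>", re.IGNORECASE)
# A full footnote, e.g. <ref name="a">Some source</ref>, possibly spanning lines.
_FULL_REF = re.compile(r"<ref\b[^>]*>.*?</ref\s*>", re.IGNORECASE | re.DOTALL)
# The places where the reference list itself is rendered.
_REF_LIST = re.compile(r"<references\s*/>|\{\{\s*reflist[^{}]*\}\}", re.IGNORECASE)


def strip_references(wikitext: str, source_url: str) -> str:
    """Remove inline footnotes and replace the reference list with a pointer note."""
    note = f"To see original references, please visit {source_url}."
    # Self-closing refs go first so they are not mistaken for opening tags.
    text = _SELF_CLOSING_REF.sub("", wikitext)
    text = _FULL_REF.sub("", text)
    return _REF_LIST.sub(note, text)
```

The section heading ("== References ==") is deliberately left in place so the note still appears where readers expect the references.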

thanks,
Brianna

--
They've just been waiting in a mountain for the right moment:
http://modernthings.org/

Brianna Laugher

2009-01-28 13:42:18 UTC

oops, one more thing: It appears that for pages collected using the
bookmarklet, license information is not recorded or included. I don't
particularly want a copy of the GFDL any more than references, but a
notice to that effect would be nice.

*looks at the API*
hm... bizarrely, I don't see the license info among the API
parameters. Did I miss it, or is it a bug?

Brianna

--
They've just been waiting in a mountain for the right moment:
http://modernthings.org/

Heiko Hees

2009-01-28 21:14:21 UTC

Hi,

Post by Brianna Laugher
oops, one more thing: It appears that for pages collected using the
bookmarklet, license information is not recorded or included. I don't
particularly want a copy of the GFDL any more than references, but a
notice to that effect would be nice.
*looks at the API*
hm... bizarrely, I don't see the license info being one of the API
parameters. Did I miss it or is it a bug?

Assuming you are talking about the MediaWiki API: yes, this is a bug
in MediaWiki.

In our opinion, license management/handling should be a core feature
of MediaWiki, because the software is explicitly developed for the
collaborative generation of free content. But from the perspective of
a re-user, it is almost non-existent[1].

Heiko

[1] http://markmail.org/message/cfshfwqse3gno372

_______________________________________________
Textbook-l mailing list
https://lists.wikimedia.org/mailman/listinfo/textbook-l

--
Heiko Hees / brainbot technologies AG
Boppstrasse 64 / 55118 Mainz
Fon +49 (0) 61 31 - 2 11 63 91

Brianna Laugher

2009-01-29 12:27:13 UTC

(No need to forward my messages to mwlib anymore; I subscribed there
and will write there if appropriate :))

Post by Heiko Hees
Hi,

Assuming you are talking about the MediaWiki API: yes, this is a bug
in MediaWiki.
In our opinion, license management/handling should be a core feature
of MediaWiki, because the software is explicitly developed for the
collaborative generation of free content. But from the perspective of
a re-user, it is almost non-existent[1].
Heiko
[1] http://markmail.org/message/cfshfwqse3gno372

I see what you mean. I will reply in more detail to your post to wikitech-l.

I meant something much simpler: getting the info that appears in the
footer of MediaWiki sites, which names the license or links to the
rights page. I made a patch to add that info to the API:
<https://bugzilla.wikimedia.org/show_bug.cgi?id=17224>. Although
giving only the name and URL of the GFDL doesn't comply with its
terms, it is definitely better than nothing (and for many licenses it
is almost sufficient).

But as I said I will reply on your other post.

thanks,
Brianna

--
They've just been waiting in a mountain for the right moment:
http://modernthings.org/

Brianna Laugher

2009-02-19 02:22:08 UTC

Post by Heiko Hees
Hi,

Assuming you are talking about the MediaWiki API: yes, this is a bug
in MediaWiki.

You can now find out the license of a wiki via the API :)

http://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=rightsinfo

This still doesn't help with finding the license for files, or knowing
what the requirements of the license are (e.g. full text, or authors),
but at least it lets you put a link and the name of the license.
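A collection tool could use this query to produce the kind of license notice asked for earlier in the thread. The sketch below shows one way that might look. It only builds the request URL and reads the `rightsinfo` object (its `text` and `url` fields) out of a JSON response. The helper names and the sample response in the test are hypothetical, and fetching is left to the caller, e.g. with urllib.request.urlopen.

```python
import json
from typing import Tuple
from urllib.parse import urlencode


def rightsinfo_url(api_endpoint: str) -> str:
    """Build the siteinfo query that asks a wiki for its license (rights) info."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "rightsinfo",
        "format": "json",
    }
    return f"{api_endpoint}?{urlencode(params)}"


def parse_rightsinfo(body: str) -> Tuple[str, str]:
    """Return (license name, license URL) from a JSON siteinfo response."""
    info = json.loads(body)["query"]["rightsinfo"]
    return info.get("text", ""), info.get("url", "")


def license_notice(title: str, body: str) -> str:
    """Format a one-line notice for an article collected from that wiki."""
    name, url = parse_rightsinfo(body)
    return f'"{title}" is available under {name} ({url}).'
```

As noted above, this gives only the license's name and link. It says nothing about per-file licenses or attribution requirements.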

cheers
Brianna

--
They've just been waiting in a mountain for the right moment:
http://modernthings.org/

Andrew Whitworth

2009-01-28 14:39:52 UTC

I'm forwarding this email to the mwlib mailing list.

--Andrew Whitworth

---------- Forwarded message ----------
From: Brianna Laugher <brianna.laugher-***@public.gmane.org>
Date: Wed, Jan 28, 2009 at 8:37 AM
Subject: [Textbook-l] PediaPress error with multiple pages with the same name
To: Wikimedia textbook discussion <textbook-l-RusutVdil2icGmH+5r0DM0B+***@public.gmane.org>

Heiko Hees

2009-01-28 21:01:46 UTC

Hi,

Post by Brianna Laugher
I'm not sure where to post this, but I'm pretty sure if I put it here
someone from PediaPress will probably read it.

yes :)

Post by Brianna Laugher
I just found out about the PediaPress bookmarklet "create a collection
from any MediaWiki" thing and I thought I would try it. I added the
"Melbourne" article from en.wikipedia, wikitravel and Wikimedia
Commons, as well as an article called "Street press" from somewhere
else. When I downloaded the PDF, it contained 3 copies of the
en.wikipedia "Melbourne" article and the "Street press" article. So I
guess there is some bug with multiple articles from different sources
that happen to have the same name.

Yes, this is a bug. The feature[1] is rather prototype-ish. Implementing
it correctly involves much more work, especially getting license
handling right.

Post by Brianna Laugher
Secondly, this is not a bug but a feature request: en.wikipedia in
particular produces an awful amount of crud that is not that useful
for printing: references, external links etc. For the [[Melbourne]]
article, there are 22 pages of beautiful text and images, and no less
than 11 1/2 pages of crud, mostly consisting of 184 references. Would
it be possible to have an option to exclude references? Maybe replace
them all with a note like "To see original references, please visit
[url]."

Good idea! I'll add it to the "customization of PDF output" wishlist[2].

Heiko

[1] http://pediapress.com/collection/
[2] http://code.pediapress.com/wiki/ticket/419#comment:1