Monday, August 1

Note about Perl

1, In Perl you don't have to declare a variable, and the way strings and numbers mix is quite confusing. Below is a program:

$a="3"; #$a is a string.
$b=$a+1; # how can you add 1 to $a (a string)?
print $a. $b; # $b is a number, how can you concatenate a string and a number?

And the result of running this program is 34.

Yes, after the program runs, the value of $a is still the string "3" and the value of $b is the number 4.
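
The rule behind this is that the operator, not the variable, decides whether a scalar is treated as a number or as a string. Here is a small sketch of that idea (the variable names are mine, chosen only for illustration):

```perl
use strict;
use warnings;

my $text   = "3";        # starts life as a string
my $sum    = $text + 1;  # "+" is numeric, so "3" is used as the number 3
my $joined = $text . 1;  # "." is concatenation, so 1 is used as the string "1"
my $repeat = $text x 3;  # "x" repeats the string three times

print "$sum $joined $repeat\n";  # prints "4 31 333"
```

So "+" always adds and "." always concatenates, and Perl converts the operands as needed.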

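2, To decode a GB18030 string into Unicode, you can use Encode::HanExtra. (GB18030 is a superset of GBK and can also represent the traditional characters found in Big5, although Big5 itself is a separate encoding.) Below is a sample: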
use Encode;
use Encode::HanExtra;
my $str = "這是一些大五碼\n";
print $str;
print decode("gb18030", $str);

The result is:

But if you redirect the output into a file ("> abc.txt") and open it in an editor that supports Unicode, you can see something interesting. I am using EditPlus:

It looks the same as the first one, doesn't it?
Now select the menu "Document"->"Reload As". There are 3 options:

"Default", "Unicode", "UTF-8"
I don't know the difference between "Unicode" and "UTF-8". But anyway, if you select "UTF-8", you can see this:

This means that decode("gb18030", $str) produced a string that Perl writes out as UTF-8, even though you can't see it from the direct output. Yes, what you see is NOT what you get.
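
One way to stop guessing is to make the output encoding explicit: decode the input bytes into characters, then encode those characters into the bytes you want to write. The sketch below is only an illustration under some assumptions. It uses the core "cp936" (GBK) encoding instead of "gb18030" so that it runs without Encode::HanExtra, and the helper name gbk_to_utf8 is made up for this example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: take GBK-encoded bytes, return UTF-8-encoded bytes.
sub gbk_to_utf8 {
    my ($bytes) = @_;
    my $chars = decode("cp936", $bytes);  # bytes -> Perl character string
    return encode("UTF-8", $chars);       # characters -> UTF-8 bytes
}

# The GBK bytes for the two characters U+4E2D U+6587.
my $gbk   = "\xD6\xD0\xCE\xC4";
my $chars = decode("cp936", $gbk);
my $utf8  = gbk_to_utf8($gbk);

# 4 bytes in, 2 characters in between, 6 bytes out.
printf "%d GBK bytes, %d characters, %d UTF-8 bytes\n",
    length($gbk), length($chars), length($utf8);
```

With Encode::HanExtra loaded, the same two steps should work with "gb18030" as the source encoding.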

3, XML::RSS can create an RSS file. But I want to:

Change the stylesheet

Currently there's no stylesheet bound to the RSS created by XML::RSS, so my solution is:
my $rssstring = $rss->as_string;
# Insert the stylesheet processing instruction right after the XML declaration.
$rssstring =~ s{<\?xml version="1\.0" encoding="UTF-8"\?>}{<?xml version="1.0" encoding="UTF-8"?>\n<?xml-stylesheet type="text/xsl" href="/rss.xsl" ?>};
open(my $out, '>', 'rss.xml') or die "Cannot open rss.xml: $!";
print $out $rssstring;
close $out;

instead of

Quite exhausting, isn't it?

Delete an item

Usually when people update an RSS file, they just delete the old one and create a new one. But I only want to add a new item to the RSS and, if there are more than 10 items, delete the oldest one. This way I can always keep the newest 10 items in the RSS. Is there a better RSS module on CPAN for this job?

4, LWP::Simple is a popular module for fetching web pages. But today I got this message: "Your browser doesn't support cookies." To support cookies I use LWP 5.64 with LWP::UserAgent:
use LWP 5.64;
# cookie_jar => {} gives the agent an in-memory cookie jar
my $browser = LWP::UserAgent->new(cookie_jar => {});
$webPage=$browser->get( $fetchURL )->content;

Updated on Aug 03:
1, I found I can edit the XML::RSS module, so I added a function "add_header()" to the module, and now I can add anything between
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"...
If you want, you can download my modified version and put it into your Perl\site\lib\XML folder to replace the original one. The original is read-only by default, so make sure you know what you are doing.

2, The source code of XML::RSS shows that we can access the items directly. Actually the man page gives a sample of how to delete an item:
# insert an item into an RSS file and removes the oldest item if
# there are already 15 items
my $rss = new XML::RSS;
pop(@{$rss->{'items'}}) if (@{$rss->{'items'}} == 15);
$rss->add_item(title => "MpegTV Player (mtv)",
               link  => "",
               mode  => 'insert'
               );

I have to admit that I didn't read the man page carefully :(
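
The same "insert at the front, drop from the back" idea can be tried without XML::RSS at all. The sketch below uses a plain array of hashes as a stand-in for $rss->{'items'}. The helper add_newest and the item titles are made up for illustration, and the limit is 10 instead of the 15 in the man page sample.

```perl
use strict;
use warnings;

# Hypothetical helper: put $new at the front of @$items, then drop the
# oldest entries (at the end) until at most $max remain.
sub add_newest {
    my ($items, $new, $max) = @_;
    unshift @$items, $new;
    pop @$items while @$items > $max;
    return $items;
}

# Ten existing items, newest first: "item 10" down to "item 1".
my @items;
push @items, { title => "item $_" } for reverse 1 .. 10;

add_newest(\@items, { title => "item 11" }, 10);

# Still 10 items: "item 11" is now first and "item 1" has been dropped.
print scalar(@items), " items, newest is $items[0]{title}, oldest is $items[-1]{title}\n";
```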



At August 03, 2005 12:18 PM, Blogger xls said...

UTF-8 is basically the Unicode version 2.

usually when people talk about unicode , they refer to unicode version 1.

How do you make xys rss? cut and paste by hand?

At August 03, 2005 3:17 PM, Blogger Unknown said...

Thanks, xj. So can we say that UTF-16 is Unicode version 3, and UTF-32?

I have a perl script to get automatically and upload it to
Now since the is down, I changed the script to get the info from yahoo group. I don't need to cut/paste everyday. What I need to do is to make sure I turned on my computer :)

At August 03, 2005 7:23 PM, Blogger xls said...

UTF-8 is sort of independent of unicode. No version correlation. I just made that up. :-)

