Note about Perl
1, In Perl you don't have to declare a variable before using it, and the way strings and numbers mix is quite confusing. Below is a program:
$a="3"; #$a is a string.
$b=$a+1; # how can $a (a string) plus 1?
print $a. $b; # $b is a number, how can you concatenate a string and a number?
And the result of running this program is
34
Yes, after the program runs, the value of $a is the string "3" and the value of $b is the number 4.
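What happens here is that the operator, not the variable, decides whether a value is used as a string or a number. Below is a small sketch of that rule; the variable names are only for illustration, and looks_like_number() comes from the core Scalar::Util module.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

my $x = "3";             # starts life as a string
my $y = $x + 1;          # "+" is a numeric operator, so "3" is used as 3
my $joined = $x . $y;    # "." is a string operator, so 4 is used as "4"
my $sum    = $x + $y;    # numeric again: 3 + 4

print "joined: $joined\n";    # joined: 34
print "sum: $sum\n";          # sum: 7

# The comparison operators come in the same two flavours.
my $num_equal = ("10" == 10.0) ? 1 : 0;     # numeric comparison
my $str_equal = ("10" eq "10.0") ? 1 : 0;   # string comparison
print "numeric: $num_equal, string: $str_equal\n";   # numeric: 1, string: 0

# looks_like_number() tells you whether a value converts cleanly.
my $clean = looks_like_number("3abc") ? 1 : 0;
print "3abc clean: $clean\n";    # 3abc clean: 0
```

With "use warnings" turned on, Perl also complains when a string such as "3abc" is used in arithmetic, which makes this kind of mix-up easier to spot.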
2, To decode a GB18030 string into Unicode (GB18030 is a superset of GBK, which also covers the traditional characters used in Big5), you can use Encode::HanExtra. Below is a sample:
use Encode;
use Encode::HanExtra;
my $str = "這是一些大五碼\n";
print $str;
print decode("gb18030", $str);
The result is:
But if you redirect the output into a file with "utfcode.pl > abc.txt" and open it in an editor that supports Unicode, you can see something interesting. I am using EditPlus:
Looks the same as the first one, doesn't it?
Now select the menu "Document"->"Reload As", and you can see there are 3 options:
"Default", "Unicode", "UTF-8"
I don't know the difference between "Unicode" and "UTF-8". But anyway, select "UTF-8", and then you can see this:
This means that decode("gb18030", $str) produced a character string that print writes out as UTF-8, even though you can't see it from the direct output. Yes, what you see is NOT what you get.
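One way to see what is going on is to look at the raw bytes. The sketch below is only an illustration under a few assumptions: it uses the core encoding cp936 (Encode's name for GBK) instead of gb18030 so that it runs without Encode::HanExtra, it hard-codes the GBK bytes of "中文" so the encoding of the script file does not matter, and it assumes that the "Unicode" option in EditPlus means UTF-16, as is common in Windows editors.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Helper for this example only: show a byte string as hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join " ", map { sprintf "%02X", ord($_) } split //, $bytes;
}

# The two characters of "中文" as GBK bytes.
my $gbk_bytes = "\xD6\xD0\xCE\xC4";

# decode() turns bytes into a string of characters ...
my $chars = decode("cp936", $gbk_bytes);

# ... and encode() turns characters back into bytes in a chosen encoding.
my $utf8  = encode("UTF-8", $chars);
my $utf16 = encode("UTF-16LE", $chars);

print "characters: ", length($chars), "\n";      # characters: 2
print "UTF-8:    ", hex_bytes($utf8), "\n";      # UTF-8:    E4 B8 AD E6 96 87
print "UTF-16LE: ", hex_bytes($utf16), "\n";     # UTF-16LE: 2D 4E 87 65
```

So the same two characters become different bytes depending on the output encoding. Calling encode() explicitly, or setting binmode(STDOUT, ":encoding(UTF-8)") before printing, makes the choice visible instead of leaving it to print.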
3, XML::RSS can create an RSS file. But I want to:
- Change the stylesheet
  Currently there's no stylesheet bound to the RSS created by XML::RSS, so my solution is:
my $rssstring = $rss->as_string;
# Insert the stylesheet line right after the XML declaration.
$rssstring =~ s{<\?xml version="1.0" encoding="UTF-8"\?>}{<?xml version="1.0" encoding="UTF-8"?>\n<?xml-stylesheet type="text/xsl" href="/rss.xsl" ?>};
open(my $out, '>', 'rss.xml') or die "Cannot open rss.xml: $!";
print $out $rssstring;
close $out;
instead of $rss->save("rss.xml");
Quite exhausting, isn't it?
- Delete an item
  Usually when people update an RSS file, they just delete the old one and create a new one. But I only want to add a new item to the RSS and, if there are more than 10 items, delete the oldest one. That way I can always keep the newest 10 items in the RSS. Is there a better RSS module on CPAN for this job?
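For the stylesheet step, a slightly more tolerant substitution is possible. This is only a sketch: add_stylesheet() is a made-up helper name, not part of XML::RSS, and the sample string stands in for the output of $rss->as_string. It matches any XML declaration at the start of the string, so it keeps working if the version or encoding changes.

```perl
use strict;
use warnings;

# Hypothetical helper: put an xml-stylesheet processing instruction
# right after the XML declaration, whatever version/encoding it names.
sub add_stylesheet {
    my ($xml, $href) = @_;
    my $pi = qq{<?xml-stylesheet type="text/xsl" href="$href" ?>};
    # Capture the whole declaration and put it back via $1.
    $xml =~ s{\A(<\?xml[^>]*\?>)\s*}{$1\n$pi\n};
    return $xml;
}

# Stand-in for the string returned by $rss->as_string.
my $sample = qq{<?xml version="1.0" encoding="UTF-8"?>\n<rss version="2.0"></rss>\n};
my $styled = add_stylesheet($sample, "/rss.xsl");
print $styled;
```

If the string does not start with an XML declaration, the substitution simply does nothing and the text comes back unchanged.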
4, LWP::Simple is a popular module for fetching web pages. But today I got this message: "Your browser doesn't support cookies." To support cookies I use LWP 5.64:
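use LWP 5.64;
my $browser = LWP::UserAgent->new;
$browser->cookie_jar({});    # keep cookies in memory for this session
my $response = $browser->get($fetchURL);
die "Can't get $fetchURL: " . $response->status_line unless $response->is_success;
my $webPage = $response->content;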
Updated on Aug, 03:
1, I found I can edit the XML::RSS module, so I added a function "add_header()" into the module, and now I can add anything between
<?xml version="1.0" encoding="UTF-8"?>
and
<rss version="2.0"...
If you want, you can download my rss.pm and put it into your Perl\site\lib\XML folder to replace the original one. The original rss.pm is read-only by default, so make sure you know what you are doing.
2, The source code of XML::RSS shows that we can access the items directly. Actually, the man page gives a sample of how to delete an item:
# insert an item into an RSS file and removes the oldest item if
# there are already 15 items
my $rss = new XML::RSS;
$rss->parsefile("fm.rdf");
pop(@{$rss->{'items'}}) if (@{$rss->{'items'}} == 15);
$rss->add_item(
    title => "MpegTV Player (mtv) 1.0.9.7",
    link  => "http://freshmeat.net/news/1999/06/21/930003958.html",
    mode  => 'insert'
);
I have to admit that I didn't read the man page carefully :(
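The same idea with my limit of 10 items can be sketched with a plain array. The array here is only a stand-in for $rss->{'items'}, and add_newest() is a made-up helper name; it assumes the newest item sits at the front of the list, which is where the man page sample puts it with mode => 'insert'.

```perl
use strict;
use warnings;

my $MAX_ITEMS = 10;

# Hypothetical helper: put a new item at the front of the list and
# drop the oldest ones from the end once the limit is exceeded.
sub add_newest {
    my ($items, $new_item) = @_;
    unshift @$items, $new_item;
    pop @$items while @$items > $MAX_ITEMS;
    return $items;
}

# Stand-in for $rss->{'items'}: newest first, already holding 10 entries.
my @items = map { +{ title => "item $_" } } reverse 1 .. 10;
add_newest(\@items, { title => "item 11" });

my $count = scalar @items;
print "$count items, newest: $items[0]{title}, oldest: $items[-1]{title}\n";
# prints: 10 items, newest: item 11, oldest: item 2
```

Trimming after the insert, in a loop, also cleans up a list that somehow grew past the limit earlier.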
Labels: Programming
3 Comments:
UTF-8 is basically the Unicode version 2.
Usually when people talk about Unicode, they refer to Unicode version 1.
How do you make xys rss? cut and paste by hand?
Thanks, xj. So can we say that UTF-16 is Unicode version 3, and UTF-32?
I have a perl script to get xys.org automatically and upload it to http://feeds.feedburner.com/xys
Now that xys.org is down, I changed the script to get the info from the Yahoo group. I don't need to cut/paste every day. What I need to do is make sure my computer is turned on :)
UTF-8 is sort of independent of unicode. No version correlation. I just made that up. :-)