Friday, August 30

Select the rows which have the max/min value in a table. Mysql.

I think I am the first one to solve this problem, perfectly. At least, independently.

These 2 StockExchange questions describes the issue pretty well:


Here is a sample table:

personid, value, date_of_service
1,  23,  2013-01-05
2,  24,  2013-01-01
2,  25,  2013-01-02
1,  26,  2013-01-03
1,  27,  2013-01-04
1,  28,  2013-01-04
1,  29,  2013-01-05

Now we want to get the latest value for each person, that would be
2,  25,  2013-01-02
1,  23,  2013-01-05  or 1,  29,  2013-01-05
(person 1 have 2 values for the same date. now we just want to get either one of them.)

The previous solution I found was:

select personid, value, date_of_service from table test join
       (select personid, max(date_of_service) as maxdate from test
group by personid) a
on test.personid=a.personid and test.date_of_service=a.maxdate;
The result actually contains 3 rows:
1,  23,  2013-01-05
2,  25,  2013-01-02
1,  29,  2013-01-05

It is not bad. Get the job done. The join is not very efficient because the table was visited twice, and the join created (n*m) records. This is actually one of the official answers from MySQL. The other 2 official answers are the same good (or bad) as this one.

The two StackExchange pages give the concepts of "max(my_date) over (partition by userid) max_my_date" and "MAX(value) KEEP (DENSE_RANK FIRST ORDER BY date DESC)", but they are Oracle or MS SQL Server concepts, not available in MySQL. With some more help from Xaprb and Madhivanan, I am able to come up with a clean solution:

First,
select @sno:= case when @pid<>personid then 0
                    else @sno+1
    end as serialnumber,
@pid:=personid as personid, value, date_of_service
from test
order by personid, date_of_service desc
We get:
serialnumber, personid, value, date_of_service
0, 1,  23,  2013-01-05
1, 1,  29,  2013-01-05
2, 1,  27,  2013-01-04
3, 1,  28,  2013-01-04
4, 1,  26,  2013-01-03
0, 2,  25,  2013-01-02
1, 2,  24,  2013-01-01

You can see that I added "rank" for each row of same person, order by date desc. Another person will have ranks, starting from 0 again.
So the final solution is
select * from ( the previous sql statement) where serialnumber=0

and we get:
0, 1,  23,  2013-01-05
0, 2,  25,  2013-01-02

Yeah! It works! Compare to the previous solution, this one only visit the table once.

If you want to get the earlest value, you just need to order by personid, date_of_service (without desc); If you want to the the date that it gets the highest value, change the order to personid, value desc. As long as you can make the serialnumber (rank) as 0 for your interest variable, you get it.

If you want to get the two record of the last day for person 1, (1,  23,  2013-01-05) and (1,  29,  2013-01-05), then you need to modify the sql not to grow @sno if the date is the same. You can do the homework.

Seriously, I think I am the first one to crack this problem in MySql.

Happy Hacking!


May be u are.. :)

Any way good work mate..

Its serious time for MySQL team to consider about implementing RANK() and DENSE_RANK() as inbuilt functions.
 
Thhank you for sharing this
 

Wednesday, August 28

Bi-weekly payroll or bi-monthly payroll?

If you google "Bi-weekly  or bi-monthly payroll", you will find a lot of entries, mostly in accounting websites, discussing which one is better. 1 , 2, 3...

Before I start the discussion, we need to learn a little bit about English. "bi-weekly" has two meanings: twice a week, and once every two weeks. The same goes to "bi-monthly": it can be either twice a month, or once every two weeks. So "semi-monthly" would be the better word to describe being paid twice a month.
In one working environment I observed a conversation that one side said "bi-weekly" meaning twice a week, while the other side said "bi-weekly" meaning once every two weeks. So I have to stop them and clear up the confusion so that the conversation could continue. After that, the team has to use the terms "twice a week" and "once every two weeks" in communication.

So to be clear, in this discussion, the "bi-weekly" payroll means the employee gets paid every other Friday. the "bi-monthly" payroll means the employee gets paid on 15th and the last day of each month. If the pay date is not a work day, change it to the previous work day.

Those pages in accounting websites pretty much have the same conclusion:
1, Both of them have same effect on tax. Neither one of them can save more tax than the other one.
2, The "bi-weekly" method need the accountants to do more booking each year, than "bi-monthly" method.
3, It's easier to calculate "bi-weekly" payroll for hourly workers. It is easier to calculate "bi-monthly" payroll for salary workers.

There is one thing they don't talk about: What difference does it make for employees, by choosing "bi-weekly" or "bi-monthly".

As a regular salary work, I found the "bi-weekly" payroll is very annoying:
All of my bills come in monthly, including utility bills, credit card bills, cell phone bills, mortgage, car loan. I guess mortgage and car loan can be negotiable to be paid bi-weekly, but others can't. So with bi-weekly payroll it is very hard to find out how much you earn every month, and how much you pay at the same time.
After switching to a new company that is using bi-weekly payroll, my bookkeeping is a mess now.

Friday, August 23

autorun.inf issue

"autorun.inf" is a file name that if you put into the root folder of a USB or CD/DVD, it can be executed automatically in Windows default setting when the USB/CD/DVD is inserted into computer. You can say it is a variant of autoexec.bat, if you know what that means.

That can be very dangerous. Some virus creates this file to execute itself and gain control in a new computer. That is why whenever I got a new computer under control, I always disable the autorun functionality.

Sure, AntiVirus software knows that too. So most (all that I know of) AntiVirus software disable this file by default. In most enterprise deployment, a user doesn't have the option NOT to disable it.

Here comes the problem: I downloaded a trial version of Matlab. During installation the program tried to unzip files into a local folder. Unfortunately there is a "autorun.inf" in the zip file, so there is no way to unzip successfully. As I said, as an enterprise user, I don't have access to change AntiVirus software settings, or disable it.

So at this point, I don't see a way to install Matlab.

Hi thanks for sharring this
 

Monday, August 12

Warnings in your code

In industry, there is a statistics that every x warnings contain one fatal error that will require you a lot of time/money to debug/fix and recover lost.

There are some warnings that are harmless. You know that, so you think you can ignore them.

When you start ignoring warnings in your code, you start accepting that your code is messy, lousy, and you stop caring for it. That is how the real problem starts. When a fatal warning is created, you don't see it. Several months later, you will find yourself in a situation that your previous work were based on wrong result, and you don't know how you have come so far.

So if you know the warning is harmless, and it is expected, then you should explicitly spell that in your code to suppress the warning. Don't abuse this method though.

If you are using Matlab, the editor gives warning for the lines that don't have ";" sign to complete. If you actually wanted to show the result of that line, please use ";" to complete that line, then use disp() to explicitly show the result.

This way, only the unexpected warnings are left in your code, and you can focus on them to remove them one by one.

A mature system should disallow any warning in the production code.


Programmers are the only species that ignore warnings and fell off the cliff.


Wednesday, August 7

台湾历史上的原住民

1. 历史上,台湾在旧石器时代就有了原住民。从基因和南岛语的考究上,人们大致得出一个结论,东南亚岛屿上很多现今的原住民是当时台湾原住民的后裔。至于当时的台湾人是从东南亚来到这块岛屿上,还是从中国大陆来的,还没有确证来证明或否定。

2. 时间快进到中国的明朝,17世纪20年代。现台中的地方有一个“最強盛的時候曾統治27座村落”的大肚王国;明朝在离岛50公里的澎湖早就建设了政权,作为驻军、往来贸易的落脚点,却懒得进攻荒蛮的台湾岛,只有海盗郑芝龙等在台湾岛建了港口作为基地。荷兰人进攻澎湖未果,在归顺明朝的郑芝龙、李旦的协调下,明朝允许荷兰人在台湾落脚。

3. 此后40年,中国政府(明朝)在澎湖隔岸观火,看着荷兰人和西班牙人在台湾登陆,镇压原住民和一些汉族海盗移民的反抗。直到满清占据了中国大陆,郑成功带领残余明军背水一战杀到台湾,赶走了荷兰人,第一次在这个岛上建立了汉族的统治,以明朝的名义把台湾岛划入中华版图。20年后,投降清朝。

4. 此后200年,大陆不断有人移民台湾,是现今台湾本土人祖先的主要来源。这段时间原住民的生活如何呢?《赛德克•巴莱》最开头30分钟应该是他们生活的最好概括:狩猎在丛林里,到寨里和汉人交易。这里土地肥沃,野兽众多,汉人和原住民应该没有太多的冲突和仇恨。不断会有原住民接纳汉人的生活方式,也会有退守深林的选择。

5. 1895年,黑旗军刘永福等在清朝拒绝支援的情况下无法抵挡日本的入侵,日本统治台湾。原住民对于臣服另一个民族本来应该感觉没有多少区别。可是现代化进程在这个时代入侵了台湾丛林,他们被剥夺了狩猎的自由,传统信仰的自由。现代文明的日本也不可能允许他们再“出草”,猎人头,。本土人(扎根已久的汉人)对日治时期,估计是爱恨交加吧。既享受现代文明带来的方便,也渴望自由。这段时间,本土人、原住民都有参加日军,到东南亚侵略,为大东亚共荣圈献身以进入靖国神社为荣。

6. 1945年台湾回归中国;几年后,战败的国民党带着不愿意降服共产主义的社会精英、军队来到台湾。

所以,台湾的“外省人”是1945年后来台的汉人;“本土人”是过去350年来台的汉人的后代;原住民仅占2%;他们到底想要降服于中国的皇帝,还是日本的皇帝,还是台湾的总统,或者是要独立政权,赶走入侵者,在台独人士的眼里是无足轻重的。他们存在的意义就是证明这块地方“自古是独立的”。
其实,原住民本来一直就没有独立的政权存在;荷兰、汉人的进入打断了大肚王国的进化。现代化的进程也不可能让他们再回到丛林里以传统的狩猎为生。“台湾独立”对于他们的传统来说,于事无补。