Opinions 5: 2011

Friday, October 14, 2011

Airtel India 3G/GPRS configuration

I have been using the Airtel GPRS configuration named Mobile Office for a long time. Today I called them up and changed my 3G plan. As a "welcome kit" they proactively sent me 3 new configurations. When I used them, I could connect to the Internet - but to my utter disappointment, other applications (like Google Maps) were unable to connect anymore!

I figured out that they have given me a WAP configuration.

After much research, and a lot of time with the customer representatives, I could finally piece together the lost Mobile Office configuration (which the customer rep confirmed to be the one to use for 3G, and he himself came up with the settings).

Below I've attached the configuration screenshots from my phone for others to use. Once you feed in the values like this, you'll have a fully working 3G Internet connection - not a WAP or MMS server that they push for unfathomable reasons.

Cheers!

Tuesday, October 11, 2011

To rename Google Maps starred landmarks

If you use Google Maps on Android, you may have seen that you can mark new POIs (by tapping and holding a location, and starring it), but there is no obvious way to rename the marked location - it takes the name as whichever address Google could look up for that place. It cannot be done even on Google Maps on the Internet.

It also appears that users have been asking Google for this feature for a long time, and it still hasn't been added. Chances are that Google will one day add it - though at this moment it seems the engineers at Google are busy implementing other features that they find motivating and challenging.

Well, there is an obscure but moderately easy way to change the name. The trick lies in another service provided by Google, called Bookmarks. If you open Google Bookmarks, it will show all your starred locations. From there you can click the edit link to rename a location. The change takes effect immediately, and will be shown the next time you use Maps, either on your phone or online!

If you have never done this before, you would find this feature nifty because even though Android maps does not show the starred locations by the name you have given, you will be able to search quickly by these names in the map!

Friday, October 7, 2011

Hack yourself out of jetlag

You may already know a lot of ways to reduce jetlag. Some of them worked on me partially, some didn't. I will talk about something else which works like magic -

Starve till it is time to sleep, then eat your dinner. That's all.

When you are starving, a mechanism in your body takes over, which does not allow you to feel sleepy. This has to have an evolutionary root. In the early days - when there was no McDonald's - if an animal was starving, it had to find whatever food it could in order to survive, while sleeping would not be as important a task.

So when you starve, you will find that you are not sleepy. Of course you will have to bear with hunger, but personally I think missing a meal is better than trying to suppress yawns all day during meetings. As long as you are hungry, your biological clock will be suppressed - and once you eat and sleep, it will kick-start and adjust to that timing. From the next day onwards you should feel great!

Try it and let me know if it works for you too.

Wednesday, October 5, 2011

It is all an illusion

Last week we considered the repercussions of simulated reality, and we saw that if we live in one of them, we would have no way to figure it out (unless the 'person' who is running the simulation decides to leave hints). I left you saying it gets much worse than that. Well, it gets worse, or more interesting than that, depending on how you view it. Here's the bomb - if you are okay to make certain assumptions, chances are that you are indeed living in a simulated reality.

The argument goes this way. Because of the advances in technology and computer science, humans will eventually gain the ability to simulate a full nervous system, including the brain, in a computer. It could happen within 500 years as some predict, or it could take 5 million years - but if it happens, the astounding conclusion will remain the same. Once humans gain the ability to carry out these simulations, they will do so for fun or for research. They will end up running a great many simulations of their ancestors, to study the past. The number of simulated brains will thereby vastly dwarf the number of real brains in our Universe. Hence chances are that if we pick a random brain (or consciousness), it is more likely to be a simulated one than a 'real' one. You, I, all that you know could be nothing but simulated nervous systems, programmed as a hobby of an advanced being.

Before you start to lose track, let me make one thing clear - even if it is the case, it still means (in a sense) that you are real. What you smell, feel, touch and see are real - at least to you. Your thoughts, desires and feelings are still real - just as the brain we discussed was real enough to itself or to other brains being simulated with it.

The question that I would pose next is how important is it for the simulation to be actually carried out, versus to just set it up and keep it 'paused', to make it 'real'? A little reflection will show us that whether it is being run or not should change nothing about the reality.

If we think about the brain being simulated on pen and paper - if the entire configuration of the brain's neurons is known and written down on paper, how important really is the act of writing the equations and carrying out the operations on paper to make it real? What difference does it make? Once you specify the initial configuration, the rest is just the result of some mathematics which, whether you choose to do it now or later, will lead to the same end result. It is just like a stone thrown up: whether or not you do the calculations (or even know how to do them) doesn't matter - in a sense that stone is destined to come back, because mathematics predicts it. If the initial configuration is known, the stone's fate is fixed and is not going to change. Similarly, writing the equations down or not does not change the nature of the reality of the brain - actually animating the reality in the computer or not does not change the reality of the world to the virtual characters in the computer. If you simulate, whatever results will be their reality - and that remains a truth irrespective of whether you bother to calculate the results or not.

So in a sense, if you specify the initial configuration of a brain (or universe), the resulting world becomes real. Going one step further, you don't even need to provide the initial configuration in meticulous detail - if you describe an initial condition which is not too weird and can be detailed out following the laws of physics, someone (or a powerful computer) can fill in the painful details of the configuration of each and every molecule in one of the many ways that fit your description. Then it can start simulating it. For example if you say "a universe where a cup is revolving around the sun", the computer can fill in all the molecules in the cup and every particle in the sun, and give them the initial velocities so that the cup is in an orbit around the sun - and then simulate that universe and show you the result. Granted the world is not specified uniquely anymore, but what matters is that it can be done in at least one way. In a sense, just your describing the world makes it real - in the same way as we discussed how describing an initial configuration (in detail) makes it real, even though you may not bother to actually simulate it.

Why is it even necessary for you to think of or describe the initial configuration at all? Indeed it is not necessary - if a possible initial configuration exists (i.e. it does not violate any physical laws), it can in theory be simulated - and again, whether you choose to actually do it doesn't matter - it doesn't change the 'truth' of events that are bound to happen in the universe - and in a sense it's all real.

Let's conclude by revisiting what we have seen.
1. The brain when being simulated on pen and paper through mathematical equations, 'feels' it is real
2. Chances are staggeringly high that we are in fact simulations run by some advanced being (even probably advanced 'real' humans) as their toy universe hobby
3. It can even be possible that we are just someone's imaginations, or worse yet nobody even imagined us, but we 'exist' because the initial configuration for our universe exists in a mathematical sense!

Someone once said, "it is all an illusion". Seems there is some truth there after all.

Sunday, October 2, 2011

Geocities.com automatically backed up

If you had a geocities website and you didn't have the chance to retrieve it before geocities died, fear not!

For there is someone who might have already done it for you - oocities.org

I was surprised to hear that some of my colleagues had discovered my years-old site on geocities, and that's how I got to know about it.

They seem to have proactively backed up all (or most) of geocities before it was taken down. The good news is that your data is still there. Not only did they back up the pages and serve them, they also allow Google to index them - and that's how I could find them.

You simply need to replace "geocities.com/..." with "www.oocities.org/..." and all your old pages will be there. Even the downloads (I tried some .zip files) work!
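If you have a list of old links to convert, the replacement above is easy to script. Here is a minimal sketch in Python; the function name `to_oocities`, the `http://` scheme and the sample path are my own assumptions for illustration, not something documented by oocities.org.

```python
def to_oocities(url: str) -> str:
    """Rewrite a geocities.com URL to the corresponding oocities.org address.

    Assumption: everything after "geocities.com/" is kept unchanged,
    as described in the post.
    """
    marker = "geocities.com/"
    idx = url.find(marker)
    if idx == -1:
        raise ValueError("not a geocities.com URL: " + url)
    # Keep the path after the marker and put it under www.oocities.org
    return "http://www.oocities.org/" + url[idx + len(marker):]


# Hypothetical old page, used only as an example
old = "http://www.geocities.com/SiliconValley/1234/index.html"
new = to_oocities(old)
```

Whether a given page was actually archived still has to be checked by opening the rewritten address.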

The bad news is that they do not support modifying anything there yet. So if you want to put a link to your current ventures, you cannot do it. From the FAQ:

For now, we are providing rather an archive and we are not able to give access to our servers for most active sites. But we will be glad to send you your files to enable you to upload and update them somewhere else and to make them available to a bigger audience than here.

You can delete it though, lest you really fear someone will see the embarrassing (ahem) attempts at taming the Internet from a much younger you. To delete something, you simply need to write to them.

We fully respect you copyright and will of course remove your interlectual property from the internet at our earliest convience. Just write a mail to oocities {AT} gmail {DOT} com and it will usually be gone within a few hours.

For me though, it was a nice surprise, and I am happy :) Here's to the return of the good old days!

Thank you oocities.org!

Saturday, October 1, 2011

Why Firefox? Why??

Today I am going to talk about some of Firefox's strange design decisions. Don't get me wrong - Firefox is a damn good piece of software, and I use it every day. Also, what I will talk about can easily be fixed or ignored - it does not affect anything in the browser critically. While these decisions may not be outright controversial, they are definitely weird. Having said that, here are the strange design decisions -

1. Removal of the RSS Icon
Not so long ago, I think back in the days of Firefox 3.5, you used to get an icon when you were reading a blog or a news channel, indicating that you could subscribe to it. It was placed very conveniently in the address bar, and it used to show up only for pages which had an RSS or Atom feed. It looked something like below -

Then they decided to remove it from there, and place it in a button, which would be turned off by default. You can customize the toolbar, and bring it back from the list of available buttons. If you do that, now (since Firefox 4 till Firefox 7 at the time of writing) it looks like -

Which is good, except it takes up more space, and it does not disappear if the URL does not have a feed. It just turns gray instead, and it is much harder than with the previous implementation to determine if the site has a feed or not. The good news is that Firefox, being extensible, allows you to bring it back - you just need to use this addon.

2. Integration of Personas
In 2009, Firefox decided to integrate Personas - a plugin that allows you to skin the browser by painting images in the background of UI elements like menus. When it happened, there were tens if not hundreds of equally useful (and I would actually argue more useful) plugins. For example, Greasemonkey, Tab Mix Plus, Download Statusbar, Shareaholic, FoxMarks (XMarks now, and they did integrate a different build of it though not with so many features - while XMarks almost died - but that's a long story). Why would you choose to include Personas in the codebase? Wouldn't these other plugins deserve the same? I understand that not everyone will need these plugins. Well, not everyone needs Personas either - I tried some of the skins and they look ugly anyway. Well, the good news is that if you don't want to use it, it's just unnecessary bloat in the code (however small), and you can choose to ignore it.

But why Firefox? Why??

Galaxy S II vs Nokia N8, Apple iPhone 4, Samsung Galaxy S

Galaxy S2 is probably the best smartphone available to consumers to date. Consequently I did some research on the device, keeping in mind its close competitors. Below I post my review of the phone as a quick reference for me and others.

Pros -

+ Dual Core 1.2 GHz ARM processor. For comparison, Galaxy S has a Single Core 1 GHz ARM processor. This is very fast for programs optimized for Dual Core (e.g. web browsers, programs that manipulate/rotate images like Google Maps, image browsers, etc.). Even for programs which are meant to run on a single core, the phone will supply plenty of power since the single core speed is higher than on most other phones.

+ Interface driven by Android 2.3 coupled with the superfast CPU is very responsive for the end users. There is not a moment when the phone feels that it slowed down even a little.

+ NFC, or Near Field Communication. Right now this is one of very few phones that support NFC. NFC is a protocol through which the phone can communicate with another NFC-aware phone or device when they are brought close to each other. It is designed to operate without having to manually pair with the other device. This tech is used by Google Wallet to authorize payments - where instead of swiping a card, you wave the phone at the authorization machine (and type your PIN into the phone) to authorize a payment.

+ USB to Go - Not only can you connect the phone to computer as a USB device, you can now use the phone to host USB devices. That is if you have a USB pendrive or hard disk, and you want to transfer videos or songs to the disk from the phone, you can do it without needing a laptop.

+ Camera - The phone sports a decent 8MP camera with dual LED flash. The picture quality is good, and it is capable of capturing 1080p videos, which also seem to be of decent quality. If you need a camera with more megapixels, Nokia N8 provides you with a 12MP camera. But megapixels aren't everything, and where Samsung trumps Nokia is in the interface. It is worth noting here that Galaxy S sports a 5MP camera which takes pictures of very decent quality, though for some strange reason it has no flash! Here is an in-depth review of the camera with picture samples.

+ Super AMOLED is Samsung's proprietary technology to provide an even crisper display at lower power. The phone also has Gorilla Glass, which can stand its share of torture. However don't push it too much, as it may not be completely unbreakable.

+ Android 2.3 with all its goodies, the phone is bound to attract any gadget freak. The interface is crisp, responsive (even to your tilts and shake!) and intuitive - a feast to behold. It already comes with some great software, and it is easy to get some more great applications installed on the phone based on your liking. The ever increasing appstore has lots of very good free applications that you can load. This already has better variety than Apple Appstore, primarily because Google designed it to be much more friendly towards software developers - with checks to ensure writing apps for Android is fun and not frustration.

+ Other goodies include 16GB inbuilt memory, digital compass, acceleration sensor, support for viewing Flash based webpages, predictive text input through the touchscreen with Swype, and more.

Cons -

- Back button seems to give trouble for some people, where it goes back twice when you press the button, instead of once. Though this seems to happen rarely, it appears to be a design fault in some phones.

- Random restarts happen spontaneously on the phone. This has been pretty much traced to SD card, and could be due to the positioning of the SD card which pushes the battery slightly out. To resolve that you may insert a piece of paper to ensure the battery is always connected. This could also happen because of other reasons, like corrupted applications on SD card.

- Google Maps voice guided navigation does not work in many countries(!!). This is very strange and slightly troubling, since navigation is one of the primary reasons for buying a smartphone these days. While Google seems to be mute on the reasons for this, I surmise that it has to do with managing load on their servers. Unlike most other GPS software, e.g. Nokia Maps or Sygic, Google's navigation works by requesting Google to plot the route instead of doing so offline on the device. The solution seems to be to install Brut Maps, a free navigation software that uses Google Maps data, though there are also unofficial hacks to enable navigation in Google Maps for your country.

- Battery - Many users have reported that the battery life might be much lower than expected. This could be due to requirement of conditioning of the battery which may require using the phone for some time.

Competitors
Apple - provides a stable phone, interface is practically as good as S2. While Apple offers a good platform for developers, this is where Android excels, and hence you can expect to find more free and paid quality apps and games for Android than iOS. iPhone 4 has higher resolution (960x640 vs 800x480), but lower screen size (3.5" vs 4.3"), kind of defeating the purpose and acting more as a gimmick. iPhone has no external card slot, nor USB2GO. Also it does not have support for NFC. For those conscious about looks, iPhone weighs more and is slightly thicker and bigger. Prices being nearly the same, there is a good difference in the features and looks - which makes me swing much in favor of Galaxy S2 over iPhone 4.

Nokia - unfortunately made many big mistakes and is now paying for them. Symbian, the OS on which this phone runs and for which you develop apps, was nowhere near as developer friendly as Android. It was clumsy to code in, and it required you to buy a license even to make your apps install without getting the phone hacked. Also the reviews of Nokia Care are generally pathetic, and it appears that most of the Nokia Care centers hire underpaid executives, resulting in an unprofessional experience, both in terms of the attitude of representatives and in terms of damage to your gadget. They also didn't get any kudos for their decision about a year ago to make navigation free, while completely ignoring customers' pleas to support anything other than the recent devices back then. All these are not good signs for a company, and typically mean that the organization is interested more in short term profits than in long term commitment to its customers. My recommendation will be to steer absolutely clear of Nokia till they can get themselves together and fix their issues, which will take time.

Samsung Galaxy S - would have been a phone I'd love to buy, even though it is much slower and has less RAM, less internal memory and a lower camera resolution (picture quality is very good at 5MP). The price is only half or less than that of Galaxy S2. However that phone lacks certain features, including any kind of flash for the camera, which is a deal breaker for me! Why, Samsung? Anyway, other features it does not have are NFC, USB2GO, and out of the box Flash support in websites (though Youtube has special support).

Verdict
There are some hiccups (mentioned above), and though they are rare, they warrant caution. But if you are looking for the best smartphone right now, go right ahead - you are unlikely to be disappointed!

Wednesday, September 28, 2011

SAS: Proc Logistic shows all tied

Logistic regression is used mostly for predicting binary events. I use logistic regression very often as a tool in my professional life, to predict various 0-1 outcomes. For carrying out logistic regression (and other statistical data processing jobs), I primarily use a popular statistical package called SAS. It has been around since the initial years of statistics based marketing, and has established itself as a de facto standard in the risk analytics domain.

When you run a logistic regression in SAS, it shows you a lot of interesting and important parameters in the output. Among the outputs, it will show you the parameter estimates using which you can make a prediction, various statistics involving the data entered, and the statistical confidence of the individual estimates as well as the overall model.

One of the summary reports which tells you how good the model is doing is found at the bottom of the main output -

Association of Predicted Probabilities and Observed Responses
Percent Concordant     85.6    Somers' D    0.714
Percent Discordant     14.2    Gamma        0.715
Percent Tied            0.2    Tau-a        0.279
Pairs                  7791    c            0.857

An analyst can look at all the above parameters to make a quick judgement on how well the model will perform when put to the test of predicting the outcome on a new set of data. The 'c' is basically the area under the ROC curve, and Somers' D corresponds to the Gini coefficient under certain conditions. The c should ideally vary between 0.5 and 1, with 0.5 meaning the model is not working at all.

Therefore I was puzzled when I saw this in one of my outputs -

Association of Predicted Probabilities and Observed Responses
Percent Concordant          0.0    Somers' D    -.000
Percent Discordant          0.0    Gamma        -1.00
Percent Tied              100.0    Tau-a        -.000
Pairs                 183334788    c            0.500

The c is 0.5, and Somers' D is 0 - which means the model is pretty much useless. However, all the other tables in the output (not shown here) told me that the model's performance was good! Why would this happen? This repeated a couple of times recently for different models - in some cases the c was not 0.5, but was still much lower than what I would expect from experience. One indication I had was that over time my team was trying to model target events which are more and more rare. I eventually traced the problem to 'Percent Concordant/Discordant/Tied' being measured incorrectly in this particular table.

To measure these concordance percentages, you need to look at all possible pairs in your pool which have opposite observations (assuming binary outcome). Then you see in how many of these pairs the model ranks the outcomes the way they happened (concordant), in how many the model ranks them the other way round and is therefore incorrect (discordant), and in how many the model score is exactly the same (tied). All the other values in this particular table are calculated using this kind of pairing. In the table above, it tells us that all the pairs of observations which have different outcomes are predicted to have exactly the same score - effectively translating into the model being completely useless.

However, when I checked the data myself, I saw this was not the case. Why would SAS report incorrect statistics in this particular table? The answer was found after a lot of research - to optimize the calculations, SAS assumes two scores are identical if they have a difference of < 0.002. However, there is no mention of this in the output. The documentation was also difficult to find. I suspect they added it recently along with an option that helps you to prevent this from happening (more on this later).

This can lead to consequences of varying degree when the event being predicted is very rare, because then most or all of the predicted probabilities are tiny and lie close together. Since some or all of the pairs are considered tied when the scores are within 0.002, the concordance comes out wrong. In all cases the result will be an incorrect calculation of c or Somers' D, showing much lower predictive power (even zero in the extreme case above) than the model really has.
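The pairing described above can be sketched in a few lines of Python. This is a brute-force illustration and not what SAS runs internally; the function name `association_stats` and the toy scores are made up for the example. It takes c as (concordant + half of the ties) / pairs and Somers' D as (concordant - discordant) / pairs. Plugging the percentages of the first table into these formulas gives the 0.857 and 0.714 reported there.

```python
from itertools import product


def association_stats(scores, outcomes):
    """Brute-force concordant/discordant/tied counts over all event/non-event pairs."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    nonevents = [s for s, y in zip(scores, outcomes) if y == 0]
    pairs = len(events) * len(nonevents)
    if pairs == 0:
        raise ValueError("need at least one event and one non-event")

    concordant = discordant = tied = 0
    for e, n in product(events, nonevents):
        if e > n:        # the event got the higher score: model ranks the pair correctly
            concordant += 1
        elif e < n:      # the event got the lower score: model ranks the pair wrongly
            discordant += 1
        else:            # identical scores
            tied += 1

    return {
        "pairs": pairs,
        "concordant": concordant,
        "discordant": discordant,
        "tied": tied,
        "c": (concordant + 0.5 * tied) / pairs,
        "somers_d": (concordant - discordant) / pairs,
    }


# Toy data: 2 events and 3 non-events give 6 pairs
stats = association_stats([0.9, 0.8, 0.3, 0.3, 0.1], [1, 0, 1, 0, 0])
```

The loop is quadratic in the number of observations, so for a pool with hundreds of millions of pairs you would want a sort-based version; the sketch is only meant to make the definitions concrete.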
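To see how a 0.002 tolerance can wipe out a perfectly good model on a rare event, here is a small Python sketch. It assumes the effect can be imitated by rounding every score down onto a grid of width 0.002 before comparing; that is my simplification for illustration, not SAS's actual code, and the scores are invented.

```python
import math


def c_statistic(scores, outcomes, binwidth=0.0):
    """c = (concordant + 0.5 * tied) / pairs, optionally after binning the scores.

    binwidth > 0 replaces each score by the index of its bin (an assumed
    stand-in for treating nearby scores as identical); binwidth = 0 compares
    the raw scores.
    """
    if binwidth > 0:
        scores = [math.floor(s / binwidth) for s in scores]
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    nonevents = [s for s, y in zip(scores, outcomes) if y == 0]
    concordant = sum(e > n for e in events for n in nonevents)
    tied = sum(e == n for e in events for n in nonevents)
    return (concordant + 0.5 * tied) / (len(events) * len(nonevents))


# Rare-event style scores: all predicted probabilities are below 0.002,
# yet every event is scored above every non-event.
scores = [0.0018, 0.0015, 0.0009, 0.0006, 0.0004, 0.0002]
outcomes = [1, 1, 0, 0, 0, 0]

exact = c_statistic(scores, outcomes)                   # raw comparison
binned = c_statistic(scores, outcomes, binwidth=0.002)  # everything lands in one bin
```

With the raw scores the model separates events from non-events perfectly (c = 1.0), but once all scores fall into the same bin every pair is tied and c collapses to 0.5, which is the pattern seen in the puzzling table.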

Ideally SAS should have done two things -
1. SAS output should mention that the table is "Estimated Association of Predicted Probabilities" or something on similar line.
2. The algorithm should vary the 0.002 threshold based on what is the overall rate of the event in the data.

What are the workarounds for the end users?

First, all analysts should be aware of it and not panic if they see strange reports in this particular table. They should then look at the other standard reports and tables they create to evaluate the model performance, disregarding this table completely.

Second, there are two approaches you can take to get the right values.

Approach 1 is to use the SAS option BINWIDTH=0 with the MODEL statement in PROC LOGISTIC. Other than the fact that it can take longer, as mentioned in the SAS documentation (which should be okay, since accuracy in this case should win over time taken), the hitch is that the option was mentioned in the SAS documentation for version 9.22, and it does not work with SAS version 9.1.3 which my organization uses (which makes me feel that the whole explanation on this was recently added).

Approach 2 is to create your own code/macros to calculate and report the correct values of any statistics from this table that you normally look at (for example, Gini or ROC). It should not be difficult for a seasoned analyst, and it is something I will recommend as a good exercise for someone with medium experience. After all, who knows what else could come as a surprise if you use the original SAS procedures?

Review: Dragon Age (Origins)

This is the first time I played a party RPG. And it was awesome.

You arrive at Lothering

It took me a while to get into the interface. When it started (I chose a human mage) the movement controls (WASD) felt awkward - different from other first person games, even though the scene was drawn in first person. It didn't take long though to get used to it. But even before I got used to it, the story had already begun. I met the mouse and the bear in the Fade. I was already curious to see how (and if) I would escape the Fade (which is sort of the dreamworld in the game, also where spirits reside) to pass the test posed to me. I think there is something in the storytelling which gets you addicted to it from the start. I encountered the Fade demon and defeated him - and was surprised by who he was and how I could defeat him without using any offensive abilities. Immediately I got hooked.

Though it starts out smoothly (in the first encounter you don't even have to fight to win), the fighting in the game picks up pace surprisingly fast. I soon found out that the game is not like taking a walk in the park - at times, the combat needs to be carefully planned, making full use of the "pause and strategise" feature.

Meanwhile the story continued on. There are some rather surprising turns in the first part of the game, and after that you get a kind of free roaming ability - i.e. you can choose which area (and hence quest) to cover next. When you have covered all the primary quests, you then get to continue the main storyline to the end of the game. It takes a hell of a lot of time to reach that stage, and it never gets boring. That's because all the primary quests are crafted with lots of details, characters, locations, environments - and a very long story unique to them. Each one of them seems like a game on its own. I am pretty sure, looking at the vast variety and spectacular difference in all angles between the primary quests, that BioWare had different teams working on separate primary quests.

Despite the length of the game, it never feels dull. There are a lot of satisfying turns and twists to keep you fully engaged in the story. And in the course of the game you will meet some very memorable characters. Some of them will join your cause and fight with you - depending on your actions. Most of the companions also have their own side quest. You will need to unlock the side quests through the course of the game - mostly by gaining their appreciation towards you by giving them gifts, or doing things that they respect.

The game also has multiple endings depending on what you choose. Perhaps because my character died at the end of the game saving the world, it was a rather emotionally charged ending for my story. And after it ends, there is a treat - which took me by surprise. It has a slide show which details the impact of your actions on Ferelden for years to come. Since most of these are tied to specific choices you made over the course of the entire game, it is quite a feast.

The "Landsmeet"

The game offers many addons, of which two add new capabilities (along with a small quest each) which you can spend your money on. One is Warden's Keep, which adds a stash for storing your equipment - otherwise you will have to sell your stuff to make space for new items, as you cannot carry an unlimited amount of them. The second is The Stone Prisoner, which unlocks the most unique companion - a golem called Shale. He is very well integrated into the rest of the game's storyline - so once you unlock him, you will never feel that he is a companion that comes from an addon.

Let me pause here and mention something, rather an alert if you have not played the game yet, so that you don't miss out on content. [Spoiler Alert] There is a character called Leliana in the game. I was aware of this since the game advertises an add-on campaign called "Leliana's Song", though I wasn't sure exactly at which point she would join the game. After a lot of the story had passed, I became suspicious and one day read more about her. And to my dismay I found that she is supposed to join in a scene that is only triggered if you visit a particular pub in Lothering, a village which gets obliterated early in the game. And once it is gone, there is no way to get her into your group (unless you'd like to download dev-tools and use a crude hack to modify your save file, which I didn't want to do). So if you play the game, visit all houses in Lothering till you find Leliana, before Lothering becomes inaccessible!

Pros -
+ Engaging story
+ Your actions can change the course of the missions
+ Challenging fight system
+ Lots of spells and abilities, but smooth learning curve
+ Very long game which never became dull
+ Companion characters are good, you are bound to take a liking to some of them

Cons -
- Graphics is solid and nice, but not spectacular
- Sometimes certain important items are not clearly marked, which can cause you to miss certain experiences, as serious as a companion in your quest. It may be impossible to fix later on if you don't want to start from a previous save, which may not be a great option considering the length of the game.
- Party stash needs to be unlocked by buying Warden's Keep, this should have been a native feature in the game

Overall, this is an excellent single player game - definitely one of the best that I have played. You are bound to have hours of fun playing it - in fact I will be surprised if you don't lose a couple of days' sleep due to this game. Excellent and recommended.

Tuesday, September 27, 2011

All about "Information Value"

In statistical data mining, sometimes we need to determine which ones out of a set of variables are best at capturing a desired behavior. For example, let's say you have a pool of customers for your credit card company, and you want to determine who among them are about to default (i.e. refuse to pay up after possibly making a huge expense). You then need to identify which of the attributes you have on the customer can potentially identify and alert you to such behavior. One of the popular ways in which this is done by analysts is by looking at something called 'Information Value'. In the context of data mining it is also sometimes referred to by the short form - InfoVal.

Definition
Information Value of $x$ for measuring $y$ is a number that attempts to quantify the predictive power of $x$ in capturing $y$. Let's assume the target variable $y$, which we are interested in being able to measure, is a 0-1 variable (or an indicator). Let's further assume that it indicates whether an account will go bad in the immediate future. Let's now sort the entire pool by $x$ and divide it into 10 equal parts to create the deciles. Now we are all set to define Information Value -
$$IV_x = \sum_{i=1}^{10}{\left(bad_i-good_i\right)\ln\frac{bad_i}{good_i}}$$
Here,
$i$ runs from 1 to 10 deciles in which we have divided the data,
$bad_i$ is the proportion of bad accounts captured in $i$th decile out of all bad accounts in the population,
$good_i$ similarly is the proportion of good (i.e. not bad) accounts captured in $i$th decile out of all good accounts in the population.

Note that $x$, the variable whose effectiveness you want to measure, enters the formula only through the sorting: it is the variable by which the entire data is sorted and divided into deciles.
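To make the definition concrete, here is a minimal Python sketch of the computation. The function name `information_value`, the `n_bins` parameter and the toy data are my own choices for illustration, not part of any standard library, and ties in $x$ are simply broken by the original order of the rows.

```python
import math

def information_value(x, y, n_bins=10):
    """IV of x for a 0/1 target y (1 = bad), using equal-sized bins after sorting by x."""
    pairs = sorted(zip(x, y), key=lambda p: p[0])
    n = len(pairs)
    total_bad = sum(t for _, t in pairs)
    total_good = n - total_bad
    iv = 0.0
    for i in range(n_bins):
        # i-th bin of the sorted population
        chunk = pairs[i * n // n_bins:(i + 1) * n // n_bins]
        bad = sum(t for _, t in chunk) / total_bad        # share of all bads in this bin
        good = sum(1 - t for _, t in chunk) / total_good  # share of all goods in this bin
        iv += (bad - good) * math.log(bad / good)
    return iv

# Toy data (made up): 100 accounts, x is just the account's rank.
x = list(range(100))
# Bads are three times as frequent in the upper half of x as in the lower half.
y_informative = [1 if (i % 10) < (1 if i < 50 else 3) else 0 for i in x]
# Every decile holds exactly 2 bads and 8 goods, so x tells us nothing.
y_uninformative = [1 if i % 10 < 2 else 0 for i in x]

print(information_value(x, y_informative))
print(information_value(x, y_uninformative))
```

With this toy data the first call gives roughly 0.42 and the second gives 0.0. The sketch makes no attempt to handle a bin with no bads or no goods; in that case it simply raises an error.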

How does it work?
But why does it work? You can check that if $x$ has no information on $y$ at all, then the $IV$ will turn out to be zero. That's because when you sort by $x$ and create deciles, the deciles are as good as random with respect to $y$. Hence, each decile should capture 10% of total bads and 10% of total goods. So $bad_i-good_i=0$ and $\ln\frac{bad_i}{good_i}=\ln 1=0$, and every term in the sum vanishes.

On the other hand, if after sorting by $x$ some decile has a higher or lower concentration of bads than goods, then that particular decile is different from the overall population, and $x$ lets us create it. The decile will contribute a positive value to the summation which defines $IV$ in the equation above, since $bad_i-good_i$ and $\ln\frac{bad_i}{good_i}$ always have the same sign. So it is clear that for a good $x$ variable, there will be more of such deciles where the proportion of goods and bads differ - and by a larger margin as your $x$ is more effective in capturing $y$ - hence $IV$ indeed gives a measure of predictive power of $x$.
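As a quick check with made-up numbers (not from any real portfolio): suppose one decile holds 15% of all bads but only 8.75% of all goods, while another holds 5% of all bads and 11.25% of all goods. Their contributions to the sum would be
$$(0.15-0.0875)\ln\frac{0.15}{0.0875}\approx 0.0337,\qquad (0.05-0.1125)\ln\frac{0.05}{0.1125}\approx 0.0507$$
Both terms come out positive, whichever side is over-represented in the decile.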

Issues
However, there is something artificial in the definition of $IV$ above - it is the functional form. Indeed there can be many different ways to create the functional form that is being summed up.

To give some examples - $\sum_{i=1}^{10}{\left(bad_i-good_i\right)^2}$, $\sum_{i=1}^{10}{\left|bad_i-good_i\right|\frac{n_i}{n}}$, etc. all should be equally good candidates.

The last one in particular is interesting - because it has the proportion in the equation, making it a consistent measure. That is, if you decide to divide the data into 20 parts or 30 parts and so on, you will go closer and closer to a limit. Incidentally, the limit to which it converges is essentially gini/2 under some assumptions. For $IV$ on the other hand, this leads to a problem - you cannot divide the data indefinitely - as you may hit segments which have no good (or no bad) accounts at all - in which case $\ln\frac{bad_i}{good_i}$ is undefined and the computation breaks down.
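The breakdown of $IV$ on fine segments is easy to reproduce. Below is a small Python sketch of a single term of the sum; the helper name `iv_term` and the counts (20 bads and 80 goods in total) are hypothetical, chosen only to illustrate the point.

```python
import math

def iv_term(bad_share, good_share):
    """One segment's contribution to IV: (bad_i - good_i) * ln(bad_i / good_i)."""
    return (bad_share - good_share) * math.log(bad_share / good_share)

# A decile holding 1 of 20 bads and 9 of 80 goods contributes a finite value.
print(iv_term(1 / 20, 9 / 80))

# Split that decile in two: one half keeps the only bad, the other half has none.
for bad, good in [(1 / 20, 4 / 80), (0 / 20, 5 / 80)]:
    try:
        print(iv_term(bad, good))
    except (ValueError, ZeroDivisionError) as err:
        # ln(0) or a division by zero: the term is undefined
        print("undefined:", err)
```

In practice people work around this by merging sparse segments or by adding a small constant to the counts, but either choice changes the value you get, which is part of the problem.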

Also for the same reason, it will be inaccurate and unfair to compare two variables when one of them has ties - which makes it impossible to unambiguously divide the population into 10 equal parts. (This is cautionary since it is a mistake that I have seen many analysts make, even when drawing Lorenz curves for example where the inequality of deciles can and should always be taken into account. For IV, to stress, it however cannot be taken into account.)

Origin
If $IV$ has these problems, how did the definition come about in the first place? Also should you use it?

I can only guess how such a metric came into being, and became popular despite having drawbacks. The concept of information actually arises in several other branches of mathematics, e.g. Information Theory where you measure a very interesting quantity called entropy (which vaguely resembles $IV$), and Fisher's Information of a variable (which deals with how much information a statistic contains about a particular parameter of a distribution). I think someone was aware of these concepts, and wanted to create something similar for the corporate world to measure "information captured in a variable". The way things work in most places in the corporate world is that you show something works, and it becomes de facto unless someone else challenges it and shows something else works better. Somehow the challenges are rare to come by in most places, which is why senior leadership needs to make a conscious effort to engage employees in thinking rather than following tradition. For $IV$, this was pretty much the case. We have already seen that it works, and after it was shown to work it became a standard - without proper math/statistical backing to explore pros and cons and to see if it can be bettered. And once something becomes a standard, it gains inertia and becomes legacy - which propels it through 'common wisdom'. The employees of that particular organization learned about it (they learned the formula, understood what it tries to measure, learned the SAS codes or SQL queries required to compute it). When they migrated, the knowledge migrated with them.

Recommendation
The only reason you might want to use $IV$ is because of legacy - you may have worked at a bank which used it, and may have developed some reference points (i.e. thumb rules like "if $IV$ is more than $abc$ then the variable is good, otherwise it is bad") of good or bad $IV$ values. Even if you have used $IV$ before, it should not take a very long time to develop experience on reference points for gini - you will be able to do that over the course of one or two projects. In general though, I will recommend not using $IV$, in favor of other more intuitive measures - ROC, gini, KS to name some. I personally prefer gini - it is easy to understand, is consistent, and is very robust in measuring the power of a variable.

To conclude, though $IV$ is still popular in some places of the banking world, I would recommend not using it, in favor of gini/ROC, which measure the same thing while being more intuitive and without the flaws of $IV$.

How real are you?

Are we real? Of course we are. How much real are we, say when compared to characters of a story, movie, or a video game? You might think that's an odd question - reality is reality, fiction is fiction! It may be so, but let's pause and think for a moment.

What differentiates reality and fiction? We can imagine a huge moon overlooking shores of an urban city surrounded by water - but that won't be real. Or will it? For an event to be real, you need real humans (supposing it involves humans) who have a mind governed by psychology - who think, feel and reason like we do. We need a real environment, made of billions of atoms, so that the repercussion of every action is calculated with the greatest scrutiny. Merely outlining the large scale effects of an event (as is done in fiction) is not enough to make it real. In short we should have a Universe where all the laws of physics are followed just as in ours, and we should be in it.

Before we go further into it, I would like to propose a thought experiment. Let's take a living human brain, and let's scan the configuration of the brain as precisely as possible. Fear not - since due to recent advances in modern physiology, this can be done without harming the person at all. Let's say we scan every neuron and its state - that is, which other neurons it is connected to, and where it is currently sending any electric impulse. Then in principle, one can sit down and, using pen and paper, simulate the brain - by writing down its current state and slowly evolving it following the right biological course of the neurons. Using more sophisticated science, in fact, one may be able to encode questions posed to the brain in terms of sound waves - interpreted into electric signals by the relevant portion of the brain. And once the computation has taken its course, following the same biological simulation one may carry out the necessary decoding and end up with the exact vibration of the larynx, from which sound waves can be deduced - pretty much the same sound waves which a real mouth will speak if a real brain is asked a real question. (You can find a variant of this experiment here in Wikipedia.)

However, note that the person who is carrying out these computations does not need to 'understand' the question that was posed in the form of sound waves at all. The scanned brain may even belong to Stephen Hawking - and the question a deep one in astrophysics - and the person who is simulating this may not even understand English - but at the end of the calculations he will still come out with exactly the same answer that a real Stephen Hawking would have given. Just as in the real world the consciousness and understanding that a brain possesses may be an emergent phenomenon, so can it be for a brain meticulously simulated on pen and paper.

So if the brain of a living person is scanned, and then tediously simulated by following mathematical equations on pen and paper, it will give rise to exactly the same answers that the real brain will give (provided no mistakes are made). This is just like how you can predict where Jupiter will be on this day next year - even though you of course cannot see next year's sky from our reality at this moment.

Do you think then the simulated brain is 'real'? Let's see. Let's suppose we note down the configuration of a real brain for simulation, and then destroy the real brain. (Well that's a rather inhuman thing to do, but it is only a thought experiment.)

Will the thought, feeling, knowledge, experience be preserved as the brain is simulated? Yes.

Will the brain answer to any question asked exactly the same way the 'real' person would have answered? Yes.

In general will the brain react in exactly the same way in which the real brain will react, when posed with the same (simulated) conditions? Yes.

So by all accounts the simulated brain's actions will match with the real brain - it will even have a (simulated) consciousness corresponding to the real brain.

The only fact which makes it not real, is that you are not there - you cannot directly see or talk to it or interact with it. What if your brain scan is also taken and is simulated alongside the other brain? Your simulated brain will have no doubt that the other brain is real. For them, since there is no way to detect the real you, your world will be as fictitious - just a 'what if we are being simulated' hypothesis.

Is there any way for the simulated brains to figure out that their world is merely simulated - and is not 'real'? The answer seems to be no (unless we choose to give them hints - e.g. suddenly make a paper appear which has answers to their key scientific questions - which let's say we do not want to). They will see, feel, smell, touch their simulated world just as we do. They will have their own (simulated) thoughts, feelings, experiences, memories. They will experience (simulated) happiness, fear, joy, anger, peace just as we do. For them their (simulated) world is as real as we feel ours is.

It is a slightly disturbing conclusion that there is no way for them to detect they are being simulated, since it means that we could be living in a world which is entirely being simulated, and we would have no way to know about it. But it gets worse.
That's all my schedule permits me to write in the real (or simulated?) world where I live - we will continue the exploration next week. Update: Continued here.

Monday, September 26, 2011

Review of Race Driver GRID coming from NFSMW

I have had Need for Speed: Most Wanted on my computer for a long time, and I recently installed GRID. I will jot down a couple of observations that I can make when I compare the two.

+ GRID has two features which are sorely missing in NFSMW, namely the ability to replay your race, and to damage your car. Both features were there in some earlier versions of NFS, but somehow did not make it through to Most Wanted.

In GRID, the replay is shown through a computer controlled camera (you can change it to one of the standard ones if you want), and, like the rest of the game, it is very polished in visual quality. It really makes you look good!

The damage model is good, though coming from NFS where there is no damage, it takes a little time to adapt yourself to handling it. However it does not get too much in the way - as long as you avoid high speed crashes. Your car is immobilized immediately (and the race is over) if you hit the wall at 170mph. The dashboard shows which part of the car is damaged while you are driving. The only other effect I have seen is that if you badly damage one of your wheels, the car will be unbalanced and will have a tendency to automatically steer to one side, which you will have to constantly counter throughout the remaining part of the race.

- There are no civilian vehicles and no cops. All the races are pure races, with closed tracks walled off with concrete blocks or blocks of car tires. The tires (as well as parts of your car) get scattered if you bump into them slowly enough not to total your car. They will then remain on the track till the end of the race.

- There is no nitro boost. This makes winning more a matter of control.

+/- The difficulty is notched up, partly because of the damage model mentioned above. Also unlike NFS, the competitors here are more serious at all levels - it is unlikely that you will find someone driving at a slower speed than yours on a long road without turns. Also it will take some time to get used to the controls and the cars, as the physics is slightly different from NFSMW. Apparently it's a little more 'realistic' or 'sim' like. However after a bit of practice, you will start winning some of the races. The increased difficulty provides a sense of accomplishment when you win.

+ The game has a feature called 'flashback' which you can use a limited number of times (maximum 5) depending on your difficulty. This lets you rewind time to correct your mistakes (much like the Prince of Persia series) and works beautifully in the game. This helps counter the difficulty and balances the game a bit more.

+ The graphics is just a treat for the eyes. The cars and the tracks are gorgeous, they make you want to play on just for the looks. I ended up 'test driving' the cars I own for adapting to the controls, and it was really nice. The menu system is very nice too, it's never static. You will feel that you are almost setting something into motion when you navigate through it.

+ The cars and the race types all have a different feel. It's like multiple racing games in one. You can drive 'Pro Muscle' through the city, or professional motor racing cars in 'Pro Tuned'. The cars for 'Drift' racing have a different weight setting which makes them more susceptible to drifting when you turn.

Friday, June 24, 2011

Memorable quotes from Portal

Memorable quotes from Portal depicting the characteristic humor –

Once the tests are complete – and the AI plans to kill you –

(GLaDOS): [near the end of the nineteenth and final test chamber] "Congratulations. The test is now over. All Aperture technologies remain safely operational up to 4000 degrees Kelvin. Rest assured that there is absolutely no chance of a dangerous equipment malfunction prior to your victory candescence. [the moving platform the player is standing on is sinking into a fire pit] Thank you for participating in this Aperture Science computer-aided enrichment activity. Goodbye."

Once you have found your way into the main chamber of the AI, and attempt to shut it down by destroying parts of it –

(GLaDOS): Have I lied to you? [pause] I mean, in this room? Trust me, leave that thing alone.

Portal 2 –

(GLaDOS): Okay look, we both said a lot of things that you are going to regret. (Combined with the characteristic evil and pseudo-cheerful voice in which it is said, this one is a gem!)

(GLaDOS): This next test involves the Aperture Science Aerial Faith Plate. It was part of an initiative to investigate how well test subjects could solve problems when they were catapulted into space. Results were highly informative: They could not. Good luck!

(Wheatley): Most test subjects do experience some cognitive deterioration after a few months in suspension. Now, you've been under for... quite a bit longer, and it's not completely out of the question that you might have a very minor case of serious brain damage. But don't be alarmed, all right? Although, if you do feel alarm, try to hold onto that feeling, because that is the proper reaction to being told you have brain damage.

(Wheatley): They say the old caretaker of this place went absolutely crazy. Chopped up his entire staff... of robots; all of them robots. They say at night you can still hear the screams... of their replicas; all of them functionally indistinguishable from the originals. No memory of the incident. Nobody knows what they're screaming about. Absolutely terrifying. Though obviously not paranormal in any meaningful way.

(Wheatley): You two are going to love this big surprise. In fact, you might say that you're both going to love it... to death. Love it until it kill- until you're dead. [chuckles] All right? I don't know whether you're picking up on what I'm saying here, but …

(GLaDOS): Yes, thanks. We get it. [Chell and GLaDOS enter an elevator] All right, he's not even trying to be subtle any more. Or maybe he still is, in which case: wow, that's kind of sad.