The Meaning of Money

I have been thinking for a long time about the meaning of money. What follows is my personal opinion (not necessarily in agreement with most financial experts) on what “the meaning of money” is.

In short: money is a “trust distribution system” built on tickets (e.g. banknotes) or, in the case of paperless money, a more complicated system involving additional layers of trust.

So, let me try to explain what I mean.

  1. There is an authority organization (government, company, etc.) that is responsible for enforcing laws and rules in a given market or area. Participants trust (believe) that the rules (laws) will be followed by everyone in the game (e.g. the market).
  2. One participant in the market owns documents (trust tickets) issued by the Trusted Authority.
  3. Another participant owns some goods.
  4. Both participants want to exchange goods and money.
  5. The exchange price (e.g. what quantity of money for what quantity of goods) is determined by market mechanisms and is based on the degree of trust:
  • The goods owner values the money using a subjective quantitative measure of trust in the authority organization, and a subjective measure of the value of his goods relative to other types of goods in the market, or other kinds of market knowledge.
  • The money owner does the same, but from his own point of view.

Authority Organization and distribution of trust

Let me elaborate on each point:

  1. There is no money without an authority behind it. This can be a government, company, person or any kind of organization. I am not talking here about gold-backed money, which has its own value (based on the trust that it can be exchanged for gold at any time). It is important for each participant in the market that the rules will be followed. In case of political changes and other extraordinary conditions (war, natural disaster), participants doubt that all rules will be followed and that the authority will continue to exist in its current form. The degree of trust decreases and this leads to inflation. In some cases trust in the government disappears completely (Yugoslavia suffered 5 × 10^15 percent inflation per month (prices doubled every 16 hours) between 1 October 1993 and 24 January 1994), and this creates hyperinflation.
  2. Each participant can own documents from different authority organizations and use them in the same market.
  3. Each participant can own goods and money at the same time (trivial).
  4. The size of the market is determined by the number of participants who want to exchange money for goods and goods for money.
  5. Market participants can exchange goods directly (barter), but this is not always possible and money is preferable. On one hand, the person who owns goods does not want money for its own sake; it is useless to him (in the sense that he can’t eat it, for example). On the other hand, he trusts the authority that the rules will be followed, so he will be able to make a new exchange (money for the new goods that he wants). So he accepts the initial exchange in order to be able to make the second one.
  6. What is interesting here is money markets. One participant exchanges one type of money for another based on his own relative trust (and the information on which it is based) in one authority organization versus another.
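
As a side check on the hyperinflation figure quoted in point 1, the relationship between a monthly inflation rate and the time it takes prices to double can be worked out in a few lines. This is only a sketch: the function name is my own, and it assumes a constant growth rate and a 30-day month.

```python
import math


def doubling_time_hours(monthly_inflation_percent, hours_per_month=30 * 24):
    """Hours needed for prices to double at a constant monthly inflation rate.

    Assumes (for illustration only) steady exponential growth and a 30-day month.
    """
    # 100% inflation means prices are multiplied by 2, hence the "1 +".
    growth_factor = 1 + monthly_inflation_percent / 100
    doublings_per_month = math.log2(growth_factor)
    return hours_per_month / doublings_per_month


# 5 x 10^15 percent per month, the figure quoted for Yugoslavia above.
print(round(doubling_time_hours(5e15), 1), "hours")
```

Under these assumptions the result comes out close to the “every 16 hours” figure, so the two numbers in the quote are at least consistent with each other.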

How about “virtual money”:

Virtual money adds an additional layer of trust because of the network. We have to trust the authority that issued the money and, additionally, that the virtual representation of the money is genuine at the moment of trade. If virtual goods are exchanged in the trade, the genuineness of the virtual goods has to be assured too.

Notes:

  1. There is no single definition of trust. The money valuation problem is equivalent to the problem of measuring trust quantitatively.
  2. My definition of trust: a quantitative forecast (based on a subjective model) of risk in the behavior of an organization or person. It is interesting to discuss the relationship between trust and risk: http://www.istc.cnr.it/T3/trust/pages/risk.html

Useful sources of information: Trust Theory Group

Probably most of you do not agree with me. I will be pleased to read other opinions on the subject.

Researching Competitors Keywords in Delicious

Currently I am developing software that creates collages semi-automatically. There are many strong competitors with outstanding collage-making products. I am researching those products and the associated search keywords for SEO purposes. Here is what I found very useful for researching competitors’ keywords and understanding how users think about competitor products.

First, let’s create a list of important competitors. The collage software industry has many participants, each focused on a particular niche. My competitors are those who produce software for photo collages, particularly the kind of collage that you can put on a wall as a poster (or use as computer wallpaper), turn into a postcard, or use as a combined image for a blog post. I have a short list of around 20 software products that appear in Google when searching for particular phrases that describe my product: “photo collage software”, “photo poster”, etc.

People bookmark web sites in different ways. For example, I use XMarks for Firefox. The XMarks company knows how people describe different websites and what keywords they associate with them. We have no access to that information, so let’s look at what we can collect from the publicly visible bookmarks at Delicious.com.

If you haven’t yet completed your list of competitors, just type the search phrases into the Delicious search form and look for the most bookmarked web sites. At the right of each bookmark is a number. Clicking on the number reveals the users who bookmarked this site. On the right of that page is the most important information: the total number of times each keyword is associated with this web site:

Delicious Photo Collage top keywords

At the top of this list are the most important words describing the product; the rest are long tail words somehow related to it. On the left are the users’ descriptions of the site. Descriptions are mostly in English, but some are in German, Chinese and Spanish, so this competitor’s product has users in those countries. Most of the descriptions are copy-pasted from the home page and are of no particular interest to me. I dug into the descriptions and found some very interesting ones that can be classified as follows:

  • Product description in the users’ terms. Most users have no knowledge of image processing, so they use simple words to describe what the product does (and probably use the same words to search for products).
  • Possible product applications. For example: “I can use this for gifts for my mom”, “Fun to teach students”.
  • Keywords translated into other languages. Translating keywords can be tricky, but tags in other languages give me an idea of translations, and of variations when an exact translation is not possible: “Montagens de fotos”, “Collagen aus Fotos”.
  • Product opinions. For example: “Looks very easy and fun”, “It is not customizable”, “looks like fun to try, but probably cannot download at work”.

It will be interesting to look at the descriptions when my site becomes popular, but meanwhile I have to dig into my competitors. The main difficulty with this way of researching keywords and user opinions is that it is time consuming. I would be happy if there were a way to automatically extract all the descriptions and filter out the duplicates.
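
The filtering half of that wish is easy to sketch. The example below assumes the descriptions have already been collected by hand into a list of strings (how to fetch them from Delicious is left out on purpose), and the function names are hypothetical. It drops exact and near-exact duplicates and, optionally, descriptions that were simply copied from the product’s home page.

```python
import re


def normalize(text):
    """Lowercase, drop punctuation and collapse whitespace so near-identical copies compare equal."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def unique_descriptions(descriptions, homepage_text=""):
    """Return descriptions in their original order, without duplicates or home page copies."""
    homepage = normalize(homepage_text)
    seen = set()
    unique = []
    for description in descriptions:
        key = normalize(description)
        if not key or key in seen:
            continue  # empty or already seen
        seen.add(key)
        if homepage and key in homepage:
            continue  # copy-pasted from the home page
        unique.append(description)
    return unique


collected = [
    "Looks very easy and fun",
    "looks very easy and fun!",
    "Make photo collages online",
    "Fun to teach students",
]
print(unique_descriptions(collected, homepage_text="Make photo collages online. Free!"))
```

Comparing normalized text catches only trivial variations (case, punctuation, spacing); descriptions that were lightly reworded would still need a human eye.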

Wikipedia word frequency list

Domain names are one of the most valuable assets for each mISV (and each online business). A domain has to convey a lot of information in a limited number of characters. It has to be SEO friendly (descriptive), easy to remember and easy to spell. I have done a lot of work to find good (and unregistered) domain names for my products.

It is written in a lot of places that good keywords in a domain name help your customers better understand what your product does, and that they are especially useful for SEO (Search Engine Optimization) purposes. Countless articles suggest looking at the Google search count for the popularity of words before registering a domain (and also at the search result count). So I needed a list of words with some indication of how important they are. More precisely, I needed a list of phrases rather than single words, but this article is about single words only. I use this word list in combination with the Google External Keyword Tool and Google Trends to hunt for a perfect domain name.

Computational linguistics deals, among other things, with the question of how frequently given words appear in different collections of written text (known as corpora). A frequency list is a list of words (in a given language) together with their frequencies in given texts. It is like a dictionary with an additional “importance” number.
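
To make the idea concrete, here is a minimal sketch of building a frequency list from a handful of texts. It is not the tool used later in this article; the function name and the “a word is a run of the letters a-z” rule are simplifying assumptions of mine.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of ASCII letters; real tokenizers are more careful.
WORD_RE = re.compile(r"[a-z]+")


def frequency_list(texts):
    """Return (word, count) pairs for the given texts, most frequent first."""
    counts = Counter()
    for text in texts:
        counts.update(WORD_RE.findall(text.lower()))
    return counts.most_common()


corpus = ["The cat and the hat", "the end"]
for word, count in frequency_list(corpus):
    print(word, count)
```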

We all think of a dictionary as a fixed list of words, but it is more like a list in which words continuously appear and disappear, with each word having a rank.

And there is a power law (or long tail, as some prefer to call it). A small number of words get all the attention (they are used most frequently) and a large number of words are used rarely (long tail keywords).

There are some well known word frequency lists:

My own word frequency list

I decided to create my own list of words and associated frequencies based on all articles that are in the English version of Wikipedia.

Wikipedia is HUGE. The English part alone is 21GB in XML format. It takes about 5 hours to parse the entire file and extract statistics for all tokens that look like words.

Some statistics:

  • Total tokens (words, no numbers): 1,570,455,731
  • Unique tokens (words, no numbers): 5,800,280

It seems that the word frequency distribution follows Zipf’s law, and you can even see a plot similar to the following one here.

Wikipedia word frequency plot
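
Zipf’s law says that the count of a word is roughly proportional to 1 / rank^s, with s close to 1, which is a straight line on a log-log plot. A rough way to check this is to fit that line through a few (rank, count) points. The sketch below uses only the approximate rank and count boundaries quoted in the list that follows, so the result is a ballpark figure and not a proper fit over the full data; the function name is my own.

```python
import math


def zipf_exponent(rank_count_pairs):
    """Least-squares slope of log(count) against log(rank), returned as a positive exponent s."""
    xs = [math.log(rank) for rank, _ in rank_count_pairs]
    ys = [math.log(count) for _, count in rank_count_pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


# Approximate boundary points taken from the rank/count ranges in this article.
points = [(1, 86e6), (50, 3e6), (3000, 56e3), (200000, 118)]
print(f"estimated exponent: {zipf_exponent(points):.2f}")
```

With so few points the estimate is crude, but an exponent in the neighbourhood of 1 is what Zipf’s law predicts.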

The chart can be divided into four parts:

  • Rank(1-50) Count(86M-3M) Examples(the, of, and, in, to, a, is) Stop words.
  • Rank(51-3K) Count(2.4M-56K) Examples(university, January, tea, sharp) Words that form the “core” of the English dictionary: the words that are most frequently used.
  • Rank(3K-200K) Count(56K-118) Examples(officiates, polytonality, neoligism) Words that can be found in some large and comprehensive dictionaries (words above rank 50K are mostly long tail words).
  • Rank(200K-5.8M) Count(117-1) Examples(euprosthenops, eurotrochilus, lokottaravada) Terms from obscure niches, misspelled words, words transliterated from other languages, new words and “not words at all”.

A Google study shows that there are 14M single words and 315M two-word phrases (bigrams). Currently I have no plans to extract two-word phrases due to their large number, but it would be interesting to analyze them in the context of two-word domain names.

Extracting words from Wikipedia

Extracting all the words and counting them is not an easy task. I used the Qt XML library for parsing. The steps to create your own word frequency list are:

  • Download a copy of Wikipedia. I used the version dumped in XML format.
  • Write a parser to extract the text from the <title> and <text> tags.
  • Wikipedia uses its own markup language. Write a parser to extract the text from the markup and filter out unnecessary parts (this is the difficult and vague part).
  • Filter out numbers, special characters.
  • Tokenize.
  • Collect useful statistics.
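
The steps above can be sketched in a few dozen lines. Note that this is not my Qt-based implementation: it swaps in Python’s standard xml.etree.ElementTree streaming parser and a Counter, and the markup cleanup is a deliberately crude set of regular expressions (the function names and patterns are illustrative assumptions). Real Wikipedia markup, with nested templates and tables, needs much more work.

```python
import io
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: a "word" is a run of ASCII letters, so numbers and special characters drop out.
WORD_RE = re.compile(r"[a-z]+")


def strip_markup(wikitext):
    """Very rough cleanup of wiki markup; only handles the simplest cases."""
    text = re.sub(r"\{\{[^{}]*\}\}", " ", wikitext)                  # simple, non-nested templates
    text = re.sub(r"<[^>]+>", " ", text)                              # inline HTML-like tags
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)     # [[target|label]] -> label
    text = re.sub(r"https?://\S+", " ", text)                         # bare URLs
    return text


def count_words(xml_stream):
    """Stream through a dump and count words found in <title> and <text> elements."""
    counts = Counter()
    for _, element in ET.iterparse(xml_stream, events=("end",)):
        tag = element.tag.rsplit("}", 1)[-1]  # drop the XML namespace prefix, if any
        if tag in ("title", "text") and element.text:
            counts.update(WORD_RE.findall(strip_markup(element.text).lower()))
        if tag == "page":
            element.clear()  # release the finished page's content to limit memory use
    return counts


sample = b"""<mediawiki><page><title>Money</title><revision>
<text>[[Trust|trust]] in {{cite web}} money. See http://example.com/page now 1993.</text>
</revision></page></mediawiki>"""
print(count_words(io.BytesIO(sample)).most_common(3))
```

Streaming the file element by element, instead of loading the whole document, is what makes a multi-gigabyte dump workable at all.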

The good news is that Wikipedia is much cleaner and better organized than the rest of the web. My main difficulties were parsing the Wikipedia markup language (it is not strict in some parts) and managing memory (I was limited to 2GB and had memory leaks at some point). On Linux you can use Valgrind to check for leaks and other memory problems.

The “collect statistics” part can be done in different ways. I used my own implementation of a ternary search tree. It is fast and memory efficient for counting words. It also filters out some of the strings found in Wikipedia, such as exceptionally long strings (URLs, for example) and other noise.
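
For readers who have not met the data structure, here is a minimal sketch of a ternary search tree used as a word counter. It is an illustration written for this article, not my original code: the class names, the method names and the 40-character cutoff for “exceptionally long strings” are all assumptions. In Python the per-node overhead is large, so the memory benefits only really show up in a lower-level language.

```python
class _Node:
    """One character of a stored word, with three children and a word count."""
    __slots__ = ("char", "low", "equal", "high", "count")

    def __init__(self, char):
        self.char = char
        self.low = self.equal = self.high = None
        self.count = 0  # how many times a word ending at this node was added


class TernarySearchTree:
    def __init__(self, max_word_length=40):
        self.root = None
        self.max_word_length = max_word_length  # hypothetical noise filter

    def add(self, word):
        """Count one occurrence of word; return False if it was filtered out."""
        if not word or len(word) > self.max_word_length:
            return False
        if self.root is None:
            self.root = _Node(word[0])
        node, i = self.root, 0
        while True:
            char = word[i]
            if char < node.char:
                if node.low is None:
                    node.low = _Node(char)
                node = node.low
            elif char > node.char:
                if node.high is None:
                    node.high = _Node(char)
                node = node.high
            else:
                i += 1
                if i == len(word):
                    node.count += 1
                    return True
                if node.equal is None:
                    node.equal = _Node(word[i])
                node = node.equal

    def count(self, word):
        """Return how many times word was added (0 if never)."""
        node, i = self.root, 0
        while node is not None and word:
            char = word[i]
            if char < node.char:
                node = node.low
            elif char > node.char:
                node = node.high
            else:
                i += 1
                if i == len(word):
                    return node.count
                node = node.equal
        return 0


tree = TernarySearchTree()
for token in ["domain", "domainer", "domain", "wiki"]:
    tree.add(token)
print(tree.count("domain"), tree.count("domainer"), tree.count("dom"))
```

Words that share a prefix (“domain”, “domainer”) share nodes, which is where the space saving over one hash table entry per word comes from.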

Some selected words and associated counts:

  • Google  197920
  • Twitter 894
  • domain  111850
  • domainer 22
  • Wikipedia 3226237
  • Wiki    176827
  • Obama   22941
  • Oprah   3885
  • Moniker 4974
  • GoDaddy 228

When you look at counts published on the web, keep in mind that only relative counts matter. Relative count = (word count / total word count) can be interpreted as the probability of occurrence of a given word in a given corpus.
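
As a small worked example, the snippet below turns a few of the counts listed above into relative counts, using the total number of tokens reported earlier in this article. Expressing the result “per million words” is just a convenience of mine to avoid very small decimals.

```python
# Total tokens and word counts are the figures reported in this article.
TOTAL_TOKENS = 1_570_455_731
counts = {"Google": 197_920, "Twitter": 894, "domain": 111_850, "domainer": 22}


def relative_frequency(count, total=TOTAL_TOKENS):
    """Estimated probability that a randomly picked token is this word."""
    return count / total


for word, count in counts.items():
    per_million = relative_frequency(count) * 1_000_000
    print(f"{word:10s} {per_million:10.3f} per million words")
```

Relative counts like these can be compared across corpora of different sizes, which raw counts cannot.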