<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>blattner technology</title>
	<atom:link href="http://www.b-tec.ch/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://www.b-tec.ch</link>
	<description>everything is a network!</description>
	<lastBuildDate>Mon, 08 Feb 2010 15:28:32 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Morphological classification of galaxies</title>
		<link>http://www.b-tec.ch/?p=107</link>
		<comments>http://www.b-tec.ch/?p=107#comments</comments>
		<pubDate>Mon, 08 Feb 2010 15:12:03 +0000</pubDate>
		<dc:creator>blattner</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[PatternRecognition]]></category>

		<guid isPermaLink="false">http://www.b-tec.ch/?p=107</guid>
		<description><![CDATA[As in biology and other natural sciences, cosmology and astrophysics have classification schemes for stars, galaxies and other objects. I naively believed that the task of classifying the morphology of galaxies (E, S types etc.) was done by state-of-the-art pattern recognition algorithms.
It seems that this is not the case. The project Galaxy Zoo for [...]]]></description>
			<content:encoded><![CDATA[<p>As in biology and other natural sciences, cosmology and astrophysics have classification schemes for stars, galaxies and other objects. I naively believed that the task of classifying the morphology of galaxies (E, S types etc.) was done by state-of-the-art pattern recognition algorithms.</p>
<p>It seems that this is not the case. The Galaxy Zoo project, for example, instructs laypeople to classify galaxies according to some simple rules.</p>
<p>I was looking for publications about morphological galaxy classification and was surprised by what I found: not much has been published on this topic. There are two possible explanations: a) there is not much to say about it because the problem is trivial, or b) not much research has been done in this area so far.</p>
<p>I favor b) because of the existence of the Galaxy Zoo project. I wonder how state-of-the-art algorithms perform on these problems.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.b-tec.ch/?feed=rss2&amp;p=107</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Search engines: I love them, I hate them!</title>
		<link>http://www.b-tec.ch/?p=78</link>
		<comments>http://www.b-tec.ch/?p=78#comments</comments>
		<pubDate>Thu, 26 Nov 2009 13:36:11 +0000</pubDate>
		<dc:creator>blattner</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[SearchEngine]]></category>

		<guid isPermaLink="false">http://www.b-tec.ch/?p=78</guid>
		<description><![CDATA[Search engines are hard workers. They are our information retrieval slaves. Are we satisfied with them? Do they do a good job for us? &#8216;Yes&#8217; would be the most common answer, I guess. But what does &#8216;most&#8217; mean here? Does it mean 99%, 80% or 50.0003%? Answering this question means measuring user [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: left;">Search engines are hard workers. They are our information retrieval slaves. Are we satisfied with them? Do they do a good job for us? &#8216;Yes&#8217; would be the most common answer, I guess. But what does &#8216;most&#8217; mean here? Does it mean 99%, 80% or 50.0003%? Answering this question means measuring user satisfaction, and we all know that&#8217;s tough. Because of this, nobody is in a position to give definitive answers.</p>
<p style="text-align: left;">However, some studies provide evidence that 30%-60% of users are rather frustrated and therefore NOT satisfied. Before giving some reasons, I quote some interesting figures:<span id="more-78"></span><br />
Between 67% and 78% of users satisfy their information needs with two queries or fewer. Between 60% and 85% of users don&#8217;t go beyond the first result page. 66% of all users review fewer than 5 results. Jansen and Spink conclude that users&#8217; queries are generally not complex and that the first results are mostly relevant to the information needs. Furthermore, 50% of the results are relevant to the users.<br />
But this means that, on average, 50% of the provided results are not relevant. This figure increases further if users pose more complex queries. This result was reproduced by Hawking et al. in a slightly different setup.</p>
<p>Another study says that the average user spends 3 minutes on a search session. Most users are dissatisfied when the search session lasts longer than 3 minutes. All in all, there is evidence that users are not entirely satisfied.</p>
<p style="text-align: left;">OK, now the why &#8211; just a few reasons.<br />
First of all: the basic mechanism of search engines like Google follows a &#8216;wisdom of the crowds&#8217; approach. And the &#8216;crowd&#8217; consists mainly of web administrators, bloggers and companies in general. The top results are therefore a kind of consensus. From this point of view, it is clear that search engines are not able to satisfy everybody&#8217;s needs.<br />
Secondly: Google&#8217;s top results are highly skewed towards online stores if you search for something you can buy online. Try it with &#8216;flowers&#8217;, &#8216;cd&#8217;&#8230; you name it. To find specialized information you have to dig deep.<br />
Thirdly: many &#8216;concepts&#8217; in our language are ambiguous: apple (fruit or computer), jaguar (animal, car, operating system). There are many, many examples. At present, search engines are not able to &#8216;know&#8217; exactly what you have in mind when you submit your query.</p>
<p>Of course there are more reasons. I guess the problems will become worse in the future, and we should start thinking about new concepts. We need systems that are able to understand our individual needs. The research community is trying to tackle these problems from diverse directions. Good. What are your experiences with search engines?</p>
<p style="text-align: left;">References:</p>
<p style="text-align: left;">[1] B.J. Jansen and A. Spink, An Analysis of Web Documents Retrieved and Viewed, Proceedings of the 4th International Conference on Internet Computing, 2003, pp. 65-69<br />
[2] B.J. Jansen, A. Spink, J. Bateman and T. Saracevic, Real life information retrieval: A study of user queries on the Web, SIGIR Forum, vol. 32, no. 1, 1998, pp. 5-17<br />
[3] D. Hawking, N. Craswell, P. Bailey and K. Griffiths, Measuring search engine quality, Information Retrieval, vol. 4, 2001, pp. 33-59<br />
[4] C. Silverstein, M. Henzinger, H. Marais and M. Moricz, Analysis of a Very Large AltaVista Query Log, Technical Report, Digital Systems Research Center, 1998</p>
]]></content:encoded>
			<wfw:commentRss>http://www.b-tec.ch/?feed=rss2&amp;p=78</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>User data are noisy</title>
		<link>http://www.b-tec.ch/?p=45</link>
		<comments>http://www.b-tec.ch/?p=45#comments</comments>
		<pubDate>Tue, 17 Nov 2009 16:35:57 +0000</pubDate>
		<dc:creator>blattner</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[RecommenderSystems]]></category>

		<guid isPermaLink="false">http://www.b-tec.ch/?p=45</guid>
		<description><![CDATA[Yes, we all know: ratings from users are very noisy and inconsistent. If you ask users to re-rate items, they will do it differently in most cases. This was pointed out by a number of scientists. Identifying and removing noise could lead to better prediction performance. One simple approach to reduce noise is achieved [...]]]></description>
			<content:encoded><![CDATA[<p>Yes, we all know: ratings from users are very noisy and inconsistent. If you ask users to re-rate items, they will do it differently in most cases. This was pointed out by a number of scientists. Identifying and removing noise could lead to better prediction performance. One simple way to reduce noise is to ask people to re-rate all objects they have rated so far. This was also pointed out by <a href="http://technocalifornia.blogspot.com/2009/08/rate-it-again.html">technocalifornia</a>.</p>
<p>However, I doubt that users would be very happy to re-rate everything. So, how can we learn and remove noise? Currently I am trying to model user ratings, taking into account inconsistency and peer influence. I hope the model will give some insights when compared to real-world data generated by recommender systems.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.b-tec.ch/?feed=rss2&amp;p=45</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
