Disclaimer:
This page is very much under construction.
Computer Science Topics
Computer Science is something that has always interested me, though only
particular areas of it. I will endeavour to keep this page up to date, but as my
interests change, it may grow outdated as I simply no longer care about some of the
topics on it. Topics I am still interested in, though, may well be updated up to the
day you visit... provided I have the time to read the massive computer
science tomes in which the most secret and important information is
contained..... ;-)
Computational Linguistics
This is the process of automatically generating text documents of
various types from some sort of knowledge. Most often, this process involves a knowledge
base and some kind of heuristics for explaining the knowledge it
contains. These heuristics are then used to generate a
document that conveys the knowledge held in the knowledge base.
As one can easily see, this has many applications: automatically
generating essays (which I am sure high school students
would be VERY interested in...); more robust systems that combine
the techniques described below to produce summaries of large
documents; providing explanations of equipment given a database of
the equipment's specifications and behaviors; and many more
interesting things, such as conveying the "thoughts" of a computer
(given that most computer programs hold their knowledge as a large
database, which is not easily understandable by humans).
Another thing this is used for, and something I have done myself, is
generating web pages for products on a commercial site, given a
knowledge base of available words and sentence constructions, along
with the information for the products.
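To make that last example concrete, here is a minimal sketch of a template-based generator. The product entry, the sentence templates, and the vocabulary are all hypothetical, invented purely for illustration; the system I actually built was considerably more involved.

import random

# Hypothetical knowledge base entry for one product (invented for illustration).
product = {
    "name": "TrailRunner 2000",
    "category": "hiking boot",
    "features": ["waterproof leather", "a reinforced toe", "a quick-lace system"],
    "price": 89.99,
}

# A small bank of sentence constructions; a real system would choose among
# many more and vary word choice based on the knowledge base.
TEMPLATES = [
    "The {name} is a {category} featuring {features}.",
    "Built with {features}, the {name} is a dependable {category}.",
]

def describe(product):
    """Generate a short product description from a knowledge base entry."""
    feats = product["features"]
    # Turn the feature list into readable English ("a, b, and c").
    feature_text = ", ".join(feats[:-1]) + ", and " + feats[-1] if len(feats) > 1 else feats[0]
    sentence = random.choice(TEMPLATES).format(
        name=product["name"], category=product["category"], features=feature_text)
    return sentence + " It is available for ${:.2f}.".format(product["price"])

print(describe(product))

Running this prints one of the two sentence constructions filled in with the product's details, which is the whole idea in miniature: the words and sentence shapes come from the knowledge base, and the generator only assembles them.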
Text Generation
Automatic Indexing
The internet has created a glut of information. This information
holds great potential, if only it can be sorted and evaluated. To
this end, companies such as Google, Yahoo, AltaVista and others have
provided search engines. These search engines purport to search
through the mass of data out there and present you with results that match
what you are looking for. They use specific algorithms to search through a
database of web pages that they update on a regular basis. As often happens,
though, some individuals prefer that their web pages be listed ahead of other
pages, for advertising or other purposes. One example of people who use
tactics to get their pages listed closer to the top of search results is
spammers. These individuals attempt to get their pages listed at the top of
many searches, even irrelevant ones; because their pages tend to spam the
listings, the label fits.
These search engines are an example of work in the field of Automatic
Indexing: indexing pages so that they can be looked up easily, for example by
keyword search. More precisely, this field is about automatically generating
an index for a collection of documents. Because it deals only with generating
the index, it is separate from the field of information retrieval, which is
covered below.
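As a rough sketch of what "generating an index" means in practice, here is a toy inverted index: a mapping from each word to the documents that contain it. The documents and the crude whitespace tokenization are invented purely for illustration.

from collections import defaultdict

# Toy document collection (hypothetical; a real engine would crawl the web).
documents = {
    "page1": "machine learning makes computers learn from experience",
    "page2": "search engines index pages for keyword search",
    "page3": "classifier systems learn rules from the environment",
}

def build_index(docs):
    """Build an inverted index: word -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

index = build_index(documents)
print(sorted(index["learn"]))   # ['page1', 'page3']
print(sorted(index["search"]))  # ['page2']

A keyword search then reduces to looking words up in this index, which is exactly the part that is separate from the harder question of deciding which of the matching documents are most relevant.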
This is something I am quite interested in, but I have not had
sufficient time to work on it. It comes from the idea of deriving
meaning from words, or more precisely, of making a computer do so. There are
inherent difficulties with this, as one might well expect, considering that
computers do not (yet) think, and we as thinking beings are not really sure what
we mean by thinking. This becomes obvious when we try to define what we mean by
it: most of us can come up with examples, but when asked to really pin
down what thinking is, we all have a hard time.
This is particularly useful for unstructured searching and unstructured
queries. What is the difference between the two? For my purposes, searching
means searching through a body of data, while a query is used to determine
what to search for within that data. In the case of text summarization, and its
sister field, text mining, the data is pure text, such as a novel.
The idea behind text summarization is that the computer should be able to
"read" a document and then summarize, just as a human does, what the document is
about. Most of the current work in text summarization
focuses on using a structured text document, inferring the meaning of
certain fields, and then using these to describe the document. A more
interesting problem, and one which, as far as I have read, no one has yet
solved, is using text summarization successfully on a free text document.
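To give a feel for the extractive flavour of this work, here is a deliberately naive sketch that scores sentences by how many frequent words they contain and keeps the top few. The splitting, scoring, and sample text are all invented for illustration and fall far short of what free text actually demands.

import re
from collections import Counter

def tokenize(text):
    """Crude lowercase word tokens; real systems need proper tokenization."""
    return re.findall(r"[a-z']+", text.lower())

def summarize(text, num_sentences=2):
    """Very naive extractive summary: keep sentences rich in frequent words."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    freq = Counter(tokenize(text))
    # Score each sentence by the total frequency of the words it contains.
    scored = [(sum(freq[w] for w in tokenize(s)), i, s)
              for i, s in enumerate(sentences)]
    top = sorted(scored, reverse=True)[:num_sentences]
    # Return the chosen sentences in their original order.
    return ". ".join(s for _, _, s in sorted(top, key=lambda t: t[1])) + "."

sample = ("Machine learning studies how computers learn. "
          "Learning means behavior improving with experience. "
          "The weather was pleasant. "
          "Computers learn by finding patterns in experience.")
print(summarize(sample))

On this toy input two of the sentences about learning survive and the irrelevant one about the weather is dropped, which is roughly the behavior a summarizer is after, if nothing like the sophistication it needs.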
(more to come...)
Artificial Intelligence is the pursuit of creating an intelligent being. The
current work is being done on computers, using them to
artificially create a thinking being, in this case as a simulation. Currently this
work is largely mired in pinning down definitions which are not easy to come
by, such as intelligence: what is it, really, after all?
Are we only concerned with intelligence as it is measured on intelligence
tests? This is not a valid measure, because a computer can be programmed to
respond absolutely perfectly on an intelligence test and yet be unable
to respond to new questions intelligently. This leads to the idea that
intelligence is not only knowledge, but the ability to learn, and moreover the
ability to apply that learning to new situations. Even this falls short
of the mark, though, because even the least intelligent person you can think of still
behaves in a manner that many people would describe as more "intelligent" than
the pre-programmed responses of a computer. There is also the idea, as applied to
intelligent life, that perhaps creativity is a component as well.
As we continue in this way, we can see that we are only listing
characteristics which an intelligent being must have, not giving a true definition
that will work in every case. This is part of the current problem of the
research, though there are some interesting methods by which people are
exploring this frontier, such as emergent technologies, cognitive science,
semantic networks, and other such machinations.
Machine Learning is the study of learning, specifically as applied
to machines. That does not tell you much more than the words themselves, though:
it is the study of making computers learn. One must ask,
what do we mean by learn, and how can computers do this thing that we
have so far heard of only animals or humans doing? By learning we mean the process by which behavior improves over
time, and what occurs over time is typically called experience.
Learning, then, is literally improving through experience. So how can computers do this? Computers are great at storing data,
but even for a computer, the sheer amount of data, were it to record
every moment of time that ever happened to it, is immense. This
raises the issue of how to store the data, and further, how
often to "sample" the environment. We naturally conclude that the
program should "sample" the environment whenever necessary, or
whenever something changes. This is very difficult to do, though;
consider how you would do it. Any organism receives information from the environment through
sensors... For humans, this is an array of rods and cones in the
eyes, various mechanisms in the ears, taste buds on the tongue, the many
smell-sensing cells of the nose, and of course every epidermal cell
on the entirety of the human skin. Through this massive array of
cells, humans are able to perceive many changes in their environment,
but what seems to happen is that whenever something triggers one of
these sensors, the sensor sends its data to the brain. This is a
great deal of interruption to the system (the brain). So the human
machine has adapted to this just as a machine would. For instance,
a human's eyes effectively "sample" the environment at about 60 Hz, which is why,
if a monitor refreshes any faster than 60 Hz, you no longer see
the flicker. Further, the number of types of cones in the eye is
small, usually three, and they themselves sample the
environment at only a coarse rate, giving roughly 160x160 resolution.
Given these limitations, humans are able to show remarkable
properties. They are able to see many things, but this also indicates
that they do not, in fact, have an interrupt system running their
sensors, as many people might previously have suggested. The fact that
the human eye only registers changes slower than 60 Hz, 60 times
per second, suggests that the brain actually polls the eyes; the eyes do
not simply notify the brain when they see something that is different
from what they had been seeing. Even so, consider the massive amount of
information we would have to store if we were to poll a set
of eyes 60 times per second. A relatively low resolution image, say
200x300 pixels stored as a JPEG (a compressed image file), has a rough
estimated size of 10 KB. If we consider the way the eyes sample, then
we are looking at 600 KB of data per second. This alone is a large
amount of data, but what if we consider a full minute of these snapshots of the
environment? That is 36 MB of data. If we continue, we find
that we cannot even store one day of these images, so how do we do it?
We must find some way of picking the important things out of the images,
or the important images, and storing only that information. This is an example
of one of the tasks of Machine Learning.
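As a quick back-of-the-envelope check of those figures, using the 10 KB per frame and 60 samples per second assumed above (the numbers themselves are only rough assumptions):

# Rough storage estimate for polling a pair of eyes as described above.
frame_kb = 10          # assumed size of one low-resolution compressed image
samples_per_sec = 60   # assumed polling rate

per_second_kb = frame_kb * samples_per_sec       # 600 KB per second
per_minute_mb = per_second_kb * 60 / 1000        # 36 MB per minute
per_day_gb = per_minute_mb * 60 * 24 / 1000      # roughly 51.8 GB per day

print(per_second_kb, "KB/s")
print(per_minute_mb, "MB/min")
print(round(per_day_gb, 1), "GB/day")

At tens of gigabytes per day, raw storage of everything the sensors see quickly becomes untenable, which is exactly why something has to decide what is worth keeping.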
Machine Learning as a field concerns tasks that are learnable, and tasks which are important to learn. For instance, if a
computer could be taught to write documents given some information,
then we would have the field of Natural Language Generation solved.
Further, it would be solved not by clever heuristics or algorithms;
it would actually be learned. All that would be required would be to
train the Machine Learning algorithm to write documents. This would
change the face of instruction manuals and other documentation, in
that technical writers could be freed for other tasks. A
perhaps better example: if a computer could learn to sort
different types of garbage, then material could be recycled without
the need to separate it by hand. As with most of computer science,
Machine Learning may be said to focus on making life easier
for humans: if it can learn to do tasks which humans either
cannot do or do not like to do, then humans are freed for more
pleasurable activities. There is also, though, a more humanitarian
call for machine learning. Each person is different. Their brains are organized in different
ways, though macroscopically similar, microscopically they are very
different. This has made prosthetics difficult, because what may work
for one person does not work for all people. Though there is also the
problem of making prosthetics do what people want them to do.
Consider a person that has lost a limb. This person would be given a
prosthetic arm, but the abilities of that limb would be greatly
reduced from the individual's natural arm. What if an electronic arm
could be given to the individual that would respond to his/her
thoughts such that he/she could control the arm just as though it were
their original arm, or with a small loss of mobility? Many would say
that this is impossible. In truth, controlling it with thoughts
may be too imprecise to make any predictions about, but
there are currently machines which can detect the firing of neurons,
the cells within the brain. These machines are becoming ever more
accurate, and indeed have been tested in this realm and found to be
only modestly successful, so far. As the accuracy
of the machines keeps increasing, though, note that what causes a person's arm to move is only a
set of neurons firing in the brain. If a machine can sense those
neurons firing, then why could we not move an arm according to those
neurons firing? This research is in fact currently being conducted,
and indeed there have been some successes, reported in the popular media,
with prosthetic eyes actually helping blind people. This is actually
where the learning part comes in, though... As stated above, each person is different. Therefore, no one
solution will work for every person. If a machine could learn
which neuronal firing patterns are supposed to do what, though, then the
machine could control a prosthetic, essentially giving a person back part
of the control lost in a major accident.
Classifier systems are systems which classify environmental situations into classes of situations requiring a specific action. Internally, they are a combination of a Genetic Algorithm and an Expert System. These two systems combined produce a rule-based system that learns how to act in an environment. A common approach to using these systems is what is called agent-based computing: each system is considered to control an agent in the environment, each of which has specific actions that it can, or must, perform.
These systems are made up of rules, each of which has a condition part
and an action part. Each rule is interpreted in the following manner:
IF condition THEN action
In this manner, if the condition is true, then the action is
carried out. The condition for each rule is considered to be a
sense-vector (an array of sensors, or a string in which each place
represents a true or false condition). This sense-vector is provided
by the environment. If the sense-vector for a rule (called a
classifier) is the same as the sense-vector provided by the
environment, then the rule is said to fire. This causes the action
that the classifier suggests to be carried out.
In a classifier system the condition and action parts of each rule are represented by a binary string. The condition part represents whether each sensor is triggered or not (0 being false, and 1 being true) and the action part is a binary representation of the actions available to the system. Typically this is just a binary encoding of the numerical order of the action to be carried out.
The basic classifier system as stated above is exactly like an expert system, except that every condition must be fully specified for each rule. In practice this is not required: the classifier system was extended with a way of leaving individual conditions unspecified, by adding a '#' ("don't care") symbol to the alphabet of the condition part of each classifier. This new ternary alphabet ({ 0, 1, # }) allows for generalization, which in turn allows there to be fewer rules in the system, as the system no longer cares about some values within the condition string.
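As a small sketch of that matching rule (the classifiers and the sense-vector below are invented for illustration), a ternary condition matches a binary sense-vector if every position is either '#' or equal to the corresponding bit:

def matches(condition, message):
    """True if the ternary condition ({0,1,#}) matches the binary message."""
    return len(condition) == len(message) and all(
        c == "#" or c == m for c, m in zip(condition, message))

# Hypothetical classifiers: (condition, action) pairs.
classifiers = [
    ("1#0#", "01"),   # fires whenever sensor 0 is on and sensor 2 is off
    ("0000", "10"),   # fires only when no sensor is triggered
]

sense_vector = "1101"  # example input from the environment
for condition, action in classifiers:
    if matches(condition, sense_vector):
        print("classifier", condition, "fires, suggesting action", action)

Here only the first, more general classifier fires; with the ternary alphabet one such rule covers every situation that would otherwise need four fully specified rules.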
Even with this addition, classifier systems are not significantly different from Expert systems. So where does the Genetic Algorithm come in? In fact, classifier systems are interesting precisely because they are an Expert system that can learn. In some sense, the classifier system starts out with some information (or none, as preferred) and, from that point, may entirely change its rule set, based on whether or not the rule set indicates the proper actions in the environment. Further, due to this ability to learn, the classifier system can also deal with a changing environment. Of the many operators suggested in the Genetic Algorithms research, subsumption, crossover, and mutation are the ones commonly used in classifier system research.
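For a feel of what two of those operators do to classifier strings (the parent strings here are made up, and real systems apply the operators to whole populations of rules with fitness-based selection), a minimal sketch:

import random

ALPHABET = "01#"  # ternary alphabet of the condition part

def crossover(parent_a, parent_b):
    """One-point crossover: swap the tails of two condition strings."""
    point = random.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:], parent_b[:point] + parent_a[point:]

def mutate(condition, rate=0.05):
    """With small probability, replace each symbol with a random one."""
    return "".join(random.choice(ALPHABET) if random.random() < rate else symbol
                   for symbol in condition)

child_a, child_b = crossover("1#0#", "0000")
print(child_a, child_b)   # e.g. "1#00" and "000#" if the cut falls after position 2
print(mutate(child_a))    # usually unchanged; occasionally a symbol flips

New rules produced this way are then tried out in the environment, and the ones that lead to good actions are kept, which is where the learning actually happens.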
Holland, foreseeing the limitations of the above, also proposed that not only can the environment provide sense-vectors, but an internal message list may also be kept. This comes into play in more complex systems, where memory is necessary. To this end, there is a workspace into which the environmental stimuli come and in which the internal message list is kept. A classifier will then fire if it matches either a sense-vector from the environment or one of the messages on the internal message list (which may be posted to by another classifier).
Artificial Life is for those people who have realized (or think) that artificial intelligence is far too complex to deal with. It is concerned not with artificially simulating intelligence on par with human intelligence, but with simulating some form of life. The type of life is not as important as correctly simulating aspects of life. Another part of this field is the assumption that most of the intelligent properties of life are emergent properties (properties whose direct causes we cannot see well enough to simulate, but which simply "emerge" as systems get more and more complex). This is partly a philosophical argument, one which has split many fields, including Psychology: the argument that the whole is greater than the sum of the parts.
So given this emergent argument, how true can it be? Strangely enough, if you
look at sufficiently complex systems, the interaction of individual parts creates some
really strange behavior which, given the parts on their own, could not occur. A
perfect example of this is the human neuron, which, though it can only "fire" or
remain dormant (giving an effective 0 or 1, though it is a bit more complex
than that), somehow gives rise to intelligent, adaptive behavior in the
complex system it is part of. (more to come...)
Truly this is a fundamental approach, primarily concerned with the study of intelligence itself. The actual use of computers is limited to modeling human cognitive processes. Of course, we can only speculate about some of these, though the field has gotten some boost from the emergent technologies, even if the two camps seem to be separated by quite some distance on the issues. This is by far the most psychology-oriented of the artificial intelligence fields, and as such it draws on Philosophy, Psychology, Mathematics, Linguistics, Computer Science, and many other fields. This field is also less interested in simulating intelligence, unless doing so will further our understanding of human intelligence.
Right now this is my favorite field, though I do have some leaning toward the
emergent area.
UNIX is an operating system whose popularity right now owes much to the Linux movement, mostly because Linux is currently attracting media attention. Linux was invented by a man named Linus Torvalds, who was a Finnish graduate student at the time he invented it, and who still controls the Linux kernel.
UNIX, though, is so much more than Linux that I must at least mention that it is an operating system that started out in about 1969 and was first "invented" at Bell Labs. It began as an internal operating system at Bell Labs and from there grew into a mainstream, multi-tasking, multi-user, networked system. Unfortunately, the only way it was released to the public was through licensing, which only large companies could afford. Each company that bought a license to the operating system changed it to make it its own. This lack of control produced the current splintering of UNIX that now infests the market.
The many flavors of UNIX have been around for a long time and have gradually improved in quality, having been tested, re-tested, and fixed over approximately 30 years. The more popular versions of UNIX are currently HP-UX (HP), Solaris/SunOS (Sun), IRIX (SGI), and AIX (IBM).
For more on the history of UNIX, please see the below links: