I maintain a list of not-so-good claims in papers that I have read and written (yes, I am guilty too). I promise to avoid these in my future publications and encourage other researchers to avoid them as well.
- Incorrect baseline: Comparing a method with a crippled baseline to make one's method look better. Example: showing that one's periodicity detection model is better than an autoregressive model, which is not designed to detect seasonality or periodicity in the first place.
- Incorrect assumptions: The problem is right, but the formulation is wrong because of incorrect assumptions. Example: using patient mortality as a surrogate for disease severity. This is a convenient but unrealistic assumption with tons of hidden attributes.
- Unnecessary problem: The problem does not exist in any application domain. Example: speeding up a quadratic algorithm when the data from the applications is too small to require that speedup.
- Incorrect validation: Validation using low-quality labeling. In the name of a user study, authors generate the labeled data themselves and thus bias the results. A user study with three subjects is naturally done by the authors or their friends.
- Absence of significance: The absence of significance testing in the experimental comparison of methods is not desirable. For example, if we use RMSE to compare two methods, we must show how significant the reduction in error achieved by the better method is. A 4-watt reduction in root-mean-squared error at a megawatt power plant is not worth re-engineering an algorithm. Note that RMSE has a unit. (A short sketch of such a check appears after this list.)
- Discovering known knowledge using known knowledge: Using biased features that have hidden pathways to the predictions. Example: using a doctor's notes to predict a patient's disease. If a test patient has already gone to a doctor for a note, he does not need a prediction of the disease; he can just ask the doctor.
- Incorrect use of synthetic data: Using synthetic data without an explanation of how the data was generated to challenge the method. For example, if we generate a sine wave for periodicity detection and a classic random walk for an autoregressive model, the synthetic data form trivial scenarios. Note that a sine wave has exactly one period, and a random walk has exactly one coefficient to estimate in the AR model, which is 1. (See the sketch after this list.)
- Complexity of methods: Complexity is relative to the reader. However, authors must make every effort to present their methods simply. It is a crime to present a method in a complex way when a simpler way exists. Mathematical notation is useful for conveying complex ideas; however, it is an art to find a simpler description instead of a laundry list of notation or maze-like plate representations in papers.
- A machine trained with another machine: Papers often use automatically calculated or derived scores, such as reputation, helpfulness, and trustworthiness, as labels to train algorithms and report performance. This is equivalent to training a classifier to behave like another classifier. It is easier to make a copy of the original classifier, where the trivial, non-scientific challenge is to find the money to obtain it.
- Mismatch between motivation and optimization: Often the motivational example of a paper does not match the optimization done in the paper. Example: imagine an objective function that adds a set of edges to a graph so as to reduce the average shortest-path distance the most. This objective does not motivate a reduction in the number of stops in air routing (nodes are airports, edges are flights) if the formulation does not consider the passenger load on each edge. (A toy illustration appears after this list.)
- Implementation bias: Performance improvements of important algorithms are often the result of implementation bias, such as comparing a C++ implementation with a MATLAB implementation. Empirical evaluation of performance should always be on the same platform and between the BEST implementations of the competing methods under the BEST compilers, and so on. Clearly, I prefer avoiding such problems to solving them appropriately.
- Averaging across datasets: Often algorithms are tested on multiple datasets, and authors report an average error across datasets without the variance or the individual numbers. A dataset from a real domain comes with a large set of dependencies on various domain-specific parameters such as operating conditions, time of the year, etc. Averaging error/accuracy over many datasets and showing that the average is better than that of some other method is an incorrect way of claiming general superiority. (A reporting sketch appears after this list.)
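To make the significance point concrete, here is a minimal sketch in Python. The per-sample errors are synthetic placeholders, not real results; the idea is simply that a small RMSE gap should be accompanied by a paired test on the per-sample errors before anyone claims a win.

```python
# Minimal sketch: is the RMSE reduction of method B over method A significant?
# The per-sample errors below are synthetic placeholders for illustration only.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
err_a = rng.normal(0.0, 1.00, size=500)   # residuals of method A (hypothetical)
err_b = rng.normal(0.0, 0.98, size=500)   # residuals of method B (hypothetical)

rmse_a = np.sqrt(np.mean(err_a ** 2))
rmse_b = np.sqrt(np.mean(err_b ** 2))
print(f"RMSE A = {rmse_a:.3f}, RMSE B = {rmse_b:.3f}")  # both carry the unit of the target

# Paired test on the per-sample squared errors: a tiny RMSE gap with a large
# p-value is not evidence that B is better, no matter how the table looks.
stat, p_value = wilcoxon(err_a ** 2, err_b ** 2)
print(f"Wilcoxon signed-rank p-value = {p_value:.3f}")
```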
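For the synthetic-data point, a small sketch (all parameters are illustrative) showing why the two signals are trivial: an FFT recovers the sine wave's single period immediately, and a least-squares AR(1) fit to a random walk returns a coefficient close to 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# A pure sine wave has exactly one period, and an FFT hands it over for free.
period = 50
sine = np.sin(2 * np.pi * np.arange(n) / period)
spectrum = np.abs(np.fft.rfft(sine))
freqs = np.fft.rfftfreq(n)
detected = 1.0 / freqs[np.argmax(spectrum[1:]) + 1]   # skip the DC bin
print(f"true period = {period}, detected period = {detected:.1f}")

# A classic random walk is an AR(1) process whose only coefficient is 1.
walk = np.cumsum(rng.normal(size=n))
phi = np.dot(walk[1:], walk[:-1]) / np.dot(walk[:-1], walk[:-1])  # least-squares AR(1) fit
print(f"estimated AR(1) coefficient = {phi:.3f}")   # close to 1, nothing left to discover
```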
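For the motivation/optimization mismatch, a toy sketch using networkx; the graph and the passenger loads are invented. The objective picks an edge purely from graph distances, and the passenger load never enters it, so the "fewer stops for passengers" motivation is not what actually gets optimized.

```python
import itertools
import networkx as nx

# Toy air-routing graph: nodes are airports, edges are existing flights.
# The passenger loads are invented and, tellingly, never used by the objective.
G = nx.path_graph(["A", "B", "C", "D", "E"])
passenger_load = {("A", "B"): 500, ("B", "C"): 50, ("C", "D"): 50, ("D", "E"): 500}

def avg_distance(graph):
    # Average shortest-path distance (hop count) over all node pairs.
    return nx.average_shortest_path_length(graph)

# Pick the single missing edge whose addition reduces the average distance most.
candidates = [e for e in itertools.combinations(G.nodes, 2) if not G.has_edge(*e)]
best_edge = min(candidates, key=lambda e: avg_distance(nx.Graph(list(G.edges) + [e])))
print("edge chosen by the objective:", best_edge)
# The choice depends only on graph distances; a formulation that truly reduced
# stops for passengers would have to weight node pairs by passenger demand.
```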
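For the averaging point, a sketch of the reporting style I would prefer, with made-up per-dataset numbers: show the per-dataset breakdown and the spread, not only the grand mean.

```python
import numpy as np

# Hypothetical per-dataset RMSE of two methods; the numbers are invented
# purely to illustrate the reporting style, not real results.
datasets  = ["power", "traffic", "ecg", "weather"]
rmse_ours = np.array([0.80, 1.90, 0.40, 3.00])
rmse_base = np.array([0.95, 1.20, 0.45, 3.60])

# The grand mean alone favors "ours" and hides that the baseline wins on "traffic".
print(f"mean RMSE: ours = {rmse_ours.mean():.2f}, baseline = {rmse_base.mean():.2f}")

# Per-dataset numbers with their spread tell the real story.
for name, a, b in zip(datasets, rmse_ours, rmse_base):
    print(f"{name:8s} ours = {a:.2f}  baseline = {b:.2f}  win = {a < b}")
print(f"std over datasets: ours = {rmse_ours.std(ddof=1):.2f}, "
      f"baseline = {rmse_base.std(ddof=1):.2f}")
```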