I maintain a list of not-so-good claims in papers that I have read and written (yes, including my own). I promise to avoid these in my future publications and encourage future researchers to avoid them as well.
  1. Incorrect baseline: Comparing a method against a crippled baseline to make one's own method look better. Example: Showing that one's periodicity detection model beats an auto-regressive model, which is not designed to detect seasonality or periodicity in the first place.
  2. Incorrect assumptions: The problem is right, but the formulation is wrong because of incorrect assumptions. Example: Using patient mortality as a surrogate for disease severity. This is a convenient but unrealistic assumption with tons of hidden attributes.
  3. Unnecessary problem: The problem does not exist in any application domain. Example: Speeding up a quadratic algorithm when the data from the applications is too small to require that speedup.
  4. Incorrect validation: Validation using low-quality labeling. In the name of a user study, authors generate labeled data themselves and thus bias the results. A user study with three subjects is naturally done by the authors or their friends.
  5. Absence of significance: Absence of a significance test in the experimental comparison of methods is not desirable. For example, if we use RMSE to compare two methods, we must show how significant the better method's reduction in error is (see the significance-test sketch after this list). A 4-watt reduction in root-mean-squared error at a megawatt power plant is not worth re-engineering an algorithm. Note that RMSE has a unit.
  6. Discovering known knowledge using known knowledge: Using biased features that have hidden pathways to the predictions. Example: Using a doctor's notes to predict a patient's disease. If a test patient goes to a doctor for a note, he does not need a prediction of the disease. He can just ask the doctor.
  7. Incorrect use of synthetic data: Using synthetic data without an explanation of how the data was generated to challenge the method. For example, if we generate a sine wave for periodicity detection and a classic random walk for an autoregressive model, the synthetic data form trivial scenarios. Note that a sine wave has exactly one period, and a random walk has exactly one coefficient to estimate in the AR model, which is 1 (see the AR(1) sketch after this list).
  8. Complexity of methods: Complexity is relative to the reader. However, authors must make every effort to present their methods simply. It is a crime to present a method in a complex way when a simpler way exists. Mathematical notation is useful for conveying complex ideas; however, it is an art to find a simpler description instead of a laundry list of notations or maze-like plate representations in papers.
  9. A machine trained with another machine: Papers often use automatically calculated or derived scores such as reputation, helpfulness, trustworthiness, etc. as labels to train algorithms and then report performance. This is equivalent to training a classifier to behave like another classifier. It is easier to simply copy the original classifier, where the trivial, non-scientific challenge is to find the money to obtain it.
  10. Mismatch between motivation and optimization: Often the motivational example of a paper does not match the optimization actually done in the paper. Example: Imagine an objective that reduces the average shortest-path distance the most by adding a set of edges to the graph. This objective does not support a motivation about reducing the number of stops in air routing (nodes are airports and edges are flights) if the formulation does not consider the passenger load on each edge (see the graph sketch after this list).
  11. Implementation bias: Performance improvements of important algorithms are often the result of implementation bias, such as comparing a C++ implementation with a MATLAB implementation. Empirical evaluation of performance should always be on the same platform and between the BEST implementations of the competing methods, under the BEST compilers, and so on. Clearly, I prefer avoiding such problems to solving them properly.
  12. Averaging across datasets: Often algorithms are tested on multiple datasets, and authors report an average error across datasets without the variance or the individual numbers. A dataset from a real domain comes with a large set of dependencies on domain-specific parameters such as operating conditions, time of the year, etc. Averaging error/accuracy over many datasets and showing that the average is better than some other method's is an incorrect way of claiming general superiority (see the per-dataset reporting sketch after this list).
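
To make item 5 concrete, here is a minimal Python sketch of a paired significance test on per-window errors of two hypothetical forecasting methods. The error values, sample sizes, and variable names are invented purely for illustration, not taken from any real experiment.

```python
# Sketch for item 5: is a reduction in RMSE statistically (and practically) significant?
# Assumes paired per-window squared errors from two hypothetical methods on the same test set.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)

# Hypothetical per-window squared errors (same test windows for both methods).
errors_baseline = rng.gamma(shape=2.0, scale=3.0, size=200)
errors_proposed = errors_baseline * rng.uniform(0.90, 1.05, size=200)  # slightly lower on average

print(f"RMSE baseline: {np.sqrt(errors_baseline.mean()):.3f}")
print(f"RMSE proposed: {np.sqrt(errors_proposed.mean()):.3f}")

# Paired, non-parametric test on the per-window error differences.
stat, p_value = wilcoxon(errors_baseline, errors_proposed)
print(f"Wilcoxon signed-rank p-value: {p_value:.4f}")

# A small p-value alone does not settle practical relevance: a 4-watt RMSE
# reduction at a megawatt plant may not justify re-engineering anything.
# Report the effect size in its physical unit alongside the test.
```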
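
For item 7, this sketch shows why a plain random walk is a trivial stress test for an autoregressive model: an ordinary-least-squares fit of a zero-mean AR(1) recovers a coefficient very close to 1. The setup (pure NumPy, 5,000 steps) is only an assumption for illustration.

```python
# Sketch for item 7: a classic random walk is a trivial case for an AR model.
# Assumes a zero-mean AR(1), x_t = phi * x_{t-1} + e_t, fit by ordinary least squares.
import numpy as np

rng = np.random.default_rng(0)
walk = np.cumsum(rng.normal(size=5000))           # random walk: x_t = x_{t-1} + e_t

x_prev, x_curr = walk[:-1], walk[1:]
phi_hat = (x_prev @ x_curr) / (x_prev @ x_prev)   # OLS estimate of phi
print(f"Estimated AR(1) coefficient: {phi_hat:.4f}")  # very close to 1.0

# Likewise, a pure sine wave has exactly one period, so "detecting" it says little
# about a periodicity detector. State how the synthetic data actually challenges
# the method: noise levels, trends, multiple periods, missing values, and so on.
```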
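
For item 10, the toy sketch below (using networkx, with a made-up six-airport line graph) computes the stated objective, average shortest-path distance before and after adding an edge, and the closing comment notes what a formulation faithful to the air-routing motivation would additionally need.

```python
# Sketch for item 10: the stated objective vs. the air-routing motivation.
# The six-node path graph and the added edge are purely illustrative.
import networkx as nx

G = nx.path_graph(6)                         # "airports" 0-5 connected in a line
before = nx.average_shortest_path_length(G)

G.add_edge(0, 5)                             # add one long-haul "flight"
after = nx.average_shortest_path_length(G)
print(f"average shortest-path distance: {before:.2f} -> {after:.2f}")

# This objective weights every airport pair equally. A formulation that matches
# the motivating example would weight each pair by passenger demand, so that
# the number of stops is reduced where passengers actually fly.
```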
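
Finally, for item 12, a small sketch of the reporting format that avoids the averaging trap; the dataset names and error numbers are invented solely to show how an overall average can hide a per-dataset loss.

```python
# Sketch for item 12: report per-dataset errors (with spread), not just a cross-dataset mean.
# Dataset names and error values are made up for illustration.
import numpy as np

datasets     = ["plant_A", "plant_B", "traffic", "retail"]
err_method   = np.array([1.0, 0.7, 3.9, 2.0])
err_baseline = np.array([1.5, 0.9, 3.1, 2.6])

print(f"mean error: method={err_method.mean():.2f}  baseline={err_baseline.mean():.2f}")
print(f"std  error: method={err_method.std():.2f}  baseline={err_baseline.std():.2f}")
for name, m, b in zip(datasets, err_method, err_baseline):
    print(f"{name:8s} method={m:.1f} baseline={b:.1f} better={'method' if m < b else 'baseline'}")

# The cross-dataset mean favors the method, yet it loses on 'traffic';
# only the per-dataset numbers (and their spread) make that visible.
```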