I found the following paragraph in a U.S. patent. Can anybody tell me where I can get these non-genes, or share any information that could help me with this?
"The training set consists of 1610 E. coli K-12 NCBI-listed protein-coding genes and 3000 E. coli K-12 ORFs (a stretch of sequence of length more than 20 amino acids, having a start codon and a stop codon in the same frame) which have not been reported as genes (non-genes). The validation set has 1000 known genes and 1000 non-genes from E. coli K-12, distinct from those used in the training set. The test set contains another 1000 genes and 1000 non-genes from the same organism."
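Going by the definition quoted above, it sounds like you could generate such non-genes yourself rather than download them. Below is a minimal Python sketch of how that might look. It assumes you already have the E. coli K-12 genome sequence (for example the NCBI RefSeq record NC_000913 for strain MG1655) and a list of annotated CDS coordinates. It only scans the forward strand; a full scan would also run on the reverse complement. The rule "no overlap with an annotated CDS" used to call an ORF a non-gene, and the helper names `find_orfs`, `non_genes` and `disjoint_split`, are my own assumptions, not necessarily what the patent authors did.

```python
import random

START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_aa=21):
    """Forward-strand ORFs: ATG followed by an in-frame stop codon, with at
    least min_aa codons before the stop ('more than 20 amino acids').
    Returns sorted 0-based half-open (start, end) pairs; end includes the stop."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # first ATG after the previous stop in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_aa:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)

def non_genes(orfs, gene_intervals):
    """Keep ORFs that overlap no annotated gene interval.
    Assumption: 'not reported as genes' is approximated as 'no overlap'."""
    return [(s, e) for s, e in orfs
            if all(e <= gs or s >= ge for gs, ge in gene_intervals)]

def disjoint_split(items, sizes, seed=0):
    """Shuffle once, then cut into consecutive non-overlapping subsets,
    e.g. sizes=[3000, 1000, 1000] for training/validation/test non-genes."""
    items = list(items)
    random.Random(seed).shuffle(items)
    out, pos = [], 0
    for n in sizes:
        out.append(items[pos:pos + n])
        pos += n
    return out
```

For example, `find_orfs("ATG" + "GCT" * 21 + "TAA")` gives `[(0, 69)]`: 22 codons before the stop, so it passes the length cutoff. Shuffling once and slicing guarantees that the training, validation and test sets are distinct, as the patent describes.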