Here is an example of my current situation:
By data mining (meaning reading a lot of) articles in the PubMed database, I collected a list of 500 genes that share a common property X (hereafter called X-genes).
I then divided these genes into five categories:
1. Found in condition A (100 genes in this group, hereafter called AX-genes)
2. Found in condition B (200 genes in this group, hereafter called BX-genes)
3. Found in condition C (400 genes in this group, hereafter called CX-genes)
4. Found in condition D (20 genes in this group, hereafter called DX-genes)
5. Found in condition E (100 genes in this group, hereafter called EX-genes)
As the numbers show (they add up to 820, more than the 500 X-genes), the categories overlap: some X-genes are in A, B and C; some are only in A; and some are in none of them.
(Hereafter I call these five categories groups A, B, C, D and E.)
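To make the overlap concrete, the groups can be held as sets over the X-genes and each gene's group membership counted. This is only an illustrative sketch: the gene names and group contents below are hypothetical toy data, not the real PubMed lists.

```python
# Hypothetical toy data standing in for the real lists: in practice
# x_genes would hold the 500 X-genes and each group its AX..EX subset.
x_genes = {"g1", "g2", "g3", "g4", "g5", "g6"}
groups = {
    "A": {"g1", "g2"},
    "B": {"g1", "g3"},
    "C": {"g1", "g3", "g4"},
    "D": {"g4"},
    "E": {"g5"},
}

def membership_counts(universe, groups):
    """Map each gene to the number of groups that contain it."""
    return {gene: sum(gene in members for members in groups.values())
            for gene in universe}

counts = membership_counts(x_genes, groups)
# Genes shared by A, B and C, genes unique to A, and genes in no group.
in_abc = sorted(g for g in x_genes if all(g in groups[k] for k in "ABC"))
only_a = sorted(g for g in groups["A"] if counts[g] == 1)
in_none = sorted(g for g, n in counts.items() if n == 0)
print("in A, B and C:", in_abc)
print("only in A:", only_a)
print("in no group:", in_none)
```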
I have now run an experiment and identified a list of X-genes in the right hand (hereafter called RHX-genes).
Here is the question I would like to discuss with you:
What would be a good statistical analysis for computing a score that indicates the degree of association between the AX-genes and the RHX-genes? The score should let me conclude that the right hand most likely has condition A, out of conditions A to E, because the AX-genes vs. RHX-genes score is the highest among all five conditions.
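One standard option (a suggested technique, not something the post has settled on) is an over-representation test. Treat the 500 X-genes as the universe, and for each condition ask how surprising the observed overlap between its group and the RHX-genes would be if the RHX-genes were a random draw from the X-genes. That tail probability comes from the hypergeometric distribution, which is the same as a one-sided Fisher's exact test. Scoring each condition by log10(1/p) accounts for group size, so the 400-gene C group is not favoured just for being large. A minimal sketch, assuming the RHX-genes are a subset of the X-genes; the function names and the 20-gene toy data are hypothetical.

```python
import math

def upper_tail_p(N, K, n, x):
    """Hypergeometric P(overlap >= x) when n genes are drawn from N and
    K of the N belong to the group (= one-sided Fisher's exact test)."""
    favourable = sum(math.comb(K, i) * math.comb(N - K, n - i)
                     for i in range(x, min(K, n) + 1))
    return favourable / math.comb(N, n)

def enrichment_scores(universe, groups, hits):
    """Score each group by log10(1/p) for over-representation of hits."""
    hits = hits & universe              # only X-genes are counted
    N, n = len(universe), len(hits)
    scores = {}
    for name, members in groups.items():
        members = members & universe
        x = len(members & hits)         # observed overlap with this group
        p = upper_tail_p(N, len(members), n, x)
        scores[name] = math.log10(1 / p)
    return scores

# Hypothetical toy data (20 X-genes), not real results.
x_genes = {f"g{i}" for i in range(1, 21)}
groups = {
    "A": {"g1", "g2", "g3", "g4", "g5"},
    "B": {f"g{i}" for i in range(6, 13)},
    "C": {f"g{i}" for i in range(13, 21)},
    "D": {"g6"},
    "E": {"g2", "g14"},
}
rhx_genes = {"g1", "g2", "g3", "g4", "g13"}

scores = enrichment_scores(x_genes, groups, rhx_genes)
for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {s:.2f}")
print("best match:", max(scores, key=scores.get))
```

With five conditions it would also be sensible to correct the p-values for multiple testing (for example Bonferroni or Benjamini-Hochberg). Because the groups overlap, the five tests are not independent, so the top score is best read as a ranking of candidate conditions rather than proof of condition A.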
What do you guys think?