/usr/share/doc/ray/Documentation/l-mers.txt is in ray-doc 2.3.1-1.

This file is owned by root:root, with mode 0o644.

The actual contents of the file can be viewed below.

2012-03-07/Sébastien Boisvert

Given a sequence, a sliding window can be used to generate k-mers.


Sequence:

------------------------

k-mers:

***************
 ***************
  ***************
   ***************
    ***************
     ***************
      ***************
       ***************
        ***************
         ***************
          ***************
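
The sliding window above can be sketched in a few lines of Python. This is
only an illustration: the function name kmers and the example sequence are
made up here and are not taken from Ray.

```python
def kmers(sequence, k):
    """Return every k-mer of sequence, left to right (window of width k)."""
    # A sequence of length n yields n - k + 1 windows (none if n < k).
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


windows = kmers("ACGTACGTAC", 4)
for offset, window in enumerate(windows):
    # Indent each window by its offset, like the diagram above.
    print(" " * offset + window)
```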


This works for de novo assembly.

But let's say we want to search for sequences in the de Bruijn 
graph.


Precisely, we want to search for a sequence
that has some mismatches with respect to the sequence used for
de novo assembly.

So this is the sequence we are searching for:


-----------X------------

***********X***
 **********X****
  *********X*****
   ********X******
    *******X*******
     ******X********
      *****X*********
       ****X**********
        ***X***********
         **X************
          *X*************


Since no k-mer in the graph contains X, every k-mer of this sequence that overlaps X is missing from the graph, and the sequence won't be detected.
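
The effect of a single mismatch on exact k-mer lookup can be checked with a
small sketch. The reference sequence, the value k = 15 and the position of
the substitution are all made up for this illustration, and a Python set
stands in for the k-mers stored in the graph.

```python
def kmers(sequence, k):
    """Return every k-mer of sequence, left to right."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


k = 15
reference = "ACGTTGCAAGCTTAGGCTAACGTC"  # made-up 24-base sequence
graph_kmers = set(kmers(reference, k))   # stand-in for the graph's k-mers

# Substitute one base (the X of the diagram) at position 11: T becomes A.
query = reference[:11] + "A" + reference[12:]

# Exact lookup of each query k-mer among the graph k-mers.
found = [kmer for kmer in kmers(query, k) if kmer in graph_kmers]
print(len(found), "of", len(kmers(query, k)), "query k-mers found")
```

Here every window of the query overlaps the substituted base, so no exact
lookup succeeds.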

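One way to deal with this is to build an index of l-mers, which are shorter substrings of k-mers (l < k).
A query k-mer with a mismatch still shares with its counterpart in the graph the l-mers that do not overlap X.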

Each MPI rank has an array with one entry for every possible l-mer, that is 4**l entries (shown here for l=5 and l=10):

irb(main):001:0> 4**5
=> 1024
irb(main):002:0> 4**10
=> 1048576
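
One way to map each l-mer to its own slot in such an array is to read it as
a base-4 number. The sketch below assumes the encoding A=0, C=1, G=2, T=3
and uses the hypothetical names lmer_to_index and table_size. Ray's actual
encoding may differ.

```python
from itertools import product

CODES = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed encoding, not Ray's


def lmer_to_index(lmer):
    """Read the l-mer as a base-4 number, most significant base first."""
    index = 0
    for base in lmer:
        index = index * 4 + CODES[base]
    return index


def table_size(lmer_length):
    """Number of distinct l-mers over a 4-letter alphabet."""
    return 4 ** lmer_length


# Every possible 5-mer lands in a distinct slot of a 4**5-entry table.
all_5mers = ["".join(p) for p in product("ACGT", repeat=5)]
indices = sorted(lmer_to_index(m) for m in all_5mers)
```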


== Adding a k-mer entry to an l-mer entry ==


add(l-mer,k-mer)
	index=convertLmerToInteger(l-mer,lmerLength)

	table[index].add(k-mer)


The table is an array of managed k-mers (with defragmentation).
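
A minimal Python sketch of the add operation follows. It makes several
assumptions that are not in this note: the base-4 encoding A=0, C=1, G=2,
T=3, plain Python sets in place of the managed k-mer storage, and a
hypothetical helper index_kmer that registers a k-mer under every l-mer it
contains.

```python
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed encoding


def convert_lmer_to_integer(lmer, lmer_length):
    """Base-4 value of the first lmer_length bases of lmer."""
    index = 0
    for base in lmer[:lmer_length]:
        index = index * 4 + CODES[base]
    return index


def make_table(lmer_length):
    # One bucket per possible l-mer. Sets stand in for the managed,
    # defragmented storage mentioned above.
    return [set() for _ in range(4 ** lmer_length)]


def add(table, lmer, kmer, lmer_length):
    index = convert_lmer_to_integer(lmer, lmer_length)
    table[index].add(kmer)


def index_kmer(table, kmer, lmer_length):
    # Assumption: a k-mer is added under each of its l-mers.
    for i in range(len(kmer) - lmer_length + 1):
        add(table, kmer[i:i + lmer_length], kmer, lmer_length)


table = make_table(3)
index_kmer(table, "ACGTA", 3)
```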



== Searching for the closest k-mer in the graph ==

If more than one hit shares the best score, take any of them.


search(k-mer)
	l-mers = GetLmersFromKmer(k-mer)

	hits={}

	for l-mer in l-mers
		index=convertLmerToInteger(l-mer,lmerLength)
		k-mers = table[index]
		for j in k-mers
			hits[j]++

	return best hit
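
The search can be sketched in Python as follows. This is an illustration on
a single process, not Ray's implementation. It assumes the encoding A=0,
C=1, G=2, T=3, takes the best hit to be the k-mer reached through the most
l-mers of the query, and uses made-up 8-base k-mers with l = 3.

```python
from collections import Counter

CODES = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed encoding


def convert_lmer_to_integer(lmer):
    index = 0
    for base in lmer:
        index = index * 4 + CODES[base]
    return index


def get_lmers_from_kmer(kmer, lmer_length):
    return [kmer[i:i + lmer_length]
            for i in range(len(kmer) - lmer_length + 1)]


def build_table(graph_kmers, lmer_length):
    # Each k-mer is filed under every l-mer it contains (an assumption).
    table = [set() for _ in range(4 ** lmer_length)]
    for kmer in graph_kmers:
        for lmer in get_lmers_from_kmer(kmer, lmer_length):
            table[convert_lmer_to_integer(lmer)].add(kmer)
    return table


def search(table, kmer, lmer_length):
    hits = Counter()
    for lmer in get_lmers_from_kmer(kmer, lmer_length):
        for candidate in table[convert_lmer_to_integer(lmer)]:
            hits[candidate] += 1
    if not hits:
        return None  # no l-mer of the query occurs in the table
    # Ties are broken arbitrarily, as stated above.
    return hits.most_common(1)[0][0]


table = build_table(["ACGTTGCA", "TTTTCCCC"], 3)
closest = search(table, "ACGTAGCA", 3)  # one mismatch against ACGTTGCA
```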

The obvious problem here is that an entry in the table will contain a lot of k-mers.
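
A rough estimate shows how large the entries can get. Assuming each k-mer is
filed under all of its k - l + 1 l-mers, the mean number of k-mers per entry
is at most N * (k - l + 1) / 4**l for N distinct k-mers. The workload below
(one million k-mers, k = 21) is hypothetical, and real data is usually more
skewed than a mean suggests.

```python
def average_bucket_size(kmer_count, kmer_length, lmer_length):
    """Upper bound on the mean number of k-mers stored per table entry."""
    lmers_per_kmer = kmer_length - lmer_length + 1
    return kmer_count * lmers_per_kmer / 4 ** lmer_length


# Hypothetical workload: one million distinct k-mers with k = 21.
for lmer_length in (5, 10):
    print(lmer_length, average_bucket_size(1_000_000, 21, lmer_length))
```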

Alternatively, a k-mer could be sent to all MPI ranks so that each of them can search for it.
But that would not be efficient at all.

Send-to-all communication patterns scale poorly: every query involves every rank.