plag (main_function) module

class plag.GSTHashtable

plag.RKR_GST(P, T, minimalMatchingLength=3, initsearchSize=20)
Computes Running-Karp-Rabin Greedy String Tiling (RKR-GST) for the pattern P and the text T.
Argument1: P {string} – pattern
Argument2: T {string} – text
Argument3: minimalMatchingLength {number} – minimal matching length to be considered (default: {3})
Argument4: initsearchSize {number} – initial search size (default: {20})
Returns: list – tiles
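To illustrate the tiling idea, here is a minimal sketch of plain Greedy String Tiling. It is not the module's code: it leaves out the Running Karp-Rabin hashing (and therefore initsearchSize) that RKR_GST uses to speed up the search and finds maximal matches by brute force instead. It also assumes, hypothetically, that a tile is a (position in P, position in T, length) tuple. The name greedy_string_tiling is hypothetical.

```python
from typing import List, Sequence, Tuple

Tile = Tuple[int, int, int]  # hypothetical layout: (position in P, position in T, length)


def greedy_string_tiling(P: Sequence, T: Sequence, min_len: int = 3) -> List[Tile]:
    """Naive Greedy String Tiling without Karp-Rabin hashing (sketch)."""
    p_marked = [False] * len(P)
    t_marked = [False] * len(T)
    tiles: List[Tile] = []
    while True:
        max_match = min_len
        matches: List[Tile] = []
        # Find all maximal matches of unmarked tokens in this pass.
        for p in range(len(P)):
            for t in range(len(T)):
                j = 0
                while (p + j < len(P) and t + j < len(T)
                       and P[p + j] == T[t + j]
                       and not p_marked[p + j] and not t_marked[t + j]):
                    j += 1
                if j == max_match:
                    matches.append((p, t, j))
                elif j > max_match:
                    matches = [(p, t, j)]  # a longer match resets the candidates
                    max_match = j
        # Turn each candidate into a tile unless an earlier tile of this pass occludes it.
        for p, t, length in matches:
            if any(p_marked[p:p + length]) or any(t_marked[t:t + length]):
                continue
            for k in range(length):
                p_marked[p + k] = True
                t_marked[t + k] = True
            tiles.append((p, t, length))
        # Stop once the longest remaining match is no longer than min_len.
        if max_match == min_len:
            break
    return tiles
```

For example, tiling "abcdxyz" against "xyzabcd" character by character yields [(0, 3, 4), (4, 0, 3)]. The longest match "abcd" is marked first, and "xyz" follows in the next pass.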

plag.calcSimilarity(s1List, s2List, tiles, treshold)
Calculates the similarity and returns a list [similarity: float, suspectedPlagiarism: bool].
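The following is a minimal sketch of a similarity calculation. It assumes the coverage-based measure common in the Greedy String Tiling literature, sim = 2 * coverage / (len(s1List) + len(s2List)), where coverage is the total length of all tiles. It also assumes that plagiarism is suspected once sim reaches the threshold. The text above does not say whether plag.calcSimilarity uses exactly this formula and comparison. The name calc_similarity and the (p, t, length) tile layout are hypothetical.

```python
from typing import List, Sequence, Tuple, Union


def calc_similarity(s1_list: Sequence, s2_list: Sequence,
                    tiles: List[Tuple[int, int, int]],
                    threshold: float) -> List[Union[float, bool]]:
    """Coverage-based similarity: 2 * (tiled tokens) / (all tokens)."""
    total = len(s1_list) + len(s2_list)
    if total == 0:
        return [0.0, False]  # nothing to compare
    coverage = sum(length for _, _, length in tiles)  # tiled tokens on each side
    similarity = 2 * coverage / total
    # Assumption: plagiarism is suspected once the threshold is reached.
    return [similarity, similarity >= threshold]
```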

plag.createKRHashValue(substring)
Creates a Karp-Rabin hash value for the given substring and returns it.
Based on: http://www-igm.univ-mlv.fr/~lecroq/string/node5.html
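The sketch below shows a base-2 Karp-Rabin hash in the style of the Lecroq page, plus the constant-time rehash used when the window slides by one character. Several details are assumptions. The explicit modulus 2**32 stands in for machine-word overflow. Characters are hashed via ord(), whereas a token-level variant would hash tokens instead. The names kr_hash and kr_rehash are hypothetical.

```python
MOD = 2 ** 32  # assumption: emulates 32-bit unsigned overflow


def kr_hash(s: str) -> int:
    """h(s) = sum(ord(s[i]) * 2**(m-1-i)) mod MOD, computed Horner-style."""
    h = 0
    for ch in s:
        h = ((h << 1) + ord(ch)) % MOD
    return h


def kr_rehash(old: str, new: str, h: int, m: int) -> int:
    """Slide a window of length m one step: drop `old`, append `new`."""
    d = pow(2, m - 1, MOD)  # weight of the leading character
    return (((h - ord(old) * d) << 1) + ord(new)) % MOD
```

Rolling the hash across a text with kr_rehash gives the same value as recomputing kr_hash on each window. This is what lets every window of length m be hashed in constant time.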

plag.distToNextTile(pos, stringList)
Returns the distance from pos to the next tile, i.e. to the next marked token. If no tile is found, it returns None.
- case 1: there is a next tile
  -> pos + dist = position of the first marked token -> return dist
- case 2: there is no next tile
  -> pos + dist = len(stringList) -> return None
dist is also the number of unmarked tokens up to the next tile.
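Here is a minimal sketch of the two cases above. It assumes, hypothetically, that markings are available as a list of booleans, marked, parallel to stringList; the text does not say how the module actually records marked tokens. The name dist_to_next_tile is hypothetical.

```python
from typing import Optional, Sequence


def dist_to_next_tile(pos: int, marked: Sequence[bool]) -> Optional[int]:
    """Number of unmarked tokens from pos up to the next marked token, or None."""
    dist = 0
    # Walk over unmarked tokens until a marked one or the end of the list.
    while pos + dist < len(marked) and not marked[pos + dist]:
        dist += 1
    if pos + dist >= len(marked):
        return None  # case 2: there is no next tile
    return dist  # case 1: pos + dist is the first marked token
```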

plag.isOccluded(match, tiles)
Returns True if the match is already occluded by another match in the tiles list.
“Note that ‘not occluded’ is taken to mean that none of the tokens P_p to P_{p+maxmatch-1} and T_t to T_{t+maxmatch-1} has been marked during the creation of an earlier tile. However, given that smaller tiles cannot be created before larger ones, it suffices that only the ends of each new putative tile be tested for occlusion, rather than the whole maximal match.” [“String Similarity via Greedy String Tiling and Running Karp-Rabin Matching” http://www.pam1.bcs.uwa.edu.au/~michaelw/ftp/doc/RKR_GST.ps]
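The following is a minimal sketch of the ends-only test described in the quotation. It assumes, hypothetically, that the match and each tile are (p, t, length) tuples. The name is_occluded is hypothetical. Following the quoted argument, it only checks whether the first or last token of the match, in P or in T, already lies inside a tile.

```python
from typing import Iterable, Tuple

Tile = Tuple[int, int, int]  # hypothetical layout: (position in P, position in T, length)


def is_occluded(match: Tile, tiles: Iterable[Tile]) -> bool:
    """True if either end of the match, in P or in T, lies inside an existing tile."""
    p, t, length = match
    p_ends = (p, p + length - 1)
    t_ends = (t, t + length - 1)
    for tp, tt, tlen in tiles:
        if any(tp <= x < tp + tlen for x in p_ends):
            return True
        if any(tt <= x < tt + tlen for x in t_ends):
            return True
    return False
```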

plag.jumpToNextUnmarkedTokenAfterTile(pos, stringList)
Returns the first position of an unmarked token after the next tile.
- case 1 (normal case): a tile exists, and there is an unmarked token after it
- case 2: a tile exists, but there is NO unmarked token after it
- case 3: NO tile exists
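Here is a minimal sketch covering the three cases. It assumes, hypothetically, that markings are a list of booleans, marked, parallel to stringList. The return value for cases 2 and 3 is not documented, so returning None there is an assumption. The name jump_to_next_unmarked_after_tile is hypothetical.

```python
from typing import Optional, Sequence


def jump_to_next_unmarked_after_tile(pos: int, marked: Sequence[bool]) -> Optional[int]:
    """First unmarked position after the next tile, or None (assumed) if there is none."""
    n = len(marked)
    i = pos
    while i < n and not marked[i]:  # skip unmarked tokens up to the next tile
        i += 1
    if i >= n:
        return None  # case 3: no tile exists (return value assumed)
    while i < n and marked[i]:  # skip over the tile itself
        i += 1
    if i >= n:
        return None  # case 2: no unmarked token after the tile (return value assumed)
    return i  # case 1: first unmarked token after the tile
```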

plag.main_func(text1, text2, index_list)
Takes the processed text of the original files and the processed text of the input file, compares them, and returns the similarity content.
Argument1: text1 {String} – combined text of the original files
Argument2: text2 {String} – input file text
Argument3: index_list {list of integers} – list of the indices at which each original text file ends within text1
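The sketch below shows one possible reading of index_list. Suppose index_list[i] is the exclusive end offset of the i-th original file inside text1, sorted ascending. Then text1 can be split back into per-file segments, and a match position in text1 can be traced to its source file by binary search with bisect. The text does not say whether the offsets count characters or tokens, or whether they are exclusive. The names split_combined and file_index_for_position are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional


def split_combined(text1: str, index_list: List[int]) -> List[str]:
    """Cut the combined text back into one segment per original file."""
    starts = [0] + index_list[:-1]  # each file starts where the previous one ended
    return [text1[s:e] for s, e in zip(starts, index_list)]


def file_index_for_position(pos: int, index_list: List[int]) -> Optional[int]:
    """Index of the original file containing offset pos in text1, or None."""
    i = bisect_right(index_list, pos)  # number of files ending at or before pos
    return i if i < len(index_list) else None
```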

plag.run(s1, s2, mML=3, treshold=0.5)
Runs a comparison on the two given strings s1 and s2 and returns a PlagResult object containing the similarity value, the similarities as a list of tiles, and a boolean value indicating suspected plagiarism.
Argument1: s1 {string} – string 1
Argument2: s2 {string} – string 2
Argument3: mML {number} – minimumMatchingLength (default: {3})
Argument4: treshold {number} – a single value between 0 and 1 that determines whether a comparison between the strings should be marked as plagiarised (default: {0.5})
Returns: object – PlagResult
Raises: OutOfRangeError, NoValidArgumentError
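Here is a minimal end-to-end sketch of the behaviour described for run, built on several assumptions:
- The strings are tokenized on whitespace.
- Matching blocks of at least mML tokens from Python's difflib.SequenceMatcher stand in for the RKR-GST tiles. This is a different matching technique and can yield different tiles.
- Similarity is the coverage measure 2 * coverage / (total tokens).
- ValueError and TypeError stand in for OutOfRangeError and NoValidArgumentError, whose exact trigger conditions are not documented.

PlagResultSketch and run_sketch are hypothetical names.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Tuple


@dataclass
class PlagResultSketch:
    similarity: float
    tiles: List[Tuple[int, int, int]]  # (position in s1, position in s2, length)
    suspected_plagiarism: bool


def run_sketch(s1: str, s2: str, mML: int = 3, threshold: float = 0.5) -> PlagResultSketch:
    if not isinstance(s1, str) or not isinstance(s2, str):
        raise TypeError("s1 and s2 must be strings")  # stands in for NoValidArgumentError
    if mML < 1:
        raise ValueError("mML must be at least 1")  # stands in for OutOfRangeError
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must lie in [0, 1]")  # stands in for OutOfRangeError
    tokens1, tokens2 = s1.split(), s2.split()
    # difflib's matching blocks replace the RKR-GST tiles in this sketch.
    matcher = SequenceMatcher(None, tokens1, tokens2, autojunk=False)
    tiles = [(m.a, m.b, m.size) for m in matcher.get_matching_blocks() if m.size >= mML]
    total = len(tokens1) + len(tokens2)
    similarity = 2 * sum(size for _, _, size in tiles) / total if total else 0.0
    return PlagResultSketch(similarity, tiles, similarity >= threshold)
```

For example, comparing "the cat sat on the mat" with "the cat sat on a mat" produces a single tile covering "the cat sat on". Four of six tokens are tiled on each side, so the similarity is 2/3, which is above the default threshold of 0.5.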