Filter any bad words


(Hamed Mostafaei) #1

Hi there

I wrote some code to filter bad words; I hope you can use it in your chat project or on your project's comment page. It is very fast and lightweight. Please tell me any comments or criticism you have about the code, and thank you for reading.

Regards, Hamed


(Christoph Berger) #2

Right now it is a main package. Are you planning to make it a library package to include in chat or forum programs?


(John Stuart) #3

Hi there Hamed,

I’ve read your code, and I like it. You have a nice program, and I want to encourage you to improve it and make it public as a Go library that people can use.

Here is some feedback, namely possible improvements you could make.

Usability:

  • in the README, also add information about the program’s limits, something like: “I’ve tested this program on a text 1,000,000 characters long, of which 10% were bad words, and it completed in 2.3 seconds”
  • add a way to import and use multiple bad-words files, in multiple formats (txt, csv, json, xls)
  • add a way for the user to select which bad-words files are used, namely change the method FilterBadWords(content string) string to FilterBadWords(content string, badWordsFiles ...string) string
  • make your program treat digit substitutions and duplicated letters as the letters they stand for; for example, if “aien” is a bad word, then “ai3n” and “aaien” should also be treated as bad words
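
The digit-and-duplicate idea in the last bullet could be sketched like this. The names normalize and digitSubs are hypothetical, not from Hamed’s code, and the substitution table is only an assumption to illustrate the technique:

```go
package main

import (
	"fmt"
	"strings"
)

// digitSubs maps common digit look-alikes back to letters.
// This table is an assumption; extend it as needed.
var digitSubs = strings.NewReplacer(
	"3", "e", "1", "i", "0", "o", "4", "a", "5", "s", "7", "t",
)

// normalize lowercases the word, replaces digit look-alikes with
// letters, and collapses runs of the same letter ("aaien" -> "aien"),
// so variants of a bad word all map to the same lookup key.
func normalize(word string) string {
	w := digitSubs.Replace(strings.ToLower(word))
	var b strings.Builder
	var prev rune
	for i, r := range w {
		if i == 0 || r != prev {
			b.WriteRune(r)
		}
		prev = r
	}
	return b.String()
}

func main() {
	fmt.Println(normalize("ai3n"))  // aien
	fmt.Println(normalize("aaien")) // aien
}
```

Note that collapsing duplicates also changes legitimate words (“hello” becomes “helo”), so normalization should only be applied to the lookup key, never to the text written to the output.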

Engineering:

  • currently your program runs on a single processor only. You will get much faster run times if you split the work across multiple processors, so use goroutines for that.
  • remove the timing Println (“how long did it take”) from the final product
  • choose a more descriptive package name, and add a package comment (a comment on top of package main) explaining what it does and where to use it
  • also add a larger comment on top of the FilterBadWords function explaining the algorithm you intend to use
  • you are not using mkSliceMap for anything, so remove it
  • for large input files, write the result to a file; do not return it as a string
  • for large input files, taking the input string, removing all non-a-zA-Z characters, lowercasing everything, and storing the result in memory in the joinString variable consumes a lot of memory
  • replacing in the content string every time you find a bad word also consumes a lot of memory, because each replacement allocates a new string

For the last two improvements, try to find a more memory-efficient algorithm that can also run on multiple processors. For example, if you are willing to allocate enough memory to store the final answer (if the input is 1,000,000 characters, joinString will be a similar size), it is better to iterate through the input until you reach a space character; at that point you have found your next word, and you write that word (or *** if it is a bad word) into the output variable.
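
The single-pass approach described above might look like this. The name filterStream is hypothetical, and the sketch assumes words are separated by single spaces:

```go
package main

import (
	"fmt"
	"strings"
)

// filterStream walks the input once, cutting out one word at a
// time at each space, and writes either the word or "***" into a
// pre-sized builder, so no intermediate lowercase copy of the
// whole input is kept in memory.
func filterStream(input string, bad map[string]bool) string {
	var out strings.Builder
	out.Grow(len(input)) // allocate room for the final answer up front
	start := 0
	for i := 0; i <= len(input); i++ {
		if i == len(input) || input[i] == ' ' {
			word := input[start:i]
			if bad[strings.ToLower(word)] {
				out.WriteString("***")
			} else {
				out.WriteString(word)
			}
			if i < len(input) {
				out.WriteByte(' ')
			}
			start = i + 1
		}
	}
	return out.String()
}

func main() {
	bad := map[string]bool{"aien": true}
	fmt.Println(filterStream("this aien is fine", bad)) // this *** is fine
}
```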

Better yet, use two processors, so that one processes the first half of the text and the other the second half. After both have finished, write their results to a file and return that file.
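
The two-worker split could be sketched like this; splitting at a space near the middle keeps words intact. The function names are hypothetical, and writing to a file is left out for brevity:

```go
package main

import (
	"fmt"
	"strings"
)

// filterHalf replaces bad words in one chunk of the text.
func filterHalf(chunk string, bad map[string]bool) string {
	words := strings.Fields(chunk)
	for i, w := range words {
		if bad[strings.ToLower(w)] {
			words[i] = "***"
		}
	}
	return strings.Join(words, " ")
}

// filterParallel splits the input at a space near the middle so no
// word is cut in half, filters both halves concurrently, and joins
// the results in the original order.
func filterParallel(input string, bad map[string]bool) string {
	mid := strings.IndexByte(input[len(input)/2:], ' ')
	if mid < 0 {
		return filterHalf(input, bad) // no split point; fall back to one worker
	}
	mid += len(input) / 2
	first := make(chan string)
	go func() { first <- filterHalf(input[:mid], bad) }()
	second := filterHalf(input[mid+1:], bad)
	return <-first + " " + second
}

func main() {
	bad := map[string]bool{"aien": true}
	fmt.Println(filterParallel("one aien two aien three", bad)) // one *** two *** three
}
```

The same pattern generalizes to N chunks with a sync.WaitGroup, at the cost of more bookkeeping to reassemble the pieces in order.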

Hope this helps, and I hope you will continue working on this program.


(Hamed Mostafaei) #4

Yes Christophberger, I will turn it into a library with your help, but right now I don’t have the time.
Thanks for the response.


(Hamed Mostafaei) #5

I am very excited about your response. Thank you for the encouragement and for the nice engineering feedback, JOhn_Stuart. I will work on implementing your suggested improvements.

regards Hamed