Filter any bad words


(Hamed Mostafaei) #1

Hi there

I wrote some code to filter bad words; I hope you can use it in your chat project or on your project's comment page. It is very fast and lightweight. Please tell me any comments or criticism you have about the code, and thank you for reading.

Regards, Hamed


(Christoph Berger) #2

Right now it is a main package. Are you planning to make it a library package to include in chat or forum programs?


(John Stuart) #3

Hi there Hamed,

I’ve read your code, and I like it. You have a nice program, and I want to encourage you to improve it and make it public as a Go library that people can use.

Here is some feedback, namely possible improvements you could make.

Usability:

  • in the README, also add information about the program’s limits, something like: “I’ve tested this program on a text 1,000,000 characters long, of which 10% were bad words, and it completed in 2.3 seconds”
  • add a way to import and use multiple bad-words files, in multiple formats (txt, csv, json, xls)
  • add a way for the user to select which bad-words files are used, namely change the method FilterBadWords(content string) string to FilterBadWords(content string, badWordsFiles ...string) string
  • make your program treat digit substitutions and duplicated letters as the letters they stand for; for example, if “aien” is a bad word, then “ai3n” and “aaien” should also be treated as bad words
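
The digit-and-duplicate idea in the last bullet could be sketched like this. The names normalize and digitSubs are hypothetical, not from Hamed’s code, and the substitution table is only an assumption to illustrate the technique:

```go
package main

import (
	"fmt"
	"strings"
)

// digitSubs maps common digit look-alikes back to letters.
// This table is an assumption; extend it as needed.
var digitSubs = strings.NewReplacer(
	"3", "e", "1", "i", "0", "o", "4", "a", "5", "s", "7", "t",
)

// normalize lowercases the word, replaces digit look-alikes with
// letters, and collapses runs of the same letter ("aaien" -> "aien"),
// so variants of a bad word all map to the same lookup key.
func normalize(word string) string {
	w := digitSubs.Replace(strings.ToLower(word))
	var b strings.Builder
	var prev rune
	for i, r := range w {
		if i == 0 || r != prev {
			b.WriteRune(r)
		}
		prev = r
	}
	return b.String()
}

func main() {
	fmt.Println(normalize("ai3n"))  // aien
	fmt.Println(normalize("aaien")) // aien
}
```

Note that collapsing duplicates also changes legitimate words (“hello” becomes “helo”), so normalization should only be applied to the lookup key, never to the text written to the output.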

Engineering:

  • currently your program runs on a single processor only. You will get much faster run times if you split the work across multiple processors, so use goroutines for that.
  • remove the timing Println (“how long did it take”) from the final product
  • choose a more descriptive package name, and add a package comment (a comment on top of package main) explaining what it does and where to use it
  • also add a larger comment on top of the FilterBadWords function explaining the algorithm you intend to use
  • you are not using mkSliceMap for anything, so remove it
  • for large input files, write the result to a file; do not return it as a string
  • for large input files, taking the input string, removing all non-a-zA-Z characters, lowercasing everything, and storing the result in memory in the joinString variable consumes a lot of memory
  • replacing in the content string every time you find a bad word also consumes a lot of memory, because each replacement allocates a new string

For the last two improvements, try to find a more memory-efficient algorithm that can also run on multiple processors. For example, if you are willing to allocate enough memory to store the final answer (if the input is 1,000,000 characters, joinString will be a similar size), it is better to iterate through the input until you reach a space character; at that point you have found your next word, and you write that word (or *** if it is a bad word) into the output variable.
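
The single-pass approach described above might look like this. The name filterStream is hypothetical, and the sketch assumes words are separated by single spaces:

```go
package main

import (
	"fmt"
	"strings"
)

// filterStream walks the input once, cutting out one word at a
// time at each space, and writes either the word or "***" into a
// pre-sized builder, so no intermediate lowercase copy of the
// whole input is kept in memory.
func filterStream(input string, bad map[string]bool) string {
	var out strings.Builder
	out.Grow(len(input)) // allocate room for the final answer up front
	start := 0
	for i := 0; i <= len(input); i++ {
		if i == len(input) || input[i] == ' ' {
			word := input[start:i]
			if bad[strings.ToLower(word)] {
				out.WriteString("***")
			} else {
				out.WriteString(word)
			}
			if i < len(input) {
				out.WriteByte(' ')
			}
			start = i + 1
		}
	}
	return out.String()
}

func main() {
	bad := map[string]bool{"aien": true}
	fmt.Println(filterStream("this aien is fine", bad)) // this *** is fine
}
```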

Better yet, use two processors, so that one processes the first half of the text and the other the second half. After both have finished, write their results to a file and return that file.
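
The two-worker split could be sketched like this; splitting at a space near the middle keeps words intact. The function names are hypothetical, and writing to a file is left out for brevity:

```go
package main

import (
	"fmt"
	"strings"
)

// filterHalf replaces bad words in one chunk of the text.
func filterHalf(chunk string, bad map[string]bool) string {
	words := strings.Fields(chunk)
	for i, w := range words {
		if bad[strings.ToLower(w)] {
			words[i] = "***"
		}
	}
	return strings.Join(words, " ")
}

// filterParallel splits the input at a space near the middle so no
// word is cut in half, filters both halves concurrently, and joins
// the results in the original order.
func filterParallel(input string, bad map[string]bool) string {
	mid := strings.IndexByte(input[len(input)/2:], ' ')
	if mid < 0 {
		return filterHalf(input, bad) // no split point; fall back to one worker
	}
	mid += len(input) / 2
	first := make(chan string)
	go func() { first <- filterHalf(input[:mid], bad) }()
	second := filterHalf(input[mid+1:], bad)
	return <-first + " " + second
}

func main() {
	bad := map[string]bool{"aien": true}
	fmt.Println(filterParallel("one aien two aien three", bad)) // one *** two *** three
}
```

The same pattern generalizes to N chunks with a sync.WaitGroup, at the cost of more bookkeeping to reassemble the pieces in order.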

Hope this helps, and I hope you will continue working on this program.


(Hamed Mostafaei) #4

Yes Christophberger, I will turn it into a library with your help, but right now I don’t have the time.
Thanks for the response.


(Hamed Mostafaei) #5

I am very excited about your response. Thank you for the encouragement and for the nice engineering feedback, JOhn_Stuart. I will work on implementing your suggested improvements.

regards Hamed