This article aims to find the fastest way to search for a string in a folder that contains many nested folders and a large number of files.
I ran the experiment below on a folder containing 8918 folders and 48170 files. The goal is to compare several ways of searching for a string in these folders and to measure the performance of each:
- grep
- grep + find
- grep + find + exec {} \;
- grep + find + exec {} +
- grep + find + xargs
- ack
- rg
- ag
- Conclusion
grep
Use grep with -r to search the current directory recursively, -i to ignore case, -n to display line numbers, and -F to treat the search term as a fixed string rather than a regular expression:
grep -rinF search_term .
Running it with time:
$ time grep -rinF "logo sc-cxo-logo" .
Binary file ./build/docroot.tar matches
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/mocks/layout.html:43:
<a class="pull-left logo sc-cxo-logo" href="#"></a>
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html:65:
class="logo sc-cxo-logo" title="Back to homepage">
...
grep --color=auto --exclude-dir={.bzr,CVS,.git,.hg,.svn} -rinF .
116.12s user 8.37s system 38% cpu 5:22.50 total
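The search took 5 minutes 22 seconds. The timing line also shows that grep is aliased in this shell to add --color=auto and --exclude-dir={.bzr,CVS,.git,.hg,.svn}, so version-control directories were skipped.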
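The Binary file ./build/docroot.tar matches line comes from an archive in the tree. Two grep options (available in GNU grep and in the BSD grep shipped with macOS) can cut such noise: -I treats binary files as non-matching, and --exclude-dir skips whole directories. The sketch below runs on a throwaway directory whose layout and file names are hypothetical; whether these flags also shorten the real search was not measured in this experiment.

```shell
#!/bin/sh
# Hypothetical throwaway tree, for illustration only
dir=$(mktemp -d)
mkdir -p "$dir/src" "$dir/build"
printf '<a class="logo sc-cxo-logo">\n' > "$dir/src/index.html"
printf '<a class="logo sc-cxo-logo">\n' > "$dir/build/index.html"  # generated copy
printf 'logo sc-cxo-logo\0\n' > "$dir/archive.bin"                # NUL byte makes it binary

cd "$dir" || exit 1
# -r recursive, -i ignore case, -n line numbers, -F fixed string,
# -I skip binary files, --exclude-dir skip the build directory
grep -rinFI --exclude-dir=build "logo sc-cxo-logo" .
```

Only ./src/index.html is reported: the archive is skipped by -I and the generated copy by --exclude-dir.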
Note: older versions of grep may not support recursive search (-r or -R), and POSIX does not require the option, so it may be missing on some systems. In that case, we can combine find with grep.
grep + find
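A first attempt is to pass the list of files produced by find to grep as arguments:
grep search_term `find . -type f`
This first finds all files and then runs grep on them. It works when the folder has relatively few files, but with a large number of files the combined argument list exceeds the system limit (ARG_MAX) and the command fails with argument list too long: grep. The backtick output is also split on whitespace, so file names containing spaces break it.
Running it with time: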
$ time grep "logo sc-cxo-logo" `find . -type f`
zsh: argument list too long: grep
grep --color=auto --exclude-dir={.bzr,CVS,.git,.hg,.svn} "logo sc-cxo-logo"
0.42s user 0.04s system 99% cpu 0.459 total
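The limit behind argument list too long is the system's ARG_MAX: the maximum size, in bytes, of the arguments plus environment passed to a new process (exactly how it is accounted differs between systems). A rough check, run from the folder being searched, compares it with the size of the file list the backtick expansion would produce:

```shell
#!/bin/sh
# Maximum bytes of arguments plus environment for a new process
getconf ARG_MAX

# Rough size in bytes of the file list `find . -type f` would expand to
# (one path per file, each followed by a separator)
find . -type f | wc -c
```

If the second number is close to or above the first, the backtick form is likely to fail.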
grep + find + exec {} \;
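To avoid argument list too long: grep, let's use find's -exec action:
find . -type f -exec grep -n search_term {} \; -print
For each file that find locates, -exec runs grep on that single file: {} is replaced with the file's path, and \; ends the command. Because -exec counts as true only when grep finds a match, the trailing -print prints each matching file's name after its matching lines.
Running it with time: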
$ time find . -type f -exec grep -n "logo sc-cxo-logo" {} \; -print
Binary file ./build/docroot.tar matches
./build/docroot.tar
43: <a class="pull-left logo sc-cxo-logo" href="#"></a>
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/mocks/layout.html
65: class="logo sc-cxo-logo" title="Back to homepage">
...
find . -type f -exec grep -n "logo sc-cxo-logo" {} \; -print
124.81s user 139.21s system 51% cpu 8:34.44 total
This works, but it took 8 minutes 34 seconds, most likely because a separate grep process is started for every file. Can we improve on this?
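In the output above, each file name follows its matching lines because of the trailing -print. A small sketch on a throwaway directory with two hypothetical files shows that the name is printed only for the file where grep succeeded, and that grep's -H option (supported by GNU and BSD grep) is an alternative that prefixes every match with its file name instead:

```shell
#!/bin/sh
# Two hypothetical files; only match.html contains the search term
dir=$(mktemp -d)
printf 'class="logo sc-cxo-logo"\n' > "$dir/match.html"
printf 'nothing to see here\n' > "$dir/other.html"

# -print is evaluated only when the preceding -exec is true (grep exit
# status 0), so the name follows the matching lines, for match.html only
find "$dir" -type f -exec grep -n "logo sc-cxo-logo" {} \; -print

# Alternative: -H makes grep prefix each matching line with the file name
find "$dir" -type f -exec grep -Hn "logo sc-cxo-logo" {} \;
```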
grep + find + exec {} +
We can use:
find . -type f -exec grep -n search_term {} +
This is the same as option 3, but the command ends with + instead of \;. With +, find passes as many paths as fit into each grep invocation ({} is replaced with that list of paths), so far fewer grep processes are started.
Running it with time:
$ time find . -type f -exec grep -n "logo sc-cxo-logo" {} +
Binary file ./build/docroot.tar matches
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/mocks/layout.html:43:
<a class="pull-left logo sc-cxo-logo" href="#"></a>
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html:65:
class="logo sc-cxo-logo" title="Back to homepage">
...
find . -type f -exec grep -n "logo sc-cxo-logo" {} +
82.80s user 7.51s system 26% cpu 5:35.59 total
This command took 5 minutes 35 seconds, similar to grep -r. Can we do this another way?
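To see why + starts far fewer processes than \;, count how many processes each form launches. This sketch uses a throwaway directory with 100 hypothetical files and a stand-in command that prints one line per invocation instead of running grep:

```shell
#!/bin/sh
# Create 100 small hypothetical files
dir=$(mktemp -d)
i=1
while [ "$i" -le 100 ]; do
  printf 'line %s\n' "$i" > "$dir/file$i.txt"
  i=$((i + 1))
done

# `sh -c 'echo run' sh {}` prints one line per invocation,
# so counting lines counts the processes find started
per_file=$(find "$dir" -type f -exec sh -c 'echo run' sh {} \; | wc -l)
batched=$(find "$dir" -type f -exec sh -c 'echo run' sh {} + | wc -l)

printf '%s started %s processes\n' '-exec ... \;' "$per_file"
printf '%s started %s processes\n' '-exec ... +' "$batched"
```

Here \; starts 100 processes and + starts one. On the 48170-file folder, the \; form starts one grep per file, which is consistent with its much higher system time.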
grep + find + xargs
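The same can be accomplished with xargs, which reads file names from standard input and passes them to grep in batches:
find . -type f -print0 | xargs -0 grep -n search_term
Note: -print0 and -0 are required if folder or file names contain spaces; they make find and xargs separate names with NUL characters instead of whitespace.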
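To see why the note matters, here is a sketch with one hypothetical file whose name contains a space. Without NUL separators, xargs splits the name in two and grep cannot open either part; with -print0 and -0, the name arrives intact:

```shell
#!/bin/sh
dir=$(mktemp -d)
printf 'class="logo sc-cxo-logo"\n' > "$dir/my page.html"

# Whitespace-separated: xargs passes "$dir/my" and "page.html",
# neither exists, so nothing matches (grep's errors are discarded)
find "$dir" -type f | xargs grep -l "logo sc-cxo-logo" 2>/dev/null ||
  echo "no match: the file name was split"

# NUL-separated: the full name reaches grep and the file is listed
find "$dir" -type f -print0 | xargs -0 grep -l "logo sc-cxo-logo"
```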
Running it with time:
$ time find . -type f -print0 | xargs -0 grep -n "logo sc-cxo-logo"
Binary file ./build/docroot.tar matches
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/mocks/layout.html:43:
<a class="pull-left logo sc-cxo-logo" href="#"></a>
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html:65:
class="logo sc-cxo-logo" title="Back to homepage">
...
xargs -0 grep -n "logo sc-cxo-logo"
82.92s user 7.36s system 32% cpu 4:39.14 total
This took 4 minutes 39 seconds, somewhat faster than grep -r and both -exec variants. Can we do better? Yes, by using third-party tools.
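One more variation, not measured in this experiment, is to let xargs run several grep processes at once with -P (supported by GNU and BSD xargs), combined with -n to cap how many file names each grep receives. Whether this helps depends on the disk and the number of CPUs, so treat it as something to try rather than a known win. A sketch on a throwaway tree of 200 hypothetical files, 20 of which contain the term:

```shell
#!/bin/sh
# Hypothetical tree: every 10th file contains the search term
dir=$(mktemp -d)
i=1
while [ "$i" -le 200 ]; do
  if [ $((i % 10)) -eq 0 ]; then
    printf 'class="logo sc-cxo-logo"\n' > "$dir/page$i.html"
  else
    printf 'no match here\n' > "$dir/page$i.html"
  fi
  i=$((i + 1))
done

# -P 4: run up to 4 grep processes at once
# -n 50: give each grep at most 50 file names, so there is work to share
# -l: print only matching file names; sort, since parallel output order varies
find "$dir" -type f -print0 | xargs -0 -P 4 -n 50 grep -lF "logo sc-cxo-logo" | sort
```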
ack
ack is a grep-like source code search tool.
Running it with time:
$ time ack "logo sc-cxo-logo" *
modules/store/bin/j2ee-apps/Web/Web.war/checkout/mocks/layout.html
43: <a class="pull-left logo sc-cxo-logo" href="#"></a>
modules/store/bin/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html
65: class="logo sc-cxo-logo" title="Back to homepage">
...
ack "logo sc-cxo-logo" *
6.14s user 6.61s system 9% cpu 2:09.14 total
This took 2 minutes 9 seconds, a significant improvement over the previous options. Can we do better?
rg
ripgrep (rg) is a line-oriented search tool that recursively searches the current directory for a regex pattern.
Running it with time:
$ time rg 'logo sc-cxo-logo'
modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_2.jsp
23: <a id="homeLink" href="/?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_reskin.jsp
111: class="logo sc-cxo-logo" title="Back to homepage">
...
rg 'logo sc-cxo-logo'
1.23s user 4.83s system 8% cpu 1:13.77 total
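This command took 1 minute 13 seconds, faster than all the options above. Note, however, that by default ripgrep respects ignore files such as .gitignore and skips hidden and binary files. That is presumably why its output contains no results from bin folders, so this run searched fewer files than the grep runs did. Can we do better?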
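To make rg search the same files as grep -r, its filtering can be switched off: -u stops honoring ignore files, -uu also searches hidden files, and -uuu also searches binary files, while -F treats the pattern as a fixed string like grep -F. The sketch below uses a throwaway tree where a hypothetical .ignore file (which rg honors even outside a git repository) lists bin/, mimicking the situation presumed above. It skips itself if rg is not installed.

```shell
#!/bin/sh
if ! command -v rg >/dev/null 2>&1; then
  echo "ripgrep (rg) is not installed; skipping"
else
  # Hypothetical tree: bin/ is listed in an ignore file
  dir=$(mktemp -d)
  mkdir -p "$dir/src" "$dir/bin"
  printf 'class="logo sc-cxo-logo"\n' > "$dir/src/index.html"
  printf 'class="logo sc-cxo-logo"\n' > "$dir/bin/index.html"
  printf 'bin/\n' > "$dir/.ignore"
  cd "$dir" || exit 1

  # Default filtering: bin/ is skipped because of .ignore
  rg -F --files-with-matches 'logo sc-cxo-logo' . | sort
  # -uuu: no ignore files, hidden and binary files included, like grep -r
  rg -F -uuu --files-with-matches 'logo sc-cxo-logo' . | sort
fi
```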
ag
The Silver Searcher (ag) is a code-searching tool similar to ack, with a focus on speed.
$ time ag "logo sc-cxo-logo" *
modules/store/bin/j2ee-apps/Web/Web.war/checkout/mocks/layout.html
43: <a class="pull-left logo sc-cxo-logo" href="#"></a>
modules/store/bin/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html
65: class="logo sc-cxo-logo" title="Back to homepage">
ag "logo sc-cxo-logo" *
0.88s user 8.13s system 16% cpu 53.761 total
This took 54 seconds, another significant improvement. A search that took over 5 minutes with grep -r finished in under a minute with this tool.
Conclusion
Below is a summary of this experiment:
command | execution time |
---|---|
grep -r | 5 mins 22 seconds |
grep + find | failed (argument list too long) |
grep + find + exec {} \; | 8 mins 34 seconds |
grep + find + exec {} + | 5 mins 35 seconds |
grep + find + xargs | 4 mins 39 seconds |
ack | 2 mins 9 seconds |
rg | 1 min 13 seconds |
ag | 54 seconds |
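To repeat the comparison on another tree, a small harness along these lines could help. It is a sketch under several assumptions: it needs bash (for the time keyword and TIMEFORMAT), it searches the directory given as its first argument (default: the current directory) for the article's search term, it uses the literal-match flags -F (grep, rg) and -Q (ack, ag) rather than the exact invocations above, and it skips tools that are not installed. Timings depend heavily on the filesystem cache, so running the whole set more than once, in different orders, gives a fairer picture than a single pass.

```shell
#!/usr/bin/env bash
# Sketch: time each search approach from this article on one directory.
# Assumes bash; tools that are not installed are reported as skipped.
cd "${1:-.}" || exit 1
export PATTERN='logo sc-cxo-logo'
TIMEFORMAT='%R s'   # report wall-clock time in seconds only

# bench LABEL TOOL COMMAND: if TOOL is on PATH, run COMMAND with bash -c and
# print LABEL followed by its wall-clock time; search output is discarded.
bench() {
  local label=$1 tool=$2 cmd=$3
  if ! command -v "$tool" >/dev/null 2>&1; then
    printf '%-22s skipped (%s not installed)\n' "$label" "$tool"
    return 0
  fi
  printf '%-22s ' "$label"
  # The timing report goes to the group's stderr, redirected to stdout;
  # "|| true" because "no match" (exit status 1) is not an error here.
  { time bash -c "$cmd" >/dev/null 2>&1; } 2>&1 || true
}

bench 'grep -r'          grep  'grep -rnF "$PATTERN" .'
bench 'find -exec {} \;' find  'find . -type f -exec grep -nF "$PATTERN" {} \;'
bench 'find -exec {} +'  find  'find . -type f -exec grep -nF "$PATTERN" {} +'
bench 'find | xargs'     xargs 'find . -type f -print0 | xargs -0 grep -nF "$PATTERN"'
bench 'ack'              ack   'ack -Q "$PATTERN" .'
bench 'rg'               rg    'rg -F "$PATTERN" .'
bench 'ag'               ag    'ag -Q "$PATTERN" .'
```

Each tool is given an explicit . path, since ack, rg and ag may otherwise read standard input when it is not a terminal.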
We went through various commands, options and tools for searching a folder to find the fastest approach. On this folder, ag was the fastest, followed by rg and ack, while the grep-based approaches took several minutes.
– RC