GNU Coreutils Gotchas (pixelbeat.org)
57 points by signa11 on Nov 30, 2015 | hide | past | favorite | 30 comments


oooh, can we talk about the broken BSD utils on OS X next?

  $ du -hs big.log
  199M	big.log
  $ cat big.log | time tr 'a' 'b' | md5
         27.22 real        23.70 user         0.43 sys
  c171314106134f6fde035b11f4354464
  $ cat big.log | time gtr 'a' 'b' | md5
          1.05 real         0.30 user         0.23 sys
  c171314106134f6fde035b11f4354464


Yes, GNU spends a lot of time improving performance. A couple of examples from the most recent release, which you might think were too simple to optimize significantly:

The yes command (which is generally useful for generating repetitive text):

    $ yes-old | pv > /dev/null ^C
    ... 55.8MiB/s ...
    $ yes-new | pv > /dev/null ^C
    ... 3.44GiB/s ...
Details on that fairly simple change are at http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h...
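The gist of the change is doing fewer, larger writes instead of one write(2) per line. A rough shell illustration of the idea (buffer size is arbitrary here, not the actual coreutils value):

```shell
# Naive: one write() syscall per "y" line.
slow_yes() { while :; do echo y; done; }

# Buffered: build a block of 4096 "y" lines once, then emit it in a loop,
# so each write() carries thousands of lines.
fast_yes() {
  buf=$(printf 'y\n%.0s' $(seq 4096))   # 4096 copies of "y\n"
  while :; do printf '%s\n' "$buf"; done
}
```

Compare with e.g. `fast_yes | pv > /dev/null` vs `slow_yes | pv > /dev/null`.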

We also more than doubled the speed of wc -l (by avoiding function call overhead):

    $ yes | pv | wc-old -l ^C
    ... 230MiB/s ...
    $ yes | pv | wc-new -l ^C
    ... 558MiB/s ...
We now generate an infinite stream of integers more efficiently too:

    $ seq-old inf | pv > /dev/null ^C
    ... 13.3MiB/s ...
    $ seq-new inf | pv > /dev/null ^C
    ... 497MiB/s ...


Nice!


Just tested on FreeBSD and the chmod -R 644 works correctly. Weird that it's broken in GNU coreutils.


Hm, it should probably recurse first and then remove permissions, right?

At any rate, removing list permissions from directories is probably not what you want to do.


The chmod gotcha that I hit occasionally, when I haven't used the chmod command for a while, is the difference between "chmod -R <perm> <dir>", which recurses as expected, and "chmod -r <perm> <dir>", which interprets "-r" as the <perm> spec, complains that the file <perm> doesn't exist, and then removes read permission from <dir>.
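The typo is easy to reproduce safely in a scratch directory (GNU chmod; `stat -c` is the GNU syntax):

```shell
umask 022
d=$(mktemp -d)
chmod 755 "$d"
# Typo: "-r" parses as the symbolic mode "remove read";
# "644" is then treated as a file operand, which doesn't exist.
chmod -r 644 "$d" 2>&1 | head -1   # chmod: cannot access '644': No such file or directory
stat -c %a "$d"                    # 311 with umask 022: read removed from u, g, o
rmdir "$d"
```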


Why pipe into xargs when you can just use find -exec?


Clearly delineating the find command from the chmod command?

That find even has actions is one of those big Unix oddities.


xargs can consume multiple files at once. I'd say fewer forks/execs, so better performance.


find -exec can do multiple files at once too (use + instead of ;).
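For the record, both forms side by side (the `+` variant is POSIX, and like xargs it batches many operands per process, but it's immune to whitespace in names since no text stream is parsed):

```shell
# One chmod process per file:
find . -type f -exec chmod 644 {} \;

# Many files per chmod process:
find . -type f -exec chmod 644 {} +
```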


for me, it's details like this:

  $ basename $PWD
  etc
  $ echo $PWD | basename
  <this prints nothing>
I even reported it a decade ago, but nobody cares.


Discussed at http://lists.gnu.org/archive/html/coreutils/2011-01/msg00080...

Summary of that is you can filter using xargs like:

    get_file_paths | xargs basename -a
No point having two ways to do something, especially when there are caveats about enabling stdin processing.
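A concrete run of that pattern (note that plain xargs splits on whitespace, so this form is only safe for space-free paths):

```shell
# Feed paths on stdin; basename -a (GNU) accepts multiple operands.
printf '%s\n' /usr/bin/env /etc/passwd | xargs basename -a
# env
# passwd
```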


Why should basename accept its argument on stdin? It doesn't act as a filter processing a stream of data, but rather as a plain function.


> Why should basename accept its argument on stdin?

Because a lot of people have a use for a filter version.

> It doesn't act as a filter, processing a stream of data, but rather as a plain function.

That's the problem with a lot of commands: some work as filters, some as simple commands, and others do both. The reasoning behind the split is confusing.

[edit: yes, combining with xargs will often work]


Why do you expect basename to do something with stdin?


I ended up writing my own versions of basename and dirname that act as both filters and commands. It's actually not that hard given the functions you have (I might give it another go, given the whole pledge thing, just to see how it would work). URL versions of these commands are also very useful and not too hard to program in Perl.


A practical issue: how many paths should basename accept on stdin? If more than 1, how should they be delimited?


Obviously with a newline, like every other Unix tool: sort, uniq, grep, etc.


But paths can contain newlines.


Most tools that deal with paths delimit them on newlines and support a -z or -0 flag to delimit with \0.
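For example, `find -print0` paired with `xargs -0` handles awkward names, since paths are NUL-delimited rather than split on whitespace (GNU basename also has `-z` to NUL-terminate its own output):

```shell
d=$(mktemp -d)
touch "$d/a b.txt"        # a name containing a space
find "$d" -type f -print0 | xargs -0 basename -a
# a b.txt
rm -rf "$d"
```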


    $ echo ${PWD##*/}
    etc

use your shell to its full power
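The matching dirname expansion exists too; both are plain POSIX parameter expansion, no external command needed:

```shell
p=/usr/local/bin
echo "${p##*/}"   # basename: strip longest prefix up to last "/" -> bin
echo "${p%/*}"    # dirname:  strip shortest suffix from last "/" -> /usr/local
```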


if you need to pipe it in, you should

    basename `echo $PWD`


uhm, what?

Can you give an example where I have a file (a.txt) of 100 file names and need the basename of each? I'm pretty sure dvh just used echo to show a simple example, not an actual use.


Another way:

  while IFS= read -r line; do basename "$line"; done < a.txt


Yeah, I think I prefer yours. Probably more robust wrt filenames-with-whitespace too.


Both solutions, although nice and helpful, pretty much illustrate why I ended up writing a filter-capable version.


How about this?

    for f in $(cat a.txt); do basename "$f"; done


1. Broken by filenames that contain whitespace.

2. The shell will expand * and ? (maybe others) in the filenames
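Point 2 is easy to demonstrate: a stored name containing a glob character silently expands against the current directory instead of being passed through literally:

```shell
d=$(mktemp -d); cd "$d"
printf '%s\n' '*.txt' > a.txt   # the "filename" in the list is literally *.txt
touch real.txt
for f in $(cat a.txt); do basename "$f"; done
# a.txt
# real.txt    <- the unquoted expansion matched files in $PWD
```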


    xargs basename -a < a.txt


I take it -a is one of those GNU extensions?



